feat: Add Native Support for In-Memory Cache #4591
Hi @andygrove, could you please provide an update on this when you have a chance? Thanks.
Hi @pchintar. I haven't had time to review this yet, but I will. I am currently working on some more urgent items for the 0.17.0 release. Unfortunately, we have limited review bandwidth.
Comparison of #4569 and #4591
These two PRs both close #2391 and take the same fundamental approach, so I am cross-linking a comparison here for visibility.
Shared goal and mechanics
Both solve the same problem: Comet does not treat
Both share the same core building blocks:
Key differences
Architectural distinction
The main difference is the integration strategy:
Suggested path forward
A strong combined outcome would pair #4591's dedicated scan operator and explicit fallback reasons with #4569's stats-based
Which issue does this PR close?
Closes #2391.
Rationale for this change
Comet currently has limited support for Spark's in-memory cache.
When a table is cached and later read, Comet operators cannot consume the cached data directly. Instead, the execution plan falls back to Spark's cache scan path and inserts an additional `CometSparkColumnarToColumnar` conversion before execution can continue in Comet.
This extra conversion adds overhead to cached table scans and keeps cached data off the native Comet execution path.
This PR adds native support for in-memory cached tables so that cached data written in a Comet-compatible format can be read directly by Comet operators.
What changes are included in this PR?
This PR introduces a native cache path for in-memory cached tables behind a new configuration:
`spark.comet.exec.inMemoryCache.enabled`
When enabled:
- `CometCachedBatch`.
- `CometInMemoryTableScanExec`.
- `CometSparkColumnarToColumnar` conversion.
When disabled:
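To exercise the new flag by hand, one option is to launch `spark-shell` with Comet loaded and `spark.comet.exec.inMemoryCache.enabled` turned on. This is a sketch based on the launch flags in the Comet installation guide; `COMET_JAR` and its path are hypothetical placeholders, and the off-heap memory size is an arbitrary example value. Check the current Comet docs for the flags your Spark and Comet versions require.

```shell
# Sketch only: start spark-shell with Comet and the new in-memory cache flag enabled.
# COMET_JAR is a hypothetical placeholder: point it at a locally built Comet jar.
COMET_JAR=/path/to/comet-spark.jar

# spark.comet.explainFallback.enabled logs why an operator falls back to Spark,
# which helps confirm whether the cached scan stays on the native path.
"$SPARK_HOME/bin/spark-shell" \
  --jars "$COMET_JAR" \
  --conf spark.driver.extraClassPath="$COMET_JAR" \
  --conf spark.executor.extraClassPath="$COMET_JAR" \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=4g \
  --conf spark.comet.explainFallback.enabled=true \
  --conf spark.comet.exec.inMemoryCache.enabled=true
```

Inside the shell, cache a DataFrame, run an action such as `count()` to materialize the cache, then call `explain()` on a query over it. Per the description above, with the flag on the cached read should appear as `CometInMemoryTableScanExec` rather than Spark's cache scan followed by a `CometSparkColumnarToColumnar` conversion.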
How are these changes tested?
Added `CometInMemoryCacheSuite` covering:
- `CometCachedBatch`
Verified with:
./mvnw -pl spark -DskipTests test-compile
./mvnw test -pl spark \
  -DwildcardSuites=org.apache.comet.exec.CometInMemoryCacheSuite