DRILL-8543: Add Support for Materialized Views by cgivre · Pull Request #3036 · apache/drill

cgivre · 2026-02-02T17:39:55Z

DRILL-8543: Add Support for Materialized Views

Description

This PR adds materialized view support to Apache Drill, enabling users to store pre-computed query results for improved query performance.

Features

SQL Commands: CREATE [OR REPLACE] MATERIALIZED VIEW, DROP MATERIALIZED VIEW, and REFRESH MATERIALIZED VIEW
Query Rewriting: Automatic query optimization using Calcite's SubstitutionVisitor to transparently rewrite queries to use materialized views when beneficial
Parquet Storage: MV data stored as Parquet files for efficient columnar access
Metastore Integration: Optional synchronization of MV metadata to Drill Metastore (Iceberg, RDBMS, MongoDB backends)

Implementation

New SQL parser classes for MV statements
MaterializedView data model with JSON serialization (.materialized_view.drill files)
MaterializedViewHandler for CREATE/DROP/REFRESH operations
MaterializedViewRewriter for query plan substitution
DrillMaterializedViewTable implementing Calcite's TranslatableTable
Metastore API extensions: MaterializedViews interface and MaterializedViewMetadataUnit
Iceberg metastore backend implementation for MV metadata

Configuration

planner.enable_materialized_view_rewrite (default: true) - Controls automatic query rewriting

Documentation

Added docs/dev/MaterializedViews.md with complete feature documentation

Testing

Added additional unit tests.

letian-jiang

LGTM. Materialized views are a powerful feature for an analytic engine. 🥳

letian-jiang

We could also add a plan-asserting test to ensure the query is correctly rewritten to use the MV.

cgivre · 2026-02-08T05:21:11Z

@letian-jiang I believe I addressed your review comments. Could you please mark the review as complete so we can merge?
Thanks!

letian-jiang

LGTM

rymarm

@cgivre Thank you for implementing this feature! It looks great overall. I found a few issues from my point of view - please check them out. They relate to:

MV dataStoragePath: making its format strict and accessing it from the object instead of relying on duck typing.
Using hardcoded backticks during query building.
Optimizing code syntax.

rymarm · 2026-06-19T12:25:37Z

+  /** The relative path where the materialized data is stored (typically the view name) */
+  @JsonInclude(Include.NON_NULL)
+  private String dataStoragePath;


In exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMaterializedViewTable.java you specified that the MV data is stored within {name}_mv_data/:

 * A materialized view stores:
 * <ul>
 *   <li>Definition file (.materialized_view.drill) - JSON with name, SQL, schema info</li>
 *   <li>Data directory ({name}_mv_data/) - Parquet files with pre-computed results</li>
 * </ul>

In the unit tests, you used the MV name for the data storage, but at the same time, you used {name}_mv_data in the following places:

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMaterializedViewTable.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/MaterializedViewRewriter.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java

We need to agree on what name pattern to use: {name}_mv_data or simply {name}.

Good catch — these were inconsistent. The data is physically written to {name}_mv_data, so that's the pattern we keep. dataStoragePath now defaults to name + DATA_DIR_SUFFIX (new constant) and is the single source of truth — every reader/writer goes through getDataStoragePath() instead of re-deriving the suffix. The stray setDataStoragePath(viewName) that stored the wrong value is gone.
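For readers following the thread, the "single source of truth" idea can be sketched as below. This is a minimal illustration, not the PR's code: `MaterializedViewSketch` is a hypothetical, simplified stand-in for the real `MaterializedView` model, and only the `DATA_DIR_SUFFIX` constant name and `getDataStoragePath()` accessor are taken from the reply above.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for the PR's MaterializedView model. */
final class MaterializedViewSketch {
  /** Suffix appended to the view name to form the default data directory. */
  static final String DATA_DIR_SUFFIX = "_mv_data";

  private final String name;
  private final String dataStoragePath;

  MaterializedViewSketch(String name, String dataStoragePath) {
    this.name = Objects.requireNonNull(name, "name");
    // When no explicit path is given, derive the default once, here,
    // so no other class has to re-derive the suffix.
    this.dataStoragePath =
        dataStoragePath != null ? dataStoragePath : name + DATA_DIR_SUFFIX;
  }

  String getName() {
    return name;
  }

  /** Callers read the data directory from here instead of rebuilding it. */
  String getDataStoragePath() {
    return dataStoragePath;
  }
}
```

With this shape, a caller that needs the data directory asks the object (`mv.getDataStoragePath()`) rather than concatenating the suffix itself, which is the inconsistency the review comment pointed out.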

rymarm · 2026-06-19T12:27:03Z

+            .map(f -> new View.Field(f.getName(), f.getType()))
+            .collect(Collectors.toList()),
+        workspaceSchemaPath,
+        name,  // data storage path defaults to view name


We need to agree on what name pattern to use by default: {name}_mv_data or simply {name}.

Settled on {name}_mv_data (where the data is actually written). The default is now name + DATA_DIR_SUFFIX and all call sites use getDataStoragePath().

rymarm · 2026-06-19T12:32:45Z

+   * We explicitly select the MV's columns to ensure proper schema matching.
+   */
+  private String buildDataScanSql() {
+    String dataTableName = materializedView.getName() + "_mv_data";


I believe materializedView.getDataStoragePath() should be used there.

Done — buildDataScanSql() now uses materializedView.getDataStoragePath().

rymarm · 2026-06-19T12:48:52Z

+        .collect(java.util.stream.Collectors.toList());
+    if (fieldNames.isEmpty()) {
+      // Fallback to SELECT * if no fields defined (shouldn't happen for non-dynamic MVs)
+      return "SELECT * FROM `" + dataTableName + "`";


Apache Drill allows the identifier quote character to be configured. Won't this fail if planner.parser.quoting_identifiers is set to double quotes?
https://drill.apache.org/docs/lexical-structure#identifier-quotes

And shouldn't it also include the workspace name?

Fixed. Added quoteIdentifier(Quoting, id) which wraps using the session's configured quoting char (escaping embedded quotes); the session Quoting is threaded in from WorkspaceSchemaFactory. On the workspace-name point: this SQL is expanded against materializedView.getWorkspaceSchemaPath(), so the table is already resolved within the correct workspace and doesn't need re-qualifying here.
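A helper like the `quoteIdentifier(Quoting, id)` described above might look roughly like the following. This is a sketch under stated assumptions: `QuotingSketch` is a hypothetical local enum standing in for Calcite's `Quoting` so the example compiles on its own, and escaping by doubling the closing quote character is an assumed rule, not something confirmed by the PR.

```java
/** Hypothetical stand-in for Calcite's Quoting enum, so the sketch is self-contained. */
enum QuotingSketch {
  BACK_TICK("`", "`"),
  DOUBLE_QUOTE("\"", "\""),
  BRACKET("[", "]");

  final String open;
  final String close;

  QuotingSketch(String open, String close) {
    this.open = open;
    this.close = close;
  }
}

final class IdentifierQuoter {
  private IdentifierQuoter() {
  }

  /**
   * Wraps an identifier in the configured quote characters.
   * Assumption: an embedded closing quote is escaped by doubling it.
   */
  static String quoteIdentifier(QuotingSketch quoting, String id) {
    String escaped = id.replace(quoting.close, quoting.close + quoting.close);
    return quoting.open + escaped + quoting.close;
  }
}
```

The point of routing every generated identifier through one helper is that the hardcoded backticks disappear from the SQL-building code, so a session configured for double quotes produces SQL the parser accepts.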

rymarm · 2026-06-19T12:49:33Z

+      }
+      sql.append("`").append(fieldNames.get(i)).append("`");
+    }
+    sql.append(" FROM `").append(dataTableName).append("`");


The same:

Won't it fail if planner.parser.quoting_identifiers is set to double quotes?
https://drill.apache.org/docs/lexical-structure#identifier-quotes

Same fix — now quoted via quoteIdentifier(...) using the session quoting character, so it no longer breaks under double-quote quoting.
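To make the combined effect of the two quoting fixes concrete, here is a small sketch of a scan-SQL builder that selects explicit columns and quotes every identifier with a caller-supplied quote string. The class name, method signatures, and the doubling-based escaping are all assumptions for illustration; the real `buildDataScanSql()` takes its inputs from the `MaterializedView` object and the session.

```java
import java.util.List;
import java.util.stream.Collectors;

final class DataScanSqlSketch {
  private DataScanSqlSketch() {
  }

  /** Quotes an identifier; assumes embedded quote characters are escaped by doubling. */
  static String quote(String quoteChar, String id) {
    return quoteChar + id.replace(quoteChar, quoteChar + quoteChar) + quoteChar;
  }

  /**
   * Builds "SELECT col1, col2 FROM table" with every identifier quoted.
   * Falls back to SELECT * when no field names are defined.
   */
  static String buildDataScanSql(String quoteChar, String dataTableName, List<String> fieldNames) {
    String columns = fieldNames.isEmpty()
        ? "*"
        : fieldNames.stream()
            .map(f -> quote(quoteChar, f))
            .collect(Collectors.joining(", "));
    return "SELECT " + columns + " FROM " + quote(quoteChar, dataTableName);
  }
}
```

Called with a double-quote character, a table name of `sales_mv_data`, and fields `region` and `total`, this yields `SELECT "region", "total" FROM "sales_mv_data"`.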

rymarm · 2026-06-22T14:05:36Z

+    @Override
+    public void dropMaterializedView(String viewName) throws IOException {
+      Path viewPath = getMaterializedViewPath(viewName);
+      Path dataPath = getMaterializedViewDataPath(viewName);


Use materializedView.getDataStoragePath() instead.

Done. dropMaterializedView only has the name, so it reads the definition to get the data path (falling back to the default naming only if the definition is unreadable).
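The "read the definition, fall back to the default naming" behaviour described above can be sketched as follows. Everything here is hypothetical: `DefinitionReader` and `resolveDataPath` are illustrative names, and the real method works with Hadoop `Path` objects and the `.materialized_view.drill` file rather than plain strings.

```java
import java.io.IOException;

final class DropPathSketch {
  static final String DATA_DIR_SUFFIX = "_mv_data";

  private DropPathSketch() {
  }

  /** Hypothetical hook that loads the stored data path from a view definition. */
  @FunctionalInterface
  interface DefinitionReader {
    String readDataStoragePath(String viewName) throws IOException;
  }

  /**
   * Prefers the path recorded in the definition; uses the default
   * {name}_mv_data naming only when the definition cannot be read.
   */
  static String resolveDataPath(String viewName, DefinitionReader reader) {
    try {
      String stored = reader.readDataStoragePath(viewName);
      if (stored != null) {
        return stored;
      }
    } catch (IOException e) {
      // Definition is unreadable: fall through to the default naming.
    }
    return viewName + DATA_DIR_SUFFIX;
  }
}
```

The fallback keeps DROP usable when the definition file is damaged, at the cost of possibly missing a data directory that was stored under a non-default path.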

rymarm · 2026-06-22T14:06:04Z

+            .build(logger);
+      }
+
+      Path dataPath = getMaterializedViewDataPath(viewName);


Use materializedView.getDataStoragePath() instead.

Done — uses getMaterializedViewDataPath(materializedView) / getDataStoragePath().

rymarm · 2026-06-22T14:20:15Z

+    public CreateTableEntry createMaterializedViewDataWriter(String viewName) {
+      // Use Parquet format for storing materialized view data
+      FormatPlugin formatPlugin = plugin.getFormatPlugin("parquet");
+      if (formatPlugin == null) {
+        throw UserException.unsupportedError()
+            .message("Parquet format plugin not available for materialized view storage")
+            .build(logger);
+      }
+
+      // Store data in a directory with _mv_data suffix to avoid name collision
+      // with the materialized view lookup (which uses the same base name)
+      String dataLocation = config.getLocation() + Path.SEPARATOR + viewName + "_mv_data";
+      return new FileSystemCreateTableEntry(
+          (FileSystemConfig) plugin.getConfig(),
+          formatPlugin,
+          dataLocation,
+          Collections.emptyList(),  // No partition columns for MVs
+          StorageStrategy.DEFAULT);
+    }


I think the method should take a MaterializedView parameter and call materializedView.getDataStoragePath() to retrieve the dataLocation. At the very least, this method shouldn't try to guess the data path; it should receive the data path as an argument.

Done — createMaterializedViewDataWriter now takes the MaterializedView and uses getDataStoragePath(); the interface in AbstractSchema and the handler caller were updated accordingly.

rymarm · 2026-06-22T14:23:30Z

+      for (DotDrillFile f : files) {
+        if (f.getType() == DotDrillType.MATERIALIZED_VIEW) {
+          return f.getMaterializedView(mapper);
+        }
+      }


Would you consider replacing it with a more declarative form to get the first MATERIALIZED_VIEW?

return files.stream()
    .filter(f -> f.getType() == DotDrillType.MATERIALIZED_VIEW)
    .findFirst()
    .map(f -> f.getMaterializedView(mapper))
    .orElse(null);

Used the declarative form, with one tweak: getMaterializedView(mapper) declares throws IOException, so it can't go inside .map(...). Kept filter(...).findFirst() and read the file just outside the stream so the checked exception still propagates.
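The tweak described above, keeping the checked read outside the stream pipeline, can be illustrated with a self-contained sketch. `DotFileSketch` and `MaterializedViewLookup` are hypothetical stand-ins for `DotDrillFile` and the lookup method; a string payload replaces the deserialized view.

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for DotDrillFile. */
final class DotFileSketch {
  enum Type { VIEW, MATERIALIZED_VIEW }

  private final Type type;
  private final String payload;

  DotFileSketch(Type type, String payload) {
    this.type = type;
    this.payload = payload;
  }

  Type getType() {
    return type;
  }

  /** Declares a checked exception, like getMaterializedView(mapper) in the PR. */
  String read() throws IOException {
    if (payload == null) {
      throw new IOException("unreadable definition");
    }
    return payload;
  }
}

final class MaterializedViewLookup {
  private MaterializedViewLookup() {
  }

  static String firstMaterializedView(List<DotFileSketch> files) throws IOException {
    // The lambda only filters; nothing inside the stream throws a checked exception.
    Optional<DotFileSketch> first = files.stream()
        .filter(f -> f.getType() == DotFileSketch.Type.MATERIALIZED_VIEW)
        .findFirst();
    // The checked read happens outside the stream, so IOException propagates normally.
    return first.isPresent() ? first.get().read() : null;
  }
}
```

`Optional.map` takes a `java.util.function.Function`, whose `apply` method declares no checked exceptions, which is why the read cannot simply move into `.map(...)` without wrapping the exception.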

rymarm · 2026-06-22T15:20:10Z

+          if (table instanceof DrillMaterializedViewTable) {
+            DrillMaterializedViewTable mvTable = (DrillMaterializedViewTable) table;


Use the pattern matching feature: https://docs.oracle.com/en/java/javase/17/language/pattern-matching-instanceof.html. Drill uses Java 17+.

Done — using instanceof DrillMaterializedViewTable mvTable. Side note: the build was still pinned to -source 11 via the jdk9+ profile, which rejects this; bumped maven.compiler.* to 17 to match the existing requireJavaVersion [17,24) enforcer rule and the 17/21 CI matrix.
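For reference, pattern matching for `instanceof` (final since Java 16) merges the type test and the cast into one expression. The sketch below uses hypothetical `Table`, `PlainTable`, and `MvTable` types rather than Drill's classes.

```java
final class PatternMatchingSketch {
  private PatternMatchingSketch() {
  }

  interface Table {
    String name();
  }

  static final class PlainTable implements Table {
    private final String name;

    PlainTable(String name) {
      this.name = name;
    }

    @Override
    public String name() {
      return name;
    }
  }

  /** Hypothetical stand-in for DrillMaterializedViewTable. */
  static final class MvTable implements Table {
    private final String name;

    MvTable(String name) {
      this.name = name;
    }

    @Override
    public String name() {
      return name;
    }

    String dataStoragePath() {
      return name + "_mv_data";
    }
  }

  static String describe(Table table) {
    // The binding variable mvTable is in scope only where the test succeeded.
    if (table instanceof MvTable mvTable) {
      return "materialized view " + mvTable.name() + " backed by " + mvTable.dataStoragePath();
    }
    return "table " + table.name();
  }
}
```

Compiling this with `-source 11` fails, which matches the build issue mentioned in the reply above.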

The MV rewriter had three bugs preventing query rewriting from working:

1. Schema discovery used lazy-loaded schema tree (always empty) - now iterates StoragePluginRegistry for FileSystemPlugin instances
2. SubstitutionVisitor arguments were swapped - now uses Calcite's RelOptMaterializations.useMaterializedViews() API which handles normalization and correct argument order internally
3. buildMvScanRel used SELECT * causing DYNAMIC_STAR type mismatch - now selects explicit columns from the MV field definitions

Also adds plan verification tests to both TestMaterializedViewSupport and TestMaterializedViewRewriting to assert that query plans actually reference _mv_data (Parquet) or region.json as expected.

- Make MaterializedView.dataStoragePath the single source of truth for the data directory ({name}_mv_data) and use getDataStoragePath() everywhere instead of hardcoding the _mv_data suffix.
- Quote generated identifiers using the session's configured quoting character so MV data scans work when planner.parser.quoting_identifiers is not the default backtick.
- createMaterializedViewDataWriter now takes the MaterializedView object.
- Simplify isTable check and use a declarative findFirst in getMaterializedView.
- Use pattern-matching instanceof in RecordCollector.
- Bump maven.compiler release/source/target 11 -> 17 (Drill no longer supports Java 11; matches the existing enforcer rule and CI matrix).

cgivre · 2026-06-22T21:31:13Z

@rymarm Thanks for the review. I believe I addressed all your comments.

cgivre self-assigned this Feb 2, 2026

cgivre added the enhancement, doc-impacting, performance, and major-update labels Feb 2, 2026

letian-jiang reviewed Feb 6, 2026

letian-jiang approved these changes Feb 6, 2026

cgivre requested a review from pjfanning February 8, 2026 15:16

letian-jiang approved these changes Feb 9, 2026

cgivre requested a review from rymarm June 15, 2026 13:28

rymarm requested changes Jun 22, 2026

cgivre added 10 commits June 22, 2026 17:29

WIP

687c589

Materialized Views Working

aaeadfb

Final work

d7dbda9

Add to INFO schema

b1714fb

Fix Unit Tests

898039c

Fixed JDBC tests

b973af0

Fixed final unit test

6623be8

Addressed Review Comments

cd45ce9

cgivre force-pushed the views branch from a0eda58 to cf26170 Compare June 22, 2026 21:29

3 participants