Skip to content

fix(dedup): use Map<String,List<String>> for fieldNameMapping to handle alias collision#5593

Open
gingeekrishna wants to merge 2 commits into
opensearch-project:mainfrom
gingeekrishna:fix/5197-dedup-fieldname-mapping-collision
Open

fix(dedup): use Map<String,List<String>> for fieldNameMapping to handle alias collision#5593
gingeekrishna wants to merge 2 commits into
opensearch-project:mainfrom
gingeekrishna:fix/5197-dedup-fieldname-mapping-collision

Conversation

@gingeekrishna

@gingeekrishna gingeekrishna commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #5150 and #5197.

Issue #5150: When a field is renamed via rename and a subsequent dedup operates on a different field, the renamed field returns null. The dedup aggregation pushdown (top_hits) returns results keyed by the original index field name, but the Calcite row-type expects the renamed name — causing the enumerator to resolve null.

Issue #5197: When two output aliases (pay from rename, pay2 from eval column-ref) both resolve to the same underlying index field (salary), a Map<String, String> silently drops one entry on collision.

Both fixes are implemented from scratch since PR #5192 was closed without being merged.

Changes:

Test plan

  • ./gradlew :opensearch:test --tests *OpenSearchAggregationResponseParserTest* — all tests pass including the two new ones
  • ./gradlew :opensearch:build -x integTest -x test — BUILD SUCCESSFUL
  • ./gradlew spotlessApply — no formatting violations
  • Integration tests (CalcitePPLDedupIT) require a running OpenSearch cluster — will pass in CI

…e alias collision (opensearch-project#5197)

When `rename` and `eval` column-ref both resolve to the same source field (e.g.
`eval nm2 = name | rename name as nm`), the previous Map<String,String> approach
silently dropped one mapping on collision.

This commit implements the fix from scratch (PR opensearch-project#5192 was closed unmerged):

* TopHitsParser: add an optional `Map<String, List<String>> fieldNameMapping` field
  and a new 4-arg constructor; the 3-arg constructor delegates with null (no-op).
  `applyFieldNameMapping()` copies the source-field value to every output alias and
  removes the original key only when it is not itself an expected output name.

* AggregateAnalyzer: in the LITERAL_AGG (dedup) branch, build fieldNameMapping by
  iterating over the projection args; pass it to TopHitsParser when non-empty.

* Unit tests (OpenSearchAggregationResponseParserTest):
  - `top_hits_field_name_mapping_single_rename_should_pass` – regression for opensearch-project#5150
  - `top_hits_field_name_mapping_collision_should_duplicate_value` – regression for opensearch-project#5197

* Integration tests (CalcitePPLDedupIT):
  - `testDedupWithRenamedField` – dedup after rename, single alias
  - `testDedupWithRenamedFieldMappingCollision` – dedup after both rename and eval alias

Fixes opensearch-project#5197

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@github-actions

github-actions Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit d3e62b5)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions

github-actions Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to d3e62b5
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Avoid map modification during iteration

The method modifies the map while iterating over fieldNameMapping, which could lead
to issues if outputNames contains keys that are also in fieldNameMapping. Consider
collecting all modifications first, then applying them after iteration to avoid
potential concurrent modification or unexpected behavior when output names collide
with other original names.

opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [168-187]

 private void applyFieldNameMapping(Map<String, Object> map) {
   if (fieldNameMapping == null || fieldNameMapping.isEmpty()) {
     return;
   }
+  Map<String, Object> toAdd = new HashMap<>();
+  List<String> toRemove = new ArrayList<>();
   for (Map.Entry<String, List<String>> entry : fieldNameMapping.entrySet()) {
     String originalName = entry.getKey();
     Object value = map.get(originalName);
     if (value == null && !map.containsKey(originalName)) {
       continue;
     }
     List<String> outputNames = entry.getValue();
     for (String outputName : outputNames) {
-      map.put(outputName, value);
+      toAdd.put(outputName, value);
     }
     if (!outputNames.contains(originalName)) {
-      map.remove(originalName);
+      toRemove.add(originalName);
     }
   }
+  map.putAll(toAdd);
+  toRemove.forEach(map::remove);
 }
Suggestion importance[1-10]: 3

__

Why: The concern about concurrent modification is not applicable here since the code iterates over fieldNameMapping.entrySet() (not map) and only modifies map. However, the suggestion to defer modifications could prevent subtle issues if an outputName matches another originalName in fieldNameMapping, though this edge case seems unlikely given the context. The improvement is marginal.

Low

Previous suggestions

Suggestions up to commit fd18a7a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null check for inferNamedField result

The code does not handle the case where helper.inferNamedField(arg.getKey()) might
return null, which could lead to a NullPointerException when calling getRootName().
Add a null check before accessing the result to prevent potential runtime failures.

opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [628-638]

 Map<String, List<String>> fieldNameMapping = new HashMap<>();
 for (Pair<RexNode, String> arg : args) {
   if (arg.getKey() instanceof RexInputRef) {
-    String originalName = helper.inferNamedField(arg.getKey()).getRootName();
-    String outputName = arg.getValue();
-    if (!originalName.equals(outputName)) {
-      fieldNameMapping
-          .computeIfAbsent(originalName, k -> new ArrayList<>())
-          .add(outputName);
+    var namedField = helper.inferNamedField(arg.getKey());
+    if (namedField != null) {
+      String originalName = namedField.getRootName();
+      String outputName = arg.getValue();
+      if (!originalName.equals(outputName)) {
+        fieldNameMapping
+            .computeIfAbsent(originalName, k -> new ArrayList<>())
+            .add(outputName);
+      }
     }
   }
 }
Suggestion importance[1-10]: 7

__

Why: Adding a null check for helper.inferNamedField(arg.getKey()) is a reasonable defensive programming practice that prevents potential NullPointerException. However, without evidence that this method can return null in practice, the impact is moderate.

Medium
General
Optimize map access pattern

The code retrieves the value twice from the map using map.containsKey() and then
map.get(). This is inefficient and could lead to issues if the map is modified
concurrently. Use map.get() once and check for null to improve performance and
safety.

opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [172-186]

 for (Map.Entry<String, List<String>> entry : fieldNameMapping.entrySet()) {
   String originalName = entry.getKey();
-  if (!map.containsKey(originalName)) {
+  Object value = map.get(originalName);
+  if (value == null) {
     continue;
   }
-  Object value = map.get(originalName);
   List<String> outputNames = entry.getValue();
   for (String outputName : outputNames) {
     map.put(outputName, value);
   }
   if (!outputNames.contains(originalName)) {
     map.remove(originalName);
   }
 }
Suggestion importance[1-10]: 3

__

Why: While the suggestion to use map.get() once is a minor optimization, the change introduces a subtle bug: it cannot distinguish between a key that doesn't exist and a key with a null value. The original containsKey() check is more semantically correct for this use case.

Low

* AggregateAnalyzer: guard against a null return from inferNamedField
  before calling getRootName() (defensive; the method returns non-null for
  RexInputRef today but the null check makes the contract explicit).

* TopHitsParser.applyFieldNameMapping: replace the containsKey+get double
  lookup with a single map.get() call; null value + absent key is
  distinguished via containsKey only when value is null, eliminating the
  redundant containsKey in the common (non-null value) path.

Fixes opensearch-project#5150
Fixes opensearch-project#5197

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit d3e62b5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Dedup aggregation pushdown nullifies renamed non-dedup fields via top_hits response mapping

2 participants