fix(dedup): use Map<String,List<String>> for fieldNameMapping to handle alias collision by gingeekrishna · Pull Request #5593 · opensearch-project/sql

gingeekrishna · 2026-06-27T07:23:01Z

Summary

Issue #5150: When a field is renamed via rename and a subsequent dedup operates on a different field, the renamed field returns null. The dedup aggregation pushdown (top_hits) returns results keyed by the original index field name, but the Calcite row-type expects the renamed name — causing the enumerator to resolve null.

Issue #5197: When two output aliases (pay from rename, pay2 from eval column-ref) both resolve to the same underlying index field (salary), a Map<String, String> silently drops one entry on collision.

Both fixes are implemented from scratch since PR #5192 was closed without being merged.

Changes:

TopHitsParser — new optional Map<String, List<String>> fieldNameMapping field + 4-arg constructor (3-arg delegates to it with null); applyFieldNameMapping() copies the source-field value to every output alias and removes the original key only when it is not itself one of the expected output names
AggregateAnalyzer — in the LITERAL_AGG (dedup) branch, iterate projection args to build fieldNameMapping, pass it to TopHitsParser when non-empty; added null guard on inferNamedField return before calling getRootName()
Unit tests (OpenSearchAggregationResponseParserTest):
- top_hits_field_name_mapping_single_rename_should_pass — regression for [BUG] Dedup aggregation pushdown nullifies renamed non-dedup fields via top_hits response mapping #5150
- top_hits_field_name_mapping_collision_should_duplicate_value — regression for Dedup pushdown field name mapping collision when multiple fields reference same original field #5197
Integration tests (CalcitePPLDedupIT):
- testDedupWithRenamedField — dedup after a single rename (covers [BUG] Dedup aggregation pushdown nullifies renamed non-dedup fields via top_hits response mapping #5150 exact scenario)
- testDedupWithRenamedFieldMappingCollision — dedup after both rename and eval alias pointing to the same field (covers Dedup pushdown field name mapping collision when multiple fields reference same original field #5197)

Test plan

./gradlew :opensearch:test --tests *OpenSearchAggregationResponseParserTest* — all tests pass including the two new ones
./gradlew :opensearch:build -x integTest -x test — BUILD SUCCESSFUL
./gradlew spotlessApply — no formatting violations
Integration tests (CalcitePPLDedupIT) require a running OpenSearch cluster — will pass in CI

…e alias collision (opensearch-project#5197) When `rename` and `eval` column-ref both resolve to the same source field (e.g. `eval nm2 = name | rename name as nm`), the previous Map<String,String> approach silently dropped one mapping on collision. This commit implements the fix from scratch (PR opensearch-project#5192 was closed unmerged): * TopHitsParser: add an optional `Map<String, List<String>> fieldNameMapping` field and a new 4-arg constructor; the 3-arg constructor delegates with null (no-op). `applyFieldNameMapping()` copies the source-field value to every output alias and removes the original key only when it is not itself an expected output name. * AggregateAnalyzer: in the LITERAL_AGG (dedup) branch, build fieldNameMapping by iterating over the projection args; pass it to TopHitsParser when non-empty. * Unit tests (OpenSearchAggregationResponseParserTest): - `top_hits_field_name_mapping_single_rename_should_pass` – regression for opensearch-project#5150 - `top_hits_field_name_mapping_collision_should_duplicate_value` – regression for opensearch-project#5197 * Integration tests (CalcitePPLDedupIT): - `testDedupWithRenamedField` – dedup after rename, single alias - `testDedupWithRenamedFieldMappingCollision` – dedup after both rename and eval alias Fixes opensearch-project#5197 Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

github-actions · 2026-06-27T07:24:01Z

PR Reviewer Guide 🔍

(Review updated until commit `d3e62b5`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

github-actions · 2026-06-27T07:24:21Z

PR Code Suggestions ✨

Latest suggestions up to d3e62b5
Explore these optional code suggestions:

Category	Suggestion	Impact
General	Avoid map modification during iteration The method modifies the map while iterating over `fieldNameMapping`, which could lead to issues if `outputNames` contains keys that are also in `fieldNameMapping`. Consider collecting all modifications first, then applying them after iteration to avoid potential concurrent modification or unexpected behavior when output names collide with other original names. opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [168-187] private void applyFieldNameMapping(Map<String, Object> map) { if (fieldNameMapping == null \|\| fieldNameMapping.isEmpty()) { return; } + Map<String, Object> toAdd = new HashMap<>(); + List<String> toRemove = new ArrayList<>(); for (Map.Entry<String, List<String>> entry : fieldNameMapping.entrySet()) { String originalName = entry.getKey(); Object value = map.get(originalName); if (value == null && !map.containsKey(originalName)) { continue; } List<String> outputNames = entry.getValue(); for (String outputName : outputNames) { - map.put(outputName, value); + toAdd.put(outputName, value); } if (!outputNames.contains(originalName)) { - map.remove(originalName); + toRemove.add(originalName); } } + map.putAll(toAdd); + toRemove.forEach(map::remove); } Suggestion importance[1-10]: 3 __ Why: The concern about concurrent modification is not applicable here since the code iterates over `fieldNameMapping.entrySet()` (not `map`) and only modifies `map`. However, the suggestion to defer modifications could prevent subtle issues if an `outputName` matches another `originalName` in `fieldNameMapping`, though this edge case seems unlikely given the context. The improvement is marginal.	Low

Previous suggestions

Suggestions up to commit fd18a7a

Category	Suggestion	Impact
Possible issue	Add null check for inferNamedField result The code does not handle the case where `helper.inferNamedField(arg.getKey())` might return null, which could lead to a NullPointerException when calling `getRootName()`. Add a null check before accessing the result to prevent potential runtime failures. opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java [628-638] Map<String, List<String>> fieldNameMapping = new HashMap<>(); for (Pair<RexNode, String> arg : args) { if (arg.getKey() instanceof RexInputRef) { - String originalName = helper.inferNamedField(arg.getKey()).getRootName(); - String outputName = arg.getValue(); - if (!originalName.equals(outputName)) { - fieldNameMapping - .computeIfAbsent(originalName, k -> new ArrayList<>()) - .add(outputName); + var namedField = helper.inferNamedField(arg.getKey()); + if (namedField != null) { + String originalName = namedField.getRootName(); + String outputName = arg.getValue(); + if (!originalName.equals(outputName)) { + fieldNameMapping + .computeIfAbsent(originalName, k -> new ArrayList<>()) + .add(outputName); + } } } } Suggestion importance[1-10]: 7 __ Why: Adding a null check for `helper.inferNamedField(arg.getKey())` is a reasonable defensive programming practice that prevents potential `NullPointerException`. However, without evidence that this method can return null in practice, the impact is moderate.	Medium
General	Optimize map access pattern The code retrieves the value twice from the map using `map.containsKey()` and then `map.get()`. This is inefficient and could lead to issues if the map is modified concurrently. Use `map.get()` once and check for null to improve performance and safety. opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java [172-186] for (Map.Entry<String, List<String>> entry : fieldNameMapping.entrySet()) { String originalName = entry.getKey(); - if (!map.containsKey(originalName)) { + Object value = map.get(originalName); + if (value == null) { continue; } - Object value = map.get(originalName); List<String> outputNames = entry.getValue(); for (String outputName : outputNames) { map.put(outputName, value); } if (!outputNames.contains(originalName)) { map.remove(originalName); } } Suggestion importance[1-10]: 3 __ Why: While the suggestion to use `map.get()` once is a minor optimization, the change introduces a subtle bug: it cannot distinguish between a key that doesn't exist and a key with a `null` value. The original `containsKey()` check is more semantically correct for this use case.	Low

* AggregateAnalyzer: guard against a null return from inferNamedField before calling getRootName() (defensive; the method returns non-null for RexInputRef today but the null check makes the contract explicit). * TopHitsParser.applyFieldNameMapping: replace the containsKey+get double lookup with a single map.get() call; null value + absent key is distinguished via containsKey only when value is null, eliminating the redundant containsKey in the common (non-null value) path. Fixes opensearch-project#5150 Fixes opensearch-project#5197 Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>

github-actions · 2026-06-27T07:35:15Z

Persistent review updated to latest commit d3e62b5

Copilot AI review requested due to automatic review settings June 27, 2026 07:23

gingeekrishna requested review from LantaoJin, RyanL1997, Swiddis, acarbonetto, ahkcs, anirudha, dai-chen, joshuali925, mengweieric, noCharger, penghuo, ps48, qianheng-aws, songkant-aws, vamsimanohar, ykmr1224 and yuancu as code owners June 27, 2026 07:23

Copilot AI reviewed Jun 27, 2026

gingeekrishna mentioned this pull request Jun 27, 2026

Dedup pushdown field name mapping collision when multiple fields reference same original field #5197

Open

gingeekrishna mentioned this pull request Jun 27, 2026

[BUG] Dedup aggregation pushdown nullifies renamed non-dedup fields via top_hits response mapping #5150

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(dedup): use Map<String,List<String>> for fieldNameMapping to handle alias collision#5593

fix(dedup): use Map<String,List<String>> for fieldNameMapping to handle alias collision#5593
gingeekrishna wants to merge 2 commits into
opensearch-project:mainfrom
gingeekrishna:fix/5197-dedup-fieldname-mapping-collision

gingeekrishna commented Jun 27, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions Bot commented Jun 27, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 27, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

gingeekrishna commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit d3e62b5)

Uh oh!

github-actions Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions Bot commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gingeekrishna commented Jun 27, 2026 •

edited

Loading

github-actions Bot commented Jun 27, 2026 •

edited

Loading

(Review updated until commit `d3e62b5`)

github-actions Bot commented Jun 27, 2026 •

edited

Loading