test(ppl): remove invalid Spark SQL assertions from correlated-aggregate subquery tests #5594

Open
gingeekrishna wants to merge 1 commit into opensearch-project:main from gingeekrishna:fix/5470-invalid-spark-sql-in-scalar-subquery-test

Conversation

@gingeekrishna

Contributor

Summary

Fixes #5470.

The verifyPPLToSparkSQL assertions in testCorrelatedScalarSubqueryInWhere and testCorrelatedScalarSubqueryInSelect pin SQL that Spark 4.1 rejects. SALGRADE has no SAL or EMPNO column, so those names bind to the outer EMP table as correlated outer references per SQL-92 scoping rules. The SQL serializer correctly emits AVG(EMP.SAL) and MIN(EMP.EMPNO) inside the subquery aggregates — but Spark refuses to execute them:

[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.CORRELATED_REFERENCE]
Unsupported subquery expression: Expressions referencing the outer query are not
supported outside of WHERE/HAVING clauses: "avg(SAL) AS AVG(SAL)". SQLSTATE: 0A000
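
The scoping rule behind that binding can be sketched in a few lines. This is only an illustration, not the resolver Calcite or Spark actually uses: the `SCHEMA` column lists are assumed from the classic SCOTT sample schema, and `resolve` is a hypothetical helper that searches the innermost query block first and falls back to the enclosing one.

```python
# Hypothetical column lists, assumed from the classic SCOTT sample schema.
SCHEMA = {
    "EMP": ["EMPNO", "ENAME", "JOB", "MGR", "HIREDATE", "SAL", "COMM", "DEPTNO"],
    "SALGRADE": ["GRADE", "LOSAL", "HISAL"],
}


def resolve(column, scopes):
    """Resolve an unqualified column name, innermost scope first.

    `scopes` lists table names from the innermost query block outward.
    Returns (table, is_outer_reference).
    """
    for depth, table in enumerate(scopes):
        if column in SCHEMA[table]:
            # depth > 0 means the name was found in an enclosing block,
            # i.e. it is a correlated outer reference.
            return table, depth > 0
    raise LookupError(f"column {column!r} not found in any enclosing scope")


# Inside the subquery, SALGRADE is the inner scope and EMP the outer one.
subquery_scopes = ["SALGRADE", "EMP"]
print(resolve("SAL", subquery_scopes))    # ('EMP', True)
print(resolve("EMPNO", subquery_scopes))  # ('EMP', True)
print(resolve("HISAL", subquery_scopes))  # ('SALGRADE', False)
```

Because SAL and EMPNO are missing from the inner table, both resolve to EMP as outer references, which is why they end up inside the subquery's aggregate call.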

The logical plans (verified by verifyLogical) are correct and match the correlated semantics. Only the verifyPPLToSparkSQL call is misleading — verifyPPLToSparkSQL only compares the serialized SQL string; it never executes against a real Spark engine, so the test passed while pinning invalid SQL.
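
To make the gap concrete, here is a minimal sketch of why a string-only assertion can pass on SQL that an engine would reject. Everything in it is hypothetical: `strings_match` stands in for a text comparison like the one described above, `toy_engine_accepts` is a toy model of the rule quoted in the error message (not Spark's analyzer), and the SQL string is illustrative rather than the text pinned in the test.

```python
def strings_match(expected_sql, generated_sql):
    """String-only check: compare SQL text after normalizing whitespace."""
    return " ".join(expected_sql.split()) == " ".join(generated_sql.split())


def toy_engine_accepts(aggregate_arg_tables, subquery_tables):
    """Toy model of the engine-side rule (an assumption, not Spark's code).

    Accept the subquery only if every column passed to an aggregate
    comes from a table in the subquery's own FROM clause.
    """
    return all(table in subquery_tables for table in aggregate_arg_tables)


# Illustrative SQL only; the serializer output and the pinned expectation agree.
generated = "SELECT AVG(`EMP`.`SAL`) FROM `SALGRADE`"
expected = generated

# The text comparison passes because both sides are the same string.
string_check = strings_match(expected, generated)

# The aggregate argument comes from EMP, which is not in the subquery's FROM.
engine_check = toy_engine_accepts(["EMP"], ["SALGRADE"])

# An aggregate with no column arguments, such as COUNT(), has nothing to reject.
count_check = toy_engine_accepts([], ["SALGRADE"])

print(string_check, engine_check, count_check)  # True False True
```

The first check succeeds regardless of whether the SQL is executable, so only an execution-based check would have caught the problem.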

Changes:

  • testCorrelatedScalarSubqueryInWhere: remove verifyPPLToSparkSQL; keep verifyLogical; add comment explaining the omission
  • testCorrelatedScalarSubqueryInSelect: same treatment — same root cause (MIN(EMP.EMPNO) in aggregate)
  • Remove testCorrelatedScalarSubqueryInWhereMaxOut: it is an exact duplicate of testCorrelatedScalarSubqueryInWhere (same PPL, same expected logical plan, same invalid expected SQL)

Tests not touched: The disjunctive tests (testDisjunctiveCorrelatedScalarSubqueryInWhere*) use COUNT(), which references no outer column; their correlated reference lives only in the WHERE clause of the subquery. Spark supports correlated references in a subquery's WHERE clause, so those verifyPPLToSparkSQL assertions remain valid.

Test plan

  • ./gradlew :ppl:test --tests "*CalcitePPLScalarSubqueryTest*": BUILD SUCCESSFUL; all remaining tests pass
  • ./gradlew spotlessApply — no formatting violations

…ate subquery tests (opensearch-project#5470)

The verifyPPLToSparkSQL assertions in testCorrelatedScalarSubqueryInWhere and
testCorrelatedScalarSubqueryInSelect pin SQL that Spark (4.1) refuses to execute:
SALGRADE has no SAL or EMPNO column, so those references bind to the outer EMP
table as correlated outer references. The SQL serializer emits AVG(`EMP`.`SAL`)
and MIN(`EMP`.`EMPNO`) inside subquery aggregate functions. Spark rejects this with
UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.CORRELATED_REFERENCE because outer-column
references in aggregate functions are not supported outside WHERE/HAVING clauses.

Changes:
* testCorrelatedScalarSubqueryInWhere: remove verifyPPLToSparkSQL; keep verifyLogical
  (the logical plan correctly models the correlated semantics per SQL-92 scoping rules)
* testCorrelatedScalarSubqueryInSelect: same — remove verifyPPLToSparkSQL; keep verifyLogical
* Remove testCorrelatedScalarSubqueryInWhereMaxOut: exact duplicate of
  testCorrelatedScalarSubqueryInWhere (same PPL, same expected logical, same invalid SQL)

The disjunctive tests (testDisjunctiveCorrelatedScalarSubqueryInWhere*) are unaffected:
their COUNT() aggregate does not reference outer columns, so the generated Spark SQL is valid.

Fixes opensearch-project#5470

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@github-actions

Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

Development

Successfully merging this pull request may close these issues.

verifyPPLToSparkSQL pins invalid Spark SQL in CalcitePPLScalarSubqueryTest.testCorrelatedScalarSubqueryInWhere
