Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip table qualifiers from schema in UNION ALL #10707

Conversation

phillipleblanc
Copy link
Contributor

Which issue does this PR close?

Closes #10706

Rationale for this change

The schema that is the result of a UNION ALL should not have any table qualifiers, as the table information has effectively been erased and is no longer a valid reference.

Consider the following SQL:

SELECT table1.foo FROM table1
UNION ALL
SELECT table2.foo FROM table2

The logical schema from this UNION ALL should be just foo, but it is currently taking the first input's schema directly, resulting in a schema of table1.foo.

This came up as an issue when trying to validate the unparser works correctly for UNION ALL statements, which I added in #10603

Slightly modifying the above example, if I add an ORDER BY clause to the input SQL:

SELECT table1.foo FROM table1
UNION ALL
SELECT table2.foo FROM table2
ORDER BY foo

Then the resulting unparsed SQL will be:

SELECT table1.foo FROM table1
UNION ALL
SELECT table2.foo FROM table2
ORDER BY table1.foo

Because the unparser takes the schema directly from the Union node when writing out the column expressions.

What changes are included in this PR?

Fixes the LogicalPlanBuilder::union to not take the first input's schema directly, but sets it to None.

Are these changes tested?

Yes, I added a test both in the LogicalPlanBuilder and in the Unparser sql_integration test.

Are there any user-facing changes?

No

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions labels May 29, 2024
zip(left_plan.schema().iter(), right_plan.schema().iter())
.map(
|((left_qualifier, left_field), (_right_qualifier, right_field))| {
|((_, left_field), (_, right_field))| {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a test that failed because of this change, and looking into the history the reason this was done was to avoid the following error:

#5410

SchemaError(DuplicateUnqualifiedField { name: "a" })

So it seems that this is intentional 🤔 If this only affects the Unparser, I could try stripping the qualifier only from the Sort node if the input node is a Union, but that seems a bit hacky to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further discussion that seems to indicate that removing the qualifier is the correct behavior: #2943

I'll dig into if there is a way around the DuplicateUnqualifiedField

Copy link
Contributor Author

@phillipleblanc phillipleblanc Jun 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asked on Discord why we need to enforce the DuplicateUnqualifiedField check - Postgres itself does not error if two columns have the same name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've raised #11082 as an alternative way to do this that only affects the Unparser.

@phillipleblanc
Copy link
Contributor Author

Marking as draft while I dig into the DuplicateUnqualifiedField issue

@phillipleblanc
Copy link
Contributor Author

Closing in favor of #11082

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UNION ALL should strip table identifiers in its resulting schema
1 participant