Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for USING to SQL unparser #11636

Merged
merged 3 commits into from
Jul 26, 2024
Merged

Conversation

wackywendell
Copy link
Contributor

@wackywendell wackywendell commented Jul 24, 2024

Which issue does this PR close?

Closes #10652.

Are these changes tested?

Yes - I added a couple example cases to the existing tests.

Are there any user-facing changes?

Only the listed change - LogicalPlans with JoinConstraint::Using can now successfully be translated back to SQL.

Other

👋 Hello! This is my first contribution to apache/datafusion; all feedback welcome! I expect there will be a bit of learning on my side as to how this community works. Thanks for any help there you can provide!

Also - this passes cargo test locally, but if someone could trigger CI please, that would be appreciated 🙇

@github-actions github-actions bot added the sql SQL Planner label Jul 24, 2024
if left_name != right_name {
// USING is only valid when the column names are the
// same, so they should never be different
return not_impl_err!("Unsupported USING with different column names");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be not_impl_err? If I understand correctly, this is an invalid state, so arguably its something like an Internal error... same for below.

@wackywendell
Copy link
Contributor Author

^ I installed the pre-commit.sh and turned on clippy and cargo fmt in my editor; caught the above in 1132489.

Speaking of which - should I squash/rebase? By default I tend not to do that on public PRs/branches, as it can be confusing for reviews.

Copy link
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wackywendell for working on this. Overall LGTM.
However, I can't help you to trigger the CI. Could @alamb or @jayzhan211 help to trigger it?

fn join_conditions_to_sql(
&self,
join_conditions: &Vec<(Expr, Expr)>,
join_conditions: &[(Expr, Expr)],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- this is a nice refactoring and very nicely commented code.

Comment on lines 582 to 584
return not_impl_err!(
"Unsupported USING with different column names"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, not_impl_err is for some todo items. If this case never happens, I prefer to use internal_err here.

Suggested change
return not_impl_err!(
"Unsupported USING with different column names"
);
return internal_err!(
"Unsupported USING with different column names"
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case can happen -- like JOIN ON (t1.a = t2.b)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can happen with ON, but it can't happen with USING, right? This function is only for USING.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I thought about this some more - these two cases should be impossible for a LogicalPlan made from SQL (you can't do JOIN … USING (t1.a = t1.b), but could be created by a user directly creating a LogicalPlan::Join(Join { … }).

Given that, fallback makes sense, and I'll add a comment to explain.

@goldmedal
Copy link
Contributor

Speaking of which - should I squash/rebase? By default I tend not to do that on public PRs/branches, as it can be confusing for reviews.

Unless you think that rebase or squash makes the review easier, I don't think you need to do it. We don't have any rules requiring you to do so.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wackywendell and @goldmedal -- this is a great first PR. I left some comments

Comment on lines 582 to 584
return not_impl_err!(
"Unsupported USING with different column names"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case can happen -- like JOIN ON (t1.a = t2.b)

relation: _,
name: right_name,
}),
) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than returning "not implemented" it would be nice to just fall back to the existing code

I think you could do it with a match guard like

Suggested change
) => {
) if left_name == right_name => {

@wackywendell
Copy link
Contributor Author

Thanks all for your super-quick reviews! Working on an update now.

@alamb
Copy link
Contributor

alamb commented Jul 24, 2024

Unless you think that rebase or squash makes the review easier, I don't think you need to do it. We don't have any rules requiring you to do so.

Agreed -- the final merge into main is done with squash (so the entire PR turns into a single commit). I typically find it easier to review PRs where subsequent changes are made with new commits (rather than rebases) so it is more clear what changed

When the conditions and filters in the LogicalPlan are not in a form compatible with USING, we can instead use ON - so we do.
@wackywendell
Copy link
Contributor Author

Pushed an update - to get the fallback from USING to ON to work and be reasonably succinct, I refactored a bit. I think it will need another review now - thanks so much!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again

fn join_conditions_to_sql(
&self,
join_conditions: &Vec<(Expr, Expr)>,
join_conditions: &[(Expr, Expr)],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- this is a nice refactoring and very nicely commented code.

@alamb alamb merged commit 64af410 into apache:main Jul 26, 2024
24 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 26, 2024

🚀

@wackywendell wackywendell deleted the unparse-using branch July 26, 2024 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support convert LogicalPlan JOIN with Using constraint to SQL String
3 participants