Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FULL OUTER JOIN and LIMIT produces wrong results #14335

Closed
Tracked by #14008
alamb opened this issue Jan 28, 2025 · 7 comments · Fixed by #14338
Closed
Tracked by #14008

FULL OUTER JOIN and LIMIT produces wrong results #14335

alamb opened this issue Jan 28, 2025 · 7 comments · Fixed by #14338
Assignees
Labels
bug Something isn't working

Comments

@alamb
Copy link
Contributor

alamb commented Jan 28, 2025

Describe the bug

LIMITs are incorrectly pushed through FULL OUTER Joins

To Reproduce

COPY (values (1), (2), (3), (4), (5))  TO '/tmp/t1.csv' STORED AS CSV;
-- store t2 in different order so the top N rows are not the same as the top N rows of t1
COPY (values (5), (4), (3), (2), (1))  TO '/tmp/t2.csv' STORED AS CSV;

statement ok
create external table t1(a int) stored as CSV location '/tmp/t1.csv';

statement ok
create external table t2(b int) stored as CSV location '/tmp/t2.csv';

-- FULL join produces 5 rows (all have matches)
select * from t1 FULL JOIN t2 ON t1.a = t2.b;
/*
+---+---+
| a | b |
+---+---+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
+---+---+
*/

-- the output of this query should be two rows from the previous query, there
-- should be no nulls
select * from t1 FULL JOIN t2 ON t1.a = t2.b LIMIT 2;

/*
+------+------+
| a    | b    |
+------+------+
| 1    | NULL |
| NULL | 4    |
+------+------+
*/

Expected behavior

Any two rows from

+---+---+
| a | b |
+---+---+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
+---+---+

There should be no nulls introduced

Additional context

Found while working with @zhuqi-lucas on:

@alamb alamb added the bug Something isn't working label Jan 28, 2025
@alamb
Copy link
Contributor Author

alamb commented Jan 28, 2025

I think @zhuqi-lucas found the issue. This line is almost certainly wrong:

Full => (Some(limit), Some(limit)),

@alamb
Copy link
Contributor Author

alamb commented Jan 28, 2025

I wrote some tests for this here:

@alamb
Copy link
Contributor Author

alamb commented Jan 28, 2025

I don't think this is a regression (it has been like this for while)

It seems to have been introduced in

@alamb
Copy link
Contributor Author

alamb commented Jan 28, 2025

I also filed this ticket to track making it easier to test for this kind of thing

@zhuqi-lucas
Copy link
Contributor

take

@zhuqi-lucas
Copy link
Contributor

zhuqi-lucas commented Jan 28, 2025

Thank you @alamb!

I want to take this task, and for the first step, i think we should disable full outer join push down limit.

For further improvement, we can investigate if we have some cases which we can support push down limit for full outer join.

@zhuqi-lucas
Copy link
Contributor

Submitted a PR for review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants