Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unparse SubqueryAlias without projections to SQL #12896

Merged
merged 8 commits into from
Oct 17, 2024

Conversation

goldmedal
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Consider the following SQL

SELECT 
  c.custkey, 
  c.name 
FROM 
  (
    SELECT 
      c_custkey as custkey, 
      c_name as name 
    FROM 
      customer
  ) c

The unoptimized plan is

Projection: c.custkey, c.name
  SubqueryAlias: c
    Projection: customer.c_custkey AS custkey, customer.c_name AS name
      TableScan: customer

However, after the optimize_projections rule, the top-level projection will be removed like

SubqueryAlias: c
  Projection: customer.c_custkey AS custkey, customer.c_name AS name
    TableScan: customer projection=[c_custkey, c_name]

The unparsing result would like

SELECT 
  customer.c_custkey AS custkey, 
  customer.c_name AS "name" 
FROM 
  (
    SELECT 
      customer.c_custkey, 
      customer.c_name 
    FROM 
      customer
  ) AS c

It's an invalid SQL because the column qualifier is the table name customer but it's in the alias c scope actually.

What changes are included in this PR?

When unparsing a SubqueryAlias without any projections, we should create another scope for the alias like

SELECT 
  * 
FROM 
  (
    SELECT 
      customer.c_custkey AS custkey, 
      customer.c_name AS "name" 
    FROM 
      (
        SELECT 
          customer.c_custkey, 
          customer.c_name 
        FROM 
          customer
      )
  ) AS c

Are these changes tested?

yes

Are there any user-facing changes?

The unparsing result is changed.

@goldmedal goldmedal changed the title Unparse SubqueryAlias without projections to SQL Unparse SubqueryAlias without projections to SQL Oct 12, 2024
@github-actions github-actions bot added the sql SQL Planner label Oct 12, 2024
@goldmedal
Copy link
Contributor Author

After this changed, I found another issue about the anonymous subquery.

SELECT 
  * 
FROM 
  (
    SELECT 
      customer.c_custkey AS custkey, 
      customer.c_name AS "name" 
    FROM 
      (
        SELECT 
          customer.c_custkey, 
          customer.c_name 
        FROM 
          customer
      )
  ) AS c

This SQL is valid for DataFusion but it isn't valid for Postgres or DuckDB. The deepest-level subquery is anonymous, but its outer projection uses its column with the qualifier customer.c_custkey AS custkey, . For DuckDB and Postgres, this kind of anonymous subquery (a.k.a unnamed subquery) is invalid.

DataFusion

DataFusion CLI v42.0.0
> create table customer(c_custkey int, c_name varchar);
0 row(s) fetched. 
Elapsed 0.009 seconds.

> SELECT 
  * 
FROM 
  (
    SELECT 
      customer.c_custkey AS custkey, 
      customer.c_name AS "name" 
    FROM 
      (
        SELECT 
          customer.c_custkey, 
          customer.c_name 
        FROM 
          customer
      )
  ) AS c;
+---------+------+
| custkey | name |
+---------+------+
+---------+------+
0 row(s) fetched. 
Elapsed 0.012 seconds.

DuckDB

D create table customer(c_custkey int, c_name varchar);
D SELECT 
    * 
  FROM 
    (
      SELECT 
        customer.c_custkey AS custkey, 
        customer.c_name AS "name" 
      FROM 
        (
          SELECT 
            customer.c_custkey, 
            customer.c_name 
          FROM 
            customer
        )
    ) AS c;
Binder Error: Referenced table "customer" not found!
Candidate tables: "unnamed_subquery"
LINE 6:       customer.c_custkey AS custkey, 

Postgres

test=# create table customer(c_custkey int, c_name varchar);
CREATE TABLE
test=#  SELECT 
    * 
  FROM 
    (
      SELECT 
        customer.c_custkey AS custkey, 
        customer.c_name AS "name" 
      FROM 
        (
          SELECT 
            customer.c_custkey, 
            customer.c_name 
          FROM 
            customer
        )
    ) AS c;
ERROR:  subquery in FROM must have an alias
LINE 9:         (
                ^
HINT:  For example, FROM (SELECT ...) [AS] foo.

For some SQL pushdown purposes, we are better off generating SQL with the common syntax. I'll file an issue and try to fix it.

@goldmedal goldmedal marked this pull request as ready for review October 12, 2024 16:42
@@ -697,7 +739,7 @@ fn test_table_scan_pushdown() -> Result<()> {
plan_to_sql(&table_scan_with_projection_alias)?;
assert_eq!(
format!("{}", table_scan_with_projection_alias),
"SELECT ta.id, ta.age FROM t1 AS ta"
"SELECT * FROM (SELECT ta.id, ta.age FROM t1 AS ta) AS ta"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not an improvement, right?
I am OK if this is necessary, but is there a different way to fix the original problem, without affecting here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @findepi. Sorry for replying so late 🙇
You're right. I shouldn't break the result of the table scan with any pushdown. I stopped projecting a wildcard if the SubqueryAlias's child is a pushdown table scan.

@goldmedal goldmedal marked this pull request as ready for review October 15, 2024 17:52
.build()?;
let table_scan_with_two_filter = plan_to_sql(&table_scan_with_two_filter)?;
assert_eq!(
format!("{}", table_scan_with_two_filter),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: format! is unnecessary, just use table_scan_with_two_filter (or table_scan_with_two_filter.to_string()) in assert

(same in other places)

Comment on lines +715 to +717
.project(vec![col("id")])?
.alias("a")?
.build()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be indented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's formatted by the cargo formatter. I think we can't control it 🤔

.project(vec![col("id")])?
.alias("a")?
.build()?;
println!("{}", plan.display_indent());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left over?

if scan.projection.is_some()
|| !scan.filters.is_empty()
|| scan.fetch.is_some()
if let Some(unparsed_table_scan) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @goldmedal and @findepi for the review

@alamb
Copy link
Contributor

alamb commented Oct 16, 2024

FYI @phillipleblanc and @sgrebnov

Copy link
Contributor

@phillipleblanc phillipleblanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR makes sense to me, thanks!

@sgrebnov
Copy link
Member

LGTM 👍

@alamb alamb merged commit 1ba1e53 into apache:main Oct 17, 2024
24 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 17, 2024

Thanks @sgrebnov @findepi and @phillipleblanc for the review

@goldmedal goldmedal deleted the feature/unparse-subalais branch October 18, 2024 13:40
@goldmedal
Copy link
Contributor Author

Thanks @alamb @sgrebnov @findepi @phillipleblanc !

sgrebnov pushed a commit to spiceai/datafusion that referenced this pull request Oct 23, 2024
* change pub function comment to doc

* unparse subquery alias without projections

* fix tests

* rollback the empty line

* rollback the empty line

* exclude the table_scan with pushdown case

* fmt and clippy

* simplify the ast to string and remove unused debug code
sgrebnov added a commit to spiceai/datafusion that referenced this pull request Oct 24, 2024
* init (apache#12453)

* Fix unparse table scan with the projection pushdown (apache#12534)

* unparse the projection base on the source schema

* refactor and enhance the test

* Fix unparsing OFFSET (apache#12539)

* Unparse Sort with pushdown limit to SQL string (apache#12873)

* unparse Sort with push down limit

* cargo fmt

* set query limit directly

* Unparse `SubqueryAlias` without projections to SQL (apache#12896)

* change pub function comment to doc

* unparse subquery alias without projections

* fix tests

* rollback the empty line

* rollback the empty line

* exclude the table_scan with pushdown case

* fmt and clippy

* simplify the ast to string and remove unused debug code

* enhance unparsing plan with pushdown to avoid unnamed subquery (apache#13006)

* Update

---------

Co-authored-by: Lordworms <[email protected]>
Co-authored-by: Jax Liu <[email protected]>
Co-authored-by: Justus Flerlage <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants