Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unparse SubqueryAlias without projections to SQL #12896

Merged
merged 8 commits into from
Oct 17, 2024
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions datafusion/sql/src/unparser/plan.rs
Original file line number Diff line number Diff line change
Expand Up @@ -496,6 +496,11 @@ impl Unparser<'_> {
plan,
Some(plan_alias.alias.clone()),
)?;
if !select.already_projected() {
select.projection(vec![ast::SelectItem::Wildcard(
ast::WildcardAdditionalOptions::default(),
)]);
}
if !columns.is_empty()
&& !self.dialect.supports_column_alias_in_table_alias()
{
Expand Down
93 changes: 47 additions & 46 deletions datafusion/sql/src/unparser/rewrite.rs
Original file line number Diff line number Diff line change
Expand Up @@ -101,25 +101,25 @@ fn rewrite_sort_expr_for_union(exprs: Vec<SortExpr>) -> Result<Vec<SortExpr>> {
Ok(sort_exprs)
}

// Rewrite logic plan for query that order by columns are not in projections
// Plan before rewrite:
//
// Projection: j1.j1_string, j2.j2_string
// Sort: j1.j1_id DESC NULLS FIRST, j2.j2_id DESC NULLS FIRST
// Projection: j1.j1_string, j2.j2_string, j1.j1_id, j2.j2_id
// Inner Join: Filter: j1.j1_id = j2.j2_id
// TableScan: j1
// TableScan: j2
//
// Plan after rewrite
//
// Sort: j1.j1_id DESC NULLS FIRST, j2.j2_id DESC NULLS FIRST
// Projection: j1.j1_string, j2.j2_string
// Inner Join: Filter: j1.j1_id = j2.j2_id
// TableScan: j1
// TableScan: j2
//
// This prevents the original plan generate query with derived table but missing alias.
/// Rewrite logic plan for query that order by columns are not in projections
/// Plan before rewrite:
///
/// Projection: j1.j1_string, j2.j2_string
/// Sort: j1.j1_id DESC NULLS FIRST, j2.j2_id DESC NULLS FIRST
/// Projection: j1.j1_string, j2.j2_string, j1.j1_id, j2.j2_id
/// Inner Join: Filter: j1.j1_id = j2.j2_id
/// TableScan: j1
/// TableScan: j2
///
/// Plan after rewrite
///
/// Sort: j1.j1_id DESC NULLS FIRST, j2.j2_id DESC NULLS FIRST
/// Projection: j1.j1_string, j2.j2_string
/// Inner Join: Filter: j1.j1_id = j2.j2_id
/// TableScan: j1
/// TableScan: j2
///
/// This prevents the original plan generate query with derived table but missing alias.
pub(super) fn rewrite_plan_for_sort_on_non_projected_fields(
p: &Projection,
) -> Option<LogicalPlan> {
Expand Down Expand Up @@ -191,33 +191,33 @@ pub(super) fn rewrite_plan_for_sort_on_non_projected_fields(
}
}

// This logic is to work out the columns and inner query for SubqueryAlias plan for both types of
// subquery
// - `(SELECT column_a as a from table) AS A`
// - `(SELECT column_a from table) AS A (a)`
//
// A roundtrip example for table alias with columns
//
// query: SELECT id FROM (SELECT j1_id from j1) AS c (id)
//
// LogicPlan:
// Projection: c.id
// SubqueryAlias: c
// Projection: j1.j1_id AS id
// Projection: j1.j1_id
// TableScan: j1
//
// Before introducing this logic, the unparsed query would be `SELECT c.id FROM (SELECT j1.j1_id AS
// id FROM (SELECT j1.j1_id FROM j1)) AS c`.
// The query is invalid as `j1.j1_id` is not a valid identifier in the derived table
// `(SELECT j1.j1_id FROM j1)`
//
// With this logic, the unparsed query will be:
// `SELECT c.id FROM (SELECT j1.j1_id FROM j1) AS c (id)`
//
// Caveat: this won't handle the case like `select * from (select 1, 2) AS a (b, c)`
// as the parser gives a wrong plan which has mismatch `Int(1)` types: Literal and
// Column in the Projections. Once the parser side is fixed, this logic should work
/// This logic is to work out the columns and inner query for SubqueryAlias plan for both types of
/// subquery
/// - `(SELECT column_a as a from table) AS A`
/// - `(SELECT column_a from table) AS A (a)`
///
/// A roundtrip example for table alias with columns
///
/// query: SELECT id FROM (SELECT j1_id from j1) AS c (id)
///
/// LogicPlan:
/// Projection: c.id
/// SubqueryAlias: c
/// Projection: j1.j1_id AS id
/// Projection: j1.j1_id
/// TableScan: j1
///
/// Before introducing this logic, the unparsed query would be `SELECT c.id FROM (SELECT j1.j1_id AS
/// id FROM (SELECT j1.j1_id FROM j1)) AS c`.
/// The query is invalid as `j1.j1_id` is not a valid identifier in the derived table
/// `(SELECT j1.j1_id FROM j1)`
///
/// With this logic, the unparsed query will be:
/// `SELECT c.id FROM (SELECT j1.j1_id FROM j1) AS c (id)`
///
/// Caveat: this won't handle the case like `select * from (select 1, 2) AS a (b, c)`
/// as the parser gives a wrong plan which has mismatch `Int(1)` types: Literal and
/// Column in the Projections. Once the parser side is fixed, this logic should work
pub(super) fn subquery_alias_inner_query_and_columns(
subquery_alias: &datafusion_expr::SubqueryAlias,
) -> (&LogicalPlan, Vec<Ident>) {
Expand Down Expand Up @@ -330,6 +330,7 @@ fn find_projection(logical_plan: &LogicalPlan) -> Option<&Projection> {
_ => None,
}
}

/// A `TreeNodeRewriter` implementation that rewrites `Expr::Column` expressions by
/// replacing the column's name with an alias if the column exists in the provided schema.
///
Expand Down
46 changes: 44 additions & 2 deletions datafusion/sql/tests/cases/plan_to_sql.rs
Original file line number Diff line number Diff line change
Expand Up @@ -666,6 +666,48 @@ where
assert_eq!(roundtrip_statement.to_string(), expect);
}

#[test]
fn test_table_scan_alias() -> Result<()> {
let schema = Schema::new(vec![
Field::new("id", DataType::Utf8, false),
Field::new("age", DataType::Utf8, false),
]);

let plan = table_scan(Some("t1"), &schema, None)?
.project(vec![col("id")])?
.alias("a")?
.build()?;
let sql = plan_to_sql(&plan)?;
assert_eq!(
format!("{}", sql),
"SELECT * FROM (SELECT t1.id FROM t1) AS a"
);

let plan = table_scan(Some("t1"), &schema, None)?
.project(vec![col("id")])?
.alias("a")?
.build()?;

let sql = plan_to_sql(&plan)?;
assert_eq!(
format!("{}", sql),
"SELECT * FROM (SELECT t1.id FROM t1) AS a"
);

let plan = table_scan(Some("t1"), &schema, None)?
.filter(col("id").gt(lit(5)))?
.project(vec![col("id")])?
.alias("a")?
.build()?;
println!("{}", plan.display_indent());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left over?

let sql = plan_to_sql(&plan)?;
assert_eq!(
format!("{}", sql),
"SELECT * FROM (SELECT t1.id FROM t1 WHERE (t1.id > 5)) AS a"
);
Ok(())
}

#[test]
fn test_table_scan_pushdown() -> Result<()> {
let schema = Schema::new(vec![
Expand Down Expand Up @@ -697,7 +739,7 @@ fn test_table_scan_pushdown() -> Result<()> {
plan_to_sql(&table_scan_with_projection_alias)?;
assert_eq!(
format!("{}", table_scan_with_projection_alias),
"SELECT ta.id, ta.age FROM t1 AS ta"
"SELECT * FROM (SELECT ta.id, ta.age FROM t1 AS ta) AS ta"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not an improvement, right?
I am OK if this is necessary, but is there a different way to fix the original problem, without affecting here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @findepi. Sorry for replying so late 🙇
You're right. I shouldn't break the result of the table scan with any pushdown. I stopped projecting a wildcard if the SubqueryAlias's child is a pushdown table scan.

);

let table_scan_with_projection_alias =
Expand All @@ -708,7 +750,7 @@ fn test_table_scan_pushdown() -> Result<()> {
plan_to_sql(&table_scan_with_projection_alias)?;
assert_eq!(
format!("{}", table_scan_with_projection_alias),
"SELECT ta.age FROM t1 AS ta"
"SELECT * FROM (SELECT ta.age FROM t1 AS ta) AS ta"
);

let table_scan_with_no_projection_alias = table_scan(Some("t1"), &schema, None)?
Expand Down