Count wildcard alias #14927
base: main
/// Return `self AS name` alias expression
pub fn alias(self, name: impl Into<String>) -> Expr {
    Expr::Alias(Alias::new(self, None::<&str>, name.into()))
}

Adding a duplicate-name check here didn't help. See datafusion/datafusion/expr/src/logical_plan/builder.rs, lines 748 to 768 in 32224b4.
We have count(*) (the alias) and count(1) (the column name), which mismatch in this plan:

| plan_type    | plan                                                           |
+--------------+----------------------------------------------------------------+
| logical_plan | Projection: t1.b, count(*)                                     |
|              |   Sort: count(Int64(1)) AS count(*) AS count(*) ASC NULLS LAST |
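The doubled alias in the Sort line can be reproduced with a small toy model. The sketch below assumes a hypothetical Expr enum that only imitates DataFusion's display format; it is not DataFusion's Expr type. Its alias method wraps unconditionally, like the alias function quoted above, and alias_once is a hypothetical helper that strips an existing outer alias before applying the new name.

```rust
use std::fmt;

/// Hypothetical toy expression type; it only mimics DataFusion's display format.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    /// An Int64 literal, displayed as `Int64(v)`.
    Int64(i64),
    /// The aggregate `count(arg)`.
    Count(Box<Expr>),
    /// `expr AS name`.
    Alias(Box<Expr>, String),
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Int64(v) => write!(f, "Int64({v})"),
            Expr::Count(arg) => write!(f, "count({arg})"),
            Expr::Alias(inner, name) => write!(f, "{inner} AS {name}"),
        }
    }
}

impl Expr {
    /// Wraps `self` in a new alias unconditionally, even if it is already aliased.
    fn alias(self, name: &str) -> Expr {
        Expr::Alias(Box::new(self), name.to_string())
    }

    /// Hypothetical helper: drop an existing outer alias before re-aliasing,
    /// so names are never stacked.
    fn alias_once(self, name: &str) -> Expr {
        match self {
            Expr::Alias(inner, _) => (*inner).alias(name),
            other => other.alias(name),
        }
    }
}

/// Toy stand-in for count_all(): count(1) exposed under the name count(*).
fn count_all() -> Expr {
    Expr::Count(Box::new(Expr::Int64(1))).alias("count(*)")
}

fn main() {
    // Re-aliasing an already-aliased expression nests the aliases.
    println!("{}", count_all().alias("count(*)"));
    // Stripping the outer alias first leaves a single alias.
    println!("{}", count_all().alias_once("count(*)"));
}
```

The first line prints count(Int64(1)) AS count(*) AS count(*), matching the Sort expression above; the second prints count(Int64(1)) AS count(*). Whether stripping the old alias is the right fix inside DataFusion's planner is not something this toy decides.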
| logical_plan | LeftSemi Join:                     |
|              |   TableScan: t1 projection=[a, b]  |
|              |   SubqueryAlias: __correlated_sq_1 |
|              |     Projection:                    |
Why is the projection empty?
03)--SubqueryAlias: __correlated_sq_1
04)----Projection:
05)------Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]]
06)--------TableScan: t2 projection=[]
optimize_projections removes the expressions in the projection but keeps the now-empty Projection node:
[2025-02-28T06:22:26Z DEBUG datafusion_optimizer::utils] simplify_expressions:
Projection: t1.a
  LeftSemi Join:
    Filter: Boolean(true)
      TableScan: t1
    SubqueryAlias: __correlated_sq_1
      Projection: count(Int64(1)) AS count(*)
        Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]]
          TableScan: t2
[2025-02-28T06:22:26Z DEBUG datafusion_optimizer::optimizer] Plan unchanged by optimizer rule 'unwrap_cast_in_comparison' (pass 0)
[2025-02-28T06:22:26Z DEBUG datafusion_optimizer::optimizer] Plan unchanged by optimizer rule 'common_sub_expression_eliminate' (pass 0)
[2025-02-28T06:22:26Z DEBUG datafusion_optimizer::optimizer] Plan unchanged by optimizer rule 'eliminate_group_by_constant' (pass 0)
[2025-02-28T06:22:26Z DEBUG datafusion_optimizer::utils] optimize_projections:
LeftSemi Join:
  Filter: Boolean(true)
    TableScan: t1 projection=[a]
  SubqueryAlias: __correlated_sq_1
    Projection:
      Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]]
        TableScan: t2 projection=[]
weird
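The optimize_projections behaviour logged above can be mimicked with a toy plan. The Plan enum and the functions subquery_plan, prune and prune_and_collapse are hypothetical and model only the subquery branch of the log: prune keeps the Projection node even when no expressions survive (what the log shows), while prune_and_collapse replaces an empty Projection with its input. The Aggregate itself cannot simply be dropped, because an aggregate without GROUP BY returns one row even when t2 is empty, and the key-less semi join's result depends on whether the right side has any row.

```rust
use std::collections::HashSet;

/// Hypothetical toy plan node covering only the subquery branch of the log.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Projection of named expressions over a single input.
    Projection { exprs: Vec<String>, input: Box<Plan> },
    /// Aggregate without GROUP BY: always yields exactly one row.
    Aggregate { aggr: Vec<String> },
}

/// Toy equivalent of `Projection: count(Int64(1)) AS count(*)`
/// over the ungrouped Aggregate.
fn subquery_plan() -> Plan {
    Plan::Projection {
        exprs: vec!["count(*)".to_string()],
        input: Box::new(Plan::Aggregate {
            aggr: vec!["count(Int64(1))".to_string()],
        }),
    }
}

/// Keeps only the projection expressions the parent requires. As in the
/// logged optimize_projections output, the Projection node itself stays
/// even if no expressions survive.
fn prune(plan: Plan, required: &HashSet<String>) -> Plan {
    match plan {
        Plan::Projection { exprs, input } => Plan::Projection {
            exprs: exprs.into_iter().filter(|e| required.contains(e)).collect(),
            input,
        },
        other => other,
    }
}

/// Alternative: after pruning, replace an empty Projection with its input.
fn prune_and_collapse(plan: Plan, required: &HashSet<String>) -> Plan {
    match prune(plan, required) {
        Plan::Projection { exprs, input } if exprs.is_empty() => *input,
        other => other,
    }
}

fn main() {
    // A left semi join outputs only left-side columns, and this one has no
    // join keys, so nothing from the subquery side is required.
    let required = HashSet::new();
    println!("{:?}", prune(subquery_plan(), &required));
    println!("{:?}", prune_and_collapse(subquery_plan(), &required));
}
```

prune reproduces the empty Projection seen in the log; prune_and_collapse shows the alternative shape the question above points at, with the one-row Aggregate left in place.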
Thanks @jayzhan211 -- I am not sure about the column name functions, but otherwise this is looking very nice 👍
.to_string())
}

/// Create count wildcard of Expr::Column
I don't really understand what this function is for. It seems pretty confusing.
Maybe we could add a doc example or something to make it less confusing?
Likewise for the count_all_window_column function.
One example of its use is in test_count_wildcard_on_where_scalar_subquery:
let df_results = ctx
    .table("t1")
    .await?
    .filter(
        scalar_subquery(Arc::new(
            ctx.table("t2")
                .await?
                .filter(out_ref_col(DataType::UInt32, "t1.a").eq(col("t2.a")))?
                .aggregate(vec![], vec![count_all()])?
                .select(vec![count_all_column()])?
                .into_unoptimized_plan(),
        ))
        .gt(lit(ScalarValue::UInt8(Some(0)))),
    )?
    .select(vec![col("t1.a"), col("t1.b")])?
    .explain(false, false)?
    .collect()
    .await?;
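The kind of doc example requested above could look like the toy below. Everything in it is hypothetical and is not DataFusion's API: count_all() stands for the aggregate expression exposed as count(*), and count_all_column() builds a column reference to that output name, so a projection placed above the aggregate can resolve it, as in the .aggregate(...).select(vec![count_all_column()]) chain of the test.

```rust
/// Hypothetical toy expressions (not DataFusion's API).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// The aggregate built by count_all(): count(1), exposed as "count(*)".
    CountAll,
    /// A reference to an input column by name.
    Column(String),
}

impl Expr {
    /// The name this expression has in its plan node's output schema.
    fn output_name(&self) -> String {
        match self {
            Expr::CountAll => "count(*)".to_string(),
            Expr::Column(name) => name.clone(),
        }
    }
}

/// Toy count_all(): the aggregate expression itself.
fn count_all() -> Expr {
    Expr::CountAll
}

/// Toy count_all_column(): a column reference to count_all()'s output,
/// for use in a projection placed above the aggregate.
fn count_all_column() -> Expr {
    Expr::Column(count_all().output_name())
}

/// Toy aggregate with no GROUP BY: its output schema is the aggregate names.
fn aggregate(aggr: &[Expr]) -> Vec<String> {
    aggr.iter().map(Expr::output_name).collect()
}

/// Toy select: each column reference must exist in the input schema.
fn select(schema: &[String], exprs: &[Expr]) -> Result<Vec<String>, String> {
    exprs
        .iter()
        .map(|e| match e {
            Expr::Column(name) if schema.contains(name) => Ok(name.clone()),
            Expr::Column(name) => Err(format!("column {name} not found")),
            other => Ok(other.output_name()),
        })
        .collect()
}

fn main() {
    // Mirrors .aggregate(vec![], vec![count_all()])?.select(vec![count_all_column()])
    let schema = aggregate(&[count_all()]);
    println!("{:?}", select(&schema, &[count_all_column()]));
}
```

In this toy, referring to the aggregate output by any other name, such as count(Int64(1)), fails to resolve, which mirrors the count(*) versus count(1) naming discussed in this PR.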
Which issue does this PR close?
count_all() expr_fn function now displayed as count(1) rather than count(*) #14894.

Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?