Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add aggregate op and extend functions accordingly #70

Merged
merged 2 commits into from
Feb 5, 2025

Conversation

ingomueller-net
Copy link
Collaborator

@ingomueller-net ingomueller-net commented Jan 24, 2025

This PR depends on and, therefor, includes #68.

This PR adds support for the AggregateRel from the Substrait spec in the form of the aggregate op. This is arguably the most complex op implemented so far. It has an optional enum argument that requires custom parsing, several optional regions that require custom parsing, an attribute that depends on the presence and contents of the regions and requires custom parsing to omit it in the common case, and return types that depend on the two regions and the attribute. What's more, the current version of the spec is such that it is almost impossibly to interpret "grouping sets" because it relies on protobuf message equality, which is something can protobuf does not offer. The current implementation, thus, implements a best effort by using op equality instead (but needs to run CSE during export to ensure op uniqueness). Finally, the PR also extends the call op to represent also AggregateFunction messages (in addition to ScalarFunction messages), which are used by the new aggregate op.

@ingomueller-net ingomueller-net force-pushed the aggregate-op branch 2 times, most recently from 7734b7a to 1245335 Compare January 24, 2025 12:07
@ingomueller-net ingomueller-net changed the title feat: add aggregation op and extend functions accordingly [WIP] feat: add aggrege op and extend functions accordingly Jan 24, 2025
@ingomueller-net ingomueller-net changed the title feat: add aggrege op and extend functions accordingly feat: add aggregate op and extend functions accordingly Jan 24, 2025
@ingomueller-net ingomueller-net force-pushed the aggregate-op branch 6 times, most recently from e646898 to 7206f71 Compare February 3, 2025 08:45
AggregationInvocationAttr aggregationInvocation) {
if (aggregationInvocation &&
aggregationInvocation.getValue() != AggregationInvocation::all) {
// The whitespace printed here compensates the trimming of whitespace in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't one add it there ? E.g., use " " or some such? (just OOC, not blocker)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found a way that makes me perfectly happy: if I add it there, I get two whitespaces in case this custom directive doesn't print anything. So now the declarative format suppresses a whitespace before this directive, then (1) if the directive prints something, it first prints the whitespace, then that thing, (2) otherwise it doesn't print anything, and in both cases, the declarative format adds another whitespace.

@@ -32,6 +33,30 @@ namespace pb = google::protobuf;

namespace {

// Copied from
// https://github.com/llvm/llvm-project/blob/dea33c/mlir/lib/Transforms/CSE.cpp.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just reuse it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, no, at least not currently. It's in the cpp file, so we'd have to move it upstream to some header to make it accessible.

importAggregateFunction(ImplicitLocOpBuilder builder,
const AggregateFunction &message) {
MLIRContext *context = builder.getContext();
Location loc = UnknownLoc::get(context);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the location from the ImplicitLocOpBuilder?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same below)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Fixed in latest commit.

// If it's a new expression, assign new reference.
if (hasInserted) {
it->second = groupingExprOps.size() - 1;
groupingExprValues.emplace_back(exprOp.value()->getResult(0));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And these will always have a single result?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but currently only by convention. This is bothering for some time but I haven't gotten around to trying to improve things again. Ideally, I would like to attach the OpTrait::OneResult to the RelOpInterface and the ExpressionOpInterface but I didn't immediate manage to do that (and I am not sure if we can attach a trait to an interface). The best alternative I came up with is to add a getResult() method to the interface that simply does return getResult(0).

This PR adds support for the `AggregateRel` from the Substrait spec in
the form of the `aggregate` op. This is arguably the most complex op
implemented so far. It has an optional enum argument that requires
custom parsing, several optional regions that require custom parsing, an
attribute that depends on the presence and contents of the regions and
requires custom parsing to omit it in the common case, and return types
that depend on the two regions and the attribute. What's more, the
current version of the spec is such that it is almost impossibly to
interpret "grouping sets" because it relies on protobuf message
equality, which is something can protobuf does not offer. The current
implementation, thus, implements a best effort by using op equality
instead (but needs to run CSE during export to ensure op uniqueness).
Finally, the PR also extends the `call` op to represent also
`AggregateFunction` messages (in addition to `ScalarFunction` messages),
which are used by the new `aggregate` op.
This commit addresses a comment by @jpienaar by using the `Location`
from the `ImplicitOpLocBuilder` instead of an `UnknownLoc` wherever one
is available (including places there weren't previously affected by this
PR).

Signed-off-by: Ingo Müller <[email protected]>
@ingomueller-net ingomueller-net merged commit 411d3c3 into substrait-io:main Feb 5, 2025
7 checks passed
@ingomueller-net ingomueller-net deleted the aggregate-op branch February 5, 2025 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants