Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement script that produces stats on implementation coverage [WIP] #56

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

ingomueller-net
Copy link
Collaborator

No description provided.

This PR introduces support for the `PlanVersion` message type. This is
done by introduces the `VersionAttr`, which corresponds to the `Version`
message type, which, in turn, is the type of the only field of
`PlanVersion`.

Since `PlanVersion` is a top-level message, this PR extends the import
mechanism: when reading a protobuf message, we need to know which type
it is, so when we translate a protobuf message into MLIR, we need to
specify which message type the input is. The PR, thus, adds a new flag
to the `substrait-translate` tool, together with a corresponding
top-level translation function. This also slightly changed some existing
error messages and the corresponding tests.

This PR does not yet factor out the handling of the version of the
`Plan` message type. That should follow up soon but, since the new
`VersionAttr` has a slightly different assembly format than the
corresponding part of the `PlanOp`, that change will require to modify
all test cases, which would bloat this PR.

Signed-off-by: Ingo Müller <[email protected]>
This PR adds support for the `expected_type_urls` field of the `Plan`
message. This is pretty straight-forward: the repeated field of string
becomes an `ArrayAttr` of `StringAttr`s. We don't really need this field
currently but it's the last missing field of the `Plan` message, so it
allows us to tick that message type off.

Signed-off-by: Ingo Müller <[email protected]>
This PR factors out the handling of the `shared_extension` field from
the `plan` op and adds that logic to the `project` op. This mainly
consisted of moving the import and export logic from the functions
related to the `plan` op to dedicated functions. The PR also introduces
the new `ExtensibleOpInterface` that enforces an attribute called
`advanced_extension` on the op that implement it and allows to deal with
all such ops transparently. Since that interface depends on an
attribute, the include order of the generated code of interfaces and
attributes also had to be adapted. Unfortunately, the field names in the
Substrait spec also vary (singular or plural), so the PR also introduces
some template magic to be able to deal with protobuf message types with
both spellings. With this PR, message types with an `advanced_extension`
field should be able to support it by (1) adding the
`ExtensibleOpInterface` to their traits, (2) adding an
`advanced_extension` parameter, and (3) adding that parameter to their
assembly format (although that's technically optional; otherwise, the
attribute is set through the `attributes` dictionary).
This PR adds support for the `AggregateRel` from the Substrait spec in
the form of the `aggregate` op. This is arguably the most complex op
implemented so far. It has an optional enum argument that requires
custom parsing, several optional regions that require custom parsing, an
attribute that depends on the presence and contents of the regions and
requires custom parsing to omit it in the common case, and return types
that depend on the two regions and the attribute. What's more, the
current version of the spec is such that it is almost impossibly to
interpret "grouping sets" because it relies on protobuf message
equality, which is something can protobuf does not offer. The current
implementation, thus, implements a best effort by using op equality
instead (but needs to run CSE during export to ensure op uniqueness).
Finally, the PR also extends the `call` op to represent also
`AggregateFunction` messages (in addition to `ScalarFunction` messages),
which are used by the new `aggregate` op.
This PR adds the `aggregation_phase` to the `call` op, which corresponds
to the `phase` field of the `AggregationFunction` message. The PR also
revisits the assembly format of the `aggregation_invocation` attribute
of the `call` op and now handles the two jointly such that either of the
two can be ommitted if it has the default value. Finally, the PR
revisits the LIT tests such that both aggregation detauls are tested
jointly and explicitly instead of being piggy-backed onto the remaining
tests. This is the last non-type feature we need to run TPC-H Q6 as
produced by Isthmus.
This PR implements the `extension_table` op, the MLIR equivalent of the
`ReadRel.ExtensionTable` message. Since that message uses the
`google.protobuf.Any` message type, the PR also slightly extends and
improves the handling of that message in the dialect. Finally, because
`extension_table` is the second op corresponding to a `ReadRel` case
(after `named_table`), the PR makes some effort to factor out common
logic between the two, namely, how the named structs representing the
schema of the op are handled. However, there is still some opportunity
for factoring out further that the PR does not do, such as defining a
`ReadRelInterface`.
This PR makes some clean-ups to `SubstraitEnums.td`:

* Reformat (mostly) according to `clang-format`.
* Remove unnecessary `Kind` suffix in `JoinTypeKind`.
* Move `summary` property of enum `def`s to comment.
* Rephrase those comments to make them clearer.
This PR implements the `cast` op, which corresponds to the `Cast`
expression in Substrait. This is one of the few remaining message types
we need to support to get a first TPC-H query round-tripped.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant