Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce interval_year and interval_day types #48

Merged
merged 4 commits into from
Feb 19, 2025

Conversation

dshaaban01
Copy link
Collaborator

@dshaaban01 dshaaban01 commented Jan 1, 2025

This PR introduces two new atomic interval types:

  1. Interval Year to Month
    Supports a range of [-10,000..10,000] years, with month-level precision. Typically stored as separate integers for years and months (e.g., 12y 3m), but only the total number of months is considered significant. For example, 1y 0m is equivalent to 0y 12m or 1001y -12000m.
  2. Interval Day to Second
    Supports a range of [-3,650,000..3,650,000] days, with second precision. Typically stored as separate integers for days and seconds, but only the total number of seconds is considered significant. For example, 1d 0s is equivalent to 0d 86400s.

@dshaaban01 dshaaban01 force-pushed the dalia/types/intervals branch 3 times, most recently from 83fd62d to d6ac63f Compare January 1, 2025 17:47
%0 = named_table @t1 as ["a", "b"] : tuple<!substrait.interval_year, !substrait.interval_day>
yield %0 : tuple<!substrait.interval_year, !substrait.interval_day>
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: final line break.

@dshaaban01 dshaaban01 force-pushed the dalia/types/intervals branch from dd58ba1 to 296155a Compare January 9, 2025 11:51
@dshaaban01 dshaaban01 force-pushed the dalia/types/intervals branch from 82ea7b2 to 0543ca8 Compare January 16, 2025 14:16
@ingomueller-net
Copy link
Collaborator

Here as well: need to rebase again.

@ingomueller-net
Copy link
Collaborator

I think this one is next in line and needs rebasing to proceed.

@dshaaban01 dshaaban01 closed this Feb 6, 2025
@dshaaban01 dshaaban01 force-pushed the dalia/types/intervals branch from e9d2e3d to 131caf5 Compare February 6, 2025 14:07
@dshaaban01 dshaaban01 reopened this Feb 6, 2025
@dshaaban01
Copy link
Collaborator Author

Rebased and adjusted to #59. PR description also adjusted.

@dshaaban01
Copy link
Collaborator Author

@ingomueller-net

I am getting this error in CI:

duckdb.duckdb.HTTPException: HTTP Error: Failed to download extension "substrait" at URL "http://extensions.duckdb.org/v1.2.0/linux_amd64_gcc4/substrait.duckdb_extension.gz" (HTTP 403)

which is causing one test to fail:

Failed Tests (1):
  Substrait MLIR :: python/dialects/substrait/e2e_duckdb.py

I guess this is an internet connection error? Any ideas on how to fix?

@dshaaban01 dshaaban01 force-pushed the dalia/types/intervals branch 3 times, most recently from 6751b11 to 44fd3d8 Compare February 6, 2025 15:15
@ingomueller-net
Copy link
Collaborator

I ran into this this morning. This must be due to the update of DuckDB from yesterday. #79 should fix it. Rebase onto main and retry.

@dshaaban01 dshaaban01 force-pushed the dalia/types/intervals branch from 44fd3d8 to 8a6f0d8 Compare February 6, 2025 17:26
Copy link
Collaborator

@ingomueller-net ingomueller-net left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this PR again and found a few things we should think about a bit more.

Comment on lines 864 to 938
auto intervalYear =
mlir::cast<IntervalYearMonthAttr>(value).getYearsValue();
auto intervalMonth =
mlir::cast<IntervalYearMonthAttr>(value).getMonthsValue();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: use auto only if the type appears in the RHS of a cast or similar, see LLVM Coding Standards. Also, my change factors out the cast, which makes it more readable and shorter. (Be sure to check if int32_t is the type returned by getXValue().)

Suggested change
auto intervalYear =
mlir::cast<IntervalYearMonthAttr>(value).getYearsValue();
auto intervalMonth =
mlir::cast<IntervalYearMonthAttr>(value).getMonthsValue();
auto intervalYearMonth = mlir::cast<IntervalYearMonthAttr>(value);
int32_t intervalYear = intervalYearMonth.getYearsValue();
int32_t intervalMonth = intervalYearMonth.getMonthsValue();

Comment on lines 875 to 948
auto intervalDay = mlir::cast<IntervalDaySecondAttr>(value).getDaysValue();
auto intervalSecond =
mlir::cast<IntervalDaySecondAttr>(value).getSecondsValue();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

def Substrait_IntervalYearMonthAttr
: Substrait_StaticallyTypedAttr<"IntervalYearMonth", "interval_year_month",
"IntervalYearMonthType"> {
let summary = "Substrait interval year to month type";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to document a decision here. The spec says:

Usually stored as separate integers for years and months, but only the total number of months is significant, i.e. 1y 0m is considered equal to 0y 12m or 1001y -12000m.

We could deduce that we only want to store the number of months, then. However, I think that our design rationale suggest we should have two different numbers:

Every valid Substrait plan MUST be representable in the dialect.

I'd argue that two plans that differ by the representation of an interval_year_month are different plans (that always produce the same result), so we have to be able to express both of them with our dialect. I think that decision should be documented here.

We should, however, eventually implement a canonicalization pattern that converts these attributes to the representation with month in [0,12).

let description = [{
This type represents a substrait interval year to month attribute type.
}];
let parameters = (ins "int32_t":$years_value, "int32_t":$months_value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec only allows "a range of [-10,000..10,000] years." If we don't add a constraint now, we should add a TODO to be sure we add it later.

@@ -67,6 +67,28 @@ def Substrait_DateAttr
let assemblyFormat = [{ `<` $value `>` }];
}

def Substrait_IntervalDaySecondAttr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same two comments here as for the other interval type.

@dshaaban01 dshaaban01 force-pushed the dalia/types/intervals branch from 8a6f0d8 to 93b561a Compare February 18, 2025 12:49
@dshaaban01
Copy link
Collaborator Author

Comments addressed.

int32_t month) {
if (year < -100000 || year > 100000)
return emitError() << "year must be in a range of [-10,000..10,000] years";
return success();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, nice! month should be checked the same way (and be in [120,000..120,000]).

int32_t seconds) {
if (days < -3650000 || days > 3650000)
return emitError()
<< "days must be in a range of [-3,650,000..3,650,000] days";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for seconds.

"IntervalYearMonthType"> {
let summary = "Substrait interval year to month type";
let description = [{
This type represents a substrait interval year to month attribute type. Note that this
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please break at 80 characters.

let genVerifyDecl = 1;
}

def Substrait_IntervalYearMonthAttr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this and the other trailing white spaces in the file. (Consider setting "files.trimTrailingWhitespace": true in your VS Code config.)

Suggested change
def Substrait_IntervalYearMonthAttr
def Substrait_IntervalYearMonthAttr

This type represents a substrait interval day to second attribute type. Note that this
attribute does not attempt to canonicalize equivalent day-second intervals.
}];
let parameters = (ins "int32_t":$days_value, "int32_t":$seconds_value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am realizing this late but: do we need the _value suffix? It's not used in the Substrait spec and I don't see any value (ha!) in having it here...

@dshaaban01 dshaaban01 force-pushed the dalia/types/intervals branch 2 times, most recently from f8953cb to f608fd4 Compare February 19, 2025 12:38
@ingomueller-net ingomueller-net merged commit b6ae72f into substrait-io:main Feb 19, 2025
7 checks passed
@ingomueller-net
Copy link
Collaborator

Hurray! After many rounds the PR made finally it 🎉

@dshaaban01
Copy link
Collaborator Author

@ingomueller-net Did you see my comment above? We might need some more work 🫠

@ingomueller-net
Copy link
Collaborator

Which comment? The CI failure due to the DuckDB extension? That's been fixed by #79, no?

@dshaaban01
Copy link
Collaborator Author

dshaaban01 commented Feb 19, 2025

Copy/pasted comment in response to adding a constraint for seconds:

  if (seconds < -315360000000 || seconds > 315360000000)
    return emitError() << "seconds must be in a range of [-315,360,000,000.."
                          "315,360,000,000] seconds";

If I do this: 3'650'000 days * 24 hours * 60 min * 60 seconds = 315'360'000'000 seconds --> the value is larger than a int32_t max value/interval range (2,147,483,647)? Shouldn't I use an int64_t?

Screenshot 2025-02-19 at 15 21 50

This is the specification that says I should use int32_t for second and (weirdly?) microsecond precision. It's also weird because there are two different example formats in the middle and right column (middle: 1d 0s and right: 3650001d -86400s 0us)? The middle column mentions microseconds but has "s" in the format, the right column mentions seconds and microseconds - I'm confused? Could there be something wrong in the spec?

@ingomueller-net
Copy link
Collaborator

Aha, I understand the problem. (I don't see the comment, though, but not a problem.)

As for the data type: the spec for the Literal submessage uses an int32, so I'd use that as well. If that means that we cannot get out of the valid range (or even cover it), then we don't have to test anything. I'd add a comment, though, or maybe even a static_assert (in the verifier, where the test would be) to be sure we think of adding a test should we ever change the type.

As for the example in the spec: the difference between the middle and the right column is that the right column has an additional 0us, right? For consistency, I guess it would be best if the middle column had that as well. But both are just informal examples, so I'd say it's rather minor. But if you want to submit a PR that fixes the example, I am pretty sure people will appreciate 😉

@dshaaban01
Copy link
Collaborator Author

Okay cool got it :)) I will add a comment in a new PR and submit a PR to fix the example :)

Btw here's a screenshot of my comment, but it says "Pending"? So maybe by responding to your review comment, it assumed I was trying to review and because I don't have the rights to do so it doesn't display?
Screenshot 2025-02-20 at 10 20 30

@ingomueller-net
Copy link
Collaborator

I think what happened is that you "started a review" in the right-most tab and didn't submit the review. That review mode is handy if you have a few comments for the same PR, then they get sent as one big batch. But only once you submit it. Until then, the comments are still "pending."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants