Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce interval_year and interval_day types #48

Merged
merged 4 commits into from
Feb 19, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 30 additions & 0 deletions include/substrait-mlir/Dialect/Substrait/IR/SubstraitAttrs.td
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,34 @@ def Substrait_DateAttr
let assemblyFormat = [{ `<` $value `>` }];
}

def Substrait_IntervalDaySecondAttr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same two comments here as for the other interval type.

: Substrait_StaticallyTypedAttr<"IntervalDaySecond", "interval_day_second",
"IntervalDaySecondType"> {
let summary = "Substrait interval day to second type";
let description = [{
This type represents a substrait interval day to second attribute type. Note
that this attribute does not attempt to canonicalize equivalent day-second
intervals.
}];
let parameters = (ins "int32_t":$days, "int32_t":$seconds);
let assemblyFormat = [{ `<` $days `` `d` $seconds `` `s` `>` }];
let genVerifyDecl = 1;
}

def Substrait_IntervalYearMonthAttr
: Substrait_StaticallyTypedAttr<"IntervalYearMonth", "interval_year_month",
"IntervalYearMonthType"> {
let summary = "Substrait interval year to month type";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to document a decision here. The spec says:

Usually stored as separate integers for years and months, but only the total number of months is significant, i.e. 1y 0m is considered equal to 0y 12m or 1001y -12000m.

We could deduce that we only want to store the number of months, then. However, I think that our design rationale suggest we should have two different numbers:

Every valid Substrait plan MUST be representable in the dialect.

I'd argue that two plans that differ by the representation of an interval_year_month are different plans (that always produce the same result), so we have to be able to express both of them with our dialect. I think that decision should be documented here.

We should, however, eventually implement a canonicalization pattern that converts these attributes to the representation with month in [0,12).

let description = [{
This type represents a substrait interval year to month attribute type. Note
that this attribute does not attempt to canonicalize equivalent year-month
intervals.
}];
let parameters = (ins "int32_t":$years, "int32_t":$months);
let assemblyFormat = [{ `<` $years `` `y` $months `` `m` `>` }];
let genVerifyDecl = 1;
}

def Substrait_TimeAttr
: Substrait_StaticallyTypedAttr<"Time", "time", "TimeType"> {
let summary = "Substrait time type";
Expand Down Expand Up @@ -156,6 +184,8 @@ def Substrait_AtomicAttributes {
Substrait_TimestampTzAttr, // TimestampTZ
Substrait_DateAttr, // Date
Substrait_TimeAttr, // Time
Substrait_IntervalYearMonthAttr, // IntervalYear
Substrait_IntervalDaySecondAttr, // IntervalDay
];
}

Expand Down
16 changes: 16 additions & 0 deletions include/substrait-mlir/Dialect/Substrait/IR/SubstraitTypes.td
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,20 @@ def Substrait_DateType : Substrait_Type<"Date", "date"> {
}];
}

def Substrait_IntervalDaySecondType : Substrait_Type<"IntervalDaySecond", "interval_day_second"> {
let summary = "Substrait interval day to second type";
let description = [{
This type represents a substrait interval day to second type.
}];
}

def Substrait_IntervalYearMonthType : Substrait_Type<"IntervalYearMonth", "interval_year_month"> {
let summary = "Substrait interval year to month type";
let description = [{
This type represents a substrait interval year to month type.
}];
}

def Substrait_StringType : Substrait_Type<"String", "string"> {
let summary = "Substrait string type";
let description = [{
Expand Down Expand Up @@ -79,6 +93,8 @@ def Substrait_AtomicTypes {
Substrait_TimestampTzType, // TimestampTZ
Substrait_DateType, // Date
Substrait_TimeType, // Time
Substrait_IntervalYearMonthType, // IntervalYear
Substrait_IntervalDaySecondType, // IntervalDay
];
}

Expand Down
20 changes: 20 additions & 0 deletions lib/Dialect/Substrait/IR/Substrait.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,26 @@ LogicalResult AdvancedExtensionAttr::verify(
return success();
}

LogicalResult mlir::substrait::IntervalYearMonthAttr::verify(
llvm::function_ref<mlir::InFlightDiagnostic()> emitError, int32_t year,
int32_t month) {
if (year < -100000 || year > 100000)
return emitError() << "year must be in a range of [-10,000..10,000] years";
if (month < -120000 || month > 120000)
return emitError()
<< "month must be in a range of [120,000..120,000] months";
return success();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, nice! month should be checked the same way (and be in [120,000..120,000]).

}

LogicalResult mlir::substrait::IntervalDaySecondAttr::verify(
llvm::function_ref<mlir::InFlightDiagnostic()> emitError, int32_t days,
int32_t seconds) {
if (days < -3650000 || days > 3650000)
return emitError()
<< "days must be in a range of [-3,650,000..3,650,000] days";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for seconds.

return success();
}

//===----------------------------------------------------------------------===//
// Substrait enums
//===----------------------------------------------------------------------===//
Expand Down
45 changes: 45 additions & 0 deletions lib/Target/SubstraitPB/Export.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -339,6 +339,29 @@ SubstraitExporter::exportType(Location loc, mlir::Type mlirType) {
return std::move(type);
}

// Handle interval_year.
if (mlir::isa<IntervalYearMonthType>(mlirType)) {
// TODO(ingomueller): support other nullability modes.
auto intervalYearType = std::make_unique<proto::Type::IntervalYear>();
intervalYearType->set_nullability(
Type_Nullability::Type_Nullability_NULLABILITY_REQUIRED);

auto type = std::make_unique<proto::Type>();
type->set_allocated_interval_year(intervalYearType.release());
return std::move(type);
}
// Handle interval_day.
if (mlir::isa<IntervalDaySecondType>(mlirType)) {
// TODO(ingomueller): support other nullability modes.
auto intervalDayType = std::make_unique<proto::Type::IntervalDay>();
intervalDayType->set_nullability(
Type_Nullability::Type_Nullability_NULLABILITY_REQUIRED);

auto type = std::make_unique<proto::Type>();
type->set_allocated_interval_day(intervalDayType.release());
return std::move(type);
}

// Handle tuple types.
if (auto tupleType = llvm::dyn_cast<TupleType>(mlirType)) {
auto structType = std::make_unique<proto::Type::Struct>();
Expand Down Expand Up @@ -904,6 +927,28 @@ SubstraitExporter::exportOperation(LiteralOp op) {
// `TimeType`.
else if (auto timeType = dyn_cast<TimeType>(literalType)) {
literal->set_time(mlir::cast<TimeAttr>(value).getValue());
}
// `IntervalType`'s.
else if (auto intervalType = dyn_cast<IntervalYearMonthType>(literalType)) {
auto intervalYearToMonth = std::make_unique<
::substrait::proto::Expression_Literal_IntervalYearToMonth>();
auto intervalYearMonth = mlir::cast<IntervalYearMonthAttr>(value);
int32_t intervalYear = intervalYearMonth.getYears();
int32_t intervalMonth = intervalYearMonth.getMonths();
intervalYearToMonth->set_years(intervalYear);
intervalYearToMonth->set_months(intervalMonth);
literal->set_allocated_interval_year_to_month(
intervalYearToMonth.release());
} else if (auto timeType = dyn_cast<IntervalDaySecondType>(literalType)) {
auto intervalDaytoSecond = std::make_unique<
::substrait::proto::Expression_Literal_IntervalDayToSecond>();
auto intervalDaySecond = mlir::cast<IntervalDaySecondAttr>(value);
int32_t intervalDay = intervalDaySecond.getDays();
int32_t intervalSecond = intervalDaySecond.getSeconds();
intervalDaytoSecond->set_days(intervalDay);
intervalDaytoSecond->set_seconds(intervalSecond);
literal->set_allocated_interval_day_to_second(
intervalDaytoSecond.release());
} else
op->emitOpError("has unsupported value");

Expand Down
16 changes: 16 additions & 0 deletions lib/Target/SubstraitPB/Import.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,10 @@ static mlir::FailureOr<mlir::Type> importType(MLIRContext *context,
return DateType::get(context);
case proto::Type::kTime:
return TimeType::get(context);
case proto::Type::kIntervalYear:
return IntervalYearMonthType::get(context);
case proto::Type::kIntervalDay:
return IntervalDaySecondType::get(context);
case proto::Type::kStruct: {
const proto::Type::Struct &structType = type.struct_();
llvm::SmallVector<mlir::Type> fieldTypes;
Expand Down Expand Up @@ -648,6 +652,18 @@ importLiteral(ImplicitLocOpBuilder builder,
auto attr = TimeAttr::get(context, message.time());
return builder.create<LiteralOp>(attr);
}
case Expression::Literal::LiteralTypeCase::kIntervalYearToMonth: {
auto attr = IntervalYearMonthAttr::get(
context, message.interval_year_to_month().years(),
message.interval_year_to_month().months());
return builder.create<LiteralOp>(attr);
}
case Expression::Literal::LiteralTypeCase::kIntervalDayToSecond: {
auto attr = IntervalDaySecondAttr::get(
context, message.interval_day_to_second().days(),
message.interval_day_to_second().seconds());
return builder.create<LiteralOp>(attr);
}
// TODO(ingomueller): Support more types.
default: {
const pb::FieldDescriptor *desc =
Expand Down
26 changes: 26 additions & 0 deletions test/Dialect/Substrait/literal.mlir
Original file line number Diff line number Diff line change
@@ -1,6 +1,32 @@
// RUN: substrait-opt -split-input-file %s \
// RUN: | FileCheck %s

// CHECK: substrait.plan version 0 : 42 : 1 {
// CHECK-NEXT: relation
// CHECK: %[[V0:.*]] = named_table
// CHECK-NEXT: %[[V1:.*]] = project %[[V0]] : tuple<si1> -> tuple<si1, !substrait.interval_year_month, !substrait.interval_day_second> {
// CHECK-NEXT: ^[[BB0:.*]](%[[ARG0:.*]]: tuple<si1>):
// CHECK-NEXT: %[[V2:.*]] = literal #substrait.interval_year_month<2024y 1m>{{$}}
// CHECK-NEXT: %[[V3:.*]] = literal #substrait.interval_day_second<9d 8000s>{{$}}
// CHECK-NEXT: yield %[[V2]], %[[V3]] : !substrait.interval_year_month, !substrait.interval_day_second
// CHECK-NEXT: }
// CHECK-NEXT: yield %[[V1]] : tuple<si1, !substrait.interval_year_month, !substrait.interval_day_second>

substrait.plan version 0 : 42 : 1 {
relation {
%0 = named_table @t1 as ["a"] : tuple<si1>
%1 = project %0 : tuple<si1> -> tuple<si1, !substrait.interval_year_month, !substrait.interval_day_second> {
^bb0(%arg : tuple<si1>):
%interval_year_month = literal #substrait.interval_year_month<2024y 1m>
%interval_day_second = literal #substrait.interval_day_second<9d 8000s>
yield %interval_year_month, %interval_day_second : !substrait.interval_year_month, !substrait.interval_day_second
}
yield %1 : tuple<si1, !substrait.interval_year_month, !substrait.interval_day_second>
}
}

// -----

// CHECK: substrait.plan version 0 : 42 : 1 {
// CHECK-NEXT: relation
// CHECK: %[[V0:.*]] = named_table
Expand Down
14 changes: 14 additions & 0 deletions test/Dialect/Substrait/types.mlir
Original file line number Diff line number Diff line change
@@ -1,6 +1,20 @@
// RUN: substrait-opt -split-input-file %s \
// RUN: | FileCheck %s

// CHECK-LABEL: substrait.plan
// CHECK: relation
// CHECK: %[[V0:.*]] = named_table @t1 as ["a", "b"] : tuple<!substrait.interval_year_month, !substrait.interval_day_second>
// CHECK-NEXT: yield %0 : tuple<!substrait.interval_year_month, !substrait.interval_day_second>

substrait.plan version 0 : 42 : 1 {
relation {
%0 = named_table @t1 as ["a", "b"] : tuple<!substrait.interval_year_month, !substrait.interval_day_second>
yield %0 : tuple<!substrait.interval_year_month, !substrait.interval_day_second>
}
}

// -----

// CHECK-LABEL: substrait.plan
// CHECK: relation
// CHECK: %[[V0:.*]] = named_table @t1 as ["a"] : tuple<!substrait.time>
Expand Down
38 changes: 38 additions & 0 deletions test/Target/SubstraitPB/Export/literal.mlir
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,44 @@
// RUN: --split-input-file --output-split-marker="# -----" \
// RUN: | FileCheck %s

// CHECK-LABEL: relations {
// CHECK-NEXT: rel {
// CHECK-NEXT: project {
// CHECK-NEXT: common {
// CHECK-NEXT: direct {
// CHECK-NEXT: }
// CHECK-NEXT: }
// CHECK-NEXT: input {
// CHECK-NEXT: read {
// CHECK: expressions {
// CHECK-NEXT: literal {
// CHECK-NEXT: interval_year_to_month {
// CHECK-NEXT: years: 2024
// CHECK-NEXT: months: 1
// CHECK-NEXT: }
// CHECK-NEXT: }
// CHECK-NEXT: }
// CHECK-NEXT: expressions {
// CHECK-NEXT: literal {
// CHECK-NEXT: interval_day_to_second {
// CHECK-NEXT: days: 9
// CHECK-NEXT: seconds: 8000

substrait.plan version 0 : 42 : 1 {
relation {
%0 = named_table @t1 as ["a"] : tuple<si1>
%1 = project %0 : tuple<si1> -> tuple<si1, !substrait.interval_year_month, !substrait.interval_day_second> {
^bb0(%arg : tuple<si1>):
%interval_year_month = literal #substrait.interval_year_month<2024y 1m>
%interval_day_second = literal #substrait.interval_day_second<9d 8000s>
yield %interval_year_month, %interval_day_second : !substrait.interval_year_month, !substrait.interval_day_second
}
yield %1 : tuple<si1, !substrait.interval_year_month, !substrait.interval_day_second>
}
}

// -----

// CHECK-LABEL: relations {
// CHECK-NEXT: rel {
// CHECK-NEXT: project {
Expand Down
31 changes: 31 additions & 0 deletions test/Target/SubstraitPB/Export/types.mlir
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,37 @@
// RUN: --split-input-file --output-split-marker="# -----" \
// RUN: | FileCheck %s

// CHECK-LABEL: relations {
// CHECK-NEXT: rel {
// CHECK-NEXT: read {
// CHECK: base_schema {
// CHECK-NEXT: names: "a"
// CHECK-NEXT: names: "b"
// CHECK-NEXT: struct {
// CHECK-NEXT: types {
// CHECK-NEXT: interval_year {
// CHECK-NEXT: nullability: NULLABILITY_REQUIRED
// CHECK-NEXT: }
// CHECK-NEXT: }
// CHECK-NEXT: types {
// CHECK-NEXT: interval_day {
// CHECK-NEXT: nullability: NULLABILITY_REQUIRED
// CHECK-NEXT: }
// CHECK-NEXT: }
// CHECK-NEXT: nullability: NULLABILITY_REQUIRED
// CHECK-NEXT: }
// CHECK-NEXT: }
// CHECK-NEXT: named_table {

substrait.plan version 0 : 42 : 1 {
relation {
%0 = named_table @t1 as ["a", "b"] : tuple<!substrait.interval_year_month, !substrait.interval_day_second>
yield %0 : tuple<!substrait.interval_year_month, !substrait.interval_day_second>
}
}

// -----

// CHECK-LABEL: relations {
// CHECK-NEXT: rel {
// CHECK-NEXT: read {
Expand Down
66 changes: 66 additions & 0 deletions test/Target/SubstraitPB/Import/literal.textpb
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,72 @@
# RUN: --split-input-file="# ""-----" --output-split-marker="// -----" \
# RUN: | FileCheck %s

# CHECK: substrait.plan version 0 : 42 : 1 {
# CHECK-NEXT: relation
# CHECK: %[[V0:.*]] = named_table
# CHECK-NEXT: %[[V1:.*]] = project %[[V0]] : tuple<si1> -> tuple<si1, !substrait.interval_year_month, !substrait.interval_day_second> {
# CHECK-NEXT: ^[[BB0:.*]](%[[ARG0:.*]]: tuple<si1>):
# CHECK-NEXT: %[[V2:.*]] = literal #substrait.interval_year_month<2024y 1m>
# CHECK-NEXT: %[[V3:.*]] = literal #substrait.interval_day_second<9d 8000s>
# CHECK-NEXT: yield %[[V2]], %[[V3]] : !substrait.interval_year_month, !substrait.interval_day_second
# CHECK-NEXT: }
# CHECK-NEXT: yield %[[V1]] : tuple<si1, !substrait.interval_year_month, !substrait.interval_day_second>

relations {
rel {
project {
common {
direct {
}
}
input {
read {
common {
direct {
}
}
base_schema {
names: "a"
struct {
types {
bool {
nullability: NULLABILITY_REQUIRED
}
}
nullability: NULLABILITY_REQUIRED
}
}
named_table {
names: "t1"
}
}
}
expressions {
literal {
interval_year_to_month {
years: 2024
months: 1
}
}
}
expressions {
literal {
interval_day_to_second {
days: 9
seconds: 8000
}
}
}
}
}
}
version {
minor_number: 42
patch_number: 1
}

# -----

# CHECK: substrait.plan version 0 : 42 : 1 {
# CHECK-NEXT: relation
# CHECK: %[[V0:.*]] = named_table
Expand Down
Loading