Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support Utf8View for get_wider_type + binary_to_string_coercion functions #13370

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jonathanc-n
Copy link
Contributor

@jonathanc-n jonathanc-n commented Nov 12, 2024

Which issue does this PR close?

Closes #13361
Closes #13360.

Rationale for this change

What changes are included in this PR?

Added string view to functions

Are these changes tested?

Are there any user-facing changes?

@jayzhan211
Copy link
Contributor

Same with this, I would expect test that failed if the change is reverted

@@ -1550,6 +1552,62 @@ mod tests {
);
}

#[test]
fn test_get_wider_type_with_utf8view() {
assert_eq!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not this kind of test, but a valid sql query that goes through the coercion function

Copy link
Contributor Author

@jonathanc-n jonathanc-n Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jayzhan211 Correct me if I'm wrong, but this seems to be a problem with arrow casting from numerics to utf8view, the same can be said for this, #13366. However correct me if I'm wrong, I can't seem to find a sql query that triggers it, although i believe there should be one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems only like_coercion utilize this coercion. I think current coercion is unnecessary complex for like operator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think get_wider_type is actually only used by ArrayConcat

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean binary_to_string_coercion

@jonathanc-n jonathanc-n reopened this Nov 15, 2024
);
assert_eq!(
binary_to_string_coercion(&DataType::LargeBinary, &DataType::Utf8View),
Some(DataType::Utf8View)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is incorrect, it should be LargeUtf8 I think.

(LargeBinary, Utf8) => Some(LargeUtf8),
(LargeBinary, LargeUtf8) => Some(LargeUtf8),
(LargeBinary, Utf8View) => Some(Utf8View),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect, it should be LargeUtf8

Copy link
Contributor

@Omega359 Omega359 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beyond the one bug I noted this looks fine to me. I do think however there should be a ticket filed to look at the get_wider_type function and see if it can be removed altogether as there seems to be just one use case and this may (I haven't checked but I feel it's likely) duplicate a function in arrow-cast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants