Implement Path Extraction #19

Merged
merged 4 commits into from
Jan 14, 2025

Owner

@rmarrowstone rmarrowstone commented Jan 10, 2025

This change implements support for path extraction SerDe Properties.
It uses the same ion-java-path-extraction library the Ion Hive SerDe
does. Unlike the Hive SerDe, this ensures that the "strict" and more
performant path extraction implementation is used.

I chose to use path extraction even when no path extractors are
defined. When any path extractor is defined, you have to define all
columns as extractions. With the strict implementation, the field lookup
is effectively the same as the Decoder here. Given that, I would
rather cut modality unless there's a really compelling reason.
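
To make the "no modality" choice concrete, here is a minimal sketch, not the code in this PR. The class and method names are hypothetical, the `ion.<column>.path_extractor` property shape is assumed from the Hive SerDe convention, and throwing on a partially configured table is an assumption about how "you have to define all columns" might be enforced.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: resolves one search path per column so that a single
// extraction code path serves both configured and unconfigured tables.
final class PathExtractionDefaults {
    // Assumed property-name shape: ion.<column>.path_extractor
    static final String PREFIX = "ion.";
    static final String SUFFIX = ".path_extractor";

    static Map<String, String> resolveSearchPaths(List<String> columns,
                                                  Map<String, String> serdeProperties) {
        Map<String, String> paths = new LinkedHashMap<>();
        for (String column : columns) {
            String configured = serdeProperties.get(PREFIX + column + SUFFIX);
            if (configured != null) {
                paths.put(column, configured);
            }
        }
        if (paths.isEmpty()) {
            // No extractors defined: fall back to a single-field path per column
            // instead of switching to a separate decoding mode.
            for (String column : columns) {
                paths.put(column, "(" + column + ")");
            }
            return paths;
        }
        if (paths.size() != columns.size()) {
            // Assumption: a partially configured table is rejected outright.
            throw new IllegalArgumentException(
                    "Path extractors must be defined for all columns or for none");
        }
        return paths;
    }
}
```

With columns `id` and `name` and no properties set, the sketch yields the paths `(id)` and `(name)`, which is the same top-level field lookup a plain decoder would do.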

I also chose to implement support for the case-sensitive flag. I didn't
find evidence that it is used, but it seems like a reasonable use of
path-extraction (disambiguating bad data). The cost of doing it now is
low, whereas hitting a need and adding it later seems painful.

@github-actions github-actions bot added the hive label Jan 10, 2025
@rmarrowstone rmarrowstone marked this pull request as ready for review January 10, 2025 18:27
Collaborator

@desaikd desaikd left a comment

LGTM 🚀

@rmarrowstone rmarrowstone merged commit 43cde92 into master Jan 14, 2025
55 of 56 checks passed
rmarrowstone added a commit that referenced this pull request Jan 14, 2025
2 participants