GenericByteViewBuilder::with_deduplicate_strings Max Length #7187

Open
tustvold opened this issue Feb 24, 2025 · 2 comments
Labels
arrow: Changes to the arrow crate
enhancement: Any new improvement worthy of a entry in the changelog

Comments

@tustvold
Contributor

tustvold commented Feb 24, 2025

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, if with_deduplicate_strings is enabled, the builder will attempt to deduplicate any string longer than 12 bytes. In practice, I suspect most strings that make sense to deduplicate are relatively short, e.g. <64 bytes. As such, I wonder if it might make sense to add an option to configure a maximum deduplication length. This would potentially make enabling this behaviour by default more of an option, without the risk of spending significant time hashing long payloads for likely limited return.
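
To make the idea concrete, here is a minimal sketch in plain Rust (standard library only). It is not the arrow-rs implementation: `SketchBuilder`, `View`, and the `max_dedup_len` option are hypothetical names used only for illustration. It assumes values of 12 bytes or fewer are stored inline and never reach the data buffer, and that longer values are deduplicated through a hash map only when they fit under the configured limit.

```rust
use std::collections::HashMap;

/// Values of at most this many bytes are kept inline in the view itself,
/// so deduplication only ever applies to longer values.
const MAX_INLINE_LEN: usize = 12;

/// Simplified stand-in for a view: either the bytes inline, or a
/// (offset, len) reference into the shared data buffer.
#[derive(Debug, Clone, PartialEq)]
enum View {
    Inline(Vec<u8>),
    Buffer { offset: usize, len: usize },
}

/// Hypothetical builder, for illustration only.
struct SketchBuilder {
    buffer: Vec<u8>,
    views: Vec<View>,
    /// Maps previously seen values to their offset in `buffer`.
    seen: HashMap<Vec<u8>, usize>,
    /// Hypothetical option: values longer than this are never hashed.
    /// `None` means "deduplicate everything that is not inlined".
    max_dedup_len: Option<usize>,
}

impl SketchBuilder {
    fn new(max_dedup_len: Option<usize>) -> Self {
        Self {
            buffer: Vec::new(),
            views: Vec::new(),
            seen: HashMap::new(),
            max_dedup_len,
        }
    }

    fn append(&mut self, value: &[u8]) {
        let len = value.len();
        if len <= MAX_INLINE_LEN {
            self.views.push(View::Inline(value.to_vec()));
            return;
        }

        // Only pay the hashing cost when the value is within the limit.
        let dedup = self.max_dedup_len.map_or(true, |max| len <= max);
        if dedup {
            if let Some(&offset) = self.seen.get(value) {
                // Seen before: point at the existing bytes.
                self.views.push(View::Buffer { offset, len });
                return;
            }
        }

        // First occurrence, or too long to deduplicate: copy the bytes.
        let offset = self.buffer.len();
        self.buffer.extend_from_slice(value);
        if dedup {
            self.seen.insert(value.to_vec(), offset);
        }
        self.views.push(View::Buffer { offset, len });
    }
}
```

With `SketchBuilder::new(Some(64))`, appending the same 29-byte string twice stores its bytes once, while appending the same 100-byte payload twice stores it twice and skips hashing entirely. Passing `None` corresponds to the current behaviour described above.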

Describe the solution you'd like

Describe alternatives you've considered

Additional context

FYI @alamb @XiangpengHao

@tustvold tustvold added the enhancement Any new improvement worthy of a entry in the changelog label Feb 24, 2025
@alamb
Contributor

alamb commented Feb 24, 2025

Having configurable behavior makes sense to me.

I do think deduplicating long strings could be good in some cases (e.g. repeated URLs), but I don't have a strong preference about how this should look or be handled.

@alamb
Contributor

alamb commented Feb 24, 2025

For anyone else following along, I think @tustvold is referring to StringViewBuilder::with_deduplicate_strings

@alamb alamb added the arrow Changes to the arrow crate label Feb 24, 2025
2 participants