Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, if with_deduplicate_strings is enabled, the builder will attempt to deduplicate any string longer than 12 bytes. In practice I suspect most strings that make sense to deduplicate are relatively short, e.g. <64 bytes. As such, I wonder if it might make sense to add an option to configure a maximum deduplication length. This would potentially make enabling this behaviour by default more of an option, without the risk of spending significant time hashing long payloads for likely limited return.
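To make the proposal concrete, here is a minimal sketch of how a length cap could gate deduplication. It is a toy stand-in written against the standard library only, not the actual builder: `DedupBuilder`, `with_max_dedup_len`, and `stored` are hypothetical names, and the 12-byte threshold is taken from the description above.

```rust
use std::collections::HashMap;

/// Strings at or below this length are never deduplicated
/// (the 12-byte threshold described above).
const INLINE_LEN: usize = 12;

/// Toy stand-in for a deduplicating string builder (hypothetical, for illustration only).
struct DedupBuilder {
    values: Vec<String>,          // stored payloads
    views: Vec<usize>,            // one index into `values` per appended string
    seen: HashMap<String, usize>, // payload -> index in `values`
    max_dedup_len: Option<usize>, // hypothetical option: None means no upper bound
}

impl DedupBuilder {
    fn new() -> Self {
        Self {
            values: Vec::new(),
            views: Vec::new(),
            seen: HashMap::new(),
            max_dedup_len: None,
        }
    }

    /// Hypothetical option: only deduplicate strings of at most `len` bytes.
    fn with_max_dedup_len(mut self, len: usize) -> Self {
        self.max_dedup_len = Some(len);
        self
    }

    fn append(&mut self, s: &str) {
        // Hash and look up only strings inside the (INLINE_LEN, max_dedup_len] window.
        let dedup = s.len() > INLINE_LEN
            && self.max_dedup_len.map_or(true, |max| s.len() <= max);
        if dedup {
            if let Some(&idx) = self.seen.get(s) {
                // Already stored: reuse the existing payload.
                self.views.push(idx);
                return;
            }
            self.seen.insert(s.to_owned(), self.values.len());
        }
        // First occurrence, or outside the window: store a fresh copy.
        self.views.push(self.values.len());
        self.values.push(s.to_owned());
    }

    /// Number of payloads actually stored.
    fn stored(&self) -> usize {
        self.values.len()
    }
}
```

With a cap of 64, a repeated 21-byte URL would be stored once, while a repeated 100-byte payload would skip hashing entirely and be stored twice. That is the trade-off in question: bounded hashing cost in exchange for missing duplicates among long strings.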
Having configurable behavior makes sense to me.
I do think deduplicating long strings could be worthwhile in some cases (e.g. repeated URLs), but I don't have a strong preference about how this should look or be handled.
Describe the solution you'd like
Describe alternatives you've considered
Additional context
FYI @alamb @XiangpengHao