Add benchmark coverage for dynamic numeric range faceting #311

Open
mikemccand opened this issue Nov 4, 2024 · 1 comment

Comments

@mikemccand
Owner

Lucene's dynamic numeric range faceting is a cool auto-ranging feature: it looks at the distribution of values for a numeric field across all collected results and picks "good" ranges by spreading a second measure (relevance, counts) roughly evenly across the requested N ranges.
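
To make the idea concrete, here is a minimal Python sketch of that kind of weight-balanced range picking. It is not Lucene's implementation: the function name `dynamic_ranges`, the tuple layout, and the greedy "close a range once it reaches its share" rule are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical illustration only; this is NOT Lucene's algorithm or API.
# Each returned tuple is (min_value, max_value, count, weight_sum).
Range = Tuple[int, int, int, int]


def dynamic_ranges(values: List[int], weights: List[int], n: int) -> List[Range]:
    """Greedily split sorted values into at most n ranges of similar total weight."""
    if n <= 0 or not values:
        return []
    pairs = sorted(zip(values, weights))  # order hits by field value
    target = sum(w for _, w in pairs) / n  # ideal weight per range
    ranges: List[Range] = []
    lo, count, acc = None, 0, 0
    for i, (value, weight) in enumerate(pairs):
        if lo is None:
            lo = value  # first value of the range being built
        count += 1
        acc += weight
        is_last = i == len(pairs) - 1
        # Close the range once it holds its share of the weight, but keep
        # the final range open so it absorbs whatever is left over.
        if is_last or (acc >= target and len(ranges) < n - 1):
            ranges.append((lo, value, count, acc))
            lo, count, acc = None, 0, 0
    return ranges


print(dynamic_ranges([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, 4))
print(dynamic_ranges([10, 20, 30, 40], [5, 1, 1, 1], 2))
```

With uniform weights the eight values split into four ranges of two hits each; with the skewed weights the heavy hit at 10 gets a range to itself and the remaining three hits share the second range. A benchmark corpus with realistically skewed numeric fields is what would exercise this kind of behavior.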

Exciting optimizations have been landing for it recently: apache/lucene#13914

Let's get some coverage in our benchmarks, and maybe nightly benchmarks?

@houserjohn

Adding a summary of an offline discussion:

To add comprehensive benchmarks for dynamic numeric faceting, we would also need a corpus that has "many numbers." Options include the Wikipedia line files (day_of_year, etc.), the NYC taxis corpus, or even the OpenStreetMap corpus (possible numeric fields). Random/synthetic datasets are discouraged because they are more likely to lead to random/synthetic conclusions.

Related work: GH#325 and GH#160 both add related datasets for benchmarks.
