Add benchmark coverage for dynamic numeric range faceting #311

mikemccand · 2024-11-04T14:54:47Z

Lucene's dynamic numeric range faceting is a cool auto-ranging feature: it looks at the distribution of a numeric field's values across all collected results and picks "good" ranges by spreading a second quantity (a weight such as relevance or hit count) roughly evenly across the requested N ranges.
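To make the idea concrete, here is a minimal Python sketch of weight-balanced range selection. It is an illustration only, not Lucene's implementation: the names `RangeInfo` and `dynamic_ranges` are hypothetical, and the greedy "close the range once it reaches total/N" rule is an assumption about how "roughly even" could be achieved.

```python
from typing import List, NamedTuple, Sequence


class RangeInfo(NamedTuple):
    lo: int      # smallest value in the range (inclusive)
    hi: int      # largest value in the range (inclusive)
    count: int   # number of hits that fell into the range
    weight: int  # sum of the hits' weights in the range


def dynamic_ranges(values: Sequence[int], weights: Sequence[int], n: int) -> List[RangeInfo]:
    """Greedily split hits, sorted by value, into at most n ranges of similar total weight.

    Hypothetical helper for illustration; hits with equal values may land in
    adjacent ranges, which a real implementation might want to avoid.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if n <= 0:
        raise ValueError("n must be positive")

    pairs = sorted(zip(values, weights))  # order hits by field value
    target = sum(weights) / n             # ideal weight per range

    ranges: List[RangeInfo] = []
    i = 0
    while i < len(pairs) and len(ranges) < n:
        is_last = len(ranges) == n - 1    # the last range absorbs all remaining hits
        lo = hi = pairs[i][0]
        acc = 0
        count = 0
        while i < len(pairs):
            value, weight = pairs[i]
            hi = value
            acc += weight
            count += 1
            i += 1
            # Close the range once it has reached its share of the weight.
            if not is_last and acc >= target:
                break
        ranges.append(RangeInfo(lo, hi, count, acc))
    return ranges


# Uniform weights: ranges end up with equal hit counts.
uniform = dynamic_ranges([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, 4)

# Skewed weights: one heavy hit fills a range on its own.
skewed = dynamic_ranges([10, 20, 30, 40], [6, 1, 1, 4], 2)
```

With uniform weights this degenerates to equal-count bucketing; with skewed weights a single heavy hit can occupy a whole range, which is the kind of behavior a benchmark corpus should exercise.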

It has recently been getting some exciting optimizations: apache/lucene#13914

Let's get some coverage in our benchmarks, and maybe nightly benchmarks?

houserjohn · 2025-02-06T04:07:47Z

Adding a summary of an offline discussion:

To add comprehensive benchmarks for dynamic numeric faceting, we would also need a corpus that has "many numbers." Options include the Wikipedia line files (day_of_year, etc.), the NYC taxis corpus, or even the OpenStreetMap corpus (which may have usable numeric fields). Random/synthetic datasets are discouraged because they are more likely to lead to random/synthetic conclusions.

Related work: GH#325 and GH#160 both add related datasets for benchmarks.

stefanvodita mentioned this issue Feb 4, 2025

Task for dynamic ranges #334

Closed
