Make CANDIDATES_THRESHOLD configurable without recompilation #327
Replies: 3 comments 1 reply
-
Hi @bb 👋 Wouldn't it be easier to use a parameter at search time to force MeiliSearch to take more time to return exhaustive results/facets? With something like … This way you avoid having to deal with an env var that forces it every time, it can be done on demand in specific cases, and it is easier for everyone to use. 🤓 This is just an idea! Thank you!
-
Hi @gmourier, yes, that would probably be easier for users and I'd prefer it. Unless it takes a year longer to get it, that is 😃
-
Hello! I moved this feature suggestion to the right repo! :)
-
This follows up on a discussion in Slack: https://meilicommunity.slack.com/archives/CP9DVS1RQ/p1637926121296900?thread_ts=1637924606.295000&cid=CP9DVS1RQ
Currently `exhaustiveNbHits` and `exhaustiveFacetsCount` are not yet implemented. I'm not familiar with the nbHits algorithm, but for the facets there seem to be two: `facet_distribution_from_documents` and `facet_numbers_distribution_from_facet_levels`. The former is used when the number of candidates is below the threshold, the latter when it is above: https://github.com/meilisearch/milli/blob/1541bce952f5913a2820e66498630720f0851bd4/milli/src/search/facet/facet_distribution.rs#L189.
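The selection described above can be sketched roughly as follows. This is only an illustration of my understanding, not milli's actual code: the `FacetStrategy` enum and the `choose_strategy` function are hypothetical names, and whether a candidate count exactly equal to the threshold takes the exact path is an assumption.

```rust
/// Hypothetical stand-in for the two facet distribution code paths.
#[derive(Debug, PartialEq)]
enum FacetStrategy {
    /// Iterate over the candidate documents (exact counts, as I understand it).
    FromDocuments,
    /// Use the precomputed facet levels (faster, possibly not exact).
    FromFacetLevels,
}

/// The value mentioned in this post; currently a compile-time constant.
const CANDIDATES_THRESHOLD: u64 = 3000;

/// Pick a strategy from the number of candidates.
/// Assumption: a count equal to the threshold still uses the document-based path.
fn choose_strategy(candidates: u64, threshold: u64) -> FacetStrategy {
    if candidates <= threshold {
        FacetStrategy::FromDocuments
    } else {
        FacetStrategy::FromFacetLevels
    }
}

fn main() {
    // ~6000 documents with the current threshold: the facet-level path is taken.
    println!("{:?}", choose_strategy(6000, CANDIDATES_THRESHOLD));
    // The same dataset with a raised threshold: the document-based path is taken.
    println!("{:?}", choose_strategy(6000, 10_000));
}
```

With the fixed threshold, any dataset that yields more than 3000 candidates always ends up on the facet-level path, which is the situation I'd like to be able to avoid.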
If I understand correctly, the `from_documents` version produces exact results (and thus is exhaustive), while the `from_facet_levels` version might not provide exact results (and thus might not be exhaustive).

In my dataset I currently have ~6000 documents and would very much prefer having exact facet counts, even if I might lose a bit of speed. Being only a bit over the current `CANDIDATES_THRESHOLD`, I'd appreciate it if I could run MS so that the `from_documents` implementation is always used.

I assume that the current threshold value of 3000 was determined experimentally or by some approximation and thus could be made configurable. I'd therefore like to be able to set `CANDIDATES_THRESHOLD` via an environment variable. In my specific case, I could then set it to e.g. 10000 and be sure that the exact algorithm is always used.

h3. Optional second part
When the exact implementation is used, `exhaustiveFacetsCount` could be set to `true`.
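To make both parts of the request concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the variable name `MEILI_CANDIDATES_THRESHOLD` and the helper functions are hypothetical and do not exist in MeiliSearch or milli; only the default of 3000 comes from the current code.

```rust
use std::env;

/// Default taken from the current hard-coded value.
const DEFAULT_CANDIDATES_THRESHOLD: u64 = 3000;

/// Hypothetical environment variable name, chosen only for this sketch.
const THRESHOLD_ENV_VAR: &str = "MEILI_CANDIDATES_THRESHOLD";

/// Parse an optional raw setting, falling back to the default when it is
/// missing or not a valid unsigned integer.
fn parse_threshold(raw: Option<&str>) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_CANDIDATES_THRESHOLD)
}

/// Read the threshold from the environment, e.g. once at startup.
fn threshold_from_env() -> u64 {
    parse_threshold(env::var(THRESHOLD_ENV_VAR).ok().as_deref())
}

/// Optional second part: report exhaustive counts whenever the
/// document-based (exact, as I understand it) path would be taken.
fn exhaustive_facets_count(candidates: u64, threshold: u64) -> bool {
    candidates <= threshold
}

fn main() {
    let threshold = threshold_from_env();
    let candidates = 6000;
    println!(
        "threshold = {threshold}, exhaustiveFacetsCount = {}",
        exhaustive_facets_count(candidates, threshold)
    );
}
```

Running with the variable set to 10000 would then keep my ~6000 documents on the exact path, while leaving it unset keeps today's behaviour.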