Release 0.2.4 · xhluca/bm25s

What's Changed

Fix crash tokenizing with empty word_to_id by @mgraczyk in #72

Create nltk_stemmer.py by @aflip in #77

aa31a23: Improves handling of unknown tokens during tokenization and retrieval, tightens error handling, and improves logging for easier debugging.

bm25s/__init__.py: Added checks in the get_scores_from_ids method to raise a ValueError if max_token_id exceeds the number of tokens in the index. Improved handling of empty queries in the _get_top_k_results method, which now returns zero scores for all documents.
bm25s/tokenization.py: Fixed streaming_tokenize so that it correctly adds new tokens and updates word_to_id, word_to_stem, and stem_to_sid.
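
The two checks described for bm25s/__init__.py can be illustrated with a small, library-free sketch. This is not the bm25s implementation: the function name toy_scores_from_ids, the TOY_TABLE layout (one row of per-document scores per token ID), and the error message are all assumptions made for illustration only.

```python
# Hypothetical sketch; names and data layout are NOT taken from bm25s.
from typing import List, Sequence

# Assumed layout: TOY_TABLE[token_id][doc_id] is a precomputed score.
TOY_TABLE: List[List[float]] = [
    [0.5, 0.0],   # token 0
    [0.25, 1.0],  # token 1
    [0.0, 0.5],   # token 2
]


def toy_scores_from_ids(
    table: Sequence[Sequence[float]],
    query_token_ids: Sequence[int],
    num_docs: int,
) -> List[float]:
    """Sum per-token scores for each document, with two guard rails."""
    # Empty query: return a zero score for every document.
    if len(query_token_ids) == 0:
        return [0.0] * num_docs

    # Unknown token: fail loudly instead of indexing out of range.
    max_token_id = max(query_token_ids)
    if max_token_id >= len(table):
        raise ValueError(
            f"Token ID {max_token_id} is not in the index "
            f"(the index holds {len(table)} tokens)."
        )

    scores = [0.0] * num_docs
    for token_id in query_token_ids:
        for doc_id in range(num_docs):
            scores[doc_id] += table[token_id][doc_id]
    return scores
```

In this sketch, an empty query yields a list of zeros with one entry per document, and a query containing a token ID outside the table raises a ValueError rather than failing later with an indexing error.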

New Contributors

@mgraczyk made their first contribution in #72
@aflip made their first contribution in #77

Full Changelog: 0.2.3...0.2.4
