Releases: rapidfuzz/RapidFuzz
Release 1.0.0
Changed
- all normalized string_metrics can now be used as scorer for process.extract/extractOne
- Implementation of the C++ Wrapper completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future.
- increased test coverage, which already helped to fix some bugs and will help to prevent regressions in the future
- improved docstrings of functions
Performance
- Added a bit-parallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2).
- Added a specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, which is even faster than the bit-parallel implementation.
- Improved performance of fuzz.partial_ratio. Since fuzz.ratio and fuzz.partial_ratio are used in most scorers, this improves the overall performance.
- Improved performance of process.extract and process.extractOne
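
To illustrate what a bit-parallel Levenshtein distance looks like for the uniform weights (1,1,1), here is a minimal pure-Python sketch of the well-known Myers/Hyyrö formulation. It is not the library's C++ code, and whether the library uses exactly this variant is an assumption; the function name levenshtein_bitparallel is hypothetical.

```python
def levenshtein_bitparallel(s1, s2):
    """Levenshtein distance with weights (1,1,1), one bit per character of s1."""
    m = len(s1)
    if m == 0:
        return len(s2)

    # Bit i of peq[ch] is set when s1[i] == ch.
    peq = {}
    for i, ch in enumerate(s1):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    mask = (1 << m) - 1   # emulate an m-bit machine word with Python ints
    last = 1 << (m - 1)   # bit that tracks the last row of the DP matrix
    vp, vn = mask, 0      # vertical positive / negative deltas
    dist = m

    for ch in s2:
        eq = peq.get(ch, 0)
        d0 = ((((eq & vp) + vp) ^ vp) | eq | vn) & mask
        hp = (vn | ~(d0 | vp)) & mask   # horizontal +1 deltas
        hn = d0 & vp                    # horizontal -1 deltas
        if hp & last:
            dist += 1
        if hn & last:
            dist -= 1
        hp = ((hp << 1) | 1) & mask
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0
    return dist


print(levenshtein_bitparallel("ab", "ba"))  # 2
```

Each input character of s2 costs a constant number of word operations instead of a full pass over s1, which is where the speedup over the classic dynamic-programming table comes from when s1 fits in a machine word.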
Deprecated
- the rapidfuzz.levenshtein module is now deprecated and will be removed in v2.0.0. These functions are now placed in rapidfuzz.string_metric. distance, normalized_distance, weighted_distance and weighted_normalized_distance are combined into levenshtein and normalized_levenshtein.
Added
- added normalized version of the hamming distance in string_metric.normalized_hamming
- process.extract_iter as a generator that yields the similarity of all elements with a similarity >= score_cutoff
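
The generator behaviour described for process.extract_iter can be sketched in plain Python as follows. This is only an illustration: the name extract_iter_sketch, the (choice, score) tuple shape, and the difflib-based stand-in scorer are assumptions, not the library's API or implementation.

```python
from difflib import SequenceMatcher


def stand_in_ratio(a, b):
    # Stand-in scorer on a 0-100 scale; NOT a RapidFuzz scorer.
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def extract_iter_sketch(query, choices, scorer=stand_in_ratio, score_cutoff=0.0):
    """Lazily yield (choice, score) for every choice scoring >= score_cutoff."""
    for choice in choices:
        score = scorer(query, choice)
        if score >= score_cutoff:
            yield choice, score


# Results are produced one at a time, so the caller can stop early.
for choice, score in extract_iter_sketch("apple", ["apple", "apply", "banana"],
                                         score_cutoff=50):
    print(choice, round(score, 1))
```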
Fixed
- multiple bugs in extractOne when used with a scorer that is not from RapidFuzz
- fixed bug in token_ratio
- fixed bug in result normalization causing a division by zero
Release 0.14.2
Fixed
- UTF-8 usage in the copyright header caused problems with Python 2.7 on some platforms (see #70)
Release 0.14.1
Fixed
- when a custom processor like lambda s: s was used with any of the methods inside fuzz.*, it always returned a score of 100. This release fixes this and adds better test coverage to prevent this bug in the future.
Release 0.14.0
Added
- added hamming distance metric in the levenshtein module
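
For reference, the Hamming distance counts the positions at which two equal-length sequences differ. A minimal sketch follows; the function name and the choice to raise ValueError on unequal lengths are assumptions of this example.

```python
def hamming_sketch(s1, s2):
    """Number of positions where s1 and s2 differ (equal lengths required)."""
    if len(s1) != len(s2):
        raise ValueError("s1 and s2 must have the same length")
    return sum(a != b for a, b in zip(s1, s2))


print(hamming_sketch("karolin", "kathrin"))  # 3
```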
Performance
- improved performance of default_process by using a lookup table
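
The lookup-table idea can be sketched as below. The sketch assumes that preprocessing means replacing non-alphanumeric characters with spaces, lowercasing, and trimming; the table layout and the name default_process_sketch are hypothetical and only cover ASCII.

```python
import string

# Table indexed by ASCII code point: alphanumerics map to their lowercase
# form, every other character maps to a space.
_ALNUM = set(string.ascii_letters + string.digits)
_TABLE = [chr(c).lower() if chr(c) in _ALNUM else " " for c in range(128)]


def default_process_sketch(text):
    out = []
    for ch in text:
        code = ord(ch)
        # One table lookup per ASCII character replaces several per-character
        # checks. Non-ASCII characters are simply lowercased, which is a
        # simplification of this sketch.
        out.append(_TABLE[code] if code < 128 else ch.lower())
    return "".join(out).strip()


print(default_process_sketch("  Hello, World! "))  # hello  world
```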
Release 0.13.4
Fixed
- Add missing virtual destructor that caused a segmentation fault on macOS
Release 0.13.3
Added
- C++11 Support
- manylinux
Release 0.13.2
Fixed
- Levenshtein was not imported from __init__
- The reference count of a Python object inside process.extractOne was decremented too early
Release 0.13.1
Performance
- process.extractOne exits early when a score of 100 is found. This way the other strings do not have to be preprocessed anymore.
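
The early-exit idea can be sketched in plain Python as follows. This is an illustration only: extract_one_sketch, its parameters, and the (choice, score, index) return shape are assumptions of the example, not the library's implementation.

```python
def extract_one_sketch(query, choices, scorer, processor=None):
    """Return (choice, score, index) of the best match, or None if empty."""
    if processor is not None:
        query = processor(query)

    best = None
    for index, choice in enumerate(choices):
        candidate = processor(choice) if processor is not None else choice
        score = scorer(query, candidate)
        if best is None or score > best[1]:
            best = (choice, score, index)
            if score == 100:
                # A perfect score cannot be beaten, so the remaining
                # choices are never preprocessed or scored.
                break
    return best
```

With a list of choices where an exact match appears early, the loop stops at that element, so the cost of preprocessing the rest of the list is avoided entirely.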
Release 0.13.0
Fixed
- objects passed to scorers had to be strings even before preprocessing them. This was changed so that they only have to be strings after preprocessing, similar to process.extract/process.extractOne
Performance
- process.extractOne is now implemented in C++, making it a lot faster
- When token_sort_ratio or partial_token_sort_ratio is used in process.extractOne, the words in the query are only sorted once to improve the runtime
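
Sorting the query words once amounts to hoisting that work out of the per-choice loop. A minimal sketch, assuming a difflib-based stand-in for the similarity score and hypothetical helper names:

```python
from difflib import SequenceMatcher


def sort_tokens(text):
    # Split on whitespace, sort the words, and join them back together.
    return " ".join(sorted(text.split()))


def best_token_sort_match(query, choices):
    """Return (choice, score) of the best token-sorted match, or None."""
    sorted_query = sort_tokens(query)  # done once, not once per choice
    best = None
    for choice in choices:
        # Stand-in similarity on a 0-100 scale; NOT a RapidFuzz scorer.
        score = 100.0 * SequenceMatcher(None, sorted_query,
                                        sort_tokens(choice)).ratio()
        if best is None or score > best[1]:
            best = (choice, score)
    return best


print(best_token_sort_match("york new", ["boston", "new york"]))
```

Only the choices still need to be tokenized and sorted inside the loop; the query side of every comparison reuses the same precomputed string.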
Changed
- process.extractOne/process.extract now return the index of the match when the choices are a list.
Removed
- process.extractIndices was removed, since the indices are now returned by process.extractOne/process.extract
Release 0.12.5
Fixed
- fix documentation of process.extractOne (see #48)