Include digits and update unicode regex generation #115
The `unicode-8.0.0` package has been deprecated for a while. The README also recommends using `regenerate` to build regexes, which is much nicer than the way we were doing it before.

But also, a persistent annoyance with lunr-languages was that numbers were missing from `wordCharacters` in all the Latin- and Cyrillic-based languages, while they are present in the default `wordCharacters` (Indic-Arabic numerals are also present for Arabic, Hindi, etc.). So this adds them back, thus fixing #66 and maybe some other bugs.

The problem of the trimmer not being run in the search pipeline persists, but that's a lunr.js bug :) At least now things like "HAL9000" will get indexed.
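To illustrate why the missing digits mattered, here is a minimal sketch. It assumes the trimmer works by stripping leading and trailing characters that fall outside `wordCharacters`. `makeTrimmer` is a hypothetical stand-in written for this example, not the library's actual code, and the character ranges are illustrative rather than the ones generated in this PR.

```javascript
// Hypothetical, simplified stand-in for a language trimmer:
// strips any leading/trailing characters NOT listed in wordCharacters.
function makeTrimmer(wordCharacters) {
  const start = new RegExp("^[^" + wordCharacters + "]+");
  const end = new RegExp("[^" + wordCharacters + "]+$");
  return (token) => token.replace(start, "").replace(end, "");
}

// Illustrative ranges only (assumption): basic Latin letters plus
// Latin-1 Supplement and Latin Extended-A/B, first without digits,
// then with ASCII digits added.
const lettersOnly = "A-Za-z\\u00C0-\\u024F";
const withDigits = lettersOnly + "0-9";

const trimOld = makeTrimmer(lettersOnly);
const trimNew = makeTrimmer(withDigits);

console.log(trimOld("HAL9000")); // "HAL"     (trailing digits stripped)
console.log(trimNew("HAL9000")); // "HAL9000" (digits kept)
console.log(trimOld("2001"));    // ""        (purely numeric token vanishes)
console.log(trimNew("2001"));    // "2001"
```

Under these assumptions, a token such as "HAL9000" loses its numeric suffix when digits are absent from the character class, and a purely numeric token is reduced to an empty string.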