Add bwc layer for 'romanian' analyzer
The 'romanian' language analyzer has been improved in Lucene 10 in two important
ways. First, the snowball stemmer has been modified to work with the s-comma and t-comma
characters (ș, ț). Previously it only worked with their cedilla forms (ş, ţ), which were
used when Romanian didn't have full Unicode support (snowballstem/snowball#177). Second,
the analyzer now contains a normalization step that maps the cedilla forms to the comma forms.
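
The normalization step amounts to a character mapping from the four legacy cedilla letters to their comma-below counterparts. The Java sketch below only illustrates that mapping; `RomanianCedillaNormalizer` and `normalize` are hypothetical names, and this is not the filter Lucene 10 actually ships.

```java
import java.util.Map;

// Hypothetical sketch of a cedilla-to-comma normalization step for Romanian text.
// It is NOT Lucene's implementation; it only illustrates the mapping described above.
final class RomanianCedillaNormalizer {

    // Legacy cedilla forms (U+015E/F, U+0162/3) -> comma-below forms (U+0218/9, U+021A/B).
    private static final Map<Character, Character> CEDILLA_TO_COMMA = Map.of(
        '\u015F', '\u0219', // ş -> ș
        '\u015E', '\u0218', // Ş -> Ș
        '\u0163', '\u021B', // ţ -> ț
        '\u0162', '\u021A'  // Ţ -> Ț
    );

    private RomanianCedillaNormalizer() {}

    // Returns a copy of the input with every cedilla form replaced by its comma-below form;
    // all other characters are copied unchanged.
    static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char mapped = CEDILLA_TO_COMMA.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }
}
```

For example, `normalize("ştiinţă")` returns `"știință"`, so text typed with either encoding reaches the stemmer in the comma form.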

To maintain backwards compatibility with existing indices, this change moves
the Lucene 9 stemmer over to the analysis module as a deprecated variant and
creates the analyzer for existing indices with the "old" stemmer and without
the normalization step. New indices automatically use the improved
behaviour.
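
The backwards-compatibility rule boils down to a decision on the version an index was created with. In the hypothetical Java sketch below, `RomanianAnalyzerSelection`, `stagesFor`, the `indexCreatedBeforeLucene10` flag and the stage names are illustrative assumptions rather than the classes added by this commit; placing normalization before the stemmer is also an assumption, and stages unaffected by the change are omitted.

```java
import java.util.List;

// Hypothetical sketch: the flag and stage names are illustrative assumptions,
// not Elasticsearch or Lucene APIs. Only the stages affected by this change are
// modeled; tokenization, lowercasing and stopword removal are omitted.
final class RomanianAnalyzerSelection {

    private RomanianAnalyzerSelection() {}

    // Returns the change-relevant analysis stages, in order.
    // indexCreatedBeforeLucene10 is an assumed flag: true for indices created on a Lucene 9 based version.
    static List<String> stagesFor(boolean indexCreatedBeforeLucene10) {
        if (indexCreatedBeforeLucene10) {
            // Existing indices keep the deprecated Lucene 9 stemmer and skip normalization,
            // so newly analyzed text yields the same terms as the data already indexed.
            return List.of("legacy-lucene9-stemmer");
        }
        // New indices map cedilla forms to comma forms first, then apply the improved stemmer.
        return List.of("normalize-cedilla-to-comma", "lucene10-stemmer");
    }
}
```

Branching on the creation version keeps query-time analysis consistent with the terms already written to an existing index, so such indices do not have to be reindexed to keep matching.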
cbuescher committed Oct 1, 2024
1 parent 552e935 commit 0c1ef66
Showing 4 changed files with 969 additions and 7 deletions.
