Releases: megagonlabs/ginza
ginza-2.2.1
- 2019-10-28
- Improvements
- JapaneseCorrector can merge the as_* type dependencies completely
- Bug fixes
- Fixed the command-line tool failing in specific situations
ginza-2.2.0
- 2019-10-04, Ametrine
- Important changes
- split_mode had been set incorrectly for sudachipy.tokenizer since v2.0.0 (#43)
- This bug caused a split_mode mismatch between the training phase and the ginza command: split_mode was set to 'B' for the training phase and the Python APIs, but to 'C' for the ginza command.
- We fixed this bug by setting the default split_mode to 'C' everywhere.
- This fix may cause word segmentation differences when upgrading GiNZA from v2.0.0 to v2.2.0.
- New features
- Add -f and --output-format options to the ginza command:
- -f 0 or -f conllu: CoNLL-U Syntactic Annotation format
- -f 1 or -f cabocha: cabocha -f1 compatible format
- Add custom token fields:
- bunsetu_index: bunsetu index starting from 0
- reading: reading of the token (not its pronunciation)
- sudachi: SudachiPy's morpheme instance (or a list of them when the tokens are merged by JapaneseCorrector)
- Performance improvements
- Tokenizer
- Use latest SudachiDict (SudachiDict_core-20190927.tar.gz)
- Use Cythonized SudachiPy (v0.4.0)
- Dependency parser
- Apply the spacy pretrain command to capture the language model from UD_Japanese-BCCWJ, UD_Japanese-PUD and KWDLC
- Apply multitask objectives by using the -pt 'tag,dep' option of spacy train
- New model file
- ja_ginza-2.2.0.tar.gz
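The split_mode mismatch fixed in ginza-2.2.0 can be illustrated without running SudachiPy. This minimal sketch hardcodes the multi-granular segmentations of 国家公務員 shown in the SudachiPy README (mode A: 国家/公務/員, B: 国家/公務員, C: 国家公務員); real output depends on the dictionary version, and the helpers boundaries and mismatched_boundaries are hypothetical. It reports the character offsets where two segmentations disagree on a token boundary.

```python
from typing import List, Set

# Illustrative segmentations of "国家公務員" as given in the SudachiPy README.
# Hardcoded for this sketch; actual results depend on the dictionary version.
SEGMENTATIONS = {
    "A": ["国家", "公務", "員"],
    "B": ["国家", "公務員"],
    "C": ["国家公務員"],
}


def boundaries(tokens: List[str]) -> Set[int]:
    """Return the character offsets at which each token ends (hypothetical helper)."""
    offsets = set()
    pos = 0
    for token in tokens:
        pos += len(token)
        offsets.add(pos)
    return offsets


def mismatched_boundaries(train_tokens: List[str], infer_tokens: List[str]) -> Set[int]:
    """Offsets that are a token boundary in one segmentation but not in the other."""
    if "".join(train_tokens) != "".join(infer_tokens):
        raise ValueError("both segmentations must cover the same text")
    # Symmetric difference: boundaries present in exactly one of the two.
    return boundaries(train_tokens) ^ boundaries(infer_tokens)


if __name__ == "__main__":
    # Before v2.2.0: training and Python APIs used 'B', the ginza command used 'C'.
    print(mismatched_boundaries(SEGMENTATIONS["B"], SEGMENTATIONS["C"]))
    # From v2.2.0: 'C' is the default everywhere, so the boundaries agree.
    print(mismatched_boundaries(SEGMENTATIONS["C"], SEGMENTATIONS["C"]))
```

A model trained on 'B' tokens expects a boundary after 国家 (offset 2) that 'C'-mode segmentation never produces, which is the kind of train/inference gap the fix removes.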
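Output written with -f conllu can be read by any CoNLL-U reader. The following is a minimal sketch of one, assuming only the ten tab-separated columns defined by the Universal Dependencies CoNLL-U specification. It is not part of GiNZA; these notes do not say which optional columns GiNZA fills, and the sample sentence is hand-written rather than real ginza output.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# The ten CoNLL-U columns, in order, per the Universal Dependencies spec.
FIELDS = ("id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc")


@dataclass
class Word:
    id: int
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: str
    head: int
    deprel: str
    deps: str
    misc: str


def parse_conllu(lines: Iterable[str]) -> Iterator[List[Word]]:
    """Yield one list of Word objects per sentence of CoNLL-U text."""
    sentence: List[Word] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # comment lines such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        values = dict(zip(FIELDS, cols))
        values["id"] = int(values["id"])
        values["head"] = int(values["head"])
        sentence.append(Word(**values))
    if sentence:
        yield sentence


# Hand-written sample, not actual ginza output.
SAMPLE = """\
# text = 猫が鳴く
1\t猫\t猫\tNOUN\t_\t_\t3\tnsubj\t_\t_
2\tが\tが\tADP\t_\t_\t1\tcase\t_\t_
3\t鳴く\t鳴く\tVERB\t_\t_\t0\troot\t_\t_

"""

if __name__ == "__main__":
    for words in parse_conllu(SAMPLE.splitlines()):
        for w in words:
            # HEAD 0 marks the root; other values are 1-based word IDs.
            head = "ROOT" if w.head == 0 else words[w.head - 1].form
            print(w.form, w.deprel, head)
```

Keeping HEAD as an integer makes it easy to walk the dependency tree, since word IDs index the sentence from 1.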
ginza-2.0.0
- 2019-07-08
- Add the ginza command
- run ginza from the console
- Change package structure
- module package as ginza
- language model package as ja_ginza
- spacy.lang.ja is overridden by ginza
- Remove sudachipy-related directories
- SudachiPy and its dictionary are installed via pip during ginza installation
- User dictionary available
- Token extension fields
- Added token._.bunsetu_bi_label, token._.bunsetu_position_type
- Retained token._.inf
- Removed pos_detail (the same value is set to token.tag_)
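The token._.bunsetu_bi_label field added in ginza-2.0.0 marks bunsetu boundaries token by token. The sketch below assumes the labels are 'B' (first token of a bunsetu) and 'I' (continuation), which these notes do not spell out, and works on plain (text, label) pairs instead of spaCy tokens; group_bunsetu and the sample labels are hypothetical.

```python
from typing import List, Sequence, Tuple


def group_bunsetu(tokens: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Group (text, bi_label) pairs into bunsetu.

    Assumption: 'B' starts a new bunsetu and 'I' continues the current one.
    """
    groups: List[List[str]] = []
    for text, label in tokens:
        if label not in ("B", "I"):
            raise ValueError(f"unexpected label: {label!r}")
        if label == "B" or not groups:
            # Start a new bunsetu; a leading 'I' is treated like 'B' defensively.
            groups.append([text])
        else:
            groups[-1].append(text)
    return groups


if __name__ == "__main__":
    # Hypothetical labels for 猫/が/鳴く, standing in for token._.bunsetu_bi_label.
    sample = [("猫", "B"), ("が", "I"), ("鳴く", "B")]
    print(["".join(g) for g in group_bunsetu(sample)])
```

With real GiNZA tokens, the pairs would come from each token's text and its bunsetu_bi_label value.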
ginza-latest
v5.2.0
v1.0.2: Merge pull request #20 from megagonlabs/develop
Add new era 'reiwa' to system_core.dic
Merge pull request #10 from megagonlabs/develop v1.0.1
GiNZA NLP formal release version
Merge pull request #7 from megagonlabs/develop refine MDs