
Releases: megagonlabs/ginza

ginza-2.2.1

27 Oct 22:24
b39b98b

  • 2019-10-28
  • Improvements
    • JapaneseCorrector can now merge as_* type dependencies completely
  • Bug fixes
    • The command line tool failed in specific situations

ginza-2.2.0

04 Oct 09:24
fe6b9cc

  • 2019-10-04, Ametrine
  • Important changes
    • Since v2.0.0, split_mode had been passed incorrectly to sudachipy.tokenizer (#43)
      • This bug caused a split_mode mismatch between the training phase and the ginza command.
      • split_mode was 'B' in the training phase and the Python APIs, but 'C' in the ginza command.
      • We fixed this bug by setting the default split_mode to 'C' everywhere.
      • This fix may cause word segmentation incompatibilities when upgrading GiNZA from v2.0.0 to v2.2.0.
  • New features
    • Add the -f / --output-format option to the ginza command:
    • Add custom token fields:
      • bunsetu_index: bunsetu index starting from 0
      • reading: reading form of the token (not its pronunciation)
      • sudachi: SudachiPy's morpheme instance (or a list of them when the tokens are merged by JapaneseCorrector)
  • Performance improvements
    • Tokenizer
      • Use the latest SudachiDict (SudachiDict_core-20190927.tar.gz)
      • Use Cythonized SudachiPy (v0.4.0)
    • Dependency parser
      • Apply the spacy pretrain command to learn a language model from UD_Japanese-BCCWJ, UD_Japanese-PUD and KWDLC.
      • Apply multitask objectives by using the -pt 'tag,dep' option of spacy train
    • New model file
      • ja_ginza-2.2.0.tar.gz
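The split_mode fix above comes down to keeping a single default that every entry point reads, so that training, the Python APIs, and the ginza command cannot drift apart. The following is a minimal sketch of that idea in plain Python. The names TokenizerSettings, settings_for_training and settings_for_cli are hypothetical and are not part of GiNZA or SudachiPy; the sketch only illustrates the design choice, not how GiNZA actually wires its tokenizer.

```python
from dataclasses import dataclass

# SudachiPy offers three split modes: A (shortest units), B (middle), C (longest units).
VALID_SPLIT_MODES = ("A", "B", "C")

# One shared default, mirroring the v2.2.0 decision to use 'C' everywhere.
DEFAULT_SPLIT_MODE = "C"


@dataclass(frozen=True)
class TokenizerSettings:
    """Hypothetical settings object; not part of GiNZA's API."""
    split_mode: str = DEFAULT_SPLIT_MODE

    def __post_init__(self) -> None:
        # Reject anything other than the three SudachiPy modes.
        if self.split_mode not in VALID_SPLIT_MODES:
            raise ValueError(f"unknown split_mode: {self.split_mode!r}")


def settings_for_training() -> TokenizerSettings:
    # Hypothetical entry point used when training the model.
    return TokenizerSettings()


def settings_for_cli() -> TokenizerSettings:
    # Hypothetical entry point used by the command line tool.
    return TokenizerSettings()


if __name__ == "__main__":
    train, cli = settings_for_training(), settings_for_cli()
    # Both paths read the same default, so segmentation cannot silently diverge.
    assert train == cli
    print(train.split_mode, cli.split_mode)  # C C
```

Before the fix, the equivalent of these two functions returned different modes ('B' and 'C'), which is exactly the mismatch described in #43.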

ginza-2.0.0

07 Jul 14:59

ginza-2.0.0 (2019-07-08)

  • Add ginza command
    • run ginza from the console
  • Change package structure
    • the module package is named ginza
    • the language model package is named ja_ginza
    • spacy.lang.ja is overridden by ginza
  • Remove SudachiPy-related directories
    • SudachiPy and its dictionary are now installed via pip when ginza is installed
  • User dictionaries are supported
  • Token extension fields
    • Added
      • token._.bunsetu_bi_label, token._.bunsetu_position_type
    • Retained
      • token._.inf
    • Removed
      • pos_detail (the same value is available as token.tag_)
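token._.bunsetu_bi_label marks each token as the beginning or the inside of a bunsetu. Assuming the labels are exactly 'B' (first token of a bunsetu) and 'I' (continuation), as in ordinary BI chunk labeling, a 0-based bunsetu index like the bunsetu_index field added later in v2.2.0 can be recovered by counting 'B' labels. The function bunsetu_indices is a hypothetical sketch over a plain list of labels, not GiNZA's implementation.

```python
from typing import List


def bunsetu_indices(bi_labels: List[str]) -> List[int]:
    """Map per-token B/I bunsetu labels to 0-based bunsetu indices.

    Hypothetical helper: assumes 'B' starts a bunsetu and 'I' continues it.
    GiNZA's own bunsetu_index computation may differ.
    """
    indices: List[int] = []
    current = -1  # no bunsetu seen yet
    for position, label in enumerate(bi_labels):
        if label == "B":
            current += 1  # a new bunsetu begins at this token
        elif label != "I":
            raise ValueError(f"unexpected label {label!r} at token {position}")
        elif current < 0:
            raise ValueError("label sequence must start with 'B'")
        indices.append(current)
    return indices


if __name__ == "__main__":
    # Hypothetical labels for three bunsetu, e.g. 銀座で / ランチを / ご一緒しましょう
    # (token boundaries are illustrative only).
    labels = ["B", "I", "B", "I", "B", "I", "I", "I"]
    print(bunsetu_indices(labels))  # [0, 0, 1, 1, 2, 2, 2, 2]
```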

ginza-latest

07 Jul 15:19

v1.0.2: Merge pull request #20 from megagonlabs/develop

07 Apr 12:09
dd2a1ea

Add new era 'reiwa' to system_core.dic

02 Apr 01:52
b8be4fb
Merge pull request #10 from megagonlabs/develop

v1.0.1

GiNZA NLP formal release version

01 Apr 05:48
ea0cfd8
Merge pull request #7 from megagonlabs/develop

refine MDs