Releases · megagonlabs/ginza

27 Oct 22:24

hiroshi-matsuda-rit

v2.2.1

b39b98b

ginza-2.2.1

2019-10-28
Improvements
- JapaneseCorrector can now merge as_* type dependencies completely
Bug fixes
- The command line tool failed in certain situations

04 Oct 09:24

hiroshi-matsuda-rit

v2.2.0

fe6b9cc

ginza-2.2.0

2019-10-04, Ametrine
Important changes
- split_mode has been set incorrectly for sudachipy.tokenizer since v2.0.0 (#43)
  - This bug caused a split_mode mismatch between the training phase and the ginza command.
  - split_mode was set to 'B' for the training phase and the Python APIs, but to 'C' for the ginza command.
  - We fixed this bug by setting the default split_mode to 'C' everywhere.
  - This fix may cause word segmentation incompatibilities when upgrading GiNZA from v2.0.0 to v2.2.0.
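
To check how the split mode affects segmentation of your own text before upgrading, something like the following sketch may help. It assumes SudachiPy's documented `dictionary.Dictionary().create()` and `tokenize(text, mode)` calls and an installed Sudachi dictionary; the helper names `surfaces` and `compare_split_modes` are hypothetical and not part of GiNZA or SudachiPy.

```python
def surfaces(tokenizer_obj, text, mode):
    """Return the surface strings tokenizer_obj produces for one split mode."""
    return [m.surface() for m in tokenizer_obj.tokenize(text, mode)]


def compare_split_modes(text):
    """Tokenize text with split modes B and C.

    Assumption: SudachiPy and one of its dictionaries are installed.
    """
    # Imported lazily so that surfaces() stays usable without SudachiPy.
    from sudachipy import dictionary, tokenizer

    tokenizer_obj = dictionary.Dictionary().create()
    split_mode = tokenizer.Tokenizer.SplitMode
    return {
        "B": surfaces(tokenizer_obj, text, split_mode.B),
        "C": surfaces(tokenizer_obj, text, split_mode.C),
    }
```

Calling `compare_split_modes` on a sentence containing compound nouns should show where mode 'B' yields shorter units than mode 'C', which is where segmentation can differ after the upgrade.
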
New features
- Add -f and --output-format options to the ginza command:
  - -f 0 or -f conllu: CoNLL-U Syntactic Annotation format
  - -f 1 or -f cabocha: cabocha -f1 compatible format
- Add custom token fields:
  - bunsetu_index: bunsetu index, starting from 0
  - reading: reading of the token (not its pronunciation)
  - sudachi: SudachiPy's morpheme instance (or a list of them when the tokens are gathered by JapaneseCorrector)
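
As a rough sketch of how the custom token fields might be read, the snippet below collects the bunsetu index and reading of each token. It assumes the fields are exposed as spaCy extension attributes under `token._`, in the same way as the `token._.inf` field from v2.0.0, and that the `ja_ginza` model is installed; the helper names `token_rows` and `analyze` are hypothetical.

```python
def token_rows(doc):
    """Collect (bunsetu_index, surface, reading) for every token in doc."""
    return [(t._.bunsetu_index, t.text, t._.reading) for t in doc]


def analyze(text):
    """Parse text with GiNZA and return its token rows.

    Assumption: spaCy and the ja_ginza model package are installed.
    """
    import spacy  # imported lazily so token_rows() works without spaCy

    nlp = spacy.load("ja_ginza")
    return token_rows(nlp(text))
```

The `sudachi` field is left out here because it may hold either a single morpheme or a list, so code that reads it needs to handle both cases.
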
Performance improvements
- Tokenizer
  - Use latest SudachiDict (SudachiDict_core-20190927.tar.gz)
  - Use Cythonized SudachiPy (v0.4.0)
- Dependency parser
  - Apply the spacy pretrain command to capture the language model from UD-Japanese BCCWJ, UD_Japanese-PUD and KWDLC.
  - Apply multitask objectives by using the -pt 'tag,dep' option of spacy train.
- New model file
  - ja_ginza-2.2.0.tar.gz

07 Jul 14:59

hiroshi-matsuda-rit

v2.0.0

a67bbac

ginza-2.0.0

ginza-2.0.0 (2019-07-08)

Add ginza command
- run ginza from the console
Change package structure
- module package as ginza
- language model package as ja_ginza
- spacy.lang.ja is overridden by ginza
Remove sudachipy related directories
- SudachiPy and its dictionary are installed via pip during ginza installation
User dictionary available
- See Customized dictionary - SudachiPy
Token extension fields
- Added
  - token._.bunsetu_bi_label, token._.bunsetu_position_type
- Retained
  - token._.inf
- Removed
  - pos_detail (the same value is available as token.tag_)
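
One possible use of token._.bunsetu_bi_label is to group tokens into bunsetu phrases. The sketch below assumes the label is "B" for the first token of a bunsetu and "I" for the tokens that continue it; those label values and the helper name `group_bunsetu` are assumptions, not something these release notes state.

```python
def group_bunsetu(tokens):
    """Group token surfaces into bunsetu using BI labels.

    Assumption: token._.bunsetu_bi_label is "B" at the start of a bunsetu
    and "I" for the tokens that continue it.
    """
    phrases = []
    for token in tokens:
        # Open a new phrase on "B", or when no phrase has been opened yet.
        if token._.bunsetu_bi_label == "B" or not phrases:
            phrases.append([])
        phrases[-1].append(token.text)
    return phrases
```

The function only needs objects with `text` and `_.bunsetu_bi_label`, so it accepts a parsed spaCy `Doc` as well as simple stand-ins in tests.
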

07 Jul 15:19

hiroshi-matsuda-rit

latest

a67bbac

ginza-latest

v5.2.0

07 Apr 12:09

hiroshi-matsuda-rit

v1.0.2

dd2a1ea

v1.0.2: Merge pull request #20 from megagonlabs/develop

v1.0.2

02 Apr 01:52

hiroshi-matsuda-rit

v1.0.1

b8be4fb

Add new era 'reiwa' to system_core.dic

Merge pull request #10 from megagonlabs/develop

v1.0.1

01 Apr 05:48

hiroshi-matsuda-rit

v1.0.0

ea0cfd8

GiNZA NLP formal release version

Merge pull request #7 from megagonlabs/develop

refine MDs
