ginza-4.0.0
- 2020-08-16, Chrysoberyl
- Important changes
  - Replace the Japanese model with `spacy.lang.ja` of spaCy v2.3
    - Replace the values of `Token.lemma_` with the output of SudachiPy's `Morpheme.dictionary_form()`
  - Replace ja_ginza_dict with the official SudachiDict-core package
    - You can safely delete the `ja_ginza_dict` package
  - Change the options and the misc field contents of the command line tool's output
    - Delete the `use_sentence_separator` (`-s`) option
    - Output NE (OntoNotes) labels in BI format, e.g. `B-GPE`
    - Add subfields: Reading, Inf (inflection) and ENE (Extended NE)
  - Mark `Token._.*` as obsolete and add entries to `Doc.user_data[]` with accessor functions
    - inflections (`ginza.inflection(Token)`)
    - reading_forms (`ginza.reading_form(Token)`)
    - bunsetu_bi_labels (`ginza.bunsetu_bi_label(Token)`)
    - bunsetu_position_types (`ginza.bunsetu_position_type(Token)`)
    - bunsetu_heads (`ginza.is_bunsetu_head(Token)`)
  - Change the pipeline architecture
    - JapaneseCorrector is now obsolete
    - Add CompoundSplitter and BunsetuRecognizer
  - Upgrade UD_JAPANESE-BCCWJ to v2.6
  - Change word2vec to chiVe mc90
- API Changes
  - Add bunsetu-unit APIs (`from ginza import *`)
    - bunsetu(Token)
    - phrase(Token)
    - sub_phrases(Token)
    - phrases(Span)
    - bunsetu_spans(Span)
    - bunsetu_phrase_spans(Span)
    - bunsetu_head_list(Span)
    - bunsetu_head_tokens(Span)
    - bunsetu_bi_labels(Span)
    - bunsetu_position_types(Span)
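
The per-token bunsetu BI labels and the bunsetu-unit spans listed above are two views of the same segmentation. The following is a minimal, library-free sketch of that relationship, not GiNZA's implementation: the helper name `group_by_bi_labels` is hypothetical, the labels are assumed to be the strings `"B"` (first token of a bunsetu) and `"I"` (continuation), and the sample labels are hand-assigned for illustration rather than produced by the model.

```python
from typing import List, Sequence


def group_by_bi_labels(tokens: Sequence[str], labels: Sequence[str]) -> List[List[str]]:
    """Group tokens into chunks, opening a new chunk at every "B" label.

    Hypothetical helper for illustration only; it is not part of GiNZA.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    chunks: List[List[str]] = []
    for token, label in zip(tokens, labels):
        if label not in ("B", "I"):
            raise ValueError(f"unexpected label: {label!r}")
        # A "B" label starts a new chunk. A leading "I" with no open chunk
        # is treated leniently and also starts one.
        if label == "B" or not chunks:
            chunks.append([token])
        else:
            chunks[-1].append(token)
    return chunks


# Hand-assigned sample data (an assumption, not actual GiNZA output).
sample_tokens = ["銀座", "で", "ランチ", "を", "ご", "一緒", "し", "ましょう", "。"]
sample_labels = ["B", "I", "B", "I", "B", "I", "I", "I", "I"]

for chunk in group_by_bi_labels(sample_tokens, sample_labels):
    print("".join(chunk))
```

With GiNZA itself installed, the same grouping would presumably come from calling `bunsetu_spans(Span)` on a sentence instead of regrouping labels by hand; the sketch only shows how a flat B/I sequence maps onto bunsetu boundaries.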