
ginza-4.0.0

@hiroshi-matsuda-rit released this on 16 Aug 2020, 13:15
Commit: a2cc7c4

  • 2020-08-16, Chrysoberyl
  • Important changes
    • Replace Japanese model with spacy.lang.ja of spaCy v2.3
      • Replace values of Token.lemma_ with the output of SudachiPy's Morpheme.dictionary_form()
    • Replace ja_ginza_dict with official SudachiDict-core package
      • You can safely delete the ja_ginza_dict package
    • Change the options and the misc field contents of the command-line tool's output
      • Delete the use_sentence_separator (-s) option
      • Output NE (OntoNotes) labels in BI format, e.g. B-GPE
      • Add subfields: Reading, Inf (inflection) and ENE (Extended NE)
    • Obsolete the Token._.* attributes; add entries to Doc.user_data[] with accessor functions
      • inflections (ginza.inflection(Token))
      • reading_forms (ginza.reading_form(Token))
      • bunsetu_bi_labels (ginza.bunsetu_bi_label(Token))
      • bunsetu_position_types (ginza.bunsetu_position_type(Token))
      • bunsetu_heads (ginza.is_bunsetu_head(Token))
    • Change pipeline architecture
      • Obsolete JapaneseCorrector
      • Add CompoundSplitter and BunsetuRecognizer
    • Upgrade UD_JAPANESE-BCCWJ to v2.6
    • Change the word2vec vectors to chiVe mc90
  • API Changes
    • Add bunsetu-unit APIs (from ginza import *)
      • bunsetu(Token)
      • phrase(Token)
      • sub_phrases(Token)
      • phrases(Span)
      • bunsetu_spans(Span)
      • bunsetu_phrase_spans(Span)
      • bunsetu_head_list(Span)
      • bunsetu_head_tokens(Span)
      • bunsetu_bi_labels(Span)
      • bunsetu_position_types(Span)
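The command-line tool's CoNLL-U output carries the new Reading, Inf, NE and ENE subfields in its misc column. If that column follows the usual CoNLL-U convention of `|`-separated `key=value` pairs, a few lines of Python can split it. `parse_misc` is a hypothetical helper, and the sample string is illustrative rather than captured GiNZA output:

```python
from typing import Dict


def parse_misc(misc: str) -> Dict[str, str]:
    """Split a CoNLL-U misc field into a dict.

    Assumes '|'-separated 'key=value' pairs. A bare flag without '='
    is kept with an empty-string value, and '_' means an empty field.
    """
    fields: Dict[str, str] = {}
    if misc == "_":
        return fields
    for item in misc.split("|"):
        # partition splits at the first '=' only, so values may contain '='
        key, sep, value = item.partition("=")
        fields[key] = value if sep else ""
    return fields


# Illustrative misc value (hypothetical, not real GiNZA output)
sample = "SpaceAfter=No|Reading=トウキョウ|NE=B-GPE|ENE=B-Province"
misc = parse_misc(sample)
print(misc["NE"])  # B-GPE

# A BI-style NE label splits into its position prefix and entity type
prefix, _, label = misc["NE"].partition("-")
print(prefix, label)  # B GPE
```

Storing bare flags with an empty value keeps the parser from failing on any subfield that lacks `=`.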
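In 4.0.0, the per-token Token._.* attributes are replaced by document-level entries in Doc.user_data[] that are read through accessors such as ginza.reading_form(Token). The sketch below isolates that lookup pattern. It assumes each entry is a list aligned with token positions. `StubDoc` and `StubToken` are hypothetical stand-ins so the snippet runs without spaCy, and this `reading_form` is a simplified illustration, not GiNZA's own code:

```python
from typing import Any, Dict, List


class StubDoc:
    """Hypothetical stand-in for spacy.tokens.Doc (words + user_data only)."""

    def __init__(self, words: List[str]) -> None:
        self.words = words
        self.user_data: Dict[str, Any] = {}


class StubToken:
    """Hypothetical stand-in for spacy.tokens.Token (parent doc + index)."""

    def __init__(self, doc: StubDoc, i: int) -> None:
        self.doc = doc
        self.i = i


def reading_form(token: StubToken) -> str:
    # Assumed pattern: index a per-document list by the token's position
    return token.doc.user_data["reading_forms"][token.i]


doc = StubDoc(["東京", "に", "行く"])
doc.user_data["reading_forms"] = ["トウキョウ", "ニ", "イク"]
tokens = [StubToken(doc, i) for i in range(len(doc.words))]
print([reading_form(t) for t in tokens])  # ['トウキョウ', 'ニ', 'イク']
```

Keeping one list per document avoids registering per-token extension attributes. Callers then go through a plain function instead of `token._`.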
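Bunsetu BI labels mark where each bunsetu begins (`B`) and where it continues (`I`). The following minimal sketch groups tokens into bunsetu spans from a plain label list, in the spirit of bunsetu_spans(Span). `bi_labels_to_spans` is a hypothetical helper rather than GiNZA's implementation, and the tokenization and labels below were assigned by hand for illustration:

```python
from typing import List, Optional, Tuple


def bi_labels_to_spans(labels: List[str]) -> List[Tuple[int, int]]:
    """Group token indices into (start, end) spans, end exclusive.

    'B' opens a new bunsetu and 'I' continues the current one.
    A leading 'I' (malformed input) is treated as opening a span.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == "B" or start is None:
            if start is not None:
                spans.append((start, i))  # close the previous bunsetu
            start = i
    if start is not None:
        spans.append((start, len(labels)))  # close the last bunsetu
    return spans


words = ["銀座", "で", "ランチ", "を", "ご", "一緒", "し", "ましょう", "。"]
labels = ["B", "I", "B", "I", "B", "I", "I", "I", "I"]
for start, end in bi_labels_to_spans(labels):
    print("".join(words[start:end]))
# 銀座で
# ランチを
# ご一緒しましょう。
```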