Jinho D. Choi edited this page Feb 12, 2019 · 25 revisions


Part-of-Speech Tagging

The Penn Treebank III: Wall Street Journal (accuracy).

  • Training: sections 0-18.
  • Development: sections 19-21.
  • Evaluation: sections 22-24.
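The ALL and OOV columns in the tables below are token-level accuracies over the evaluation sections; a token counts as OOV if its word form never occurs in the training sections. A minimal sketch (the function and argument names are illustrative, not from any particular tagger's API):

```python
def tagging_accuracy(gold, pred, train_vocab):
    """gold/pred: aligned lists of (word, tag) pairs; train_vocab: set of
    word forms seen in the training sections. Returns (ALL, OOV) accuracies."""
    total = correct = oov_total = oov_correct = 0
    for (word, g_tag), (_, p_tag) in zip(gold, pred):
        total += 1
        correct += g_tag == p_tag
        if word not in train_vocab:  # unseen word form -> OOV token
            oov_total += 1
            oov_correct += g_tag == p_tag
    all_acc = 100.0 * correct / total
    oov_acc = 100.0 * oov_correct / oov_total if oov_total else 0.0
    return all_acc, oov_acc
```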

Neural Models

| Model | ALL | OOV |
|:------|:---:|:---:|
| Bohnet et al. (2018) | 97.96 | - |
| Akbik et al. (2018) | 97.85 (±0.01) | - |
| Ling et al. (2015) | 97.78 | - |
| Yasunaga et al. (2018) | 97.58 | - |
| Ma & Hovy (2016) | 97.55 | 93.16 |
| Tsuboi (2014) | 97.51 | 91.64 |
| Andor et al. (2016) | 97.45 | - |
| Santos & Zadrozny (2014) | 97.32 | 89.86 |

Non-Neural Models

| Model | ALL | OOV |
|:------|:---:|:---:|
| Choi (2016) | 97.64 | 92.03 |
| Søgaard (2011) | 97.50 | - |
| Spoustová et al. (2009) | 97.44 | 89.92 |
| Moore (2015) | 97.36 | 91.09 |
| Sun (2014) | 97.36 | - |
| Shen et al. (2007) | 97.33 | 89.61 |
| Manning (2011) | 97.32 | 90.79 |

Named Entity Recognition

The CoNLL'03 shared task dataset: English.

| Approach | F1 Score | DNN |
|:---------|:--------:|:---:|
| Akbik et al. (2018) | 93.09 (±0.12) | O |
| Peters et al. (2018) | 92.22 (±0.10) | O |
| Peters et al. (2017) | 91.93 (±0.19) | O |
| Chiu & Nichols (2016) | 91.62 (±0.33) | O |
| Ma & Hovy (2016) | 91.21 | O |
| Luo et al. (2015) | 91.20 | |
| Suzuki et al. (2011) | 91.02 | |
| Choi (2016) | 91.00 | |
| Lample et al. (2016) | 90.94 | O |
| Passos et al. (2014) | 90.90 | |
| Lin & Wu (2009) | 90.90 | |
| Ratinov & Roth (2009) | 90.57 | |
| Turian et al. (2010) | 90.36 | |
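The F1 scores above are entity-level: precision and recall are computed over exact matches of entity type and span, then combined harmonically. A simplified illustration (entities are modeled here as `(type, start, end)` tuples; this mirrors the usual CoNLL-style scoring but is not the official evaluation script):

```python
def entity_f1(gold_entities, pred_entities):
    """Entity-level F1 over exact (type, start, end) matches, in percent."""
    gold, pred = set(gold_entities), set(pred_entities)
    tp = len(gold & pred)                 # entities matched exactly
    p = tp / len(pred) if pred else 0.0   # precision
    r = tp / len(gold) if gold else 0.0   # recall
    return 200.0 * p * r / (p + r) if p + r else 0.0
```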

The OntoNotes 5.0 dataset for the CoNLL'12 shared task: English.

| Approach | F1 Score | DNN |
|:---------|:--------:|:---:|
| Chiu & Nichols (2016) | 86.28 (±0.26) | O |
| Durrett & Klein (2014) | 84.04 | |

Dependency Parsing

The Penn Treebank III: Wall Street Journal (attachment scores).

  • Training: sections 2-21.
  • Development: section 22.
  • Evaluation: section 23.
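The attachment scores below are per-token: UAS counts tokens whose predicted head is correct, and LAS additionally requires the correct dependency label. A minimal sketch, with each token represented as an illustrative `(head_index, deprel)` pair:

```python
def attachment_scores(gold, pred):
    """gold/pred: aligned lists of (head_index, deprel) pairs.
    Returns (UAS, LAS) as percentages."""
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head correct
    las = sum(g == p for g, p in zip(gold, pred))        # head and label correct
    n = len(gold)
    return 100.0 * uas / n, 100.0 * las / n
```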

| Y&M Headrules | UAS | LAS | T/G | EXT |
|:--------------|:---:|:---:|:---:|:---:|
| Honnibal et al. (2013) | 91.1 | 88.9 | T | |
| Zhang and Zhang (2015) | 91.80 | 90.68 | T | |
| Zhang and Nivre (2011) | 92.9 | 91.8 | T | |
| Choi and McCallum (2013) | 92.96 | 91.93 | T | |
| Koo and Collins (2010) | 93.04 | - | G | |
| Ma et al. (2014) | 93.06 | - | T | |
| Koo et al. (2008) | 93.16 | - | G | O |
| Chen et al. (2009) | 93.16 | - | G | O |
| Li et al. (2014) | 93.19 | - | G | O |
| Zhou et al. (2015) | 93.28 | 92.35 | T | O |
| Pei et al. (2015) | 93.29 | 92.13 | G | O |
| Carreras et al. (2008) | 93.5 | - | G | O |
| Chen et al. (2013) | 93.77 | - | G | O |
| Suzuki et al. (2009) | 93.79 | - | G | O |
| Suzuki et al. (2011) | 94.22 | - | G | O |

| Stanford | UAS | LAS | T/G | EXT |
|:---------|:---:|:---:|:---:|:---:|
| Goldberg and Nivre (2012) | 90.96 | 88.72 | T | |
| Honnibal et al. (2013) | 91.0 | 88.9 | T | |
| Chen and Manning (2014) | 91.8 | 89.6 | T | |
| Zhang and Nivre (2011) | 93.5 | 91.9 | T | |
| Kiperwasser and Goldberg (2015) | 92.45 | - | | |
| Dyer et al. (2015) | 93.1 | 90.9 | | |
| Weiss et al. (2015) | 94.26 | 92.41 | | |

Coreference Resolution

The CoNLL 2012 shared task dataset (F1 scores).

| Model | MUC | B³ | CEAFe | CoNLL |
|:------|:---:|:--:|:-----:|:-----:|
| Fernandes et al. (2014) | 70.51 | 57.58 | 53.86 | 60.65 |
| Björkelund & Kuhn (2014) | 70.72 | 58.58 | 55.61 | 61.63 |
| Durrett & Klein (2014) | 71.24 | 58.71 | 55.18 | 61.71 |
| Martschat & Strube (2015) | 72.17 | 59.58 | 55.67 | 62.47 |
| Clark & Manning (2015) | 72.59 | 60.44 | 56.02 | 63.02 |
| Peng et al. (2015) | 72.22 | 60.50 | 56.37 | 63.03 |
| Wiseman et al. (2015) | 72.60 | 60.52 | 57.05 | 63.39 |
| Wiseman et al. (2016) | 73.42 | 61.50 | 57.70 | 64.21 |
| Clark & Manning (2016) | 74.06 | 62.86 | 58.96 | 65.29 |
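The CoNLL score is simply the unweighted average of the MUC, B³, and CEAFe F1 scores, which makes the last column easy to verify from the other three:

```python
def conll_score(muc, b3, ceafe):
    """CoNLL coreference score: mean of the MUC, B-cubed, and CEAFe F1s."""
    return (muc + b3 + ceafe) / 3.0

# e.g., the Clark & Manning (2016) row: (74.06 + 62.86 + 58.96) / 3 = 65.29
```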

Sentiment Analysis

The Stanford Sentiment Treebank (accuracies).

| Model | Fine-grained | Binary |
|:------|:------------:|:------:|
| Socher et al. (2013) | 45.7 | 85.4 |
| Kim (2014) | 48.0 | 87.2 |
| Kalchbrenner et al. (2014) | 48.5 | 86.8 |
| Le and Mikolov (2014) | 48.7 | 87.8 |
| Shin et al. (2016) | 48.8 | - |
| Yin and Schütze (2015) | 49.6 | 89.4 |
| Irsoy and Cardie (2014) | 49.8 | 86.6 |
| Tai et al. (2015) | 51.0 | 88.0 |

AMR Parsing