Benchmarks
- Part-of-speech tagging.
- Named entity recognition.
- Dependency parsing.
- Coreference resolution.
- Sentiment analysis.
The Penn Treebank III: Wall Street Journal (accuracy).
- Training: sections 0-18.
- Development: sections 19-21.
- Evaluation: sections 22-24.
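ALL is tagging accuracy over every test token; OOV is the same accuracy restricted to tokens whose word form never appears in the training sections. A minimal sketch of both measures, assuming gold and predicted (word, tag) sequences and the training vocabulary are already loaded (the names below are illustrative, not from any particular toolkit):

```python
def pos_accuracies(gold_sents, pred_sents, train_vocab):
    """gold_sents / pred_sents: parallel lists of [(word, tag), ...] per sentence.
    train_vocab: set of word forms observed in the training sections (0-18)."""
    all_correct = all_total = oov_correct = oov_total = 0
    for gold, pred in zip(gold_sents, pred_sents):
        for (word, gold_tag), (_, pred_tag) in zip(gold, pred):
            all_total += 1
            all_correct += gold_tag == pred_tag
            if word not in train_vocab:          # out-of-vocabulary token
                oov_total += 1
                oov_correct += gold_tag == pred_tag
    return 100.0 * all_correct / all_total, 100.0 * oov_correct / max(oov_total, 1)
```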
Model | ALL | OOV |
---|---|---|
Bohnet et al. (2018) | 97.96 | - |
Akbik et al. (2018) | 97.85 (±0.01) | - |
Ling et al. (2015) | 97.78 | - |
Yasunaga et al. (2018) | 97.58 | - |
Ma & Hovy (2016) | 97.55 | 93.16 |
Tsuboi (2014) | 97.51 | 91.64 |
Andor et al. (2016) | 97.45 | - |
Santos & Zadrozny (2014) | 97.32 | 89.86 |
- Contextual String Embeddings for Sequence Labeling, Akbik et al., COLING, 2018.
- Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings, Bohnet et al., ACL, 2018.
- Robust Multilingual Part-of-Speech Tagging via Adversarial Training, Yasunaga et al., NAACL, 2018.
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, Ma & Hovy, ACL, 2016.
- Globally Normalized Transition-Based Neural Networks, Andor et al., ACL, 2016.
- Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation, Ling et al., EMNLP, 2015.
- Neural Networks Leverage Corpus-wide Information for Part-of-speech Tagging, Tsuboi, EMNLP, 2014.
- Learning Character-level Representations for Part-of-Speech Tagging, Santos & Zadrozny, ICML, 2014.
Model | ALL | OOV | DNN |
---|---|---|---|
Choi (2016) | 97.64 | 92.03 | |
Søgaard (2011) | 97.50 | - | |
Spoustová et al. (2009) | 97.44 | 89.92 | |
Moore (2015) | 97.36 | 91.09 | |
Sun (2014) | 97.36 | - | |
Shen et al. (2007) | 97.33 | 89.61 |
Manning (2011) | 97.32 | 90.79 |
- Dynamic Feature Induction: The Last Gist to the State-of-the-Art, Choi, NAACL, 2016.
- An Improved Tag Dictionary for Faster Part-of-Speech Tagging, Moore, EMNLP, 2015.
- Structure Regularization for Structured Prediction, Sun, NIPS, 2014.
- Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?, Manning, CICLing, 2011.
- Semisupervised condensed nearest neighbor for part-of-speech tagging, Søgaard, ACL, 2011.
- Semi-supervised Training for the Averaged Perceptron POS Tagger, Spoustová et al., EACL, 2009.
- Guided Learning for Bidirectional Sequence Classification, Shen et al., ACL, 2007.
The CoNLL'03 shared task dataset: English.
Approach | F1 Score | DNN |
---|---|---|
Akbik et al. (2018) | 93.09 (±0.12) | O |
Peters et al. (2018) | 92.22 (±0.10) | O |
Peters et al. (2017) | 91.93 (±0.19) | O |
Chiu & Nichols (2016) | 91.62 (±0.33) | O |
Ma & Hovy (2016) | 91.21 | O |
Luo et al. (2015) | 91.20 | |
Suzuki et al. (2011) | 91.02 | |
Choi (2016) | 91.00 | |
Lample et al. (2016) | 90.94 | O |
Passos et al. (2014) | 90.90 | |
Lin & Wu (2009) | 90.90 | |
Ratinov & Roth (2009) | 90.57 | |
Turian et al. (2010) | 90.36 |
- Contextual String Embeddings for Sequence Labeling, Akbik et al., COLING, 2018.
- Deep contextualized word representations, Peters et al., NAACL, 2018.
- Semi-supervised sequence tagging with bidirectional language models, Peters et al., ACL, 2017.
- Named Entity Recognition with Bidirectional LSTM-CNNs, Chiu & Nichols, TACL, 2016.
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, Ma & Hovy, ACL, 2016.
- Neural Architectures for Named Entity Recognition, Lample et al., NAACL, 2016.
- Dynamic Feature Induction: The Last Gist to the State-of-the-Art, Choi, NAACL, 2016.
- Joint Named Entity Recognition and Disambiguation, Luo et al., EMNLP, 2015.
- Lexicon Infused Phrase Embeddings for Named Entity Resolution, Passos et al., CoNLL, 2014.
- Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning, Suzuki et al., ACL, 2011.
- Word Representations: A Simple and General Method for Semi-Supervised Learning, Turian et al., ACL, 2010.
- Design Challenges and Misconceptions in Named Entity Recognition, Ratinov & Roth, CoNLL, 2009.
- Phrase Clustering for Discriminative Learning, Lin & Wu, ACL, 2009.
The OntoNotes 5.0 dataset for the CoNLL'12 shared task: English.
Approach | F1 Score | DNN |
---|---|---|
Chiu & Nichols (2016) | 86.28 (±0.26) | O |
Durrett & Klein (2014) | 84.04 |
- Named Entity Recognition with Bidirectional LSTM-CNNs, Chiu & Nichols, TACL, 2016.
- A Joint Model for Entity Analysis: Coreference, Typing, and Linking, Durrett & Klein, TACL, 2014.
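The F1 scores in both NER tables are entity-level: a predicted mention is counted as correct only if its span and its type both match a gold mention exactly, as in the standard CoNLL evaluation. A minimal sketch, under the assumption that entities are given as (sentence_id, start, end, type) tuples:

```python
def entity_f1(gold_entities, pred_entities):
    """Exact-match precision/recall/F1 over entity tuples (sent_id, start, end, type)."""
    gold, pred = set(gold_entities), set(pred_entities)
    tp = len(gold & pred)                              # exact span + type matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)
```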
The Penn Treebank III: Wall Street Journal (attachment scores).
- Training: sections 2-21.
- Development: section 22.
- Evaluation: section 23.
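UAS (unlabeled attachment score) is the percentage of tokens assigned the correct head; LAS (labeled attachment score) additionally requires the correct dependency label. A minimal sketch, assuming each token is represented by its gold and predicted (head index, label) pair; note that published numbers differ in how they exclude punctuation, which is not handled here:

```python
def attachment_scores(gold_sents, pred_sents):
    """Each sentence is a list of (head_index, dependency_label) per token."""
    uas = las = total = 0
    for gold, pred in zip(gold_sents, pred_sents):
        for (gold_head, gold_label), (pred_head, pred_label) in zip(gold, pred):
            total += 1
            if gold_head == pred_head:
                uas += 1                      # correct head
                if gold_label == pred_label:
                    las += 1                  # correct head and label
    return 100.0 * uas / total, 100.0 * las / total
```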
Yamada & Matsumoto head rules | UAS | LAS | T/G | EXT |
---|---|---|---|---|
Honnibal et al. (2013) | 91.1 | 88.9 | T | |
Zhang and Zhang (2015) | 91.80 | 90.68 | T | |
Zhang and Nivre (2011) | 92.9 | 91.8 | T | |
Choi and McCallum (2013) | 92.96 | 91.93 | T | |
Koo and Collins (2010) | 93.04 | - | G | |
Ma et al. (2014) | 93.06 | - | T | |
Koo et al. (2008) | 93.16 | - | G | O |
Chen et al. (2009) | 93.16 | - | G | O |
Li et al. (2014) | 93.19 | - | G | O |
Zhou et al. (2015) | 93.28 | 92.35 | T | O |
Pei et al. (2015) | 93.29 | 92.13 | G | O |
Carreras et al. (2008) | 93.5 | - | G | O |
Chen et al. (2013) | 93.77 | - | G | O |
Suzuki et al. (2009) | 93.79 | - | G | O |
Suzuki et al. (2011) | 94.22 | - | G | O |
Stanford dependencies | UAS | LAS | T/G | EXT |
---|---|---|---|---|
Goldberg and Nivre (2012) | 90.96 | 88.72 | T | |
Honnibal et al. (2013) | 91.0 | 88.9 | T | |
Chen and Manning (2014) | 91.8 | 89.6 | T | |
Zhang and Nivre (2011) | 93.5 | 91.9 | T | |
Kiperwasser and Goldberg (2015) | 92.45 | - | | |
Dyer et al. (2015) | 93.1 | 90.9 | | |
Weiss et al. (2015) | 94.26 | 92.41 | | |
- Simple Semi-supervised Dependency Parsing, Koo et al., ACL, 2008.
- TAG, Dynamic Programming, and the Perceptron for Efficient, Feature-rich Parsing, Carreras et al., CoNLL, 2008.
- An Empirical Study of Semi-supervised Structured Conditional Models for Dependency Parsing, Suzuki et al., EMNLP, 2009.
- Improving Dependency Parsing with Subtrees from Auto-Parsed Data, Chen et al., EMNLP, 2009.
- Efficient Third-order Dependency Parsers, Koo and Collins, ACL, 2010.
- Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning, Suzuki et al., ACL, 2011.
- Transition-based Dependency Parsing with Rich Non-local Features, Zhang and Nivre, ACL, 2011.
- A Dynamic Oracle for Arc-Eager Dependency Parsing, Goldberg and Nivre, COLING, 2012.
- Transition-based Dependency Parsing with Selectional Branching, Choi and McCallum, ACL, 2013.
- Semi-supervised Feature Transformation for Dependency Parsing, Chen et al., EMNLP, 2013.
- Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing, Li et al., ACL, 2014.
- Punctuation Processing for Projective Dependency Parsing, Ma et al., ACL, 2014.
- A Fast and Accurate Dependency Parser using Neural Networks, Chen and Manning, EMNLP, 2014.
- A Non-Monotonic Arc-Eager Transition System for Dependency Parsing, Honnibal et al., CoNLL, 2013.
- Combining Discrete and Continuous Features for Deterministic Transition-based Dependency Parsing, Zhang and Zhang, EMNLP, 2015.
- Transition-Based Dependency Parsing with Stack Long Short-Term Memory, Dyer et al., ACL, 2015.
- Structured Training for Neural Network Transition-Based Parsing, Weiss et al., ACL, 2015.
- An Effective Neural Network Model for Graph-based Dependency Parsing, Pei et al., ACL, 2015.
- A Neural Probabilistic Structured-Prediction Model for Transition-Based Dependency Parsing, Zhou et al., ACL, 2015.
- Semi-supervised Dependency Parsing using Bilexical Contextual Features from Auto-Parsed Data, Kiperwasser and Goldberg, EMNLP, 2015.
The CoNLL 2012 shared task dataset (F1 scores).
Model | MUC | B³ | CEAFe | CoNLL |
---|---|---|---|---|
Fernandes et al. (2014) | 70.51 | 57.58 | 53.86 | 60.65 |
Björkelund & Kuhn (2014) | 70.72 | 58.58 | 55.61 | 61.63 |
Durrett & Klein (2014) | 71.24 | 58.71 | 55.18 | 61.71 |
Martschat & Strube (2015) | 72.17 | 59.58 | 55.67 | 62.47 |
Clark & Manning (2015) | 72.59 | 60.44 | 56.02 | 63.02 |
Peng et al. (2015) | 72.22 | 60.50 | 56.37 | 63.03 |
Wiseman et al. (2015) | 72.60 | 60.52 | 57.05 | 63.39 |
Wiseman et al. (2016) | 73.42 | 61.50 | 57.70 | 64.21 |
Clark & Manning (2016) | 74.06 | 62.86 | 58.96 | 65.29 |
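The CoNLL column is the unweighted average of the MUC, B³, and CEAFe F1 scores, which can be verified directly from the rows above (up to rounding):

```python
def conll_score(muc_f1, b3_f1, ceafe_f1):
    """Official CoNLL coreference score: mean of the three metric F1s."""
    return (muc_f1 + b3_f1 + ceafe_f1) / 3.0

print(round(conll_score(70.51, 57.58, 53.86), 2))  # 60.65 -> Fernandes et al. (2014)
print(round(conll_score(74.06, 62.86, 58.96), 2))  # 65.29 -> Clark & Manning (2016)
```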
- Latent Trees for Coreference Resolution, Fernandes et al., CL, 2014.
- Learning Structured Perceptrons for Coreference Resolution with Latent Antecedents and Non-local Features, Björkelund and Kuhn, ACL, 2014.
- A Joint Model for Entity Analysis: Coreference, Typing, and Linking, Durrett and Klein, TACL, 2014.
- Latent Structures for Coreference Resolution, Martschat and Strube, TACL, 2015.
- Entity-Centric Coreference Resolution with Model Stacking, Clark and Manning, ACL, 2015.
- A Joint Framework for Coreference Resolution and Mention Head Detection, Peng et al., CoNLL, 2015.
- Learning Anaphoricity and Antecedent Ranking Features for Coreference Resolution, Wiseman et al., ACL, 2015.
- Learning Global Features for Coreference Resolution, Wiseman et al., NAACL, 2016.
- Improving Coreference Resolution by Learning Entity-Level Distributed Representations, Clark and Manning, ACL, 2016.
The Stanford Sentiment Treebank (accuracies).
Model | Fine-grained | Binary |
---|---|---|
Socher et al. (2013) | 45.7 | 85.4 |
Kim (2014) | 48.0 | 87.2 |
Kalchbrenner et al. (2014) | 48.5 | 86.8 |
Le and Mikolov (2014) | 48.7 | 87.8 |
Shin et al. (2016) | 48.8 | - |
Yin and Schütze (2015) | 49.6 | 89.4 |
Irsoy and Cardie (2014) | 49.8 | 86.6 |
Tai et al. (2015) | 51.0 | 88.0 |
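Fine-grained is 5-way accuracy (very negative to very positive) at the sentence level; binary collapses the scale to negative vs. positive and drops neutral sentences, following the usual Stanford Sentiment Treebank setup. A minimal sketch, assuming gold and predicted labels are integers in {0, ..., 4}:

```python
def fine_grained_accuracy(gold, pred):
    """5-way accuracy over all test sentences (labels 0-4)."""
    return 100.0 * sum(g == p for g, p in zip(gold, pred)) / len(gold)

def binary_accuracy(gold, pred):
    """Negative (0-1) vs. positive (3-4); neutral sentences (label 2) are excluded."""
    pairs = [(g > 2, p > 2) for g, p in zip(gold, pred) if g != 2]
    return 100.0 * sum(g == p for g, p in pairs) / len(pairs)
```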
- Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, Socher et al., EMNLP, 2013.
- Distributed Representations of Sentences and Documents, Le and Mikolov, ICML, 2014.
- A Convolutional Neural Network for Modelling Sentences, Kalchbrenner et al., ACL, 2014.
- Convolutional Neural Networks for Sentence Classification, Kim, EMNLP, 2014.
- Deep Recursive Neural Networks for Compositionality in Language, Irsoy and Cardie, NIPS, 2014.
- Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, Tai et al., ACL, 2015.
- Multichannel Variable-Size Convolution for Sentence Classification, Yin and Schütze, CoNLL, 2015.
- Lexicon Integrated CNN Models with Attention for Sentiment Analysis, Shin et al., arXiv, 2016.
- Parsing English into Abstract Meaning Representation Using Syntax-Based Machine Translation, Pust et al., EMNLP, 2015.
- Boosting Transition-based AMR Parsing with Refined Actions and Auxiliary Analyzers, Wang et al., ACL, 2015.
- Robust Subgraph Generation Improves Abstract Meaning Representation Parsing, Werling et al., ACL, 2015.
Copyright © 2015-2019 Emory University - All Rights Reserved.