A re-implementation of nndep
using PyTorch.
Currently used for training CoreNLP dependency parsing models.
Requires Stanza for some features (auto-tagging with CoreNLP via server).
Originally by Danqi Chen. Leave a GitHub Issue if you have any questions!
Train a model:
python train.py -l universal -d /path/to/data --train_file it-train.conllu --dev_file it-dev.conllu --test_file it-test.conllu --embedding_file /path/to/it-embeddings.txt --embedding_size 100 --random_seed 21 --learning_rate .005 --l2_reg .01 --epsilon .001 --optimizer adamw --save_path /path/to/experiment-dir --job_id experiment-name --corenlp_tags --corenlp_tag_lang italian --n_epoches 2000
Note that the above command will automatically tag the input data with the CoreNLP tagger. Thus you need to have CoreNLP and the Italian models (for this example) in your CLASSPATH, and you need the latest version of Stanza installed.
Why is this done? When CoreNLP runs a dependency parser, it relies on part of speech tags, so the training and development data used during training need to have the predicted tags CoreNLP will use for optimal performance.
Convert to CoreNLP format:
python gen_model.py -o /path/to/italian-corenlp-parser.txt /path/to/experiment-dir/experiment-name