Releases: huggingface/transformers
Bug fix update to load the pretrained `TransfoXLModel` from s3, added fallback for OpenAIGPTTokenizer when SpaCy is not installed
Mostly a bug fix update for loading the `TransfoXLModel` from s3:
- Fixes a bug in the loading of the pretrained `TransfoXLModel` from the s3 dump (which is a converted `TransfoXLLMHeadModel`) in which the weights were not loaded.
- Added a fallback of `OpenAIGPTTokenizer` on BERT's `BasicTokenizer` when SpaCy and ftfy are not installed. Using BERT's `BasicTokenizer` instead of SpaCy should be fine in most cases as long as you have a relatively clean input (SpaCy + ftfy were included to reproduce the paper's pre-processing steps on the Toronto Book Corpus exactly), and this also lets us use the `never_split` option to avoid splitting special tokens like `[CLS]`, `[SEP]`..., which is easier than adding the tokens back after tokenization (a quick usage sketch follows this list).
- Updated the README on the tokenizer options and methods, which was lagging behind a bit.
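A minimal usage sketch; the `openai-gpt` shortcut name is the one used in the readme, and the point of the fallback is that no user code needs to change when SpaCy/ftfy are missing:

```python
from pytorch_pretrained_bert import OpenAIGPTTokenizer

# If SpaCy and ftfy are not installed, word splitting before BPE falls back
# on BERT's BasicTokenizer instead of raising an import error.
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')

tokens = tokenizer.tokenize("Here is some text to encode with OpenAI GPT.")
ids = tokenizer.convert_tokens_to_ids(tokens)
```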
Adding OpenAI GPT and Transformer-XL pretrained models, python2 support, pre-training script for BERT, SQuAD 2.0 example
New pretrained models (a loading sketch follows this list):
- Open AI GPT pretrained on the Toronto Book Corpus ("Improving Language Understanding by Generative Pre-Training" by Alec Radford et al.).
  - This is a slightly modified version of our previous PyTorch implementation to increase performance by splitting the word and position embeddings into separate embedding matrices.
  - Performance checked to be on par with the TF implementation on ROCStories: single-run evaluation accuracy of 86.4% vs. the authors reporting a median accuracy of 85.8% with the TensorFlow code (see details in the example section of the readme).
- Transformer-XL pretrained on WikiText-103 ("Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" by Zihang Dai, Zhilin Yang et al.). This is a slightly modified version of Google/CMU's PyTorch implementation to match the performance of the TensorFlow version by:
  - untying relative positioning embeddings across layers,
  - changing the memory cells' initialization to keep sinusoidal positions identical,
  - adding full logits outputs in the adaptive softmax so it can be used in a generative setting.
  - Performance checked to be on par with the TF implementation on WikiText-103: evaluation perplexity of 18.213 vs. the authors reporting a perplexity of 18.3 on this dataset with the TensorFlow code (see details in the example section of the readme).
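For reference, a minimal loading sketch for the two new model families, using the shortcut names from the readme (`openai-gpt` and `transfo-xl-wt103`):

```python
from pytorch_pretrained_bert import (OpenAIGPTTokenizer, OpenAIGPTLMHeadModel,
                                     TransfoXLTokenizer, TransfoXLLMHeadModel)

# OpenAI GPT pretrained on the Toronto Book Corpus
gpt_tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
gpt_model = OpenAIGPTLMHeadModel.from_pretrained('openai-gpt')
gpt_model.eval()

# Transformer-XL pretrained on WikiText-103
txl_tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
txl_model = TransfoXLLMHeadModel.from_pretrained('transfo-xl-wt103')
txl_model.eval()
```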
New scripts:
- Updated the SQuAD fine-tuning script to also work on SQuAD V2.0, by @abeljim and @Liangtaiwan
- `run_lm_finetuning.py` lets you pre-train a `BERT` language model or fine-tune it with the masked-language-modeling and next-sentence-prediction losses, by @deepset-ai, @tholor and @nhatchan (Python 3.5 compatibility); a sketch of the two losses follows this list.
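For reference, a minimal sketch of the two pre-training losses using the `BertForPreTraining` head from the library; the toy batch below (token ids, segment ids, masked-LM labels with -1 for unmasked positions, and the next-sentence label) is a hypothetical placeholder:

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForPreTraining

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForPreTraining.from_pretrained('bert-base-uncased')

# Toy example: sentence A = "the [MASK] sat", sentence B = "on the mat",
# with the masked position labeled with the id of the original token ("cat").
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(
    ['[CLS]', 'the', '[MASK]', 'sat', '[SEP]', 'on', 'the', 'mat', '[SEP]'])])
token_type_ids = torch.tensor([[0, 0, 0, 0, 0, 1, 1, 1, 1]])
masked_lm_labels = torch.tensor([[-1, -1, tokenizer.vocab['cat'], -1, -1, -1, -1, -1, -1]])
next_sentence_label = torch.tensor([0])  # 0 = sentence B really follows sentence A

# With labels, the model returns the summed masked-LM + next-sentence losses.
loss = model(input_ids, token_type_ids,
             masked_lm_labels=masked_lm_labels,
             next_sentence_label=next_sentence_label)
loss.backward()
```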
Backward compatibility:
- The library is now also compatible with Python 2.
Improvements and bug fixes:
- add a `never_split` option and arguments to the tokenizers (@WrRan); see the sketch after this list
- better handle errors when BERT is fed inputs that are too long (@patrick-s-h-lewis)
- better layer normalization layer initialization and bug fix in example scripts: `args.do_lower_case` is always True (@donglixp)
- fix learning rate schedule issue in example scripts (@matej-svejda)
- readme fixes (@danyaljj, @nhatchan, @davidefiocco, @girishponkiya)
- importing unofficial TF models in BERT (@nhatchan)
- only keep the active part of the loss for token classification (@Iwontbecreative)
- fix argparse type error in example scripts (@ksurya)
- docstring fixes (@rodgzilla, @wlhgtc)
- improve `run_classifier.py` loading of saved models (@SinghJasdeep)
- in example scripts: allow `do_eval` to be used without `do_train` and to use the pretrained model in the output folder (@jaderabbit, @likejazz and @JoeDumoulin)
- in `run_squad.py`: fix error when the `bert_model` param is a path or url (@likejazz)
- add license to source distribution and use entry-points instead of scripts (@sodre)
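A minimal sketch of the `never_split` option; passing it through `BertTokenizer.from_pretrained` as a keyword argument is an assumption based on the tokenizer constructor, and the value shown is BERT's usual set of special tokens:

```python
from pytorch_pretrained_bert import BertTokenizer

# Keep BERT's special tokens intact during basic tokenization instead of
# splitting them on punctuation and re-inserting them afterwards.
tokenizer = BertTokenizer.from_pretrained(
    'bert-base-uncased',
    never_split=("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]"))

tokens = tokenizer.tokenize("[CLS] hello world [SEP]")
print(tokens)  # e.g. ['[CLS]', 'hello', 'world', '[SEP]'] rather than '[', 'cls', ']', ...
```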
4x speed-up using NVIDIA apex, new multi-choice classifier and example for SWAG-like dataset, pytorch v1.0, improved model loading, improved examples...
New:
- 3-4 times speed-ups in fp16 (versus fp32) thanks to NVIDIA's work on apex (by @FDecaYed)
- new sequence-level multiple-choice classification model + example fine-tuning on SWAG (by @rodgzilla)
- improved backward compatibility to python 3.5 (by @hzhwcmhf)
- bump up to PyTorch 1.0
- load a fine-tuned model with `from_pretrained`
- add examples on how to save and load fine-tuned models (see the sketch below).
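A minimal save/reload sketch in the spirit of the updated examples; the output file name, the `num_labels` keyword and the `state_dict` argument of `from_pretrained` are assumptions based on the example scripts:

```python
import torch
from pytorch_pretrained_bert import BertForSequenceClassification

# `model` stands in for a model that has just been fine-tuned.
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Save: unwrap DataParallel/DistributedDataParallel if needed, then dump the weights.
model_to_save = model.module if hasattr(model, 'module') else model
torch.save(model_to_save.state_dict(), "pytorch_model.bin")

# Reload: rebuild the architecture from the pretrained shortcut name and hand
# the fine-tuned weights to from_pretrained instead of the s3 ones.
state_dict = torch.load("pytorch_model.bin")
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased', state_dict=state_dict, num_labels=2)
```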
Added two pre-trained models and one new fine-tuning class
This release comprises the following improvements and updates:
- added two new pre-trained models from Google: `bert-large-cased` and `bert-base-multilingual-cased`,
- added a model that can be fine-tuned for token-level classification: `BertForTokenClassification` (see the sketch after this list),
- added tests for every model class, with and without labels,
- fixed the tokenizer loading function `BertTokenizer.from_pretrained()` when loading from a directory containing a pretrained model,
- fixed typos in model docstrings and completed the docstrings,
- improved examples (added a `do_lower_case` argument).
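A minimal sketch of the new token-classification head; the `num_labels` value, the toy sentence and the assumption that extra keyword arguments are forwarded by `from_pretrained` to the model constructor are all illustrative:

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForTokenClassification

tokenizer = BertTokenizer.from_pretrained('bert-large-cased', do_lower_case=False)
model = BertForTokenClassification.from_pretrained('bert-large-cased', num_labels=5)
model.eval()

tokens = ['[CLS]'] + tokenizer.tokenize("John lives in Berlin") + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

# Without labels the model returns per-token logits of shape
# (batch_size, sequence_length, num_labels); with a `labels` tensor it returns the loss.
logits = model(input_ids)
print(logits.shape)
```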
Small improvements and a few bug fixes.
Improvement:
- Added a `cache_dir` option to the `from_pretrained()` function to select a specific path to download and cache the pre-trained model weights. This is useful for distributed training (see the readme) and fixes issue #44. A usage sketch follows.
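A minimal sketch of the new option; the cache path is a hypothetical example (e.g. one directory per process rank for distributed training):

```python
from pytorch_pretrained_bert import BertModel

# Download (or reuse) the pretrained weights under an explicit cache directory
# instead of the default one.
model = BertModel.from_pretrained('bert-base-uncased',
                                  cache_dir='/tmp/bert_cache/rank_0')  # hypothetical path
```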
Bug fixes in model training and tokenizer loading:
- Fixed error in CrossEntropyLoss reshaping (issue #55).
- Fixed unicode error in vocabulary loading (issue #52).
Bug fixes in examples:
- Fix weight decay in examples (previously bias and layer norm weights were also decayed due to an erroneous check in the training loop); a sketch of the corrected parameter grouping follows this list.
- Fix fp16 grad norm is None error in examples (issue #43).
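A sketch of the corrected grouping, in the spirit of the example scripts; the parameter-name patterns and the `weight_decay_rate` group key are assumptions based on the scripts of this era:

```python
from pytorch_pretrained_bert import BertModel
from pytorch_pretrained_bert.optimization import BertAdam

model = BertModel.from_pretrained('bert-base-uncased')

# Match parameters by substring of their full name; biases and layer-norm
# parameters (gamma/beta here, weight/bias in later versions) get no decay.
no_decay = ['bias', 'gamma', 'beta', 'LayerNorm.weight', 'LayerNorm.bias']
param_optimizer = list(model.named_parameters())
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.01},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.0},
]
optimizer = BertAdam(optimizer_grouped_parameters,
                     lr=5e-5, warmup=0.1, t_total=1000)  # hypothetical schedule values
```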
Updated readme and docstrings
First release
This is the first release of `pytorch_pretrained_bert`.