Trains a Transformer-based model (Vaswani et al., 2017) on the One Billion Word Benchmark (lm1b; Chelba et al., 2013).
This example uses a linear learning rate warmup followed by an inverse square root decay schedule.
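The schedule can be sketched as a small function. The constants below (peak learning rate, number of warmup steps) are illustrative defaults, not necessarily the values used by `train.py`:

```python
import math

def rsqrt_schedule(step, base_lr=0.05, warmup_steps=1000):
    """Linear warmup to base_lr, then inverse square root decay (sketch)."""
    step = max(step, 1)
    warmup = min(1.0, step / warmup_steps)            # linear ramp during warmup
    decay = 1.0 / math.sqrt(max(step, warmup_steps))  # rsqrt decay after warmup
    # Scale so the learning rate peaks at base_lr when step == warmup_steps.
    return base_lr * math.sqrt(warmup_steps) * warmup * decay
```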
The TensorFlow dataset `lm1b/subwords32k` will be downloaded and prepared automatically, if necessary.
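If you want to inspect the data outside of training, it can be loaded directly with `tensorflow_datasets`. This snippet is only a sketch and is not required, since `train.py` prepares the dataset itself:

```python
import tensorflow_datasets as tfds

# Downloads and prepares lm1b/subwords32k on first use (this can take a while).
ds = tfds.load('lm1b/subwords32k', split='train', shuffle_files=True)
for example in ds.take(1):
    print(example['text'])  # subword token ids for one training sentence
```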
The model should run with other configurations and hardware, but it was explicitly tested on the following.
| Hardware | Batch size | Training time | Perplexity | TensorBoard.dev |
| --- | --- | --- | --- | --- |
| 8 x Nvidia V100 (16GB) | 2048 | 1d 4h 32m | 33.24 | 2020-03-14 |
```shell
python train.py --batch_size=2048 --model_dir=./lm1b_bs=2048
```