On page 16 of the newest version, the paper states:

> We set dropout as 0.0 for Transformer base/small model.
However, `--dropout 0.3` is used in `adahessian/transformer/config/adahessian.sh` (line 22 at commit d1a3442).
More importantly, for the learning rate of AdamW, the paper adopts the `lr` from this work, which uses `lr = 7e-4 / 5e-4` for Transformer-Base/Big, respectively. However, `lr = 0.0015` is used in `adahessian/transformer/config/adam.sh` (line 17 at commit d1a3442).
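For reference, this is roughly the configuration I would have expected from the paper's description (a minimal sketch only: I am assuming the config scripts wrap `fairseq-train`, and the dataset path, architecture, and remaining flags below are placeholders on my part, not the repo's actual settings):

```sh
# Sketch of a paper-consistent Transformer-Base run (my assumption, not the
# authors' confirmed script). Only --lr and --dropout reflect the values
# discussed above; every other flag and the data path are placeholders.
fairseq-train data-bin/my_dataset \
    --arch transformer \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --lr 0.0007 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.0 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096
```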
I would be grateful if the original training parameters could be provided so the results can be reproduced.