On page 16 of the newest version, the paper states:

> We set dropout as 0.0 for Transformer base/small model.
However, `--dropout 0.3` is used in `adahessian/transformer/config/adahessian.sh` (line 22 at commit d1a3442).
More importantly, for the learning rate of AdamW, the paper adopts the `lr` from this work, which uses `lr = 7e-4 / 5e-4` for Transformer-Base/Big, respectively. However, `lr = 0.0015` is used in `adahessian/transformer/config/adam.sh` (line 17 at commit d1a3442).
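For reference, this is roughly the configuration I would have expected from the paper's description (a minimal sketch only: I am assuming the config scripts wrap `fairseq-train`, and the dataset path, architecture, and remaining flags below are placeholders on my part, not the repo's actual settings):

```sh
# Sketch of a paper-consistent Transformer-Base run (my assumption, not the
# authors' confirmed script). Only --lr and --dropout reflect the values
# discussed above; every other flag and the data path are placeholders.
fairseq-train data-bin/my_dataset \
    --arch transformer \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --lr 0.0007 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.0 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096
```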
I would be grateful if the original training parameters could be provided so the results can be reproduced.