Replies: 4 comments
-
Most likely it would, just more costly, since the visual encoder is then trained as well. https://twitter.com/rom1504/status/1593719037808320513?t=8s946794nkQOi1um4nrQQg&s=19 Do you have any specific questions?
-
Well, I tried ViT-L/14 with xlm-roberta-large, both trained from scratch, and the loss did not converge at all for quite a few iterations. The training data is laion-2B-multi.
-
What learning rate and batch size are you using? With something like batch size 90,000, learning rate 0.001, and precision amp bfloat16, you should see the loss go down significantly after about 7,500 iterations. Note that you would need around 512 GPUs to train it from scratch in less than two weeks.
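For concreteness, here is a rough sketch of how those numbers map onto an open_clip run in Python. This is only an illustration, not the exact training setup: `xlm-roberta-large-ViT-H-14` is the stock open_clip config closest to this discussion, and the per-GPU arithmetic assumes 512 GPUs as above.

```python
# Hedged sketch only: maps the suggested global settings (batch ~90k, lr 0.001,
# amp bfloat16) onto per-GPU values for an open_clip multilingual model.
import torch
import open_clip

# Stock multilingual config; a ViT-L/14 + xlm-roberta-large combination would
# need a custom model config instead.
model, _, _ = open_clip.create_model_and_transforms("xlm-roberta-large-ViT-H-14")

global_batch, n_gpus = 90_000, 512
per_gpu_batch = global_batch // n_gpus                      # ~175 samples per GPU
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # lr 0.001 as above

# "precision amp bfloat16": wrap the forward pass and contrastive loss in bf16 autocast
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    pass  # image/text forward + loss for each of the ~7,500 optimization steps
```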
-
I used a batch size of 58,880 (128 A100 80G GPUs with gradient checkpointing) and a learning rate of 5e-4, and tried both amp and amp bfloat16. After that, I tried initializing both towers from pretrained models, with the ViT taken from the laion-2B-en pretrained checkpoint; with amp bfloat16 it starts to converge quickly. Anyway, glad to know that.
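As a reference, a minimal sketch of that setup. The `xlm-roberta-large-ViT-L-14` name is hypothetical (open_clip does not ship a ViT-L/14 + xlm-roberta-large config, so it would need a custom config file), and the pretrained tag is an assumption to check against `open_clip.list_pretrained()`.

```python
# Hedged sketch: initialize the vision tower from an English laion2B ViT-L/14
# checkpoint and enable gradient checkpointing, as described above.
import open_clip

# English-pretrained ViT-L/14 (the pretrained tag is an assumption; check
# open_clip.list_pretrained() for the exact name).
en_model, _, _ = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="laion2b_s32b_b82k"
)

# Hypothetical custom config pairing ViT-L/14 with xlm-roberta-large; copying
# weights only works if its vision settings match ViT-L-14 exactly.
multi_model, _, _ = open_clip.create_model_and_transforms("xlm-roberta-large-ViT-L-14")
multi_model.visual.load_state_dict(en_model.visual.state_dict())

# 58,880 / 128 = 460 samples per 80G GPU; checkpointing trades compute for memory.
multi_model.set_grad_checkpointing()
```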
-
As mentioned in the README, ViT-B/32 xlm-roberta-base is trained from scratch, while ViT-H/14 xlm-roberta-large uses LiT-style training. Does ViT-H/14 xlm-roberta-large ever converge when trained from scratch?
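For context, a minimal sketch of what LiT-style training looks like in open_clip: the image tower is frozen and only the text encoder is trained against it. The config name is the stock one; loading the pretrained English vision weights (as in the earlier sketch) is omitted here.

```python
# Hedged sketch of LiT-style training: freeze the image tower so only the
# xlm-roberta text encoder receives gradients.
import open_clip

model, _, _ = open_clip.create_model_and_transforms("xlm-roberta-large-ViT-H-14")

# LiT: lock the whole vision tower (no unlocked groups, i.e. fully frozen).
model.lock_image_tower(unlocked_groups=0)
```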