Hi, I'm trying to fine-tune CLIP on a custom dataset whose domain isn't very different from LAION's, just more specific. I'm curious whether anyone has tried fine-tuning the models here on image-caption datasets, and if so, whether there are any general guidelines for doing so.
More specifically, I'm wondering:
- Is convergence possible with a small batch size (say, 32)? Given the nature of contrastive training, it seems to me that a very large batch size is essential when training from scratch, since the model only sees in-batch negatives and needs many of them to learn rich representations. Does this still hold for fine-tuning, and if not, what would be a reasonable batch size? (See the loss sketch at the end of this post for why batch size matters here.)
- What's the smallest dataset one can use and still see good performance: 10k, 20k, 100k pairs?
- Has catastrophic forgetting been observed, and what are the best strategies to combat it?
I wanted to pose these questions sooner rather than later, since I have limited resources and it could be a while before I can test any hypotheses myself. Any pointers or insights are much appreciated. Thanks!
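
To make the batch-size question concrete, here is a minimal sketch of the CLIP-style symmetric contrastive (InfoNCE) loss in PyTorch. The function name, signature, and `logit_scale` default are my own illustrative choices, not taken from open_clip; the point is only that every other pair in the batch acts as a negative, so a batch of 32 gives each image just 31 negative captions per step, versus the very large batches (32,768 in the original CLIP paper) used for training from scratch.

```python
# Minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss.
# Names here (clip_contrastive_loss, logit_scale) are illustrative, not
# taken from any particular library.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embeds: torch.Tensor,
                          text_embeds: torch.Tensor,
                          logit_scale: float = 100.0) -> torch.Tensor:
    """image_embeds, text_embeds: (batch, dim) tensors, assumed L2-normalized."""
    # Similarity of every image against every caption in the batch.
    logits_per_image = logit_scale * image_embeds @ text_embeds.t()  # (batch, batch)
    logits_per_text = logits_per_image.t()
    # The matched (i, i) pairs are the positives; the batch-1 off-diagonal
    # entries in each row are the only negatives the model ever sees.
    targets = torch.arange(image_embeds.size(0), device=image_embeds.device)
    return 0.5 * (F.cross_entropy(logits_per_image, targets)
                  + F.cross_entropy(logits_per_text, targets))
```

Note that gradient accumulation raises the effective batch size for the optimizer but not the in-batch negative pool, since the similarity matrix is computed per forward pass.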