Training on new dataset #36
It seems to me that you only did stage 1 training, where I would expect some blurry faces, since it relies on the ImageNet-trained MaskGIT-VQGAN, which is not good for humans. Did you try running stage 2 fine-tuning as well? I would expect stage 2 fine-tuning to help in your case.
Yes, I only did stage 1 training, since I only want the new 32 embedded tokens. Will stage 2 training affect the values of the 32 tokens? I read in the paper that stage 2 training is only for the decoder. Also, do you think I should train MaskGIT on my dataset too for the proxy codes?
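(For reference, a minimal PyTorch sketch of what decoder-only stage 2 training implies; the attribute name `decoder` is an assumption about the model class, not the repo's actual API:)

```python
import torch

# A minimal sketch, not the repo's training code. `model.decoder` is a
# hypothetical attribute; adapt it to the actual model class.
def setup_stage2_decoder_only(model: torch.nn.Module) -> torch.optim.Optimizer:
    # Freeze everything: the encoder and quantizer stay fixed, so the
    # 32 token indices they produce are identical before and after stage 2.
    model.requires_grad_(False)
    # Unfreeze only the pixel decoder.
    model.decoder.requires_grad_(True)
    # Optimize decoder parameters only.
    return torch.optim.AdamW(model.decoder.parameters(), lr=1e-4)
```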
@cornettoyu do you have any input?
IMHO it would still be helpful to train stage 2, to see if the reconstruction results get better. If not, then the token representation is bottlenecked by the gap between your dataset and ImageNet, and I would suggest a full fine-tuning on your dataset instead of decoder-only (e.g., you can enable the gradient for the encoder/quantizer here; see the sketch below).
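(A hedged sketch of the full fine-tuning variant suggested above; the module names `encoder`, `quantize`, and `decoder` are assumptions, so adapt them to wherever the repo actually builds these components:)

```python
import torch

# Hypothetical module names (`encoder`, `quantize`, `decoder`); not the
# repo's confirmed API.
def setup_full_finetune(model: torch.nn.Module) -> torch.optim.Optimizer:
    # Enable gradients on the encoder and quantizer as well as the decoder,
    # so the token representation itself can adapt to the new dataset.
    for module in (model.encoder, model.quantize, model.decoder):
        module.requires_grad_(True)
    # A smaller learning rate when unfreezing pretrained weights is a
    # common choice (an assumption, not a repo recommendation).
    return torch.optim.AdamW(model.parameters(), lr=1e-5)
```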
Hey, first of all, congratulations on the great work. Absolutely outstanding. On the topic of training on a new dataset from scratch, what are the steps/commands I should use to train stages 1 and 2? I'd really appreciate some direction on this.
Hi, I am looking to train this tokenizer on the sign language dataset CSL-Daily, so that I can compress video frames into 32 tokens.
However, I am not getting very good results (attached below). I would like to check whether all I have to do is download the pretrained model and then run it as:
Please advise! Thanks!
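(For reference, a minimal sketch of loading the pretrained tokenizer and reconstructing a single frame. The module path, checkpoint name, and method names below follow the repo's demo code as far as can be recalled, but treat them as assumptions and verify against the current README:)

```python
import numpy as np
import torch
from PIL import Image
# Module path and class name are assumptions based on the repo layout.
from modeling.titok import TiTok

# Checkpoint name (ImageNet-trained, 32 tokens) is an assumption.
tokenizer = TiTok.from_pretrained("yucornetto/tokenizer_titok_l32_imagenet")
tokenizer.eval()
tokenizer.requires_grad_(False)

# Load one video frame, resize to the expected input size, scale to [0, 1].
img = Image.open("frame.png").convert("RGB").resize((256, 256))
x = torch.from_numpy(np.asarray(img)).permute(2, 0, 1).float().unsqueeze(0) / 255.0

with torch.no_grad():
    # encode() returning quantizer indices under "min_encoding_indices"
    # mirrors the demo code; verify against the repo before relying on it.
    tokens = tokenizer.encode(x)[1]["min_encoding_indices"]  # 32 token ids
    recon = tokenizer.decode_tokens(tokens)                  # (1, 3, 256, 256)

Image.fromarray(
    (recon[0].clamp(0, 1).permute(1, 2, 0).cpu().numpy() * 255).astype(np.uint8)
).save("recon.png")
```

Note that, zero-shot, the ImageNet-trained checkpoint is expected to give blurry results on sign-language frames, per the discussion above; fine-tuning on CSL-Daily is the suggested remedy.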