
Training on new dataset #36

Open
tanzheen opened this issue Oct 10, 2024 · 5 comments
tanzheen commented Oct 10, 2024

Hi, I am looking to train this tokenizer on the sign language dataset CSL-Daily so that I can compress video frames into 32 tokens.
However, I am not getting very good results (attached below). I would like to check whether all I have to do is download the pretrained model and then run:

WANDB_MODE=offline accelerate launch --num_machines=1 --num_processes=2 --machine_rank=0 --main_process_ip=127.0.0.1 --main_process_port=9999 --same_network scripts/train_titok.py config=configs/training/stage1/titok_l32.yaml \
    experiment.project="titok_l32_stage1" \
    experiment.name="titok_l32_stage1_run1" \
    experiment.output_dir="titok_l32_stage1_run1" \
    training.per_gpu_batch_size=32

(attached reconstruction samples: 00862136_s-001, 00862136_s-000)

Please advise! Thanks!

@cornettoyu (Collaborator) commented:
> Hi I am looking to train this tokenizer on a sign language dataset CSL-Daily so that I can compress video frames into 32 tokens. However, I am not getting very good results (attached below). [...]

It seems to me that you only did stage 1 training, where I would expect some blurry faces, since it relies on the ImageNet-trained MaskGIT-VQGAN, which does not handle human subjects well. Did you try running stage 2 fine-tuning as well? I would expect stage 2 fine-tuning to help in your case.
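For reference, a stage 2 launch would presumably mirror the stage 1 command quoted earlier in the thread, swapping in a stage 2 config. The exact config path and experiment names below are assumptions for illustration, not verified against the repo:

```shell
# Sketch only -- the stage 2 config path is an assumption; check
# configs/training/ in the repo for the actual file name.
WANDB_MODE=offline accelerate launch --num_machines=1 --num_processes=2 \
    --machine_rank=0 --main_process_ip=127.0.0.1 --main_process_port=9999 \
    --same_network scripts/train_titok.py config=configs/training/stage2/titok_l32.yaml \
    experiment.project="titok_l32_stage2" \
    experiment.name="titok_l32_stage2_run1" \
    experiment.output_dir="titok_l32_stage2_run1" \
    training.per_gpu_batch_size=32
```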

@tanzheen (Author) commented:
Yes, I only did stage 1 training, since I only want the new 32 embedded tokens. Will stage 2 training affect the values of the 32 tokens? I read in the paper that stage 2 training only updates the decoder.

Also, do you think I should train MaskGIT on my dataset too for the proxy codes?

@tanzheen (Author) commented:
@cornettoyu do you have any input on this?

@cornettoyu (Collaborator) commented:

IMHO it would still be helpful to run stage 2 training, to see whether the reconstruction results improve. If not, then the token representation is bottlenecked by the gap between your dataset and ImageNet, and I would suggest a full fine-tuning on your dataset instead of decoder-only (e.g., you can enable the gradient for the encoder/quantizer here)
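As a rough illustration of that last point: the difference between decoder-only and full fine-tuning comes down to which parameter groups have gradients enabled. The module names below ("encoder", "quantize", "decoder") are assumptions standing in for the real TiTok submodules, and the `nn.Linear` layers are toy placeholders:

```python
import torch.nn as nn

# Toy stand-in for the tokenizer -- real submodule names may differ;
# "encoder"/"quantize"/"decoder" are assumptions for illustration.
model = nn.ModuleDict({
    "encoder": nn.Linear(8, 8),
    "quantize": nn.Linear(8, 8),
    "decoder": nn.Linear(8, 8),
})

# Decoder-only (stage-2-style) fine-tuning: freeze everything,
# then re-enable gradients for the decoder alone.
for p in model.parameters():
    p.requires_grad = False
for p in model["decoder"].parameters():
    p.requires_grad = True

# Full fine-tuning: additionally enable encoder/quantizer gradients.
for name in ("encoder", "quantize"):
    for p in model[name].parameters():
        p.requires_grad = True

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

With all three groups enabled, every parameter ends up trainable; the optimizer then updates the full model rather than the decoder alone.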

@grsilva9 commented:
Hey,

First of all, congratulations on the great work. Absolutely outstanding.

On the topic of training on a new dataset from scratch, what are the steps/commands I should use to train stages 1 and 2?

I'd really appreciate some directions on this.
