
Training on new dataset #36

Open
tanzheen opened this issue Oct 10, 2024 · 5 comments
tanzheen commented Oct 10, 2024

Hi, I am looking to train this tokenizer on the sign language dataset CSL-Daily so that I can compress video frames into 32 tokens.
However, I am not getting very good results (attached below). I would like to check whether all I have to do is download the pretrained model and then run:

WANDB_MODE=offline accelerate launch --num_machines=1 --num_processes=2 --machine_rank=0 --main_process_ip=127.0.0.1 --main_process_port=9999 --same_network scripts/train_titok.py config=configs/training/stage1/titok_l32.yaml \
    experiment.project="titok_l32_stage1" \
    experiment.name="titok_l32_stage1_run1" \
    experiment.output_dir="titok_l32_stage1_run1" \
    training.per_gpu_batch_size=32

(attached reconstruction samples: 00862136_s-001, 00862136_s-000)

Please advise! Thanks!

@cornettoyu (Collaborator) commented:
> Hi I am looking to train this tokenizer on a sign language dataset CSL-Daily so that I can compress video frames into 32 tokens. However, I am not getting very good results (attached below). [...]

It seems to me that you only did stage 1 training, where I would expect some blurry faces, since it relies on the ImageNet-trained MaskGIT-VQGAN, which does not handle human subjects well. Did you try running stage 2 fine-tuning as well? I would expect stage 2 fine-tuning to help in your case.
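For reference, a stage 2 launch would presumably mirror the stage 1 command quoted earlier in the thread, swapping in a stage 2 config. The exact config path and experiment names below are assumptions for illustration, not verified against the repo:

```shell
# Sketch only -- the stage 2 config path is an assumption; check
# configs/training/ in the repo for the actual file name.
WANDB_MODE=offline accelerate launch --num_machines=1 --num_processes=2 \
    --machine_rank=0 --main_process_ip=127.0.0.1 --main_process_port=9999 \
    --same_network scripts/train_titok.py config=configs/training/stage2/titok_l32.yaml \
    experiment.project="titok_l32_stage2" \
    experiment.name="titok_l32_stage2_run1" \
    experiment.output_dir="titok_l32_stage2_run1" \
    training.per_gpu_batch_size=32
```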

@tanzheen (Author) commented:
Yes, I only did stage 1 training, since I only want the new 32 embedded tokens. Will stage 2 training affect the values of the 32 tokens? I read in the paper that stage 2 training only updates the decoder.

Also, do you think I should train MaskGIT on my dataset too for the proxy codes?

@tanzheen (Author) commented:
@cornettoyu do you have any input on this?

@cornettoyu (Collaborator) commented:

IMHO it would still be helpful to run stage 2 training, to see whether the reconstruction results improve. If not, then the token representation is bottlenecked by the gap between your dataset and ImageNet, and I would suggest a full fine-tuning on your dataset instead of decoder-only (e.g., you can enable the gradient for the encoder/quantizer here)
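As a rough illustration of that last point: the difference between decoder-only and full fine-tuning comes down to which parameter groups have gradients enabled. The module names below ("encoder", "quantize", "decoder") are assumptions standing in for the real TiTok submodules, and the `nn.Linear` layers are toy placeholders:

```python
import torch.nn as nn

# Toy stand-in for the tokenizer -- real submodule names may differ;
# "encoder"/"quantize"/"decoder" are assumptions for illustration.
model = nn.ModuleDict({
    "encoder": nn.Linear(8, 8),
    "quantize": nn.Linear(8, 8),
    "decoder": nn.Linear(8, 8),
})

# Decoder-only (stage-2-style) fine-tuning: freeze everything,
# then re-enable gradients for the decoder alone.
for p in model.parameters():
    p.requires_grad = False
for p in model["decoder"].parameters():
    p.requires_grad = True

# Full fine-tuning: additionally enable encoder/quantizer gradients.
for name in ("encoder", "quantize"):
    for p in model[name].parameters():
        p.requires_grad = True

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

With all three groups enabled, every parameter ends up trainable; the optimizer then updates the full model rather than the decoder alone.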

@grsilva9 commented:
Hey,

First of all, congratulations on the great work. Absolutely outstanding.

On the topic of training on a new dataset from scratch, what are the steps/commands I should use to train stages 1 and 2?

I'd really appreciate some directions on this.
