Training SyncNet: with help from DeepSeek, train directly on 512 images #92

Open
endofD opened this issue Jan 17, 2025 · 4 comments

endofD commented Jan 17, 2025

Train directly on 512 images.
scripts/train_syncnet.py: the script uses configs/syncnet/syncnet_16_pixel.yaml

  visual_encoder: # input (48, 128, 256)
    in_channels: 48
    block_out_channels: [64, 128, 256, 256, 512, 1024, 2048, 2048]
    downsample_factors: [[1, 2], 2, 2, 2, 2, 2, 2, 2]
    attn_blocks: [0, 0, 0, 0, 0, 0, 0, 0]
    dropout: 0.0

Change it to:

  visual_encoder: # input (48, 256, 512)
    in_channels: 48
    block_out_channels: [64, 128, 256, 256, 512, 1024, 2048, 2048, 2048]
    downsample_factors: [[1, 2], 2, 2, 2, 2, 2, 2, 2, 2]
    attn_blocks: [0, 0, 0, 0, 0, 0, 0, 0, 0]
    dropout: 0.0

Also change resolution: 256 to resolution: 512.

Then, during data processing, resize the faces to 512.
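The extra encoder stage can be sanity-checked with a little arithmetic. The sketch below is not taken from the repository: `output_size` is a hypothetical helper, and it assumes that each entry in `downsample_factors` divides the spatial size by that factor (an `[h, w]` pair applies per axis) and that the encoder is meant to end at a 1×1 feature map. It also assumes the model sees only the lower half of the face, which is what the original `(48, 128, 256)` comment for resolution 256 suggests.

```python
def output_size(height, width, downsample_factors):
    """Apply each downsample factor in order.

    A factor is either an int (applied to both axes) or an [h, w] pair.
    Raises ValueError if a dimension is not divisible by its factor.
    """
    for factor in downsample_factors:
        fh, fw = (factor, factor) if isinstance(factor, int) else factor
        if height % fh or width % fw:
            raise ValueError(
                f"size ({height}, {width}) is not divisible by factor ({fh}, {fw})"
            )
        height //= fh
        width //= fw
    return height, width


original = [[1, 2], 2, 2, 2, 2, 2, 2, 2]
modified = [[1, 2], 2, 2, 2, 2, 2, 2, 2, 2]

# Resolution 256: lower half of the face is 128 x 256.
print(output_size(128, 256, original))  # (1, 1)

# Resolution 512: lower half of the face is 256 x 512.
print(output_size(256, 512, modified))  # (1, 1)
```

Under these assumptions the nine-stage list matches a 256×512 input. A 128×512 input would run out of height before the last stage, so `output_size(128, 512, modified)` raises `ValueError`.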

jishunyu commented

Have you managed to train a usable model yet?

endofD commented Jan 19, 2025

So don't switch the VAE; that will probably blow up VRAM.
Train directly on 512 images.
scripts/train_syncnet.py: the script uses configs/syncnet/syncnet_16_pixel.yaml

  visual_encoder: # input (48, 128, 256)
    in_channels: 48
    block_out_channels: [64, 128, 256, 256, 512, 1024, 2048, 2048]
    downsample_factors: [[1, 2], 2, 2, 2, 2, 2, 2, 2]
    attn_blocks: [0, 0, 0, 0, 0, 0, 0, 0]
    dropout: 0.0

Change it to:

  visual_encoder: # input (48, 256, 512)
    in_channels: 48
    block_out_channels: [64, 128, 256, 256, 512, 1024, 2048, 2048, 2048]
    downsample_factors: [[1, 2], 2, 2, 2, 2, 2, 2, 2, 2]
    attn_blocks: [0, 0, 0, 0, 0, 0, 0, 0, 0]
    dropout: 0.0

Also change resolution: 256 to resolution: 512.

Then, during data processing, resize the faces to 512.

... I'll test this later.
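On the VRAM concern: even without touching the VAE, doubling the resolution quadruples the pixel-space input. The sketch below only counts input elements, assuming float32 values and a lower-half crop (so 128×256 at resolution 256 and 256×512 at resolution 512). `input_bytes` is a hypothetical helper, and real memory use also depends on the model's activations, the batch size, and the precision.

```python
def input_bytes(batch_size, channels, height, width, bytes_per_value=4):
    """Size in bytes of one input batch of shape (B, C, H, W), float32 by default."""
    return batch_size * channels * height * width * bytes_per_value


base = input_bytes(1, 48, 128, 256)   # resolution 256, lower half of the face
large = input_bytes(1, 48, 256, 512)  # resolution 512, lower half of the face

print(large // base)  # 4
```

Activations in the early convolutional stages scale with the spatial size in roughly the same way, so halving the batch size (or more) may be needed when moving to 512.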

endofD changed the title from "Training SyncNet: with help from DeepSeek, the steps look OK" to "Training SyncNet: with help from DeepSeek, train directly on 512 images" Jan 19, 2025

endofD commented Jan 19, 2025

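data_processing_pipeline.sh

The extraction resolution can be changed directly in this script:

python -m preprocess.data_processing_pipeline \
    --total_num_workers 20 \
    --per_gpu_num_workers 10 \
    --resolution 512 \
    --sync_conf_threshold 3 \
    --temp_dir temp \
    --input_dir /mnt/bn/maliva-gen-ai-v2/chunyu.li/VoxCeleb2/raw

That is, change --resolution 256 to --resolution 512. Do not leave a comment after the trailing backslash, because that breaks the line continuation.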
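If the preprocessing step is launched from Python rather than from the shell script, building the argument list explicitly avoids line-continuation pitfalls. This is a minimal sketch: `build_preprocess_command` is a hypothetical helper, and the flags and default values are copied from the command quoted in this thread, not checked against the module's argument parser.

```python
import shlex


def build_preprocess_command(input_dir, resolution=512, total_num_workers=20,
                             per_gpu_num_workers=10, sync_conf_threshold=3,
                             temp_dir="temp"):
    """Return the argv list for the preprocessing pipeline (flags as quoted above)."""
    return [
        "python", "-m", "preprocess.data_processing_pipeline",
        "--total_num_workers", str(total_num_workers),
        "--per_gpu_num_workers", str(per_gpu_num_workers),
        "--resolution", str(resolution),
        "--sync_conf_threshold", str(sync_conf_threshold),
        "--temp_dir", temp_dir,
        "--input_dir", input_dir,
    ]


cmd = build_preprocess_command(
    "/mnt/bn/maliva-gen-ai-v2/chunyu.li/VoxCeleb2/raw", resolution=512
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```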

endofD commented Jan 19, 2025

I'll test this later.
