Questions about training the UNet and SyncNet #68

Open

wangaocheng opened this issue Jan 11, 2025 · 0 comments

wangaocheng commented Jan 11, 2025

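Following the tutorial, I trained the UNet for 20,000 steps, and training ran normally. My dataset is just three videos that I recorded myself, about 30 minutes in total.
SyncNet training, however, fails with errors.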

The first error was:
RuntimeError: Given groups=1, weight of size [64, 48, 3, 3], expected input[4, 15, 128, 256] to have 48 channels, but got 15 channels instead

I then changed the config to:
visual_encoder:
  in_channels: 15 # the original value here was 48; I changed it to 15
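For reference, both channel counts in the first error are multiples of 3. The sketch below works out what they would mean under the assumption (not confirmed by the project) that the visual encoder receives RGB frames concatenated along the channel axis. The helper name `frames_from_channels` is hypothetical.

```python
def frames_from_channels(in_channels: int, channels_per_frame: int = 3) -> int:
    """Return how many frames a channel-stacked input would contain.

    Assumption (not confirmed by the project): frames are concatenated
    along the channel axis, so in_channels = num_frames * channels_per_frame.
    """
    frames, remainder = divmod(in_channels, channels_per_frame)
    if remainder:
        raise ValueError(
            f"{in_channels} channels is not a multiple of {channels_per_frame}"
        )
    return frames


expected = frames_from_channels(48)  # channel count in the original config
received = frames_from_channels(15)  # channel count of the batch in the error
print(f"config implies {expected} frames, batch contains {received} frames")
```

If that assumption holds, the config and the data loader disagree on the number of frames per sample, so the frame-count setting may be worth checking alongside `in_channels`.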

After that, I got the following error:
RuntimeError: Calculated padded input size per channel: (3 x 2). Kernel size: (3 x 3). Kernel size can't be greater than actual input size

It says the input feature map is too small. How can I fix this?

The videos are 1080x1080, training uses 256x256, and the dataset is the preprocessed high_resolution directory.
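To help locate where the feature map shrinks too far, here is a minimal sketch of the standard convolution output-size arithmetic, out = floor((in + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1. The second error corresponds to a padded width of 2 meeting a kernel width of 3. The layer list below is purely hypothetical and is not the project's actual SyncNet architecture; substitute the kernel, stride, and padding values from the real config.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int = 1,
             padding: int = 0, dilation: int = 1) -> int:
    """Output length of a convolution along one spatial axis."""
    padded = size + 2 * padding
    effective_kernel = dilation * (kernel - 1) + 1
    if padded < effective_kernel:
        # Same condition that triggers "Kernel size can't be greater
        # than actual input size".
        raise ValueError(
            f"padded input {padded} is smaller than kernel {effective_kernel}"
        )
    return (padded - effective_kernel) // stride + 1


def trace(hw: Tuple[int, int],
          layers: List[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """Return the (height, width) after each (kernel, stride, padding) layer."""
    sizes = [hw]
    for kernel, stride, padding in layers:
        h, w = sizes[-1]
        sizes.append((conv_out(h, kernel, stride, padding),
                      conv_out(w, kernel, stride, padding)))
    return sizes


# Hypothetical stack: three stride-2 downsampling convs on a 128 x 256 input.
for step, size in enumerate(trace((128, 256), [(3, 2, 1)] * 3)):
    print(step, size)
```

Running the trace with the real layer settings should show which layer first receives an input smaller than its kernel, which in turn indicates whether the input resolution or the number of downsampling stages needs to change.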
