Please share restyle_video settings #11
Replies: 5 comments 8 replies
-
In the past, some of the older style transfer systems used optical flow, but the older code wasn't really GPU optimized and took forever to process (days). Nvidia has their own GPU-accelerated optical flow, but it requires downloading their SDK, etc. I did find https://github.com/NVIDIA/flownet2-pytorch, which could be useful. At least in older experiments, I have found the optical flow method seems like the best for any AI-generated frames that get converted into video. I haven't really researched what newer methods are out there, but whatever it is, it really needs to be GPU enabled. I think the settings you are letting us play with are one step, but maybe research into optical flow might make things even better. Thoughts?
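For reference, here's a minimal sketch (not part of this repo) of GPU-accelerated optical flow using torchvision's bundled RAFT model, which avoids the NVIDIA SDK download; the frame filenames and resize dimensions are just placeholders:

```python
# Minimal sketch: dense optical flow between two adjacent frames with
# torchvision's RAFT implementation (torchvision >= 0.13). Filenames are
# hypothetical placeholders.
import torch
import torchvision.transforms.functional as TF
from torchvision.io import read_image
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval().to(device)

# Load two adjacent frames and resize to dimensions divisible by 8 (a RAFT requirement).
frame1 = TF.resize(read_image("frames/0001.png"), [520, 960]).unsqueeze(0)
frame2 = TF.resize(read_image("frames/0002.png"), [520, 960]).unsqueeze(0)

# The bundled transforms convert both frames to normalized float tensors.
frame1, frame2 = weights.transforms()(frame1, frame2)

with torch.no_grad():
    # RAFT returns a list of progressively refined flow fields; take the final one.
    flow = model(frame1.to(device), frame2.to(device))[-1]

print(flow.shape)  # (1, 2, H, W): per-pixel (dx, dy) motion from frame1 to frame2
```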
-
It's an interesting idea to use optical flow. It's a cool technology that's tempting to play with. However, I'm not sure it would fundamentally solve the problem of flicker / instability in GAN-generated video. Optical flow seems to me to be, fundamentally, state-of-the-art frame interpolation. If you have two adjacent frames that are very different from each other, there's no way any algorithm could do more than smoothly transition from one very different image to another (optical flow would move visual elements smoothly to their new locations). Consider a pathological case where VQGAN is rendering a photo of a cat on even frames, and dogs on odd frames. Optical flow can't solve that by moving elements around.

This is why I am focusing on ways to get VQGAN to have more stable training results, where small differences in init_image (adjacent frames of source video) don't lead to such hugely different results after training. It's not written up anywhere in the docs here, but I've done a fair bit of testing with different optimization algorithms, and you get very different results using AdamW vs Adam vs Adagrad, etc. There are many available in torch.optim and torch_optimizer (both packages are included in this package for this kind of evaluation). These optimizers are what tell the GAN how to change its parameters to generate the next iteration of the image so it will more closely match the CLIP prompts. Just due to my own background, I'm more interested in looking at the instability of the training rather than fixing it in post, so to speak, with sophisticated interpolation methods.

In fact, I've just realized as I write this that a change I made in v1.1 is resetting the RNG seed in my new restyle_video methods on every frame of video, which undermines my desire for stability. Maybe by a lot. I'll change that right away...
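To make the optimizer point concrete, here's a toy sketch (not this package's actual training loop; the latent tensor and loss are placeholders) of swapping between torch.optim and torch_optimizer optimizers while optimizing the same parameters:

```python
# Toy sketch: the same latent tensor optimized toward the same target takes a
# different trajectory depending on the optimizer, which is the kind of
# difference that shows up between adjacent frames of a restyled video.
import torch
import torch_optimizer  # the pip package 'torch-optimizer', with extra optimizers

def make_optimizer(name, params, lr=0.1):
    if name == "Adam":
        return torch.optim.Adam(params, lr=lr)
    if name == "AdamW":
        return torch.optim.AdamW(params, lr=lr)
    if name == "Adagrad":
        return torch.optim.Adagrad(params, lr=lr)
    if name == "RAdam":
        return torch_optimizer.RAdam(params, lr=lr)
    raise ValueError(f"unknown optimizer {name}")

torch.manual_seed(0)                      # fixed seed, so runs differ only by optimizer
latent = torch.randn(1, 256, 16, 16, requires_grad=True)  # stand-in for the VQGAN latent
target = torch.zeros_like(latent)         # stand-in for "what CLIP wants"

opt = make_optimizer("AdamW", [latent])
for step in range(15):                    # mirrors the small per-frame iteration counts
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(latent, target)
    loss.backward()
    opt.step()
print(float(loss))                        # swap "AdamW" above and compare
```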
-
Here is a clip I created with v1.1.2 with the following settings: Video link

config.init_weight = 1.0
text_prompts = 'The dragon smaug breathing fire as the village burns'
copy_audio = False
extraction_framerate = 15
output_framerate = 60
iterations = 15
current_source_frame_prompt_weight=0.1
previous_generated_frame_prompt_weight=0.0
generated_frame_init_blend=0.1
upscale_images = False
face_enhance=False
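In case it helps anyone comparing runs, here is a small helper (not part of this package) for sweeping a couple of these knobs and keeping a record of which settings produced which clip; the actual restyle call is left as a placeholder since the entry point has changed between versions:

```python
# Sweep a couple of restyle settings and save each combination to a JSON file
# alongside its output, so good settings are easy to share later.
import itertools
import json

base_settings = {
    "init_weight": 1.0,
    "text_prompts": "The dragon smaug breathing fire as the village burns",
    "extraction_framerate": 15,
    "output_framerate": 60,
    "iterations": 15,
    "current_source_frame_prompt_weight": 0.1,
    "previous_generated_frame_prompt_weight": 0.0,
    "generated_frame_init_blend": 0.1,
}

for blend, src_weight in itertools.product([0.05, 0.1, 0.2], [0.0, 0.1]):
    settings = dict(base_settings,
                    generated_frame_init_blend=blend,
                    current_source_frame_prompt_weight=src_weight)
    run_name = f"blend{blend}_src{src_weight}"
    # ... call the package's restyle_video entry point with `settings` here ...
    with open(f"{run_name}.json", "w") as f:
        json.dump(settings, f, indent=2)  # record which settings made which video
```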
-
Was talking with someone recently about smoothing out videos and they mentioned this. Not sure if you're familiar with this process:
https://www.reddit.com/r/deepdream/comments/qcyu9v/video_smoothing_deflicker_optical_flow_etc_for/
-
Another video with the same settings using v1.1.3. Video link
-
My initial motivation for creating this package was to have a modular implementation for vqgan+clip so that I could experiment with style transfers / restyling videos and try to get smoother, less flickery video.
I think that adding the ability to blend the previously generated frame into each new frame's initial image helps a lot with smoothing the video. However, the algorithm now has a lot of degrees of freedom. I've shared my current best settings in the readme and examples folder, but if you find settings that work well for you, I'd appreciate it if you share them here.
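For anyone curious what that blending amounts to, here is a minimal sketch (with assumed file paths; not the package's internal code) of mixing the previously generated frame into the next frame's init image:

```python
# Minimal sketch of init-image blending: the init image for frame N is a
# weighted mix of the current source frame and the previously generated frame.
from PIL import Image

def blended_init_image(source_frame_path, prev_generated_path, blend=0.1):
    """Return an init image that is `blend` parts previous generated frame
    and (1 - blend) parts current source frame."""
    source = Image.open(source_frame_path).convert("RGB")
    previous = Image.open(prev_generated_path).convert("RGB").resize(source.size)
    # PIL's blend: result = source * (1 - blend) + previous * blend
    return Image.blend(source, previous, alpha=blend)

# e.g. blend=0.1 corresponds to the generated_frame_init_blend=0.1 setting shared above
init_image = blended_init_image("frames/0002.png", "output/0001.png", blend=0.1)
```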
Have fun!