Please share restyle_video settings #11
Replies: 5 comments 8 replies
-
In the past, some of the older style transfer systems used optical flow, but the older code wasn't really GPU optimized and took forever to process (days). Nvidia has their own GPU-accelerated optical flow, but it requires downloading their SDK, etc. I did find https://github.com/NVIDIA/flownet2-pytorch, which could be useful. At least in older experiments, I have found the optical flow method seems like the best for any AI-generated frames that get converted into video. I haven't really researched what newer methods are out there, but whatever it is, it really needs to be GPU enabled. I think the settings you are letting us play with are one step, but maybe research into optical flow might make things even better. Thoughts?
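For reference, here's a minimal sketch (not part of this repo) of GPU-accelerated optical flow using torchvision's bundled RAFT model, which avoids the NVIDIA SDK download; the frame filenames and resize dimensions are just placeholders:

```python
# Minimal sketch: dense optical flow between two adjacent frames with
# torchvision's RAFT implementation (torchvision >= 0.13). Filenames are
# hypothetical placeholders.
import torch
import torchvision.transforms.functional as TF
from torchvision.io import read_image
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval().to(device)

# Load two adjacent frames and resize to dimensions divisible by 8 (a RAFT requirement).
frame1 = TF.resize(read_image("frames/0001.png"), [520, 960]).unsqueeze(0)
frame2 = TF.resize(read_image("frames/0002.png"), [520, 960]).unsqueeze(0)

# The bundled transforms convert both frames to normalized float tensors.
frame1, frame2 = weights.transforms()(frame1, frame2)

with torch.no_grad():
    # RAFT returns a list of progressively refined flow fields; take the final one.
    flow = model(frame1.to(device), frame2.to(device))[-1]

print(flow.shape)  # (1, 2, H, W): per-pixel (dx, dy) motion from frame1 to frame2
```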
-
It's an interesting idea to use optical flow. It's a cool technology that's tempting to play with. However, I'm not sure it would fundamentally solve the problem of flicker / instability in GAN-generated video. Optical flow seems to me to be, fundamentally, state-of-the-art frame interpolation. If you have two adjacent frames that are very different from each other, there's no way any algorithm could do more than smoothly transition from one very different image to another (optical flow would move visual elements smoothly to their new locations). Consider a pathological case where VQGAN is rendering a photo of a cat on even frames, and dogs on odd frames. Optical flow can't solve that by moving elements around.

This is why I am focusing on ways to get VQGAN to have more stable training results, where small differences in init_image (adjacent frames of source video) don't lead to such hugely different results after training. It's not written up anywhere in the docs here, but I've done a fair bit of testing with different optimization algorithms, and you get very different results using AdamW vs Adam vs Adagrad, etc. There are many available in torch.optim and torch_optimizer (both packages are included in this package for this kind of evaluation). These optimizers are what tell the GAN how to change its parameters to generate the next iteration of the image so it will more closely match the CLIP prompts. Just due to my own background, I'm more interested in looking at the instability of the training rather than fixing it in post, so to speak, with sophisticated interpolation methods.

In fact, I've just realized as I write this that a change I made in v1.1 is resetting the RNG seed in my new restyle_video methods on every frame of video, which undermines my desire for stability. Maybe by a lot. I'll change that right away...
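To make the optimizer point concrete, here's a toy sketch (not this package's actual training loop; the latent tensor and loss are placeholders) of swapping between torch.optim and torch_optimizer optimizers while optimizing the same parameters:

```python
# Toy sketch: the same latent tensor optimized toward the same target takes a
# different trajectory depending on the optimizer, which is the kind of
# difference that shows up between adjacent frames of a restyled video.
import torch
import torch_optimizer  # the pip package 'torch-optimizer', with extra optimizers

def make_optimizer(name, params, lr=0.1):
    if name == "Adam":
        return torch.optim.Adam(params, lr=lr)
    if name == "AdamW":
        return torch.optim.AdamW(params, lr=lr)
    if name == "Adagrad":
        return torch.optim.Adagrad(params, lr=lr)
    if name == "RAdam":
        return torch_optimizer.RAdam(params, lr=lr)
    raise ValueError(f"unknown optimizer {name}")

torch.manual_seed(0)                      # fixed seed, so runs differ only by optimizer
latent = torch.randn(1, 256, 16, 16, requires_grad=True)  # stand-in for the VQGAN latent
target = torch.zeros_like(latent)         # stand-in for "what CLIP wants"

opt = make_optimizer("AdamW", [latent])
for step in range(15):                    # mirrors the small per-frame iteration counts
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(latent, target)
    loss.backward()
    opt.step()
print(float(loss))                        # swap "AdamW" above and compare
```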
-
Here is a clip I created with v1.1.2 with the following settings: Video link

config.init_weight = 1.0
text_prompts = 'The dragon smaug breathing fire as the village burns'
copy_audio = False
extraction_framerate = 15
output_framerate = 60
iterations = 15
current_source_frame_prompt_weight=0.1
previous_generated_frame_prompt_weight=0.0
generated_frame_init_blend=0.1
upscale_images = False
face_enhance=False
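In case it helps anyone comparing runs, here is a small helper (not part of this package) for sweeping a couple of these knobs and keeping a record of which settings produced which clip; the actual restyle call is left as a placeholder since the entry point has changed between versions:

```python
# Sweep a couple of restyle settings and save each combination to a JSON file
# alongside its output, so good settings are easy to share later.
import itertools
import json

base_settings = {
    "init_weight": 1.0,
    "text_prompts": "The dragon smaug breathing fire as the village burns",
    "extraction_framerate": 15,
    "output_framerate": 60,
    "iterations": 15,
    "current_source_frame_prompt_weight": 0.1,
    "previous_generated_frame_prompt_weight": 0.0,
    "generated_frame_init_blend": 0.1,
}

for blend, src_weight in itertools.product([0.05, 0.1, 0.2], [0.0, 0.1]):
    settings = dict(base_settings,
                    generated_frame_init_blend=blend,
                    current_source_frame_prompt_weight=src_weight)
    run_name = f"blend{blend}_src{src_weight}"
    # ... call the package's restyle_video entry point with `settings` here ...
    with open(f"{run_name}.json", "w") as f:
        json.dump(settings, f, indent=2)  # record which settings made which video
```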
-
Was talking with someone recently about smoothing out videos and they mentioned this. Not sure if you're familiar with this process:
https://www.reddit.com/r/deepdream/comments/qcyu9v/video_smoothing_deflicker_optical_flow_etc_for/
-
Another video with the same settings using v1.1.3. Video link
-
My initial motivation for creating this package was to have a modular implementation for vqgan+clip so that I could experiment with style transfers / restyling videos and try to get smoother, less flickery video.
I think that adding the ability to blend the previously generated frame into each new frame's initial image helps a lot with smoothing the video. However, the algorithm now has a lot of degrees of freedom. I've shared my current best settings in the readme and examples folder, but if you find settings that work well for you, I'd appreciate it if you share them here.
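For anyone curious what that blending amounts to, here is a minimal sketch (with assumed file paths; not the package's internal code) of mixing the previously generated frame into the next frame's init image:

```python
# Minimal sketch of init-image blending: the init image for frame N is a
# weighted mix of the current source frame and the previously generated frame.
from PIL import Image

def blended_init_image(source_frame_path, prev_generated_path, blend=0.1):
    """Return an init image that is `blend` parts previous generated frame
    and (1 - blend) parts current source frame."""
    source = Image.open(source_frame_path).convert("RGB")
    previous = Image.open(prev_generated_path).convert("RGB").resize(source.size)
    # PIL's blend: result = source * (1 - blend) + previous * blend
    return Image.blend(source, previous, alpha=blend)

# e.g. blend=0.1 corresponds to the generated_frame_init_blend=0.1 setting shared above
init_image = blended_init_image("frames/0002.png", "output/0001.png", blend=0.1)
```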
Have fun!