This repository contains a script designed to enhance videos generated by the Wav2Lip tool.
🔥 New update: an Automatic1111 extension with big improvements can be found here: https://github.com/numz/sd-wav2lip-uhq
A result video can be found here: https://www.youtube.com/watch?v=-3WLUxz6XKM
This script enhances videos generated by the Wav2Lip tool. It improves the quality of the lip-sync videos by applying specific post-processing techniques with ControlNet 1.1.
- Stable diffusion webui automatic1111 + ControlNet 1.1 extension
- Python 3.6 or higher
- FFmpeg
- You can install Stable Diffusion Webui by following the instructions on the Stable Diffusion Webui repository.
- You can install ControlNet 1.1 extension by following the instructions on the ControlNet 1.1 repository.
- Download the ControlNet model control_v11f1e_sd15_tile from [ControlNet Models](https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main) and install it in the ControlNet models folder of Automatic1111.
- FFmpeg: download it from the official FFmpeg site and follow the instructions for your operating system.
- Clone this repository.
git clone https://github.com/numz/wav2lip_uhq.git
- Go to the directory:
cd wav2lip_uhq
- Create a venv and activate it.
python3 -m venv venv
source venv/bin/activate
- Install the required Python libraries using the command:
pip install -r requirements.txt
- Launch the Stable Diffusion webui with the "--api" flag (a quick way to check the API is reachable is shown after the run command below).
- Choose your model in the Stable Diffusion webui.
- Run using the following command:
python wav2lip_uhq.py -f <file> -i <input_file>
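
The script drives the webui through its HTTP API, so the webui must be running with --api. The snippet below is a minimal sketch for checking that the API is reachable; it assumes the default webui address http://127.0.0.1:7860, which may differ in your setup.

```python
# Quick check that the Automatic1111 API is reachable before running the script.
# Assumes the default webui address; change API_URL if you launched it elsewhere.
import requests

API_URL = "http://127.0.0.1:7860"

try:
    models = requests.get(f"{API_URL}/sdapi/v1/sd-models", timeout=10).json()
    print("API reachable, available models:")
    for model in models:
        print(" -", model["model_name"])
except requests.exceptions.RequestException as err:
    print("Could not reach the webui API, was it launched with --api?", err)
```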
Here is a description of each argument:
- -f or --file: Path to the video generated by Wav2Lip.
- -i or --input_file: Path to the original video.
- -p or --post_process: If set to False, the script only creates the images and masks used by the alternative img2img batch process described below.
This script operates in several stages to improve the quality of Wav2Lip-generated videos:
- Mask Creation: The script first creates a mask around the mouth in the video (an illustrative sketch follows this list).
- Video Quality Enhancement: It takes the low-quality Wav2Lip video and overlays the low-quality mouth onto the high-quality original video.
- ControlNet Integration: The script then sends the original image with the low-quality mouth and the mouth mask to ControlNet. Using the Automatic1111 API, it requests ControlNet to perform a render on the mouth, thereby enhancing the final quality of the video.
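
The sketch below illustrates the mask-creation idea only; it is not the repository's actual code. It assumes the face_recognition library for landmark detection and OpenCV for drawing, and the function name and blur size are placeholders.

```python
# Illustrative only: build a soft mask over the mouth of the first detected face.
import cv2
import numpy as np
import face_recognition  # assumed here for landmark detection

def mouth_mask(frame_bgr, blur_kernel=21):
    """Return a single-channel mask (0-255) covering the mouth region."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    faces = face_recognition.face_landmarks(rgb)
    mask = np.zeros(frame_bgr.shape[:2], dtype=np.uint8)
    if faces:
        lips = faces[0]["top_lip"] + faces[0]["bottom_lip"]
        hull = cv2.convexHull(np.array(lips, dtype=np.int32))
        cv2.fillConvexPoly(mask, hull, 255)
        # feather the edge so the rendered mouth blends into the original frame
        mask = cv2.GaussianBlur(mask, (blur_kernel, blur_kernel), 0)
    return mask
```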
In the file "payloads/controlNet.json" you will find the payload sent to the Automatic1111 API. Feel free to change it to your needs; the following parameters can drastically change the result (a sketch of submitting a tweaked payload follows the list):
- denoising_strength (0.2 - 1.0), default 1: a high value can create flickering, a low value can create a blurry result
- mask_blur (0 - 50) default 8
- alwayson_scripts > controlnet > args > threshold_a (1 - 32) default 1
- alwayson_scripts > controlnet > args > threshold_b (1 - 32) default 32
- inpainting_fill (0 - 3) default 2, 0 = fill, 1 = original, 2 = latent noise, 3 = latent nothing
- steps (1 - 100) default 30, number of steps for diffusion
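
As a rough sketch of how such a payload can be tweaked and submitted, the snippet below loads payloads/controlNet.json, lowers denoising_strength, attaches one base64-encoded frame and its mask, and posts the request to the webui's img2img endpoint. The frame and mask file names are just examples and may not match the folders the script actually produces.

```python
# Sketch: tweak the ControlNet payload and send one frame + mask to img2img.
import base64
import json
import requests

API_URL = "http://127.0.0.1:7860"  # default webui address

def to_b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

with open("payloads/controlNet.json") as f:
    payload = json.load(f)

payload["denoising_strength"] = 0.4            # lower = less flicker, softer detail
payload["mask_blur"] = 8
payload["init_images"] = [to_b64("images/frame_0001.png")]  # example frame
payload["mask"] = to_b64("masks/frame_0001.png")            # example mask

resp = requests.post(f"{API_URL}/sdapi/v1/img2img", json=payload, timeout=600)
enhanced = resp.json()["images"][0]            # base64-encoded result image
with open("frame_0001_enhanced.png", "wb") as f:
    f.write(base64.b64decode(enhanced))
```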
If you set -p or --post_process to "False", the script will only create the images and masks. You can then use those folders in the Automatic1111 webui in img2img Batch mode, which gives you more control over the result.
- use a high quality video as input
- use a high-quality model in the Stable Diffusion webui, like deliberate_v2
- play with the payload parameters
Contributions to this project are welcome. Please ensure any pull requests are accompanied by a detailed description of the changes made.
Specify the open-source license under which your project is published here.
Provide your contact details here for any questions or comments about the project.