This is a fitting framework implemented in PyTorch for reconstructing the face in an image or a video using a 3DMM model.
The framework only uses PyTorch modules and a differentiable renderer from pytorch3d. The whole pipeline is differentiable and can be integrated into other systems for gradient propagation.
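To illustrate what that enables (a toy sketch, not this repo's actual API: `toy_render`, the tensor shapes and the learning rate are all made up), the analysis-by-synthesis pattern looks like this: gradients from an image-space loss flow back through the renderer into the 3DMM coefficients.

```python
import torch

# Illustrative analysis-by-synthesis loop (names and shapes are placeholders,
# not this repo's API). Because every step is differentiable, gradients from
# an image-space loss flow back into the coefficients being fitted.
def toy_render(coeffs):
    # Stand-in for the PyTorch3D renderer: any differentiable map from
    # coefficients to pixel values allows gradient propagation.
    basis = torch.linspace(-1.0, 1.0, 12).view(3, 4)
    return torch.sigmoid(coeffs @ basis)

target = torch.rand(1, 4)                       # pretend observed image
coeffs = torch.zeros(1, 3, requires_grad=True)  # pretend 3DMM coefficients
opt = torch.optim.Adam([coeffs], lr=0.05)

for _ in range(200):
    loss = ((toy_render(coeffs) - target) ** 2).mean()  # photometric loss
    opt.zero_grad()
    loss.backward()                             # gradients reach coeffs
    opt.step()
```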
- 🌟 Refactored the code to make future extensions easier (other 3DMM models will be supported). The hyper-parameters have also been re-organized.
- 🌟 Added support for monocular video reconstruction.
- 🌟 Added multi-GPU support for processing long videos.
- pytorch3d. Running pytorch3d successfully on GPUs might require a specific version of PyTorch; please follow the official instructions.
- Please refer to "requirements.txt" for other dependencies.
- Basel Face Model 2009 (BFM09)
- Expression Basis. BFM09 itself does not include expressions, so an extra expression basis is needed (see the sketch below for how the two combine).
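For context (a toy illustration, not code from this repo; the dimensions are made up, not BFM09's actual sizes), a 3DMM combines the mean shape with the identity and expression bases linearly:

```python
import numpy as np

# Toy linear 3DMM (dimensions are illustrative only).
# BFM09 provides the mean shape and identity basis; the extra
# Expression Basis provides the expression directions.
n_vert, n_id, n_exp = 1000, 80, 64
mean_shape = np.zeros(3 * n_vert)
id_basis = np.random.randn(3 * n_vert, n_id)    # from BFM09
exp_basis = np.random.randn(3 * n_vert, n_exp)  # from the Expression Basis
id_coeff = np.zeros(n_id)                       # identity coefficients
exp_coeff = np.zeros(n_exp)                     # expression coefficients

# A face mesh is mean + identity offset + expression offset.
vertices = (mean_shape + id_basis @ id_coeff + exp_basis @ exp_coeff).reshape(n_vert, 3)
```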
- Clone the repo:
```
git clone https://github.com/ascust/3DMM-Fitting-Pytorch
cd 3DMM-Fitting-Pytorch
```
- Download the Basel Face Model and put "01_MorphableModel.mat" into the "BFM" folder.
- Download the Expression Basis: go to the repo, download "CoarseData" and put "Exp_Pca.bin" into the "BFM" folder.
- Convert the BFM parameters by running:
```
python convert_bfm09_data.py
```
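Conceptually, the conversion step reads the raw .mat arrays and re-saves them in a form the fitting code can load quickly. The sketch below is an assumption about what such a step looks like (the field names follow the published BFM09 .mat layout, and the output file name is hypothetical), not the actual contents of convert_bfm09_data.py:

```python
import numpy as np
from scipy.io import loadmat

# Hypothetical conversion sketch (not the actual convert_bfm09_data.py):
# read the raw BFM09 arrays and re-save them for fast loading.
bfm = loadmat('BFM/01_MorphableModel.mat')
np.savez(
    'BFM/bfm09_converted.npz',   # hypothetical output name
    shape_mean=bfm['shapeMU'],   # mean shape
    shape_basis=bfm['shapePC'],  # identity PCA basis
    shape_ev=bfm['shapeEV'],     # per-component standard deviations
    tex_mean=bfm['texMU'],       # mean texture
    tex_basis=bfm['texPC'],
    tex_ev=bfm['texEV'],
)
```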
To fit a single image, run:
```
python fit_single_img.py --img_path data/000002.jpg --res_folder results
```
The results are stored in the "results" folder.
To fit a video, simply run:
```
python fit_video.py --v_path data/sample_video.mp4 --res_folder results
```
The script will extract frames, detect facial features and fit all frames.
Fitting a video differs from fitting a single image, because the frames are not isolated. In this implementation, we first estimate the shape and texture of the target face using a subset of the frames (set by --nframes_shape). We then estimate the remaining coefficients (expression, rotation, etc.) for each frame, keeping the shape and texture coefficients fixed.
For the first frame, we use many more iterations to get a good starting point. Each remaining frame is initialized from the previous frame's estimate. Pose regularizers are also imposed to improve temporal consistency.
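As a rough, self-contained sketch of this strategy (the stand-in renderer, iteration counts and regularizer weight are all illustrative, not the repo's actual implementation):

```python
import torch

def toy_render(shape, pose):
    # Stand-in for the differentiable renderer.
    return torch.sigmoid(shape.sum() + pose)

frames = [torch.rand(()) for _ in range(30)]  # pretend video frames
shape = torch.zeros(5)                        # fixed after the shape stage

prev_pose = torch.zeros(())
for i, frame in enumerate(frames):
    pose = prev_pose.clone().requires_grad_(True)  # warm start from previous frame
    opt = torch.optim.Adam([pose], lr=0.05)
    n_iters = 100 if i == 0 else 20                # many more iterations for frame 0
    for _ in range(n_iters):
        photo = (toy_render(shape, pose) - frame) ** 2  # photometric term
        reg = 0.1 * (pose - prev_pose) ** 2             # pose regularizer
        loss = photo + reg
        opt.zero_grad()
        loss.backward()
        opt.step()
    prev_pose = pose.detach()
```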
Please check the reconstructed video in the results folder.
It can take quite a while to process a long video, so multi-GPU, multi-process fitting is also supported to accelerate things. To use this feature, simply run:
```
python fit_video.py --v_path data/sample_video.mp4 --res_folder results --ngpus 4 --nworkers 4
```
Here we use 4 GPUs and 4 workers. More workers can also be used, assigning multiple workers to each GPU. The video is split evenly and each clip is fitted independently.
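Conceptually, the parallelism looks like the sketch below (hypothetical names, with round-robin GPU assignment assumed; see fit_video.py for the real logic):

```python
import torch.multiprocessing as mp

def fit_clip(worker_id, clip, ngpus):
    device = f'cuda:{worker_id % ngpus}'  # round-robin GPU assignment (assumed)
    # ... run the per-frame fitting for this clip on `device` ...

def fit_in_parallel(all_frames, ngpus=4, nworkers=4):
    # Split the frames into contiguous clips, one per worker.
    clip_len = (len(all_frames) + nworkers - 1) // nworkers
    procs = []
    for w in range(nworkers):
        clip = all_frames[w * clip_len:(w + 1) * clip_len]
        p = mp.Process(target=fit_clip, args=(w, clip, ngpus))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
```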
There are a bunch of parameters that might require further tuning. The iteration counts for non-rigid fitting, "--first_nrf_iters" and "--rest_nrf_iters", strongly affect fitting speed, since the 3D mesh has to be rendered at every iteration. Try changing them to find a trade-off between speed and accuracy. If the result is not good, play with the other parameters as well.
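For example, a faster (and likely less accurate) video fit could lower the two iteration counts; the numbers below are purely illustrative, not recommended values:
```
python fit_video.py --v_path data/sample_video.mp4 --res_folder results --first_nrf_iters 500 --rest_nrf_iters 30
```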
The code is partially borrowed from Deep3DFaceReconstruction, which is a TensorFlow-based deep reconstruction method using CNNs. Please note that our framework does not require any pretrained deep models; we estimate the parameters directly, using the landmarks and a photometric loss as supervision.
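As a hedged illustration of that supervision (the weights, names and masking are assumptions, not the repo's exact loss), the objective combines a landmark term and a photometric term:

```python
import torch

def fitting_loss(pred_lms, gt_lms, rendered, image, mask,
                 w_lm=1.0, w_photo=1.0):
    # Landmark term: squared distance between projected and detected
    # 2D landmarks.
    lm_loss = ((pred_lms - gt_lms) ** 2).mean()
    # Photometric term: per-pixel error inside the rendered face region.
    photo_loss = (((rendered - image) * mask) ** 2).sum() / mask.sum().clamp(min=1)
    return w_lm * lm_loss + w_photo * photo_loss
```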