From b89c23358647d7b948269d908047293566e6b8cd Mon Sep 17 00:00:00 2001
From: Gong Yicheng <50798737+Gong-Yicheng@users.noreply.github.com>
Date: Tue, 30 Jul 2024 14:44:05 +0800
Subject: [PATCH] Update index.html

---
 projects/EmoTalk3D/index.html | 204 ++++++++++++++++++----------------
 1 file changed, 106 insertions(+), 98 deletions(-)

diff --git a/projects/EmoTalk3D/index.html b/projects/EmoTalk3D/index.html
index f0df608..8e3af9b 100644
--- a/projects/EmoTalk3D/index.html
+++ b/projects/EmoTalk3D/index.html
@@ -60,24 +60,29 @@
-
@@ -146,9 +149,9 @@

EmoTalk3D:


- 1State Key Laboratory for Novel Software Technology, Nanjing University, China,
- 2Fudan University, Shanghai, China
- 3Huawei Noah's Ark Lab
+ 1 State Key Laboratory for Novel Software Technology, Nanjing University, China,
+ 2 Fudan University, Shanghai, China
+ 3 Huawei Noah's Ark Lab
@@ -193,70 +196,66 @@

EmoTalk3D:
-
+
- Data Aquisition*
+ Data Acquisition
-
+
- -
-
-

Abstract

-
-

- Despite significant progress in the field of 3D talking heads, prior methods still suffer from
- multi-view consistency and a lack of emotional expressiveness. To address these issues, we collect
- EmoTalk3D dataset with calibrated multi-view videos, emotional
- annotations, and per-frame 3D geometry. Besides, We present a novel approach for synthesizing
- emotion-controllable, featuring enhanced lip synchronization and rendering quality.
-

-

- By training on the EmoTalk3D dataset, we propose a
- "Speech-to-Geometry-to-Appearance"
- mapping framework that first predicts faithful 3D geometry sequence from the audio features, then
- the appearance of a 3D talking head represented by 4D Gaussians is synthesized from the predicted
- geometry. The appearance is further disentangled into canonical and dynamic Gaussians, learned
- from multi-view videos, and fused to render free-view talking head animation.
-

-

- Moreover, our model extracts emotion labels from the input speech and enables controllable emotion
- in the generated talking heads. Our method exhibits improved rendering quality and stability in
- lip motion generation while capturing dynamic facial details such as wrinkles and subtle
- expressions.
-

-
-
-
-

-
- - - -
+ +
+
+

Abstract

+
+

+ Despite significant progress in the field of 3D talking heads, prior methods still suffer from
+ multi-view inconsistency and a lack of emotional expressiveness. To address these issues, we collect
+ the EmoTalk3D dataset with calibrated multi-view videos, emotional
+ annotations, and per-frame 3D geometry. In addition, we present a novel approach for synthesizing
+ emotion-controllable 3D talking heads, featuring enhanced lip synchronization and rendering quality.
+

+

+ By training on the EmoTalk3D dataset, we propose a
+ "Speech-to-Geometry-to-Appearance"
+ mapping framework that first predicts a faithful 3D geometry sequence from the audio features and then
+ synthesizes the appearance of a 3D talking head, represented by 4D Gaussians, from the predicted
+ geometry. The appearance is further disentangled into canonical and dynamic Gaussians, learned
+ from multi-view videos, and fused to render free-view talking head animation.
+
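To make the data flow concrete, a minimal PyTorch-style sketch of the Speech-to-Geometry-to-Appearance idea follows: per-frame audio features are mapped to a 3D point cloud sequence, and the geometry then drives per-frame (dynamic) Gaussian attributes that are fused with a shared canonical set before rendering. All class names, dimensions, and the additive fusion are illustrative assumptions rather than the released implementation, and the Gaussian rasterization step is omitted.

# Hypothetical sketch of the Speech-to-Geometry-to-Appearance flow (not the authors' code).
import torch
import torch.nn as nn


class SpeechToGeometry(nn.Module):
    """Maps per-frame audio features to a per-frame 3D point cloud."""

    def __init__(self, audio_dim=256, num_points=2048):
        super().__init__()
        self.num_points = num_points
        self.mlp = nn.Sequential(
            nn.Linear(audio_dim, 512), nn.ReLU(),
            nn.Linear(512, num_points * 3),
        )

    def forward(self, audio_feats):                  # (T, audio_dim)
        pts = self.mlp(audio_feats)                  # (T, num_points * 3)
        return pts.view(-1, self.num_points, 3)      # (T, num_points, 3)


class GeometryToAppearance(nn.Module):
    """Predicts dynamic Gaussian attributes from geometry and fuses them
    with a shared canonical set (fusion shown as a simple residual add)."""

    def __init__(self, num_points=2048, attr_dim=32):
        super().__init__()
        self.canonical = nn.Parameter(torch.zeros(num_points, attr_dim))  # static part
        self.dynamic = nn.Sequential(                                     # per-frame part
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, attr_dim),
        )

    def forward(self, geometry):                     # (T, num_points, 3)
        dyn = self.dynamic(geometry)                 # (T, num_points, attr_dim)
        return self.canonical.unsqueeze(0) + dyn     # fused per-frame attributes


if __name__ == "__main__":
    T = 8
    audio_feats = torch.randn(T, 256)                # stand-in audio features
    geometry = SpeechToGeometry()(audio_feats)       # (8, 2048, 3) point clouds
    gaussians = GeometryToAppearance()(geometry)     # (8, 2048, 32) attributes
    print(geometry.shape, gaussians.shape)           # free-view rendering omitted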

+

+ Moreover, our model extracts emotion labels from the input speech and enables controllable emotion
+ in the generated talking heads. Our method exhibits improved rendering quality and stability in
+ lip motion generation while capturing dynamic facial details such as wrinkles and subtle
+ expressions.
+

+
+
+
+ +
-

Method

+

Method

- Overall Pipeline.The pipeline consists of five modules:
+ Overall Pipeline. The pipeline consists of five modules:
1) Emotion-Content Disentangle Encoder that parses content features and emotion features from input speech;
2) Speech-to-Geometry Network (S2GNet) that predicts dynamic 3D point clouds from the features;
@@ -268,64 +267,78 @@
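Read as interfaces, the first two modules amount to a function from speech features to disentangled content and emotion streams, which the geometry network then consumes; module 2 corresponds to the speech-to-geometry step sketched under the abstract. The class names, feature sizes, and the GRU decoder below are assumptions for illustration only, not the paper's architecture details.

# Hypothetical interfaces for modules 1) and 2); names and sizes are illustrative.
import torch
import torch.nn as nn


class EmotionContentEncoder(nn.Module):
    """Disentangles input speech features into content and emotion streams,
    plus a clip-level emotion label prediction."""

    def __init__(self, speech_dim=768, content_dim=256, emotion_dim=64, num_emotions=8):
        super().__init__()
        self.content_head = nn.Linear(speech_dim, content_dim)
        self.emotion_head = nn.Linear(speech_dim, emotion_dim)
        self.emotion_cls = nn.Linear(emotion_dim, num_emotions)

    def forward(self, speech):                                # (T, speech_dim)
        content = self.content_head(speech)                   # articulation-related features
        emotion = self.emotion_head(speech)                   # expression-related features
        emotion_logits = self.emotion_cls(emotion.mean(dim=0))  # one emotion label per clip
        return content, emotion, emotion_logits


class S2GNet(nn.Module):
    """Predicts a dynamic 3D point cloud for every frame from both streams."""

    def __init__(self, content_dim=256, emotion_dim=64, num_points=2048):
        super().__init__()
        self.num_points = num_points
        self.temporal = nn.GRU(content_dim + emotion_dim, 512, batch_first=True)
        self.to_points = nn.Linear(512, num_points * 3)

    def forward(self, content, emotion):                      # each (T, dim)
        x = torch.cat([content, emotion], dim=-1).unsqueeze(0)    # (1, T, C)
        h, _ = self.temporal(x)                                # temporally smoothed features
        return self.to_points(h).view(-1, self.num_points, 3)     # (T, num_points, 3)

Separating the content stream from the emotion stream is what allows the emotion signal to be overridden at inference time, which matches the controllable-emotion behaviour described in the abstract.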

Method

+ +
-

Dataset

+

Dataset

- We establish EmoTalk3D dataset, an emotion-annotated multi-view talking head dataset with per-frame 3D
+ We establish the EmoTalk3D dataset, an emotion-annotated multi-view talking head dataset with per-frame 3D
facial shapes.
- EmoTalk3D dataset provides audio, per-frame multi-view images, camera paramters and corresponding
+ The EmoTalk3D dataset provides audio, per-frame multi-view images, camera parameters and corresponding
reconstructed 3D shapes. The data have been released to the public for non-commercial research purposes.
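Concretely, one clip in such a dataset pairs the audio track and its emotion annotation with per-frame multi-view imagery, calibrated camera parameters, and the reconstructed per-frame geometry. The record below only summarizes those modalities; the field names and shapes are hypothetical, and the released files define the actual layout.

# Hypothetical per-clip record; field names and shapes are assumptions, not the released layout.
from dataclasses import dataclass
import numpy as np


@dataclass
class EmoTalk3DClip:
    audio: np.ndarray          # (num_audio_samples,) speech waveform for the clip
    emotion_label: str         # clip-level emotion annotation
    images: np.ndarray         # (num_frames, num_views, H, W, 3) multi-view RGB frames
    intrinsics: np.ndarray     # (num_views, 3, 3) camera intrinsic matrices
    extrinsics: np.ndarray     # (num_views, 4, 4) calibrated camera poses
    shapes: np.ndarray         # (num_frames, num_points, 3) per-frame reconstructed 3D geometry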

+
+
+
+ + + +
+
+

Data Acquisition

+

For data acquisition, please fill out the License Agreement
- and send it via email by clicking this link.
+ and send it to nju3dv@nju.edu.cn.
+ The email subject format is [EmoTalk3D Dataset Request]. We recommend applying with a *.edu e-mail address, which is more likely to be authorized.

- + +
-

Results

+

Results

-
-
-
@@ -335,7 +348,7 @@

Results

-

In-the-wild Audio-driven

+

In-the-wild Audio-driven

+ - -
-
+
-
-

Our video

+

Our video

-
- -
- +
+ +
(back to top) -
- -
+ +

BibTeX

-
@article{he2024emotalk3d,
-  author    = {He, Qianyun and Ji, Xinya and Gong, Yicheng and Lu, Yuanxun and Diao, Zhengyu and Huang, Linjia and Yao, Yao and Zhu, Siyu and Ma, Zhan and Xu, Songchen and Wu, Xiaofei and Zhang, Zixiao and Cao, Xun and Zhu, Hao},
-  title     = {EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head},
-  journal   = {ECCV},
-  year      = {2024},
+      
@inproceedings{he2024emotalk3d,
+        title={EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head},
+        author={He, Qianyun and Ji, Xinya and Gong, Yicheng and Lu, Yuanxun and Diao, Zhengyu and Huang, Linjia and Yao, Yao and Zhu, Siyu and Ma, Zhan and Xu, Songchen and Wu, Xiaofei and Zhang, Zixiao and Cao, Xun and Zhu, Hao},
+        booktitle={European Conference on Computer Vision (ECCV)},
+        year={2024}      
 }