Can I help to add Chinese lip sync for this project? #82

Closed
yiouyou opened this issue Jan 12, 2025 · 7 comments

Comments

@yiouyou

yiouyou commented Jan 12, 2025

If so, what should I do?

@met4citizen
Owner

If you're interested in contributing, you can start by reading Appendix C: Create A New Lip-sync Module.

Creating a new lip-sync module is quite straightforward for phonetically orthographic languages, like Finnish, in which most letters represent specific phonemes. However, as you well know, Chinese is morphosyllabic, not phonetically orthographic, and very different from English too, so the approaches used in the currently available lip-sync modules might not be the best place to start. Therefore, if you don't already have a clear idea of how to do words-to-visemes conversion in Chinese, take your time and perhaps see how others have approached that specific task in other similar projects.

Once you have a new lip-sync module file for Chinese, e.g. "lipsync-zh.mjs," you can open a new Pull Request, and I can help you finalize and test it. I don't speak Chinese, so I'm unable to give you advice on the pre-processor method or the actual words-to-visemes conversion, but otherwise, I'm happy to answer questions and help where I can.
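For orientation, here is a minimal skeleton of what such a module might look like, modeled on the existing modules (method names and the returned object shape follow Appendix C). The tiny pinyin-to-viseme table is only an illustrative placeholder, not a real Mandarin conversion:

```js
// lipsync-zh.mjs — hypothetical skeleton, modeled on the existing
// lip-sync modules. The syllable-to-viseme table below is only an
// illustrative placeholder, NOT a real Mandarin conversion.

class LipsyncZh {

  constructor() {
    // Placeholder mapping from a few pinyin syllables to Oculus viseme
    // names. A real module would first convert hanzi to pinyin (e.g.
    // with a pinyin library) and then map initials/finals to visemes.
    this.visemeMap = {
      'ni': ['nn', 'I'],
      'hao': ['kk', 'aa', 'O'],
      'ma': ['PP', 'aa']
    };
  }

  // Normalize the input before conversion (whitespace, numbers,
  // punctuation, etc.).
  preProcessText(s) {
    return s.replace(/\s+/g, ' ').trim();
  }

  // Convert one word/syllable into visemes with relative timings.
  wordsToVisemes(w) {
    const o = { words: w, visemes: [], times: [], durations: [] };
    let t = 0;
    for (const v of (this.visemeMap[w] || ['aa'])) { // crude fallback
      o.visemes.push(v);
      o.times.push(t);
      o.durations.push(1); // relative duration units
      t += 1;
    }
    return o;
  }

}

export { LipsyncZh };
```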

@yiouyou
Author

yiouyou commented Jan 13, 2025

Thank you for that information, I'll look into it. I also found that Google TTS is not the best option for the Chinese language. Could we support Edge TTS easily?

@met4citizen
Owner

Edge TTS is essentially a wrapper that calls Microsoft's undocumented WebSocket API for the Azure TTS service while pretending to be an Edge browser and using Microsoft's internal client token. Technically, you could make Edge TTS work with TalkingHead by setting the WordBoundary request header and passing the audio and word-level timestamps to TalkingHead's speakAudio method. The real problem with Edge TTS is that it is not legal to use. Here's a quote from Microsoft regarding the use of Edge TTS: "To legally provide TTS services, we recommend using Microsoft's official Azure AI Speech platform."
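For illustration only, that route would look roughly like the sketch below, assuming the speakAudio payload shape described in the README (an audio buffer plus parallel words/wtimes/wdurations arrays, times in milliseconds). Here `head` is your TalkingHead instance, and `ttsArrayBuffer` plus the word boundary data would come from whatever TTS you use:

```js
// Hypothetical sketch: feed externally generated speech into TalkingHead.
// Assumes the speakAudio payload shape from the README; `head` is an
// existing TalkingHead instance, `ttsArrayBuffer` holds the TTS audio.
const audioCtx = new AudioContext();
const audioBuffer = await audioCtx.decodeAudioData(ttsArrayBuffer);

head.speakAudio({
  audio: audioBuffer,
  words:      ['你好', '世界'],  // word boundaries reported by the TTS
  wtimes:     [0, 600],          // word start times in milliseconds
  wdurations: [550, 700]         // word durations in milliseconds
});
```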

Note that if you decide to use the official Microsoft Azure TTS, you don't need to create a new lip-sync module, as Azure can provide viseme IDs directly. This feature is also supported in Chinese. I just tried the voice "zh-CN-XiaoxiaoNeural," and the lip-sync worked without any changes. You can find a code example of how to use Azure TTS (Microsoft Speech SDK) with the TalkingHead class in the test app ./index.html (see the microsoftSpeak and microsoftProcessQueue methods).
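As a hedged illustration of the viseme part only (the test app remains the authoritative reference, and it also handles mapping Azure's numeric viseme IDs to the avatar's viseme names), synthesis with the Speech SDK looks roughly like this:

```js
// Minimal hypothetical sketch using the Microsoft Speech SDK
// (microsoft-cognitiveservices-speech-sdk). It collects viseme IDs and
// offsets during synthesis; see ./index.html for the full working code.
const speechConfig = SpeechSDK.SpeechConfig.fromSubscription(AZURE_KEY, AZURE_REGION);
speechConfig.speechSynthesisVoiceName = 'zh-CN-XiaoxiaoNeural';

// Pass null as the audio config so the SDK doesn't play audio itself.
const synthesizer = new SpeechSDK.SpeechSynthesizer(speechConfig, null);

const visemes = [], vtimes = [];
synthesizer.visemeReceived = (s, e) => {
  visemes.push(e.visemeId);            // Azure viseme ID
  vtimes.push(e.audioOffset / 10000);  // 100-ns ticks -> milliseconds
};

synthesizer.speakTextAsync('你好，世界！', result => {
  // result.audioData is an ArrayBuffer with the synthesized audio;
  // decode it and pass it to speakAudio together with the viseme data.
  synthesizer.close();
}, error => {
  console.error(error);
  synthesizer.close();
});
```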

@yiouyou
Author

yiouyou commented Jan 14, 2025

Wow, if "zh-CN-XiaoxiaoNeural" works directly, would you mind adding a minimal code example showing how to use Azure with Chinese? Thanks!

@met4citizen
Owner

met4citizen commented Jan 14, 2025

Yes, the TalkingHead speakAudio method accepts viseme IDs, so if the TTS engine can provide them, there is no need for words-to-visemes conversion. The support has been there for a year already, so this is not a new feature. However, as far as I know, Azure TTS is still the only one that can output viseme IDs. Google TTS, ElevenLabs, and all the others can't provide them and therefore require language-specific lip-sync modules.

Azure TTS can provide viseme IDs for over a hundred languages and dialects. Chinese is just one of them and works exactly the same way as any other language, so I see no need for a Chinese-specific example. For an example of how to use Azure in your own app, use the test app (./index.html) as your reference. All the code you need is there, and I don't want to maintain it in two different places.

If you don't know where to start, here's a rough outline:

1. Include the Microsoft Speech SDK in your app from a CDN (see index.html, line 403).
2. Copy-paste the "// Speak using Microsoft" section from the test app (it includes the microsoftSpeak and microsoftProcessQueue functions).
3. Instead of calling the speakText method in your app, call the microsoftSpeak function (see the sketch below).

That's it. Of course, you'll need to make some minor modifications to the code, set the Azure voice ID, decide how to manage your Azure API key in your app, and so on, but that shouldn't be a problem.
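In other words, the substitution in step 3 amounts to something like this (hypothetical wiring; the exact function signatures are in ./index.html):

```js
// Hypothetical wiring after copying the "// Speak using Microsoft"
// section from ./index.html. Configure your own Azure key, region, and
// a Chinese voice such as 'zh-CN-XiaoxiaoNeural' where the copied code
// expects them, then replace speakText calls:

// Before:
// head.speakText('你好，世界！');

// After:
microsoftSpeak('你好，世界！');
```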

@yiouyou
Author

yiouyou commented Jan 14, 2025

Thanks for the detailed introduction. Also, as far as I know, the AWS Polly TTS service outputs visemes as well, just like Azure: https://docs.aws.amazon.com/polly/latest/dg/viseme.html
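For reference, requesting viseme speech marks from Polly looks roughly like this (a sketch based on the linked docs, using the AWS SDK for JavaScript v3; the voice and region are illustrative):

```js
import { PollyClient, SynthesizeSpeechCommand } from '@aws-sdk/client-polly';

const polly = new PollyClient({ region: 'us-east-1' });

// Viseme speech marks require OutputFormat 'json'; Polly then returns
// newline-delimited JSON like {"time":125,"type":"viseme","value":"p"}.
const marks = await polly.send(new SynthesizeSpeechCommand({
  Text: '你好，世界',
  VoiceId: 'Zhiyu',            // Polly's Mandarin Chinese voice
  OutputFormat: 'json',
  SpeechMarkTypes: ['viseme']
}));
const visemeMarks = (await marks.AudioStream.transformToString())
  .trim().split('\n').map(JSON.parse);

// A second request with OutputFormat 'mp3' (and no SpeechMarkTypes)
// returns the actual audio to play alongside the visemes.
```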

@met4citizen
Owner

You are right, Amazon Polly can also output visemes. I always forget it because I have never used it myself. Thank you!
