Can I help to add Chinese lip sync for this project? #82

Closed
yiouyou opened this issue Jan 12, 2025 · 7 comments

Comments

@yiouyou

yiouyou commented Jan 12, 2025

If so, what should I do?

@met4citizen
Owner

If you're interested in contributing, you can start by reading Appendix C: Create A New Lip-sync Module.

Creating a new lip-sync module is quite straightforward for phonetically orthographic languages, like Finnish, in which most letters represent specific phonemes. However, as you well know, Chinese is morphosyllabic, not phonetically orthographic, and very different from English too, so the approaches used in the currently available lip-sync modules might not be the best place to start. Therefore, if you don't already have a clear idea of how to do words-to-visemes conversion in Chinese, take your time and perhaps see how others have approached that specific task in other similar projects.

Once you have a new lip-sync module file for Chinese, e.g. "lipsync-zh.mjs," you can open a new Pull Request, and I can help you finalize and test it. I don't speak Chinese, so I'm unable to give you advice on the pre-processor method or the actual words-to-visemes conversion, but otherwise, I'm happy to answer questions and help where I can.
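For orientation, here is a minimal skeleton of what such a module might look like, modeled on the existing modules (method names and the returned object shape follow Appendix C). The tiny pinyin-to-viseme table is only an illustrative placeholder, not a real Mandarin conversion:

```js
// lipsync-zh.mjs — hypothetical skeleton, modeled on the existing
// lip-sync modules. The syllable-to-viseme table below is only an
// illustrative placeholder, NOT a real Mandarin conversion.

class LipsyncZh {

  constructor() {
    // Placeholder mapping from a few pinyin syllables to Oculus viseme
    // names. A real module would first convert hanzi to pinyin (e.g.
    // with a pinyin library) and then map initials/finals to visemes.
    this.visemeMap = {
      'ni': ['nn', 'I'],
      'hao': ['kk', 'aa', 'O'],
      'ma': ['PP', 'aa']
    };
  }

  // Normalize the input before conversion (whitespace, numbers,
  // punctuation, etc.).
  preProcessText(s) {
    return s.replace(/\s+/g, ' ').trim();
  }

  // Convert one word/syllable into visemes with relative timings.
  wordsToVisemes(w) {
    const o = { words: w, visemes: [], times: [], durations: [] };
    let t = 0;
    for (const v of (this.visemeMap[w] || ['aa'])) { // crude fallback
      o.visemes.push(v);
      o.times.push(t);
      o.durations.push(1); // relative duration units
      t += 1;
    }
    return o;
  }

}

export { LipsyncZh };
```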

@yiouyou
Author

yiouyou commented Jan 13, 2025

Thank you for that information, I'll look into it. I also found that Google TTS is not the best option for the Chinese language. Could we support Edge TTS easily?

@met4citizen
Owner

Edge TTS is essentially a wrapper that calls Microsoft's undocumented WebSocket API for the Azure TTS service while pretending to be an Edge browser and using Microsoft's internal client token. Technically, you could make Edge TTS work with TalkingHead by setting the WordBoundary request header and passing the audio and word-level timestamps to TalkingHead's speakAudio method. The real problem with Edge TTS is that it is not legal to use. Here's a quote from Microsoft regarding the use of Edge TTS: "To legally provide TTS services, we recommend using Microsoft's official Azure AI Speech platform."
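For illustration only, that route would look roughly like the sketch below, assuming the speakAudio payload shape described in the README (an audio buffer plus parallel words/wtimes/wdurations arrays, times in milliseconds). Here `head` is your TalkingHead instance, and `ttsArrayBuffer` plus the word boundary data would come from whatever TTS you use:

```js
// Hypothetical sketch: feed externally generated speech into TalkingHead.
// Assumes the speakAudio payload shape from the README; `head` is an
// existing TalkingHead instance, `ttsArrayBuffer` holds the TTS audio.
const audioCtx = new AudioContext();
const audioBuffer = await audioCtx.decodeAudioData(ttsArrayBuffer);

head.speakAudio({
  audio: audioBuffer,
  words:      ['你好', '世界'],  // word boundaries reported by the TTS
  wtimes:     [0, 600],          // word start times in milliseconds
  wdurations: [550, 700]         // word durations in milliseconds
});
```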

Note that if you decide to use the official Microsoft Azure TTS, you don't need to create a new lip-sync module, as Azure can provide viseme IDs directly. This feature is also supported in Chinese. I just tried the voice "zh-CN-XiaoxiaoNeural," and the lip-sync worked without any changes. You can find a code example of how to use Azure TTS (Microsoft Speech SDK) with the TalkingHead class in the test app ./index.html (see the microsoftSpeak and microsoftProcessQueue methods).
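As a hedged illustration of the viseme part only (the test app remains the authoritative reference, and it also handles mapping Azure's numeric viseme IDs to the avatar's viseme names), synthesis with the Speech SDK looks roughly like this:

```js
// Minimal hypothetical sketch using the Microsoft Speech SDK
// (microsoft-cognitiveservices-speech-sdk). It collects viseme IDs and
// offsets during synthesis; see ./index.html for the full working code.
const speechConfig = SpeechSDK.SpeechConfig.fromSubscription(AZURE_KEY, AZURE_REGION);
speechConfig.speechSynthesisVoiceName = 'zh-CN-XiaoxiaoNeural';

// Pass null as the audio config so the SDK doesn't play audio itself.
const synthesizer = new SpeechSDK.SpeechSynthesizer(speechConfig, null);

const visemes = [], vtimes = [];
synthesizer.visemeReceived = (s, e) => {
  visemes.push(e.visemeId);            // Azure viseme ID
  vtimes.push(e.audioOffset / 10000);  // 100-ns ticks -> milliseconds
};

synthesizer.speakTextAsync('你好，世界！', result => {
  // result.audioData is an ArrayBuffer with the synthesized audio;
  // decode it and pass it to speakAudio together with the viseme data.
  synthesizer.close();
}, error => {
  console.error(error);
  synthesizer.close();
});
```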

@yiouyou
Author

yiouyou commented Jan 14, 2025

Wow, if "zh-CN-XiaoxiaoNeural" works directly, would you mind adding a minimal code example showing how to use Azure with Chinese? Thanks!

@met4citizen
Owner

met4citizen commented Jan 14, 2025

Yes, the TalkingHead speakAudio method accepts viseme IDs, so if the TTS engine can provide them, there is no need for words-to-visemes conversion. The support has been there for a year already, so this is not a new feature. However, as far as I know, Azure TTS is still the only one that can output viseme IDs. Google TTS, ElevenLabs, and all the others can't provide them and therefore require language-specific lip-sync modules.

Azure TTS can provide viseme IDs for over a hundred languages and dialects. Chinese is just one of them and works exactly the same way as any other language, so I see no need for a Chinese-specific example. For an example of how to use Azure in your own app, use the test app (./index.html) as your reference. All the code you need is there, and I don't want to maintain it in two different places.

If you don't know where to start, here's a rough outline:

1. Include the Microsoft Speech SDK in your app from a CDN (see index.html, line 403).
2. Copy-paste the "// Speak using Microsoft" section from the test app (it includes the microsoftSpeak and microsoftProcessQueue functions).
3. Instead of calling the speakText method in your app, call the microsoftSpeak function (see the sketch below).

That's it. Of course, you'll need to make some minor modifications to the code, set the Azure voice ID, decide how to manage your Azure API key in your app, and so on, but that shouldn't be a problem.
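In other words, the substitution in step 3 amounts to something like this (hypothetical wiring; the exact function signatures are in ./index.html):

```js
// Hypothetical wiring after copying the "// Speak using Microsoft"
// section from ./index.html. Configure your own Azure key, region, and
// a Chinese voice such as 'zh-CN-XiaoxiaoNeural' where the copied code
// expects them, then replace speakText calls:

// Before:
// head.speakText('你好，世界！');

// After:
microsoftSpeak('你好，世界！');
```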

@yiouyou
Author

yiouyou commented Jan 14, 2025

Thanks for the detailed introduction. Also, as far as I know, the AWS Polly TTS service outputs visemes as well, just like Azure: https://docs.aws.amazon.com/polly/latest/dg/viseme.html
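For reference, requesting viseme speech marks from Polly looks roughly like this (a sketch based on the linked docs, using the AWS SDK for JavaScript v3; the voice and region are illustrative):

```js
import { PollyClient, SynthesizeSpeechCommand } from '@aws-sdk/client-polly';

const polly = new PollyClient({ region: 'us-east-1' });

// Viseme speech marks require OutputFormat 'json'; Polly then returns
// newline-delimited JSON like {"time":125,"type":"viseme","value":"p"}.
const marks = await polly.send(new SynthesizeSpeechCommand({
  Text: '你好，世界',
  VoiceId: 'Zhiyu',            // Polly's Mandarin Chinese voice
  OutputFormat: 'json',
  SpeechMarkTypes: ['viseme']
}));
const visemeMarks = (await marks.AudioStream.transformToString())
  .trim().split('\n').map(JSON.parse);

// A second request with OutputFormat 'mp3' (and no SpeechMarkTypes)
// returns the actual audio to play alongside the visemes.
```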

@met4citizen
Owner

You are right, Amazon Polly can also output visemes. I always forget it because I have never used it myself. Thank you!
