Please add support for Microsoft's leaked offline natural voices #31

Open
1001ruchka opened this issue Nov 21, 2024 · 4 comments

@1001ruchka

1001ruchka commented Nov 21, 2024

A few offline natural voices that cannot be downloaded from the Microsoft website leaked together with the game Senua's Saga: Hellblade II. Among them are Russian and Ukrainian local voices.
The MultiTTS program (t.me/MultiTTS_channel) has added support for these voices.

Unfortunately, NaturalVoiceSAPIAdapter gives an error when trying to use them:

Speech synthesis error:
Local TTS speak failed, with TTS error code = EMBEDDED_TTS_ERROR_WRONG_DECRYPTION_KEY. Wrong embedded speech synthesis model key.

I tested on the Russian voice:
https://drive.google.com/drive/folders/1o8Y1oAm-3RCs_efoNjo075iB0tarilzb

@1001ruchka 1001ruchka changed the title Please add support for Microsoft's leaked natural voices Please add support for Microsoft's leaked local natural voices Nov 21, 2024
@1001ruchka 1001ruchka changed the title Please add support for Microsoft's leaked local natural voices Please add support for Microsoft's leaked offline natural voices Nov 21, 2024
@1001ruchka
Author

1001ruchka commented Nov 21, 2024

Here is the configuration file that MultiTTS uses for this voice. It may contain the encryption key.

microsoft:
- !!org.nobody.multitts.tts.speaker.Speaker
  avatar: null
  code: ru-RU-SvetlanaNeural
  desc: ru-RU-SvetlanaNeural
  extendUI: null
  gender: 0
  locale: ru-RU
  name: Svetlana
  note: null
  param: |-
    Microsoft Server Speech Text to Speech Voice (ru-RU, SvetlanaNeural)
    BGJJwTRVfhRYIZq0xtySkIQlJbmBDsX6GsVyDRFHM0AzOjRvZ7ELI5kgzCUWYAKhTk99WDj5aOSWY@KHnffCDqlB008FmEUZHXM2lmKaFnfffnu4r8eiLUyYuH1uf4fSYA39OKQUZ9wY
  pitch: 1.0
  sampleRate: 24000
  speed: 1.0
  type: 0
  volume: 1.0
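
A minimal sketch of how the two lines of the `param` field could be separated, assuming the first line is the voice name and the second line is the candidate key. The function name `split_param` and the placeholder key are hypothetical and only for illustration; this is not how MultiTTS itself reads the file.

```python
def split_param(param: str) -> tuple[str, str]:
    """Split a MultiTTS-style `param` block into (voice name, candidate key).

    Assumption: the block has exactly two non-empty lines, name first, key second.
    """
    lines = [line.strip() for line in param.splitlines() if line.strip()]
    if len(lines) != 2:
        raise ValueError(f"expected 2 non-empty lines, got {len(lines)}")
    return lines[0], lines[1]


# Placeholder key; substitute the second line of the real `param` value.
example_param = """Microsoft Server Speech Text to Speech Voice (ru-RU, SvetlanaNeural)
PLACEHOLDER-KEY"""

voice_name, candidate_key = split_param(example_param)
print(voice_name)
print(candidate_key)
```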

@gexgd0419
Owner

This is very interesting. Thanks.

With the correct key BGJJwTRVfhRYIZq0xtySkIQlJbmBDsX6GsVyDRFHM0AzOjRvZ7ELI5kgzCUWYAKhTk99WDj5aOSWY@KHnffCDqlB008FmEUZHXM2lmKaFnfffnu4r8eiLUyYuH1uf4fSYA39OKQUZ9wY in the config file, the voices do work.

However, the voice enumerator in this adapter assumes that all local natural voices are Narrator voices, so it uses the key for Narrator voices, which is incorrect for those leaked voices.

To utilize those voices in the current version of this adapter, you can bypass the voice enumerator, and put the voice information directly into the registry.

In Registry Editor, create a registry key under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens. The key's name will be the voice's internal ID. Then create the following subkeys and values inside this key:

  • String value (Default): display name of the voice, e.g. Microsoft Svetlana
  • String value CLSID: {013ab33b-ad1a-401c-8bee-f6e2b046a94e}
  • Subkey: Attributes
    • String value Language: hexadecimal language ID of the voice, e.g. 409 for English (US), 419 for Russian (Russia). Usually you can find an INI file named with a decimal number, e.g. 1049.INI, inside the voice folder. Convert that number (1049) to hexadecimal and you get the language ID (419).
    • String value Gender: Male or Female.
    • String value Age: Adult, Senior, Child, or Teen.
  • Subkey: NaturalVoiceConfig
    • String value Path: path of the voice folder (e.g. path to ru-RU-SvetlanaNeural)
    • String value Key: the decryption key
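
The decimal-to-hexadecimal conversion for the Language value can be sketched in Python as follows. The helper name `ini_name_to_language` is hypothetical, and it assumes the INI file name is a plain decimal number followed by an extension.

```python
def ini_name_to_language(ini_name: str) -> str:
    """Turn an INI file name such as "1049.INI" into the hex language ID string."""
    stem = ini_name.rsplit(".", 1)[0]  # drop the extension, keep "1049"
    return format(int(stem), "x")      # decimal -> lowercase hex, no "0x" prefix


print(ini_name_to_language("1049.INI"))  # 419 (Russian, Russia)
print(ini_name_to_language("1033.INI"))  # 409 (English, US)
```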

After that, the voice will be shown in the voice list of 64-bit programs. To make it work in 32-bit programs, you should also create the same registry key under HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Speech\Voices\Tokens.

Create one key for each voice you want to add. This has to be done manually, until I update the voice enumerator so that it can support such voices.
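
For several voices, the manual steps could be scripted. Below is a rough sketch that only builds the text of a .reg file for both the 64-bit and the 32-bit (WOW6432Node) locations; it does not touch the registry itself. The function names, the voice ID `SvetlanaDemo`, the folder path, the Female/Adult attributes, and the placeholder key are all assumptions for illustration. The CLSID and the key layout are the ones listed above.

```python
CLSID = "{013ab33b-ad1a-401c-8bee-f6e2b046a94e}"
TOKEN_ROOTS = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Speech\Voices\Tokens",
)


def reg_quote(value: str) -> str:
    """Quote a string for a .reg file: backslashes and double quotes are escaped."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_reg_file(voice_id, display_name, language_hex, gender, age, path, key):
    """Return .reg file text that creates one voice token under both roots."""
    out = ["Windows Registry Editor Version 5.00", ""]
    for root in TOKEN_ROOTS:
        token = f"{root}\\{voice_id}"
        out += [
            f"[{token}]",
            f"@={reg_quote(display_name)}",  # the (Default) value
            f'"CLSID"={reg_quote(CLSID)}',
            "",
            f"[{token}\\Attributes]",
            f'"Language"={reg_quote(language_hex)}',
            f'"Gender"={reg_quote(gender)}',
            f'"Age"={reg_quote(age)}',
            "",
            f"[{token}\\NaturalVoiceConfig]",
            f'"Path"={reg_quote(path)}',
            f'"Key"={reg_quote(key)}',
            "",
        ]
    return "\r\n".join(out)


reg_text = build_reg_file(
    voice_id="SvetlanaDemo",                   # hypothetical internal ID
    display_name="Microsoft Svetlana",
    language_hex="419",
    gender="Female",                           # assumed for this voice
    age="Adult",                               # assumed for this voice
    path=r"C:\Voices\ru-RU-SvetlanaNeural",    # hypothetical folder
    key="PLACEHOLDER-KEY",                     # put the real decryption key here
)
print(reg_text)
```

To try it, write `reg_text` to a file ending in .reg and import it from an elevated Registry Editor. As far as I know, version 5.00 files are normally saved as UTF-16 with a byte order mark, which `open(name, "w", encoding="utf-16", newline="")` produces in Python.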

@gexgd0419
Owner

MultiTTS seems to be much more advanced than this adapter, which was originally meant to be no more than a simple proof of concept.

Since some users have asked for support for more kinds of voices, I think a more user-friendly way to introduce other voices would be "voice packs" that users can simply import and use, as MultiTTS does, instead of making users enter the keys, paths, etc. manually.

As I don't have my own channel to distribute such voices, maybe I can consider making the adapter able to use (some of) the voice packs for MultiTTS. So here are some questions.

  • Is MultiTTS open-sourced? Do I have a way to get its source code?
  • Can I make my adapter able to use the voice pack format for MultiTTS?

@1001ruchka
Author

Thank you so much! It works.
As for MultiTTS, unfortunately I have no information. There is an official chat, https://t.me/MultiTTS_global, where you can talk to the developer.
