Supported SAPI 5 Features
gexgd0419 edited this page Feb 27, 2024 · 5 revisions
- Basic TTS audio output. Only the `24kHz 16Bit mono` audio format is supported. If the SAPI client requests a different format, the SAPI framework will convert the audio to the specified format, but you can't get higher quality than `24kHz 16Bit mono`.
- Volume and Rate adjustment. Volume can be 0 to 100, and rate can be -10 (1/3x speed) to 10 (3x speed).
- Pitch adjustment. Pitch can be -10 (50% lower) to 10 (50% higher).
- Word boundary event that tells the client which word is being spoken right now.
- Viseme event that tells the client the current viseme being pronounced. The client can use them to show real-time mouth positions.
- Bookmark event that will be sent to the client whenever a bookmark tag `<bookmark mark="xx"/>` is reached.
- `<silence>` tag that pauses the voice for a specified duration.
- `<emph>` tag that emphasizes a section of text.
- `<context>` tag that tells the voice how to interpret an ambiguous piece of text, for example which date `03/04/01` represents.
- `<pron>` tag that inserts a specified pronunciation, when the engine doesn't know how to pronounce a word.
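
The rate and pitch ranges in the list above can be read as simple mappings. The sketch below is only an illustration: it assumes the rate is interpolated exponentially (so that -10, 0, and 10 land on 1/3x, 1x, and 3x) and the pitch linearly (so that -10 and 10 land on -50% and +50%). The function names are hypothetical, and the engine may interpolate the intermediate values differently.

```python
def rate_to_speed(rate: int) -> float:
    """Map a SAPI rate (-10..10) to a speed multiplier (1/3x..3x)."""
    if not -10 <= rate <= 10:
        raise ValueError("rate must be between -10 and 10")
    # Assumption: exponential interpolation, so every step scales the
    # speed by the same factor and the endpoints are 1/3x and 3x.
    return 3.0 ** (rate / 10)


def pitch_to_percent(pitch: int) -> float:
    """Map a SAPI pitch (-10..10) to a relative change in percent (-50..50)."""
    if not -10 <= pitch <= 10:
        raise ValueError("pitch must be between -10 and 10")
    # Assumption: linear interpolation between -50% and +50%.
    return pitch * 5.0
```

Under these assumptions, `rate_to_speed(0)` gives the normal speed and `pitch_to_percent(-10)` gives the lowest supported pitch.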

The following features are not supported:
- Skipping. The engine ignores all skipping requests.
- Applying volume and rate adjustments during speaking.
- `SPF_NLP_SPEAK_PUNC` flag specifying that punctuation characters should be expanded into words.
- Phoneme event. Viseme event, however, is supported.
- `<partofsp>` tag that specifies which part of speech a word is (noun, verb, etc.).

If you are using the local/offline/embedded natural voices, the following features are unavailable:
- `<silence>` tag (SAPI) / `<break>` tag (SSML)
- `<emph>` tag (SAPI) / `<emphasis>` tag (SSML)
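
To make the SAPI/SSML pairing above concrete, here is a minimal sketch of how the two SAPI tags correspond to their SSML counterparts. It is not the engine's actual conversion code: the function name is hypothetical, and it assumes the common forms `<silence msec="N"/>` and `<emph>...</emph>` with no extra attributes.

```python
import re


def sapi_to_ssml_tags(text: str) -> str:
    """Rewrite SAPI <silence> and <emph> tags as SSML <break> and <emphasis>."""
    # <silence msec="500"/>  ->  <break time="500ms"/>
    text = re.sub(r'<silence\s+msec="(\d+)"\s*/>', r'<break time="\1ms"/>', text)
    # <emph> ... </emph>  ->  <emphasis> ... </emphasis>
    text = re.sub(r"<(/?)emph>", r"<\1emphasis>", text)
    return text
```

A regular expression is enough for an illustration like this; real input would call for a proper XML parser.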

Microsoft Edge online voices only recognize some of the SSML tags, such as `<speak>`, `<voice>`, and `<prosody>`.

If the SSML text contains an unrecognized tag, the server rejects it, so this engine removes all such tags and converts them to plain text.
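
The tag removal described above could look roughly like the following sketch. It is not the engine's implementation: the allow-list contains only the three tags named in the text (the real set may be larger), the function name is hypothetical, and the regular expression assumes attribute values never contain `>`.

```python
import re

# Assumption: only the tags named in the text above; the real list may be longer.
ALLOWED_TAGS = {"speak", "voice", "prosody"}

# Matches an opening, closing, or self-closing tag and captures its name.
_TAG_RE = re.compile(r"<\s*/?\s*([A-Za-z][\w:.-]*)[^>]*>")


def strip_unrecognized_tags(ssml: str) -> str:
    """Drop every tag not in ALLOWED_TAGS, keeping the text between tags."""

    def keep_or_drop(match: re.Match) -> str:
        name = match.group(1).lower()
        return match.group(0) if name in ALLOWED_TAGS else ""

    return _TAG_RE.sub(keep_or_drop, ssml)
```

For example, an `<emphasis>` element is reduced to its inner text while a surrounding `<prosody>` element is left untouched.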
Voice audio data will be fetched in `24kHz 96kbps mono MP3` format, then converted to `24kHz 16Bit mono` wave data.
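
As a rough sense of scale for the two formats, the arithmetic below compares the data rate of the decoded wave audio with the MP3 stream. It assumes "96kbps" means 96,000 bits per second; the names are illustrative only.

```python
SAMPLE_RATE_HZ = 24_000    # 24kHz
BITS_PER_SAMPLE = 16       # 16Bit
CHANNELS = 1               # mono
MP3_BITRATE_BPS = 96_000   # assumption: 96kbps = 96,000 bits per second


def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


pcm_bps = pcm_bitrate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
pcm_bytes_per_second = pcm_bps // 8
size_ratio = pcm_bps / MP3_BITRATE_BPS

print(pcm_bps, pcm_bytes_per_second, size_ratio)
```

Under that assumption, the decoded wave data is 384,000 bits (48,000 bytes) per second, four times the size of the MP3 stream it was fetched as.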