Supported SAPI 5 Features

Supported SAPI 5 features

Basic TTS audio output. Only the 24 kHz 16-bit mono audio format is supported. If the SAPI client requests a different format, the SAPI framework converts the audio to the requested format, but the quality cannot exceed 24 kHz 16-bit mono.
Volume and rate adjustment. Volume ranges from 0 to 100, and rate from -10 (1/3x speed) to 10 (3x speed).
Pitch adjustment. Pitch can be -10 (50% lower) to 10 (50% higher).
Word boundary event that tells the client which word is being spoken right now.
Viseme event that tells the client the current viseme being pronounced. The client can use them to show real-time mouth positions.
Bookmark event that will be sent to the client whenever a bookmark tag <bookmark mark="xx"/> is reached.
<silence> tag that pauses the voice for a specified duration.
<emph> tag that emphasizes a section of text.
<context> tag that tells the voice how to interpret an ambiguous piece of text, for example which date 03/04/01 refers to.
<pron> tag that inserts a specified pronunciation when the engine doesn't know how to pronounce a word.
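
The rate and pitch ranges above can be illustrated with a small sketch. The source only gives the endpoints (rate -10 is 1/3x and 10 is 3x; pitch -10 is 50% lower and 10 is 50% higher). The curves between those endpoints are assumptions made for illustration: exponential for rate and linear for pitch. The function names `rate_to_speed` and `pitch_to_factor` are hypothetical and are not part of SAPI or of this engine.

```python
def rate_to_speed(rate: int) -> float:
    """Map a SAPI rate (-10..10) to a speed multiplier.

    Assumption: exponential interpolation, chosen so that
    -10 -> 1/3x, 0 -> 1x and 10 -> 3x. The engine's actual curve
    between the documented endpoints may differ.
    """
    if not -10 <= rate <= 10:
        raise ValueError("rate must be between -10 and 10")
    return 3.0 ** (rate / 10)


def pitch_to_factor(pitch: int) -> float:
    """Map a SAPI pitch (-10..10) to a relative pitch factor.

    Assumption: linear interpolation, chosen so that
    -10 -> 0.5 (50% lower) and 10 -> 1.5 (50% higher).
    """
    if not -10 <= pitch <= 10:
        raise ValueError("pitch must be between -10 and 10")
    return 1.0 + 0.05 * pitch
```

Under these assumptions a rate of 5 comes out at roughly 1.73x speed, and a pitch of -4 at 0.8 times the default pitch.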

Unsupported SAPI 5 features

Skipping. The engine ignores all skipping requests.
Applying volume and rate adjustments while speech is in progress.
SPF_NLP_SPEAK_PUNC flag, which specifies that punctuation characters should be expanded into words.
Phoneme event. The viseme event, however, is supported.
<partofsp> tag that specifies which part of speech a word is (noun, verb, etc.).

Unsupported features when using embedded voices

If you are using the local/offline/embedded natural voices, the following features are unavailable.

<silence> tag (SAPI) / <break> tag (SSML)
<emph> tag (SAPI) / <emphasis> tag (SSML)

Limitation when using Microsoft Edge online voices

Microsoft Edge online voices only recognize some of the SSML tags, such as <speak>, <voice>, and <prosody>.

If the SSML text contains an unrecognized tag, the server rejects the request, so this engine removes all such tags and converts them to plain text.
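
The tag filtering described above could look roughly like the following sketch. It is not the engine's actual implementation. The allow-list contains only the three tags the text names as examples, so the server's full list is an assumption here, and the regex approach is a simplification that does not handle comments or CDATA. The helper name `strip_unrecognized_tags` is hypothetical. The sketch also assumes that "converts them to plain text" means the text inside a removed tag is kept.

```python
import re

# Assumption: the text lists these tags only as examples ("such as"),
# so this allow-list is illustrative, not the server's complete list.
ALLOWED_TAGS = {"speak", "voice", "prosody"}

# Matches an opening, closing, or self-closing tag and captures its name.
_TAG_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][\w:.-]*)([^<>]*)>")


def strip_unrecognized_tags(ssml: str) -> str:
    """Drop every tag whose name is not allow-listed, keeping inner text."""

    def keep_or_drop(match: re.Match) -> str:
        name = match.group(2).lower()
        return match.group(0) if name in ALLOWED_TAGS else ""

    return _TAG_RE.sub(keep_or_drop, ssml)
```

For example, `<emphasis>world</emphasis>` inside a `<prosody>` element would be reduced to the plain text `world`, while the surrounding `<prosody>` tags stay in place.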

Voice audio data is fetched as 24 kHz 96 kbps mono MP3, then converted to 24 kHz 16-bit mono wave data.
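
To put the two formats in perspective, the short calculation below compares their nominal data rates. It uses only the figures stated above and ignores container headers and MP3 frame overhead. The constant and function names are chosen for illustration.

```python
SAMPLE_RATE_HZ = 24_000    # 24 kHz
BITS_PER_SAMPLE = 16       # 16-bit samples
CHANNELS = 1               # mono
MP3_BITRATE_BPS = 96_000   # 96 kbps


def pcm_bytes_per_second() -> int:
    """Nominal data rate of the decoded wave audio, in bytes per second."""
    return SAMPLE_RATE_HZ * (BITS_PER_SAMPLE // 8) * CHANNELS


def mp3_bytes_per_second() -> int:
    """Nominal data rate of the fetched MP3 stream, in bytes per second."""
    return MP3_BITRATE_BPS // 8
```

The decoded audio works out to 48,000 bytes per second against 12,000 for the MP3 stream, so the data grows about fourfold after conversion.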
