German TTS: using an emotion results in neutral speech #2721

Mareike-RTY · 2025-01-13T14:33:39Z

Bug Description

Hi, I am trying to create emotional speech using the MSTTS extension and SSML. This works as expected in English and Chinese, but not in German. When I use the style "sad" or "cheerful" with the voice "de-DE-ConradNeural", the resulting audio still sounds neutral, even though the docs say that this voice supports both styles. The WAV file is created successfully; it just doesn't sound any different from the neutral style.
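
The claim that the styled output does not differ from neutral speech can be checked without relying on listening alone, by comparing it against a neutral rendering of the same sentence. The sketch below assumes two files: `out.wav` as written by the code in this report, and a hypothetical `neutral.wav` synthesized from the same SSML without the `mstts:express-as` element. The helper name `same_audio` is illustrative and not part of the Speech SDK. Identical sample data would suggest the style is being ignored outright. Differing data only shows that the service did something with the style, not that the difference is audible.

```python
import wave


def same_audio(path_a, path_b):
    """Return True if two WAV files share format parameters and sample data."""
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        # Different channel count, sample width, rate, or length: not the same audio.
        if a.getparams() != b.getparams():
            return False
        return a.readframes(a.getnframes()) == b.readframes(b.getnframes())


# Hypothetical usage, assuming both files exist in the working directory:
#     print(same_audio("./neutral.wav", "./out.wav"))
```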

Expected Behavior

When setting the style attribute to "sad" or "cheerful", the German neural voice "de-DE-ConradNeural" should read the text in the corresponding emotional tone.

Code

import os

import azure.cognitiveservices.speech as speechsdk
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("SPEECH_KEY")
region = os.getenv("SPEECH_REGION")

speech_config = speechsdk.SpeechConfig(subscription=api_key, region=region)

# audio_config=None keeps the synthesized audio in memory instead of playing it.
speech_synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config, audio_config=None
)
ssml_string = """
<speak xmlns="https://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" version="1.0" xml:lang="de-DE">
  <voice name="de-DE-ConradNeural">
    <mstts:express-as style="cheerful" styledegree="2">Ich freue mich riesig über die Beförderung</mstts:express-as>
  </voice>
</speak>
"""
speech_synthesis_result = speech_synthesizer.speak_ssml_async(ssml_string).get()
stream = speechsdk.AudioDataStream(speech_synthesis_result)
stream.save_to_wav_file("./out.wav")

To reproduce

1. Use an SSML string with the voice "de-DE-ConradNeural", style "sad" or "cheerful", style degree "2", and the text "Ich freue mich riesig über die Beförderung".
2. Synthesize.
3. Listen to the resulting audio file and notice that it sounds neutral rather than sad or cheerful.
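
Since the steps above cover two styles, a small helper can generate one SSML document per style so that both variants use exactly the same voice, style degree, and text. This is a minimal sketch using only the standard library. The function name `build_ssml` and the `variants` dictionary are illustrative, and the namespace URIs are copied verbatim from the SSML in this report.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace URIs copied verbatim from the SSML in this report.
SSML_NS = "https://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_ssml(style, text, voice="de-DE-ConradNeural", degree="2", lang="de-DE"):
    """Return an SSML document that wraps `text` in mstts:express-as."""
    return (
        f'<speak xmlns="{SSML_NS}" xmlns:mstts="{MSTTS_NS}" '
        f'version="1.0" xml:lang={quoteattr(lang)}>\n'
        f'  <voice name={quoteattr(voice)}>\n'
        f'    <mstts:express-as style={quoteattr(style)} '
        f'styledegree={quoteattr(degree)}>'
        f'{escape(text)}</mstts:express-as>\n'
        f'  </voice>\n'
        f'</speak>'
    )


TEXT = "Ich freue mich riesig über die Beförderung"

# One SSML document per style named in the report.
variants = {style: build_ssml(style, TEXT) for style in ("sad", "cheerful")}
```

Each string in `variants` can then be passed to `speak_ssml_async` in place of `ssml_string` in the code above, saving each result to its own WAV file for comparison.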

Version of the Cognitive Services Speech SDK

azure-cognitiveservices-speech 1.41.1

Operating System and Programming Language

Linux, Python 3.11 and 3.12

Any hint or help is appreciated, thanks.

Mareike-RTY changed the title ~~German TTS: setting the style to "cheerful" results in neutral speech~~ German TTS: using an emotion results in neutral speech Jan 14, 2025
