German TTS: using an emotion results in neutral speech #2721

Open
Mareike-RTY opened this issue Jan 13, 2025 · 0 comments

Bug Description

Hi, I am trying to generate emotional speech with SSML and the mstts extension. It works as expected in English and Chinese, but not in German. With the style "sad" or "cheerful" and the voice "de-DE-ConradNeural", the resulting audio still sounds neutral, even though the docs say that this voice supports both styles. The WAV file is created successfully; it just doesn't sound any different from the neutral style.

Expected Behavior

When setting the style attribute to "sad" or "cheerful", the German neural voice "de-DE-ConradNeural" should read the text in the corresponding emotional tone.

Code

import os

import azure.cognitiveservices.speech as speechsdk
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("SPEECH_KEY")
region = os.getenv("SPEECH_REGION")

speech_config = speechsdk.SpeechConfig(subscription=api_key, region=region)

speech_synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config, audio_config=None
)
ssml_string = """
<speak xmlns="https://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" version="1.0" xml:lang="de-DE">
  <voice name="de-DE-ConradNeural">
    <mstts:express-as style="cheerful" styledegree="2">Ich freue mich riesig über die Beförderung</mstts:express-as>
  </voice>
</speak>
"""
speech_synthesis_result = speech_synthesizer.speak_ssml_async(ssml_string).get()
stream = speechsdk.AudioDataStream(speech_synthesis_result)
stream.save_to_wav_file("./out.wav")

To reproduce

  1. Use an SSML string with the voice "de-DE-ConradNeural", style "sad" or "cheerful", style degree "2" and the text "Ich freue mich riesig über die Beförderung".
  2. Synthesize.
  3. Listen to the resulting audio file, noticing it doesn't sound sad or cheerful but neutral.
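
To make step 3 easier to judge, it may help to render the same sentence once without any style and once per style, then compare the files side by side. The following is only a rough sketch: `build_ssml`, `build_variants`, `synthesize_all` and the `out_*.wav` file names are made up for illustration and are not part of the Speech SDK. The namespace URIs are copied verbatim from the SSML above and kept as parameters. The W3C SSML namespace is written with `http://` rather than `https://`, so trying that spelling is a cheap experiment, but nothing here confirms it is related to the neutral-sounding output.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace URIs copied verbatim from the SSML in this report.
SSML_NS = "https://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_ssml(text, voice="de-DE-ConradNeural", lang="de-DE",
               style=None, styledegree=None,
               ssml_ns=SSML_NS, mstts_ns=MSTTS_NS):
    """Return an SSML document; style=None omits mstts:express-as entirely."""
    body = escape(text)
    if style is not None:
        attrs = f"style={quoteattr(style)}"
        if styledegree is not None:
            attrs += f" styledegree={quoteattr(str(styledegree))}"
        body = f"<mstts:express-as {attrs}>{body}</mstts:express-as>"
    return (
        f"<speak xmlns={quoteattr(ssml_ns)} xmlns:mstts={quoteattr(mstts_ns)} "
        f'version="1.0" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


def build_variants(text, styles=(None, "sad", "cheerful"), styledegree="2"):
    """Map a hypothetical output file name to the SSML for each style.

    A style of None produces the unstyled baseline to compare against.
    """
    return {
        f"out_{style or 'neutral'}.wav": build_ssml(
            text, style=style, styledegree=styledegree if style else None
        )
        for style in styles
    }


def synthesize_all(variants, speech_config):
    """Write one WAV file per variant (needs the Speech SDK and valid credentials)."""
    import azure.cognitiveservices.speech as speechsdk  # imported lazily on purpose

    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=None
    )
    for path, ssml in variants.items():
        result = synthesizer.speak_ssml_async(ssml).get()
        # Stop early if the service did not report a completed synthesis.
        if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
            raise RuntimeError(f"{path}: synthesis ended with {result.reason}")
        speechsdk.AudioDataStream(result).save_to_wav_file(path)
```

Calling `synthesize_all(build_variants("Ich freue mich riesig über die Beförderung"), speech_config)` with the `speech_config` from the code above should leave three files to compare. Passing `ssml_ns="http://www.w3.org/2001/10/synthesis"` to `build_ssml` gives a variant with the W3C spelling of the default namespace.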

Version of the Cognitive Services Speech SDK

azure-cognitiveservices-speech 1.41.1

Operating System and Programming Language

Linux, Python 3.11 and 3.12

Any hint or help is appreciated, thanks.

Mareike-RTY changed the title from "German TTS: setting the style to "cheerful" results in neutral speech" to "German TTS: using an emotion results in neutral speech" on Jan 14, 2025