Wyoming protocol server for Microsoft Azure text-to-speech.
This Python package provides a Wyoming integration for Microsoft Azure text-to-speech and can be directly used with Home Assistant voice and Rhasspy.
This program uses the Microsoft Azure Speech Service. You can sign up for a free Azure account, which comes with a free tier of 500K characters per month. This should be enough for running a voice assistant, as each command is relatively short. In addition, Home Assistant caches the outputs, so each response is only requested once. Once this amount is exceeded, Azure could charge you for each second used (current pricing is $0.36 per audio hour). I am not responsible for any incurred charges and recommend you set up a spending limit to reduce your exposure. However, for normal usage the free tier could suffice, and the resource should not switch to a paid service automatically.
If you have not set up a speech resource, you can follow the instructions below. You only need to do this once, and the same resource works for both Speech-to-Text and Text-to-Speech.
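As a rough illustration of why the free tier is usually sufficient, the sketch below estimates monthly character usage. Only the 500K-character allowance comes from the text above; the number of responses per day and the characters per response are assumptions chosen for illustration, and the estimate ignores the savings from Home Assistant's caching.

```python
FREE_TIER_CHARS_PER_MONTH = 500_000  # free-tier allowance quoted above


def monthly_characters(responses_per_day: int, chars_per_response: int, days: int = 30) -> int:
    """Estimate characters synthesised per month, ignoring any caching."""
    return responses_per_day * chars_per_response * days


def fits_free_tier(responses_per_day: int, chars_per_response: int) -> bool:
    """Return True if the estimated usage stays within the free allowance."""
    return monthly_characters(responses_per_day, chars_per_response) <= FREE_TIER_CHARS_PER_MONTH


if __name__ == "__main__":
    # Hypothetical household: 100 spoken responses a day, 80 characters each.
    used = monthly_characters(100, 80)
    share = used / FREE_TIER_CHARS_PER_MONTH
    print(f"{used} of {FREE_TIER_CHARS_PER_MONTH} characters ({share:.0%})")
```

With these assumed figures the estimate is a little under half of the allowance, so cached responses leave further headroom.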
- Sign in or create an account on portal.azure.com.
- Create a subscription by searching for "subscription" in the search bar. Consult Microsoft Learn for more information.
- Create a speech resource by searching for "speech service".
- Select the subscription you created, pick or create a resource group, select a region, pick an identifiable name, and select the pricing tier (you probably want Free F0).
- Once created, copy one of the keys from the speech service page. You will need this to run this program.
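Before wiring the key into this program, it can help to check that the key and region match. The sketch below is one way to do that, assuming the Azure Speech REST endpoint for listing voices and its `Ocp-Apim-Subscription-Key` header; the function names are hypothetical and are not part of this package.

```python
import json
import urllib.request


def voices_list_request(region: str, key: str) -> urllib.request.Request:
    """Build (but do not send) a request for the voices available in a region."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"
    return urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})


def fetch_voice_names(region: str, key: str) -> list[str]:
    """Send the request and return the short names of the available voices.

    urlopen raises urllib.error.HTTPError on failure; a 401 status usually
    means the key or the region is wrong.
    """
    with urllib.request.urlopen(voices_list_request(region, key), timeout=10) as resp:
        return [voice["ShortName"] for voice in json.load(resp)]


# Example (needs network access and a real key, so it is left commented out):
# names = fetch_voice_names("uksouth", "<your-key>")
# print(len(names), "voices available")
```

If the call succeeds, the same region and key should work for the `service-region` and `subscription-key` options described further down.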
Depending on your use case there are different installation options.
- Using pip: Clone the repository and install the package using pip. Please note the platform requirements as noted here.
  pip install .
- Home Assistant Add-On: Add the following repository as an add-on repository to your Home Assistant: https://github.com/hugobloem/homeassistant-addons
- Docker container: To run as a Docker container, use the following command:
  docker run ghcr.io/hugobloem/wyoming-microsoft-tts-noha:latest --<key> <value>
  For the relevant keys, see the table below.
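When running in Docker, the port in the `uri` option typically has to be published to the host so that Home Assistant or Rhasspy can reach the server. The sketch below assembles such a command; publishing the same port number with `-p` is an assumption about a typical setup rather than something this README prescribes, and `docker_command` is a hypothetical helper.

```python
import shlex
from urllib.parse import urlsplit

IMAGE = "ghcr.io/hugobloem/wyoming-microsoft-tts-noha:latest"  # image named above


def docker_command(uri: str, service_region: str, subscription_key: str) -> list[str]:
    """Return a `docker run` argument list that publishes the port named in `uri`."""
    port = urlsplit(uri).port
    if port is None:
        raise ValueError(f"no port in uri: {uri!r}")
    return [
        "docker", "run",
        "-p", f"{port}:{port}",  # assumption: expose the same port on the host
        IMAGE,
        "--uri", uri,
        "--service-region", service_region,
        "--subscription-key", subscription_key,
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own region and key.
    print(shlex.join(docker_command("tcp://0.0.0.0:10200", "uksouth", "<your-key>")))
```

Building the command as a list keeps the key out of shell quoting problems if it is later passed to `subprocess.run`.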
Parameters are passed differently depending on the installation method. However, the same options apply to every installation method and are listed in the table below. Your service region and subscription key can be found on the speech service resource page (step 5 of the Azure Speech service instructions above).
For the bare-metal Python install the program is run as follows:
python -m wyoming-microsoft-tts --<key> <value>
Key | Optional | Description |
---|---|---|
service-region | No | Azure service region, e.g., uksouth |
subscription-key | No | Azure subscription key |
uri | No | URI on which the server listens, e.g., tcp://0.0.0.0:10200 |
download-dir | Yes | Directory to download voices.json into (default: /tmp/) |
voice | Yes | Default voice to use for synthesis (default: en-GB-SoniaNeural) |
auto-punctuation | Yes | Automatically add punctuation (default: ".?!") |
samples-per-chunk | Yes | Number of samples per audio chunk (default: 1024) |
update-voices | Yes | Download the latest voices.json during startup |
debug | Yes | Log debug messages |
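To show how the options fit together for the bare-metal install, the sketch below turns a dictionary of options into an argument list and checks that the three required keys are present. `build_argv` is a hypothetical helper, not part of this package, and it assumes that switches such as `debug` and `update-voices` are passed without a value.

```python
import sys

# Keys marked "No" in the Optional column of the table above.
REQUIRED = ("service-region", "subscription-key", "uri")


def build_argv(options: dict[str, object]) -> list[str]:
    """Turn an options mapping into an argument list for the bare-metal install."""
    missing = [key for key in REQUIRED if key not in options]
    if missing:
        raise ValueError(f"missing required options: {', '.join(missing)}")
    argv = [sys.executable, "-m", "wyoming-microsoft-tts"]
    for key, value in options.items():
        if value is True:
            # Assumption: boolean switches take no value.
            argv.append(f"--{key}")
        elif value is not False and value is not None:
            argv += [f"--{key}", str(value)]
    return argv


if __name__ == "__main__":
    # Placeholder values; substitute your own region and key.
    print(build_argv({
        "service-region": "uksouth",
        "subscription-key": "<your-key>",
        "uri": "tcp://0.0.0.0:10200",
        "voice": "en-GB-SoniaNeural",
        "debug": True,
    }))
```

The resulting list can be passed to `subprocess.run`, which avoids quoting problems with the key.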