Implementation of Tacotron 2 TTS model in PyTorch
Note: don't forget to clone the repo with `git clone --recurse-submodules`
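If you already cloned without submodules, git can fetch them afterwards; a minimal sketch (`<repo-url>` is a placeholder for this repository's URL):

```sh
# Fresh clone with submodules:
git clone --recurse-submodules <repo-url>

# Or, if the repo was already cloned without submodules:
git submodule update --init --recursive
```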
To launch training and inference in an nvidia-docker container, follow these instructions:

- Install nvidia-docker
- Run `./docker-build.sh`
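As a quick sanity check that the container runtime actually sees your GPU, you can run the generic test from the NVIDIA Container Toolkit docs (not part of this repo):

```sh
# Should print the same GPU table as running nvidia-smi on the host:
docker run --rm --gpus all ubuntu nvidia-smi
```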
To launch training, follow these instructions:

- Set preferred configurations in `config/config.yaml`; in particular, you might want to set the dataset path (it will be concatenated with the data path in `docker-train.sh`)
- In `docker-run.sh`, change `memory`, `memory-swap`, `shm-size`, `cpuset-cpus`, `gpus`, and the data `volume` to the desired values (a sketch of the corresponding `docker run` flags follows below)
- Set the `WANDB_API_KEY` environment variable to your wandb key
- Run `./docker-train.sh waveglow_model_path`
Where:

- `waveglow_model_path` is a path to the WaveGlow `.pt` model file. It can be downloaded here (link from https://github.com/NVIDIA/waveglow)

All outputs, including models, will be saved to the `outputs` dir.
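For orientation, the knobs listed above map onto standard `docker run` flags. A minimal sketch of what `docker-run.sh` is expected to pass; the image name and all limit values below are illustrative placeholders, not the script's actual contents:

```sh
# Placeholder values -- tune the limits to your machine.
# --shm-size matters for PyTorch DataLoader workers, which use /dev/shm.
# -e WANDB_API_KEY forwards the key you exported on the host into the container.
export WANDB_API_KEY=<your-wandb-key>

docker run \
    --memory=32g \
    --memory-swap=32g \
    --shm-size=8g \
    --cpuset-cpus=0-7 \
    --gpus all \
    -v /path/to/dataset:/data \
    -e WANDB_API_KEY \
    <image>
```

A concrete training invocation, reusing the WaveGlow checkpoint path from the inference example below:

```sh
./docker-train.sh ../../Tacotron2/waveglow_256channels_universal_v5.pt
```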
To launch inference, run the following command:

```sh
./docker-inference.sh model_path label_encoder_path waveglow_model_path device input_text
```
Where:

- `model_path` is a path to the `.ckpt` model file
- `label_encoder_path` is a path to the `.pickle` label encoder file. It is generated during training by the `fut_label_encoder.py` script
- `waveglow_model_path` is a path to the WaveGlow `.pt` model file. It can be downloaded here (link from https://github.com/NVIDIA/waveglow)
- `device` is the device to run inference on: either 'cpu', 'cuda', or a CUDA device number
- `input_text` is the input text for TTS

The predicted output wav and spectrogram will be saved in the `inferenced` folder.
Full example:

```sh
./docker-inference.sh ./last.ckpt ./le.pickle ../../Tacotron2/waveglow_256channels_universal_v5.pt cuda 'So, so what? I'\''m still a rock star I got my rock moves!'
```
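The same command runs on CPU by swapping the `device` argument (slower, but needs no GPU):

```sh
./docker-inference.sh ./last.ckpt ./le.pickle ../../Tacotron2/waveglow_256channels_universal_v5.pt cpu 'So, so what? I'\''m still a rock star I got my rock moves!'
```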
All pretrained files for inference (a Tacotron 2 checkpoint trained on LJSpeech, the label encoder, and the WaveGlow checkpoint) can be downloaded here.