Train voice having 44Khz sampling rate #604

donlk · 2024-09-15T13:10:24Z

Hi!
I have approximately 1.5 hours of voice audio at 44 kHz and would like to train a usable model from it. I'd rather not retrain from the pre-trained checkpoints, as they are all 22 kHz and sound muddy.
I tried training from scratch with sampling_rate set to 44100. Training reached 2000 epochs, but the inferred audio was far too fast and skipped words.

What should I modify or patch in to make this work?

thanks!

agonzalezd · 2024-09-17T13:19:00Z

I suggest resampling your data to 22050 Hz. You can use ffmpeg to do so.
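
For reference, the resampling step could be scripted along these lines. This is a minimal sketch, not something from the Piper repository: the helper names `ffmpeg_resample_cmd` and `resample` are hypothetical, and it assumes `ffmpeg` is installed and on the PATH. The only ffmpeg options used are `-i` (input), `-ar` (output audio sample rate) and `-y` (overwrite the output file).

```python
import subprocess


def ffmpeg_resample_cmd(src, dst, sample_rate=22050):
    """Build the ffmpeg argument list that resamples src into dst.

    Hypothetical helper for illustration; returns the command only,
    so it can be inspected without running ffmpeg.
    """
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(sample_rate), str(dst)]


def resample(src, dst, sample_rate=22050):
    """Run ffmpeg to resample one file; raises if ffmpeg exits non-zero."""
    subprocess.run(ffmpeg_resample_cmd(src, dst, sample_rate), check=True)
```

Calling `resample("in.wav", "out.wav")` would write a 22050 Hz copy of `in.wav`; the file names here are placeholders.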

donlk · 2024-09-18T23:15:21Z

I would rather avoid that if possible, because of the significant quality loss.

Luke100000 · 2024-10-07T15:47:08Z

Make sure the sample rate is set correctly everywhere, not just for training but also for inference: https://github.com/search?q=repo%3Arhasspy%2Fpiper%2022050&type=code

Other than that, my guess is that you would need to adapt the decoder parameters here: https://github.com/rhasspy/piper/blob/master/src/python/piper_train/vits/config.py#L30
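
To make the decoder point more concrete: in VITS-style models the decoder upsamples each spectrogram frame back to audio samples, so the product of its upsample rates has to equal the STFT hop length. The sketch below checks that constraint and shows how long one frame lasts at a given sample rate. It is an illustration under assumptions, not a confirmed fix: `check_decoder_config` is a hypothetical helper, the 22050 Hz values (hop length 256, upsample rates 8, 8, 2, 2) are assumed to match the defaults in the linked config, and the 44100 Hz values are only one possible choice that keeps the frame duration the same. Related settings such as the filter length, window length and upsample kernel sizes would probably need matching changes too.

```python
import math


def check_decoder_config(sample_rate, hop_length, upsample_rates):
    """Report whether the decoder upsampling matches the hop length.

    Hypothetical helper: a VITS-style decoder turns one spectrogram
    frame into prod(upsample_rates) audio samples, which must equal
    hop_length for the output length to line up with the input audio.
    """
    total_upsampling = math.prod(upsample_rates)
    return {
        "consistent": total_upsampling == hop_length,
        "total_upsampling": total_upsampling,
        # Duration of audio covered by one spectrogram frame.
        "frame_ms": 1000.0 * hop_length / sample_rate,
    }


# Assumed 22050 Hz defaults.
baseline = check_decoder_config(22050, 256, (8, 8, 2, 2))

# Hypothetical 44100 Hz variant: hop length and upsampling both doubled.
candidate = check_decoder_config(44100, 512, (8, 8, 4, 2))

print(baseline)
print(candidate)
```

With these numbers both configurations are consistent and cover the same amount of time per frame, whereas keeping a hop length of 256 at 44100 Hz would halve the frame duration.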
