Train voice having 44Khz sampling rate #604

Open
donlk opened this issue Sep 15, 2024 · 3 comments
Comments

@donlk

donlk commented Sep 15, 2024

Hi!
I have approx. 1.5 hours of voice audio at 44.1 kHz and would like to train a usable model from it. I don't want to retrain from the pre-trained checkpoints, as they are all 22 kHz and sound muddy.
I tried training from scratch, specifying the correct sampling_rate of 44100. I reached 2000 epochs, but the inferred audio was way too fast and skipped words.

What should I modify or patch to make this work?

Thanks!

@agonzalezd

I suggest resampling your data to 22050 Hz. You can use ffmpeg to do so.
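
For anyone who does go the resampling route, here is a minimal sketch of batch-resampling a folder of WAV files by calling ffmpeg from Python. It assumes ffmpeg is on the PATH; the function names and directory layout are hypothetical, and the mono / 16-bit PCM output is my assumption about what the training pipeline expects, not something confirmed in this thread.

```python
import subprocess
from pathlib import Path


def build_resample_command(src: Path, dst: Path, sample_rate: int = 22050) -> list[str]:
    """Build an ffmpeg command that resamples src and writes it to dst."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", str(src),            # input file
        "-ar", str(sample_rate),   # output sample rate in Hz
        "-ac", "1",                # downmix to mono (assumption)
        "-c:a", "pcm_s16le",       # 16-bit PCM WAV (assumption)
        str(dst),
    ]


def resample_dir(in_dir: Path, out_dir: Path, sample_rate: int = 22050) -> None:
    """Resample every *.wav in in_dir into out_dir, keeping the file names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob("*.wav")):
        cmd = build_resample_command(src, out_dir / src.name, sample_rate)
        subprocess.run(cmd, check=True)  # raises if ffmpeg exits with an error
```

Usage would look like `resample_dir(Path("wavs_44k"), Path("wavs_22k"))`, with both directory names being placeholders.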

@donlk
Author

donlk commented Sep 18, 2024

I would rather avoid that if possible, because of the large quality loss.

@Luke100000

Make sure the sample rate is set correctly everywhere, not just for training but also for inference: https://github.com/search?q=repo%3Arhasspy%2Fpiper%2022050&type=code
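
As a rough way to check this, the sketch below uses only Python's standard `wave` module to list WAV files whose header rate differs from the rate you expect. It also includes a small helper showing why a mismatch changes playback speed: samples meant for one rate but played at another are sped up or slowed down by the ratio of the two. Both function names are hypothetical, and whether such a mismatch is what causes the "too fast" output here is only a guess.

```python
import wave
from pathlib import Path


def find_sample_rate_mismatches(wav_dir: Path, expected_rate: int) -> dict[str, int]:
    """Return {file name: actual rate} for each WAV whose header rate differs."""
    mismatches = {}
    for path in sorted(wav_dir.glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate != expected_rate:
            mismatches[path.name] = rate
    return mismatches


def playback_speed_factor(intended_rate: int, playback_rate: int) -> float:
    """Speed-up factor when audio meant for intended_rate is played at playback_rate."""
    return playback_rate / intended_rate
```

For example, `playback_speed_factor(22050, 44100)` is 2.0, meaning audio generated for 22050 Hz but played back at 44100 Hz would run at double speed.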

Other than that, my guess is that you would need to adapt the decoder parameters here: https://github.com/rhasspy/piper/blob/master/src/python/piper_train/vits/config.py#L30
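
To illustrate what "adapting the decoder parameters" might involve: in a VITS-style (HiFi-GAN) decoder, each spectrogram frame is upsampled by the product of the upsample rates, and that product normally has to equal the hop length. The sketch below checks that relationship and reports the frame duration. The function name is hypothetical, and the numbers in the usage note are illustrative assumptions rather than values read from piper's config.py.

```python
from math import prod


def check_decoder_config(sample_rate: int, hop_length: int, upsample_rates: tuple) -> dict:
    """Check that the decoder's total upsampling factor matches the hop length."""
    total_upsampling = prod(upsample_rates)  # audio samples produced per frame
    return {
        "total_upsampling": total_upsampling,
        "matches_hop_length": total_upsampling == hop_length,
        # Duration of one spectrogram frame in milliseconds.
        "frame_ms": 1000.0 * hop_length / sample_rate,
    }
```

Under these assumptions, `check_decoder_config(22050, 256, (8, 8, 2, 2))` reports a match. Doubling the sample rate to 44100 while keeping the same frame duration would mean a hop length of 512, so rates such as `(8, 8, 4, 2)` would match while `(8, 8, 2, 2)` would not. The upsample kernel sizes and the FFT/window sizes would likely need scaling as well.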
