Update tokenizer.py #60

Closed
wants to merge 4 commits

Conversation

@ghost ghost commented Jul 18, 2024

Added proper tokenizer support for the Hindi language, which prevents a crash while fine-tuning on Hindi.
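
(For context, a change like this usually amounts to registering the new language code in the tokenizer's per-language tables and giving it a cleaning path. The sketch below is illustrative only and is not the diff in this PR; the table name, function, and limit value are assumptions.)

```python
# Illustrative sketch, not the actual patch: assumes the XTTS tokenizer keeps a
# per-language character-limit table and dispatches text cleaning by language code.
char_limits = {
    "en": 250,
    "hi": 250,  # hypothetical new entry so Hindi input is length-checked instead of failing
}

def preprocess_text(text: str, lang: str) -> str:
    """Minimal stand-in for a per-language cleaning step."""
    if lang not in char_limits:
        raise KeyError(f"Language '{lang}' is not supported by the tokenizer")
    # Devanagari has no case, so no lower-casing or Latin abbreviation expansion
    # is needed here; just normalise whitespace.
    return " ".join(text.split())

print(preprocess_text("नमस्ते दुनिया", "hi"))
```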
@eginhard (Member) left a comment

Thanks a lot for the PR, this is great!

It includes some changes that are not related to Hindi: there are some small differences in the code between this fork and the original repo, and it looks like you started from the original one. Could you revert these unrelated changes?

Review thread on TTS/tts/layers/xtts/tokenizer.py (outdated, resolved)
FIX: Tokenizer for Hindi Language
@ghost (Author) left a comment

Changes fixed, please check.

@ghost ghost requested a review from eginhard July 18, 2024 08:47
@manash997

Hi @akshatrocky, I am getting this error while trying to run the tokenizer script from your branch:
tokenizer.py", line 806, in test_expand_numbers_multilingual
assert out == b, f"'{out}' vs '{b}'"
AssertionError: 'Через двенадцать целых пять десятых секунды.' vs 'Через двенадцать запятая пять секунды.'
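
(For reference, this failure appears to be in the Russian case of the file's number-expansion self-check rather than in the Hindi change: the two strings are different ways of reading the Russian decimal 12.5 aloud, "twelve point five tenths" vs "twelve comma five". Below is a hypothetical reproduction of the check pattern; the Russian input sentence is an assumption, since only the two expanded strings appear in the error message.)

```python
# Hypothetical reproduction of the failing check, based only on the traceback above.
# expand_numbers_multilingual is assumed to turn digits into words for the given
# language; the input sentence below is a guess at the test's source text.
from TTS.tts.layers.xtts.tokenizer import expand_numbers_multilingual

reference = "Через двенадцать запятая пять секунды."
out = expand_numbers_multilingual("Через 12.5 секунды.", lang="ru")
assert out == reference, f"'{out}' vs '{reference}'"
```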

@ghost (Author) commented Jul 22, 2024

Which language are you trying to generate, @manash997?

(Edit) How are you running the tokenizer script? From fine-tuning XTTS?

@manash997

> Which language are you trying to generate, @manash997?
>
> (Edit) How are you running the tokenizer script? From fine-tuning XTTS?

I just tried running it as a standalone Python script. The changes for Hindi look good to me; however, I wanted to test the entire script once.

@ghost (Author) commented Jul 22, 2024

Running this script standalone, even without my changes, also produces an error. The main use of tokenizer.py is for fine-tuning XTTS: when we fine-tune any language, the program loads the tokenizer that the model is fine-tuned with. As far as I know, this script is only used during fine-tuning.
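
(To make that concrete, here is a hedged sketch of how the tokenizer is typically consumed during fine-tuning rather than run as a script; the vocab path is a placeholder and the exact constructor/encode signatures are assumptions.)

```python
# Hypothetical usage sketch: during XTTS fine-tuning the trainer builds a tokenizer
# from a vocab file and encodes training text per language; importing the module
# does not execute its __main__ self-checks. Path and signatures are assumptions.
from TTS.tts.layers.xtts.tokenizer import VoiceBpeTokenizer

tokenizer = VoiceBpeTokenizer(vocab_file="path/to/vocab.json")  # placeholder path
token_ids = tokenizer.encode("नमस्ते दुनिया", lang="hi")
print(token_ids)
```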

@ghost ghost closed this Jul 25, 2024
@eginhard (Member)

Yes, the checks in this file are not actually run during any tests and were probably already broken for a while. Otherwise everything looks good. The author deleted their account, so I had to recreate the PR in #64.
