Work with numbers and numeric values #21

Open
pengpengtao opened this issue Feb 18, 2025 · 1 comment

Comments

@pengpengtao

Hello, how should I handle the difference between numeric values and digit sequences? For quantities such as money, dates, or times, numbers need to be read as numeric values, while things like the last digits of a mobile phone number, ID numbers, and address numbers need to be read digit by digit. I tested this script and found that everything is read as a numeric value.

@w11wo
Collaborator

w11wo commented Feb 18, 2025

Hi @pengpengtao

For such cases, it is similar to how we need to pre-split letters if we want them to be pronounced one by one.

For example, if the digits are combined like

>> g2p("123")
[['s', 'ə', 'r', 'a', 't', 'u', 's'], ['d', 'u', 'ʔ', 'a'], ['p', 'u', 'l', 'u', 'h'], ['t', 'i', 'ɡ', 'a']]

This will treat it as "one hundred and twenty three". Such handling is done through this package, which is like an extension of num2words for Indonesian.

But for your case, where each digit needs to be read separately, you have to pre-split them like

>> g2p("1 2 3")
[['s', 'a', 't', 'u'], ['d', 'u', 'ʔ', 'a'], ['t', 'i', 'ɡ', 'a']]

which correctly gives the phonemes of "one, two, three", since each digit is treated as an individual word.
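As a rough sketch of that pre-split step, something like the following could be run on a span before passing it to `g2p`. It is not part of the library: `split_digits` is a hypothetical helper name, and it splits every run of digits it sees, so it should only be applied to text you already know must be read digit by digit.

```python
import re


def split_digits(text: str) -> str:
    """Insert a space between every pair of adjacent digits.

    Hypothetical helper, not part of the g2p package.
    """
    # Zero-width match at each position that sits between two digits.
    return re.sub(r"(?<=\d)(?=\d)", " ", text)


print(split_digits("123"))        # 1 2 3
print(split_digits("kamar 405"))  # kamar 4 0 5
# The result can then be passed on, e.g. g2p(split_digits("123")).
```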

Currently, automatically distinguishing between the two cases is not supported; a manual split is the only option, since it is hard to tell when and where to apply which reading. For phone numbers, you could perhaps pre-define the format with a regex (e.g. Indonesian phone numbers are +62xxx / 08xxxx), but I can't think of how you could do the same for ID numbers.
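To illustrate the regex idea for phone numbers, here is a minimal sketch. The pattern is an assumption built only from the two prefixes mentioned above (+62 and 08); it does not handle separators such as spaces or dashes, and real numbers may need a stricter length check. `PHONE_RE` and `split_phone_numbers` are hypothetical names, and dropping the leading "+" is a choice made here for simplicity.

```python
import re

# Assumed pattern: "+62" or "08" followed by more digits, not preceded by a digit.
PHONE_RE = re.compile(r"(?<!\d)(?:\+62|08)\d+")


def split_phone_numbers(text: str) -> str:
    """Space out the digits of anything that looks like a phone number.

    Other numbers are left untouched so they are still read as values.
    """
    def spell_out(match: re.Match) -> str:
        # Keep only the digits (the "+" is dropped) and separate them.
        return " ".join(ch for ch in match.group() if ch.isdigit())

    return PHONE_RE.sub(spell_out, text)


print(split_phone_numbers("hubungi 08123456 sekarang"))
# hubungi 0 8 1 2 3 4 5 6 sekarang
print(split_phone_numbers("harga 1500 rupiah"))
# harga 1500 rupiah
```

Only the matched spans are split, so a price like 1500 in the same sentence would still go through the numeric reading.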

Hope this helps.
