Work with numbers and numeric values #21

pengpengtao · 2025-02-18T01:40:01Z

Hello, how should I handle the difference between numeric values and digit sequences? For quantities such as money, dates, or times, numbers need to be read as numeric values, while things like the last digits of a mobile phone number, ID numbers, and address numbers need to be read digit by digit. How should I handle this? I tested this script and found that every number is read as a numeric value.

w11wo · 2025-02-18T02:46:58Z

Hi @pengpengtao

This is similar to how letters need to be pre-split if you want them pronounced one by one.

For example, if the digits are written together:

>> g2p("123")
[['s', 'ə', 'r', 'a', 't', 'u', 's'], ['d', 'u', 'ʔ', 'a'], ['p', 'u', 'l', 'u', 'h'], ['t', 'i', 'ɡ', 'a']]

This treats the input as "one hundred and twenty-three". The conversion is handled by this package, which is like an extension of num2words for Indonesian.

But for your case, where each digit must be read separately, you have to pre-split them:

>> g2p("1 2 3")
[['s', 'a', 't', 'u'], ['d', 'u', 'ʔ', 'a'], ['t', 'i', 'ɡ', 'a']]

which correctly gives the phonemes of "one, two, three", since each digit is treated as an individual word.

Currently, automatically distinguishing between the two cases is not supported; a manual split is the only option, since it is hard to tell when and where each reading applies. For phone numbers, you could perhaps match their format with a regex (e.g. Indonesian phone numbers are +62xxx / 08xxxx), but I can't think of how you could do the same for ID numbers.
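As a rough sketch of the regex idea for phone numbers, something like the following could pre-split the digits before the text is passed to g2p. The helper name `split_phone_digits`, the pattern, and the 7 to 11 trailing-digit range are all assumptions made for illustration and are not part of the library. The pattern only matches numbers written without spaces or dashes, and the leading "+" is dropped.

```python
import re

# Assumed pattern: "+62" or "08" followed by 7 to 11 more digits, with no
# separators. Adjust it to the formats that actually occur in your data.
PHONE_RE = re.compile(r"(?<!\d)(?:\+62|08)\d{7,11}(?!\d)")


def split_phone_digits(text: str) -> str:
    """Insert spaces between the digits of anything that looks like a phone number."""

    def spell_out(match: re.Match) -> str:
        # Keep the digits only (this drops the leading "+") and space them out.
        return " ".join(ch for ch in match.group() if ch.isdigit())

    return PHONE_RE.sub(spell_out, text)


# Other numbers are left untouched, so they are still read as numeric values.
print(split_phone_digits("Hubungi 081234567890, harga 123"))
```

The result can then be passed to g2p as usual, so the matched digits are treated as individual words while the remaining numbers keep their numeric reading.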

Hope this helps.
