Don't support Chinese? #52
Hi, I made this example script, with references and predictions taken from https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn:
from jiwer import wer, cer
ground_truths = [
"宋朝末年年间定居粉岭围。",
"渐渐行动不便",
"二十一年去世。",
"他们自称恰哈拉。",
"局部干涩的例子包括有口干、眼睛干燥、及阴道干燥。",
"嘉靖三十八年,登进士第三甲第二名。",
"这一名称一直沿用至今。",
"同时乔凡尼还得到包税合同和许多明矾矿的经营权。",
"为了惩罚西扎城和塞尔柱的结盟,盟军在抵达后将外城烧毁。",
"河内盛产黄色无鱼鳞的鳍射鱼。",
]
hypothesis = [
"宋朝末年年间定居分定为",
"建境行动不片",
"二十一年去世",
"他们自称家哈",
"菊物干寺的例子包括有口肝眼睛干照以及阴到干",
"嘉靖三十八年登进士第三甲第二名",
"这一名称一直沿用是心",
"同时桥凡妮还得到包税合同和许多民繁矿的经营权",
"为了曾罚西扎城和塞尔素的节盟盟军在抵达后将外曾烧毁",
"合类生场环色无鱼林的骑射鱼",
]
wer_score = wer(truth=ground_truths, hypothesis=hypothesis)
print(f"{wer_score=}")
cer_score = cer(truth=ground_truths, hypothesis=hypothesis)
print(f"{cer_score=}")
Which outputs:
What would the correct answer be? Does |
Thanks for your reply, but wer_score=1.0 would mean that every word is wrong, and I do not think that is the case. For example, with ground_truths = "宋朝末年年间定居粉岭围" and hypothesis = "宋朝末年年间定居分定为", only the last 3 words are wrong, so the WER must be < 1.0. If you improve the result, I can test it again. Thanks!
Semantically, I assume that you need the "character" error rate instead of the word error rate, since for Chinese I would expect the two to be equivalent. Therefore, is the CER score of 0.3 correct?
Yes, the CER is correct. I was only focused on WER yesterday, because CER is confusing for Chinese words.
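To illustrate the difference discussed in this thread, here is a minimal sketch in plain Python. It is not jiwer's actual implementation; it assumes that a word error rate splits text on whitespace, while a character error rate treats every character as a token. The helper names `edit_distance` and `error_rate` are hypothetical and only serve this example. Because the Chinese sentences contain no spaces, each whole sentence becomes a single "word", so any mismatch counts as a fully wrong word.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (hypothetical helper)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by the number of reference tokens."""
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# First pair from the script above (the reference keeps its final "。").
reference = "宋朝末年年间定居粉岭围。"
hypothesis = "宋朝末年年间定居分定为"

# Whitespace tokenisation: no spaces, so each sentence is one token.
word_level = error_rate(reference.split(), hypothesis.split())

# Character tokenisation: every character is its own token.
char_level = error_rate(list(reference), list(hypothesis))

print(f"{word_level=}")  # 1.0, since the single "word" does not match
print(f"{char_level=}")
```

Under these assumptions, a character-level rate is the more meaningful number for unsegmented Chinese text. Note that the reference's trailing punctuation counts as an error here because the hypothesis has none; stripping punctuation from both sides first would lower the character-level rate.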