Multispeaker WER #34

mpariente · 2020-11-17T18:32:40Z

Hi, thanks a bunch for this tool!

When working with speech mixtures, WER scoring can take into account that the recognizer may pick up words from any of the overlapping speakers.
There is a description of the method here: https://my.fit.edu/~vkepuska/ece5527/sctk-2.3-rc1/doc/asclite.html

Would you be willing to integrate this feature in Jiwer?

nikvaessen · 2020-11-18T09:03:50Z

I think there are two ways of implementing this:

1. Wrap asclite, which would require shipping its binary for every platform.
2. Write a custom dynamic programming solution, which would most likely be very slow if implemented in Python, or difficult if it needs to be written in C (I don't have much, if any, experience writing C and integrating it into a Python application).
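To give a sense of what the custom dynamic programming route involves, here is a minimal pure-Python sketch of the ordinary single-speaker word-level edit distance. It is only the building block: it does not implement the multi-speaker alignment that asclite describes, which would have to extend the same recurrence across several reference streams and would be correspondingly slower. The function names `word_edit_distance` and `single_speaker_wer` are hypothetical and not part of jiwer.

```python
from typing import List


def word_edit_distance(reference: List[str], hypothesis: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions."""
    # previous[j] is the distance between the reference prefix handled so far
    # and the first j hypothesis words; start with the empty reference prefix.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        # Against an empty hypothesis, all i reference words are deletions.
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def single_speaker_wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one reference and one hypothesis transcript."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)
```

The two nested loops already cost time proportional to the product of the two transcript lengths for a single speaker, which is why a pure-Python multi-speaker version is a concern.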

How would you use this feature? Are there many speech datasets which have this problem?

mpariente · 2020-11-18T12:44:07Z

Thanks for your answer.

> How would you use this feature? Are there many speech datasets which have this problem?

All datasets that include overlapping speech have this problem. A few examples: CHiME-5 and CHiME-6, AMI, wsj0-mix, LibriMix. This seems to be needed in order to evaluate speech separation algorithms.

I'd go with solution 1.
Personally, I wouldn't ship the binaries but would link to the installation instructions instead. This would be an optional feature of jiwer, and the user would need to take an extra step to benefit from it. WDYT?
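A rough sketch of how the optional dependency could be detected, assuming the user has installed asclite themselves and it is reachable on `PATH`. The names `find_asclite` and `AscliteNotFoundError` are hypothetical, not existing jiwer API, and the sketch deliberately stops short of invoking asclite since its command-line arguments are not discussed here.

```python
import shutil


class AscliteNotFoundError(RuntimeError):
    """Raised when the optional asclite binary cannot be located."""


def find_asclite(binary_name: str = "asclite") -> str:
    """Return the path of the asclite executable, or raise a helpful error."""
    # shutil.which searches the directories listed in PATH.
    path = shutil.which(binary_name)
    if path is None:
        raise AscliteNotFoundError(
            f"Could not find '{binary_name}' on PATH. Multispeaker WER is an "
            "optional feature: please install asclite (part of SCTK) first."
        )
    return path
```

With this approach the regular WER functions keep working without asclite, and only the multispeaker entry point would call `find_asclite()` and fail with a pointer to the installation instructions.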
