Filter out bad stitched samples by checking the speech to text transcription #72

ljj7975 · 2021-04-07T02:46:00Z

When I checked the initial stitched datasets, I noticed that the quality was not very good.
Therefore, I decided to do a second pass over the generated samples, so that the script keeps only the samples in which Sphinx keyword detection finds every vocab word ("hey", "fire", "fox").
This should improve the quality of the stitched datasets.
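
The second pass described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `filter_stitched_samples` and the `detect` callable are hypothetical names, and `detect` stands in for whatever wrapper runs Sphinx keyword detection on a file and returns the words it found.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical detector type: takes an audio file path and returns the set of
# words the keyword detector found in it. In practice this would wrap Sphinx.
Detector = Callable[[str], Set[str]]


def filter_stitched_samples(
    paths: Iterable[str],
    vocab: Iterable[str],
    detect: Detector,
) -> List[str]:
    """Keep only the samples in which every vocab word was detected."""
    required = {word.lower() for word in vocab}
    kept = []
    for path in paths:
        detected = {word.lower() for word in detect(path)}
        # A sample passes only if the detector found all required words.
        if required <= detected:
            kept.append(path)
    return kept


if __name__ == "__main__":
    # Stand-in detector with canned results, used here instead of a real model.
    fake_results = {
        "good.wav": {"hey", "fire", "fox"},
        "bad.wav": {"hey", "fox"},  # "fire" was not detected
    }
    kept = filter_stitched_samples(
        ["good.wav", "bad.wav"],
        ["hey", "fire", "fox"],
        lambda path: fake_results.get(path, set()),
    )
    print(kept)  # ['good.wav']
```

Passing the detector in as a callable keeps the filtering logic independent of the speech recognition backend, so the same pass could be tested with canned results as shown.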

related to #59

edwinzhng

LGTM

ljj7975 added 2 commits April 6, 2021 21:42

sphinx keyword detector

d4ea55c

support audio sample verification with keyword detection

153a707

ljj7975 requested review from daemon and edwinzhng April 7, 2021 02:46

fix incorrect name

7387b22

edwinzhng approved these changes Apr 9, 2021

ljj7975 merged commit be80330 into master Apr 9, 2021

ljj7975 deleted the keyword_detection branch April 9, 2021 01:18
