Filter out bad stitched samples by checking the speech to text transcription #72

Merged: 3 commits, Apr 9, 2021

@ljj7975 (Member) commented Apr 7, 2021

When I checked the initial stitched datasets, I noticed that the quality was not very good.
Therefore, I added a second pass over the generated samples so that the script selects only those samples in which Sphinx keyword detection finds every vocab word ("hey", "fire", "fox").
This should improve the quality of the stitched datasets.
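
A minimal sketch of the second pass described above, not the code from this PR. The names `contains_all_keywords`, `filter_stitched_samples`, and `detect_keywords` are hypothetical; `detect_keywords` stands in for the Sphinx keyword detector and is assumed to take a sample and return the words it detected.

```python
from typing import Callable, Iterable, List, Sequence, Set

VOCAB = ("hey", "fire", "fox")  # vocab words named in this PR


def contains_all_keywords(detected: Iterable[str], vocab: Sequence[str] = VOCAB) -> bool:
    """Return True only if every vocab word appears among the detected words."""
    found: Set[str] = {word.lower() for word in detected}
    return all(word in found for word in vocab)


def filter_stitched_samples(
    samples: Iterable[str],
    detect_keywords: Callable[[str], Iterable[str]],
    vocab: Sequence[str] = VOCAB,
) -> List[str]:
    """Second pass: keep only samples whose detected keywords cover the vocab.

    `detect_keywords` is a hypothetical stand-in for the Sphinx keyword
    detector; it maps a sample (e.g. an audio path) to the detected words.
    """
    return [s for s in samples if contains_all_keywords(detect_keywords(s), vocab)]


# Usage with a fake detector backed by a lookup table (illustrative data only).
fake_results = {
    "a.wav": ["hey", "fire", "fox"],  # all vocab words found, so it is kept
    "b.wav": ["hey", "fox"],          # "fire" missing, so it is dropped
}
kept = filter_stitched_samples(fake_results, fake_results.__getitem__)
```

Passing the detector in as a callable keeps the filtering logic testable without audio files or a speech recognizer installed.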

related to #59

@ljj7975 requested review from daemon and edwinzhng on April 7, 2021 02:46

@edwinzhng (Member) left a comment

LGTM

@ljj7975 merged commit be80330 into master on Apr 9, 2021
@ljj7975 deleted the keyword_detection branch on April 9, 2021 01:18