complex hotwords support #Current Model Limitations Discussion #18
Comments
+1. I would like to have this too.
Sorry for the delayed response. The model is currently trained on single words; however, it should also work on simple phrases like "Hey xxx". Moreover, the current model was trained on 1 sec audio clips, so bizarre behaviour might occur when trying to process clips longer than 1 sec. The model was trained using Euclidean distance, so it uses the same metric at inference time. As for increasing the hotword length: hotwords are usually short. Maybe we can extend the processing window to 1.5 sec, but I am not really sure about 2 sec. Can you give a few examples where a hotword could be longer than 1.5 sec? Kindly post your additional model suggestions on the discussions page #3. Join the same channel and put forward your queries there. I am planning to create a faster, more performant version of the current implementation soon, so your suggestions will be helpful.
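To illustrate the fixed-window constraint described above, here is a minimal sketch of cutting a longer recording into windows of the length the model expects, so that no clip longer than the training length reaches the model. This is not the repository's code. The function name `split_into_windows`, the 16 kHz sample rate, and the 0.5 sec hop are all assumptions made for illustration.

```python
def split_into_windows(samples, sample_rate=16000, window_sec=1.0, hop_sec=0.5):
    """Split a sequence of audio samples into fixed-length windows.

    Hypothetical helper: sample_rate, window_sec and hop_sec are assumed
    values, not settings taken from the project.
    """
    window = int(round(window_sec * sample_rate))
    hop = int(round(hop_sec * sample_rate))
    if window <= 0 or hop <= 0:
        raise ValueError("window_sec and hop_sec must be positive")

    # A clip shorter than (or equal to) one window is zero-padded to full length.
    if len(samples) <= window:
        return [list(samples) + [0.0] * (window - len(samples))]

    # Otherwise slide a window across the clip. Trailing samples that do not
    # fill a complete window are dropped.
    windows = []
    for start in range(0, len(samples) - window + 1, hop):
        windows.append(list(samples[start:start + window]))
    return windows


# Usage: a 2 sec clip of silence at the assumed 16 kHz rate.
clip = [0.0] * 32000
one_sec_windows = split_into_windows(clip)                   # 1 sec windows
longer_windows = split_into_windows(clip, window_sec=1.5)    # 1.5 sec windows
```

Each window could then be scored independently. A smaller hop gives more overlap between windows, at the cost of more inference calls.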
Thanks for the information.
Sorry for the delay, I didn't have time to clean the repository that held the training code. It is built using Keras: https://github.com/Ant-Brain/wakeword_dataset_generator . It has both the training code and the dataset generator code.
Hey, thanks for this repo. I cannot find your training code here: https://github.com/Ant-Brain/wakeword_dataset_generator . Is it available in any other repo?
Extremely sorry for the delay. My bad, I forgot to add the notebook that contained the training code.
Currently working on a newer model with better performance and a longer hotword length. It will be available in a month's time.
Update: A newer model with better resilience to noise and 1.5 sec window support has been added to the flow. Kindly check it out!
Hi,
Thanks for your helpful research. I wonder if the current model can handle complex hotwords like "Hey Siri", or just a single word like "Siri"?
My second question is about hotwords whose pronunciation takes more than 1 sec, like "Hey XXXX". Does your model support changing the recording time?
Did you try using cosine_similarity instead of Euclidean distance at inference time?
Thanks.
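For context on the last question, here is a small sketch comparing the two metrics on embedding vectors. It is not the project's inference code. The function names and the example vectors are hypothetical. For L2-normalized embeddings the two metrics rank candidates identically, because the squared Euclidean distance equals 2 - 2 * cosine similarity. For unnormalized embeddings they can disagree, which is one reason a model trained with Euclidean distance is normally scored with the same metric.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: same direction, but the candidate has twice the norm.
reference = [0.6, 0.8, 0.0]
candidate = [1.2, 1.6, 0.0]

distance = euclidean_distance(reference, candidate)
similarity = cosine_similarity(reference, candidate)
```

Here the cosine similarity treats the two vectors as identical in direction, while the Euclidean distance is clearly non-zero. Swapping metrics at inference time would therefore likely need new thresholds, and possibly retraining.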