Any intention to make use of the embedded speech-to-text feature? #3
Comments
Here is some sample code provided by Microsoft |
You mean exposing embedded speech recognition models as SAPI 5 speech recognition (SR) engines? Actually, I'm interested in that too. I extracted the key for the recognition models and ran some experiments on my system to prove that it does work.

However, implementing a custom SAPI SR engine is more difficult than implementing a custom SAPI TTS engine. ISpTTSEngine only requires implementing two methods, Speak and GetOutputFormat, while ISpSREngine plus ISpSREngine2 require a lot more.

SAPI SR engines usually support not only dictation (speech to arbitrary text), but also recognizing voice commands defined by grammars. Embedded speech does support recognizing voice commands via intent recognition, which I suspect is what the new Voice Access feature on Windows 11 is based on. However, the grammar system in SAPI seems more complex and flexible than what IntentRecognizer can provide, which means that translating SAPI grammars into IntentRecognizer patterns can be difficult, or sometimes impossible without losing some information.

This might be easier if I could implement only the dictation part. However, support for voice command recognition is required if you want to use the SR engine with the built-in Speech Recognition feature in Windows. That feature can do dictation, but only when what the user says does not match any of the supported voice commands.

Now I wonder, how many apps are actually utilizing SAPI speech recognition engines? |
Whoa, that seems more complex than implementing TTS. But is there a way to bypass entering the API key to access the models? I know where the STT model files are stored, so it might be possible to load them directly from there 🤔 And then just try to replicate the grammar code from their code and get the job done. Anyway, thanks for the insights 🙂. |
I checked the documentation about pattern matching in intent recognition again.
So the pattern matching is completely offline? If the so-called "pattern matching" is just matching the recognized text, then I don't need to translate SAPI grammars to intent recognition patterns at all. I can use, for example, regular expressions to match the text. Anyway, if you want to access the speech recognition models installed on your system, you can use the extracted keys in my source file. Keep in mind that the keys are not guaranteed to work forever. |
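The idea of matching the recognized text with regular expressions, and falling back to dictation when nothing matches, could be sketched roughly as follows. This is only an illustration in Python: the command names, the patterns, and the helper functions `match_command` and `handle_utterance` are all hypothetical, and the code takes the recognized text as a plain string rather than calling any speech API.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical command table: the names and patterns are illustrative only
# and do not come from SAPI or the Speech SDK.
COMMAND_PATTERNS = {
    "open_app": re.compile(r"^(?:open|start|launch) (?P<app>.+)$", re.IGNORECASE),
    "scroll": re.compile(r"^scroll (?P<direction>up|down)$", re.IGNORECASE),
}


def normalize(text: str) -> str:
    """Trim whitespace and the trailing punctuation a recognizer may append."""
    return text.strip().rstrip(".!?").strip()


def match_command(recognized_text: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (command_name, captured_slots) for the first matching pattern, else None."""
    cleaned = normalize(recognized_text)
    for name, pattern in COMMAND_PATTERNS.items():
        m = pattern.match(cleaned)
        if m:
            return name, m.groupdict()
    return None


def handle_utterance(recognized_text: str) -> Tuple[str, Dict[str, str]]:
    """Treat the text as a command if it matches; otherwise treat it as dictation."""
    result = match_command(recognized_text)
    if result is None:
        return "dictation", {"text": recognized_text}
    return result
```

For instance, `handle_utterance("Open Notepad.")` is classified as the hypothetical `open_app` command, while `handle_utterance("hello world")` falls through to dictation. A real SAPI grammar can express much more than a flat list of patterns, so this only covers the simplest case.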
Oh, thanks for the keys. By the way, I used them to run the models that came with Microsoft, and it seems that they are using different models here. Is there any way to download the speech recognition models provided by embedded speech? I kind of want them so badly for my project 😅 I have been searching for months now, with no success. |
The keys are for the models that will be installed when you go to Windows Settings > Time & language > Language & region, open the language options for a language, then choose to install "Enhanced speech recognition". (By the way, "Basic speech recognition" installs the older SAPI 5 speech recognition engines.)

You can get the paths for the installed models by using the following PowerShell:

Get-AppxPackage -Name MicrosoftWindows.Speech.* | Select-Object Name, InstallLocation

Or by code, using WinRT API PackageManager.FindPackagesForUser to get a list of all installed packages, then find the packages whose ID starts with

If you want a download link, you can find your installed "Speech Packs" in Microsoft Store's library. Then you can copy their Microsoft Store links. For example, here's the link to the English (US) speech pack. ("Speech Packs" are for speech recognition, not for Narrator's natural voices.)

If you want a more direct way to download, you can use something like store.rg-adguard.net to get direct download links for the msix files, without requiring the Microsoft Store app or a Microsoft account. The downloaded msix files can be extracted to a folder just like a zip file, and then you should be able to use the folder as the model path.

Finally, if you want to know the official way to get the offline models, see this documentation about embedded speech. It requires you to submit an application form, and if you are eligible, you can get the model files with your own keys. |
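Since an msix file can be unpacked like a zip file, the extraction step could be sketched in Python with the standard `zipfile` module. This is a minimal sketch, not a tested recipe for the speech packs: the function name `extract_msix` is hypothetical, and it assumes the download is a plain .msix package rather than a bundle that wraps further packages.

```python
import zipfile
from pathlib import Path


def extract_msix(msix_path: str, dest_dir: str) -> Path:
    """Unpack an .msix package (a ZIP-based container) into dest_dir."""
    dest = Path(dest_dir)
    # Create the target folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(msix_path) as archive:
        archive.extractall(dest)
    return dest
```

A call such as `extract_msix("speech-pack.msix", "speech-model")` (both names are placeholders) would leave the package contents in the `speech-model` folder, which is the folder you would then try as the model path.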
Great 😃 thanks buddy 🙏, appreciate your help |
Microsoft Speech SDK also includes a speech-to-text feature, which is very good and has a very low WER (word error rate). You can try it by pressing Win+H on your keyboard, and a streaming speech-to-text panel will pop up on your screen.
It even works offline.
You can find some help here