Terminology
Caleb Bassi edited this page May 2, 2020
- action: The callback that gets executed when a transcript matches a certain rule.
- CCR: Continuous Command Recognition.
- choice: A grammar element and placeholder that can match one of several predefined words.
- command: A pairing of a rule and an action.
- grammar: A nested tree of grammar elements that a rule is compiled to. Also used to refer to the collection of all available rules that a transcript can be matched against.
- grammar complexity: A measure of how complex a grammar is, based on the number of rules and the complexity of their patterns.
- grammar element:
- keyword: A word literal that is specified in a rule.
- match object: A result passed to an action with information based on the rule that was matched and the transcript.
- modes:
  - command mode: The mode for dictating commands. Used for general computer usage and programming. This is the default mode.
  - speech mode: The mode for dictating natural language such as words, phrases, and sentences.
- placeholder: A grammar element that acts as a variable for certain words. The value of a placeholder is added to the match object.
- Osprey script: A Python file that includes user-specified commands and is loaded by Osprey at runtime.
- rule: A pattern of words and placeholders that a transcript is matched against. Each rule maps to an action.
- voice typing: Using your voice as a form of computer input.
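
To make the relationship between the terms above concrete, here is a minimal sketch of how a rule, a choice placeholder, a match object, and an action might fit together. It is not Osprey's API: the `Command` and `Match` classes, the `<name>` placeholder syntax, and the `dispatch` helper are all hypothetical names chosen for illustration, and the rule is compiled to a regular expression rather than to a real grammar tree.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Match:
    """Hypothetical match object: the transcript plus placeholder values."""
    transcript: str
    values: Dict[str, str] = field(default_factory=dict)


@dataclass
class Command:
    """A command pairs a rule with an action (illustrative, not Osprey's API)."""
    rule: str                        # e.g. "open <app>"
    choices: Dict[str, List[str]]    # words each choice placeholder accepts
    action: Callable[[Match], str]   # callback run when the rule matches

    def compile(self) -> "re.Pattern[str]":
        parts = []
        for token in self.rule.split():
            if token.startswith("<") and token.endswith(">"):
                # Choice placeholder: matches one of several predefined words.
                name = token[1:-1]
                words = "|".join(re.escape(w) for w in self.choices[name])
                parts.append(f"(?P<{name}>{words})")
            else:
                # Keyword: a word literal that must appear as written.
                parts.append(re.escape(token))
        return re.compile(r"\s+".join(parts))

    def match(self, transcript: str) -> Optional[Match]:
        m = self.compile().fullmatch(transcript.strip().lower())
        if m is None:
            return None
        return Match(transcript, m.groupdict())


def dispatch(commands: List[Command], transcript: str) -> Optional[str]:
    """Run the action of the first command whose rule matches the transcript."""
    for command in commands:
        result = command.match(transcript)
        if result is not None:
            return command.action(result)
    return None


commands = [
    Command(
        rule="open <app>",
        choices={"app": ["firefox", "terminal"]},
        action=lambda m: f"launching {m.values['app']}",
    )
]

print(dispatch(commands, "open terminal"))  # launching terminal
print(dispatch(commands, "open editor"))    # None
```

In this sketch the value captured by the placeholder ends up in `Match.values`, which mirrors the idea that a placeholder's value is added to the match object before the action runs.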
- ASR: Automatic Speech Recognition.
- decoding/transcribing: The process of converting audio to text.
- online decoding: Streaming audio in chunks to be decoded in real time with multiple intermediate results and one final result.
- offline decoding: Decoding one chunk of audio and getting one result. If using offline decoding to transcribe a microphone stream, you have to use a VAD to segment the audio and then decode each segment.
- enrollment/training: The process in which an individual speaker reads text or vocabulary to a speech recognition system so that the system can be fine-tuned for that speaker.
- speaker dependent: Systems that use enrollment/training.
- speaker independent: Systems that do not use enrollment/training.
- model:
- language model:
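- RTF: Real Time Factor. The time a speech recognition engine takes to process audio divided by the duration of that audio. Measures how quickly an engine is able to return results; a value below 1 means faster than real time.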
- SOTA: State Of The Art.
- speech recognition engine: Transcribes speech from some given audio based on a given model.
- STT: Speech To Text.
- transcript: A sequence of words that a speech recognition engine generates based on some audio.
- VAD: Voice Activity Detection. Detecting which portions of an audio signal contain speech.
- vocabulary: The set of words that a speech recognition engine can transcribe with a given model.
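- WER: Word Error Rate. The proportion of words that a speech recognition engine with a given model transcribes incorrectly, counted as substitutions, deletions, and insertions relative to the number of words in a reference transcript. A measurement of accuracy; lower is better.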
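
The two metrics in this list, WER and RTF, can be illustrated with a short sketch. It assumes the conventional definitions: WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in a reference transcript, and RTF as processing time divided by audio duration. The function names are hypothetical, and real evaluations usually normalize case and punctuation first, which this sketch leaves out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (one row of the usual DP table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# One deletion ("the") and one substitution ("window" -> "windows")
# against a four-word reference.
print(word_error_rate("open the terminal window", "open terminal windows"))  # 0.5

# 2 seconds of processing for 10 seconds of audio.
print(real_time_factor(2.0, 10.0))  # 0.2
```

Because insertions count as errors, WER computed this way can exceed 1 when the hypothesis is much longer than the reference.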