Neumair Günther edited this page Dec 9, 2018 · 6 revisions

How it works

The Feature extractor

The Feature extractor is responsible for turning PCM-encoded audio data into 8-bit mel-spectrogram features. You might wonder why the feature extractor is a separate component and why it uses only 8 bits.

First, some applications, such as verifying that a hotword was spoken by a particular speaker, run two models on the same audio data. Keeping the feature extractor as a separate entity avoids computing the mel features twice.

Second, it is a convenient way of compressing and transmitting data. One second of audio yields 40×98 mel features, so you can capture audio on a lightweight system (like an ESP32) and send only the features to a more powerful system. This requires 40 × 98 × 8 bit = 31,360 bits, i.e. roughly 31 kbit (about 3.9 kB) per second. Using 8 bits loses almost no audible information.
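The frame count, the 8-bit quantization and the bandwidth figure above can be illustrated with a small sketch. This is not the project's actual extractor. It assumes 16 kHz input framed with a 30 ms window and a 10 ms hop (one common setting that happens to give 98 frames per second), reads "40×98" as 40 mel bins × 98 frames, and maps log-mel values from a hypothetical fixed range of [-10, 15] onto 0..255. All function names are illustrative.

```python
# Sketch only: framing parameters, value range and names are assumptions,
# not taken from the project's feature extractor.
WINDOW_MS = 30      # assumed analysis window length
HOP_MS = 10         # assumed hop between consecutive windows
NUM_MEL_BINS = 40   # assumed: "40x98" read as 40 mel bins x 98 frames


def frames_per_second(window_ms: int = WINDOW_MS, hop_ms: int = HOP_MS) -> int:
    """Number of complete analysis windows that fit into one second of audio."""
    return (1000 - window_ms) // hop_ms + 1


def quantize_to_uint8(values, lo: float, hi: float) -> bytes:
    """Linearly map floats in [lo, hi] to integers 0..255, clipping outliers."""
    scale = 255.0 / (hi - lo)
    out = []
    for v in values:
        q = round((v - lo) * scale)      # nearest 8-bit level
        out.append(min(255, max(0, q)))  # clip values outside [lo, hi]
    return bytes(out)


def dequantize(data: bytes, lo: float, hi: float):
    """Inverse mapping, used to inspect how much precision 8 bits keep."""
    step = (hi - lo) / 255.0
    return [lo + b * step for b in data]


def bits_per_second(mel_bins: int = NUM_MEL_BINS, bits: int = 8) -> int:
    """Raw bandwidth needed to stream one second of quantized features."""
    return mel_bins * frames_per_second() * bits


if __name__ == "__main__":
    print(frames_per_second())   # 98
    print(bits_per_second())     # 31360
    frame = [-3.2, 0.0, 5.7, 12.4]           # made-up log-mel values
    q = quantize_to_uint8(frame, -10.0, 15.0)
    restored = dequantize(q, -10.0, 15.0)
    print(max(abs(a - b) for a, b in zip(frame, restored)))
```

With a fixed range, each 40-bin frame shrinks to 40 bytes, and the reconstruction error for in-range values is at most half a quantization step (25/255/2 ≈ 0.05 for the made-up range used here).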
