How it works

The Feature extractor

The Feature extractor is responsible for turning PCM-encoded audio data into 8-bit mel-spectrogram features. You might wonder why the feature extractor is a separate component and why it uses only 8 bits. First, some applications, such as verifying that a hotword was spoken by a particular speaker, require two models running on the same audio data. Keeping the feature extractor as a separate entity avoids computing the mel features twice.

Second, it is a convenient way of compressing and transmitting data. One second of audio yields 40 × 98 mel features. You can capture audio on a lightweight system (such as an ESP32) and transmit the features to a more powerful system. At 8 bits per feature this requires only 40 × 98 × 8 bits = 31,360 bits, roughly 31 kbit (about 3.9 kB) per second. Using 8 bits loses almost no audible information.
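The page does not say how the floating-point mel values are mapped to 8 bits, so the sketch below assumes a simple linear scheme over a fixed log-mel range of -80 dB to 0 dB. The function names `quantize_u8` and `dequantize_u8`, the dB range, and treating 40 as mel bands and 98 as frames are all illustrative assumptions, not this project's actual API. It shows one byte per feature and checks the per-second payload arithmetic above.

```python
"""Minimal sketch of 8-bit mel-feature quantization (assumed linear scheme)."""
from typing import List

# Shape of one second of features, taken from the text: 40 x 98 values.
# Treating 40 as mel bands and 98 as frames is an assumption.
N_MELS = 40
FRAMES_PER_SEC = 98

# Hypothetical fixed log-mel range in dB; the real extractor's range is not documented here.
LO_DB = -80.0
HI_DB = 0.0


def quantize_u8(frames: List[List[float]], lo: float = LO_DB, hi: float = HI_DB) -> bytes:
    """Map each float in [lo, hi] linearly onto 0..255 (one byte per feature).

    Values outside [lo, hi] are clipped to 0 or 255.
    """
    scale = 255.0 / (hi - lo)
    out = bytearray()
    for frame in frames:
        for value in frame:
            q = round((value - lo) * scale)
            out.append(min(255, max(0, q)))
    return bytes(out)


def dequantize_u8(data: bytes, n_mels: int = N_MELS,
                  lo: float = LO_DB, hi: float = HI_DB) -> List[List[float]]:
    """Invert quantize_u8 up to a rounding error of at most half a step."""
    step = (hi - lo) / 255.0
    values = [lo + b * step for b in data]
    # Regroup the flat byte stream into frames of n_mels values each.
    return [values[i:i + n_mels] for i in range(0, len(values), n_mels)]


if __name__ == "__main__":
    # Synthetic one-second block with values spread across [LO_DB, HI_DB].
    frames = [
        [LO_DB + (HI_DB - LO_DB) * ((m * 7 + f * 3) % 100) / 99 for m in range(N_MELS)]
        for f in range(FRAMES_PER_SEC)
    ]
    payload = quantize_u8(frames)
    print(f"payload: {len(payload)} bytes = {len(payload) * 8} bits per second")
    # -> payload: 3920 bytes = 31360 bits per second

    restored = dequantize_u8(payload)
    worst = max(abs(a - b) for fa, fb in zip(frames, restored) for a, b in zip(fa, fb))
    print(f"max round-trip error: {worst:.3f} dB (step = {(HI_DB - LO_DB) / 255:.3f} dB)")
```

With a fixed range, sender and receiver only need to agree on `lo` and `hi` once; a per-block min/max scheme would use the 256 levels more efficiently but would also have to transmit the range alongside every block.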
