Machine Hearing, or Machine Listening, is the use of Machine Learning and audio sensors to derive meaningful information from sound. This includes listening for and diagnosing problems in machinery, understanding the events and activities that cause noise, and estimating how humans perceive certain sounds.
Here you can find some notes on the topic compiled by Jon Nordby.
This research is sponsored by Soundsensing, a provider of IoT audio sensors with built-in Machine Learning, used for Noise Monitoring and Condition Monitoring. The sensors are ideal for continuous monitoring of audible noises and events, and can perform tasks such as Audio Classification, Audio Event Detection and Acoustic Anomaly Detection. They can transmit compressed and privacy-preserving spectrograms, allowing Machine Learning to be done in the cloud using familiar tools like Python. Alternatively, models can be deployed onto the sensor itself for a highly efficient on-edge ML solution.
Some information is found in sub-pages.
July 26, 2021. Presented at EuroPython 2021. Video recording, slides, notes.
June 7, 2021. Presented at tinyML EMEA Technical Forum 2021. Video recording coming, slides, notes.
March 25, 2021. Video recording, slides, notes.
At KnowIt Oslo, 2020. Video recording, slides, notes
Master thesis. Report and code available in the Github repository.
Presentation at EuroPython 2019. Video recording, notes
Presentation at PyCode Conference 2019 in Gdansk. Slides, notes
Video recording: coming, maybe in November.
Presentation at SenseCamp 2019 hosted by FORCE Technology Senselab. Slides: web, .PDF
Report and lecture at NMBU Data Science.
With example code in Python
- Loading Youtube audio data with youtube-dl and librosa
- Extracting fixed-size analysis windows from audio
- Classifying an audio clip of many analysis windows using Keras TimeDistributed and GlobalAveragePooling
- Classifying an audio clip by voting over analysis windows. Mean/majority voting.
- Annotating/labeling audio data using Audacity
- Preprocessing audio into mel-spectrograms
- Multi-core preprocessing of audio files using joblib
- Compute MFCC or mel-spectrogram from existing STFT spectrograms
- Converting mel-spectrograms into PNG images
- Converting mel-spectrogram or MFCC back to audio waveform using librosa
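The windowing and voting steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the code from the original notes: the feature matrix is assumed to have shape (frames, bands), e.g. a transposed mel-spectrogram; clips shorter than one window are zero-padded; and the per-window class probabilities are assumed to come from some classifier that is not shown. The function names are hypothetical.

```python
import numpy as np


def extract_windows(features, window_frames, hop_frames):
    """Split a (n_frames, n_bands) feature matrix into fixed-size analysis windows.

    Returns an array of shape (n_windows, window_frames, n_bands).
    Assumption: clips shorter than one window are zero-padded at the end,
    and trailing frames that do not fill a complete window are dropped.
    """
    n_frames, n_bands = features.shape
    if n_frames < window_frames:
        pad = np.zeros((window_frames - n_frames, n_bands), dtype=features.dtype)
        features = np.concatenate([features, pad], axis=0)
        n_frames = window_frames
    starts = range(0, n_frames - window_frames + 1, hop_frames)
    return np.stack([features[s:s + window_frames] for s in starts])


def mean_vote(window_probs):
    """Average class probabilities over all windows, then pick the top class."""
    return int(np.argmax(window_probs.mean(axis=0)))


def majority_vote(window_probs):
    """Each window votes for its top class; the most common class wins.

    Ties go to the lowest class index (behaviour of np.bincount + np.argmax).
    """
    votes = np.argmax(window_probs, axis=1)
    return int(np.argmax(np.bincount(votes, minlength=window_probs.shape[1])))


if __name__ == "__main__":
    clip = np.random.rand(100, 32)  # stand-in for 100 frames x 32 mel bands
    windows = extract_windows(clip, window_frames=40, hop_frames=20)
    print(windows.shape)  # (4, 40, 32)

    # Hypothetical classifier outputs for 3 windows and 2 classes
    probs = np.array([[0.6, 0.4], [0.6, 0.4], [0.0, 1.0]])
    print(mean_vote(probs), majority_vote(probs))  # 1 0
```

The example probabilities show why the choice matters: mean voting lets one very confident window outweigh two mildly confident ones, while majority voting counts every window equally.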
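For computing mel-spectrograms or MFCCs from existing STFT spectrograms, librosa accepts a precomputed spectrogram through the `S` argument of `librosa.feature.melspectrogram` and `librosa.feature.mfcc`. The NumPy sketch below only illustrates roughly what happens inside, with stated simplifications: it uses the HTK mel formula (librosa defaults to the Slaney variant and also normalizes filter areas), a natural log rather than `librosa.power_to_db`, and an orthonormal DCT-II. Its numbers will therefore not match librosa exactly. The function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumption; librosa defaults to the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular mel filters with peak 1.0, shape (n_mels, 1 + n_fft // 2)."""
    fmax = sr / 2.0 if fmax is None else fmax
    fft_freqs = np.linspace(0.0, sr / 2.0, 1 + n_fft // 2)
    # n_mels + 2 points, evenly spaced on the mel scale, give the filter edges
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    weights = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        weights[i] = np.maximum(0.0, np.minimum(rising, falling))
    return weights


def mel_from_stft(power_spec, sr, n_fft, n_mels=64):
    """Map an existing power spectrogram (1 + n_fft // 2, n_frames) to mel bands."""
    return mel_filterbank(sr, n_fft, n_mels) @ power_spec


def mfcc_from_mel(mel_spec, n_mfcc=13, eps=1e-10):
    """Log-compress the mel spectrogram, then apply an orthonormal DCT-II over bands."""
    log_mel = np.log(mel_spec + eps)
    n_mels = log_mel.shape[0]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    dct[0] *= np.sqrt(1.0 / n_mels)
    dct[1:] *= np.sqrt(2.0 / n_mels)
    return dct @ log_mel


if __name__ == "__main__":
    sr, n_fft = 16000, 512
    power_spec = np.abs(np.random.randn(1 + n_fft // 2, 10)) ** 2  # stand-in for a real STFT
    mel = mel_from_stft(power_spec, sr, n_fft, n_mels=40)
    mfcc = mfcc_from_mel(mel, n_mfcc=13)
    print(mel.shape, mfcc.shape)  # (40, 10) (13, 10)
```

Because the filterbank depends only on `sr`, `n_fft` and `n_mels`, it can be computed once and reused for every stored spectrogram, so no audio needs to be re-decoded.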
Rough notes on various topics.
- Applications. Practical applications of Machine Hearing
- Tasks. Established problem formulations
- Audio Quality. Metrics for measuring audio quality
- Explainable models for Audio.
- Features. Feature representations
- Preprocessing. Preprocessing techniques
- DCASE2018. Notes from DCASE2018 challenge and conference
- Commercial solutions. Companies and products in Machine Hearing
- Speech. Speech-specific techniques and applications
- Music. Music-specific techniques and applications
- Compressive Sensing.
Useful resources to learn more.
- Audio Event Detection w/Deep Learning. By Robert Coop, Ph.D., Head of AI and ML @ Stanley Black & Decker. From Data Science Connect, 2028.
- Computational Analysis of Sound Scenes and Events. Tuomas Virtanen, Mark D. Plumbley, Dan Ellis. 2018.
- Human and Machine Hearing - Extracting Meaning from Sound. Richard F. Lyon. 2017, revised 2018.
- An Introduction to Audio Content Analysis - Applications in Signal Processing and Music Informatics. Alexander Lerch. 2012. Companion website: https://www.audiocontentanalysis.org/
- Machine Learning for Audio, Image and Video Analysis: Theory and Applications (Advanced Information and Knowledge Processing). Francesco Camastra and Alessandro Vinciarelli. Three parts: From Perception to Computation, Machine Learning, Applications.
- CSC 83060: Speech and Audio Understanding. http://mr-pc.org/t/csc83060/ Brooklyn College (CUNY).
- Deep Learning (for Audio) with Python by Valerio Velardo
- PyTorch for Audio + Music Processing by Valerio Velardo
Feature extraction
- librosa. The go-to Python module.
- essentia. C++ library, with Python bindings. Lots of Music Analysis extractors. Used by Freesound and AcousticBrainz.
- kapre. On-demand GPU computation of mel-spectrograms, for Keras
- torchaudio. Audio processing in PyTorch
Data Augmentation
- muda: Python library for augmenting annotated audio data
- audiomentations.
- scaper. Soundscape synthesis tool with automatic label handling.
- Audio Classification. http://www.cs.tut.fi/~sgn24006/PDF/L04-audio-classification.pdf Covers low-level features, MFCC. Classification by distance metrics. GMM. HMM.
- Speech Signal Analysis, Lecture 2. January 2017, Hiroshi Shimodaira and Steve Renals. Great diagrams of audio discretization, mel filters, and wide- versus narrow-band spectrograms.
- Kaggle Whale detection
- Kaggle FreeSound tagging 2018
- Kaggle FreeSound
- DCASE2014
- DCASE2018
- DCASE2019
- DCASE2020
- DCASE2021
- https://mircommunity.slack.com/ - Music Information Retrieval
- The Sound of AI, Slack Community
- Awesome Deep Learning Music
- Fast.ai forums: Deep Learning with Audio. Large lists of resources, both in the first post and under "popular links". Feb 2019, 315 replies over 4 months.