Automated Playlist Generation

Used MusAV dataset for playlist generation with audio features and embeddings.

Quick Start

Feature Extraction

Setup

The following parameters need to be set in the config.py file -
- DATA_PATH - Root Path of the folder with all audio files. Current configuration supports MP3 files with 44,100 Hz sampling rate.
MusAV dataset access can be requested here.
Set up the virtual environment with requirements file.
Install essentia and essentia-tensorflow using pip. Unless you're on Mac, then use the wheels here

Playlist Generators

Usage

Run the audio descriptors playlist app using -

streamlit run playlist_descriptor.py

Run the track similarity playlist app using -

streamlit run playlist_embeddings.py

Documentation

Audio Content based Playlists

Introduction

In this project, I've created scripts for analyzing and previewing music in any given collection -

main.py - Extract audio features and embeddings of 44.1kHz MP3 files from any custom music collection
collection_overview.ipynb - Compute analysis statistics for the music collection
playlist_descriptor.py - Create music playlists filtered by descriptors such as key, tempo, music style, danceability, voice vs instrumental, and arousal-valence
playlist_embeddings.py - Generate music playlists based on queries by track example, finding tracks similar to a given query track.

Audio Analysis with Essentia

Objective: Developed a standalone Python script to analyze an entire audio collection (MusAV for this assignment) and extract specific descriptors for each track using Essentia.

Descriptors Extraction:

Tempo (BPM) - Used TempoCNN with deepsquare-k16 square filters. Based on the referenced paper, deepsquare performs best for tempo detection on 3 out of 4 datasets. Additional computation cost is okay since we store results.
Key - KeyExtractor with temperley, krumhansl, edma profiles
Loudness - LoudnessEBUR128 for integrated loudness in LUFS
Embeddings - Discogs-Effnet and MSD-MusiCNN models
Music styles - Discogs-Effnet with activations for 400 music styles
Voice/instrumental classifier - Model based on Discogs-Effnet embeddings
Danceability - Classifier model with Discogs-Effnet embeddings
Arousal and valence (music emotion) - emoMusic pretrained model with MSD-MusiCNN embeddings

Loading Audio:

Load audio with AudioLoader - 44.1 kHz stereo for Loudness
Downmix to mon with MonoMixer - 44.1 kHz mono for Key
Resample 44.1 kHz mono to 11,025 Hz mono for TempoCNN
Resample 44.1 kHz mono to 16kHz mono for Embeddings

Using ML Models:

Instantiated necessary algorithms once.
Computed required embeddings once for each track.

Script Design:

Script runnable on any music collection of 44.1kHz MP3s with any size and nested folder structure.
DATA_PATH can be set in config.py file
Five separate pickle files in features folder for storing - analysis results, two sets of embeddings, genre activations and file paths
Error handling implemented to skip files with analysis errors
Added a tqdm progress bar

Music Collection Overview

Objective: Generate a statistical report from analyzed audio files to explore the music collection.

Report Commentary:

Single music style chosen per track, the one with the maximum activation
Comment on the collection's diversity in terms of music styles, tempos, tonality, emotion, etc. -
- Music Style - Classes are heavily unbalanced in favour of popular styles Rock, Electronic, Hip Hop. But it seems reasonable based on the average consumption pattern of the population.
- Tempos - Seems to have 2 peaks, one around 90 and another around 125. Distribution seems to reflect popular music patterns.
- Tonality - C, G, Am, F seem to be the most popular keys in Krumhansl, again reflecting popular music.
- Emotion - Positive valence and high arousal values, as expected with popular music.
Comment on differences in key and scale estimation using the three profiles (temperley, krumhansl, edma). What is the % of tracks on which all three estimations agree? - All three estimations agree on around 49% of tracks
If we had to select only one profile to present to the users, which one should we use? - krumhansl based on the analysis in my statistical report
Comment on loudness. Does its distribution make sense? - Yeah it seems reasonable, detailed explanation in the statistical report.

Playlists Based on Descriptor Queries

Objective: Create a simple UI for playlist generation based on audio analysis results using Streamlit.

Extend the provided Streamlit code to implement search by various analysis results such as tempo, voice/instrumental classification, danceability, arousal/valence, and key/scale.
- Track is picked if the activation in any selected style is within the range, unlike original code where activation was checked in all selected styles.
- Range selection filters for tempo, danceability, arousal and valence
- Multiselect for key, scale and vocals, to allow maximum flexibility
krumhansl key/scale estimation profile picked for UI

Observations and Issues:

Style - Activation ranges had to be set wide to get one song for some styles. Was helpful to have the style summary in picking threshold.
Tempo - Worked well
Voice Instrumental - distribution has some songs in the middle, not towards either extremities. Precision recall tradeoff in picking activation threshold that performs well.
Danceability - skewed distribution, too many values close to 1, didn't seem to be meaningful
Arousal Valence - Found the predictions more reasonable
Key - Worked well

One extension could be to add category sorting too, instead of just filtering.

Playlists Based on Track Similarity

Objective:

Allow users to select a query track and generate two lists of the 10 most similar tracks using effnet-discogs and msd-musicnn embeddings.
Compute music similarity using cosine distance between the query and other tracks' embeddings.
Embed music players to listen and compare results.

Observations:

Discogs Embedding seemed to be better at giving results from the same genre and with similar style.
MSD Embedding seemed to be capturing certain elements of the song, and would give results that had some similarity but not necessarily from the same genre.
For certain styles, both the embeddings produced similar results.
I would pick Discogs Embedding for building a similarity ranking, it seemed more robust.
For a recommender system, if I wanted it to have more surprise element, I could probably use MSD Embeddings.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
features		features
metadata		metadata
playlists		playlists
resources/images		resources/images
results		results
weights		weights
.gitignore		.gitignore
README.md		README.md
audio.py		audio.py
collection_overview.ipynb		collection_overview.ipynb
config.py		config.py
features.py		features.py
fileio.py		fileio.py
main.py		main.py
playlist_descriptor.py		playlist_descriptor.py
playlist_embeddings.py		playlist_embeddings.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Automated Playlist Generation

Quick Start

Feature Extraction

Setup

Playlist Generators

Usage

Documentation

Audio Content based Playlists

Introduction

Audio Analysis with Essentia

Music Collection Overview

Playlists Based on Descriptor Queries

Playlists Based on Track Similarity

About

Releases

Packages

Languages

dhunstack/essentia-playlists-generation

Folders and files

Latest commit

History

Repository files navigation

Automated Playlist Generation

Quick Start

Feature Extraction

Setup

Playlist Generators

Usage

Documentation

Audio Content based Playlists

Introduction

Audio Analysis with Essentia

Music Collection Overview

Playlists Based on Descriptor Queries

Playlists Based on Track Similarity

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages