A multivariate data analysis of music streaming
Spotify, Music, Statistics, Streaming, Popularity
This project analyses music-theoretical variables, so-called "audio features" (Jehan, 2005), with unsupervised learning methods. Artist-specific information is additionally used to predict artist popularity (on a scale from 0 to 100) and whether an artist or song was a "chartbreaker" in the German Spotify market (Feb 2019 - Mar 2020). The code accompanying the paper can be found in "Notebook_Submission.Rmd". A rendered version of the code can be found on RPubs. Feel free to reach out to me if you have questions!
Fun fact: the three most streamed tracks in Germany in this period were:
- Dance Monkey by Tones and I (107 million streams)
- Roller by Apache 207 (80 million streams)
- bad guy by Billie Eilish (77 million streams)
Methods:
- Correlation coefficients
- Tests of independence
- Nonlinear transformations
- Principal Component Analysis
- Exploratory Factor Analysis
- Cluster Analysis
- Regression Analysis
- Decision trees and neural networks
- Cross validation
Data sources:
- Spotify Charts (Top 200, daily)
- Spotify Web API
- YouTube Data API
Tools:
- Python / Jupyter
- R / Markdown
- Stata
- LaTeX
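
To give a feel for the first method listed above, here is a minimal Python sketch of a Pearson correlation matrix over a few audio features. It is not the code from "Notebook_Submission.Rmd": the helper `pearson` and the dictionary `features` are made up for illustration, and the numbers are invented rather than taken from the project's data. Only the feature names (danceability, energy, valence) correspond to real Spotify audio features.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centred cross-products and squares; the 1/n factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented values for illustration only (assumption, not project data).
features = {
    "danceability": [0.82, 0.71, 0.70, 0.45, 0.60],
    "energy": [0.59, 0.62, 0.43, 0.80, 0.55],
    "valence": [0.51, 0.40, 0.56, 0.30, 0.45],
}

names = list(features)
# Full correlation matrix, stored as a dict keyed by (row, column) name pairs.
corr = {(a, b): pearson(features[a], features[b]) for a in names for b in names}

# Print one row per feature, columns in the same order as `names`.
for a in names:
    print(a.ljust(14) + "  ".join(f"{corr[(a, b)]:+.2f}" for b in names))
```

With real data, each list would hold one value per track. A matrix like this is a common first look before running principal component or factor analysis on the same features.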