This is the repository for the homework assignment in Datenanalyse 2 (Data Analysis II). I analyse data from several sources, including Spotify, Keywordtool and the YouTube Data API, to predict music streams.

gerwolf/DA2_Homework

Data analysis II - Homework Assignment (Humboldt University Berlin)

A multivariate data analysis of music streaming

Spotify, Music, Statistics, Streaming, Popularity

This project analyses music-theoretical variables, the "audio features" (Jehan, 2005), using unsupervised learning. Artist-specific information is additionally used to predict artist popularity (on a scale from 0 to 100) and whether an artist or song was a "chartbreaker" in the German Spotify market (Feb 2019 - Mar 2020). The code accompanying the paper is in "Notebook_Submission.Rmd"; a rendered version is available on RPubs. Feel free to reach out to me if you have questions!

Fun fact: the three most streamed tracks in Germany in this period were

  1. Dance Monkey by Tones and I (107 million streams)
  2. Roller by Apache 207 (80 million streams)
  3. bad guy by Billie Eilish (77 million streams)

Statistical methods used:

  • Correlation coefficients
  • Tests of independence
  • Nonlinear transformations
  • Principal Component Analysis
  • Exploratory Factor Analysis
  • Cluster Analysis
  • Regression Analysis
  • Decision trees and neural networks
  • Cross validation
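As a rough illustration of one of the methods listed above, the following is a minimal Python sketch of Principal Component Analysis on standardized features. It is not taken from "Notebook_Submission.Rmd": the data are synthetic, the column names are only assumed stand-ins for audio features, and the helper names `standardize` and `pca` are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(X, n_components):
    """PCA via SVD of the standardized data matrix.

    Returns the component scores, the loadings (one row per component)
    and the share of total variance explained by each kept component.
    """
    Z = standardize(X)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T
    return scores, Vt[:n_components], explained[:n_components]

# Synthetic stand-in data (assumption): two correlated and two independent
# columns. The feature names are illustrative, not the project's variables.
rng = np.random.default_rng(0)
features = ["danceability", "energy", "valence", "acousticness"]
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.3 * rng.normal(size=(200, 1)),
    latent + 0.3 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 2)),
])

scores, loadings, ratio = pca(X, n_components=2)
print(dict(zip(features, loadings[0].round(2))))  # loadings of component 1
print(ratio.round(2))                             # explained variance shares
```

Standardizing first means the components are driven by the correlation structure rather than by differences in scale between features, which is a common choice when variables are measured in different units.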

Data sources:

  • Spotify
  • Keywordtool
  • YouTube Data API

Programming languages:

  • Python / Jupyter
  • R / Markdown
  • Stata
  • LaTeX
