antoniavillarino / spotify

Solo project for Data Analytics CodeOp Course. I analysed the sound-related features of Spotify songs, built a model to predict the popularity of a song and made an artist recommender system.

Jupyter Notebook 100.00%
spotify recommender-system

The data behind Spotify music

Spotify is an audio streaming service with more than 70 million tracks and over 356 million monthly active users. With these numbers, data analysis is a must! The Spotify databases are enriched with lots of features: popularity, danceability, key... You can see them for yourself in its API.

For my analysis, I used this dataset, collected with this API by a Kaggle user, Yamac Eren Ay. There are two main datasets: one for songs and another for artists (the other datasets are derived from these two by aggregation). The songs dataset contains around 600k songs, released between 1922 and 2020.

  • See the slides of the final presentation.

1. Analysis

How has music taste changed over time?

First, I wanted to see how the Spotify features have changed over the years. To do that, I aggregated the songs by year and took the mean or the mode of each feature (see the code in Spotify_analysis.ipynb, or in this nbviewer link to see the interactive plots).
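As a rough illustration of this aggregation step, here is a minimal pandas sketch. The toy data, the column names (`year`, `loudness`, `tempo`, `key`) and the helper `first_mode` are assumptions for the example, not the notebook's actual code: continuous features are averaged per year, while a categorical feature such as the key takes its most frequent value.

```python
import pandas as pd

# Toy stand-in for the songs dataset (column names are assumptions).
songs = pd.DataFrame({
    "year": [1980, 1980, 1980, 2020, 2020],
    "loudness": [-12.0, -10.0, -11.0, -6.0, -8.0],
    "tempo": [110.0, 120.0, 100.0, 125.0, 135.0],
    "key": [0, 7, 7, 2, 2],
})

def first_mode(values: pd.Series):
    """Return the most frequent value (the smallest one if there is a tie)."""
    return values.mode().iloc[0]

# Mean for continuous features, mode for categorical ones.
by_year = songs.groupby("year").agg(
    loudness=("loudness", "mean"),
    tempo=("tempo", "mean"),
    key=("key", first_mode),
)
print(by_year)
```

The resulting `by_year` table has one row per year and can be passed directly to a line plot to see the trend of each feature.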

These are some of the resulting plots:

Mean loudness over the years

The songs are becoming:

  • longer
  • louder
  • and with faster tempos

Mean valence over the years

The valence of the songs (positiveness, see the Spotify API):

  • had more variance before the ’50s
  • peaked in the ’80s
  • went down until recent years

What makes a song popular?

While doing the previous time series analysis, I found a surprising fact: the mean popularity grows over time. In other words, more recent songs are more popular. How can that be? My intuition is that there are lots of old, popular songs... How is Spotify defining "popularity"?

Checking the API documentation, we find that popularity is based on the total number of plays the song has and how recent those plays are. So, songs that are being played a lot now are more popular than songs that were played a lot in the past.

With this in mind, I aggregated the songs dataset by popularity, and made bivariate plots to see how the other features were related to it (you can also see the code in Spotify_analysis.ipynb, or in this nbviewer link to see the interactive plots):

Popularity vs. mean loudness

Popularity vs. danceability

  • Recent songs are more popular.
  • Popular songs are more danceable, loud, and not very long (3.5 mins).
  • Live concerts and acoustic songs are less popular.

2. Can popularity be predicted?

Some of the characteristics of the songs are correlated with popularity. Can I use this to predict the popularity of a song? (See the code in the predict_popularity.ipynb notebook.)

Popularity, as defined by Spotify, goes from 0 to 100. To have a more manageable problem, I reshaped the popularity feature into 3 categories, based on its distribution: low (<20), medium (20 to 70) and high popularity (>70).
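The binning described above could look something like the following sketch. It assumes popularity is stored as an integer in a column named `popularity`; the function name `bin_popularity` and the toy values are hypothetical, and the notebook may do this differently.

```python
import pandas as pd

def bin_popularity(popularity: pd.Series) -> pd.Series:
    """Map a 0-100 integer popularity score to low / medium / high."""
    # pd.cut uses right-closed intervals by default, so for integer scores:
    # (-1, 19] -> low (<20), (19, 70] -> medium (20 to 70), (70, 100] -> high (>70).
    return pd.cut(
        popularity,
        bins=[-1, 19, 70, 100],
        labels=["low", "medium", "high"],
    )

# Toy scores chosen to sit on the class boundaries.
songs = pd.DataFrame({"popularity": [0, 19, 20, 70, 71, 100]})
songs["popularity_class"] = bin_popularity(songs["popularity"])
print(songs)
```

Turning the score into three classes makes this a classification problem, at the cost of treating a song with popularity 21 and one with popularity 69 as equivalent.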

I tried different models and ended up using XGBoost, with the following results:

confusion_matrix.png

The importance the model gives to each feature is also very interesting:

feature_importance.png

  • The release year is the most important feature to determine popularity.
  • Instrumentalness, speechiness, liveness and acousticness have a negative impact on the popularity of a song (as we saw in previous plots).
  • Explicitness and valence (positiveness) have a positive impact on popularity (which is consistent with the previous EDA).

3. Building a recommender system

My next step was to build an artist recommender system: when the user searches for an artist, which others should I recommend?

To do that, I used a basic content-based method, cosine similarity (see the code in recommender_system.ipynb).

This method treats each row (each artist) as a vector with multiple components (the features). All the artists form a vector space, and an "angle" similarity between two artists can be obtained by computing the scalar product of their vectors and dividing it by the product of their norms:

Cosine similarity

In addition to the sound-related features, I encoded the genres of each artist. It turns out that Spotify uses a gazillion different genres, so I ended up with a very sparse matrix of 3245 columns.
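To make the idea concrete, here is a small NumPy sketch of a cosine-similarity lookup. The artist names, the four feature columns (two sound features plus two one-hot genre columns) and the helper names are made up for illustration; the real matrix is far wider and sparse, and the notebook may rely on a library implementation instead.

```python
import numpy as np

def cosine_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `features`."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    # Guard against all-zero rows to avoid dividing by zero.
    unit = features / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

def most_similar(names, features, artist, top_n=2):
    """Return the `top_n` artists whose vectors point closest to `artist`."""
    sims = cosine_similarity_matrix(features)
    idx = names.index(artist)
    order = np.argsort(-sims[idx])  # highest similarity first
    return [names[i] for i in order if i != idx][:top_n]

# Hypothetical artists; columns: energy, acousticness, genre_rock, genre_pop.
names = ["Artist A", "Artist B", "Artist C", "Artist D"]
features = np.array([
    [0.8, 0.3, 1.0, 0.0],
    [0.9, 0.2, 1.0, 0.0],
    [0.5, 0.1, 0.0, 1.0],
    [0.2, 0.9, 0.0, 1.0],
])

print(most_similar(names, features, "Artist A", top_n=1))
```

Because the genre columns are 0/1 flags while the sound features lie between 0 and 1, sharing a genre weighs heavily in the similarity, which is why "Artist B" comes out closest to "Artist A" in this toy data.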

Some of the results I obtained:

AC/DC similar artists: Rose Tattoo, Stevie Wright

Beyoncé similar artists: Normani, Selena

4. Next

  • Gather data with more relevant features from AcousticBrainz
  • Use a “superstar” variable to predict popularity: Predictability of success
  • Use Spotify users’ data to build a collaborative filtering model for the recommender system
