Coder Social home page Coder Social logo

dataandmusic's Introduction

49 Years of Music - A Data Driven Study of Lyrics

Using a combination of web scraping, API calls, and Data Science to map out changes in lyrics from 1970 to 2018

Please read the accompanying Medium Article here.

There are two Jupyter notebooks in this repository. They are:

  • 49 Years of Music - Collection and Analysis.ipynb, which contains the business question, understanding, data preparation, modelling (part of), evaluation (part of), and deployment of the data (similar to the old school CRISP-DM model)
  • Aggression Classifier.ipynb, which houses the code required to build an ANN to detect aggression.

The links to the rest of the data you'll need are here:

Background

I love music. I really enjoy watching the changes and reboots and revamping and resampling of sounds over time. I also enjoy hearaing brand new sounds that haven't been used before as well as updates to classic songs crossed over with jarring beats from things like Dubstep and the like,

I also have a bias to the 90s. The power zone of my teen years, I tend to state, without evidence, that this was 'the best' time for music. Naturally many people disagree, and I find that their teens/early 20s tend to be the zones they hone in on. Music is a psychological and emotional experience. Most of us discover who we are during those years. So it would stand to reason that we identify often identify the songs that make up the soundtrack of that period as 'the best music.'

Nevertheless, I've decided to try to prove/disprove my point with data science. Well, data science combined with web scraping and a lot of API calls to genius.com and then a Bag of Words model and then some other things. I walk through all of it in this longish article. I'm hoping it helps you in your quest for knowledge whether it be data collection (I used requests and BeautifulSoup), data preparation (done with Spacey), and finally, an ANN with Keras to assess aggression. At the very least, if nothing else, makes for an enjoyable sidebar to your day.

The Data

The Seeds

Firstly, I went to billboard.com and looked for a representative list of all songs from a given year. I found that there's a top 100 list that goes back to 1970 (thus our article title) so I decided to use that as a baseline for grabbing song titles and artists to populate our initial list. www.bobborst.com has a bunch of the missing data so I collected from there as well.

The Lyrics

I then took the data that we got from billboard.com and bobborst.com ran API look ups at genius.com to extract the lyric information possible for each song. Some of titles/artists don't match up exactly, so we'll get some misses every year. But we still should be able to get some solid representative data across each year. I put the resulting combination of song titles, artists, and lyrics into a pandas dataframe.

The Data Preparation

I used SpaCy to prep the data and evaluate the characteristics of the data to ensure that any biases that were obvious could be detected.

The Analysis

I extracted total words and unique words per song to assess the increase in complexity. I used a profanity word list from freewebheaders.com to check for profanity in each song and then plotted the frequency of occurrences. Finally, I built an ANN in Keras that predicts the aggression in a song's lyrics.

Enjoy!

dataandmusic's People

Contributors

sharpie-007 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.