Angel Dhungana, Avram Twitchell
A collaborative project for CS 6140, Spring 2019. The objective is to perform data mining on a real data set, with the goal of gaining in-depth experience in some aspect of the class in a setting where the instructor can give guidance. The student is meant to demonstrate a deep understanding of some aspect of data mining.
We intend to apply data mining to the Million Song Dataset. Specifically, we plan to mine the lyrics, musical characteristics (e.g. tempo), artist, year, and genre data.
In music criticism, there are commonly accepted narratives about how certain artists or genres influence each other. We want to see whether these influences have a quantifiable structure. We will do this by examining similarities in lyrics and in musical characteristics such as tempo, to see whether the relationships among artists, genres, and years coincide with these commonly accepted narratives.
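As a rough illustration of what "similarity in lyrics" could mean in practice, the sketch below compares artists by the cosine similarity of their word counts. This is only one possible measure and is an assumption, not necessarily the one used in this project; the artist names and lyric strings are made-up toy data, and `cosine_similarity` is a hypothetical helper.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors (Counters)."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty lyric vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy lyrics, NOT taken from the Million Song Dataset.
lyrics = {
    "artist_a": "love me love me do",
    "artist_b": "all you need is love love love",
    "artist_c": "smoke on the water fire in the sky",
}

# Turn each lyric string into a bag-of-words count vector.
vectors = {name: Counter(text.split()) for name, text in lyrics.items()}

sim_ab = cosine_similarity(vectors["artist_a"], vectors["artist_b"])
sim_ac = cosine_similarity(vectors["artist_a"], vectors["artist_c"])
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

Artists who share vocabulary score closer to 1, and artists with no words in common score 0. A real analysis would likely need stop-word removal or term weighting before scores like these say anything about influence.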
This problem is interesting on a few different fronts, but we will be focusing on two.
First, this examination can potentially give insight into how human beings interact, collaborate, borrow, steal, or draw inspiration from each other.
Second, it may offer insights into the evolution of music throughout the years. Using the year data, we could potentially see how lyrics and musical characteristics shift over time.
- Install requirements and create data directories by running `make setup`.
- Download the subset by running `make download_subset`.
- Run `make agg-data` to move files to `raw_data`.
- Run clustering by using `python3 run.py run k [n...]`.
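
The internals of `run.py` are not described here. To give a feel for what clustering songs into `k` groups can look like, the following is a minimal k-means sketch in plain Python. It assumes that `k` is the number of clusters and that k-means is a reasonable stand-in; neither is confirmed by this README. The `kmeans` function, the (tempo, loudness) feature pairs, and the values are all hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (assignments, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return assignments, centroids


# Hypothetical (tempo in BPM, loudness in dB) pairs, not real dataset rows.
songs = [
    (70.0, -20.0), (72.0, -18.0), (68.0, -19.0),   # slow, quiet
    (170.0, -5.0), (175.0, -4.0), (168.0, -6.0),   # fast, loud
]

assignments, centroids = kmeans(songs, k=2)
print(assignments)
print(centroids)
```

On this toy input the slow, quiet songs and the fast, loud songs end up in separate clusters. Real features would usually need to be scaled first so that one dimension, such as tempo, does not dominate the distance.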