
calbergs / spotify-api


Pipeline that extracts data from the Spotify API to build a more detailed version of Spotify Wrapped

Python 100.00%
airflow dbt python data-engineering metabase postgresql docker


Spotify Data Pipeline

Data pipeline that extracts a user's song listening history from the Spotify API, built with Python, PostgreSQL, dbt, Metabase, Airflow, and Docker.

Objective

Deep dive into a user's song listening history to retrieve information about top artists, top tracks, top genres, and more. This is a personal side project, built for fun, that recreates Spotify Wrapped at a more frequent cadence to get quicker and more detailed insights. The pipeline calls the Spotify API every hour during hours 0-6 and 14-23 UTC (basically whenever I'm awake) to extract a user's song listening history, load the responses into a database, apply transformations, and visualize the metrics in a dashboard. Since the dataset is small and the pipeline doesn't need to run 24/7, everything is built with open source tools and hosted locally to avoid any cost.
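The hourly window can be written as a cron-style schedule. The sketch below is an assumption about how that might look, since the project's actual schedule string is not shown here: `SCHEDULE_CRON`, `expand_hour_field`, and `is_active_hour` are illustrative names, and the code only checks which UTC hours fall inside the window.

```python
from datetime import datetime, timezone

# Assumed cron form of "every hour during hours 0-6 and 14-23 UTC".
# Fields: minute, hour, day of month, month, day of week.
SCHEDULE_CRON = "0 0-6,14-23 * * *"


def expand_hour_field(field):
    """Expand a cron hour field such as '0-6,14-23' into a set of hours."""
    hours = set()
    for part in field.split(","):
        if "-" in part:
            start, end = part.split("-")
            hours.update(range(int(start), int(end) + 1))
        else:
            hours.add(int(part))
    return hours


ACTIVE_HOURS = expand_hour_field(SCHEDULE_CRON.split()[1])


def is_active_hour(moment):
    """Return True if a timezone-aware datetime falls in an active UTC hour."""
    return moment.astimezone(timezone.utc).hour in ACTIVE_HOURS


if __name__ == "__main__":
    print(sorted(ACTIVE_HOURS))
    print(is_active_hour(datetime(2023, 1, 5, 14, 0, tzinfo=timezone.utc)))
```

With this window the pipeline would run 17 times per day and stay idle from 07:00 to 13:59 UTC.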

Tools & Technologies

Architecture

[Image: architecture diagram]

Data Flow

  1. Airflow triggers the main.py script every hour (during hours 0-6 and 14-23 UTC). The script refreshes the access token, connects to the Postgres database to check the latest listened time, and calls the Spotify API to retrieve the most recently played songs and their genres.
  2. Responses are saved as CSV files named in 'YYYY-MM-DD.csv' format. These files live on the local file system and act as the replayable source, since the Spotify API only allows requesting the 50 most recently played songs and not any historical data. Each file keeps being appended with the most recently played songs for its date.
  3. Data is copied into the respective tables in the Postgres database, spotify_songs and spotify_genres.
  4. A dbt run task is triggered to run transformations on top of the staging data and produce analytical and reporting tables/views.
  5. dbt test runs after dbt run completes successfully to ensure all tests pass.
  6. Tables/views are fed into Metabase and the metrics are visualized through a dashboard.
  7. A Slack subscription is set up in Metabase to send a weekly summary every Monday.
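Steps 1 and 2 (skip plays that are already stored, then append the rest to per-day CSV files) could be sketched roughly as follows. This is not the project's main.py: the column names in `FIELDS`, the function names, and the row shape are assumptions made for illustration. The only detail taken from the Spotify API is that each play carries a `played_at` timestamp in ISO 8601 UTC form.

```python
import csv
from datetime import datetime
from pathlib import Path

# Hypothetical column layout; the real CSV schema is not shown in this README.
FIELDS = ["played_at", "track_name", "artist_name"]


def parse_played_at(value):
    """Parse an ISO 8601 UTC timestamp such as '2023-01-05T21:32:42.123Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def append_plays(rows, latest_played_at, out_dir):
    """Append plays newer than latest_played_at to 'YYYY-MM-DD.csv' files.

    rows: iterable of dicts keyed by FIELDS.
    latest_played_at: timestamp string of the newest stored play, or None.
    Returns the number of rows written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cutoff = parse_played_at(latest_played_at) if latest_played_at else None
    written = 0
    for row in sorted(rows, key=lambda r: parse_played_at(r["played_at"])):
        played = parse_played_at(row["played_at"])
        if cutoff is not None and played <= cutoff:
            continue  # already captured by an earlier run
        path = out_dir / f"{played.date().isoformat()}.csv"
        is_new_file = not path.exists()
        with path.open("a", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDS)
            if is_new_file:
                writer.writeheader()
            writer.writerow({name: row[name] for name in FIELDS})
        written += 1
    return written
```

Filtering on the latest stored timestamp keeps hourly runs from writing the same play twice, which matters because consecutive responses overlap whenever fewer than 50 new songs were played.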

Throughout the process, if any Airflow task fails, an automatic alert is sent to a dedicated Slack channel.
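How the alert is wired up is not shown here. One common approach, sketched below under that assumption, is a failure callback that posts to a Slack incoming webhook. The function names and message layout are hypothetical, and the sketch uses only the standard library rather than any Airflow Slack integration.

```python
import json
import urllib.request


def build_alert(dag_id, task_id, run_time, log_url):
    """Build a Slack incoming-webhook payload describing a failed task."""
    text = (
        ":red_circle: Airflow task failed\n"
        f"*DAG*: {dag_id}\n"
        f"*Task*: {task_id}\n"
        f"*Run time*: {run_time}\n"
        f"*Log*: {log_url}"
    )
    return {"text": text}


def send_alert(webhook_url, payload, timeout=10):
    """POST the payload as JSON to the webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def on_failure(context, webhook_url):
    """Callback body: read task details from the Airflow context and alert.

    Assumes context["task_instance"] exposes dag_id, task_id, and log_url.
    """
    ti = context["task_instance"]
    payload = build_alert(ti.dag_id, ti.task_id, context.get("ts"), ti.log_url)
    return send_alert(webhook_url, payload)
```

Because Airflow calls `on_failure_callback` with the context alone, the webhook URL would have to be bound beforehand, for example with `functools.partial(on_failure, webhook_url=...)`.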

DAG

[Image: screenshot of the Airflow DAG]

Sample Slack Alert

[Image: screenshot of a sample Slack alert]

Dashboard

[Images: five screenshots of the Metabase dashboard]

Setup

  1. Get Spotify API Access
  2. Build Docker Containers for Airflow
  3. Set Up Airflow Connection to Postgres
  4. Install dbt Core
  5. Enable Airflow Slack Notifications
  6. Install Metabase

Further Improvements (Work In Progress)

  • Create a BranchPythonOperator to first check whether the API payload is empty. If it is, proceed directly to the end task; otherwise, continue to the downstream tasks.
  • Implement data quality checks to catch potential errors in the dataset.
  • Create unit tests to ensure the pipeline is running as intended.
  • Add CI/CD.
  • Create more visualizations to uncover further insights once Spotify sends back my entire song listening history, from 10+ years back to the current date (this had to be requested separately since the current API only allows requesting the 50 most recently played tracks).
  • If and when Spotify allows requesting historical data, implement backfill capability.
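For the first item, a BranchPythonOperator expects its callable to return the task_id of the branch to follow. A minimal sketch of such a callable, assuming the payload is the parsed recently-played response with its plays under `items`; the task ids `load_to_postgres` and `end` are hypothetical placeholders, not names taken from the project's DAG:

```python
def choose_next_task(payload, continue_task_id="load_to_postgres", end_task_id="end"):
    """Return the task_id to run next based on whether the payload has plays.

    payload: parsed API response (dict) or None when nothing came back.
    An empty or missing 'items' list routes straight to the end task.
    """
    items = (payload or {}).get("items") or []
    return continue_task_id if items else end_task_id


if __name__ == "__main__":
    print(choose_next_task({"items": []}))
    print(choose_next_task({"items": [{"played_at": "2023-01-05T21:32:42.123Z"}]}))
```

Keeping the decision in a plain function also makes it easy to cover with the unit tests listed above, independent of Airflow.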
