Coder Social home page Coder Social logo

data-modeling-sql's Introduction

Project: Data Modeling with Postgres

1. Discuss the purpose of this database in the context of the startup, Sparkify, and their analytical goals.

This database is about storing music and artist records. Data is extracted from the source, then transformed using Pandas DataFrame, and finally loaded into the database. There are two sets of data used in the ETL process: song and log data. Song data provides song and artist information, while log data is provides covers song, artist and some metadata of each song.

Source data stands into json files

2. State and justify your database schema design and ETL pipeline.

Fact Table:

songplays - records in log data associated with song plays i.e. records with page NextSong

songplay_id, start_time, user_id, level, song_id, artist_id, session_id, location, user_agent

Dimension Tables:

users - users in the app

user_id, first_name, last_name, gender, level

songs - songs in music database

song_id, title, artist_id, year, duration

artists - artists in music database

artist_id, name, location, lattitude, longitude

time - timestamps of records in songplays broken down into specific units

start_time, hour, day, week, month, year, weekday

How to run the program

Execute "create_tables.py".

Execute "etl.py".

Introduction of the files

test.ipynb displays the first rows of each table to check database.

create_tables.py drops and creates your tables.

etl.ipynb reads and processes a single file from song_data and log_data and loads the data into your tables. Thi

etl.py reads and processes files from song_data and log_data and loads them into your tables.

sql_queries.py contains all sql queries, and is imported into the last three files above.

data-modeling-sql's People

Contributors

duongquangduc avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.