data-modeling-sql's Introduction

Project: Data Modeling with Postgres

1. Discuss the purpose of this database in the context of the startup, Sparkify, and their analytical goals.

This database is about storing music and artist records. Data is extracted from the source, then transformed using Pandas DataFrame, and finally loaded into the database. There are two sets of data used in the ETL process: song and log data. Song data provides song and artist information, while log data is provides covers song, artist and some metadata of each song.

Source data stands into json files

2. State and justify your database schema design and ETL pipeline.

Fact Table:

songplays - records in log data associated with song plays i.e. records with page NextSong

songplay_id, start_time, user_id, level, song_id, artist_id, session_id, location, user_agent

Dimension Tables:

users - users in the app

user_id, first_name, last_name, gender, level

songs - songs in music database

song_id, title, artist_id, year, duration

artists - artists in music database

artist_id, name, location, lattitude, longitude

time - timestamps of records in songplays broken down into specific units

start_time, hour, day, week, month, year, weekday

How to run the program

Execute "create_tables.py".

Execute "etl.py".

Introduction of the files

test.ipynb displays the first rows of each table to check database.

create_tables.py drops and creates your tables.

etl.ipynb reads and processes a single file from song_data and log_data and loads the data into your tables. Thi

etl.py reads and processes files from song_data and log_data and loads them into your tables.

sql_queries.py contains all sql queries, and is imported into the last three files above.

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.

Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

TensorFlow

An Open Source Machine Learning Framework for Everyone

Django

The Web framework for perfectionists with deadlines.

Laravel

A PHP framework for web artisans

D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

web

Some thing interesting about web. New door for the world.

server

A server is a program made to process requests and deliver data to clients.

Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

Visualization

Some thing interesting about visualization, use data art

Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.

Microsoft

Open source projects and samples from Microsoft.

Google

Google ❤️ Open Source for everyone.

Alibaba

Alibaba Open Source for everyone

D3

Data-Driven Documents codes.

Tencent

China tencent open source team.

duongquangduc / data-modeling-sql Goto Github PK