Data Modeling with Apache Cassandra

This is the second project of Udacity's Data Engineering Nanodegree 🎓.
The purpose is to build an ETL pipeline that transfers data from CSV files into an Apache Cassandra database using Python and CQL.

Background

A startup called 🎵 Sparkify wants to analyze the data they've been collecting on songs and user activity on their new music streaming app.
The analytics team is particularly interested in understanding what songs users are listening to.
Currently, there is no easy way to query the data to generate the results, since the data reside in a directory of CSV files on user activity on the app.

File Description

  • Project_1B_ Project_Template.ipynb reads each file in event_data/song_data and collects all the data into a new CSV file named event_datafile_new.csv.
  • event_datafile_new.csv is created by running the notebook above. It contains all the data from event_data/song_data.
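
The consolidation step described above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the function name `combine_csv_files` is hypothetical, and it assumes every source file shares the same header row.

```python
import csv
import glob
import os


def combine_csv_files(source_dir, output_path):
    """Merge every CSV under source_dir into one file; return the data-row count.

    Assumes all source files share the same header row (an assumption,
    not something the project description guarantees).
    """
    pattern = os.path.join(source_dir, "**", "*.csv")
    paths = sorted(glob.glob(pattern, recursive=True))
    out_abs = os.path.abspath(output_path)

    header = None
    rows = []
    for path in paths:
        if os.path.abspath(path) == out_abs:
            continue  # never read a previous output file back in
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header  # keep the first header seen
            # keep only rows that contain at least one non-blank cell
            rows.extend(row for row in reader if any(cell.strip() for cell in row))

    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

A call such as `combine_csv_files("event_data/song_data", "event_datafile_new.csv")` would then produce the single file that the rest of the pipeline reads.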

ETL Pipeline

  • Database Schema Diagram
    song_play

    column           datatype  primary key
    session_id       int       partition key
    item_in_session  int       clustering key
    artist           text
    song_title       text
    song_length      float

    user_playhistory

    column           datatype  primary key
    user_id          int       partition key 1
    session_id       int       partition key 2
    item_in_session  int       clustering key
    artist           text
    song_title       text
    first_name       text
    last_name        text

    song_userlist

    column      datatype  primary key
    song_title  text      partition key
    artist      text      clustering key
    first_name  text
    last_name   text
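
To make the partition and clustering keys concrete, the three layouts above can be turned into CQL `CREATE TABLE` statements. The following is only a sketch: the `SCHEMAS` dictionary and the `create_table_cql` helper are hypothetical names, and the column types are copied from the listing rather than from the notebook.

```python
# Table layouts copied from the schema listing above.
# Each entry: (columns, partition key columns, clustering key columns).
SCHEMAS = {
    "song_play": (
        [("session_id", "int"), ("item_in_session", "int"),
         ("artist", "text"), ("song_title", "text"), ("song_length", "float")],
        ["session_id"],
        ["item_in_session"],
    ),
    "user_playhistory": (
        [("user_id", "int"), ("session_id", "int"), ("item_in_session", "int"),
         ("artist", "text"), ("song_title", "text"),
         ("first_name", "text"), ("last_name", "text")],
        ["user_id", "session_id"],
        ["item_in_session"],
    ),
    "song_userlist": (
        [("song_title", "text"), ("artist", "text"),
         ("first_name", "text"), ("last_name", "text")],
        ["song_title"],
        ["artist"],
    ),
}


def create_table_cql(table):
    """Build a CREATE TABLE statement for one of the tables in SCHEMAS."""
    columns, partition_keys, clustering_keys = SCHEMAS[table]
    column_defs = ", ".join(f"{name} {cql_type}" for name, cql_type in columns)
    # The partition key goes in its own parentheses; clustering columns follow it.
    key = "(" + ", ".join(partition_keys) + ")"
    if clustering_keys:
        key += ", " + ", ".join(clustering_keys)
    return f"CREATE TABLE IF NOT EXISTS {table} ({column_defs}, PRIMARY KEY ({key}))"
```

For example, `create_table_cql("user_playhistory")` yields a statement ending in `PRIMARY KEY ((user_id, session_id), item_in_session)`, which is the composite partition key shown in the listing.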

Example

Query

SELECT artist, song_title, song_length FROM song_play
WHERE session_id = 338 AND item_in_session = 4

Result

[Image: music_history]
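
Before a query on session_id and item_in_session can return rows, the song_play table has to be populated from event_datafile_new.csv. The sketch below shows one way that load might be written. The CSV header names (`sessionId`, `itemInSession`, `artist`, `song`, `length`) are assumptions, since the README does not list them, and `load_song_play` is a hypothetical helper. The `session` argument is any object with an `execute(query, parameters)` method, such as a session from the DataStax Python driver, which accepts `%s` placeholders.

```python
import csv

# Column order matches the song_play layout in the schema listing.
INSERT_SONG_PLAY = (
    "INSERT INTO song_play "
    "(session_id, item_in_session, artist, song_title, song_length) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def load_song_play(session, csv_path):
    """Insert every CSV row into song_play; return the number of rows sent."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        # Header names below are assumed; adjust them to match the real file.
        for row in csv.DictReader(f):
            session.execute(
                INSERT_SONG_PLAY,
                (
                    int(row["sessionId"]),
                    int(row["itemInSession"]),
                    row["artist"],
                    row["song"],
                    float(row["length"]),
                ),
            )
            count += 1
    return count
```

Casting to `int` and `float` before binding matters here because the CSV reader returns every field as a string, while the table columns are typed.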

Query

SELECT artist, song_title, first_name, last_name FROM user_playhistory
WHERE user_id = 10 AND session_id = 182

Result

[Image: music_history2]

Query

SELECT first_name, last_name FROM song_userlist 
WHERE song_title = 'All Hands Against His Own'

Result

[Image: music_history3]
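
From Python, the three example queries can be issued with bound parameters instead of literal values. This is a rough sketch under a few assumptions: `QUERIES` and `run_query` are hypothetical names, the first query is written against song_play (the table name used in the schema listing), and `session` is any object whose `execute(query, parameters)` returns an iterable of rows, as the DataStax Python driver's session does.

```python
# Example queries from this section, with literals replaced by %s placeholders.
QUERIES = {
    "song_play": (
        "SELECT artist, song_title, song_length FROM song_play "
        "WHERE session_id = %s AND item_in_session = %s"
    ),
    "user_playhistory": (
        "SELECT artist, song_title, first_name, last_name FROM user_playhistory "
        "WHERE user_id = %s AND session_id = %s"
    ),
    "song_userlist": (
        "SELECT first_name, last_name FROM song_userlist "
        "WHERE song_title = %s"
    ),
}


def run_query(session, table, params):
    """Execute the example query for `table` and return the rows as a list."""
    return list(session.execute(QUERIES[table], params))


# With a real cluster, a session could be obtained like this
# (the host and the keyspace name "sparkify" are assumptions):
#
#   from cassandra.cluster import Cluster
#   session = Cluster(["127.0.0.1"]).connect("sparkify")
#   rows = run_query(session, "song_play", (338, 4))
```

Each WHERE clause restricts only primary key columns, starting with the full partition key, which is what lets Cassandra serve these queries without `ALLOW FILTERING`.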

data_modeling_with_apache_cassandra's People

Contributors

kjh7176
