nareshk1290 / udacity-data-engineering

Udacity Data Engineering Nano Degree (DEND)

Python 8.01% Jupyter Notebook 91.99%
udacity-dend postgresql cassandra spark aws s3 etl redshift star-schema airflow

udacity-data-engineering's Introduction

Data Engineering Nanodegree

Projects and resources developed for Udacity's Data Engineering Nanodegree (DEND).

Developed a relational database using PostgreSQL to model user activity data for a music streaming app. Skills include:

  • Created a relational database using PostgreSQL
  • Developed a star schema with optimized fact and dimension table definitions and normalized tables.
  • Built an ETL pipeline that loads the data in a form optimized for queries about which songs users listen to.

Proficiencies used: Python, PostgreSQL, star schema, ETL pipelines, normalization
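
As an illustration of the star schema and ETL step described above, here is a minimal sketch. It is not the project's code: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server, and the table names (`users`, `songs`, `songplays`), columns, and sample events are assumptions made for the example.

```python
import sqlite3

# Hypothetical star schema: one fact table (songplays) and two dimension tables.
DDL = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT);
CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT, artist TEXT);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
    start_time TEXT NOT NULL,
    user_id INTEGER NOT NULL REFERENCES users (user_id),
    song_id TEXT REFERENCES songs (song_id)
);
"""


def load_events(conn, events):
    """Minimal ETL step: upsert dimension rows, then insert one fact row per event."""
    for e in events:
        conn.execute(
            "INSERT OR REPLACE INTO users VALUES (?, ?, ?)",
            (e["user_id"], e["first_name"], e["level"]),
        )
        conn.execute(
            "INSERT OR IGNORE INTO songs VALUES (?, ?, ?)",
            (e["song_id"], e["title"], e["artist"]),
        )
        conn.execute(
            "INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)",
            (e["start_time"], e["user_id"], e["song_id"]),
        )
    conn.commit()


def top_songs(conn):
    """Analytical query: join the fact table to a dimension and count plays."""
    return conn.execute(
        """
        SELECT s.title, COUNT(*) AS plays
        FROM songplays sp
        JOIN songs s ON s.song_id = sp.song_id
        GROUP BY s.title
        ORDER BY plays DESC, s.title
        """
    ).fetchall()


# Made-up sample events, for illustration only.
events = [
    {"user_id": 1, "first_name": "Ana", "level": "free", "song_id": "S1",
     "title": "Song A", "artist": "Artist X", "start_time": "2018-11-01T21:01:46"},
    {"user_id": 2, "first_name": "Ben", "level": "paid", "song_id": "S1",
     "title": "Song A", "artist": "Artist X", "start_time": "2018-11-01T21:05:52"},
    {"user_id": 1, "first_name": "Ana", "level": "free", "song_id": "S2",
     "title": "Song B", "artist": "Artist Y", "start_time": "2018-11-01T21:08:16"},
]

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
load_events(conn, events)
print(top_songs(conn))
```

With PostgreSQL the same structure would apply, with `ON CONFLICT` clauses taking the place of SQLite's `INSERT OR REPLACE` and `INSERT OR IGNORE`.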

Designed a NoSQL database using Apache Cassandra based on the original schema outlined in project one. Skills include:

  • Created a NoSQL database using Apache Cassandra (both locally and with Docker containers)
  • Developed denormalized tables optimized for a specific set of queries and business needs

Proficiencies used: Python, Apache Cassandra, Denormalization
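
To make the query-first modeling above concrete, the sketch below builds a CQL `CREATE TABLE` statement in which the partition key and clustering columns are chosen for one query. The helper `create_table_cql`, the table `songs_by_session`, and its columns are hypothetical and only illustrate the idea; no Cassandra driver is used here.

```python
def create_table_cql(table, columns, partition_key, clustering=()):
    """Build a CQL CREATE TABLE statement for a query-specific table.

    columns: mapping of column name to CQL type.
    partition_key: columns the target query filters on with equality.
    clustering: columns that order rows inside each partition.
    """
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
    # The partition key is always wrapped in its own parentheses.
    key_parts = ["(" + ", ".join(partition_key) + ")", *clustering]
    primary_key = ", ".join(key_parts)
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols}, PRIMARY KEY ({primary_key}))"


# Assumed target query: SELECT artist, song FROM songs_by_session
#                       WHERE session_id = ? AND item_in_session = ?
cql = create_table_cql(
    table="songs_by_session",
    columns={
        "session_id": "int",
        "item_in_session": "int",
        "artist": "text",
        "song": "text",
    },
    partition_key=["session_id"],
    clustering=["item_in_session"],
)
print(cql)
```

Each additional query with a different filter would typically get its own table holding the same data under a different primary key, which is the denormalization trade-off.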

Created a data warehouse using Amazon Redshift. Skills include:

  • Created a Redshift cluster, IAM roles, and security groups.
  • Developed an ETL pipeline that copies data from S3 buckets into staging tables, then processes it into a star schema.
  • Developed a star schema optimized for the specific queries required by the data analytics team.

Proficiencies used: Python, Amazon Redshift, AWS CLI, AWS SDK, SQL, PostgreSQL
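
The staging step above relies on Redshift's `COPY` command. The sketch below only builds the SQL text; the helper `build_copy_sql`, the table name, the bucket path, the role ARN, and the region are all placeholder assumptions, and executing the statement would need a live cluster and a database connection.

```python
def build_copy_sql(table, s3_path, iam_role_arn, region, json_option="auto"):
    """Return a Redshift COPY statement that loads JSON files from S3 into a table."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"REGION '{region}'\n"
        f"FORMAT AS JSON '{json_option}';"
    )


# Placeholder values for illustration only.
copy_sql = build_copy_sql(
    table="staging_events",
    s3_path="s3://example-bucket/log_data",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
    region="us-west-2",
)
print(copy_sql)
```

Once the staging tables are loaded, `INSERT INTO ... SELECT` statements can move the rows into the fact and dimension tables.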

Scaled up the current ETL pipeline by moving the data warehouse to a data lake. Skills include:

  • Created an EMR Hadoop cluster.
  • Extended the ETL pipeline to copy datasets from S3 buckets, process the data with Spark, and write the results back to S3 using efficient partitioning and the Parquet format.
  • Fast-tracked the data lake buildout using serverless AWS Lambda and cataloged tables with an AWS Glue crawler.

Technologies used: Spark, S3, EMR, Athena, AWS Glue, Parquet.
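
To show what partitioned output looks like on S3, the sketch below computes the `column=value` directory layout that Spark produces when a DataFrame is written with `df.write.partitionBy("year", "month").parquet(path)`. It is plain Python rather than Spark code so it runs anywhere; the helper `partition_path`, the bucket name, and the choice of `year` and `month` as partition columns are assumptions for illustration.

```python
from datetime import datetime, timezone


def partition_path(base, start_time_ms, partition_cols=("year", "month")):
    """Return the Hive-style partition directory for an event timestamp in milliseconds."""
    ts = datetime.fromtimestamp(start_time_ms / 1000, tz=timezone.utc)
    values = {"year": ts.year, "month": ts.month}
    parts = "/".join(f"{col}={values[col]}" for col in partition_cols)
    return f"{base.rstrip('/')}/{parts}/"


# 1541106106796 ms corresponds to 2018-11-01 21:01:46 UTC.
path = partition_path("s3a://example-bucket/songplays", 1541106106796)
print(path)  # s3a://example-bucket/songplays/year=2018/month=11/
```

Queries that filter on the partition columns can then skip whole directories instead of scanning every Parquet file.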

Automated the ETL pipeline and the creation of the data warehouse using Apache Airflow. Skills include:

  • Automated ETL pipelines using Airflow, Python, and Amazon Redshift.
  • Wrote custom operators to perform tasks such as staging data, filling the data warehouse, and validating the results through data quality checks.
  • Transformed data from various sources into a star schema optimized for the analytics team's use cases.

Technologies used: Apache Airflow, S3, Amazon Redshift, Python.
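
The data quality checks mentioned above can be sketched as a small function. This is an assumption about how such a check might look, not the project's operator: it uses an in-memory `sqlite3` database as a stand-in for Redshift, and `run_quality_checks`, the `users` table, and the check definitions are hypothetical. In Airflow, logic like this would typically sit inside the `execute` method of a custom operator.

```python
import sqlite3


def run_quality_checks(conn, checks):
    """Run each check's SQL and raise ValueError if any result fails its predicate."""
    failures = []
    for check in checks:
        # sqlite3 connections expose execute() directly; other drivers need a cursor.
        value = conn.execute(check["sql"]).fetchone()[0]
        if not check["passes"](value):
            failures.append(f"{check['name']}: got {value}")
    if failures:
        raise ValueError("Data quality checks failed: " + "; ".join(failures))
    return len(checks)


# Stand-in warehouse with a single small table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, level TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "free"), (2, "paid")])

checks = [
    {"name": "users is not empty",
     "sql": "SELECT COUNT(*) FROM users",
     "passes": lambda n: n > 0},
    {"name": "no NULL user_id",
     "sql": "SELECT COUNT(*) FROM users WHERE user_id IS NULL",
     "passes": lambda n: n == 0},
]
checked = run_quality_checks(conn, checks)
print(f"{checked} checks passed")
```

Raising an exception on failure matters in a scheduler: it marks the task as failed so downstream tasks do not run on bad data.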
