This repository contains my solutions to the Udacity Data Engineering Nanodegree course from summer 2019.

Project Folder | Description | Done |
---|---|---|
Project 1a - PostgreSQL | Building a star schema in PostgreSQL and inserting data via Python | ✔️ |
Project 1b - Cassandra | Modeling query-driven tables in Cassandra and inserting data via Python | ✔️ |
Project 2 - AWS Redshift | Building a star schema in AWS Redshift and inserting data from AWS S3 via Python | ✔️ |
Project 3 - Spark | Reading and transforming data from AWS S3 with Spark and writing it to partitioned Parquet files | ✔️ |
Project 4 - Airflow Pipelines | Building an Airflow pipeline that automates parsing and transforming files from AWS S3 and loading them into AWS Redshift | ✔️ |
Project 5 - Capstone Project | Integrating files from S3 into PostgreSQL via Spark | ✔️ |