I took this course on the Udacity platform; it takes over 160 hours to complete the degree and earn the certificate. The units on database structures filled a gap in my knowledge and taught me the basics of relational and non-relational databases, data warehouses, data lakes, and data pipelines, specifically in the Amazon Web Services environment. I did not much like the AWS drag-and-drop interface; in my opinion, the best way to acquire this knowledge is by programming in the console (CLI), not by clicking buttons and building graphs. Fortunately, that is possible from the AWS CloudShell and the automatically generated Python scripts. Another area for improvement is Udacity's support. In particular, Apache Airflow raised exceptions that were not covered in the Udacity FAQ, and the human help is too slow and not focused on the problem. Although Udacity-GPT is a great way to solve problems, the contact between tutor and student could be better optimized. Even so, I absolutely recommend this online course: the teachers are great and can solve any problem related to the content.
Learn to create relational and NoSQL data models to fit the diverse needs of data customers. Use ETL to build databases in PostgreSQL and Apache Cassandra:
- Introduction to Data Modeling.
- Relational Data Models with SQL and PostgreSQL.
- NoSQL Data Models with Apache Cassandra.
- Final Project: Data Modeling with Apache Cassandra.
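The core idea of the relational part of this course can be sketched in a few lines: model data as a fact table joined to dimension tables. The snippet below uses SQLite in place of PostgreSQL so it runs self-contained; the table and column names are illustrative, not the course's actual project schema.

```python
import sqlite3

# A minimal star schema: one dimension table and one fact table.
# SQLite stands in for PostgreSQL; names are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE dim_song (
        song_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        artist  TEXT NOT NULL
    )
""")
cur.execute("""
    CREATE TABLE fact_play (
        play_id   INTEGER PRIMARY KEY,
        song_id   INTEGER NOT NULL REFERENCES dim_song (song_id),
        played_at TEXT NOT NULL
    )
""")

cur.execute("INSERT INTO dim_song VALUES (1, 'Dreams', 'Fleetwood Mac')")
cur.execute("INSERT INTO fact_play VALUES (1, 1, '2024-01-01T10:00:00')")

# A typical analytic query joins the fact table to its dimension.
cur.execute("""
    SELECT s.title, COUNT(*) AS plays
    FROM fact_play p JOIN dim_song s ON p.song_id = s.song_id
    GROUP BY s.title
""")
rows = cur.fetchall()
print(rows)  # → [('Dreams', 1)]
conn.close()
```

Cassandra models in the NoSQL half of the course look superficially similar (CQL resembles SQL) but are designed query-first, with one denormalized table per access pattern instead of joins.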
In this course, we will learn to create cloud-based data warehouses. We will sharpen our data warehousing skills, deepen our understanding of data infrastructure, and be introduced to data engineering on the cloud using Amazon Web Services (AWS):
- Introduction to Cloud Data Warehouses.
- Introduction to Data Warehouses.
- ETL and Data Warehouse Technology in the Cloud.
- AWS Data Warehouse Technologies.
- Implementing a Data Warehouse on AWS.
- Final Project: Data Warehouse.
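The staging → transform → load pattern at the heart of warehouse loading can be sketched in plain Python, independent of AWS. In the real course project, the staging data lives in S3 and the load happens in Redshift via COPY and INSERT ... SELECT; the record fields below are illustrative.

```python
# Staging: raw event and song data, as if just copied in from S3.
staging_events = [
    {"user": "ana", "song_id": 1, "ts": "2024-01-01"},
    {"user": "ben", "song_id": 1, "ts": "2024-01-02"},
    {"user": "ana", "song_id": 2, "ts": "2024-01-03"},
]
staging_songs = {1: "Dreams", 2: "Landslide"}

# Transform: denormalize staged events into fact-style rows.
fact_songplays = [
    {"user": e["user"], "title": staging_songs[e["song_id"]], "ts": e["ts"]}
    for e in staging_events
]

# Load/analyze: in Redshift this would be INSERT ... SELECT plus an
# aggregate query; here we aggregate directly to show the result shape.
plays_per_title = {}
for row in fact_songplays:
    plays_per_title[row["title"]] = plays_per_title.get(row["title"], 0) + 1

print(plays_per_title)  # → {'Dreams': 2, 'Landslide': 1}
```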
In this course, we will learn about the big data ecosystem and how to use Spark to work with massive datasets. We will also learn about how to store big data in a data lake and query it with Spark:
- Introduction to Spark and Data Lakes.
- Big Data Ecosystem, Data Lakes and Spark.
- Spark Essentials.
- Using Spark in AWS.
- Ingesting and Organizing Data in a Lakehouse.
- Final Project: STEDI Human Balance Analytics.
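The Spark essentials covered here boil down to chaining transformations over a dataset. Below is a pure-Python analogue of a typical filter → reduceByKey chain; PySpark would express the same logic with RDD or DataFrame operations and distribute it across a cluster. The sensor dataset and threshold are illustrative, not taken from the STEDI project.

```python
# Pure-Python analogue of a Spark chain: filter then reduceByKey.
# In PySpark this would be roughly:
#   rdd.filter(lambda r: r[1] > 0.5).reduceByKey(lambda a, b: a + b)
records = [
    ("sensor_a", 0.9), ("sensor_b", 1.7),
    ("sensor_a", 1.1), ("sensor_b", 0.4),
]

# filter: keep readings above a threshold
filtered = [r for r in records if r[1] > 0.5]

# reduceByKey: sum the readings per sensor key
def reduce_by_key(pairs):
    out = {}
    for key, value in pairs:
        out[key] = out.get(key, 0.0) + value
    return out

totals = reduce_by_key(filtered)
print(totals)  # → {'sensor_a': 2.0, 'sensor_b': 1.7}
```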
In this course, we will build pipelines leveraging Airflow DAGs to organize our tasks along with AWS resources such as S3 and Redshift:
- Introduction to Automating Data Pipelines.
- Data Pipelines.
- Airflow and AWS.
- Data Quality.
- Production Data Pipelines.
- Final Project: Data Pipelines.
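The essence of organizing tasks with Airflow DAGs is dependency-driven ordering. A real Airflow DAG declares operators and wires them with `task_a >> task_b`; the sketch below resolves the same kind of upstream-dependency graph in plain Python so it runs self-contained. The task names mirror a typical S3-to-Redshift pipeline and are illustrative.

```python
# Upstream dependencies, as an Airflow DAG would declare them with >>.
upstream = {
    "create_tables": [],
    "stage_from_s3": ["create_tables"],
    "load_fact":     ["stage_from_s3"],
    "data_quality":  ["load_fact"],
}

def run_order(deps):
    """Return tasks in an order that respects every upstream dependency."""
    done, order = set(), []
    while len(order) < len(deps):
        for task, ups in deps.items():
            if task not in done and all(u in done for u in ups):
                done.add(task)
                order.append(task)
    return order

print(run_order(upstream))
# → ['create_tables', 'stage_from_s3', 'load_fact', 'data_quality']
```

Airflow's scheduler does essentially this, plus retries, scheduling intervals, and per-task state, which is where the Data Quality checks in the project hook in.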