Start Data Engineering's Projects
Code for my "Efficient Data Processing in SQL" book.
Beginner data engineering project - batch edition
Simple stream processing pipeline
Near real time ETL to populate a dashboard.
Repo for the CDC with Debezium blog post
Open data for blog content at https://www.startdataengineering.com/
Sample project to demonstrate data engineering best practices
Code to demonstrate data engineering metadata & logging best practices
A template repository to create a data project with IaC, CI/CD, data migrations, & testing
Repository showing how to automate data testing as part of CI
Repo to explain the development and CI/CD cycle in dbt
Multi-node Presto cluster on Docker containers
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
Example repo to create end-to-end tests for a data pipeline.
Code for "Efficient Data Processing in Spark" course
Public file hosting
Making data pipelines idempotent
Profile readme
Local development environment for Python data projects, with Docker
End-to-end data engineering project
Apache Superset Demo
Code for dbt tutorial
Project for "Data pipeline design patterns" blog.
Minimalist Hugo theme based on Hyde
Simple repo to demonstrate how to submit a Spark job to EMR from Airflow
Simple repo to demonstrate how to submit a Spark job
Simple example showing how to trigger a Spark job with AWS Lambda