This is a practical example of a data engineering project built around real-estate data. The accompanying blog post, Building a Data Engineering Project in 20 Minutes, is available on my website. It covers the following topics:
- Getting the Data – Scraping with BeautifulSoup
- Storing on S3-MinIO
- Custom Change Data Capture (CDC)
- Adding Database features to S3 – Delta Lake & Spark
- Machine Learning part – Jupyter Notebook
- Ingesting into a Data Warehouse for low latency – Apache Druid
- The UI with Dashboards and more – Apache Superset
- Orchestrating everything together – Dagster
- DevOps engine – Kubernetes

The current status of the project can be found here.