datatalksclub / data-engineering-zoomcamp
Free Data Engineering course!



Data Engineering Zoomcamp

Syllabus

Taking the course

2024 Cohort

Self-paced mode

All the course materials are freely available, so you can take the course at your own pace.

  • Follow the suggested syllabus (see below) week by week.
  • You don't need to fill in the registration form; just start watching the videos and join Slack.
  • Check the FAQ if you run into problems.
  • If you can't find a solution to your problem in the FAQ, ask for help in Slack.

Syllabus

Note: NYC TLC changed the format of the data we use to Parquet. In this course we still use the CSV files, accessible here.

  • Course overview
  • Introduction to GCP
  • Docker and docker-compose
  • Running Postgres locally with Docker
  • Setting up infrastructure on GCP with Terraform
  • Preparing the environment for the course
  • Homework
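
Week 1's ingestion pattern, loading the NYC taxi CSV into a database in chunks with pandas and SQLAlchemy, can be sketched as follows. This is an illustrative sketch, not the course script: the tiny inline CSV and the in-memory SQLite engine are stand-ins so it runs without a server; in the course you would point the engine at your local Postgres instead.

```python
import io

import pandas as pd
from sqlalchemy import create_engine

# A tiny stand-in for the NYC taxi CSV (illustrative columns only).
csv_file = io.StringIO(
    "tpep_pickup_datetime,passenger_count,total_amount\n"
    "2021-01-01 00:15:56,1,11.80\n"
    "2021-01-01 00:31:00,2,8.30\n"
)

# In-memory SQLite keeps the sketch self-contained; in the course this
# would be your local Postgres, e.g. "postgresql://root:root@localhost:5432/ny_taxi".
engine = create_engine("sqlite://")

# Read the CSV in chunks and append each chunk to the table, so files
# larger than memory can still be ingested.
for chunk in pd.read_csv(csv_file, chunksize=1, parse_dates=["tpep_pickup_datetime"]):
    chunk.to_sql("yellow_taxi_data", engine, if_exists="append", index=False)

row_count = pd.read_sql("SELECT COUNT(*) AS n FROM yellow_taxi_data", engine)["n"][0]
print(row_count)
```

Chunked reads keep memory bounded, which matters for the multi-gigabyte taxi files.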

More details

  • Data Lake
  • Workflow orchestration
  • Workflow orchestration with Mage
  • Homework
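
The core idea behind workflow orchestrators like Mage (or Airflow) is running tasks in dependency order. Here is a minimal sketch in plain Python using the standard library's graphlib; the task names and the results dict are invented for illustration and are not Mage's API.

```python
from graphlib import TopologicalSorter

# Toy pipeline state; real orchestrators pass data between tasks for you.
results = {}

def extract():
    results["raw"] = [3, 1, 2]

def transform():
    results["clean"] = sorted(results["raw"])

def load():
    results["loaded"] = len(results["clean"])

tasks = {"extract": extract, "transform": transform, "load": load}
# Each task maps to the set of tasks that must run before it.
deps = {"transform": {"extract"}, "load": {"transform"}}

# Run tasks in a valid dependency order (prerequisites first).
for name in TopologicalSorter(deps).static_order():
    tasks[name]()

print(results["loaded"])
```

On top of this ordering, real orchestrators add scheduling, retries, and state persistence, which is what the module covers.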

More details

  • Reading from APIs
  • Building scalable pipelines
  • Normalising data
  • Incremental loading
  • Homework
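
Incremental loading means ingesting only records newer than a saved cursor value. A hand-rolled sketch of the idea follows; libraries such as dlt provide this out of the box, and the function and field names here are illustrative.

```python
def load_incrementally(records, state):
    """Return only records newer than the saved cursor, advancing the cursor."""
    cursor = state.get("last_ts", "")
    new = [r for r in records if r["updated_at"] > cursor]
    if new:
        state["last_ts"] = max(r["updated_at"] for r in new)
    return new

state = {}  # a real pipeline persists this between runs

batch1 = [{"id": 1, "updated_at": "2024-01-01"},
          {"id": 2, "updated_at": "2024-01-02"}]
first = load_incrementally(batch1, state)   # both records are new

# The API re-delivers an old record alongside one new one:
batch2 = [{"id": 2, "updated_at": "2024-01-02"},
          {"id": 3, "updated_at": "2024-01-03"}]
second = load_incrementally(batch2, state)  # only the new record loads

print(len(first), len(second), state["last_ts"])
```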

More details

  • Data Warehouse
  • BigQuery
  • Partitioning and clustering
  • BigQuery best practices
  • Internals of BigQuery
  • BigQuery Machine Learning
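
The payoff of partitioning is that a query filtered on the partition column scans only the matching partitions. Below is a toy model of date partitioning in plain Python; BigQuery does this in its storage layer, and this only illustrates the pruning.

```python
from collections import defaultdict

# Rows stored bucketed by pickup date, as in a date-partitioned table.
partitions = defaultdict(list)
rows = [
    {"date": "2021-01-01", "amount": 12},
    {"date": "2021-01-01", "amount": 8},
    {"date": "2021-01-02", "amount": 5},
]
for row in rows:
    partitions[row["date"]].append(row)

# A query like "SELECT SUM(amount) ... WHERE date = '2021-01-01'" only
# needs to scan the matching partition: 2 rows instead of all 3.
scanned = partitions["2021-01-01"]
print(len(scanned), sum(r["amount"] for r in scanned))
```

Since BigQuery bills by bytes scanned, this pruning directly reduces query cost.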

More details

  • Basics of analytics engineering
  • dbt (data build tool)
  • BigQuery and dbt
  • Postgres and dbt
  • dbt models
  • Testing and documenting
  • Deployment to the cloud and locally
  • Visualizing the data with Google Data Studio and Metabase

More details

  • Batch processing
  • What is Spark
  • Spark Dataframes
  • Spark SQL
  • Internals: GroupBy and joins
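
Spark's groupBy aggregations run in two phases: each partition aggregates locally (map-side combine), then the partial results are shuffled by key and merged. A plain-Python sketch of that idea, with Counter standing in for the per-partition aggregators:

```python
from collections import Counter

# Phase 1: each partition aggregates locally (map-side combine).
partition1 = ["green", "yellow", "green"]
partition2 = ["yellow", "yellow"]
partials = [Counter(partition1), Counter(partition2)]

# Phase 2: partial results are shuffled by key and merged, so only the
# small per-partition counts cross the network, not every row.
merged = Counter()
for partial in partials:
    merged.update(partial)

print(dict(merged))
```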

More details

  • Introduction to Kafka
  • Schemas (Avro)
  • Kafka Streams
  • Kafka Connect and KSQL
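
A key Kafka concept from this module: messages with the same key are routed to the same partition, which preserves per-key ordering. Here is a sketch of that routing logic in plain Python; Kafka's default partitioner actually uses murmur2 hashing, so the byte-sum hash below is only for illustration.

```python
NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Kafka's default partitioner hashes the key (murmur2); any stable
    # hash demonstrates the idea. Python's built-in hash() is avoided
    # because it is randomized between runs for strings.
    return sum(key.encode()) % NUM_PARTITIONS

topic = [[] for _ in range(NUM_PARTITIONS)]
for key, value in [("ride-1", "start"), ("ride-2", "start"), ("ride-1", "end")]:
    topic[partition_for(key)].append((key, value))

# Both "ride-1" messages land on the same partition, in send order.
print(topic[partition_for("ride-1")])
```

Per-key ordering is why choosing a good message key (e.g. a ride or vehicle id) matters in streaming designs.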

More details


Putting everything we learned into practice

  • Week 1 and 2: working on your project
  • Week 3: reviewing your peers

More details

Overview

Prerequisites

To get the most out of this course, you should feel comfortable with coding and the command line, and know the basics of SQL. Prior experience with Python will be helpful, but you can pick up Python relatively quickly if you have experience with other programming languages.

Prior experience with data engineering is not required.

Instructors

Past instructors

Asking for help in Slack

The best way to get support is to use DataTalks.Club's Slack. Join the #course-data-engineering channel.

To make discussions in Slack more organized, please follow the channel guidelines.

Supporters and partners

Thanks to the course sponsors for making it possible to run this course.

Do you want to support our course and our community? Please reach out to [email protected]


data-engineering-zoomcamp's People

Contributors

alexeygrigorev, ankurchavda, ankushkhanna, balajirvp, boisalai, bsenst, canovasjm, data-think-2021, discdiver, ellacharmed, froukje, gsajko, guoliveira, hegdehog, iamtodor, inner-outer-space, iremerturk, itnadigital, jboliv01, kargarisaac, kwannoel, maria-fisher, mattppal, michaelshoemaker, padilha, sandy-75, sejalv, victoriapm, vincenzogalante, ziritrion


data-engineering-zoomcamp's Issues

Week 4 - dbt build: ARGs not passed correctly after FROM

Hey DataTalks Zoomcamp team!

I'm not sure if I'm the only person having this issue, but it looks like the Dockerfile provided in Week 4 for building dbt-bigquery fails to build. Namely, the ARGs defined at the beginning of the Dockerfile aren't made available during each stage of the multi-stage build.

I think each ARG needs to be re-declared after each FROM statement for it to be accessible in that build stage.

e.g.:

...
FROM base as dbt-core
ARG [email protected]
RUN echo python -m pip install --no-cache-dir "git+https://github.com/dbt-labs/${dbt_core_ref}#egg=dbt-core&subdirectory=core" \
  && python -m pip install --no-cache-dir "git+https://github.com/dbt-labs/${dbt_core_ref}#egg=dbt-core&subdirectory=core"

The Docker docs on this behaviour claim that you don't need to re-define a default value after FROM (i.e. you can leave it as ARG dbt_core_ref), but that hasn't worked on my system with Docker 20.10.12.

Week 2 - Task stuck at up_for_retry

Hi,

I tried to run data_ingestion_gcs_dag from the webserver, but the first task got stuck in the up_for_retry state. When I checked the log, this is the only thing I found:

(log screenshot omitted)

However, I can still run this DAG via the Airflow CLI. I'm not sure what the problem is here - I built the image on two different machines; one worked fine, and one got stuck like this.

Installing Google Cloud SDK on Windows

Please add the PowerShell commands from the official site to the manual:

# Download the Google Cloud SDK installer to the temp folder
(New-Object Net.WebClient).DownloadFile("https://dl.google.com/dl/cloudsdk/channels/rapid/GoogleCloudSDKInstaller.exe", "$env:Temp\GoogleCloudSDKInstaller.exe")

# Run the installer
& $env:Temp\GoogleCloudSDKInstaller.exe
