Data Testing With Airflow

This repository contains simple examples of how to implement some of the Nine Circles of Data Tests described in our blog post. A Docker container is provided to run the DTAP and Data Tests examples. The Mock Pipeline tests and DAG Integrity tests run as Travis CI tests.

DAG Integrity Tests

The DAG Integrity tests are integrated in our CI pipeline and check whether the DAG definition in your airflowfile is a valid DAG. They catch typos, verify that your DAGs contain no cycles, and check that the operators are used correctly.
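To illustrate what the "no cycles" part of such a check means, here is a minimal sketch that does not depend on Airflow. It uses the standard library's `graphlib` (Python 3.9+) on a plain dictionary of task dependencies. The function name `find_cycle` and the task ids are hypothetical, and this is not how the repository's CI tests or Airflow itself implement the check.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(dependencies):
    """Return the task ids forming a cycle, or None if the graph is acyclic.

    `dependencies` maps each task id to the set of task ids it depends on.
    (Hypothetical helper for illustration only.)
    """
    try:
        # Consuming static_order() processes the whole graph and raises
        # CycleError as soon as a cycle is detected.
        list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # The second element of args holds the nodes of the detected cycle.
        return err.args[1]
    return None


# A valid pipeline: extract -> transform -> load
valid = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}

# A broken pipeline: extract depends on load, which closes a loop
broken = {"extract": {"load"}, "transform": {"extract"}, "load": {"transform"}}

if __name__ == "__main__":
    print(find_cycle(valid))
    print(find_cycle(broken))
```

In a CI test, a non-`None` result would be turned into a failed assertion so that an invalid DAG never reaches a scheduler.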

Mock Pipeline Tests

Mock Pipeline Tests are implemented as a CI pipeline stage, and function as unit tests for your individual DAG tasks. Dummy data is generated and used to verify that for each expected input, an expected output follows from your code.
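As a rough sketch of the idea, the example below feeds hand-made dummy rows into a task's logic and asserts on the output. The function `total_per_customer` and the row layout are hypothetical and written in plain Python to keep the example self-contained; they are not taken from the repository's tasks.

```python
from collections import defaultdict


def total_per_customer(rows):
    """Hypothetical task logic: sum the `amount` field per `customer`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def test_total_per_customer():
    # Dummy input with a known, hand-computed expected output.
    dummy_rows = [
        {"customer": "a", "amount": 10.0},
        {"customer": "b", "amount": 5.0},
        {"customer": "a", "amount": 2.5},
    ]
    assert total_per_customer(dummy_rows) == {"a": 12.5, "b": 5.0}


if __name__ == "__main__":
    test_total_per_customer()
```

Keeping the task logic in a plain function like this, separate from the operator that calls it, is what makes it testable in a CI stage without a running Airflow instance.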

Data Tests

In the dags directory, you will find a simple DAG with 3 tasks. Each of these tasks has a companion test that is integrated into the DAG. These tests are run on every DAG run and are meant to verify that your code makes sense when running on real data.
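A companion test of this kind can be as small as a function that inspects a task's output and raises when something looks wrong, so that the DAG run fails visibly. The sketch below assumes the output is available as a list of dictionaries. The function name `check_rows` and the column names are hypothetical and are not taken from the DAG in the dags directory.

```python
def check_rows(rows, required_columns):
    """Hypothetical data test: fail if the output is empty or has missing values.

    Returns the number of rows checked so the caller can log it.
    """
    if not rows:
        raise ValueError("data test failed: task produced no rows")
    for index, row in enumerate(rows):
        # Collect every required column that is absent or None in this row.
        missing = [col for col in required_columns if row.get(col) is None]
        if missing:
            raise ValueError(
                f"data test failed: row {index} is missing {missing}"
            )
    return len(rows)


if __name__ == "__main__":
    output = [
        {"customer": "a", "amount": 12.5},
        {"customer": "b", "amount": 5.0},
    ]
    print(check_rows(output, ["customer", "amount"]))
```

Inside a DAG, a function like this would run as its own task directly downstream of the task it checks, so an exception stops the tasks that depend on the bad data.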

DTAP

To show our DTAP logic, we have included a Dockerfile, which builds a Docker image with Airflow and Spark installed. We then clone this repo 4 times, once for each environment. To build the Docker image:

docker build -t airflow_testing .

Once built, you can run it with:

docker run -p 8080:8080 airflow_testing

This image contains all the logic needed to initialize the DAGs and connections. One part that is simulated is the promotion of branches (i.e. environments). The 'promotion' of code from one branch (environment) to another requires write access to the git repo, something which we don't want to provide publicly :-). To see the environments and triggering in action, kick off the 'dev' DAG via the UI (or CLI) and watch the flow. Please note that the prod DAG will not run after the acc one by default, as we prefer to use so-called green-light deployments to verify the logic and prevent unwanted production DAG runs.
