
asdfgeoff / airflow-operators


A collection of useful operators for Apache Airflow

License: MIT

Topics: airflow, airflow-operators, airflow-operator, airflow-plugins, python, python3, dag, apache-airflow


Custom Airflow Operators

A collection of custom operators which are helpful for building data transformation pipelines in Apache Airflow.

Operators

RedshiftTableConstraintOperator

This operator performs boilerplate data quality checks against a specified table in a Redshift database. It can be placed at the end of your DAG to verify the integrity of your output, at the start to verify assumptions about upstream data sources, or in between data transformation steps to make debugging easier.

How to use it

Copy the RedshiftTableConstraintOperator package to a location that is importable from your DAG definition .py file.

from .RedshiftTableConstraintOperator import RedshiftTableConstraintOperator

example_task = RedshiftTableConstraintOperator(
    task_id='example_task',
    schema='superb_schema',
    table='terrific_table',
    no_nulls=True,
    unique_rows=True,
    unique_subsets=['session_id'],
    provide_context=True)

The argument no_nulls can take either a boolean or a list of fields.
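
For example, both forms might look like the sketch below, reusing the constructor arguments shown above (the field names are hypothetical, and the per-form behaviour is inferred from the argument name rather than confirmed by the repo):

# Boolean form: check the table for NULLs (assumed behaviour)
check_all = RedshiftTableConstraintOperator(
    task_id='check_all',
    schema='superb_schema',
    table='terrific_table',
    no_nulls=True)

# List form: check only the named fields (field names here are hypothetical)
check_fields = RedshiftTableConstraintOperator(
    task_id='check_fields',
    schema='superb_schema',
    table='terrific_table',
    no_nulls=['session_id', 'user_id'])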

SQLTemplatedPythonOperator

This operator runs an arbitrary Python function with a templated SQL file as input.

Useful for implementing bespoke data quality checks using boilerplate functions such as pct_less_than or pct_greater_than. Because the SQL file is passed as a template, Airflow displays it in the Rendered template tab of the web UI, which makes it trivial to copy/paste the query for a given DAG run into your own IDE in order to debug potential problems.
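
For reference, one way an operator like this can be built — a minimal sketch, not necessarily this repo's implementation — is to subclass PythonOperator and declare sql as a templated field with a .sql extension; the execute override that forwards the rendered query to the callable is an assumed convention:

from airflow.operators.python_operator import PythonOperator  # Airflow 1.x import path

class SQLTemplatedPythonOperator(PythonOperator):
    # Declaring `sql` as a template field with a .sql extension makes Airflow load
    # the file, render its Jinja template, and show it in the Rendered template tab.
    template_fields = ('sql',) + PythonOperator.template_fields
    template_ext = ('.sql',)

    def __init__(self, sql, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.sql = sql

    def execute(self, context):
        # Assumed convention: hand the rendered query to the python_callable as `sql`.
        self.op_kwargs = {**self.op_kwargs, 'sql': self.sql}
        return super().execute(context)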

How to use it

from SQLTemplatedPythonOperator import SQLTemplatedPythonOperator, assert_pct_less_than

DQ_check = SQLTemplatedPythonOperator(
    task_id='DQ_check',
    python_callable=assert_pct_less_than,
    sql='join_miss_pct.sql',
    op_args=[0.05],
    provide_context=True)
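
To make the example concrete, the callable and the templated SQL file might look roughly like the sketch below, assuming the operator hands the rendered query to the callable as a sql keyword argument as in the sketch above (the function signature, PostgresHook connection id, and query are assumptions, not the repo's actual code):

from airflow.hooks.postgres_hook import PostgresHook  # Redshift is queried over the Postgres protocol

def assert_pct_less_than(threshold, sql, **context):
    """Fail the task if the single value returned by the query is not below the threshold."""
    pct = PostgresHook(postgres_conn_id='redshift_default').get_first(sql)[0]
    assert pct < threshold, f'Expected a value below {threshold}, got {pct}'

# join_miss_pct.sql could be a Jinja-templated query along these lines:
#   SELECT 1.0 * COUNT(CASE WHEN b.id IS NULL THEN 1 END) / COUNT(*)
#   FROM superb_schema.terrific_table a
#   LEFT JOIN superb_schema.other_table b ON a.id = b.id
#   WHERE a.created_date = '{{ ds }}'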

Tests

How to run

  1. Create a conda environment using conda env create -f environment.yml
  2. Run the run_tests.sh script
