heroku_airflow's Introduction

Run Apache Airflow on Heroku

Apache Airflow is a platform for creating, scheduling, and monitoring workflows. It is commonly used to define ETL processes. An excellent example of an ETL workflow can be found here.

Heroku Button deployment

Apache Airflow can be quickly and easily deployed to your own Heroku app by using this Heroku Button: Deploy

You will be prompted for a new Fernet key, which you can generate as follows:

dd if=/dev/urandom bs=32 count=1 2>/dev/null | openssl base64
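
As a sanity check on the command above, here is a minimal sketch (not part of the original project) that wraps key generation in a hypothetical helper, `generate_fernet_key`, and verifies the result. It assumes `dd`, `tr`, and either `openssl` or the coreutils `base64` tool are available. It also maps `+` and `/` to `-` and `_`, on the assumption that you want the URL-safe base64 alphabet that the Fernet specification describes.

```shell
# generate_fernet_key is a hypothetical helper name, not an Airflow or Heroku command.
generate_fernet_key() {
  # Read 32 random bytes, base64-encode them (openssl if present, else base64),
  # switch to the URL-safe alphabet, and drop the trailing newline.
  dd if=/dev/urandom bs=32 count=1 2>/dev/null |
    { if command -v openssl >/dev/null 2>&1; then openssl base64; else base64; fi; } |
    tr '+/' '-_' | tr -d '\n'
}

FERNET_KEY="$(generate_fernet_key)"

# 32 input bytes always encode to 44 base64 characters (one of them '=' padding).
if [ "${#FERNET_KEY}" -eq 44 ]; then
  echo "$FERNET_KEY"
else
  echo "unexpected key length: ${#FERNET_KEY}" >&2
fi
```

The printed value can then be pasted into the Heroku Button prompt.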

After deployment, you need to create a login user. This can be done with the create_user command from a Heroku bash session (documentation):

heroku run bash
airflow create_user -u <username> -p <password> -r <Role> -f <FirstName> -l <LastName> -e <Email>
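
To make the placeholders concrete, the sketch below assembles the command for a made-up administrator account and prints it rather than running it. Every account value is invented for illustration, and the password is deliberately left as a placeholder. `Admin` is one of the default roles in Airflow's role-based UI, alongside `User`, `Op`, `Viewer`, and `Public`.

```shell
# All account values below are illustrative placeholders; substitute your own.
AF_USERNAME="admin"
AF_ROLE="Admin"
AF_FIRST="Ada"
AF_LAST="Lovelace"
AF_EMAIL="ada@example.com"

# Warn early if the role is not one of Airflow's default roles.
case "$AF_ROLE" in
  Admin|User|Op|Viewer|Public) ;;
  *) echo "unknown role: $AF_ROLE" >&2 ;;
esac

# Build the command as text so it can be reviewed before use.
CREATE_USER_CMD="airflow create_user -u $AF_USERNAME -p <password> -r $AF_ROLE -f $AF_FIRST -l $AF_LAST -e $AF_EMAIL"
echo "$CREATE_USER_CMD"
```

Once the printed command looks right, replace `<password>` and run it inside the `heroku run bash` session.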

Manual Deployment

These steps are based largely on an excellent article (here) on deploying Apache Airflow to the Heroku platform, with some minor updates and tweaks.

  1. Install or set up a supported Python version (I'm using pyenv, so I just set the desired version in the project directory):

    echo "3.6.4" > .python-version
    
  2. Create a Python virtual environment in which to install Airflow and its dependencies:

    python3 -m venv .venv
    source .venv/bin/activate
    
  3. Install Airflow and the cryptography module, then freeze the dependencies into requirements.txt (the Procfile that initializes the database is created in step 7):

    pip install "apache-airflow[postgres, password]"
    pip install "cryptography"
    pip freeze > requirements.txt
    
  4. Create a .gitignore file so that the virtual environment is not committed:

    echo ".venv/" > .gitignore
    
  5. Initialize the git repository and create the Heroku app with a postgres add-on:

    git init
    git add .
    git commit -m "initial commit"
    
    heroku create
    heroku addons:create heroku-postgresql:hobby-dev
    
  6. We will use airflow.cfg for most of the application configuration, but any sensitive values should be kept as Heroku config variables. The airflow.cfg in this repository already uses the DATABASE_URL that was assigned when we created the database, but we still need a Fernet key to encrypt the connection passwords stored in the database. You can generate and set one as follows:

    heroku config:set AIRFLOW__CORE__FERNET_KEY=`dd if=/dev/urandom bs=32 count=1 2>/dev/null | openssl base64`
    

    We also need to set AIRFLOW_HOME to /app so that Airflow can find the airflow.cfg file. Otherwise the database initialization falls back to SQLite, which on Heroku lives on an ephemeral file system that lasts only as long as the dyno running it:

    heroku config:set AIRFLOW_HOME=/app
    
  7. Heroku uses a Procfile, a text file that declares the command used to start each process. For the initial run we only want to initialize the database, so that is what goes in the Procfile:

    echo "web: airflow initdb" > Procfile
    
  8. Commit once more and deploy to Heroku. This will build the project on Heroku and run the database initialization command from the Procfile.

    git add .
    git commit -m "Added configuration files."
    git push heroku master
    
  9. Once deployed, follow the log output and await completion of the database initialization:

    heroku logs --tail
    
  10. Now that the database is initialized, update Procfile to launch the web server:

    echo "web: airflow webserver --port \$PORT" > Procfile
    git add .
    git commit -m "Modify procfile to launch webserver"
    git push heroku master
    
  11. Now when you launch the app (heroku open) you should see a login screen. No user account exists yet, so we need to create one. This can be done with the create_user command from a Heroku bash session (documentation):

    heroku run bash
    airflow create_user -u <username> -p <password> -r <Role> -f <FirstName> -l <LastName> -e <Email>
    
  12. Finally, modify the Procfile one last time to run both the web server and the scheduler, then commit and push as in step 10:

    echo "web: airflow webserver --port \$PORT --daemon & airflow scheduler" > Procfile
    
  13. Any DAGs you want to run can go in a dags subfolder within the project.
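
The Procfile goes through three stages in steps 7, 10, and 12. As a minimal sketch, the hypothetical helper `procfile_line` below (not part of the repository; the stage names are made up) prints the Procfile content for a named stage. It uses single quotes so that `$PORT` is written literally and left for Heroku to expand when the dyno starts, which is the same effect the `\$PORT` escaping has inside double quotes.

```shell
# procfile_line is a hypothetical helper; the stage names are made up for this sketch.
procfile_line() {
  case "$1" in
    init) printf '%s\n' 'web: airflow initdb' ;;
    web)  printf '%s\n' 'web: airflow webserver --port $PORT' ;;
    full) printf '%s\n' 'web: airflow webserver --port $PORT --daemon & airflow scheduler' ;;
    *)    echo "usage: procfile_line init|web|full" >&2; return 1 ;;
  esac
}

# Show the line for the web-server stage; $PORT stays unexpanded.
procfile_line web
# prints: web: airflow webserver --port $PORT

# To apply a stage, redirect the output into the Procfile, for example:
#   procfile_line full > Procfile
```

Keeping the three variants in one place makes it harder to push a Procfile with an already-expanded (and therefore empty) port value by accident.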
