
airflow-docker's Introduction

Apache Airflow in Docker Compose

Apache Airflow is an open-source tool to programmatically author, schedule and monitor workflows. Workflows are written in Python, and monitored, scheduled and managed through a web UI. Airflow integrates easily with data sources such as HTTP APIs and databases (MySQL, SQLite, PostgreSQL and more).

For a detailed explanation of how Airflow is deployed with Docker Compose, read this post on Medium.

Airflow deployment in Docker

There's an official Apache Airflow image on Docker Hub. We can also build our own image with the following Dockerfile:

FROM python:3.7

# Install Airflow and initialize its metadata database (SQLite by default)
RUN pip3 install 'apache-airflow'
RUN airflow initdb

# Run the scheduler in the background and the webserver in the foreground
CMD (airflow scheduler &) && airflow webserver

To run the container: docker run -it -p 8080:8080 -v <host_dir>:/root/airflow airflow, where <host_dir> is the host directory to mount as the Airflow home.

Improving performance with MySQL

By default, Airflow uses a SQLite database as its backend. This is the easiest option, but its performance is quite weak; switching to a MySQL database improves performance significantly. The Docker Compose setup in this repo deploys two containers: airflow-engine, running Airflow, and airflow-backend, running MySQL. The Docker Compose file also takes care of exposing the port for the Airflow web server, mapping a volume for persistence, and automatically setting up the connection from Airflow to the MySQL backend.
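The two-container layout described above can be sketched as a minimal docker-compose.yml. Only the service names airflow-engine and airflow-backend come from this repo; the image tag, credentials, volume name and build path are illustrative assumptions, not the exact file:

```yaml
version: "3"
services:
  airflow-backend:                    # MySQL metadata database
    image: mysql:8                    # assumed image tag
    environment:
      MYSQL_ROOT_PASSWORD: airflow    # assumed credentials
      MYSQL_DATABASE: airflow
    volumes:
      - airflow-db:/var/lib/mysql     # volume mapped for persistence

  airflow-engine:                     # Airflow scheduler + webserver
    build: ./airflow-engine           # assumed build context
    ports:
      - "8080:8080"                   # Airflow web UI
    depends_on:
      - airflow-backend

volumes:
  airflow-db:
```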

Improving security with Fernet key

By default, Airflow stores all data in the database as plaintext, including credentials for third-party services. To avoid this, it's highly recommended to set up a Fernet key, which encrypts this sensitive data. The airflow-engine/fernet.py file takes care of this.
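For reference, a Fernet key is just 32 random bytes, URL-safe base64-encoded, which Airflow reads from its fernet_key setting (or the AIRFLOW__CORE__FERNET_KEY environment variable). A minimal sketch of generating one with only the standard library (not the exact code in fernet.py):

```python
import base64
import os

# A Fernet key is 32 random bytes, URL-safe base64-encoded (44 characters)
key = base64.urlsafe_b64encode(os.urandom(32)).decode()
print(key)
```

The same value can then be passed to the airflow-engine container as an environment variable or written into airflow.cfg.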

Removing restrictions on XCom size

XCom is Airflow's mechanism for exchanging data between tasks. If you try to store an object larger than 64 KB in an XCom, it will crash, because the backing MySQL column caps out at that size. The airflow-engine/airflow.sh file takes care of this by modifying the database structure.
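The size limit comes from the xcom value column being created as a BLOB in MySQL, which holds at most 65,535 bytes. The fix applied by airflow.sh is along these lines (the exact statement in the script is an assumption):

```sql
-- Widen the XCom value column from BLOB (64 KB max) to LONGBLOB (4 GB max)
ALTER TABLE xcom MODIFY value LONGBLOB;
```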

Airflow + MySQL Deployment

docker-compose build
docker-compose up

airflow-docker's People

Contributors

gbarreiro


airflow-docker's Issues

Issue when I create a user through "flask fab command"

The webserver works on localhost at http://0.0.0.0:8080/.

But once I create a user through the flask fab command, as prompted by "WARNING - No user yet created, use flask fab command to do it.", I receive the following error:

Something bad has happened.
Please consider letting us know by creating a bug report using GitHub.

Python version: 3.7.10
Airflow version: 2.1.0
Node: 831456c1d09f

Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 206, in execute
res = self._query(query)
File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 319, in _query
db.query(q)
File "/usr/local/lib/python3.7/site-packages/MySQLdb/connections.py", line 259, in query
_mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (1054, "Unknown column 'dag.last_parsed_time' in 'field list'")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/lib/python3.7/site-packages/airflow/www/auth.py", line 34, in decorated
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/www/views.py", line 547, in index
filter_dag_ids = current_app.appbuilder.sm.get_accessible_dag_ids(g.user)
File "/usr/local/lib/python3.7/site-packages/airflow/www/security.py", line 298, in get_accessible_dag_ids
return {dag.dag_id for dag in accessible_dags}
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
return self._execute_and_instances(context)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
distilled_params,
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
e, statement, parameters, cursor, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 206, in execute
res = self._query(query)
File "/usr/local/lib/python3.7/site-packages/MySQLdb/cursors.py", line 319, in _query
db.query(q)
File "/usr/local/lib/python3.7/site-packages/MySQLdb/connections.py", line 259, in query
_mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1054, "Unknown column 'dag.last_parsed_time' in 'field list'")
[SQL: SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_parsed_time AS dag_last_parsed_time, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag.concurrency AS dag_concurrency, dag.has_task_concurrency_limits AS dag_has_task_concurrency_limits, dag.next_dagrun AS dag_next_dagrun, dag.next_dagrun_create_after AS dag_next_dagrun_create_after
FROM dag]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

airflow init db error

I'm assuming these are caused by the Airflow 2.0.0 update. I found a second issue where airflow db init fails along these lines:
apache/airflow#9069
There is a lot I don't understand, but I think I've managed to find a fix by adding the following line to mysqlconnect.py:
config.set('core', 'sql_engine_collation_for_ids', 'utf8mb3_general_ci')
Finally, I'm seeing a continual stream of these errors from the backend:
mbind: Operation not permitted
It looks bad, but hasn't caused any critical issues yet.

Airflow.sh

Hi,

I had an issue when trying to start the container -

./airflow.sh: 33: ./airflow.sh: Syntax error: end of file unexpected (expecting "done")

Please advise. Thanks in advance.

Adding DAGs to the DagBag

hello @gbarreiro, excellent project 👍 and excellent blog post.

I have a question about adding DAGs... how do I do it?

I've tried the following

WORKDIR /root/airflow/
COPY dags /root/airflow/dags/    # <-- added line
COPY airflow.sh airflow.sh

It seems like the DAGs are not picked up from the /root/airflow/dags directory inside the container.

Any thoughts? Thank you

airflow.sh initdb command

I was getting an error using the command in airflow.sh:
airflow initdb
After switching to the following command, I've successfully run the Docker image:
airflow db init

Thanks for providing this and all the best.

Copied the tutorial exactly, doesn't work

I followed the tutorial. When running airflow-engine, exceptions are raised and it stops. Any ideas?

Error:

Traceback (most recent call last):
airflow-engine_1   |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context   
airflow-engine_1   |     cursor, statement, parameters, context
airflow-engine_1   |   File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute       
airflow-engine_1   |     cursor.execute(statement, parameters)
airflow-engine_1   | sqlite3.OperationalError: no such table: dag
