
airflow-dbt-demo's Introduction

Astronomer Platform Helm Charts

This repository contains the helm charts for deploying the Astronomer Platform into a Kubernetes cluster.

Astronomer is a commercial "Airflow as a Service" platform that runs on Kubernetes. Source code is made available for the benefit of our customers; if you'd like to use the platform, reach out for a license.

Architecture

Astronomer Architecture

Docker images

Docker images for deploying and running Astronomer are currently available on Quay.io/Astronomer.

Documentation

You can read the Astronomer platform documentation at https://docs.astronomer.io/enterprise. For a record of all user-facing changes to the Astronomer platform, see Release Notes.

Contributing

We welcome any contributions:

  • Report all enhancements, bugs, and tasks as GitHub issues
  • Provide fixes or enhancements by opening pull requests in GitHub

Local Development

Install the following tools:

  • docker (make sure your user has permissions - try 'docker ps')
  • kubectl
  • kind
  • mkcert (make sure mkcert is in your PATH)
  • helm
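
A quick way to sanity-check the setup (a minimal sketch; exact version flags may vary slightly between tool versions):

docker ps                 # verifies the Docker daemon is reachable and your user has permission
kubectl version --client  # client binary only; no cluster is required yet
kind version
mkcert -version
helm version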

Run this script from the root of this repository:

bin/reset-local-dev

Each time you run the script, the platform will be fully reset to the current helm chart.

Customizing the local deployment

Turn on or off parts of the platform

Modify the "tags:" section in configs/local-dev.yaml (see the sketch after this list):

  • platform: core Astronomer components
  • logging (large impact on RAM use): ElasticSearch, Kibana, Fluentd (aka 'EFK' stack)
  • monitoring: Prometheus
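
For reference, the tags block in configs/local-dev.yaml looks roughly like this (a sketch only; the file in this repo is authoritative):

tags:
  platform: true
  logging: false     # disable the EFK stack to reduce RAM usage
  monitoring: true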

Load a Docker image into KinD's nodes (so it's available for pods)

kind load docker-image $your_local_image_name_with_tag
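
For example, assuming you built an image locally (the image name here is hypothetical):

docker build -t my-component:local-dev .
kind load docker-image my-component:local-dev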

Make use of that image

Make note of your pod name

kubectl get pods -n astronomer

Find the corresponding deployment, daemonset, or statefulset

kubectl get deployment -n astronomer

Replace the pod with the new image: look for "image" on the appropriate container, replace it with the local tag, and set the pull policy to "Never".

kubectl edit deployment -n astronomer <your deployment>
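
Inside the editor, the part of the container spec to change looks roughly like this (a sketch; the container name and surrounding structure will differ per workload):

spec:
  template:
    spec:
      containers:
        - name: houston                    # example container; pick the one you rebuilt
          image: my-component:local-dev    # the tag you loaded with kind load docker-image
          imagePullPolicy: Never           # use the locally loaded image, never pull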

Specify the Kubernetes version

bin/reset-local-dev -K 1.28.6

Locally test HA configurations

You need a powerful computer to run the HA testing locally. 28 GB or more of memory should be available to Docker.

Environment variables:

  • USE_HA: when set, will deploy using HA configurations
  • CORDON_NODE: when set, will cordon this node after kind create cluster
  • MULTI_NODE: when set, will deploy kind with two worker nodes

Scripts:

  • Use bin/run-ci to start the cluster
  • Modify / use bin/drain.sh to test draining

Example:

export USE_HA=1
export CORDON_NODE=kind-worker
export MULTI_NODE=1
bin/run-ci

After the platform is up, run:

bin/drain.sh

How to upgrade the Airflow chart JSON schema

Every time we upgrade the Airflow chart, we also need to update the JSON schema file with the list of acceptable top-level params (eventually this will be fixed on the OSS side, but for now it needs to be a manual step: https://github.com/astronomer/issues/issues/3774). Additionally, the JSON schema URL needs to be updated to something of the form https://raw.githubusercontent.com/apache/airflow/helm-chart/1.x.x/chart/values.schema.json. This param is found in astronomer/values.schema.json at the astronomer.houston.config.deployments.helm.airflow.$ref parameter.

To get a list of the top-level params, it is best to switch to the apache/airflow commit tagged for that chart release, then run ag (The Silver Searcher) to list all top-level params.

Example:

git checkout tags/helm-chart/1.2.0
ag "\.Values\.\w+" -o --no-filename --no-numbers | sort | uniq

The values output by this command need to be inserted manually into astronomer/values.schema.json at the astronomer.houston.config.deployments.helm.airflow.allOf parameter. Two additional params, podMutation and useAstroSecurityManager, also need to be at this location beyond what the command above returns; they can be found by running the same ag command against the astronomer/airflow-chart values.yaml file.
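
For example, the same search run against the airflow-chart values file (the path shown here is illustrative) surfaces those extra params:

ag "\.Values\.\w+" -o --no-filename --no-numbers ../airflow-chart/values.yaml | sort | uniq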

Searching code

We include k8s schema files and calico CRD manifests in this repo to aid in testing, but their inclusion makes grepping for code a bit difficult in some cases. You can exclude those files from your `git grep` results if you use the following syntax:

git grep .Values.global. -- ':!tests/k8s_schema' ':!bin/kind'

The -- ends the git command arguments and indicates that the rest of the arguments are filenames or pathspecs. Pathspecs begin with a colon; :!tests/k8s_schema is a pathspec that instructs git to exclude the directory tests/k8s_schema.

Note that this pathspec syntax is a git feature, so this exclusion technique will not work with normal grep.

License

The code in this repo is licensed under Apache 2.0 with Commons Clause; however, it installs Astronomer components that have a commercial license and require a commercial subscription from Astronomer, Inc.

Optional schema validation

The ./values.schema.json.example file can be used to validate that the helm values you are using work with the default airflow chart shipped with this repo. To use it, remove the .example postfix from the file and proceed with the helm lint, install, and upgrade commands as normal.
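
For example (a sketch; the release name, namespace, and values file below are placeholders):

mv values.schema.json.example values.schema.json
helm lint .
helm upgrade --install astronomer . --namespace astronomer -f configs/local-dev.yaml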

airflow-dbt-demo's People

Contributors

denimalpaca, fhoda, josh-fell, petedejoy, spbail, virajmparekh


airflow-dbt-demo's Issues

Registry v2 Parsing Error

Registry has issues parsing some dags.

[2022-10-18, 03:21:43 UTC] {pod_manager.py:243} INFO - FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/airflow/dbt/target/manifest.json'

Suggestion:
Move the dbt manifest.json (and potentially all dbt-related files) to the include directory, or don't use an absolute path when referencing it from the DAG.

cc @virajmparekh

Models directory is needed to run the dag

Hi,
I have been trying out this example DAG. I managed to run the dbt project both locally and in Airflow with the models folder. However, when the folder was removed, dbt found 0 models with only manifest.json present (I ran dbt compile beforehand, so the manifest file does contain all compiled SQL).

The README says you only need those 3 files to run dbt; am I missing something?

`astro dev start` fails due to dependency

Hey guys,

I'm currently trying to see if the product fits our needs and am putting together a POC using dbt and Airflow.
I found your demo, and when I try to install it using the Astronomer CLI, I get the following error:

+ grep -Eqx 'apache-airflow\s*[=~>]{1,2}.*' requirements.txt
+ pip install --no-cache-dir -q -r requirements.txt
ERROR: Cannot install -r requirements.txt (line 3) and jsonschema==3.2.0 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
The command '/bin/bash -o pipefail -e -u -x -c if grep -Eqx 'apache-airflow\s*[=~>]{1,2}.*' requirements.txt; then     echo >&2 "Do not upgrade by specifying 'apache-airflow' in your requirements.txt, change the base image instead!";  exit 1;   fi;   pip install --no-cache-dir -q -r requirements.txt' returned a non-zero code: 1
Error: command 'docker build -t airflow-dbt-demo_90ae4e/airflow:latest failed: failed to execute cmd: exit status 1

Would love to get your help to get the demo up and running :-)

Add licences please

Hey, this looks nice, but without a licence I don't dare use it.
Could you please add one that you think is appropriate?

Generating multiple DAGs with Cross DAG Dependencies

Thanks for a really awesome demo of how to integrate dbt with airflow. We use airflow extensively and generate multiple DAGs, one for each functional area (marketing, finance, etc.). While we try to keep the DAGs as independent as possible, realistically, there are always a few cross-dag dependencies that we handle using ExternalTaskSensor. Is it possible to handle it within this architecture?

I tried to add tags to the tasks for each DAG. However, except for staging, instantiating the DAG with dbt_tag resulted in a DAG with empty dbt_run and dbt_test task groups. Intuitively, here is what I am thinking should work:

  1. Use tags to identify separate DAGs.
  2. While processing a specific DAG, generate wait_for tasks using ExternalTaskSensor for dependencies that are NOT in the same DAG.

I would love to get your thoughts on the best way to operationalize this.

`dbt compile` runs during each time dag is parsed by scheduler

This came out of conversation with @spbail over dbt slack.

As things stand now, in the dbt_advanced_utility DAG, dbt compile runs as part of the dag_parser = DbtDagParser(..) step. This means that dbt compile will execute every time the DAG is parsed, which is pretty frequent (although I'm not sure what the default interval is). This can be a very expensive operation, taking at least 10-20 seconds, or up to minutes for very large dbt projects.

To minimize what's being parsed by the scheduler, it probably makes sense to offload dbt compile to a "deployment step" in some way. At Updater we run compile plus some other work in Circle so that the DAGs themselves just load pickle files (what's described in Part 2). The point of this repo is obviously a ready-to-use demo, though, so I'm not sure what the best approach is here.

dags don't run in demo

I'm trying to better understand how Airflow and dbt can be used together, but I can't actually run the demo.
When I trigger dbt_basic_dag I get airflow.exceptions.AirflowNotFoundException: The conn_id `***` isn't defined,
so it appears that the Docker container doesn't have the environment variables defined that are needed to connect to Postgres. Is there a simple fix for this?
