meltano-batch

A simple setup of Meltano Extract and Load on AWS Batch, managing the infrastructure with Terraform.

It sets up a repeatable Meltano EL process with a smoke test included. It runs the Meltano ELT process only and does not provide a Meltano frontend (which, as of writing, is not essential).

If you are looking for an even simpler approach, I strongly recommend taking a look at Meltano-on-Github-Actions, which requires less DevOps hassle.

The only reasons not to use GitHub Actions are if you require much longer-running loads, control over the infrastructure specifications, or data movement that is fully contained within an AWS environment.

Prerequisites

  1. Select an AWS Region. Be sure that all required services (e.g. AWS Batch, AWS Lambda) are available in the Region selected.
  2. Install Docker.
  3. Install HashiCorp Terraform.
  4. Install the latest version of the AWS CLI and confirm it is properly configured.

Setup

  1. Set up Terraform:
git clone git@github.com:mattarderne/meltano-batch.git
cd meltano-batch/terraform
terraform init
  2. Run Terraform, which will create all necessary infrastructure:
terraform plan
terraform apply

Build and Push Docker Image

Once finished, Terraform will output the name of your newly created ECR repository, e.g. 123456789.dkr.ecr.eu-west-1.amazonaws.com/meltano-batch-ecr-repo:latest. Note this value, as we will use it in subsequent steps (referred to as MY_REPO_NAME). The commands below append :latest themselves, so leave any trailing :latest tag off MY_REPO_NAME:

cd ..
cd meltano

# build the docker image
docker build -t aws-batch-meltano .

# (optional) test the docker image
docker run \
    --volume $(pwd)/output:/project/output \
    aws-batch-meltano \
    elt tap-smoke-test target-jsonl

# tag the image
docker tag aws-batch-meltano:latest <MY_REPO_NAME>:latest

# login to the ECR, replace <region>
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <MY_REPO_NAME>

# push the image to the ECR repository
docker push <MY_REPO_NAME>:latest

The steps above are automated in the meltano/deploy_aws_ecr.sh script.
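The image reference printed by Terraform bundles three things: the registry host, the repository path, and (in the example above) a tag. As a minimal sketch, the helpers below split such a reference with plain shell parameter expansion. The function names repo_without_tag, registry_host, and ecr_region are hypothetical and are not part of the repository or of deploy_aws_ecr.sh.

```shell
# Hypothetical helpers for illustration only; not part of the repository.

# Strip a trailing ":tag" from an image reference, if one is present.
repo_without_tag() {
    local ref="$1"
    local last="${ref##*/}"          # last path component, e.g. "repo:latest"
    case "$last" in
        *:*) printf '%s\n' "${ref%:*}" ;;
        *)   printf '%s\n' "$ref" ;;
    esac
}

# Everything before the first "/" is the registry host.
registry_host() {
    printf '%s\n' "${1%%/*}"
}

# ECR hosts look like <account>.dkr.ecr.<region>.amazonaws.com.
ecr_region() {
    local host="${1%%/*}"
    local rest="${host#*.dkr.ecr.}"  # "<region>.amazonaws.com"
    printf '%s\n' "${rest%%.*}"
}

# Usage (commented out because it needs Docker and AWS credentials):
# MY_REPO_NAME="$(repo_without_tag "<value printed by Terraform>")"
# aws ecr get-login-password --region "$(ecr_region "$MY_REPO_NAME")" |
#     docker login --username AWS --password-stdin "$(registry_host "$MY_REPO_NAME")"
```

Logging in against the registry host rather than the full repository path matches the form used in the AWS ECR documentation.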

Create a Job

Now that the Docker image has been pushed to ECR, you can invoke a job with the command below, which prints the Lambda logs (the function's response is written to the file outfile). Replace <region>:

aws lambda invoke --function-name submit-job-smoke-test  --region <region> \
outfile --log-type Tail \
--query 'LogResult' --output text |  base64 -d

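You can list the jobs in the queue with the command below. Without --job-status, list-jobs returns only RUNNING jobs, so the result is empty once a job has finished; pass a status such as SUCCEEDED or FAILED to see completed jobs.

aws batch list-jobs --job-queue meltano-batch-queue --job-status SUCCEEDED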
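Because list-jobs filters on a single status per call, seeing every job in the queue takes one call per status. The function below is a rough sketch of that loop; list_all_jobs is a hypothetical name, and the --query expression is just one possible choice of output columns.

```shell
# list_all_jobs QUEUE
# Hypothetical helper: calls list-jobs once for each AWS Batch job status
# and prints the job ID and name of every match.
list_all_jobs() {
    local queue="$1" job_status
    for job_status in SUBMITTED PENDING RUNNABLE STARTING RUNNING SUCCEEDED FAILED; do
        echo "== $job_status =="
        aws batch list-jobs --job-queue "$queue" --job-status "$job_status" \
            --query 'jobSummaryList[].[jobId,jobName]' --output text
    done
}

# Usage (commented out because it needs AWS credentials):
# list_all_jobs meltano-batch-queue
```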

Meltano UI

Load the Meltano UI to have a look. It is currently for display purposes only, but it can be configured to display the Meltano app and kick off ad hoc jobs. Using AWS App Runner (example in terraform/archive/apprunner.tf) is viable for deploying to production, but requires a backend DB to be configured in the Dockerfile.

docker run \
    --volume $(pwd)/output:/project/output \
    aws-batch-meltano \
    ui

Resourcing

Depending on the size of the data transferred, you may need to increase the resources in the AWS Batch resource "aws_batch_job_definition" by editing the following fields, for example from 2 to 8 vCPUs and from roughly 2 GB to 8 GB of RAM (the memory value is in MiB):

  "vcpus": 2 -> 8,
  "memory": 2000 -> 8000,
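After changing these values and re-running terraform apply, it can be useful to confirm what AWS Batch actually has registered. The sketch below wraps describe-job-definitions for that purpose; show_job_definition_resources is a hypothetical name, and it assumes the job definition sets vcpus and memory directly in its container properties, as in the fields above.

```shell
# show_job_definition_resources
# Hypothetical helper: prints name, revision, vCPUs and memory (MiB) for
# every ACTIVE job definition in the currently configured region.
show_job_definition_resources() {
    aws batch describe-job-definitions --status ACTIVE \
        --query 'jobDefinitions[].[jobDefinitionName,revision,containerProperties.vcpus,containerProperties.memory]' \
        --output table
}

# Usage (commented out because it needs AWS credentials):
# show_job_definition_resources
```

Each terraform apply that changes the job definition registers a new revision, so the highest revision number should show the new values.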

Notifications

By default, no notifications are configured. Ideally, notifications should be handled through AWS SNS.

Slack notifications can be turned on as follows:

  1. Change the below line in elt_tap_smoke_test-target_jsonl.tf: handler = "lambda.lambda_handler" to handler = "alerts.lambda_handler"
  2. Change the below line in main.py: source_file = "lambda/lambda.py" to source_file = "alerts/lambda.py"
  3. Create a Slack webhook, then create a secret.tfvars file in the lambda directory containing the webhook URL:
slack_webhook = "<slack_webhook>"
  4. Change the var.slack_webhook_toggle in the variables.tf file to true (lowercase)
  5. Install requests in the terraform/lambda directory:
cd terraform  # must be run from the terraform directory
pip install --target ./lambda requests
  6. Run terraform apply -var-file="secret.tfvars"

Test with the aws lambda invoke command above; it should post a message to Slack. However, the notification fires only when the job starts (or fails to start), not when it finishes. A proper setup should use AWS Batch SNS notifications.
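AWS Batch publishes a "Batch Job State Change" event to Amazon EventBridge whenever a job changes status, so one way to report job outcomes is an EventBridge rule that forwards terminal states to an SNS topic. The following is a minimal sketch of that idea, not something the repository currently does. The function names, the rule name, and the topic ARN are all placeholders, and the topic is assumed to exist already.

```shell
# Event pattern matching AWS Batch jobs that reach a terminal state.
batch_terminal_event_pattern() {
    cat <<'EOF'
{
  "source": ["aws.batch"],
  "detail-type": ["Batch Job State Change"],
  "detail": { "status": ["SUCCEEDED", "FAILED"] }
}
EOF
}

# notify_on_job_outcome RULE_NAME SNS_TOPIC_ARN
# Hypothetical helper: creates (or updates) an EventBridge rule and points it
# at an existing SNS topic. The topic's access policy must allow
# events.amazonaws.com to publish to it.
notify_on_job_outcome() {
    local rule_name="$1" topic_arn="$2"
    aws events put-rule --name "$rule_name" \
        --event-pattern "$(batch_terminal_event_pattern)"
    aws events put-targets --rule "$rule_name" \
        --targets "Id=sns-target,Arn=$topic_arn"
}

# Usage (commented out because it needs AWS credentials):
# notify_on_job_outcome meltano-batch-job-outcome "<sns_topic_arn>"
```

In this project the same rule and target would more naturally be declared in Terraform alongside the rest of the infrastructure; the CLI form is shown only to make the moving parts visible.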

Todo

AWS

Contributors

mattarderne