ETL Engine Module

Introduction

This module is part of the the-mesh-for-data project. It acts as a client for ETL engines such as Data Stage, scheduling, running, and managing ETL jobs.

Data Stage Integration:

To integrate with IBM Data Stage, the module uses the Data Stage API to run a job and retrieve its status.
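The run-then-check-status pattern described above can be sketched as a small polling loop. This is only an illustration: `wait_for_job`, the `get_status` callable, and the state names `succeeded` and `failed` are assumptions made for the example, not part of the Data Stage API or of this module's actual code.

```python
import time
from typing import Callable

# Hypothetical terminal states; a real ETL engine may report different names.
TERMINAL_STATES = {"succeeded", "failed"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status until it returns a terminal state or max_polls is used up."""
    status = "unknown"
    for attempt in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        # Skip the wait after the final attempt.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"job still '{status}' after {max_polls} polls")


# Demo with a fake status source standing in for a real API call.
fake_statuses = iter(["queued", "running", "succeeded"])
print(wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None))
```

Passing the status lookup in as a callable keeps the loop independent of any particular engine, so the same sketch could wrap whatever client call actually returns the job status.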

Prerequisites

  • Kubernetes cluster 1.10+
  • Helm 3.0.0+

Installation

Modify values in Makefile

In Makefile:

  • Change DOCKER_USERNAME, DOCKER_PASSWORD, DOCKER_HOSTNAME, DOCKER_NAMESPACE, DOCKER_TAGNAME, DOCKER_IMG_NAME, and DOCKER_CHART_IMG_NAME to your own preferences.

Build Docker image for Python application

make docker-build

Push Docker image to your preferred container registry

make docker-push

Configure the chart

  • When testing the chart, configure settings by editing the values.yaml directly.
  • Modify repository in values.yaml to your preferred Docker image.
  • Modify copy/read action as needed with appropriate values.
  • At runtime, the m4d-manager will pass in the copy/read values to the module so you can leave them blank in your final chart.

Log in to Helm registry

make helm-login

Lint and install Helm chart

make helm-verify

Push the Helm chart

make helm-chart-push

Uninstallation

make helm-uninstall

Deploy M4D module

  1. In your module yaml spec (etl-engine-module.yaml):

    • Change spec.chart.name to your preferred chart image.
    • Define flows and capabilities for your module.
    • The Mesh for Data manager checks the statusIndicators provided to see if the module is ready. In this example, if the Kubernetes job completes, the status will be succeeded and the manager will set the module as ready.
  2. Deploy M4DModule in m4d-system namespace:

kubectl create -f etl-engine-module.yaml -n m4d-system

Register data asset in Egeria and S3 bucket credentials in Vault (optional)

  1. Follow steps 3 and 4 in this example to register the data asset in the catalog and set the ASSET_ID environment variable
  2. Follow step 5 in this example to register HMAC credentials in Vault

Deploy M4D application which triggers module

  1. In m4dapplication.yaml:
    • Change metadata.name to your application name.
    • Define appInfo.purpose, appInfo.role, and spec.data
    • This ensures that a copy is triggered:
    copy:
      required: true
  2. Deploy M4DApplication in default namespace:
cat m4dapplication.yaml | sed "s/ASSET_ID/$ASSET_ID/g" | kubectl -n default apply -f -
  3. Check whether the M4DApplication deployed successfully:
kubectl get m4dapplication -n default
kubectl describe M4DApplication etl-engine-module-test -n default
  4. Check whether the module was triggered in m4d-blueprints:
kubectl get blueprint -n m4d-blueprints
kubectl describe blueprint etl-engine-module-test-default -n m4d-blueprints
kubectl get job -n m4d-blueprints
kubectl get pods -n m4d-blueprints
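The ASSET_ID substitution in the deploy step can also be done in Python, which may be convenient when the asset ID contains characters such as `/` or `&` that have special meaning in a sed `s/.../.../` expression. The sketch below is a minimal illustration; `render_manifest` is a hypothetical helper and the one-line template is a stand-in, not the real m4dapplication.yaml.

```python
def render_manifest(template: str, asset_id: str, placeholder: str = "ASSET_ID") -> str:
    """Replace every occurrence of the placeholder with the asset ID, literally."""
    if not asset_id:
        raise ValueError("asset_id must not be empty")
    # str.replace treats both arguments as plain text, so no escaping is needed.
    return template.replace(placeholder, asset_id)


# Stand-in template line; the real manifest has its own structure.
print(render_manifest("asset: ASSET_ID", "my-catalog/my-asset"))
```

The rendered text could then be piped to `kubectl -n default apply -f -` in the same way as the sed output.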

If you are using the etl-engine-module image, you should see this in the kubectl logs of your completed Pod:

$ kubectl logs rel1-etl-engine-module-x2tgs

Hello World Module!

Connection name is s3

Connection format is parquet

Vault credential address is http://vault.m4d-system:8200

Vault credential role is module

Vault credential secret path is /v1/kubernetes-secrets/secret-name?namespace=default

S3 bucket is m4d-test-bucket

S3 endpoint is s3.eu-gb.cloud-object-storage.appdomain.cloud

COPY SUCCEEDED
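To check this output in a script rather than by eye, the log lines can be parsed into key/value pairs. The sketch below assumes the log keeps the "<name> is <value>" shape shown above and ends with a literal `COPY SUCCEEDED` line; `parse_module_log` and `copy_succeeded` are hypothetical helper names, not part of the module.

```python
from typing import Dict


def parse_module_log(log_text: str) -> Dict[str, str]:
    """Collect lines of the form '<name> is <value>' into a dictionary."""
    fields: Dict[str, str] = {}
    for line in log_text.splitlines():
        # Split on the first ' is ' only, so values may contain any text.
        name, separator, value = line.strip().partition(" is ")
        if separator:
            fields[name] = value
    return fields


def copy_succeeded(log_text: str) -> bool:
    """Return True if any line is exactly 'COPY SUCCEEDED'."""
    return any(line.strip() == "COPY SUCCEEDED" for line in log_text.splitlines())


sample_log = """Hello World Module!
Connection name is s3
Connection format is parquet
S3 bucket is m4d-test-bucket
COPY SUCCEEDED
"""
print(parse_module_log(sample_log))
print(copy_succeeded(sample_log))
```

The log text itself would come from `kubectl logs` for the completed Pod, for example by saving its output to a file and reading it in.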
