This module is used in the Mesh for Data project (the-mesh-for-data) to interact with ETL engines, such as IBM Data Stage. It acts as a client that schedules, runs, and manages ETL jobs.
For the integration with IBM Data Stage, the module uses the Data Stage API to run a job and retrieve its status.
- Kubernetes cluster 1.10+
- Helm 3.0.0+
In `Makefile`:
- Change `DOCKER_USERNAME`, `DOCKER_PASSWORD`, `DOCKER_HOSTNAME`, `DOCKER_NAMESPACE`, `DOCKER_TAGNAME`, `DOCKER_IMG_NAME`, and `DOCKER_CHART_IMG_NAME` to your own preferences.
make docker-build
make docker-push
- When testing the chart, configure settings by editing `values.yaml` directly.
- Modify `repository` in `values.yaml` to your preferred Docker image.
- Modify the copy/read action as needed with appropriate values.
- At runtime, the `m4d-manager` will pass in the copy/read values to the module, so you can leave them blank in your final chart.
make helm-login
make helm-verify
make helm-chart-push
make helm-uninstall
- In your module yaml spec (`etl-engine-module.yaml`):
  - Change `spec.chart.name` to your preferred chart image.
  - Define `flows` and `capabilities` for your module.
  - The Mesh for Data manager checks the `statusIndicators` provided to see if the module is ready. In this example, if the Kubernetes job completes, the status will be `succeeded` and the manager will set the module as ready.
- Deploy `M4DModule` in the `m4d-system` namespace:
kubectl create -f etl-engine-module.yaml -n m4d-system
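The readiness rule described above (a completed Kubernetes job means the module is ready) can be approximated by hand by looking at a Job's `.status.succeeded` count. The following is only a sketch of such a manual check, not the manager's actual logic; the helper name `job_is_succeeded` is hypothetical, and `<job-name>` and `<namespace>` are placeholders to fill in.

```shell
# Succeeds when the given succeeded-pod count is 1 or more.
# An empty or non-numeric argument (for example, the Job has not finished yet)
# is treated as "not ready".
job_is_succeeded() {
  count=${1:-0}
  case "$count" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$count" -ge 1 ]
}

# Example against a live cluster (<job-name> and <namespace> are placeholders):
#   count=$(kubectl get job <job-name> -n <namespace> -o jsonpath='{.status.succeeded}')
#   job_is_succeeded "$count" && echo "module job completed"
```

Keeping the comparison in a small function makes it possible to try the rule without a cluster, by passing in a count directly.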
- Follow steps 3 and 4 in this example to register the data asset in the catalog and set the `ASSET_ID` environment variable.
- Follow step 5 in this example to register HMAC credentials in Vault.
- In `m4dapplication.yaml`:
  - Change `metadata.name` to your application name.
  - Define `appInfo.purpose`, `appInfo.role`, and `spec.data`.
  - This ensures that a copy is triggered:
copy:
  required: true
- Deploy `M4DApplication` in the `default` namespace:
cat m4dapplication.yaml | sed "s/ASSET_ID/$ASSET_ID/g" | kubectl -n default apply -f -
- Check if `M4DApplication` was successfully deployed:
kubectl get m4dapplication -n default
kubectl describe M4DApplication etl-engine-module-test -n default
- Check if the module was triggered in `m4d-blueprints`:
kubectl get blueprint -n m4d-blueprints
kubectl describe blueprint etl-engine-module-test-default -n m4d-blueprints
kubectl get job -n m4d-blueprints
kubectl get pods -n m4d-blueprints
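The `sed` expression in the deploy step above replaces every literal `ASSET_ID` placeholder in the manifest with the value of the `ASSET_ID` environment variable before the result is piped to `kubectl`. The substitution can be tried offline with the sketch below, where the asset ID value and the one-line template (including the `dataSetID` field name) are made up for illustration and are not taken from the real `m4dapplication.yaml`.

```shell
# Hypothetical asset ID; in the real flow it comes from the catalog registration step.
ASSET_ID="example-asset-id"

# Stand-in for one line of m4dapplication.yaml (field name is illustrative only).
template='    dataSetID: "ASSET_ID"'

# Same sed expression as in the deploy command: each literal ASSET_ID is replaced.
rendered=$(printf '%s\n' "$template" | sed "s/ASSET_ID/$ASSET_ID/g")
printf '%s\n' "$rendered"
```

Because `/` is the `sed` delimiter here, an asset ID that itself contains `/` would break the expression. Switching to another delimiter, for example `s|ASSET_ID|$ASSET_ID|g`, avoids that.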
If you are using the `etl-engine-module` image, you should see this in the `kubectl logs` of your completed Pod:
$ kubectl logs rel1-etl-engine-module-x2tgs
Hello World Module!
Connection name is s3
Connection format is parquet
Vault credential address is http://vault.m4d-system:8200
Vault credential role is module
Vault credential secret path is /v1/kubernetes-secrets/secret-name?namespace=default
S3 bucket is m4d-test-bucket
S3 endpoint is s3.eu-gb.cloud-object-storage.appdomain.cloud
COPY SUCCEEDED
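To check the result in a script rather than by eye, the log output can be searched for the final `COPY SUCCEEDED` line shown above. This is a minimal sketch; the helper name `copy_succeeded` is hypothetical, `<pod-name>` is a placeholder, and the `m4d-blueprints` namespace in the usage comment is an assumption based on where the pods were listed earlier.

```shell
# Reads module log text on stdin and succeeds only if some line is exactly
# "COPY SUCCEEDED" (-x matches whole lines, -q suppresses output).
copy_succeeded() {
  grep -qx 'COPY SUCCEEDED'
}

# Example against a live cluster (<pod-name> is a placeholder):
#   kubectl logs <pod-name> -n m4d-blueprints | copy_succeeded && echo "copy verified"
```

Matching the whole line keeps a message that merely mentions the phrase from being counted as success.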