Demo: https://www.youtube.com/watch?v=Gjns_Z0zxt8&feature=emb_logo
Short instructions:
- Install Cookiecutter and the dependencies from requirements.txt
- Run cookiecutter [email protected]:databricks/mlflow-deployments.git (or the HTTPS equivalent)
- Create a new GitHub repo and push the generated project files there
- Add DATABRICKS_HOST and DATABRICKS_TOKEN as GitHub secrets to the newly created repo
- Implement dev tests in the dev-tests folder. These pipelines will be run on every push.
- Implement integration test pipelines in the integration-tests folder. These pipelines will be used for testing new releases.
- Implement production pipelines in the pipelines folder.

Please note: Python 3.8 is not supported yet.
.
├── cicd1
│ └── model.py
├── create_cluster
├── deployment
│ └── databrickslabs_cicdtemplates-0.2.3-py3-none-any.whl
├── deployment.yaml
├── dev-tests
│ ├── pipeline1
│ │ ├── job_spec_aws.json
│ │ ├── job_spec_azure.json
│ │ └── pipeline_runner.py
│ └── pipeline2
│ ├── job_spec_aws.json
│ ├── job_spec_azure.json
│ └── pipeline_runner.py
├── integration-tests
│ ├── pipeline1
│ │ ├── job_spec_aws.json
│ │ ├── job_spec_azure.json
│ │ └── pipeline_runner.py
│ └── pipeline2
│ ├── job_spec_aws.json
│ ├── job_spec_azure.json
│ └── pipeline_runner.py
├── pipelines
│ ├── pipeline1
│ │ ├── job_spec_aws.json
│ │ ├── job_spec_azure.json
│ │ └── pipeline_runner.py
│ └── pipeline2
│ ├── job_spec_aws.json
│ ├── job_spec_azure.json
│ └── pipeline_runner.py
├── requirements.txt
├── run_now
├── run_pipeline
├── runtime_requirements.txt
├── setup.py
└── tests
└── test_example.py
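Each pipeline folder in the tree above pairs a pipeline_runner.py with per-cloud job specifications. As an illustration only, a minimal job_spec_aws.json could look like the sketch below; the cluster settings shown are placeholder assumptions, not values shipped with this template:

```json
{
  "new_cluster": {
    "spark_version": "7.3.x-cpu-ml-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 1
  },
  "timeout_seconds": 3600,
  "max_retries": 0
}
```

The job_spec_azure.json variant would differ mainly in the node_type_id, since instance type names are cloud-specific.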
Project based on the cookiecutter data science project template. #cookiecutterdatascience
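Local unit tests live under the tests folder (see test_example.py in the tree above). A minimal sketch of such a test is shown below; add_features is a hypothetical helper used only for illustration, not part of the template:

```python
# tests/test_example.py -- minimal unit-test sketch.
# add_features is a hypothetical helper, not part of the template.

def add_features(values):
    """Double every value; stands in for real feature-engineering logic."""
    return [v * 2 for v in values]

def test_add_features():
    assert add_features([1, 2, 3]) == [2, 4, 6]

if __name__ == "__main__":
    test_add_features()
```

These tests run locally (e.g. via pytest) and are independent of the Databricks-hosted dev and integration pipelines.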
Once you have created your project/repo for Azure DevOps, you should do the following:
- Create a new Azure DevOps project/pipeline and link it to the "az_dev_ops/azure-pipelines.yml" file in your repo.
- Create a variable group named "Databricks-environment" that will be used in your az_dev_ops/azure-pipelines.yml pipeline definition.
- Under that new variable group, create the following variables:
- DATABRICKS_HOST: Databricks Host without orgid. Example "https://uksouth.azuredatabricks.net".
- DATABRICKS_TOKEN: Databricks Personal Access Token of the user that will be used to run the automated pipelines.
- MLFLOW_TRACKING_URI: Normally set to "databricks".
- DATABRICKS_USERNAME: Username of the system user in the Databricks environment under which the artifacts will be registered.
- CURRENT_CLOUD: Optional. Overrides the cloud where the data pipelines will run. It takes precedence over the "cloud" parameter in the deployment.yaml file.
- If you want to change the name of the variable group, change it in Azure DevOps first and then reflect the new name in the variables/group section of your az_dev_ops/azure-pipelines.yml file.
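The CURRENT_CLOUD precedence described above can be sketched as follows; resolve_cloud is a hypothetical helper written for illustration, not the template's actual deployment code:

```python
import os

def resolve_cloud(deployment_config):
    """Pick the target cloud: the CURRENT_CLOUD environment variable,
    when set, takes precedence over the "cloud" key from deployment.yaml."""
    return os.environ.get("CURRENT_CLOUD", deployment_config.get("cloud"))

# Example: deployment.yaml says "aws", but CURRENT_CLOUD overrides it.
config = {"cloud": "aws"}
os.environ["CURRENT_CLOUD"] = "azure"
print(resolve_cloud(config))  # -> azure
```

With CURRENT_CLOUD unset, the same call simply returns whatever the "cloud" parameter in deployment.yaml specifies.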