Demo: https://www.youtube.com/watch?v=Gjns_Z0zxt8&feature=emb_logo
Short instructions:
- Install Cookiecutter and the dependencies from requirements.txt
- Run cookiecutter [email protected]:databricks/mlflow-deployments.git (or the HTTPS equivalent)
- Create a new GitHub repo and push the generated project files there
- Add DATABRICKS_HOST and DATABRICKS_TOKEN as GitHub secrets to the newly created repo
- Implement dev tests in the dev-tests folder. These pipelines will be run on every push.
- Implement integration test pipelines in the integration-tests folder. These pipelines will be used for testing new releases.
- Implement production pipelines in the pipelines folder.

Please note: Python 3.8 is not supported yet.
.
├── cicd1
│ └── model.py
├── create_cluster
├── deployment
│ └── databrickslabs_cicdtemplates-0.2.3-py3-none-any.whl
├── deployment.yaml
├── dev-tests
│ ├── pipeline1
│ │ ├── job_spec_aws.json
│ │ ├── job_spec_azure.json
│ │ └── pipeline_runner.py
│ └── pipeline2
│ ├── job_spec_aws.json
│ ├── job_spec_azure.json
│ └── pipeline_runner.py
├── integration-tests
│ ├── pipeline1
│ │ ├── job_spec_aws.json
│ │ ├── job_spec_azure.json
│ │ └── pipeline_runner.py
│ └── pipeline2
│ ├── job_spec_aws.json
│ ├── job_spec_azure.json
│ └── pipeline_runner.py
├── pipelines
│ ├── pipeline1
│ │ ├── job_spec_aws.json
│ │ ├── job_spec_azure.json
│ │ └── pipeline_runner.py
│ └── pipeline2
│ ├── job_spec_aws.json
│ ├── job_spec_azure.json
│ └── pipeline_runner.py
├── requirements.txt
├── run_now
├── run_pipeline
├── runtime_requirements.txt
├── setup.py
└── tests
└── test_example.py
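Each pipeline folder in the tree above pairs a pipeline_runner.py with per-cloud job specifications. As an illustration only, a minimal job_spec_aws.json could look like the sketch below; the cluster settings shown are placeholder assumptions, not values shipped with this template:

```json
{
  "new_cluster": {
    "spark_version": "7.3.x-cpu-ml-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 1
  },
  "timeout_seconds": 3600,
  "max_retries": 0
}
```

The job_spec_azure.json variant would differ mainly in the node_type_id, since instance type names are cloud-specific.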
Project based on the cookiecutter data science project template. #cookiecutterdatascience
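Local unit tests live under the tests folder (see test_example.py in the tree above). A minimal sketch of such a test is shown below; add_features is a hypothetical helper used only for illustration, not part of the template:

```python
# tests/test_example.py -- minimal unit-test sketch.
# add_features is a hypothetical helper, not part of the template.

def add_features(values):
    """Double every value; stands in for real feature-engineering logic."""
    return [v * 2 for v in values]

def test_add_features():
    assert add_features([1, 2, 3]) == [2, 4, 6]

if __name__ == "__main__":
    test_add_features()
```

These tests run locally (e.g. via pytest) and are independent of the Databricks-hosted dev and integration pipelines.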
Once you have created your project/repo for Azure DevOps, you should do the following:
- Create a new Azure DevOps project/pipeline and link it to the "az_dev_ops/azure-pipelines.yml" file in your repo.
- Create a variable group named "Databricks-environment" that will be used in your az_dev_ops/azure-pipelines.yml pipeline definition.
- Under that new variable group, create the following variables:
- DATABRICKS_HOST: Databricks Host without orgid. Example "https://uksouth.azuredatabricks.net".
- DATABRICKS_TOKEN: Databricks Personal Access Token of the user that will be used to run the automated pipelines.
- MLFLOW_TRACKING_URI: Normally set to "databricks".
- DATABRICKS_USERNAME: Username of the system user in the Databricks environment under which the artifacts will be registered.
- CURRENT_CLOUD: Optional. Overrides the cloud where the data pipelines will run. It takes precedence over the "cloud" parameter in the deployment.yaml file.
- If you want to change the name of the variable group, change it in Azure DevOps first and then reflect the new name in the variables/group section of your az_dev_ops/azure-pipelines.yml file.
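The CURRENT_CLOUD precedence described above can be sketched as follows; resolve_cloud is a hypothetical helper written for illustration, not the template's actual deployment code:

```python
import os

def resolve_cloud(deployment_config):
    """Pick the target cloud: the CURRENT_CLOUD environment variable,
    when set, takes precedence over the "cloud" key from deployment.yaml."""
    return os.environ.get("CURRENT_CLOUD", deployment_config.get("cloud"))

# Example: deployment.yaml says "aws", but CURRENT_CLOUD overrides it.
config = {"cloud": "aws"}
os.environ["CURRENT_CLOUD"] = "azure"
print(resolve_cloud(config))  # -> azure
```

With CURRENT_CLOUD unset, the same call simply returns whatever the "cloud" parameter in deployment.yaml specifies.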