williamwsyhk / azuredatabrickstrainingworkspacepreparation-public

License: MIT License

HCL 100.00%
azure databricks databricks-deploy terraform

Introduction

This repository contains Terraform code that deploys an Azure Databricks workspace for training purposes.

Resources to be created by this script

  1. Microsoft Entra ID Users and Groups (region-agnostic)
    • Instructors
    • Students
  2. Azure Storage Account for Databricks Unity Catalog (region-specific)
    • Important! A Databricks account can have only one Unity Catalog metastore per Azure region. To reuse an existing metastore, change the Terraform code accordingly.
  3. Azure Databricks Workspace (region-specific)
  4. Azure Databricks Clusters
    • Instructors' Clusters
      • Data Engineering
      • Machine Learning
    • Students' Clusters
      • Data Engineering
      • Machine Learning
  5. Azure Databricks Training Materials ((c) Databricks)
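As an illustration of item 4, a Machine Learning cluster could be declared roughly as follows. This is a minimal sketch, not the repository's actual code: the resource labels, cluster name, auto-termination value, and worker count are assumptions chosen for the example.

```hcl
# Hypothetical sketch of one student cluster; names and sizes are assumptions.

# Look up a long-term-support Databricks Runtime with ML libraries.
data "databricks_spark_version" "ml" {
  long_term_support = true
  ml                = true
}

# Look up a small node type that has a local disk.
data "databricks_node_type" "smallest" {
  local_disk = true
}

resource "databricks_cluster" "student_machine_learning" {
  cluster_name            = "student-machine-learning" # assumed name
  spark_version           = data.databricks_spark_version.ml.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 30 # assumed; stops idle clusters to limit cost
  num_workers             = 1  # assumed
}
```

A Data Engineering cluster would look the same, except that the runtime lookup would omit `ml = true`.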

Required Azure resources and accesses

  1. Azure Service Principal with the following Microsoft Graph API permissions granted.
    • Domain.Read.All
    • Group.ReadWrite.All
    • User.ReadWrite.All
  2. Azure Subscription with the following resource providers registered.
    • Microsoft.Compute
    • Microsoft.Databricks
    • Microsoft.ManagedIdentity
    • Microsoft.Storage
  3. The Azure Service Principal from item 1 has access to manage resources in the Azure Subscription from item 2.
  4. A Databricks account on Azure (can be found with link here), already created by following this documentation.
  5. A Databricks group named Databricks Unity Catalog Administrators (created separately from this project).
  6. The Azure Service Principal has been added to the Databricks account.

Preparing secrets.tfvars for deploying with Service Principal

region = "<Azure region>"
tenant_id = "<Azure tenant ID>"
subscription_id = "<Azure subscription ID that contains all resources>"
client_id = "<Azure client (app) ID>"
client_secret = "<Azure client (app) secret>"
databricks_account_id = "<Azure Databricks account ID>"
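Each key in secrets.tfvars needs a matching variable declaration in the Terraform code. The sketch below shows what such declarations might look like. The variable names come from the file above, but the descriptions and the `sensitive` flag are assumptions, not copied from the repository.

```hcl
# Hypothetical variables.tf matching the keys in secrets.tfvars.

variable "region" {
  type        = string
  description = "Azure region for the region-specific resources"
}

variable "tenant_id" {
  type        = string
  description = "Azure tenant ID"
}

variable "subscription_id" {
  type        = string
  description = "Azure subscription ID that contains all resources"
}

variable "client_id" {
  type        = string
  description = "Azure client (app) ID of the Service Principal"
}

variable "client_secret" {
  type        = string
  description = "Azure client (app) secret of the Service Principal"
  sensitive   = true # assumed; keeps the value out of plan and apply output
}

variable "databricks_account_id" {
  type        = string
  description = "Azure Databricks account ID"
}
```

Since the file holds a client secret, it is usually worth excluding it from version control.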

Deployment Steps

  1. Install the Azure CLI (az) and Terraform.
  2. Log in to the Azure CLI: az login --service-principal -u <app-id> -p <password-or-cert> --tenant <tenant-id>
  3. Change to the correct sub-folder, e.g. cd ./20231101
  4. Install the Terraform providers: terraform init
  5. Review the planned changes for problems: terraform plan -var-file='<file>.tfvars' -out='<file>.tfplan'
  6. Deploy the infrastructure: terraform apply '<file>.tfplan'
  7. To remove the whole deployment, run terraform plan -destroy -var-file='<file>.tfvars' -out='<file-destroy>.tfplan' and then terraform apply '<file-destroy>.tfplan'

Caveats

In region eastasia, there is an issue creating the Unity Catalog metastore directly with Terraform. Create the metastore manually on the Databricks Account page instead, then import it into the Terraform state: terraform import -var-file='<file>.tfvars' module.databricks.databricks_metastore.this '<metastore_id>'
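On Terraform 1.5 or later, the same import can also be expressed declaratively with an `import` block in the root module. This is an alternative sketch, not something the repository is known to use; the resource address is taken from the command above and the ID stays a placeholder.

```hcl
# Alternative to the terraform import command (requires Terraform >= 1.5).
# Replace the placeholder with the ID of the manually created metastore.
import {
  to = module.databricks.databricks_metastore.this
  id = "<metastore_id>"
}
```

With this block in place, the next terraform plan and terraform apply perform the import. The block can be removed once the metastore is in the state.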

Databricks users

The user list can be modified to suit your needs, e.g. the number of users required. Because this repository creates a training workspace, the users are divided into two groups, Instructors and Students. User names follow the format student01.databricks.<training-date-yyyyMMdd>@<your Azure domain>
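The naming format can be generated rather than typed out by hand. The following is a minimal sketch under assumed names: the locals `training_date`, `domain`, and `student_count` and the output `student_upns` are hypothetical, and the repository may build its user list differently.

```hcl
# Hypothetical sketch: build student user principal names from a count.
locals {
  training_date = "20231101"                # assumed training date (yyyyMMdd)
  domain        = "example.onmicrosoft.com" # assumed Azure domain
  student_count = 3                         # assumed number of students

  # Yields student01.databricks.20231101@example.onmicrosoft.com and so on.
  student_upns = [
    for i in range(1, local.student_count + 1) :
    format("student%02d.databricks.%s@%s", i, local.training_date, local.domain)
  ]
}

output "student_upns" {
  value = local.student_upns
}
```

Changing `student_count` is then the only edit needed to resize the class.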

Reference

Documentation for the prerequisite steps is listed in the links below.

Links

Terraform Providers

  • hashicorp/azuread
  • hashicorp/azurerm
  • databricks/databricks
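The three providers could be declared and configured for Service Principal authentication roughly as shown below. This is a sketch under assumptions: it reuses the variable names from secrets.tfvars, leaves out version constraints because the source does not state them, and configures the Databricks provider at account level, which may differ from how the repository wires its modules.

```hcl
terraform {
  required_providers {
    azuread = {
      source = "hashicorp/azuread"
    }
    azurerm = {
      source = "hashicorp/azurerm"
    }
    databricks = {
      source = "databricks/databricks"
    }
  }
}

# Microsoft Entra ID users and groups.
provider "azuread" {
  tenant_id     = var.tenant_id
  client_id     = var.client_id
  client_secret = var.client_secret
}

# Azure resources such as the storage account and the workspace.
provider "azurerm" {
  features {}

  tenant_id       = var.tenant_id
  subscription_id = var.subscription_id
  client_id       = var.client_id
  client_secret   = var.client_secret
}

# Account-level Databricks operations (assumed setup).
provider "databricks" {
  host                = "https://accounts.azuredatabricks.net"
  account_id          = var.databricks_account_id
  azure_tenant_id     = var.tenant_id
  azure_client_id     = var.client_id
  azure_client_secret = var.client_secret
}
```

Workspace-level resources such as clusters typically need a second Databricks provider configuration that points at the workspace URL.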
