
azure-databricks-operator's Introduction

Azure Databricks operator (for Kubernetes)


This project is experimental. Expect the API to change. It is not recommended for production environments.

Introduction

Kubernetes offers the facility of extending its API through the concept of Operators. This repository contains the resources and code to deploy an Azure Databricks Operator for Kubernetes.

The Databricks operator is useful in situations where Kubernetes hosted applications wish to launch and use Databricks data engineering and machine learning tasks.

Key benefits of using Azure Databricks operator

  1. Easy to use: Azure Databricks operations can be performed with kubectl; there is no need to learn or install the Databricks CLI and its Python dependency.

  2. Security: there is no need to distribute the Databricks token to users; the token is used only by the operator.

  3. Version control: all the YAML or Helm charts that describe Azure Databricks operations (clusters, jobs, …) can be tracked.

  4. Automation: replicate Azure Databricks operations on any Databricks workspace by applying the same manifests or Helm charts.


The project was built using

  1. Kubebuilder
  2. Golang SDK for Databricks

How to use Azure Databricks operator

  1. Download the latest release manifests:
wget https://github.com/microsoft/azure-databricks-operator/releases/latest/download/release.zip
unzip release.zip
  2. Create the azure-databricks-operator-system namespace:
kubectl create namespace azure-databricks-operator-system
  3. Create Kubernetes secrets with values for DATABRICKS_HOST and DATABRICKS_TOKEN:
kubectl --namespace azure-databricks-operator-system \
    create secret generic dbrickssettings \
    --from-literal=DatabricksHost="https://xxxx.azuredatabricks.net" \
    --from-literal=DatabricksToken="xxxxx"
  4. Apply the manifests for the Operator and CRDs in release/config:
kubectl apply -f release/config

For detailed deployment guides, please see deploy.md

Samples

  1. Create a Spark cluster on demand and run a Databricks notebook.


  2. Create an interactive Spark cluster and run a Databricks job on an existing cluster.


  3. Create an Azure Databricks secret scope from Kubernetes secrets.


For samples and simple use cases on how to use the operator, please see samples.md

Quick start

One-click start using VS Code


For more details please see contributing.md

Roadmap

Check roadmap.md for what has been supported and what's coming.

Resources

A few topics are discussed in resources.md:

  • Dev container
  • Build pipelines
  • Operator metrics
  • Kubernetes on WSL

Contributing

For instructions about setting up your environment to develop and extend the operator, please see contributing.md

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.


azure-databricks-operator's Issues

Crash when submitting djob and run simultaneously with nil pointer dereference

If you submit a Djob and a Run simultaneously, the operator crashes with a nil pointer dereference error. Submitting them simultaneously is a use case that we unfortunately cannot avoid.

The desired behaviour would be similar to the handling of secretscopes when the underlying secret does not exist: rather than crashing, report an error in the logs and continue on the next reconcile cycle (see the sketch after the stack trace below).

To reproduce, given the following Run referencing a Djob:

apiVersion: databricks.microsoft.com/v1alpha1
kind: Djob
metadata:
  name: device-pessl
  namespace: dx
spec:
  new_cluster:
    spark_version: 5.3.x-scala2.11
    spark_conf:
      spark.databricks.delta.preview.enabled: "true"
    node_type_id: Standard_DS3_v2
    spark_env_vars:
      PYSPARK_PYTHON: '/databricks/python3/bin/python3'
    num_workers: 1
  notebook_task:
    notebook_path: "/Shared/notebooks/stream_builder-2.24.0"
  max_retries: 3
 
---
apiVersion: databricks.microsoft.com/v1alpha1
kind: Run
metadata:
  name: device-pessl-run
  namespace: dx
spec:
  job_name: device-pessl
  notebook_params:
    job_name: device-pessl

run:
kubectl apply -f job_and_run.yaml

output:

2019-12-17T10:03:27.791+1100	INFO	controllers.Djob	Starting reconcile loop for dx/device-pessl
2019-12-17T10:03:27.791+1100	INFO	controllers.Djob	Submit for dx/device-pessl
2019-12-17T10:03:27.791+1100	INFO	controllers.Djob	Submitting job device-pessl
2019-12-17T10:03:27.821+1100	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "run", "request": "dx/device-pessl-run"}
2019-12-17T10:03:27.822+1100	INFO	controllers.Run	Submitting run device-pessl-run
2019-12-17T10:03:27.821+1100	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"Run","namespace":"dx","name":"device-pessl-run","uid":"d94ff46e-6f39-4759-a6ed-3a18525fbdeb","apiVersion":"databricks.microsoft.com/v1alpha1","resourceVersion":"56723"}, "reason": "Added", "message": "Object finalizer is added"}
E1217 10:03:27.822531   47051 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 357 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x201b080, 0x3062de0)
	/Users/d886442/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/Users/d886442/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:48 +0x82
panic(0x201b080, 0x3062de0)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/microsoft/azure-databricks-operator/controllers.(*RunReconciler).submit(0xc000290240, 0xc000278000, 0x1, 0x21ce575)
	/Users/d886442/projects/data-exchange/azure-databricks-operator/controllers/run_controller_databricks.go:71 +0x530
github.com/microsoft/azure-databricks-operator/controllers.(*RunReconciler).Reconcile(0xc000290240, 0xc0000cacfa, 0x2, 0xc0000cace0, 0x10, 0x307c760, 0x0, 0xc000054540, 0xc000105de8)
	/Users/d886442/projects/data-exchange/azure-databricks-operator/controllers/run_controller.go:80 +0x272
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001980c0, 0x20687a0, 0xc0001ceea0, 0x2068700)
	/Users/d886442/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256 +0x146
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001980c0, 0xc0005b2100)
	/Users/d886442/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232 +0xb5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc0001980c0)
	/Users/d886442/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0005f5330)
	/Users/d886442/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x54
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0005f5330, 0x3b9aca00, 0x0, 0x1, 0xc0000ae300)
	/Users/d886442/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0005f5330, 0x3b9aca00, 0xc0000ae300)
	/Users/d886442/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/Users/d886442/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:193 +0x326
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1e93220]

goroutine 357 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/Users/d886442/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:55 +0x105
panic(0x201b080, 0x3062de0)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/microsoft/azure-databricks-operator/controllers.(*RunReconciler).submit(0xc000290240, 0xc000278000, 0x1, 0x21ce575)
	/Users/d886442/projects/data-exchange/azure-databricks-operator/controllers/run_controller_databricks.go:71 +0x530
github.com/microsoft/azure-databricks-operator/controllers.(*RunReconciler).Reconcile(0xc000290240, 0xc0000cacfa, 0x2, 0xc0000cace0, 0x10, 0x307c760, 0x0, 0xc000054540, 0xc000105de8)
	/Users/d886442/projects/data-exchange/azure-databricks-operator/controllers/run_controller.go:80 +0x272
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001980c0, 0x20687a0, 0xc0001ceea0, 0x2068700)
	/Users/d886442/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256 +0x146
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001980c0, 0xc0005b2100)
	/Users/d886442/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232 +0xb5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc0001980c0)
	/Users/d886442/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0005f5330)
	/Users/d886442/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x54
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0005f5330, 0x3b9aca00, 0x0, 0x1, 0xc0000ae300)
	/Users/d886442/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0005f5330, 0x3b9aca00, 0xc0000ae300)
	/Users/d886442/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/Users/d886442/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:193
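
A sketch of the desired behaviour described above: guard the Run submit path instead of letting it panic. The type and field names below are hypothetical stand-ins, not the operator's actual code.

// Hypothetical guard in RunReconciler.submit: if the referenced Djob has not
// been submitted to Databricks yet, return an error so the reconcile is
// retried, instead of dereferencing a nil status.
func (r *RunReconciler) submit(instance *databricksv1alpha1.Run) error {
    if instance.Spec.JobName != "" {
        var job databricksv1alpha1.Djob
        key := types.NamespacedName{Namespace: instance.Namespace, Name: instance.Spec.JobName}
        if err := r.Get(context.Background(), key, &job); err != nil {
            return fmt.Errorf("failed to get referenced Djob %s: %w", instance.Spec.JobName, err)
        }
        if job.Status == nil || job.Status.JobStatus == nil {
            // The Djob exists in k8s but has no Databricks job ID yet.
            return fmt.Errorf("djob %s has not been submitted yet, requeueing run", instance.Spec.JobName)
        }
    }
    // ... existing submit logic ...
    return nil
}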

Delays in reconciliation under load

(Pre-emptive shout-out to @EliiseS, @storey247 and @lawrencegripper as this work has been a group effort)
As mentioned in #131 we have been performing some load tests against the operator. Our initial load run shows raised work-queue latency and an increasing work-queue depth.


It's worth noting that the histogram buckets for the latency are 0.1s, 1s, 10s, so a value of 10 on the graph in effect means somewhere between 1s and 10s.

Looking at the metrics for the mock API that we're using for the load tests, the response times look pretty constant:


What we can see in the mock API metrics are periods of time where no requests are being made to the API (and these become more pronounced as the test load ramps up).

Looking at this, our hypothesis was that there is something causing the reconciliation loops to block.

Unable to delete secretscope in k8s if we fail to submit ACLs when creating it

When using the operator to create a secretscope with some secrets and ACLs, if we fail to set the ACLs for some reason, we won't be able to delete the secretscope from k8s. We will need to go to Databricks to delete the secretscope.

In my particular case, submitting the ACLs fails because my Databricks workspace doesn't support ACLs, as it's not a Premium SKU. This is the error I get in the operator:

DEBUG controller-runtime.manager.events Warning {"object": {"kind":"SecretScope","namespace":"kubeflow","name":"test-secretscope","uid":"864a1362-218c-11ea-b7de-4a5fce98e05a","apiVersion":"databricks.microsoft.com/v1alpha1","resourceVersion":"11666051"}, "reason": "Failed", "message": "Failed to submit object: Response from server (403) {\"error_code\":\"PERMISSION_DENIED\",\"message\":\"ACL is not supported in your workspace.\"}"}

The problem can be seen in https://github.com/microsoft/azure-databricks-operator/blob/master/controllers/secretscope_controller_databricks.go. If r.submitACLs(instance) fails we just return (L210) and never set instance.Status.SecretScope properly (L220). But we need instance.Status.SecretScope to be able to delete the secretscope from k8s (L226).

There are at least two possible behaviors here:

  1. If we fail to set the ACLs, we roll back everything we did so we don't end up with a secretscope with invalid security.
  2. We ignore the error but still set instance.Status.SecretScope so we can at least delete it from k8s (see the sketch below).
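
A rough sketch of option 2 in the secret scope submit path. The type and field names are assumptions based on the status shape shown elsewhere on this page, not the actual code.

// Record the scope in Status as soon as it exists in Databricks, before
// attempting ACLs, so the finalizer can still delete it if submitACLs fails.
instance.Status = &databricksv1alpha1.SecretScopeStatus{
    SecretScope: &dbmodels.SecretScope{Name: scopeName, BackendType: "DATABRICKS"},
}
if err := r.Status().Update(context.Background(), instance); err != nil {
    return err
}
if err := r.submitACLs(instance); err != nil {
    // The scope is now tracked in Status, so deletion from k8s keeps working;
    // surface the ACL failure for the next reconcile (or record it as an event).
    return fmt.Errorf("failed to submit ACLs: %w", err)
}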

Modeling Databricks API with Custom Resource Definitions

Megathread

Current state

NotebookJobs are the first (and currently only) custom resource definition (CRD) in this project. The Databricks API is quite rich, and cannot be modeled completely by this one object.

Desired state

A set of CRDs that can be used to model many or all operations that can be performed on the Databricks API.

Modeling the Databricks API

This thread is for discussion on which CRDs could be introduced in order to model the Databricks API 2.0.

For reference, the Databricks API is documented here.

Cluster controller contains logic that can potentially delete a cluster in use

Looking at the code for creation of clusters using the Databricks operator, there seems to be logic that could potentially delete a cluster: https://github.com/microsoft/azure-databricks-operator/blob/master/controllers/dcluster_controller_databricks.go#L32

This seems incorrect, although the business logic in the main controller appears to prevent this code from ever being hit, since !IsSubmitted is the only way it can be reached.

Suggest removing the code that deletes the cluster, as it seems extremely dangerous. The last thing we would want is for a cluster to be deleted while it is running jobs.

Add cluster name support for dbricks run

Currently, you can submit a run on an existing cluster by providing existing_cluster_id:

apiVersion: databricks.microsoft.com/v1alpha1
kind: Run
metadata:
  name: drun-twitteringest1
spec:
  # create a run directly without a job
  existing_cluster_id: 1021-013622-bused793

You should also be able to submit a run on an existing cluster by providing existing_cluster_name (a resolution sketch follows the manifest):

apiVersion: databricks.microsoft.com/v1alpha1
kind: Run
metadata:
  name: drun-twitteringest1
spec:
  # create a run directly without a job
  existing_cluster_name: dcluster-interactive2
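
One way this could be implemented is to resolve the name to an ID before submitting the run. The clusterInfo type and clusterLister interface below are hypothetical stand-ins for the Databricks SDK client, sketched for illustration only.

package databricks

import "fmt"

// clusterInfo and clusterLister are illustrative stand-ins for the SDK types.
type clusterInfo struct {
    ClusterID   string
    ClusterName string
}

type clusterLister interface {
    List() ([]clusterInfo, error)
}

// resolveClusterID maps an existing_cluster_name onto an existing_cluster_id
// by listing the workspace clusters and matching on name.
func resolveClusterID(lister clusterLister, name string) (string, error) {
    clusters, err := lister.List()
    if err != nil {
        return "", err
    }
    for _, c := range clusters {
        if c.ClusterName == name {
            return c.ClusterID, nil
        }
    }
    return "", fmt.Errorf("no cluster found with name %q", name)
}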

Investigate to resolve "error": "error when refreshing cluster: unexpected end of JSON input"

Investigate the error below; it pollutes our logs. It doesn't break the operator, and the expected request executes successfully. A possible defensive fix is sketched after the log.

2019-10-15T02:09:42.255Z    ERROR    controller-runtime.controller    Reconciler error    {"controller": "dcluster", "request": "dx/interactive-always-on-nopassthrough", "error": "error when refreshing cluster: unexpected end of JSON input"}
github.com/go-logr/zapr.(*zapLogger).Error
    /go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
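
The message "unexpected end of JSON input" is what encoding/json returns when asked to unmarshal an empty body, so one likely cause is the cluster refresh call occasionally returning no payload. A defensive sketch follows; the function name is illustrative, not the operator's actual code.

package databricks

import (
    "encoding/json"
    "fmt"
)

// parseClusterInfo guards against an empty response body before unmarshalling,
// turning the cryptic "unexpected end of JSON input" into a clearer error.
func parseClusterInfo(body []byte, out interface{}) error {
    if len(body) == 0 {
        return fmt.Errorf("empty response body from Databricks while refreshing cluster")
    }
    if err := json.Unmarshal(body, out); err != nil {
        return fmt.Errorf("error when refreshing cluster: %w", err)
    }
    return nil
}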

Add events for errors

After a quick glance at the controllers, it seems that they are adding events for resources on successful steps, which is great. For troubleshooting it would also be helpful to output events when there are error conditions (see the sketch below).
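
A minimal sketch of what that could look like in a reconciler, assuming the reconcilers keep (or gain) a record.EventRecorder as they already do for the success events shown elsewhere on this page:

// On failure, emit a Warning event next to the existing Normal events so the
// error shows up in `kubectl describe` as well as in the logs.
if err := r.submit(instance); err != nil {
    r.Recorder.Event(instance, corev1.EventTypeWarning, "Failed",
        fmt.Sprintf("Failed to submit object: %s", err.Error()))
    return ctrl.Result{}, err
}
r.Recorder.Event(instance, corev1.EventTypeNormal, "Submitted", "Object is submitted")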

Consolidate Python API into native Golang operator code

Currently, we have Python and Golang as 2 separate applications and therefore 2 containers. There is an overhead in maintaining 2 sets of development environments and in the communication between them via Swagger.

Work is in progress to re-implement the Python logic in Golang and eventually make everything one container.

Load testing the Operator - early output

With the work on #104 to add metrics into Prometheus, we've been able to combine this with a Locust load test, run some early tests against a mocked Databricks API, and graph the k8s API server, Locust and databricks-operator metrics to see how things scale.

It's very early days, but I'd be interested to know how much of this work you'd like to see contributed back to the project.


Shout out to the team for their work here @stuartleeks @EliiseS @martinpeck and @storey247

Delete SecretScope Api object doesn't delete SecretScope in databricks

If you delete the SecretScope API object, https://australiaeast.azuredatabricks.net/api/2.0/secrets/scopes/list still shows that SecretScope.

If you call https://australiaeast.azuredatabricks.net/api/2.0/secrets/list?scope=dsecretscope-twitters it returns an empty JSON object {} instead of:

{
    "error_code": "RESOURCE_DOES_NOT_EXIST",
    "message": "Scope xxxx does not exist!"
} 

Error on secretscope delete/update

Hi,

I've been trying to delete and update a SecretScope and ran into issues.
First, update:
RBAC is missing a permission to patch the existing events:

'events "***" is forbidden: User "system:serviceaccount:azure-databricks-operator-system:default" cannot patch resource "events" in API group "" in the namespace "****"' (will not retry!)

Second, delete:
After a kubectl delete secretscope there is a cascade of events and the operator pod goes into a CrashLoop.
Only deleting the pod brings it back (plus some manual cleanup).
To get rid of the secretscope in Kubernetes I have to delete the finalizer.

2019-10-01T01:07:41.687Z	ERROR	controller-runtime.controller	Reconciler error	{"controller": "secretscope", "request": "***/*********", "error": "error when submitting secret scope to the API: Response from server (400) {\"error_code\":\"RESOURCE_ALREADY_EXISTS\",\"message\":\"Scope ***** already exists!\"}"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
2019-10-01T01:07:42.687Z	INFO	controllers.SecretScope	Finish reconcile loop for dx/lbr-alarm-95986
E1001 01:07:42.687910       1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:76
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:65
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/panic.go:522
/usr/local/go/src/runtime/panic.go:82
/usr/local/go/src/runtime/signal_unix.go:390
/workspace/controllers/secretscope_controller_databricks.go:193
/workspace/controllers/secretscope_controller_finalizer.go:37
/workspace/controllers/secretscope_controller.go:66
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:216
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:1337
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1155031]

goroutine 320 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:58 +0x105
panic(0x12cf380, 0x215a8a0)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/microsoft/azure-databricks-operator/controllers.(*SecretScopeReconciler).delete(0xc000222240, 0xc006ba5680, 0x1, 0x1470e27)
	/workspace/controllers/secretscope_controller_databricks.go:193 +0x41
github.com/microsoft/azure-databricks-operator/controllers.(*SecretScopeReconciler).handleFinalizer(0xc000222240, 0xc006ba5680, 0xc00041a120, 0xc00626b170)
	/workspace/controllers/secretscope_controller_finalizer.go:37 +0x95
github.com/microsoft/azure-databricks-operator/controllers.(*SecretScopeReconciler).Reconcile(0xc000222240, 0xc00349455a, 0x2, 0xc003494540, 0xf, 0x216e900, 0x0, 0x0, 0x0)
	/workspace/controllers/secretscope_controller.go:66 +0x7af
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000206000, 0x131a4e0, 0xc004bbca40, 0x131a400)
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:216 +0x149
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000206000, 0xc000add300)
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192 +0xb5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc000206000)
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0005e90a0)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x54
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0005e90a0, 0x3b9aca00, 0x0, 0xc000000001, 0xc0004ae9c0)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0005e90a0, 0x3b9aca00, 0xc0004ae9c0)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:157 +0x311

Errors creating cluster

We see these errors reconciling clusters:

2019-11-24T23:19:58.371Z        ERROR   controller-runtime.controller   Reconciler error        {"controller": "dcluster", "request": "dx/interactive-always-on-nopassthrough", "error": "error when refreshing cluster: unexpected end of JSON input"}
github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88

Here is the YAML:

apiVersion: databricks.microsoft.com/v1alpha1
kind: Dcluster
spec:
  autotermination_minutes: 120
  cluster_name: interactive-always-on-nopassthrough
  custom_tags:
  - key: ResourceClass
    value: Serverless
  driver_node_type_id: Standard_D3_v2
  enable_elastic_disk: true
  init_scripts:
  - dbfs:
      destination: dbfs:/databricks/init_scripts/openssl_fix.sh
  - dbfs:
      destination: dbfs:/databricks/init_scripts/log_analytics/listeners.sh
  node_type_id: Standard_D3_v2
  num_workers: 5
  spark_conf:
    spark.databricks.cluster.profile: serverless
    spark.databricks.delta.preview.enabled: "true"
    spark.databricks.repl.allowedLanguages: sql,python
    spark.hadoop.hive.server2.enable.doAs: "false"
  spark_env_vars:
    PYSPARK_PYTHON: /databricks/python3/bin/python3
  spark_version: latest-stable-scala2.11
status:
  cluster_info:
    cluster_cores: "0"
    cluster_id: 1113-043420-odors235

generating random string consolidation

There's a helpers.go under api/alphav1 with funcs for generating random string characters, but there's also randomStringWithCharset in controllers/suit_test.go. It might be good to pull these out and put helpers.go in a folder accessible to all tests (see the sketch below).
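
A sketch of what a shared helper package could look like; the package path is hypothetical, and the function mirrors the common randomStringWithCharset pattern already used in the tests.

package testhelpers

import (
    "math/rand"
    "time"
)

const lowercaseAlphanumeric = "abcdefghijklmnopqrstuvwxyz0123456789"

var seededRand = rand.New(rand.NewSource(time.Now().UnixNano()))

// RandomStringWithCharset returns a random string of the given length drawn
// from charset, useful for generating unique resource names in tests.
func RandomStringWithCharset(length int, charset string) string {
    b := make([]byte, length)
    for i := range b {
        b[i] = charset[seededRand.Intn(len(charset))]
    }
    return string(b)
}

// RandomString returns a random lowercase alphanumeric string.
func RandomString(length int) string {
    return RandomStringWithCharset(length, lowercaseAlphanumeric)
}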

Add databricks golang sdk Mock

Currently, we have integration tests; as part of our tests we call the Databricks API and create resources.

We should be able to isolate the k8s controllers and Group API types and test them without calling the Databricks API (see the sketch below).
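
A minimal sketch of the idea: put a small interface between the controllers and the Databricks SDK so tests can inject a fake. The interface shown here is illustrative and much narrower than the real SDK surface.

// JobsAPI is the slice of the Databricks SDK the controllers actually need.
type JobsAPI interface {
    Create(jobSettings interface{}) (int64, error)
    Delete(jobID int64) error
}

// fakeJobsAPI records calls instead of hitting the Databricks API, so
// controller tests can run without a workspace.
type fakeJobsAPI struct {
    created []interface{}
    deleted []int64
}

func (f *fakeJobsAPI) Create(jobSettings interface{}) (int64, error) {
    f.created = append(f.created, jobSettings)
    return int64(len(f.created)), nil
}

func (f *fakeJobsAPI) Delete(jobID int64) error {
    f.deleted = append(f.deleted, jobID)
    return nil
}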

memory leak - oomkilled regularly

Even with a 1Gi memory limit, the manager container gets OOMKilled regularly:

manager:
    Container ID:  docker://62d43076d560f831bd01c41766444aa7d4795f5433b45ac88d8f1fcad2c423ef
    Image:         mcr.microsoft.com/k8s/azure-databricks/operator:7bb0a68096d6e32f78ebffca6c5c3f5e507eff8e
    Image ID:      docker-pullable://mcr.microsoft.com/k8s/azure-databricks/operator@sha256:dc10d4b0f23d9077ca2a55a65d1b1655a8da4750a5dc49da7b32d90533c033fa
    Port:          <none>
    Host Port:     <none>
    Command:
      /manager
    Args:
      --metrics-addr=127.0.0.1:8080
    State:          Running
      Started:      Thu, 17 Oct 2019 13:00:12 +1100
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Thu, 17 Oct 2019 09:53:25 +1100
      Finished:     Thu, 17 Oct 2019 13:00:11 +1100
    Ready:          True
    Restart Count:  15
    Limits:
      cpu:     500m
      memory:  1Gi
    Requests:
      cpu:     200m
      memory:  512Mi

Run sits in invalid state without Status or RunId if call to RunsGetOutput fails

There are several lifecycle states for a run: https://docs.databricks.com/dev-tools/api/latest/jobs.html#runlifecyclestate.

Terminal states are TERMINATED, SKIPPED and INTERNAL_ERROR. The other states are PENDING, RUNNING and TERMINATING. But in certain circumstances I see that a Run (which I can see inside Databricks using the web UI) is still showing as State: <blank> and RunId: <blank>.

Now, the issue:
This condition works most of the time, but sometimes it doesn't. When the issue happens, the logs show the following error:

2019-11-22T10:03:39.781Z        INFO    controllers.Run Refreshing run test-run
2019-11-22T10:03:49.781Z        ERROR   controller-runtime.controller   Reconciler error        {"controller": "run", "request": "kubeflow/test-run", "error": "error when refreshing run: Get https://westeurope.azuredatabricks.net/api/2.0/jobs/runs/get-output?run_id=18: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"}
github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88

So I guess there is a bug in the operator: it is not setting life_cycle_state to a valid value when it gets that exception, and the step that creates the run is not updating the K8s status correctly in the error state.

Notebook doesn't know the name of the secret scope

The operator creates Databricks secret scopes named after instance.ObjectMeta.Name with "_scope" appended.

However our notebooks don't know the value of instance.ObjectMeta.Name and hence cannot construct the secret scope name.

Creating K8s SecretScope object may fail if the Databricks SecretScope exists

Setting up a SecretScope may fail even if the Kubernetes secret referenced in it exists.
An empty scope will be created in Databricks, but with no content.
On Kubernetes the SecretScope has no status:

status: {}
As opposed to:

status:
  secretscope:
    backend_type: DATABRICKS
    name: lbr-device-95986
The operator will then try to set it up again, but since there is already an empty scope in the remote Databricks workspace it will fail, every time.

2019-10-16T03:35:06.786Z    ERROR    controller-runtime.controller    Reconciler error    {"controller": "secretscope", "request": "dx/lbr-alarm-95986", "error": "error when submitting secret scope to the API: Response from server (400) {\"error_code\":\"RESOURCE_ALREADY_EXISTS\",\"message\":\"Scope lbr-alarm-95986 already exists!\"}"}
github.com/go-logr/zapr.(*zapLogger).Error
    /go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88

I don't have logs of the initial failure.

I tried to just delete the scope in Databricks but somehow nothing happened; the operator didn't try to set it up again.

In that scenario, we have to manually delete the Run, Djob and SecretScope and reinstall them.
Another issue comes up when we try to delete a SecretScope that was unsuccessful: the operator crashes.
We need to manually delete the SecretScope (edit it and remove the finalizer).

2019-10-16T03:44:02.406Z    INFO    controllers.SecretScope    Starting reconcile loop for dx/lbr-alarm-95986
2019-10-16T03:44:02.406Z    INFO    controllers.SecretScope    Finish reconcile loop for dx/lbr-alarm-95986
E1016 03:44:02.406550       1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:76
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:65
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/panic.go:522
/usr/local/go/src/runtime/panic.go:82
/usr/local/go/src/runtime/signal_unix.go:390
/workspace/controllers/secretscope_controller_databricks.go:193
/workspace/controllers/secretscope_controller_finalizer.go:37
/workspace/controllers/secretscope_controller.go:66
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:216
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:1337
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11554d1]
goroutine 307 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:58 +0x105
panic(0x12cf5c0, 0x215a8a0)
    /usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/microsoft/azure-databricks-operator/controllers.(*SecretScopeReconciler).delete(0xc0002ea360, 0xc00311a000, 0x1, 0x1471067)
    /workspace/controllers/secretscope_controller_databricks.go:193 +0x41
github.com/microsoft/azure-databricks-operator/controllers.(*SecretScopeReconciler).handleFinalizer(0xc0002ea360, 0xc00311a000, 0xc00000dba0, 0xc0022fc750)
    /workspace/controllers/secretscope_controller_finalizer.go:37 +0x95
github.com/microsoft/azure-databricks-operator/controllers.(*SecretScopeReconciler).Reconcile(0xc0002ea360, 0xc0015d822a, 0x2, 0xc0015d8200, 0xf, 0x216e900, 0x0, 0x0, 0x0)
    /workspace/controllers/secretscope_controller.go:66 +0x7af
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000378be0, 0x131a720, 0xc001efef60, 0x131a700)
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:216 +0x149
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000378be0, 0xc000276a00)
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192 +0xb5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc000378be0)
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00008bee0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152 +0x54
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00008bee0, 0x3b9aca00, 0x0, 0x1, 0xc0000881e0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc00008bee0, 0x3b9aca00, 0xc0000881e0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
    /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:157 +0x311
rpc error: code = Unknown desc = Error: No such container: 0bc65f23c711e08b3f871e3081aba9a0f10f7398e8ca37d5af233b41a1527a9f

dcluster doesn't show the STATE and NUMWORKERS

How to replicate:
Use the YAML file below.

---
apiVersion: databricks.microsoft.com/v1alpha1
kind: Dcluster
metadata:
  name: interactive-cluster-1
spec:
  spark_version: latest-stable-scala2.11
  node_type_id: Standard_D3_v2
  autoscale:
    min_workers: 1
    max_workers: 6
  driver_node_type_id: Standard_D3_v2
  custom_tags:
  - key: Tag
    value: CustomTag1
  spark_env_vars:
    PYSPARK_PYTHON: /databricks/python3/bin/python3
  enable_elastic_disk: true

Then run kubectl apply -f docs/samples/2_job_rub/interactive-cluster1.yaml, followed by kubectl get dcluster:

kubectl get dcluster
NAME                    AGE    CLUSTERID             STATE   NUMWORKERS
interactive-cluster-2   110m   1018-033048-slew898

Applying ACL on non Premium Databricks

If you're running a Databricks instance which is not on the Premium tier, ACLs are not available.

Regardless of whether your config has acls set or not, the operator will still try to list all ACLs. Listing ACLs will return Error: {"error_code":"PERMISSION_DENIED","message":"ACL is not supported in your workspace."} if you are not on the Premium tier.

If ACLs are not available, the config will fail and be put back onto the reconcile loop. It will try to create the secret scope again and, because it already exists, fail and be put back on the loop.

Instead what should happen is:

  • If acls is not set in the config, don't call submitACLs.
  • If acls is set in the config and the workspace is not on the Premium tier, don't put the job back on the reconcile loop; log an event instead (see the sketch below).
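
A sketch of that logic in the secret scope reconcile path; the spec field name and the string-based error check are assumptions for illustration only.

// Only submit ACLs when the spec actually defines some, and treat a
// PERMISSION_DENIED response (non-Premium workspace) as terminal rather than
// requeueing forever.
if len(instance.Spec.SecretScopeACLs) > 0 {
    if err := r.submitACLs(instance); err != nil {
        if strings.Contains(err.Error(), "PERMISSION_DENIED") {
            r.Recorder.Event(instance, corev1.EventTypeWarning, "ACLNotSupported", err.Error())
            return nil // do not requeue
        }
        return err
    }
}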

Run CRD reports incorrect state information when Run fails in DataBricks

  • Create a job & run using sample databricks_v1alpha1_djob.yaml
  • The job will not work as the jar file specified in the spec is invalid.
  • Try and create a Run object for this job using the databricks_v1alpha1_run_job.yaml
  • The run will fail because the jar file is invalid, but the Operator throws an exception and never updates its state.
  • Log shows the following error:
2019-11-12T14:36:56.612Z        ERROR   controller-runtime.controller   Reconciler error        {"controller": "run", "request": "default/run-sample", "error": "error when refreshing run: Run result unavailable: job failed with error message\n Library installation failed for library jar: \"dbfs:/my-jar.jar\"\n. Error messages:\njava.lang.Throwable: java.io.FileNotFoundException: dbfs:/my-jar.jar"}
github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
  • This error then proceeds to keep recurring in the logs

Expected behaviour:

  • No error reported
  • Correct status reported

Randomise Names in Tests

To prevent tests from failing because of leftover, undeleted deployments in the cluster, it would be good if we could make the names of our deployments random (see the sketch below).
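
A small sketch of how a test could do this, assuming a random-string helper like the one discussed in the consolidation issue above; the type alias and helper name are illustrative.

// Give each test object a random suffix so leftovers from a previous run
// cannot collide with the next one.
jobName := fmt.Sprintf("t-djob-%s", randomStringWithCharset(8, "abcdefghijklmnopqrstuvwxyz0123456789"))
instance := &databricksv1alpha1.Djob{
    ObjectMeta: metav1.ObjectMeta{
        Name:      jobName,
        Namespace: "default",
    },
}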

r.Update(context.Background(), instance) throws exception for newly submitted job

Reconciler error {"controller": "notebookjob", "request": "default/sample1run16", "error": "error when refreshing job to API: error when updating NotebookJob: Operation cannot be fulfilled on notebookjobs.databricks.microsoft.com "sample1run16": the object has been modified; please apply your changes to the latest version and try again"}
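
This is the standard optimistic-concurrency conflict from the API server. One common way to handle it is to wrap the update in retry.RetryOnConflict from k8s.io/client-go/util/retry; a sketch follows, with the refreshed status field being hypothetical.

// Re-read the latest version of the object and retry the update on conflict,
// instead of surfacing "the object has been modified" as a reconcile error.
err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
    if err := r.Get(context.Background(), req.NamespacedName, instance); err != nil {
        return err
    }
    instance.Status.Run = run // hypothetical field being refreshed
    return r.Update(context.Background(), instance)
})
if err != nil {
    return ctrl.Result{}, err
}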

Controllers do not report upstream dependency metrics

The operator calls a number of upstream dependencies via the Databricks SDK. Unfortunately, at present there is no way of finding out how many calls happen or how long they take.

Thinking about this from a performance standpoint, it would be nice to have instrumentation on the hot path of the code for commonly used entities such as job, cluster and run, with a framework that allows easy extension to the other operations.

Metrics should most likely be exposed via the standard K8s metrics tooling, Prometheus (see the sketch below).
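
A sketch of such a framework: a histogram registered with controller-runtime's Prometheus registry and a small wrapper around each SDK call. The metric and label names are made up for illustration.

package instrumentation

import (
    "time"

    "github.com/prometheus/client_golang/prometheus"
    ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var databricksRequestDuration = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "databricks_request_duration_seconds",
        Help:    "Duration of calls to the Databricks API, by entity and action.",
        Buckets: prometheus.DefBuckets,
    },
    []string{"entity", "action"},
)

func init() {
    // controller-runtime serves everything in this registry on /metrics.
    ctrlmetrics.Registry.MustRegister(databricksRequestDuration)
}

// TrackDuration wraps a Databricks SDK call and records how long it took.
func TrackDuration(entity, action string, call func() error) error {
    start := time.Now()
    err := call()
    databricksRequestDuration.WithLabelValues(entity, action).Observe(time.Since(start).Seconds())
    return err
}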

Failed to create Dcluster object

I've just installed v0.30 and attempted to create a DCluster using the config/samples yaml.

Kubernetes version 1.13.10

I get the following error in the operator logs:

2019-10-14T17:55:03.104Z        INFO    controllers.Dcluster    Starting reconcile loop for kubeflow/dcluster-sample
2019-10-14T17:55:03.104Z        INFO    controllers.Dcluster    AddFinalizer for kubeflow/dcluster-sample
2019-10-14T17:55:03.128Z        INFO    controllers.Dcluster    Finish reconcile loop for kubeflow/dcluster-sample
2019-10-14T17:55:03.128Z        DEBUG   controller-runtime.controller   Successfully Reconciled {"controller": "dcluster", "request": "kubeflow/dcluster-sample"}
2019-10-14T17:55:03.128Z        INFO    controllers.Dcluster    Starting reconcile loop for kubeflow/dcluster-sample
2019-10-14T17:55:03.128Z        INFO    controllers.Dcluster    Submit for kubeflow/dcluster-sample
2019-10-14T17:55:03.128Z        INFO    controllers.Dcluster    Create cluster dcluster-sample
2019-10-14T17:55:03.128Z        DEBUG   controller-runtime.manager.events       Normal  {"object": {"kind":"Dcluster","namespace":"kubeflow","name":"dcluster-sample","uid":"bf7ba102-eeab-11e9-a0ba-1e18e514b3df","apiVersion":"databricks.microsoft.com/v1alpha1","resourceVersion":"8427"}, "reason": "Added", "message": "Object finalizer is added"}
2019-10-14T17:55:10.006Z        INFO    controllers.Dcluster    Finish reconcile loop for kubeflow/dcluster-sample
2019-10-14T17:55:10.006Z        DEBUG   controller-runtime.controller   Successfully Reconciled {"controller": "dcluster", "request": "kubeflow/dcluster-sample"}
2019-10-14T17:55:10.006Z        INFO    controllers.Dcluster    Starting reconcile loop for kubeflow/dcluster-sample
2019-10-14T17:55:10.007Z        INFO    controllers.Dcluster    Refresh for kubeflow/dcluster-sample
2019-10-14T17:55:10.007Z        INFO    controllers.Dcluster    Refresh cluster dcluster-sample
2019-10-14T17:55:10.007Z        DEBUG   controller-runtime.manager.events       Normal  {"object": {"kind":"Dcluster","namespace":"kubeflow","name":"dcluster-sample","uid":"bf7ba102-eeab-11e9-a0ba-1e18e514b3df","apiVersion":"databricks.microsoft.com/v1alpha1","resourceVersion":"8443"}, "reason": "Submitted", "message": "Object is submitted"}
2019-10-14T17:55:10.704Z        INFO    controllers.Dcluster    Finish reconcile loop for kubeflow/dcluster-sample
2019-10-14T17:55:10.704Z        ERROR   controller-runtime.controller   Reconciler error        {"controller": "dcluster", "request": "kubeflow/dcluster-sample", "error": "error when refreshing cluster: unexpected end of JSON input"}
github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
$ k get dclusters.databricks.microsoft.com
NAME              AGE   CLUSTERID              STATE   NUMWORKERS
dcluster-sample   2m    1014-175509-erred163

$ k describe dclusters.databricks.microsoft.com  dcluster-sample
Name:         dcluster-sample
Namespace:    kubeflow
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"databricks.microsoft.com/v1alpha1","kind":"Dcluster","metadata":{"annotations":{},"name":"dcluster-sample","namespace":"kub...
API Version:  databricks.microsoft.com/v1alpha1
Kind:         Dcluster
Metadata:
  Creation Timestamp:  2019-10-14T17:55:03Z
  Finalizers:
    dcluster.finalizers.databricks.microsoft.com
  Generation:        2
  Resource Version:  8443
  Self Link:         /apis/databricks.microsoft.com/v1alpha1/namespaces/kubeflow/dclusters/dcluster-sample
  UID:               bf7ba102-eeab-11e9-a0ba-1e18e514b3df
Spec:
  Autoscale:
    max_workers:  5
    min_workers:  2
  cluster_name:   dcluster-sample
  node_type_id:   Standard_D3_v2
  spark_version:  5.3.x-scala2.11
Status:
  cluster_info:
    cluster_cores:  0
    cluster_id:     1014-175509-erred163
Events:
  Type    Reason     Age    From                 Message
  ----    ------     ----   ----                 -------
  Normal  Added      3m40s  dcluster-controller  Object finalizer is added
  Normal  Submitted  3m33s  dcluster-controller  Object is submitted
