Hello,
I deployed an Ubuntu 18.04 VM with 8 GB of RAM and 40 GB of disk space. On it I installed the Juju snap, microk8s, and the latest bundle-kubeflow. Several pods never come online and keep failing with various errors; it looks like a recurring issue.
ubuntu@ai-demo:~$ m get pods -n kubeflow
NAME READY STATUS RESTARTS AGE
ambassador-79574bd65b-654hx 1/1 Running 0 57m
ambassador-operator-0 1/1 Running 0 58m
argo-controller-64cc95f77b-nl6s2 1/1 Running 0 50m
argo-controller-operator-0 1/1 Running 0 58m
argo-ui-84d8b568d8-krm7n 1/1 Running 0 57m
argo-ui-operator-0 1/1 Running 0 57m
jupyter-controller-7bccb55f46-sjl97 1/1 Running 0 57m
jupyter-controller-operator-0 1/1 Running 0 57m
jupyter-web-9dc84c45b-mx9fr 1/1 Running 0 57m
jupyter-web-operator-0 1/1 Running 0 57m
katib-controller-56dd5bf95b-s45s5 1/1 Running 0 55m
katib-controller-operator-0 1/1 Running 0 57m
katib-db-0 1/1 Running 1 56m
katib-db-operator-0 1/1 Running 0 57m
katib-manager-5d6cc65b8c-vmhjc 0/1 CrashLoopBackOff 13 48m
katib-manager-operator-0 1/1 Running 0 57m
katib-ui-76974795f9-7r85z 1/1 Running 0 56m
katib-ui-operator-0 1/1 Running 0 57m
kubeflow-dashboard-757c877956-jclqd 1/1 Running 0 50m
kubeflow-dashboard-operator-0 1/1 Running 0 57m
kubeflow-gatekeeper-6f9fcf8c55-gcdfw 1/1 Running 0 54m
kubeflow-gatekeeper-operator-0 1/1 Running 0 57m
kubeflow-login-97d55d69f-9vzhg 1/1 Running 0 55m
kubeflow-login-operator-0 1/1 Running 0 56m
kubeflow-profiles-57fd5c6d78-6fzqm 2/2 Running 0 54m
kubeflow-profiles-operator-0 1/1 Running 0 56m
metacontroller-5ccc9b744d-sw49n 1/1 Running 0 54m
metacontroller-operator-0 1/1 Running 0 56m
metadata-controller-7f94875696-s24tr 0/1 CrashLoopBackOff 5 47m
metadata-controller-operator-0 1/1 Running 0 56m
metadata-db-0 1/1 Running 1 54m
metadata-db-operator-0 1/1 Running 0 56m
metadata-ui-58bdd9b6bc-ntzjk 1/1 Running 0 50m
metadata-ui-operator-0 1/1 Running 0 56m
minio-0 1/1 Running 0 54m
minio-operator-0 1/1 Running 0 55m
modeldb-backend-797f77c488-9vrkz 1/2 Error 5 44m
modeldb-backend-operator-0 1/1 Running 0 55m
modeldb-db-0 1/1 Running 1 54m
modeldb-db-operator-0 1/1 Running 0 55m
modeldb-store-fbf49bdf8-7mlqh 1/1 Running 0 54m
modeldb-store-operator-0 1/1 Running 0 55m
modeldb-ui-78b6dd66b8-zwpdf 1/1 Running 0 45m
modeldb-ui-operator-0 1/1 Running 0 55m
pipelines-api-6c6f459c98-q2grc 0/1 CrashLoopBackOff 9 44m
pipelines-api-operator-0 1/1 Running 0 54m
pipelines-db-0 1/1 Running 1 53m
pipelines-db-operator-0 1/1 Running 0 53m
pipelines-persistence-664c75f577-95x7d 0/1 CrashLoopBackOff 7 48m
pipelines-persistence-operator-0 1/1 Running 0 53m
pipelines-scheduledworkflow-79cc64c7c4-n4nwm 0/1 Init:0/1 0 50m
pipelines-scheduledworkflow-operator-0 1/1 Running 0 53m
pipelines-ui-867bd9ccf4-7fkwv 1/1 Running 0 44m
pipelines-ui-operator-0 1/1 Running 0 53m
pipelines-viewer-74c4f8bcd-rzxml 1/1 Running 0 50m
pipelines-viewer-operator-0 1/1 Running 0 52m
pytorch-operator-79b6bf8d4c-bz2bt 1/1 Running 0 49m
pytorch-operator-operator-0 1/1 Running 0 52m
redis-5b9c9c4b45-ctcc7 1/1 Running 0 48m
redis-operator-0 1/1 Running 0 52m
seldon-api-frontend-74cbb778cc-t4w4g 1/1 Running 0 45m
seldon-api-frontend-operator-0 1/1 Running 0 52m
seldon-cluster-manager-747565f949-47k9z 1/1 Running 1 45m
seldon-cluster-manager-operator-0 1/1 Running 0 52m
tensorboard-85b6fc699f-vj9fg 1/1 Running 0 49m
tensorboard-operator-0 1/1 Running 0 52m
tf-job-dashboard-9b5b659bb-x8cqm 1/1 Running 0 49m
tf-job-dashboard-operator-0 1/1 Running 0 52m
tf-job-operator-6bc8cb454c-gwxls 1/1 Running 0 48m
tf-job-operator-operator-0 1/1 Running 0 52m
The first problematic pod is katib-manager. It looks like a readiness issue: the port isn't responding.
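For completeness, these are the commands I'd use to dig further (the pod name comes from the listing above, and `/bin/grpc_health_probe -addr=:6789` is exactly what the liveness/readiness probes run; `m` is my kubectl alias). This is an unverified diagnostic sketch, not something from the failing run:

```shell
# Logs from the previously crashed instance of the container
m logs katib-manager-5d6cc65b8c-vmhjc -n kubeflow --previous

# Run the same gRPC health check the kubelet runs, from inside the
# container (only works during the brief window while it is up)
m exec -it katib-manager-5d6cc65b8c-vmhjc -n kubeflow -- /bin/grpc_health_probe -addr=:6789
```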
ubuntu@ai-demo:~/bundle-kubeflow$ m describe pod katib-manager-5d6cc65b8c-vmhjc -n kubeflow
Name: katib-manager-5d6cc65b8c-vmhjc
Namespace: kubeflow
Priority: 0
Node: ai-demo/10.180.213.139
Start Time: Tue, 03 Dec 2019 16:29:50 -0600
Labels: juju-app=katib-manager
pod-template-hash=5d6cc65b8c
Annotations: apparmor.security.beta.kubernetes.io/pod: runtime/default
juju.io/controller: e1a8ad96-a251-47a5-8c33-3b8a8fe66bbd
juju.io/model: b0ee3e19-b70e-49a4-81c4-fb9e4242b426
juju.io/unit: katib-manager/0
seccomp.security.beta.kubernetes.io/pod: docker/default
Status: Running
IP: 10.1.44.71
IPs:
IP: 10.1.44.71
Controlled By: ReplicaSet/katib-manager-5d6cc65b8c
Init Containers:
juju-pod-init:
Container ID: containerd://5f015946a8c76d221fba8ac787b67f2dac9581f9f79105469118ed2403a22c1b
Image: jujusolutions/jujud-operator:2.7.0
Image ID: docker.io/jujusolutions/jujud-operator@sha256:375eee66a4a7af6128cb84c32a94a1abeffa4f4872e063ba935296701776b5e5
Port: <none>
Host Port: <none>
Command:
/bin/sh
Args:
-c
export JUJU_DATA_DIR=/var/lib/juju
export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools
mkdir -p $JUJU_TOOLS_DIR
cp /opt/jujud $JUJU_TOOLS_DIR/jujud
initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
if test -n "$initCmd"; then
$JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
else
exit 0
fi
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 03 Dec 2019 16:30:15 -0600
Finished: Tue, 03 Dec 2019 16:31:38 -0600
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
Containers:
katib-manager:
Container ID: containerd://0e46f446f24c1c64d109308fd4b64fcbd348462993ceb8493be4bc7c4d2ca1af
Image: registry.jujucharms.com/kubeflow-charmers/katib-manager/oci-image@sha256:28dddef61f71a8e8de0999c67ec60c38d2c1a91d6e24b96ec1e5ba4401add07e
Image ID: registry.jujucharms.com/kubeflow-charmers/katib-manager/oci-image@sha256:28dddef61f71a8e8de0999c67ec60c38d2c1a91d6e24b96ec1e5ba4401add07e
Port: 6789/TCP
Host Port: 0/TCP
Command:
./katib-manager
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Tue, 03 Dec 2019 17:20:48 -0600
Finished: Tue, 03 Dec 2019 17:21:27 -0600
Ready: False
Restart Count: 15
Liveness: exec [/bin/grpc_health_probe -addr=:6789] delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [/bin/grpc_health_probe -addr=:6789] delay=5s timeout=1s period=60s #success=1 #failure=5
Environment:
DB_NAME: mysql
DB_PASSWORD: TW6VQALVDQ41NSQLF4EMG32D0F568T
MYSQL_HOST: 10.152.183.67
MYSQL_PORT: 3306
Mounts:
/usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
juju-data-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
default-token-j9wx5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-j9wx5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned kubeflow/katib-manager-5d6cc65b8c-vmhjc to ai-demo
Normal Pulled 54m kubelet, ai-demo Container image "jujusolutions/jujud-operator:2.7.0" already present on machine
Normal Created 54m kubelet, ai-demo Created container juju-pod-init
Normal Started 53m kubelet, ai-demo Started container juju-pod-init
Normal Pulling 52m kubelet, ai-demo Pulling image "registry.jujucharms.com/kubeflow-charmers/katib-manager/oci-image@sha256:28dddef61f71a8e8de0999c67ec60c38d2c1a91d6e24b96ec1e5ba4401add07e"
Normal Pulled 38m kubelet, ai-demo Successfully pulled image "registry.jujucharms.com/kubeflow-charmers/katib-manager/oci-image@sha256:28dddef61f71a8e8de0999c67ec60c38d2c1a91d6e24b96ec1e5ba4401add07e"
Warning Unhealthy 36m (x2 over 37m) kubelet, ai-demo Readiness probe failed: timeout: failed to connect service ":6789" within 1s
Normal Created 36m (x3 over 38m) kubelet, ai-demo Created container katib-manager
Normal Started 36m (x3 over 38m) kubelet, ai-demo Started container katib-manager
Warning Unhealthy 34m (x17 over 37m) kubelet, ai-demo Liveness probe failed: timeout: failed to connect service ":6789" within 1s
Warning BackOff 13m (x78 over 33m) kubelet, ai-demo Back-off restarting failed container
Normal Killing 9m1s (x14 over 37m) kubelet, ai-demo Container katib-manager failed liveness probe, will be restarted
Normal Pulled 3m58s (x14 over 37m) kubelet, ai-demo Container image "registry.jujucharms.com/kubeflow-charmers/katib-manager/oci-image@sha256:28dddef61f71a8e8de0999c67ec60c38d2c1a91d6e24b96ec1e5ba4401add07e" already present on machine
The second is the metadata-controller pod, which fails with an error when connecting to the MariaDB server:
ubuntu@ai-demo:~/bundle-kubeflow$ m describe pod metadata-controller-7f94875696-s24tr -n kubeflow
Name: metadata-controller-7f94875696-s24tr
Namespace: kubeflow
Priority: 0
Node: ai-demo/10.180.213.139
Start Time: Tue, 03 Dec 2019 16:30:26 -0600
Labels: juju-app=metadata-controller
pod-template-hash=7f94875696
Annotations: apparmor.security.beta.kubernetes.io/pod: runtime/default
juju.io/controller: e1a8ad96-a251-47a5-8c33-3b8a8fe66bbd
juju.io/model: b0ee3e19-b70e-49a4-81c4-fb9e4242b426
juju.io/unit: metadata-controller/0
seccomp.security.beta.kubernetes.io/pod: docker/default
Status: Running
IP: 10.1.44.72
IPs:
IP: 10.1.44.72
Controlled By: ReplicaSet/metadata-controller-7f94875696
Init Containers:
juju-pod-init:
Container ID: containerd://13c5bfd0b5ce2d9d45b281a70f11aa5eb5c99aabe4a6360c629ac901de7861ef
Image: jujusolutions/jujud-operator:2.7.0
Image ID: docker.io/jujusolutions/jujud-operator@sha256:375eee66a4a7af6128cb84c32a94a1abeffa4f4872e063ba935296701776b5e5
Port: <none>
Host Port: <none>
Command:
/bin/sh
Args:
-c
export JUJU_DATA_DIR=/var/lib/juju
export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools
mkdir -p $JUJU_TOOLS_DIR
cp /opt/jujud $JUJU_TOOLS_DIR/jujud
initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
if test -n "$initCmd"; then
$JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
else
exit 0
fi
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 03 Dec 2019 16:30:44 -0600
Finished: Tue, 03 Dec 2019 16:32:55 -0600
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
Containers:
metadata:
Container ID: containerd://a2fc9241a7dae1bf22bb3285e71541eca020c58fec928c8aa24a39073e07635e
Image: registry.jujucharms.com/kubeflow-charmers/metadata-controller/oci-image@sha256:f2a0756e9c41f10cbd178e420e37ef0aaa5d60bbed34300a66b1c99745838d36
Image ID: registry.jujucharms.com/kubeflow-charmers/metadata-controller/oci-image@sha256:f2a0756e9c41f10cbd178e420e37ef0aaa5d60bbed34300a66b1c99745838d36
Port: 8080/TCP
Host Port: 0/TCP
Command:
./server/server
--http_port=8080
--mysql_service_host=10.152.183.188
--mysql_service_port=3306
--mysql_service_user=root
--mysql_service_password=root
--mlmd_db_name=metadb
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 04 Dec 2019 08:51:20 -0600
Finished: Wed, 04 Dec 2019 08:51:20 -0600
Ready: False
Restart Count: 188
Environment:
MYSQL_ROOT_PASSWORD: root
Mounts:
/usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
juju-data-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
default-token-j9wx5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-j9wx5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 21m (x184 over 15h) kubelet, ai-demo Container image "registry.jujucharms.com/kubeflow-charmers/metadata-controller/oci-image@sha256:f2a0756e9c41f10cbd178e420e37ef0aaa5d60bbed34300a66b1c99745838d36" already present on machine
Warning BackOff 85s (x4287 over 15h) kubelet, ai-demo Back-off restarting failed container
ubuntu@ai-demo:~/bundle-kubeflow$
ubuntu@ai-demo:~/bundle-kubeflow$ m logs metadata-controller-7f94875696-s24tr -n kubeflow
F1204 14:51:20.776472 1 main.go:90] Failed to create ML Metadata Store with config mysql:<host:"10.152.183.188" port:3306 database:"metadb" user:"root" password:"root" > : mysql_real_connect failed: errno: 1130, error: Host '10.1.44.1' is not allowed to connect to this MariaDB server.
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc000135100, 0xc0001b8000, 0x124, 0x17a)
external/com_github_golang_glog/glog.go:769 +0xb1
github.com/golang/glog.(*loggingT).output(0x1633360, 0xc000000003, 0xc0001af2d0, 0x14eadd3, 0x7, 0x5a, 0x0)
external/com_github_golang_glog/glog.go:720 +0x2f6
github.com/golang/glog.(*loggingT).printf(0x1633360, 0x3, 0xf6fee1, 0x37, 0xc00019be30, 0x2, 0x2)
external/com_github_golang_glog/glog.go:655 +0x14e
github.com/golang/glog.Fatalf(...)
external/com_github_golang_glog/glog.go:1148
main.mlmdStoreOrDie(0x0)
server/main.go:90 +0x1c3
main.main()
server/main.go:101 +0xe0
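MariaDB errno 1130 usually means the grant tables have no entry for the connecting host (here `10.1.44.1`, which looks like the pod-network gateway). A sketch of the kind of grant I'd expect to need, run inside the metadata-db pod; this is a hypothetical, unverified workaround (the `10.1.44.%` wildcard is my guess at the pod subnet, and the `root`/`root` credentials are the ones shown in the pod spec above):

```shell
# Hypothetical fix sketch: allow root to connect from the pod network.
# Assumes the metadata-db image ships the mysql client.
m exec -it metadata-db-0 -n kubeflow -- mysql -uroot -proot -e \
  "GRANT ALL PRIVILEGES ON metadb.* TO 'root'@'10.1.44.%' IDENTIFIED BY 'root'; FLUSH PRIVILEGES;"
```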
The third is modeldb-backend:
ubuntu@ai-demo:~/bundle-kubeflow$ m describe pod modeldb-backend-797f77c488-9vrkz -n kubeflow
Name: modeldb-backend-797f77c488-9vrkz
Namespace: kubeflow
Priority: 0
Node: ai-demo/10.180.213.139
Start Time: Tue, 03 Dec 2019 16:33:37 -0600
Labels: juju-app=modeldb-backend
pod-template-hash=797f77c488
Annotations: apparmor.security.beta.kubernetes.io/pod: runtime/default
juju.io/controller: e1a8ad96-a251-47a5-8c33-3b8a8fe66bbd
juju.io/model: b0ee3e19-b70e-49a4-81c4-fb9e4242b426
juju.io/unit: modeldb-backend/0
seccomp.security.beta.kubernetes.io/pod: docker/default
Status: Running
IP: 10.1.44.77
IPs:
IP: 10.1.44.77
Controlled By: ReplicaSet/modeldb-backend-797f77c488
Init Containers:
juju-pod-init:
Container ID: containerd://548ecc057d06bcf349b64d920e905676696137fd44bcc43cc152ead0c69f1f16
Image: jujusolutions/jujud-operator:2.7.0
Image ID: docker.io/jujusolutions/jujud-operator@sha256:375eee66a4a7af6128cb84c32a94a1abeffa4f4872e063ba935296701776b5e5
Port: <none>
Host Port: <none>
Command:
/bin/sh
Args:
-c
export JUJU_DATA_DIR=/var/lib/juju
export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools
mkdir -p $JUJU_TOOLS_DIR
cp /opt/jujud $JUJU_TOOLS_DIR/jujud
initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
if test -n "$initCmd"; then
$JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
else
exit 0
fi
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 03 Dec 2019 16:33:40 -0600
Finished: Tue, 03 Dec 2019 16:34:42 -0600
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
Containers:
modeldb-backend:
Container ID: containerd://3a4251c4c8357c53159935efd7346a982500c6e4c05596cd28f65956687d7e99
Image: registry.jujucharms.com/kubeflow-charmers/modeldb-backend/oci-image@sha256:67e70b991598fe8fca12058e2cee1abc342ab26a0047ec4779cb6d8483d87161
Image ID: registry.jujucharms.com/kubeflow-charmers/modeldb-backend/oci-image@sha256:67e70b991598fe8fca12058e2cee1abc342ab26a0047ec4779cb6d8483d87161
Port: 8085/TCP
Host Port: 0/TCP
Command:
bash
Args:
-c
./wait-for-it.sh 10.152.183.242:3306 --timeout=10 && java -jar modeldb-1.0-SNAPSHOT-client-build.jar
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 04 Dec 2019 08:52:30 -0600
Finished: Wed, 04 Dec 2019 08:52:36 -0600
Ready: False
Restart Count: 182
Environment:
VERTA_MODELDB_CONFIG: /config-backend/config.yaml
Mounts:
/config-backend/ from modeldb-backend-config-config (rw)
/usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
modeldb-backend-proxy:
Container ID: containerd://bbe3449a4f54917085882a13b587f6446d9fe639a917b2639b352a750f3664af
Image: vertaaiofficial/modeldb-backend-proxy:kubeflow
Image ID: docker.io/vertaaiofficial/modeldb-backend-proxy@sha256:5e21c2f82df9b05f7309772dd2be946a8ef24ba43bd2579aa6af22c4827c9205
Port: 8080/TCP
Host Port: 0/TCP
Command:
/go/bin/proxy
Args:
-project_endpoint
localhost:8085
-experiment_endpoint
localhost:8085
-experiment_run_endpoint
localhost:8085
State: Running
Started: Tue, 03 Dec 2019 17:14:40 -0600
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
juju-data-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
modeldb-backend-config-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: modeldb-backend-config-config
Optional: false
default-token-j9wx5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-j9wx5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m42s (x4143 over 15h) kubelet, ai-demo Back-off restarting failed container
ubuntu@ai-demo:~/bundle-kubeflow$ m logs modeldb-backend-797f77c488-9vrkz -n kubeflow
Error from server (BadRequest): a container name must be specified for pod modeldb-backend-797f77c488-9vrkz, choose one of: [modeldb-backend modeldb-backend-proxy] or one of the init containers: [juju-pod-init]
ubuntu@ai-demo:~/bundle-kubeflow$
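Since that pod runs two containers, the logs have to be requested per container with `-c`, e.g.:

```shell
# Pick the crashing container explicitly (names listed in the error above)
m logs modeldb-backend-797f77c488-9vrkz -n kubeflow -c modeldb-backend
```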
And the last one is the pipelines-api pod:
ubuntu@ai-demo:~/bundle-kubeflow$ m describe pod pipelines-api-6c6f459c98-q2grc -n kubeflow
Name: pipelines-api-6c6f459c98-q2grc
Namespace: kubeflow
Priority: 0
Node: ai-demo/10.180.213.139
Start Time: Tue, 03 Dec 2019 16:33:54 -0600
Labels: juju-app=pipelines-api
pod-template-hash=6c6f459c98
Annotations: apparmor.security.beta.kubernetes.io/pod: runtime/default
juju.io/controller: e1a8ad96-a251-47a5-8c33-3b8a8fe66bbd
juju.io/model: b0ee3e19-b70e-49a4-81c4-fb9e4242b426
juju.io/unit: pipelines-api/0
seccomp.security.beta.kubernetes.io/pod: docker/default
Status: Running
IP: 10.1.44.78
IPs:
IP: 10.1.44.78
Controlled By: ReplicaSet/pipelines-api-6c6f459c98
Init Containers:
juju-pod-init:
Container ID: containerd://22633e8e541e6bfcfe78f6de032dac424267c956275f0d2748b4a816a9e99266
Image: jujusolutions/jujud-operator:2.7.0
Image ID: docker.io/jujusolutions/jujud-operator@sha256:375eee66a4a7af6128cb84c32a94a1abeffa4f4872e063ba935296701776b5e5
Port: <none>
Host Port: <none>
Command:
/bin/sh
Args:
-c
export JUJU_DATA_DIR=/var/lib/juju
export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools
mkdir -p $JUJU_TOOLS_DIR
cp /opt/jujud $JUJU_TOOLS_DIR/jujud
initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
if test -n "$initCmd"; then
$JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
else
exit 0
fi
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 03 Dec 2019 16:33:56 -0600
Finished: Tue, 03 Dec 2019 16:34:43 -0600
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from pipelines-api-token-tzmx5 (ro)
Containers:
pipelines-api:
Container ID: containerd://6b215518d4010e935e62b580d49d774dc59a95437f490fe4b7c5ecb1e0c30eb9
Image: registry.jujucharms.com/kubeflow-charmers/pipelines-api/oci-image@sha256:9ce417ed6e5a4c2ba2804d4a2694542b8a0cfc50e7a2cc9a0e08053cd06a41d8
Image ID: registry.jujucharms.com/kubeflow-charmers/pipelines-api/oci-image@sha256:9ce417ed6e5a4c2ba2804d4a2694542b8a0cfc50e7a2cc9a0e08053cd06a41d8
Ports: 8887/TCP, 8888/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 04 Dec 2019 08:53:44 -0600
Finished: Wed, 04 Dec 2019 08:53:52 -0600
Ready: False
Restart Count: 188
Environment:
MINIO_SERVICE_SERVICE_HOST: minio
MINIO_SERVICE_SERVICE_PORT: 9000
MYSQL_SERVICE_HOST: 10.152.183.128
MYSQL_SERVICE_PORT: 3306
POD_NAMESPACE: kubeflow
Mounts:
/config from pipelines-api-config-config (rw)
/samples from pipelines-api-samples-config (rw)
/usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from pipelines-api-token-tzmx5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
juju-data-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
pipelines-api-config-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: pipelines-api-config-config
Optional: false
pipelines-api-samples-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: pipelines-api-samples-config
Optional: false
pipelines-api-token-tzmx5:
Type: Secret (a volume populated by a Secret)
SecretName: pipelines-api-token-tzmx5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 22s (x4287 over 16h) kubelet, ai-demo Back-off restarting failed container
ubuntu@ai-demo:~/bundle-kubeflow$ m logs pipelines-api-6c6f459c98-q2grc -n kubeflow
I1204 14:53:44.834007 6 client_manager.go:123] Initializing client manager
F1204 14:53:52.940333 6 error.go:296] commands out of sync. Did you run multiple statements at once?
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc000199000, 0xc0001dcc80, 0x6b, 0x9b)
external/com_github_golang_glog/glog.go:769 +0xd4
github.com/golang/glog.(*loggingT).output(0x2962840, 0xc000000003, 0xc0001ef130, 0x2844815, 0x8, 0x128, 0x0)
external/com_github_golang_glog/glog.go:720 +0x329
github.com/golang/glog.(*loggingT).printf(0x2962840, 0x3, 0x1a7dc3d, 0x2, 0xc0006139b0, 0x1, 0x1)
external/com_github_golang_glog/glog.go:655 +0x14b
github.com/golang/glog.Fatalf(0x1a7dc3d, 0x2, 0xc0006139b0, 0x1, 0x1)
external/com_github_golang_glog/glog.go:1148 +0x67
github.com/kubeflow/pipelines/backend/src/common/util.TerminateIfError(0x1c2fb80, 0xc000442830)
backend/src/common/util/error.go:296 +0x79
main.initMysql(0xc00035f64a, 0x5, 0x12a05f200, 0x0, 0x0)
backend/src/apiserver/client_manager.go:260 +0x37d
main.initDBClient(0x12a05f200, 0x15)
backend/src/apiserver/client_manager.go:190 +0x5c0
main.(*ClientManager).init(0xc000613cd8)
backend/src/apiserver/client_manager.go:125 +0x80
main.newClientManager(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
backend/src/apiserver/client_manager.go:300 +0x7b
main.main()
backend/src/apiserver/main.go:56 +0x5e
ubuntu@ai-demo:~/bundle-kubeflow$
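The "commands out of sync" fatal comes from the MySQL client library during `initMysql`, so my next step would be to confirm the pipelines-db MySQL actually responds. An unverified sketch, using the host/port from the pod spec above and assuming the pipelines-db-0 image ships a mysql client:

```shell
# Check that MySQL at 10.152.183.128:3306 answers at all
# (prompts for the root password, which I don't have on hand)
m exec -it pipelines-db-0 -n kubeflow -- mysql -h 10.152.183.128 -P 3306 -uroot -p -e 'SELECT VERSION();'
```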