Hello,
I deployed an Ubuntu 18.04 VM with 8 GB of RAM and 40 GB of disk space. On it I installed the Juju snap, microk8s, and the latest bundle-kubeflow. Several pods never come online and keep failing with various errors; it looks like a recurring issue.
ubuntu@ai-demo:~$ m get pods -n kubeflow
NAME READY STATUS RESTARTS AGE
ambassador-79574bd65b-654hx 1/1 Running 0 57m
ambassador-operator-0 1/1 Running 0 58m
argo-controller-64cc95f77b-nl6s2 1/1 Running 0 50m
argo-controller-operator-0 1/1 Running 0 58m
argo-ui-84d8b568d8-krm7n 1/1 Running 0 57m
argo-ui-operator-0 1/1 Running 0 57m
jupyter-controller-7bccb55f46-sjl97 1/1 Running 0 57m
jupyter-controller-operator-0 1/1 Running 0 57m
jupyter-web-9dc84c45b-mx9fr 1/1 Running 0 57m
jupyter-web-operator-0 1/1 Running 0 57m
katib-controller-56dd5bf95b-s45s5 1/1 Running 0 55m
katib-controller-operator-0 1/1 Running 0 57m
katib-db-0 1/1 Running 1 56m
katib-db-operator-0 1/1 Running 0 57m
katib-manager-5d6cc65b8c-vmhjc 0/1 CrashLoopBackOff 13 48m
katib-manager-operator-0 1/1 Running 0 57m
katib-ui-76974795f9-7r85z 1/1 Running 0 56m
katib-ui-operator-0 1/1 Running 0 57m
kubeflow-dashboard-757c877956-jclqd 1/1 Running 0 50m
kubeflow-dashboard-operator-0 1/1 Running 0 57m
kubeflow-gatekeeper-6f9fcf8c55-gcdfw 1/1 Running 0 54m
kubeflow-gatekeeper-operator-0 1/1 Running 0 57m
kubeflow-login-97d55d69f-9vzhg 1/1 Running 0 55m
kubeflow-login-operator-0 1/1 Running 0 56m
kubeflow-profiles-57fd5c6d78-6fzqm 2/2 Running 0 54m
kubeflow-profiles-operator-0 1/1 Running 0 56m
metacontroller-5ccc9b744d-sw49n 1/1 Running 0 54m
metacontroller-operator-0 1/1 Running 0 56m
metadata-controller-7f94875696-s24tr 0/1 CrashLoopBackOff 5 47m
metadata-controller-operator-0 1/1 Running 0 56m
metadata-db-0 1/1 Running 1 54m
metadata-db-operator-0 1/1 Running 0 56m
metadata-ui-58bdd9b6bc-ntzjk 1/1 Running 0 50m
metadata-ui-operator-0 1/1 Running 0 56m
minio-0 1/1 Running 0 54m
minio-operator-0 1/1 Running 0 55m
modeldb-backend-797f77c488-9vrkz 1/2 Error 5 44m
modeldb-backend-operator-0 1/1 Running 0 55m
modeldb-db-0 1/1 Running 1 54m
modeldb-db-operator-0 1/1 Running 0 55m
modeldb-store-fbf49bdf8-7mlqh 1/1 Running 0 54m
modeldb-store-operator-0 1/1 Running 0 55m
modeldb-ui-78b6dd66b8-zwpdf 1/1 Running 0 45m
modeldb-ui-operator-0 1/1 Running 0 55m
pipelines-api-6c6f459c98-q2grc 0/1 CrashLoopBackOff 9 44m
pipelines-api-operator-0 1/1 Running 0 54m
pipelines-db-0 1/1 Running 1 53m
pipelines-db-operator-0 1/1 Running 0 53m
pipelines-persistence-664c75f577-95x7d 0/1 CrashLoopBackOff 7 48m
pipelines-persistence-operator-0 1/1 Running 0 53m
pipelines-scheduledworkflow-79cc64c7c4-n4nwm 0/1 Init:0/1 0 50m
pipelines-scheduledworkflow-operator-0 1/1 Running 0 53m
pipelines-ui-867bd9ccf4-7fkwv 1/1 Running 0 44m
pipelines-ui-operator-0 1/1 Running 0 53m
pipelines-viewer-74c4f8bcd-rzxml 1/1 Running 0 50m
pipelines-viewer-operator-0 1/1 Running 0 52m
pytorch-operator-79b6bf8d4c-bz2bt 1/1 Running 0 49m
pytorch-operator-operator-0 1/1 Running 0 52m
redis-5b9c9c4b45-ctcc7 1/1 Running 0 48m
redis-operator-0 1/1 Running 0 52m
seldon-api-frontend-74cbb778cc-t4w4g 1/1 Running 0 45m
seldon-api-frontend-operator-0 1/1 Running 0 52m
seldon-cluster-manager-747565f949-47k9z 1/1 Running 1 45m
seldon-cluster-manager-operator-0 1/1 Running 0 52m
tensorboard-85b6fc699f-vj9fg 1/1 Running 0 49m
tensorboard-operator-0 1/1 Running 0 52m
tf-job-dashboard-9b5b659bb-x8cqm 1/1 Running 0 49m
tf-job-dashboard-operator-0 1/1 Running 0 52m
tf-job-operator-6bc8cb454c-gwxls 1/1 Running 0 48m
tf-job-operator-operator-0 1/1 Running 0 52m
The first problematic pod is katib-manager. It looks like a readiness issue: the port isn't responding.
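For completeness, these are the commands I'd use to dig further (the pod name comes from the listing above, and `/bin/grpc_health_probe -addr=:6789` is exactly what the liveness/readiness probes run; `m` is my kubectl alias). This is an unverified diagnostic sketch, not something from the failing run:

```shell
# Logs from the previously crashed instance of the container
m logs katib-manager-5d6cc65b8c-vmhjc -n kubeflow --previous

# Run the same gRPC health check the kubelet runs, from inside the
# container (only works during the brief window while it is up)
m exec -it katib-manager-5d6cc65b8c-vmhjc -n kubeflow -- /bin/grpc_health_probe -addr=:6789
```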
ubuntu@ai-demo:~/bundle-kubeflow$ m describe pod katib-manager-5d6cc65b8c-vmhjc -n kubeflow
Name: katib-manager-5d6cc65b8c-vmhjc
Namespace: kubeflow
Priority: 0
Node: ai-demo/10.180.213.139
Start Time: Tue, 03 Dec 2019 16:29:50 -0600
Labels: juju-app=katib-manager
pod-template-hash=5d6cc65b8c
Annotations: apparmor.security.beta.kubernetes.io/pod: runtime/default
juju.io/controller: e1a8ad96-a251-47a5-8c33-3b8a8fe66bbd
juju.io/model: b0ee3e19-b70e-49a4-81c4-fb9e4242b426
juju.io/unit: katib-manager/0
seccomp.security.beta.kubernetes.io/pod: docker/default
Status: Running
IP: 10.1.44.71
IPs:
IP: 10.1.44.71
Controlled By: ReplicaSet/katib-manager-5d6cc65b8c
Init Containers:
juju-pod-init:
Container ID: containerd://5f015946a8c76d221fba8ac787b67f2dac9581f9f79105469118ed2403a22c1b
Image: jujusolutions/jujud-operator:2.7.0
Image ID: docker.io/jujusolutions/jujud-operator@sha256:375eee66a4a7af6128cb84c32a94a1abeffa4f4872e063ba935296701776b5e5
Port: <none>
Host Port: <none>
Command:
/bin/sh
Args:
-c
export JUJU_DATA_DIR=/var/lib/juju
export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools
mkdir -p $JUJU_TOOLS_DIR
cp /opt/jujud $JUJU_TOOLS_DIR/jujud
initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
if test -n "$initCmd"; then
$JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
else
exit 0
fi
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 03 Dec 2019 16:30:15 -0600
Finished: Tue, 03 Dec 2019 16:31:38 -0600
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
Containers:
katib-manager:
Container ID: containerd://0e46f446f24c1c64d109308fd4b64fcbd348462993ceb8493be4bc7c4d2ca1af
Image: registry.jujucharms.com/kubeflow-charmers/katib-manager/oci-image@sha256:28dddef61f71a8e8de0999c67ec60c38d2c1a91d6e24b96ec1e5ba4401add07e
Image ID: registry.jujucharms.com/kubeflow-charmers/katib-manager/oci-image@sha256:28dddef61f71a8e8de0999c67ec60c38d2c1a91d6e24b96ec1e5ba4401add07e
Port: 6789/TCP
Host Port: 0/TCP
Command:
./katib-manager
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Tue, 03 Dec 2019 17:20:48 -0600
Finished: Tue, 03 Dec 2019 17:21:27 -0600
Ready: False
Restart Count: 15
Liveness: exec [/bin/grpc_health_probe -addr=:6789] delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [/bin/grpc_health_probe -addr=:6789] delay=5s timeout=1s period=60s #success=1 #failure=5
Environment:
DB_NAME: mysql
DB_PASSWORD: TW6VQALVDQ41NSQLF4EMG32D0F568T
MYSQL_HOST: 10.152.183.67
MYSQL_PORT: 3306
Mounts:
/usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
juju-data-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
default-token-j9wx5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-j9wx5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned kubeflow/katib-manager-5d6cc65b8c-vmhjc to ai-demo
Normal Pulled 54m kubelet, ai-demo Container image "jujusolutions/jujud-operator:2.7.0" already present on machine
Normal Created 54m kubelet, ai-demo Created container juju-pod-init
Normal Started 53m kubelet, ai-demo Started container juju-pod-init
Normal Pulling 52m kubelet, ai-demo Pulling image "registry.jujucharms.com/kubeflow-charmers/katib-manager/oci-image@sha256:28dddef61f71a8e8de0999c67ec60c38d2c1a91d6e24b96ec1e5ba4401add07e"
Normal Pulled 38m kubelet, ai-demo Successfully pulled image "registry.jujucharms.com/kubeflow-charmers/katib-manager/oci-image@sha256:28dddef61f71a8e8de0999c67ec60c38d2c1a91d6e24b96ec1e5ba4401add07e"
Warning Unhealthy 36m (x2 over 37m) kubelet, ai-demo Readiness probe failed: timeout: failed to connect service ":6789" within 1s
Normal Created 36m (x3 over 38m) kubelet, ai-demo Created container katib-manager
Normal Started 36m (x3 over 38m) kubelet, ai-demo Started container katib-manager
Warning Unhealthy 34m (x17 over 37m) kubelet, ai-demo Liveness probe failed: timeout: failed to connect service ":6789" within 1s
Warning BackOff 13m (x78 over 33m) kubelet, ai-demo Back-off restarting failed container
Normal Killing 9m1s (x14 over 37m) kubelet, ai-demo Container katib-manager failed liveness probe, will be restarted
Normal Pulled 3m58s (x14 over 37m) kubelet, ai-demo Container image "registry.jujucharms.com/kubeflow-charmers/katib-manager/oci-image@sha256:28dddef61f71a8e8de0999c67ec60c38d2c1a91d6e24b96ec1e5ba4401add07e" already present on machine
The second is the metadata-controller pod, which fails with an error when connecting to the MariaDB server:
ubuntu@ai-demo:~/bundle-kubeflow$ m describe pod metadata-controller-7f94875696-s24tr -n kubeflow
Name: metadata-controller-7f94875696-s24tr
Namespace: kubeflow
Priority: 0
Node: ai-demo/10.180.213.139
Start Time: Tue, 03 Dec 2019 16:30:26 -0600
Labels: juju-app=metadata-controller
pod-template-hash=7f94875696
Annotations: apparmor.security.beta.kubernetes.io/pod: runtime/default
juju.io/controller: e1a8ad96-a251-47a5-8c33-3b8a8fe66bbd
juju.io/model: b0ee3e19-b70e-49a4-81c4-fb9e4242b426
juju.io/unit: metadata-controller/0
seccomp.security.beta.kubernetes.io/pod: docker/default
Status: Running
IP: 10.1.44.72
IPs:
IP: 10.1.44.72
Controlled By: ReplicaSet/metadata-controller-7f94875696
Init Containers:
juju-pod-init:
Container ID: containerd://13c5bfd0b5ce2d9d45b281a70f11aa5eb5c99aabe4a6360c629ac901de7861ef
Image: jujusolutions/jujud-operator:2.7.0
Image ID: docker.io/jujusolutions/jujud-operator@sha256:375eee66a4a7af6128cb84c32a94a1abeffa4f4872e063ba935296701776b5e5
Port: <none>
Host Port: <none>
Command:
/bin/sh
Args:
-c
export JUJU_DATA_DIR=/var/lib/juju
export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools
mkdir -p $JUJU_TOOLS_DIR
cp /opt/jujud $JUJU_TOOLS_DIR/jujud
initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
if test -n "$initCmd"; then
$JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
else
exit 0
fi
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 03 Dec 2019 16:30:44 -0600
Finished: Tue, 03 Dec 2019 16:32:55 -0600
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
Containers:
metadata:
Container ID: containerd://a2fc9241a7dae1bf22bb3285e71541eca020c58fec928c8aa24a39073e07635e
Image: registry.jujucharms.com/kubeflow-charmers/metadata-controller/oci-image@sha256:f2a0756e9c41f10cbd178e420e37ef0aaa5d60bbed34300a66b1c99745838d36
Image ID: registry.jujucharms.com/kubeflow-charmers/metadata-controller/oci-image@sha256:f2a0756e9c41f10cbd178e420e37ef0aaa5d60bbed34300a66b1c99745838d36
Port: 8080/TCP
Host Port: 0/TCP
Command:
./server/server
--http_port=8080
--mysql_service_host=10.152.183.188
--mysql_service_port=3306
--mysql_service_user=root
--mysql_service_password=root
--mlmd_db_name=metadb
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 04 Dec 2019 08:51:20 -0600
Finished: Wed, 04 Dec 2019 08:51:20 -0600
Ready: False
Restart Count: 188
Environment:
MYSQL_ROOT_PASSWORD: root
Mounts:
/usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
juju-data-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
default-token-j9wx5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-j9wx5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 21m (x184 over 15h) kubelet, ai-demo Container image "registry.jujucharms.com/kubeflow-charmers/metadata-controller/oci-image@sha256:f2a0756e9c41f10cbd178e420e37ef0aaa5d60bbed34300a66b1c99745838d36" already present on machine
Warning BackOff 85s (x4287 over 15h) kubelet, ai-demo Back-off restarting failed container
ubuntu@ai-demo:~/bundle-kubeflow$
ubuntu@ai-demo:~/bundle-kubeflow$ m logs metadata-controller-7f94875696-s24tr -n kubeflow
F1204 14:51:20.776472 1 main.go:90] Failed to create ML Metadata Store with config mysql:<host:"10.152.183.188" port:3306 database:"metadb" user:"root" password:"root" > : mysql_real_connect failed: errno: 1130, error: Host '10.1.44.1' is not allowed to connect to this MariaDB server.
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc000135100, 0xc0001b8000, 0x124, 0x17a)
external/com_github_golang_glog/glog.go:769 +0xb1
github.com/golang/glog.(*loggingT).output(0x1633360, 0xc000000003, 0xc0001af2d0, 0x14eadd3, 0x7, 0x5a, 0x0)
external/com_github_golang_glog/glog.go:720 +0x2f6
github.com/golang/glog.(*loggingT).printf(0x1633360, 0x3, 0xf6fee1, 0x37, 0xc00019be30, 0x2, 0x2)
external/com_github_golang_glog/glog.go:655 +0x14e
github.com/golang/glog.Fatalf(...)
external/com_github_golang_glog/glog.go:1148
main.mlmdStoreOrDie(0x0)
server/main.go:90 +0x1c3
main.main()
server/main.go:101 +0xe0
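MariaDB errno 1130 usually means the grant tables have no entry for the connecting host (here `10.1.44.1`, which looks like the pod-network gateway). A sketch of the kind of grant I'd expect to need, run inside the metadata-db pod; this is a hypothetical, unverified workaround (the `10.1.44.%` wildcard is my guess at the pod subnet, and the `root`/`root` credentials are the ones shown in the pod spec above):

```shell
# Hypothetical fix sketch: allow root to connect from the pod network.
# Assumes the metadata-db image ships the mysql client.
m exec -it metadata-db-0 -n kubeflow -- mysql -uroot -proot -e \
  "GRANT ALL PRIVILEGES ON metadb.* TO 'root'@'10.1.44.%' IDENTIFIED BY 'root'; FLUSH PRIVILEGES;"
```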
The third is modeldb-backend:
ubuntu@ai-demo:~/bundle-kubeflow$ m describe pod modeldb-backend-797f77c488-9vrkz -n kubeflow
Name: modeldb-backend-797f77c488-9vrkz
Namespace: kubeflow
Priority: 0
Node: ai-demo/10.180.213.139
Start Time: Tue, 03 Dec 2019 16:33:37 -0600
Labels: juju-app=modeldb-backend
pod-template-hash=797f77c488
Annotations: apparmor.security.beta.kubernetes.io/pod: runtime/default
juju.io/controller: e1a8ad96-a251-47a5-8c33-3b8a8fe66bbd
juju.io/model: b0ee3e19-b70e-49a4-81c4-fb9e4242b426
juju.io/unit: modeldb-backend/0
seccomp.security.beta.kubernetes.io/pod: docker/default
Status: Running
IP: 10.1.44.77
IPs:
IP: 10.1.44.77
Controlled By: ReplicaSet/modeldb-backend-797f77c488
Init Containers:
juju-pod-init:
Container ID: containerd://548ecc057d06bcf349b64d920e905676696137fd44bcc43cc152ead0c69f1f16
Image: jujusolutions/jujud-operator:2.7.0
Image ID: docker.io/jujusolutions/jujud-operator@sha256:375eee66a4a7af6128cb84c32a94a1abeffa4f4872e063ba935296701776b5e5
Port: <none>
Host Port: <none>
Command:
/bin/sh
Args:
-c
export JUJU_DATA_DIR=/var/lib/juju
export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools
mkdir -p $JUJU_TOOLS_DIR
cp /opt/jujud $JUJU_TOOLS_DIR/jujud
initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
if test -n "$initCmd"; then
$JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
else
exit 0
fi
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 03 Dec 2019 16:33:40 -0600
Finished: Tue, 03 Dec 2019 16:34:42 -0600
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
Containers:
modeldb-backend:
Container ID: containerd://3a4251c4c8357c53159935efd7346a982500c6e4c05596cd28f65956687d7e99
Image: registry.jujucharms.com/kubeflow-charmers/modeldb-backend/oci-image@sha256:67e70b991598fe8fca12058e2cee1abc342ab26a0047ec4779cb6d8483d87161
Image ID: registry.jujucharms.com/kubeflow-charmers/modeldb-backend/oci-image@sha256:67e70b991598fe8fca12058e2cee1abc342ab26a0047ec4779cb6d8483d87161
Port: 8085/TCP
Host Port: 0/TCP
Command:
bash
Args:
-c
./wait-for-it.sh 10.152.183.242:3306 --timeout=10 && java -jar modeldb-1.0-SNAPSHOT-client-build.jar
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 04 Dec 2019 08:52:30 -0600
Finished: Wed, 04 Dec 2019 08:52:36 -0600
Ready: False
Restart Count: 182
Environment:
VERTA_MODELDB_CONFIG: /config-backend/config.yaml
Mounts:
/config-backend/ from modeldb-backend-config-config (rw)
/usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
modeldb-backend-proxy:
Container ID: containerd://bbe3449a4f54917085882a13b587f6446d9fe639a917b2639b352a750f3664af
Image: vertaaiofficial/modeldb-backend-proxy:kubeflow
Image ID: docker.io/vertaaiofficial/modeldb-backend-proxy@sha256:5e21c2f82df9b05f7309772dd2be946a8ef24ba43bd2579aa6af22c4827c9205
Port: 8080/TCP
Host Port: 0/TCP
Command:
/go/bin/proxy
Args:
-project_endpoint
localhost:8085
-experiment_endpoint
localhost:8085
-experiment_run_endpoint
localhost:8085
State: Running
Started: Tue, 03 Dec 2019 17:14:40 -0600
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j9wx5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
juju-data-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
modeldb-backend-config-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: modeldb-backend-config-config
Optional: false
default-token-j9wx5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-j9wx5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m42s (x4143 over 15h) kubelet, ai-demo Back-off restarting failed container
ubuntu@ai-demo:~/bundle-kubeflow$ m logs modeldb-backend-797f77c488-9vrkz -n kubeflow
Error from server (BadRequest): a container name must be specified for pod modeldb-backend-797f77c488-9vrkz, choose one of: [modeldb-backend modeldb-backend-proxy] or one of the init containers: [juju-pod-init]
ubuntu@ai-demo:~/bundle-kubeflow$
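Since that pod runs two containers, the logs have to be requested per container with `-c`, e.g.:

```shell
# Pick the crashing container explicitly (names listed in the error above)
m logs modeldb-backend-797f77c488-9vrkz -n kubeflow -c modeldb-backend
```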
And the last one is the pipelines-api pod:
ubuntu@ai-demo:~/bundle-kubeflow$ m describe pod pipelines-api-6c6f459c98-q2grc -n kubeflow
Name: pipelines-api-6c6f459c98-q2grc
Namespace: kubeflow
Priority: 0
Node: ai-demo/10.180.213.139
Start Time: Tue, 03 Dec 2019 16:33:54 -0600
Labels: juju-app=pipelines-api
pod-template-hash=6c6f459c98
Annotations: apparmor.security.beta.kubernetes.io/pod: runtime/default
juju.io/controller: e1a8ad96-a251-47a5-8c33-3b8a8fe66bbd
juju.io/model: b0ee3e19-b70e-49a4-81c4-fb9e4242b426
juju.io/unit: pipelines-api/0
seccomp.security.beta.kubernetes.io/pod: docker/default
Status: Running
IP: 10.1.44.78
IPs:
IP: 10.1.44.78
Controlled By: ReplicaSet/pipelines-api-6c6f459c98
Init Containers:
juju-pod-init:
Container ID: containerd://22633e8e541e6bfcfe78f6de032dac424267c956275f0d2748b4a816a9e99266
Image: jujusolutions/jujud-operator:2.7.0
Image ID: docker.io/jujusolutions/jujud-operator@sha256:375eee66a4a7af6128cb84c32a94a1abeffa4f4872e063ba935296701776b5e5
Port: <none>
Host Port: <none>
Command:
/bin/sh
Args:
-c
export JUJU_DATA_DIR=/var/lib/juju
export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools
mkdir -p $JUJU_TOOLS_DIR
cp /opt/jujud $JUJU_TOOLS_DIR/jujud
initCmd=$($JUJU_TOOLS_DIR/jujud help commands | grep caas-unit-init)
if test -n "$initCmd"; then
$JUJU_TOOLS_DIR/jujud caas-unit-init --debug --wait;
else
exit 0
fi
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 03 Dec 2019 16:33:56 -0600
Finished: Tue, 03 Dec 2019 16:34:43 -0600
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from pipelines-api-token-tzmx5 (ro)
Containers:
pipelines-api:
Container ID: containerd://6b215518d4010e935e62b580d49d774dc59a95437f490fe4b7c5ecb1e0c30eb9
Image: registry.jujucharms.com/kubeflow-charmers/pipelines-api/oci-image@sha256:9ce417ed6e5a4c2ba2804d4a2694542b8a0cfc50e7a2cc9a0e08053cd06a41d8
Image ID: registry.jujucharms.com/kubeflow-charmers/pipelines-api/oci-image@sha256:9ce417ed6e5a4c2ba2804d4a2694542b8a0cfc50e7a2cc9a0e08053cd06a41d8
Ports: 8887/TCP, 8888/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 04 Dec 2019 08:53:44 -0600
Finished: Wed, 04 Dec 2019 08:53:52 -0600
Ready: False
Restart Count: 188
Environment:
MINIO_SERVICE_SERVICE_HOST: minio
MINIO_SERVICE_SERVICE_PORT: 9000
MYSQL_SERVICE_HOST: 10.152.183.128
MYSQL_SERVICE_PORT: 3306
POD_NAMESPACE: kubeflow
Mounts:
/config from pipelines-api-config-config (rw)
/samples from pipelines-api-samples-config (rw)
/usr/bin/juju-run from juju-data-dir (rw,path="tools/jujud")
/var/lib/juju from juju-data-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from pipelines-api-token-tzmx5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
juju-data-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
pipelines-api-config-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: pipelines-api-config-config
Optional: false
pipelines-api-samples-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: pipelines-api-samples-config
Optional: false
pipelines-api-token-tzmx5:
Type: Secret (a volume populated by a Secret)
SecretName: pipelines-api-token-tzmx5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 22s (x4287 over 16h) kubelet, ai-demo Back-off restarting failed container
ubuntu@ai-demo:~/bundle-kubeflow$ m logs pipelines-api-6c6f459c98-q2grc -n kubeflow
I1204 14:53:44.834007 6 client_manager.go:123] Initializing client manager
F1204 14:53:52.940333 6 error.go:296] commands out of sync. Did you run multiple statements at once?
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc000199000, 0xc0001dcc80, 0x6b, 0x9b)
external/com_github_golang_glog/glog.go:769 +0xd4
github.com/golang/glog.(*loggingT).output(0x2962840, 0xc000000003, 0xc0001ef130, 0x2844815, 0x8, 0x128, 0x0)
external/com_github_golang_glog/glog.go:720 +0x329
github.com/golang/glog.(*loggingT).printf(0x2962840, 0x3, 0x1a7dc3d, 0x2, 0xc0006139b0, 0x1, 0x1)
external/com_github_golang_glog/glog.go:655 +0x14b
github.com/golang/glog.Fatalf(0x1a7dc3d, 0x2, 0xc0006139b0, 0x1, 0x1)
external/com_github_golang_glog/glog.go:1148 +0x67
github.com/kubeflow/pipelines/backend/src/common/util.TerminateIfError(0x1c2fb80, 0xc000442830)
backend/src/common/util/error.go:296 +0x79
main.initMysql(0xc00035f64a, 0x5, 0x12a05f200, 0x0, 0x0)
backend/src/apiserver/client_manager.go:260 +0x37d
main.initDBClient(0x12a05f200, 0x15)
backend/src/apiserver/client_manager.go:190 +0x5c0
main.(*ClientManager).init(0xc000613cd8)
backend/src/apiserver/client_manager.go:125 +0x80
main.newClientManager(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
backend/src/apiserver/client_manager.go:300 +0x7b
main.main()
backend/src/apiserver/main.go:56 +0x5e
ubuntu@ai-demo:~/bundle-kubeflow$
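The "commands out of sync" fatal comes from the MySQL client library during `initMysql`, so my next step would be to confirm the pipelines-db MySQL actually responds. An unverified sketch, using the host/port from the pod spec above and assuming the pipelines-db-0 image ships a mysql client:

```shell
# Check that MySQL at 10.152.183.128:3306 answers at all
# (prompts for the root password, which I don't have on hand)
m exec -it pipelines-db-0 -n kubeflow -- mysql -h 10.152.183.128 -P 3306 -uroot -p -e 'SELECT VERSION();'
```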