zookeeper-operator's Introduction

Zookeeper Operator

Build Status

Project status: alpha

The project is currently alpha. While no breaking API changes are currently planned, we reserve the right to address bugs and change the API before the project is declared stable.

Table of Contents

Overview

This operator runs a Zookeeper 3.7.2 cluster, and uses Zookeeper dynamic reconfiguration to handle node membership.

The operator itself is built with the Operator framework.

Requirements

  • Access to a Kubernetes v1.15.0+ cluster

Usage

We recommend using our Helm charts for all installation and upgrades. From version 0.2.8 onwards, the Helm charts for the zookeeper operator and the zookeeper cluster are published at https://charts.pravega.io. To add this repository to your Helm repos, use the following command:

helm repo add pravega https://charts.pravega.io
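
After adding the repository, refresh your local chart index so that the latest chart versions are visible:

helm repo update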

However, manual deployment and upgrade options are available as well.

Install the operator

Note: if you are running on Google Kubernetes Engine (GKE), please check this first.

Install via helm

To understand how to deploy the zookeeper operator using helm, refer to this.
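
For example, an install from the pravega repository added above might look like the following (a sketch; release name, chart version and values are up to you):

helm install zookeeper-operator pravega/zookeeper-operator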

Manual deployment

Register the ZookeeperCluster custom resource definition (CRD).

$ kubectl create -f config/crd/bases

You can choose to enable the Zookeeper operator for all namespaces or just for a specific namespace. The example uses the default namespace, but feel free to edit the YAML files and use a different namespace.

Create the operator role and role binding.

// default namespace
$ kubectl create -f config/rbac/default_ns_rbac.yaml

// all namespaces
$ kubectl create -f config/rbac/all_ns_rbac.yaml

Deploy the Zookeeper operator.

$ kubectl create -f config/manager/manager.yaml

Verify that the Zookeeper operator is running.

$ kubectl get deploy
NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
zookeeper-operator   1         1         1            1           12m

Deploy a sample Zookeeper cluster

Install via helm

To understand how to deploy a sample zookeeper cluster using helm, refer to this.
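
For example, installing the cluster chart from the pravega repository might look like the following (a sketch; release name and values are up to you):

helm install zookeeper pravega/zookeeper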

Manual deployment

Create a YAML file called zk.yaml with the following content to install a 3-node Zookeeper cluster:

apiVersion: "zookeeper.pravega.io/v1beta1"
kind: "ZookeeperCluster"
metadata:
  name: "zookeeper"
spec:
  replicas: 3

$ kubectl create -f zk.yaml

After a couple of minutes, all cluster members should become ready.

$ kubectl get zk

NAME        REPLICAS   READY REPLICAS    VERSION   DESIRED VERSION   INTERNAL ENDPOINT    EXTERNAL ENDPOINT   AGE
zookeeper   3          3                 0.2.8     0.2.8             10.100.200.18:2181   N/A                 94s

Note: when the Version field is set and Ready Replicas equals Replicas, the cluster is in the Ready state.

Additionally, check the output of the describe command, which should show the following cluster condition:

$ kubectl describe zk

Conditions:
  Last Transition Time:    2020-05-18T10:17:03Z
  Last Update Time:        2020-05-18T10:17:03Z
  Status:                  True
  Type:                    PodsReady

Note: the user should wait for the PodsReady condition to be True.

$ kubectl get all -l app=zookeeper
NAME                     DESIRED   CURRENT   AGE
statefulsets/zookeeper   3         3         2m

NAME             READY     STATUS    RESTARTS   AGE
po/zookeeper-0   1/1       Running   0          2m
po/zookeeper-1   1/1       Running   0          1m
po/zookeeper-2   1/1       Running   0          1m

NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
svc/zookeeper-client     ClusterIP   10.31.243.173   <none>        2181/TCP            2m
svc/zookeeper-headless   ClusterIP   None            <none>        2888/TCP,3888/TCP   2m

Note: If you want to configure the zookeeper pod, for example to change the service account or the CPU limits, you can set the corresponding properties in ~/charts/zookeeper/templates/zookeeper.yaml, as shown below. Service account configuration is available from zookeeper operator version 0.2.9 onwards.

apiVersion: "zookeeper.pravega.io/v1beta1"
kind: "ZookeeperCluster"
metadata:
  name: "example"
spec:
  pod:
    serviceAccountName: "zookeeper"
    resources:
        requests:
          cpu: 200m
          memory: 256Mi
        limits:
          cpu: 200m
          memory: 256Mi

Deploy a sample Zookeeper cluster with Ephemeral storage

Create a YAML file called zk.yaml with the following content to install a 3-node Zookeeper cluster with ephemeral storage:

apiVersion: "zookeeper.pravega.io/v1beta1"
kind: "ZookeeperCluster"
metadata:
  name: "example"
spec:
  replicas: 3
  storageType: ephemeral

$ kubectl create -f zk.yaml

After a couple of minutes, all cluster members should become ready.

$ kubectl get zk

NAME      REPLICAS   READY REPLICAS   VERSION   DESIRED VERSION   INTERNAL ENDPOINT    EXTERNAL ENDPOINT   AGE
example   3          3                 0.2.7     0.2.7             10.100.200.18:2181   N/A                 94s

Note: the user should provide a value for only one of the persistence and ephemeral fields in the spec. If neither is specified, the default is persistence.

Note: In case of ephemeral storage, the cluster may not be able to come back up if more than quorum number of nodes are restarted simultaneously.

Note: In case of ephemeral storage, there will be loss of data when the node gets restarted.
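
For comparison, a cluster using the default persistent storage can pin down its volume request explicitly through the persistence section. A minimal sketch (the storage class name and size below are placeholders to adjust for your environment):

apiVersion: "zookeeper.pravega.io/v1beta1"
kind: "ZookeeperCluster"
metadata:
  name: "example"
spec:
  replicas: 3
  persistence:
    storageClassName: standard
    resources:
      requests:
        storage: 20Gi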

Deploy a sample Zookeeper cluster with Istio

Create a YAML file called zk-with-istio.yaml with the following content to install a 3-node Zookeeper cluster with Istio:

apiVersion: zookeeper.pravega.io/v1beta1
kind: ZookeeperCluster
metadata:
  name: zk-with-istio
spec:
  replicas: 3
  config:
    initLimit: 10
    tickTime: 2000
    syncLimit: 5
    quorumListenOnAllIPs: true

$ kubectl create -f zk-with-istio.yaml

Upgrade a Zookeeper cluster

Trigger the upgrade via helm

To understand how to upgrade the zookeeper cluster using helm, refer to this.

Trigger the upgrade manually

To initiate an upgrade process manually, a user has to update the spec.image.tag field of the ZookeeperCluster custom resource. This can be done in three different ways using the kubectl command.

  1. kubectl edit zk <name>, modify the tag value in the YAML resource, save, and exit.
  2. If you have the custom resource defined in a local YAML file, e.g. zk.yaml, you can modify the tag value, and reapply the resource with kubectl apply -f zk.yaml.
  3. kubectl patch zk <name> --type='json' -p='[{"op": "replace", "path": "/spec/image/tag", "value": "X.Y.Z"}]'.

After the tag field is updated, the StatefulSet will detect the version change and trigger the upgrade process.

To detect whether a ZookeeperCluster upgrade is in progress, check the output of kubectl describe zk. The output should contain the following entries:

$ kubectl describe zk

Status:
  Conditions:
    Last Transition Time:    2020-05-18T10:25:12Z
    Last Update Time:        2020-05-18T10:25:12Z
    Message:                 0
    Reason:                  Updating Zookeeper
    Status:                  True
    Type:                    Upgrading

Additionally, the Desired Version will be set to the version that we are upgrading our cluster to.

$ kubectl get zk

NAME            REPLICAS   READY REPLICAS   VERSION   DESIRED VERSION   INTERNAL ENDPOINT     EXTERNAL ENDPOINT   AGE
zookeeper       3          3                0.2.6     0.2.7             10.100.200.126:2181   N/A                 11m

Once the upgrade completes, the Version field is set to the Desired Version, as shown below

$ kubectl get zk

NAME            REPLICAS   READY REPLICAS   VERSION   DESIRED VERSION   INTERNAL ENDPOINT     EXTERNAL ENDPOINT   AGE
zookeeper       3          3                0.2.7     0.2.7             10.100.200.126:2181   N/A                 11m


Additionally, the Upgrading status is set to False and PodsReady status is set to True, which signifies that the upgrade has completed, as shown below

$ kubectl describe zk

Status:
  Conditions:
    Last Transition Time:    2020-05-18T10:28:22Z
    Last Update Time:        2020-05-18T10:28:22Z
    Status:                  True
    Type:                    PodsReady
    Last Transition Time:    2020-05-18T10:28:22Z
    Last Update Time:        2020-05-18T10:28:22Z
    Status:                  False
    Type:                    Upgrading

Note: The value of the tag field should not be modified while an upgrade is already in progress.

Upgrade the Operator

For upgrading the zookeeper operator, see the operator-upgrade document.

Uninstall the Zookeeper cluster

Uninstall via helm

Refer to this.

Manual uninstall

$ kubectl delete -f zk.yaml

Uninstall the operator

Note that the Zookeeper clusters managed by the Zookeeper operator will NOT be deleted even if the operator is uninstalled.

Uninstall via helm

Refer to this.

Manual uninstall

To delete all clusters, delete all cluster CR objects before uninstalling the operator.

$ kubectl delete -f config/manager/manager.yaml
$ kubectl delete -f config/rbac/default_ns_rbac.yaml
// or, depending on how you deployed it
$ kubectl delete -f config/rbac/all_ns_rbac.yaml

The AdminServer

The AdminServer is an embedded Jetty server that provides an HTTP interface to the four-letter-word commands. This port is made accessible to the outside world via the AdminServer service. By default, the server starts on port 8080, but this can be changed by providing the desired port number in the values.yaml file of the zookeeper cluster charts:

ports:
   - containerPort: 8118
     name: admin-server

This would bring up the AdminServer service on port 8118, as shown below:

$ kubectl get svc
NAME                                TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)
zookeeper-admin-server              LoadBalancer   10.100.200.104   10.243.39.62   8118:30477/TCP

Commands are issued by requesting the URL /commands/<command name>, e.g. http://10.243.39.62:8118/commands/stat (see the curl example after the list below). The list of available commands is:

/commands/configuration
/commands/connection_stat_reset
/commands/connections
/commands/dirs
/commands/dump
/commands/environment
/commands/get_trace_mask
/commands/hash
/commands/initial_configuration
/commands/is_read_only
/commands/last_snapshot
/commands/leader
/commands/monitor
/commands/observer_connection_stat_reset
/commands/observers
/commands/ruok
/commands/server_stats
/commands/set_trace_mask
/commands/stat_reset
/commands/stats
/commands/system_properties
/commands/voting_view
/commands/watch_summary
/commands/watches
/commands/watches_by_path
/commands/zabstate
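
For example, the stat command can be queried with curl against the external IP and port of the AdminServer service shown above:

$ curl http://10.243.39.62:8118/commands/stat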

Development

Build the operator image

Requirements:

  • Go 1.17+

Use the make command to build the Zookeeper operator image.

$ make build

That will generate a Docker image tagged in the format <latest_release_tag>-<number_of_commits_after_the_release> (with a -dirty suffix appended if there are uncommitted changes). The image will also be tagged as latest.

Example image after running make build.

The Zookeeper operator image will be available in your Docker environment.

$ docker images pravega/zookeeper-operator

REPOSITORY                    TAG              IMAGE ID        CREATED         SIZE
pravega/zookeeper-operator    0.1.1-3-dirty    2b2d5bcbedf5    10 minutes ago  41.7MB
pravega/zookeeper-operator    latest           2b2d5bcbedf5    10 minutes ago  41.7MB

Optionally push it to a Docker registry.

docker tag pravega/zookeeper-operator [REGISTRY_HOST]:[REGISTRY_PORT]/pravega/zookeeper-operator
docker push [REGISTRY_HOST]:[REGISTRY_PORT]/pravega/zookeeper-operator

where:

  • [REGISTRY_HOST] is your registry host or IP (e.g. registry.example.com)
  • [REGISTRY_PORT] is your registry port (e.g. 5000)

Direct access to the cluster

For debugging and development you might want to access the Zookeeper cluster directly. For example, if you created the cluster with name zookeeper in the default namespace you can forward the Zookeeper port from any of the pods (e.g. zookeeper-0) as follows:

$ kubectl port-forward -n default zookeeper-0 2181:2181
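
With the port forwarded, you can connect from your workstation using any ZooKeeper client, for example the zkCli.sh script shipped with a local ZooKeeper distribution (the path below is illustrative):

$ bin/zkCli.sh -server 127.0.0.1:2181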

Run the operator locally

You can run the operator locally to help with development, testing, and debugging tasks.

The following command will run the operator locally with the default Kubernetes config file present at $HOME/.kube/config. Use the --kubeconfig flag to provide a different path.

$ make run-local
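
If your kubeconfig lives elsewhere, one option is to point the standard KUBECONFIG environment variable at it before invoking the target (a sketch, assuming the operator honors KUBECONFIG as client-go based tools usually do):

$ KUBECONFIG=/path/to/kubeconfig make run-local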

Installation on Google Kubernetes Engine

The Operator requires elevated privileges in order to watch for the custom resources.

According to Google Container Engine docs:

Because of the way Container Engine checks permissions when you create a Role or ClusterRole, you must first create a RoleBinding that grants you all of the permissions included in the role you want to create.

An example workaround is to create a RoleBinding that gives your Google identity a cluster-admin role before attempting to create additional Role or ClusterRole permissions.

This is a known issue in the Beta release of Role-Based Access Control in Kubernetes and Container Engine version 1.6.

On GKE, the following command must be run before installing the operator, replacing the user with your own details.

$ kubectl create clusterrolebinding your-user-cluster-admin-binding --clusterrole=cluster-admin [email protected]

Installation on Minikube

Minikube Setup

To set up minikube locally, you can follow the steps mentioned here.

Once minikube setup is complete, minikube start will create a minikube VM.

Cluster Deployment

First install the zookeeper operator in either of the ways mentioned here. Since minikube provides a single-node Kubernetes cluster with limited resources, we provide a simple way to install a small zookeeper cluster in a minikube environment using the following command:

helm install zookeeper charts/zookeeper --values charts/zookeeper/values/minikube.yaml

Zookeeper YAML Exporter

Zookeeper Exporter is a binary that generates the YAML files for all the secondary resources which the Zookeeper Operator deploys to the Kubernetes cluster. It takes a ZookeeperCluster resource YAML file as input and generates the YAML files for those secondary resources. The generated output looks like the following:

$ tree ZookeeperCluster/
ZookeeperCluster/
├── client
│   └── Service.yaml
├── config
│   └── ConfigMap.yaml
├── headless
│   └── Service.yaml
├── pdb
│   └── PodDisruptionBudget.yaml
└── zk
    └── StatefulSet.yaml

How to build Zookeeper Operator

When you build the Operator, the Exporter is built along with it: make build-go builds both the Operator and the Exporter.

How to use exporter

Run the zookeeper-exporter binary with the -help option; it explains how to supply the ZookeeperCluster YAML file and the few other options available. Example: ./zookeeper-exporter -i ./ZookeeperCluster.yaml -o .


zookeeper-operator's Issues

Print operator version

On startup, the operator should print its version and commit SHA. It will be very useful information for troubleshooting. Moreover, there should be a -version flag to print the version and exit.
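
A minimal sketch of what this could look like (hypothetical wiring; the real operator would take these values from its pkg/version package, injected at build time via -ldflags, rather than the placeholders used here):

package main

import (
    "flag"
    "fmt"
    "os"
)

// Placeholder values; in the operator they would be injected via -ldflags.
var (
    Version = "unknown"
    GitSHA  = "unknown"
)

func main() {
    printVersion := flag.Bool("version", false, "print the operator version and exit")
    flag.Parse()

    // Always log the version and commit SHA on startup for troubleshooting.
    fmt.Printf("zookeeper-operator Version: %s, GitSHA: %s\n", Version, GitSHA)
    if *printVersion {
        os.Exit(0)
    }

    // ... normal operator startup would continue here ...
}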

Zookeeper health check

Add Helm chart tests to Zookeeper so that, after an upgrade, we can determine whether the upgrade was successful by simply running helm test <release-name>.

Resource Status not updated

When deploying a zookeeper cluster with 0.2.0 zookeeper-operator, the ZookeeperCluster resource does not get the Status section updated and always stays at

Status:
  External Client Endpoint:  
  Internal Client Endpoint:  
  Members:
    Ready:         <nil>
    Unready:       <nil>
  Ready Replicas:  0
  Replicas:        0

Example:
We've deployed a zk-cluster in namespace test-project that is supposed to have three replicas.
Inspecting the related stateful set reveals that there are indeed three replicas running in the system:

$kubectl get statefulsets -n test-project
NAME        DESIRED   CURRENT   AGE
zookeeper   3         3         53m

Inspecting the ZookeeperCluster existing in the project, we get the Status section showing zero replicas.

$kubectl describe zk zookeeper -n test-project
Name:         zookeeper
Namespace:    test-project
Labels:       <none>
Annotations:  <none>
API Version:  zookeeper.pravega.io/v1beta1
Kind:         ZookeeperCluster
Metadata:
  Creation Timestamp:  2019-03-01T19:08:21Z
  Generation:          1
  Owner References:
    API Version:           nautilus.dellemc.com/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Project
    Name:                  test-project
    UID:                   60c9fb21-3c55-11e9-a64e-005056bdd0fa
  Resource Version:        620540
  Self Link:               /apis/zookeeper.pravega.io/v1beta1/namespaces/test-project/zookeeper-clusters/zookeeper
  UID:                     6117246e-3c55-11e9-a64e-005056bdd0fa
Spec:
  Config:
    Init Limit:  10
    Sync Limit:  2
    Tick Time:   2000
  Image:
    Pull Policy:  Always
    Repository:   emccorp/zookeeper
    Tag:          3.5.4-beta-operator
  Labels:
    App:      zookeeper
    Release:  zookeeper
  Persistence:
    Access Modes:
      ReadWriteOnce
    Data Source:  <nil>
    Resources:
      Requests:
        Storage:  20Gi
  Pod:
    Affinity:
      Pod Anti Affinity:
        Preferred During Scheduling Ignored During Execution:
          Pod Affinity Term:
            Label Selector:
              Match Expressions:
                Key:       app
                Operator:  In
                Values:
                  zookeeper
            Topology Key:  kubernetes.io/hostname
          Weight:          20
    Labels:
      App:      zookeeper
      Release:  zookeeper
    Resources:
    Termination Grace Period Seconds:  30
  Ports:
    Container Port:  2181
    Name:            client
    Container Port:  2888
    Name:            quorum
    Container Port:  3888
    Name:            leader-election
  Replicas:          3
  Size:              3
Status:
  External Client Endpoint:  
  Internal Client Endpoint:  
  Members:
    Ready:         <nil>
    Unready:       <nil>
  Ready Replicas:  0
  Replicas:        0
Events:            <none>

Allow blank watch namespace

A blank watch namespace indicates that ALL namespaces should be watched for the ZookeeperCluster resource; however, the SDK currently blocks a blank namespace.

ZK Cluster fails to come up when re-created with old PVCs

Steps to Reproduce:

  1. Create a zookeeper operator and cluster.
  2. Once the cluster is up and running, delete it using kubectl delete zk <zk-cluster-name>
  3. Try to re-create the cluster using the same .yaml file as before.
  4. The first server (zk-0) comes up correctly, but the second server hangs indefinitely in the CrashLoopBackOff state. Other servers are never created.

Expected:
Zk Cluster should get re-created correctly with all previous data from existing PVCs.

unhealthy zookeepers

Installed Nautilus to a new nightshift cluster, came back ~14 hours later and pravega bookie pods were crashing, zookeeper pods were running but unhealthy. Logs attached.

$ kubectl get pods --namespace nautilus-pravega
NAME                                           READY   STATUS             RESTARTS   AGE
nautilus-bookie-0                              0/1     CrashLoopBackOff   39         18h
nautilus-bookie-1                              0/1     CrashLoopBackOff   45         18h
nautilus-bookie-2                              1/1     Running            0          18h
nautilus-pravega-controller-685d99d988-qg2vq   1/1     Running            2          18h
nautilus-pravega-grafana-pod                   1/1     Running            0          18h
nautilus-pravega-influxdb-pod                  1/1     Running            0          18h
nautilus-pravega-segmentstore-0                1/1     Running            0          18h
nautilus-pravega-segmentstore-1                1/1     Running            0          18h
nautilus-pravega-segmentstore-2                1/1     Running            0          18h
nautilus-pravega-telegraf-pod                  1/1     Running            0          18h
nautilus-pravega-zookeeper-0                   1/1     Running            0          18h
nautilus-pravega-zookeeper-1                   1/1     Running            0          18h
nautilus-pravega-zookeeper-2                   1/1     Running            0          18h
pravega-operator-54cdd8ccf9-9pbx6              1/1     Running            0          18h
pravega-service-broker-c755d8df9-nw9jd         1/1     Running            0          18h

bookies-crashed.log

zk-0.log.zip

Rbac error in all_ns and default_ns

I am having trouble creating RBAC in all_ns and default_ns.
I think the problems are located here and here. The first one should have a namespace specified, and the second one is using default instead of zookeeper-operator.

Check the entire license header on Go files

Create an automated check that verifies that all Golang files contain the entire license header below.

/**
 * Copyright (c) 2018 Dell Inc., or its subsidiaries. All Rights Reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 */

Add versioning

Since we are about to open up the repo and make the first release, we need to add versioning to the project.

Docker images are released with a wrong tag

Yesterday, @Tristan1900 and I released version 0.2.2. We created an operator release in GitHub, and then a Travis build was triggered and Docker images were created and pushed as part of the deploy stage.

script: make push

However, the image tag was formed incorrectly and the resulting image was pushed as 0.2.2-dirty instead of 0.2.2, making the release inaccessible through the expected tag.

Screenshot from 2019-05-14 16-01-37

Docker image tag is obtained through git tag information.

VERSION=$(shell git describe --always --tags --dirty | sed "s/\(.*\)-g`git rev-parse --short HEAD`/\1/")

The above git describe command is using the -dirty flag, which adds the "-dirty" suffix to the git tag if the working tree has local modification.

That means that the Travis build process generates modifications that are not git-ignored. I was able to reproduce this locally and I found out that the make dep can make modifications to the vendor directory and the Gopkg.lock file that are causing make build to create images with the wrong tag.

Two possible solutions to this are:

  1. Remove the -dirty flag from the git describe command.
  2. Add Gopkg.lock and vendor to the .gitignore file. Maybe we can also remove Gopkg.lock from the repo.

@Tristan1900 @spiegela what do you think?

Zookeeper pre-stop hook fails when the cluster hasn't started

The Zookeeper pre-stop hook de-registers a node from the ensemble before the node stops. When deleting an instance, the ensemble may not be available to remove the node, creating a warning K8s event.

++ DOMAIN=zinc-chinchilla-zookeeper-headless.default.svc.cluster.local
++ QUORUM_PORT=2888
++ LEADER_PORT=3888
++ CLIENT_HOST=zinc-chinchilla-zookeeper-client
++ CLIENT_PORT=9277
+ source /usr/local/bin/zookeeperFunctions.sh
++ set -ex
+ DATA_DIR=/data
+ MYID_FILE=/data/myid
+ LOG4J_CONF=/conf/log4j-quiet.properties
+ set +e
++ zkConnectionString
++ set +e
++ nslookup zinc-chinchilla-zookeeper-client
++ [[ 0 -eq 1 ]]
++ set -e
++ echo zinc-chinchilla-zookeeper-client:9277
+ ZKURL=zinc-chinchilla-zookeeper-client:9277
+ set -e
++ cat /data/myid
+ MYID=1
+ java -Dlog4j.configuration=file:/conf/log4j-quiet.properties -jar /root/zu.jar remove zinc-chinchilla-zookeeper-client:9277 1
Connecting to Zookeeper zinc-chinchilla-zookeeper-client:9277
, message: "+ source /conf/env.sh\n++ DOMAIN=zinc-chinchilla-zookeeper-headless.default.svc.cluster.local\n++ QUORUM_PORT=2888\n++ LEADER_PORT=3888\n++ CLIENT_HOST=zinc-chinchilla-zookeeper-client\n++ CL

Add unit tests

There are no tests for this project right now. We should expect fairly complete coverage.

Currently prioritizing the internal packages:

  • v1beta1.ClusterSpec#withDefaults
  • v1beta1.ContainerImage#withDefaults
  • v1beta1.PodPolicy#withDefaults
  • v1beta1.ZookeeperConfig#WithDefaults
  • deploy#makeZkSts
  • deploy#makePodSpec
  • deploy#makeZkHeadlessSvc
  • deploy#makeZkClientSvc
  • deploy#makeZkConfigMap
  • deploy#makeZkPdb

Add recovery handling for restarted pods

Currently, there are no detailed checks of the data within a pod's persistent volumes. If a pod starts, and the network identity, zookeeper ordinal and cluster membership all match, then the data is assumed correct. If any of those don't match, the data is removed.

The desired behavior is for the Pod, during startup, to check the consistency of data, and potentially leverage ZK snaps to recover data.

Set appropriate default zookeeper image

The default image used to deploy zookeeper, in the absence of any image being provided in the manifest, is emccorp/zookeeper:3.5.4-beta-operator. This image does not have the fix for issue #66 and hence causes problems when developer is not careful enough to specify the right image in the manifest.
Please change the default zookeeper image to pravega/zookeeper:0.2.2 or later

second zk pod doesn't get ready

I tried starting a simple cluster (using the "latest" image built in late July). The second pod ("zk-cluster-1") never gets ready and the third doesn't start up.

describe pod zk-cluster-2 says:

++ echo zk-cluster-client:2181
+ ZKURL=zk-cluster-client:2181
+ set -e
++ cat /data/myid
+ MYID=2
++ java -Dlog4j.configuration=file:/conf/log4j-quiet.properties -jar /root/zu.jar get-role zk-cluster-client:2181 2
Connecting to Zookeeper zk-cluster-client:2181
Server not found in zookeeper config
+ ROLE=

Logs from the second pod just shows the localhost ruok requests. On the first pod ("zk-cluster-0") it shows the connections from the first pod, but I guess it never successfully joins the cluster.

2019-08-17 03:23:07,288 [myid:1] - INFO  [NIOWorkerThread-1:ZooKeeperServer@1041] - Client attempting to establish new session at /10.42.6.236:43346
2019-08-17 03:23:07,293 [myid:1] - INFO  [CommitProcWorkThread-1:ZooKeeperServer@748] - Established session 0x10004784d030048 with negotiated timeout 4000 for client /10.42.6.236:43346
2019-08-17 03:23:07,690 [myid:1] - WARN  [NIOWorkerThread-1:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x10004784d030048, likely client has closed socket
2019-08-17 03:23:07,691 [myid:1] - INFO  [NIOWorkerThread-1:MBeanRegistry@128] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id1,name1=replica.1,name2=Leader,name3=Connections,name4=10.42.6.236,name5=0x10004784d030048]
2019-08-17 03:23:07,691 [myid:1] - INFO  [NIOWorkerThread-1:NIOServerCnxn@627] - Closed socket connection for client /10.42.6.236:43346 which had sessionid 0x10004784d030048
2019-08-17 03:23:12,201 [myid:1] - INFO  [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:354

Wrong Cluster Membership in Dynamic Config File

Problem description
We suspect that Pravega components (e.g., BK, SSS) sometimes have trouble interacting with Zookeeper, which leads to failures. One hypothesis is that the configuration of Zookeeper may be behind those problems. In a simple experiment, we deployed 3 Zookeeper instances with the Zookeeper Operator, and in the dynamic config file of each instance, we see the following:

(zookeeper-0)

/ # cat /data/zoo.cfg.dynamic
server.1=zookeeper-0.zookeeper-headless.default.svc.cluster.local:2888:3888:participant;2181

(zookeeper-1)

/ # cat /data/zoo.cfg.dynamic
server.1=zookeeper-0.zookeeper-headless.default.svc.cluster.local:2888:3888:participant;0.0.0.0:2181
server.2=zookeeper-1.zookeeper-headless.default.svc.cluster.local:2888:3888:observer;0.0.0.0:2181

(zookeeper-2)

/ # cat /data/zoo.cfg.dynamic
server.1=zookeeper-0.zookeeper-headless.default.svc.cluster.local:2888:3888:participant;0.0.0.0:2181
server.2=zookeeper-1.zookeeper-headless.default.svc.cluster.local:2888:3888:participant;0.0.0.0:2181
server.3=zookeeper-2.zookeeper-headless.default.svc.cluster.local:2888:3888:observer;0.0.0.0:2181

So, apparently every Zookeeper instance has a different view of the cluster, which may lead to problems. According to the documentation of Zookeeper, all the instances should have a consistent configuration.

Problem location
Zookeeper operator dynamic config generation.

Suggestions for an improvement
At first glance, all the instances should have the same membership in zoo.cfg.dynamic with all the members of the Zookeeper cluster. Also, we may need to verify whether the role setting (observer/participant) is correctly set in this context.

Allow Zookeepers to be run with ephemeral storage.

Currently the operator only allows persistent volumes; however, for certain use cases it can be very useful to have ephemeral storage instead.

This could be implemented like the etcd-operator, where EmptyDir is used when no PVC spec is provided. The issue there is that the change would be backwards incompatible. The API documentation actually states that this is the behavior, but it isn't implemented.

The changes would be relatively minimal and would just require edits to the statefulSet generator and api.

Handle scale-up/scale-down operations

Depends upon #4. There may be other requirements to handle scale-up and scale-down using kubectl scale. However, the cluster can be scaled by changing the cluster spec and re-applying with kubectl apply, as sketched below.
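
For example, a cluster named zookeeper could be scaled to 5 nodes by patching its spec (a sketch; the cluster name and target size are placeholders):

$ kubectl patch zk zookeeper --type='json' -p='[{"op": "replace", "path": "/spec/replicas", "value": 5}]'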

Fix Windows Cross compilation failure

For the last 2 days, make build-go has been failing without any changes to the zookeeper-operator code, with this error:

CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
-ldflags "-X github.com/pravega/zookeeper-operator/pkg/version.Version=0.2.3-10 -X github.com/pravega/zookeeper-operator/pkg/version.GitSHA=68bf700" \
-o bin/zookeeper-operator-linux-amd64 cmd/manager/main.go
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
-ldflags "-X github.com/pravega/zookeeper-operator/pkg/version.Version=0.2.3-10 -X github.com/pravega/zookeeper-operator/pkg/version.GitSHA=68bf700" \
-o bin/zookeeper-exporter-linux-amd64 cmd/exporter/main.go
CGO_ENABLED=0 GOOS=darwin GOARCH=amd64 go build \
-ldflags "-X github.com/pravega/zookeeper-operator/pkg/version.Version=0.2.3-10 -X github.com/pravega/zookeeper-operator/pkg/version.GitSHA=68bf700" \
-o bin/zookeeper-operator-darwin-amd64 cmd/manager/main.go
CGO_ENABLED=0 GOOS=darwin GOARCH=amd64 go build \
-ldflags "-X github.com/pravega/zookeeper-operator/pkg/version.Version=0.2.3-10 -X github.com/pravega/zookeeper-operator/pkg/version.GitSHA=68bf700" \
-o bin/zookeeper-exporter-darwin-amd64 cmd/exporter/main.go
CGO_ENABLED=0 GOOS=windows GOARCH=amd64 go build \
-ldflags "-X github.com/pravega/zookeeper-operator/pkg/version.Version=0.2.3-10 -X github.com/pravega/zookeeper-operator/pkg/version.GitSHA=68bf700" \
-o bin/zookeeper-operator-windows-amd64.exe cmd/manager/main.go
# github.com/pravega/zookeeper-operator/vendor/golang.org/x/sys/windows
vendor/golang.org/x/sys/windows/dll_windows.go:21:6: missing function body
vendor/golang.org/x/sys/windows/dll_windows.go:24:6: missing function body
Makefile:31: recipe for target 'build-go' failed
make: *** [build-go] Error 2
The command "make build" exited with 2.

This error is seen only with Go versions < 1.12.
Hence, the Go version used by zk-operator needs to be upgraded to 1.12+.

RBAC Fails to deploy on GKE

When deploying the operator with kubectl apply -f deploy, the RBAC resource fails to deploy on GKE:

$> kubectl apply -f deploy
customresourcedefinition.apiextensions.k8s.io "zookeeper-clusters.zookeeper.pravega.io" created
deployment.apps "zookeeper-operator" created
rolebinding.rbac.authorization.k8s.io "default-account-zookeeper-operator" created
Error from server (Forbidden): error when creating "deploy/rbac.yaml": roles.rbac.authorization.k8s.io "zookeeper-operator" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["*"], APIGroups:["zookeeper.pravega.io"], Verbs:["*"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["*"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["*"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["*"]} PolicyRule{Resources:["persistentvolumeclaims"], APIGroups:[""], Verbs:["*"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["*"]} PolicyRule{Resources:["configmaps"], APIGroups:[""], Verbs:["*"]} PolicyRule{Resources:["secrets"], APIGroups:[""], Verbs:["*"]} PolicyRule{Resources:["deployments"], APIGroups:["apps"], Verbs:["*"]} PolicyRule{Resources:["daemonsets"], APIGroups:["apps"], Verbs:["*"]} PolicyRule{Resources:["replicasets"], APIGroups:["apps"], Verbs:["*"]} PolicyRule{Resources:["statefulsets"], APIGroups:["apps"], Verbs:["*"]} PolicyRule{Resources:["poddisruptionbudgets"], APIGroups:["policy"], Verbs:["*"]}] user=&{[email protected]  [system:authenticated] map[]} ownerrules=[PolicyRule{Resources:["selfsubjectaccessreviews" "selfsubjectrulesreviews"], APIGroups:["authorization.k8s.io"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/swagger-2.0.0.pb-v1" "/swagger.json" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]}] ruleResolutionErrors=[]

Solution

See prometheus-operator/prometheus-operator#357 for a solution which should be added to the README.

But in short, running the following before deploying the Operator works:

kubectl create clusterrolebinding your-user-cluster-admin-binding --clusterrole=cluster-admin [email protected]

Add zookeeper cluster status support

Currently, the zk cluster Status is blank. This should be populated with data from the associated stateful set. This is also required for the kubectl scale command to work.

Resource status not updating correctly

Although there is a concept of resource status for the ZookeeperCluster, the status is never updated throughout the lifecycle of the reconciliation.
We should be updating the ZookeeperCluster status with the correct node count of the stateful set here: https://github.com/pravega/zookeeper-operator/blob/master/pkg/zk/sync.go#L38-L44

This is the current status of a fully deployed ZookeeperCluster:

~ kubectl get zookeeper-cluster some-zookeeper-cluster -n some-namespace -o json
. . .
    "status": {
        "members": {
            "ready": null,
            "unready": null
        },
        "size": 0
    }
. . .

storageClassName not applied from ZookeeperCluster yaml

I've been trying to get the example zk cluster running while specifying a specific storageClass via the PersistentVolumeClaimSpec that's under "persistence" in the ZookeeperCluster yaml and it doesn't appear to be applying that field. It keeps attempting to apply the cluster's default storage class.

apiVersion: "zookeeper.pravega.io/v1beta1"
kind: "ZookeeperCluster"
metadata:
  name: "example"
  namespace: "default"
spec:
  persistence:
    storageClassName: cluster-storage
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 5Gi
  size: 3

PVC are not deleted when ZK is deleted

When a ZookeeperCluster resource is deleted (i.e. kubectl delete zk example), PVCs are not automatically deleted. PVCs should have owner reference information so that they are attached to the CR lifecycle.

Error trying to run the operator locally

I'm trying to run the operator locally as instructed in the operator-sdk user guide using the operator-sdk up local command. However, the operator panics and outputs the following log:

$ operator-sdk up local
INFO[0000] Go Version: go1.11
INFO[0000] Go OS/Arch: linux/amd64
INFO[0000] operator-sdk Version: 0.0.5+git
INFO[0000] Watching zookeeper.pravega.io/v1beta1, ZookeeperCluster, default, 5
panic: No Auth Provider found for name "gcp"

goroutine 1 [running]:
github.com/pravega/zookeeper-operator/vendor/k8s.io/client-go/kubernetes/typed/admissionregistration/v1alpha1.NewForConfigOrDie(0xc000394000, 0xc0003e62d0)     
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/zookeeper-operator/vendor/k8s.io/client-go/kubernetes/typed/admissionregistration/v1alpha1/admissionregistration_client.go:58 +0x65
github.com/pravega/zookeeper-operator/vendor/k8s.io/client-go/kubernetes.NewForConfigOrDie(0xc000394000, 0x0)                                                   
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/zookeeper-operator/vendor/k8s.io/client-go/kubernetes/clientset.go:529 +0x49             
github.com/pravega/zookeeper-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient.mustNewKubeClientAndConfig(0x59, 0xc0001f5c30, 0xe840f0)  
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/zookeeper-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient/client.go:138 +0x68
github.com/pravega/zookeeper-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient.newSingletonFactory()                                     
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/zookeeper-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient/client.go:52 +0x34
sync.(*Once).Do(0x1ba0390, 0x1146e50)
        /home/adrian/.gvm/gos/go1.11/src/sync/once.go:44 +0xb3
github.com/pravega/zookeeper-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient.GetResourceClient(0x10e256f, 0x1c, 0x10d7753, 0x10, 0xc00003e470, 0x7, 0xc000430580, 0xc0001f5e20, 0xe8304e, 0xc0004e6410, ...)
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/zookeeper-operator/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient/client.go:70 +0x3d
github.com/pravega/zookeeper-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.Watch(0x10e256f, 0x1c, 0x10d7753, 0x10, 0xc00003e470, 0x7, 0x12a05f200, 0x0, 0x0, 0x0)
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/zookeeper-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/api.go:45 +0x84
main.main()
        /home/adrian/.gvm/pkgsets/go1.11/global/src/github.com/pravega/zookeeper-operator/cmd/zookeeper-operator/main.go:31 +0x215                              
exit status 2
Error: failed to run operator locally: exit status 1

We also experienced this error with the Pravega operator (pravega/pravega-operator#39) and fixed it by importing the gcp package. Will apply the same fix and submit a PR.
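
For reference, the fix typically amounts to a blank import of the GCP auth plugin in the operator's main package, along these lines:

import (
    // Register the "gcp" auth provider so client-go can authenticate
    // against GKE clusters.
    _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
)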

Allow offline use

Hello,

In an offline env where we have a private registry (usually mirroring Docker Hub):

  • Add an env var to the Dockerfile to set the registry to use for the emccorp/zookeeper image, which would allow building a custom image without changing the code

or/and

  • Make the registry configurable via environment variables of the operator (not sure if possible)

Thank you

Persistence section in manifest seems mandatory

Problem description
The sample manifest in the README does not work with the latest version of the Zookeeper operator. The main reason is that the persistence section seems to be mandatory, probably from PR #62. If it is not set, the Zookeeper operator throws a cryptic error and Zookeeper is not deployed:

{"level":"info","ts":1569916028.5814774,"logger":"controller_zookeepercluster","msg":"Updating zookeeper status","Request.Namespace":"default","Request.Name":"zookeeper","StatefulSet.Namespace":"default","StatefulSet.Name":"zookeeper"}
{"level":"error","ts":1569916028.585655,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"zookeepercluster-controller","request":"default/zookeeper","error":"Operation cannot be fulfilled on zookeeperclusters.zookeeper.pravega.io \"zookeeper\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/pravega/zookeeper-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/pravega/zookeeper-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/pravega/zookeeper-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/pravega/zookeeper-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/pravega/zookeeper-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/pravega/zookeeper-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/pravega/zookeeper-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/pravega/zookeeper-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/pravega/zookeeper-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/pravega/zookeeper-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/pravega/zookeeper-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/pravega/zookeeper-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1569916029.586072,"logger":"controller_zookeepercluster","msg":"Reconciling ZookeeperCluster","Request.Namespace":"default","Request.Name":"zookeeper"}
{"level":"info","ts":1569916029.586174,"logger":"controller_zookeepercluster","msg":"Updating existing config-map","Request.Namespace":"default","Request.Name":"zookeeper","ConfigMap.Namespace":"default","ConfigMap.Name":"zookeeper-configmap"}
{"level":"info","ts":1569916029.5912735,"logger":"controller_zookeepercluster","msg":"Updating StatefulSet","Request.Namespace":"default","Request.Name":"zookeeper","StatefulSet.Namespace":"default","StatefulSet.Name":"zookeeper"}
{"level":"info","ts":1569916029.5961022,"logger":"controller_zookeepercluster","msg":"Updating existing client service","Request.Namespace":"default","Request.Name":"zookeeper","Service.Namespace":"default","Service.Name":"zookeeper-client"}

Adding the persistence section to the manifest fixes the problem and allows me to deploy Zookeeper.

Problem location
Persistence section of the operator.

Suggestions for an improvement
We need to make a decision: should the persistence section be mandatory or not?
If so, we need to:

  1. Fix the Zookeeper operator README file, specifically the sample Zookeeper manifest provided, and add the persistence section.
  2. Throw a clearer error in the operator logs if a user fails to specify this section.

Otherwise, we need to set proper defaults for the persistence section and allow the deployment of Zookeeper without that section in the manifest.

Changing Zookeeper resource "storage" value through `kubectl edit` not reflecting

Changing the Zookeeper resource "storage" value through kubectl edit is not reflected for new/scaled-up zookeeper pods.

Increasing the zookeeper storage size does not work for the already deployed zookeeper pods, nor for new/scaled-up pods.

# kubectl edit zk some-zookeeper-cluster -n some-namespace
...
    resources:
      requests:
        storage: 20Gi  --> Increased to 50Gi
...

Add support for data-checking on container restart

Currently the zookeeper-operator assumes that if a restarted instance's zoo.cfg.dynamic is valid, then so is the rest of the information under /data. The zookeeper operator should detect that a pod has been restarted with a valid zoo.cfg.dynamic, and then check the status of the data, restoring or removing the data if necessary.

Slow Zookeeper Pod Recovery (in PKS)

Problem description
In PR #69, the membership management of a Zookeeper cluster has been fixed. That is, at this point a Zookeeper ensemble has a consistent configuration over time, even in the presence of pod restarts.

However, during the investigation of #69, another problem popped up: the long time it takes for Zookeeper to recover from pod deletions/restarts in PKS. Zookeeper is a critical dependency for Pravega and Bookkeeper, so we have examples of the impact of this problem in numerous reported issues: pravega/pravega#3942, pravega/pravega#3836, pravega/pravega#3783, pravega/pravega#3954. That is, among other symptoms, having an unhealthy Zookeeper service running in the cluster for too long may impact the Controller (e.g., unexpected restarts), Bookkeeper (e.g., inability to access metadata), Segment Store (e.g., problems with container recoveries, container assignments), as well as client applications (e.g., readers may exhaust their retries).

In PKS, this problem can be consistently reproduced. This is an example:

  1. We have a complete Pravega deployment in PKS:
λ kubectl.exe get pods
NAME                                                    READY   STATUS    RESTARTS   AGE
alert-seagull-nfs-client-provisioner-5d7745fd84-4dhpn   1/1     Running   0          51m
benchmark-pod-3                                         1/1     Running   0          41m
original-ibex-nfs-client-provisioner-86c46b5d6d-8d5kl   1/1     Running   0          56m
pravega-bookie-0                                        1/1     Running   0          2m45s
pravega-bookie-1                                        1/1     Running   0          2m45s
pravega-bookie-2                                        1/1     Running   0          2m45s
pravega-operator-554d769d4c-gbdj7                       1/1     Running   0          43m
pravega-pravega-controller-d599b4fd6-wj5cd              1/1     Running   0          2m45s
pravega-pravega-segmentstore-0                          1/1     Running   0          2m45s
zookeeper-0                                             1/1     Running   0          5m57s
zookeeper-1                                             1/1     Running   0          5m17s
zookeeper-2                                             1/1     Running   0          4m36s
zookeeper-3                                             1/1     Running   0          4m9s
zookeeper-4                                             1/1     Running   0          3m40s
zookeeper-operator-566694c7ff-stzkn                     1/1     Running   0          59m
  2. I have started a workload to verify that everything works fine:
/bin/pravega-benchmark -controller tcp://172.25.194.11:9090 -producers 1 -segments 1 -size 100 -events 100 -time 300 -stream myStream
...
[epollEventLoopGroup-4-1] INFO io.pravega.client.segment.impl.SegmentOutputStreamImpl - Connection setup complete for writer efc88ec9-d8a7-436b-a966-02169cd8140c
     501 records Writing,     100.1 records/sec,   0.01 MB/sec,     8.2 ms avg latency,    87.0 ms max latency
     501 records Writing,     100.0 records/sec,   0.01 MB/sec,     5.0 ms avg latency,    13.0 ms max latency
     501 records Writing,     100.0 records/sec,   0.01 MB/sec,     5.2 ms avg latency,    23.0 ms max latency
     501 records Writing,     100.0 records/sec,   0.01 MB/sec,     4.7 ms avg latency,    11.0 ms max latency
     501 records Writing,     100.1 records/sec,   0.01 MB/sec,     6.4 ms avg latency,    58.0 ms max latency
  3. Then, we delete 2 out of 5 Zookeeper pods, expecting the service to automatically recover them:
λ kubectl delete pods zookeeper-2 zookeeper-0 --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "zookeeper-2" force deleted
pod "zookeeper-0" force deleted
  4. Zookeeper pods and, consequently, Controller pods keep restarting while the service is trying to recover Zookeeper pods:
λ kubectl.exe get pods
NAME                                                    READY   STATUS    RESTARTS   AGE
alert-seagull-nfs-client-provisioner-5d7745fd84-4dhpn   1/1     Running   0          57m
benchmark-pod-3                                         1/1     Running   0          46m
original-ibex-nfs-client-provisioner-86c46b5d6d-8d5kl   1/1     Running   0          61m
pravega-bookie-0                                        1/1     Running   0          8m11s
pravega-bookie-1                                        1/1     Running   0          8m11s
pravega-bookie-2                                        1/1     Running   0          8m11s
pravega-operator-554d769d4c-gbdj7                       1/1     Running   0          49m
pravega-pravega-controller-d599b4fd6-wj5cd              0/1     Running   2          8m11s
pravega-pravega-segmentstore-0                          1/1     Running   0          8m11s
zookeeper-0                                             0/1     Running   4          4m50s
zookeeper-1                                             1/1     Running   0          10m
zookeeper-2                                             0/1     Running   4          4m51s
zookeeper-3                                             1/1     Running   0          9m35s
zookeeper-4                                             1/1     Running   0          9m6s
zookeeper-operator-566694c7ff-stzkn                     1/1     Running   0          64m
  5. The restarted Zookeeper pods get stuck at the very same point as in the rest of the tests we have done so far (see Connecting to Zookeeper zookeeper-client:2181):
λ kubectl.exe logs zookeeper-0
...
Name:      zookeeper-headless.default.svc.cluster.local
Address 1: 172.25.194.10 172-25-194-10.zookeeper-client.default.svc.cluster.local
Address 2: 172.25.194.9 172-25-194-9.zookeeper-client.default.svc.cluster.local
Address 3: 172.25.194.5 172-25-194-5.zookeeper-client.default.svc.cluster.local
+ [[ 0 -eq 1 ]]
+ set -e
+ set +e
++ zkConnectionString
++ set +e
++ nslookup zookeeper-client
++ [[ 0 -eq 1 ]]
++ set -e
++ echo zookeeper-client:2181
+ ZKURL=zookeeper-client:2181
+ set -e
++ java -Dlog4j.configuration=file:/conf/log4j-quiet.properties -jar /root/zu.jar get-all zookeeper-client:2181
Connecting to Zookeeper zookeeper-client:2181

After a period of between 10 and 15 minutes, the system comes back to stability.


However, the same experiment in GKE has a different outcome: I measured the time from the deletion of the pods to their total recovery at 1 min 10 secs. In this case, the recovery was fast enough not to induce any Controller restarts:

raul_gracia@cloudshell:~ (pravega-dev)$ kubectl get pods
NAME                                        READY   STATUS    RESTARTS   AGE
coiling-shark-nfs-server-provisioner-0      1/1     Running   0          5h54m
pravega-bookie-0                            1/1     Running   1          6m22s
pravega-bookie-1                            1/1     Running   0          6m22s
pravega-bookie-2                            1/1     Running   0          6m22s
pravega-operator-687bbf897b-82z69           1/1     Running   0          155m
pravega-pravega-controller-f4dfc887-9gzb4   1/1     Running   0          6m22s
pravega-pravega-segmentstore-0              1/1     Running   0          6m22s
zookeeper-0                                 1/1     Running   0          72s
zookeeper-1                                 1/1     Running   0          8m57s
zookeeper-2                                 1/1     Running   0          23s
zookeeper-3                                 1/1     Running   0          7m49s
zookeeper-4                                 1/1     Running   0          7m5s
zookeeper-operator-5fb6d7cf4d-tlpsr         1/1     Running   0          5h

Observations of this issue so far:

  • The issue is reproducible in PKS, not in GKE.
  • While in PKS it may take minutes for Zookeeper to stabilize, in general we observed that this is a transient problem and the system recovers.
  • In PKS, the slow Zookeeper pod restart occurs the first time we delete Zookeeper pods. Subsequent Zookeeper pod deletions are recovered from quickly, without problems. This leads us to think that after the Zookeeper cluster recovers from the first deletion, the resolution of the name zookeeper-client:2181 is cached (this is the point at which Zookeeper pods get stuck when trying to recover). If name resolution in PKS was slow due to some DNS problem, this may explain why subsequent pod deletions are handled faster.

Problem location
PKS networking configuration? DNS?

Suggestions for an improvement

  • Try with a longer readiness probe interval in Zookeeper pods.
  • Understand and potentially fix the DNS architecture in our PKS clusters.
