kudobuilder / operators

Collection of Kubernetes Operators built with KUDO.

Home Page: https://kudo.dev

License: Apache License 2.0


operators's Introduction

Operators

Collection of KUDO operators.

This is a set of KUDO operators developed as the first community operators by the KUDO community. In the future, these should ideally be split into their own repositories, maintained by the operator maintainers themselves.

This is not the home of all KUDO community operators; that is the operators-index: https://github.com/kudobuilder/operators-index.

Apache Cassandra is an open-source distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

The Kafka REST Proxy provides a RESTful interface to a Kafka cluster. It makes it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. Examples of use cases include reporting data to Kafka from any frontend app built in any language, ingesting messages into a stream processing framework that doesn't yet support Kafka, and scripting administrative actions.

Schema Registry provides a serving layer for your metadata. It provides a RESTful interface for storing and retrieving Avro schemas. It stores a versioned history of all schemas, provides multiple compatibility settings and allows evolution of schemas according to the configured compatibility setting. It provides serializers that plug into Kafka clients that handle schema storage and retrieval for Kafka messages that are sent in the Avro format.

The KUDO cowsay operator is a small demo for the KUDO Pipe-Tasks.

Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.

This is an example operator as described in the KUDO documentation.

Apache Flink® provides stateful computations over data streams.

Apache Kafka® is a distributed streaming platform.

MySQL is an open-source relational database management system.

RabbitMQ is the most widely deployed open source message broker. The KUDO RabbitMQ Operator makes it easy to deploy and manage RabbitMQ on Kubernetes.

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.

Apache Spark® is a unified analytics engine for large-scale data processing.

Apache ZooKeeper is an open-source server which enables highly reliable distributed coordination.

operators's People

Contributors

akirillov, alembiewski, alenkacz, alexeygorobets, aneumann82, davidkarlsen, edgarlanting, fabianbaier, gallardot, gerred, gkleiman, harryge00, harsh-98, jbarrick-mesosphere, jieyu, kaiwalyajoshi, kensipe, kwehden, linzhaoming, porridge, realmbgl, rishabh96b, runyontr, shubhanilbag, tbaums, vibhujain, xinxin2015, y0psolo, yankcrime, zmalik


operators's Issues

[kafka] move away from gcr.io/google-samples image

We are using gcr.io/google-samples images for the Kafka operator.

The gcr.io/google-samples image is stale, and there are more issues with that image beyond being unmaintained.

ref: helm/charts#4540

We should replace the image with a community-supported image that stays as close as possible to the upstream Apache Kafka configuration, without tweaks for a specific platform.

CICD Pipeline to publish repo

What would you like to be added:

For future releases, we should start thinking about how Frameworks can be updated and published to our official repo and Google Cloud Storage, e.g. how new PRs can be tested with CircleCI and how merges into master can be published to our hosted bucket.

Why is this needed:

Right now this is a manual process and fairly slow. Automating it has the well-known benefits of making releases more stable, predictable, etc.

This also relates to:

Document how to trigger an operator release

Right now, doing an operator release is a manual process, i.e. asking KUDO engineers to release the operator.

We should document how operator developers can actually kickstart a release process for their operator.

Specify StorageClass parameter while deploying Elastic Operator

While installing the KUDO-based operator for Elastic, there should be a provision to change the StorageClass. In my GKE deployment, it used the standard StorageClass. Ideally this parameter should be configurable, as is already done for other KUDO-based operators such as Kafka.
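For illustration, such an override at install time might look like the following. The parameter name STORAGE_CLASS and the class name are assumptions, not confirmed names in the Elastic operator, and the command requires a cluster with KUDO installed:

```shell
# Hypothetical sketch: expose the storage class as an operator parameter
# so it can be overridden at install time (parameter name is assumed).
kubectl kudo install elastic -p STORAGE_CLASS=fast-ssd
```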

Spark Package Verify Warnings

go run cmd/kubectl-kudo/main.go package verify ../operators/repository/spark/operator/
Warnings
template "spark-applications-crds.yaml" is not used as a resource
template "webhook-cleanup-job.yaml" is not used as a resource
package is valid

Update or remove outdated demos

Demos are broken or outdated. E.g. the Flink financial fraud demo is supposed to use KUDO 0.7 according to its README, but the operator only works on KUDO 0.8, while depending on packages that only work on KUDO 0.7.
The Flink modifications demo doesn't use operator packages at all.

These demos need to be updated or removed.

Kafka updating properties doesn't work

When I tried to enable external access on minikube, I got an error in the Kafka console:

INFO[0000] Running kafka-utils...
INFO[0000] kubeconfig file: using InClusterConfig.
INFO[0000] kubernetes client configured.
INFO[0000] Running kafka-utils...
INFO[0000] Checking the service created for kafka-kafka-0
INFO[0000] detected NodePort
ERRO[0000] could not run the kafka utils bootstrap: no node found with name 'minikube'

I checked the nodes in Kubernetes; the cluster does have a minikube node:

kubectl get nodes --show-labels
NAME       STATUS   ROLES    AGE   VERSION   LABELS
minikube   Ready    master   44h   v1.19.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=minikube,kubernetes.io/os=linux,minikube.k8s.io/commit=1fd1f67f338cbab4b3e5a6e4c71c551f522ca138,minikube.k8s.io/name=minikube,minikube.k8s.io/updated_at=2020_09_30T16_32_03_0700,minikube.k8s.io/version=v1.13.1,node-role.kubernetes.io/master=

I tried to debug kafka-utils but couldn't find any documentation.
Is there a guide on how to use kafka-utils?

kafka install failure: Unknown task kind Toggle

To replicate:

Literally follow the runbook exactly

  1. Fresh EKS cluster
  2. k kudo init.
  3. k kudo install zookeeper
  4. k kudo install kafka

Result = install fails.

k kudo plan status --instance=kafka-instance shows:

Plan(s) for "kafka-instance" in namespace "default":
.
└── kafka-instance (Operator-Version: "kafka-1.3.0" Active-Plan: "deploy")
    ├── Plan cruise-control (serial strategy) [NOT ACTIVE]
    │   └── Phase cruise-addon (serial strategy) [NOT ACTIVE]
    │       └── Step deploy-cruise-control [NOT ACTIVE]
    ├── Plan deploy (serial strategy) [FATAL_ERROR]
    │   ├── Phase deploy-kafka (serial strategy) [COMPLETE]
    │   │   ├── Step generate-tls-certificates [COMPLETE]
    │   │   ├── Step configuration [COMPLETE]
    │   │   ├── Step service [COMPLETE]
    │   │   └── Step app [COMPLETE]
    │   └── Phase addons (parallel strategy) [FATAL_ERROR]
    │       ├── Step monitoring [FATAL_ERROR] (default/kafka-instance fatal error:  failed to build task deploy.addons.monitoring.service-monitor: unknown task kind Toggle)
    │       ├── Step access [PENDING]
    │       ├── Step mirror [PENDING]
    │       └── Step load [PENDING]
    ├── Plan external-access (serial strategy) [NOT ACTIVE]
    │   └── Phase resources (serial strategy) [NOT ACTIVE]
    │       └── Step deploy [NOT ACTIVE]
    ├── Plan kafka-connect (serial strategy) [NOT ACTIVE]
    │   └── Phase deploy-kafka-connect (serial strategy) [NOT ACTIVE]
    │       ├── Step deploy [NOT ACTIVE]
    │       └── Step setup [NOT ACTIVE]
    ├── Plan mirrormaker (serial strategy) [NOT ACTIVE]
    │   └── Phase app (serial strategy) [NOT ACTIVE]
    │       └── Step deploy [NOT ACTIVE]
    ├── Plan not-allowed (serial strategy) [NOT ACTIVE]
    │   └── Phase not-allowed (serial strategy) [NOT ACTIVE]
    │       └── Step not-allowed [NOT ACTIVE]
    ├── Plan service-monitor (serial strategy) [NOT ACTIVE]
    │   └── Phase enable-service-monitor (serial strategy) [NOT ACTIVE]
    │       └── Step deploy [NOT ACTIVE]
    ├── Plan update-instance (serial strategy) [NOT ACTIVE]
    │   └── Phase app (serial strategy) [NOT ACTIVE]
    │       ├── Step conf [NOT ACTIVE]
    │       ├── Step svc [NOT ACTIVE]
    │       └── Step sts [NOT ACTIVE]
    └── Plan user-workload (serial strategy) [NOT ACTIVE]
        └── Phase workload (serial strategy) [NOT ACTIVE]
            └── Step toggle-workload [NOT ACTIVE]

Output of k kudo version is

KUDO Version: version.Info{GitVersion:"0.12.0", GitCommit:"50a43f45", BuildDate:"2020-04-15T19:26:34Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"linux/amd64"}

Some operator YAMLs are missing the namespace specification

On checking the quick-start deployment for Kafka, I observed that some of the Kafka templates are missing the namespace (namespace: {{ .Namespace }}). I observed this in other operators' manifest YAMLs as well.

See the example below, where the namespace is missing from the Service YAML but present in the StatefulSet YAML:

Statefulsets: https://github.com/kudobuilder/operators/blob/master/repository/kafka/operator/templates/statefulset.yaml#L5

Service: https://github.com/kudobuilder/operators/blob/master/repository/kafka/operator/templates/service.yaml#L4
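A minimal sketch of the fix; everything here except the namespace line is illustrative (the name template is an assumption, not the operator's actual template):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ .Name }}-svc           # illustrative name template
  namespace: {{ .Namespace }}     # the line missing from service.yaml
```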

MySQL example should provide for / allow for replication setups

What would you like to be added:
The ability to create a MySQL installation that leverages either native or Galera-based replication

Why is this needed:
Database replication is a key requirement for high-availability and workload partitioning (read/write nodes, etc).

If we are not going to provide an easy way to leverage native MySQL HA features, we should perhaps explore performance tests on how auto-replacement of db-nodes works vs. something like haproxy + galera.

Kafka Package Verification Warnings

go run cmd/kubectl-kudo/main.go package verify ../operators/repository/kafka/operator/
Warnings
parameter "SSL_ENDPOINT_IDENTIFICATION_ENABLED" defined but not used.
parameter "CUSTOM_SERVER_PROPERTIES" defined but not used.
package is valid

RabbitMQ Package Failure

go run cmd/kubectl-kudo/main.go package create --destination ~/repo ../operators/repository/rabbitmq/operator/
Errors
plan "not-allowed" used in parameter "DISK_SIZE" is not defined
Error: found 1 package verification errors
exit status 255

Use default instance names in operator documentation

The current documentation sometimes assumes default instance names, and sometimes overrides the default, so commands in the docs don't work when copied verbatim.

For example the installation docs for Kafka override the names with --instance, but Kafka tries to connect to a ZooKeeper named zookeeper-instance, so it never comes up.

In another example, the docs for scaling Kafka sometimes assume the default instance name, and sometimes it is overridden.

My recommendation is to use the default instance names in the docs to keep things simple, and to assume default instance names in parameters such as the ZooKeeper host names. I'm happy to make a pass if people agree, but I wanted to post this first to see if work is already underway.
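A sketch of the recommendation: install both operators with their default instance names so cross-references (such as Kafka's default ZooKeeper host) line up without extra flags. These commands assume a cluster with KUDO installed:

```shell
# Install with default instance names; Kafka's default ZooKeeper host
# then resolves without any --instance overrides in the docs.
kubectl kudo install zookeeper
kubectl kudo install kafka
```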

Flink Demo Package Verify Error

go run cmd/kubectl-kudo/main.go package verify ../operators/repository/flink/docs/demo/financial-fraud/demo-operator/
Errors
parameter "download_url" in template uploader.yaml is not defined
Error: package verification errors: 1
exit status 255

MySQL example should allow for / provide for the use of xtrabackup

What would you like to be added:
The MySQL example should allow for / provide for the ability to use xtrabackup as the backup solution instead of mysqldump

Why is this needed:
Once a database grows beyond a certain size, xtrabackup becomes a more useful / performant backup and restore option. mysqldump remains a useful tool and should be readily available, but hot backups are very much a requirement for production workloads.

Change uses of .k8s.io to .kudo.dev

The .k8s.io suffix is reserved for use by the Kubernetes project. Please change your API groups, labels, and annotations to use a domain that you own, such as .kudo.dev.

Work is underway to prevent accidental misuse in the apiserver:
kubernetes/enhancements#1111

We are also adding more easily discoverable documentation regarding this.

Thanks.

MySQL Example not working

When trying to follow https://github.com/maestrosdk/maestro/blob/master/docs/Backups.md and run

maestro-demo $ kubectl apply -f mysql.yaml
framework.maestro.k8s.io/mysql created
frameworkversion.maestro.k8s.io/mysql-57 created
instance.maestro.k8s.io/mysql created

I get the following error message:

maestro-demo $ kubectl logs mysql-mysql-77fcd5d94f-bvqmx
Initializing database
2019-01-29T05:54:00.352589Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-01-29T05:54:00.353999Z 0 [ERROR] --initialize specified but the data directory has files in it. Aborting.
2019-01-29T05:54:00.354032Z 0 [ERROR] Aborting

Log:

mysql-mysql-77fcd5d94f-bvqmx   0/1   Pending   0     0s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Pending   0     0s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Pending   0     1s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Pending   0     3s
mysql-mysql-77fcd5d94f-bvqmx   0/1   ContainerCreating   0     3s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Error   0     14s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Error   1     15s
mysql-mysql-77fcd5d94f-bvqmx   0/1   CrashLoopBackOff   1     16s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Error   2     30s
mysql-mysql-77fcd5d94f-bvqmx   0/1   CrashLoopBackOff   2     38s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Error   3     54s
mysql-mysql-77fcd5d94f-bvqmx   0/1   CrashLoopBackOff   3     58s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Error   4     104s
mysql-mysql-77fcd5d94f-bvqmx   0/1   CrashLoopBackOff   4     105s

advertised.listeners does not include EXTERNAL_INGRESS when a Kafka instance is exposed with an external NodePort service

Hi guys,

Referring to the doc https://github.com/kudobuilder/operators/blob/master/repository/kafka/docs/latest/external-access.md, I installed a Kafka instance with an external NodePort-type service exposed in my k8s cluster.

-p EXTERNAL_ADVERTISED_LISTENER=true \
-p EXTERNAL_ADVERTISED_LISTENER_TYPE=NodePort \

I exec'd into the Kafka pod and checked server.properties, as below:

kafka@kfk-x102-kafka-0:/opt/kafka$ cat server.properties | grep listen
listeners=INTERNAL://0.0.0.0:9093,CLIENT://0.0.0.0:9092
advertised.listeners=INTERNAL://kfk-x102-kafka-0.kfk-x102-svc.x102.svc.cluster.local:9093,CLIENT://kfk-x102-kafka-0.kfk-x102-svc.x102.svc.cluster.local:9092
listener.security.protocol.map=INTERNAL:PLAINTEXT,CLIENT:PLAINTEXT
inter.broker.listener.name=INTERNAL

Expected:
EXTERNAL_INGRESS should be configured as below (example from the above reference):

advertised.listeners=INTERNAL://kafka-kafka-2.kafka-svc.default.svc.cluster.local:9093,EXTERNAL_INGRESS://34.214.27.71:30904

Is there any mistake or misconfiguration on my part? Could you please help?
Thanks a lot!

Elastic operator does not work with current KUDO

Noticed when restructuring docs in #226.

This might be due to the fact that KUDO 0.10.x no longer prefixes pod names with instance name.

[root@master-0 elasticsearch]# curl coordinator-0.coordinator-hs:9200/_cluster/health?pretty
curl: (6) Could not resolve host: coordinator-0.coordinator-hs; Unknown error
[root@master-0 elasticsearch]# curl coordinator-hs:9200/_cluster/health?pretty
{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
[root@master-0 elasticsearch]#

Demo in blog "Running Apache Flink on Kubernetes with KUDO" did not work

I'm following the steps in the Flink blog post Running Apache Flink on Kubernetes with KUDO. After executing kubectl kudo plan status --instance flink-demo3, the output says "No plan ever run for instance - nothing to show for instance flink-demo3".

I installed from the local YAML using kubectl kudo install repository/flink/docs/demo/financial-fraud/demo-operator --instance flink-demo3, and the command output:

repository/flink/docs/demo/financial-fraud/demo-operator is a local file package
instance.kudo.dev/v1beta1/flink-demo3 created

Can anybody help me with this?

kubectl version

Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:49Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}

minikube version

minikube version: v1.6.1
commit: 42a9df4854dcea40ec187b6b8f9a910c6038f81a

kubectl kudo version

KUDO Version: version.Info{GitVersion:"0.9.0", GitCommit:"9981ed2b", BuildDate:"2019-12-17T17:27:52Z", GoVersion:"go1.13", Compiler:"gc", Platform:"darwin/amd64"}

e2e test fail with kind when operator is using pipe tasks

This PR #201 is failing with

        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:26 +0000 UTC	Normal	Pulling	Pulling image "mesosphere/kafka:1.1.0-2.4.0"
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:26 +0000 UTC	Normal	Killing	Stopping container kubernetes-zookeeper
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:37 +0000 UTC	Normal	Pulled	Successfully pulled image "mesosphere/kafka:1.1.0-2.4.0"
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:37 +0000 UTC	Normal	Created	Created container init
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:37 +0000 UTC	Normal	Started	Started container init
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:41 +0000 UTC	Normal	Pulling	Pulling image "busybox"
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:41 +0000 UTC	Normal	Pulled	Successfully pulled image "busybox"
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:41 +0000 UTC	Normal	Created	Created container waiter
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:41 +0000 UTC	Normal	Started	Started container waiter
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:43 +0000 UTC	Warning	PipeTaskError	Error during execution: fatal error: kudo-test-composed-krill/kafka failed in deploy.deploy-kafka.generate-tls-certificates.generate-tls-certificates: failed to fetch cluster REST config: could not locate a kubeconfig
        logger.go:37: 08:05:29 | kafka-upgrade-test | Deleting namespace: kudo-test-composed-krill

The tests pass when using a live cluster. The KUDO controller is trying to read the kubeconfig from the default location rather than the kubeconfig created by kind.
When exporting KUBECONFIG inside the Makefile, KUDO also fails to read the kubeconfig file.

ZooKeeper fails because of a catch-22: DNS lookups against pods that are not yet running

The bootstrap script creates /conf/zoo.cfg with server.n entries, but the pods are not yet running, so Kubernetes never creates endpoints; the DNS names do not resolve and the pods get stuck in CrashLoopBackOff.

Because ZooKeeper startup can never resolve the other servers, it crash-loops.

It is my understanding that Services/Endpoints are only updated once the Pods are running.

server.1=zookeeper-instance-zookeeper-0.zookeeper-instance-hs.default.svc.cluster.local:2888:3888
server.2=zookeeper-instance-zookeeper-1.zookeeper-instance-hs.default.svc.cluster.local:2888:3888
server.3=zookeeper-instance-zookeeper-2.zookeeper-instance-hs.default.svc.cluster.local:2888:3888
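One common Kubernetes-level way out of this chicken-and-egg situation, offered here as a sketch and not necessarily what the operator does, is to publish DNS records for pods before they pass readiness, via the headless Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-instance-hs
spec:
  clusterIP: None                  # headless service backing the per-pod DNS names
  publishNotReadyAddresses: true   # create DNS records even for not-ready pods
  selector:
    app: zookeeper
    instance: zookeeper-instance
  ports:
    - name: server
      port: 2888
    - name: leader-election
      port: 3888
```

With publishNotReadyAddresses set, the server.n hostnames resolve as soon as the pods exist, letting the quorum form instead of crash-looping on DNS failures.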

ZooKeeper pods are stuck

❯ k get pods
NAME READY STATUS RESTARTS AGE
nginx-deployment-77cfbb848f-8qdh8 1/1 Running 3 27h
nginx-deployment-77cfbb848f-t7gtj 1/1 Running 3 27h
zookeeper-instance-zookeeper-0 0/1 CrashLoopBackOff 6 14m
zookeeper-instance-zookeeper-2 0/1 CrashLoopBackOff 6 14m
zookeeper-instance-zookeeper-1 0/1 CrashLoopBackOff 6 14m

Logs for pod
❯ k logs zookeeper-instance-zookeeper-0 | more
Zookeeper configuration...

clientPort=2181
dataDir=/var/lib/zookeeper/data
dataLogDir=/logs
tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=60
minSessionTimeout=4000
maxSessionTimeout=40000
autopurge.snapRetainCount=3
autopurge.purgeInteval=12

server.1=zookeeper-instance-zookeeper-0.zookeeper-instance-hs.default.svc.cluster.local:2888:3888
server.2=zookeeper-instance-zookeeper-1.zookeeper-instance-hs.default.svc.cluster.local:2888:3888
server.3=zookeeper-instance-zookeeper-2.zookeeper-instance-hs.default.svc.cluster.local:2888:3888
Creating ZooKeeper log4j configuration
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2020-11-19 15:46:30,107 [myid:] - INFO [main:QuorumPeerConfig@133] - Reading configuration from: /conf/zoo.cfg
2020-11-19 15:46:30,112 [myid:] - INFO [main:QuorumPeerConfig@385] - clientPortAddress is 0.0.0.0/0.0.0.0:2181
2020-11-19 15:46:30,112 [myid:] - INFO [main:QuorumPeerConfig@389] - secureClientPort is not set
2020-11-19 15:46:30,125 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2020-11-19 15:46:30,126 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2020-11-19 15:46:30,126 [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2020-11-19 15:46:30,127 [myid:1] - INFO [main:ManagedUtil@46] - Log4j found with jmx enabled.
2020-11-19 15:46:30,135 [myid:1] - INFO [main:QuorumPeerMain@141] - Starting quorum peer
2020-11-19 15:46:30,142 [myid:1] - INFO [main:ServerCnxnFactory@135] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2020-11-19 15:46:30,143 [myid:1] - INFO [main:NIOServerCnxnFactory@673] - Configuring NIO connection handler with 10s sessionless connection timeout, 1 selector thread(s), 2 worker threads, and 64 kB direct buffers.
2020-11-19 15:46:30,182 [myid:1] - INFO [main:NIOServerCnxnFactory@686] - binding to port 0.0.0.0/0.0.0.0:2181
2020-11-19 15:46:30,207 [myid:1] - INFO [main:Log@193] - Logging initialized @465ms to org.eclipse.jetty.util.log.Slf4jLog
2020-11-19 15:46:30,387 [myid:1] - WARN [main:ContextHandler@1588] - o.e.j.s.ServletContextHandler@29ca901e{/,null,UNAVAILABLE} contextPath ends with /*
2020-11-19 15:46:30,387 [myid:1] - WARN [main:ContextHandler@1599] - Empty contextPath
2020-11-19 15:46:30,398 [myid:1] - INFO [main:X509Util@79] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2020-11-19 15:46:30,401 [myid:1] - INFO [main:FileTxnSnapLog@103] - zookeeper.snapshot.trust.empty : false
2020-11-19 15:46:30,407 [myid:1] - INFO [main:QuorumPeer@1488] - Local sessions disabled
2020-11-19 15:46:30,407 [myid:1] - INFO [main:QuorumPeer@1499] - Local session upgrading disabled
2020-11-19 15:46:30,407 [myid:1] - INFO [main:QuorumPeer@1466] - tickTime set to 2000
2020-11-19 15:46:30,407 [myid:1] - INFO [main:QuorumPeer@1510] - minSessionTimeout set to 4000
2020-11-19 15:46:30,408 [myid:1] - INFO [main:QuorumPeer@1521] - maxSessionTimeout set to 40000
2020-11-19 15:46:30,408 [myid:1] - INFO [main:QuorumPeer@1536] - initLimit set to 10
2020-11-19 15:46:30,418 [myid:1] - INFO [main:ZKDatabase@117] - zookeeper.snapshotSizeFactor = 0.33
2020-11-19 15:46:30,419 [myid:1] - INFO [main:QuorumPeer@1781] - Using insecure (non-TLS) quorum communication
2020-11-19 15:46:30,419 [myid:1] - INFO [main:QuorumPeer@1787] - Port unification disabled
2020-11-19 15:46:30,420 [myid:1] - INFO [main:QuorumPeer@2154] - QuorumPeer communication is not secured! (SASL auth disabled)
2020-11-19 15:46:30,420 [myid:1] - INFO [main:QuorumPeer@2183] - quorum.cnxn.threads.size set to 20
2020-11-19 15:46:30,421 [myid:1] - INFO [main:FileSnap@83] - Reading snapshot /var/lib/zookeeper/data/version-2/snapshot.0
2020-11-19 15:46:30,481 [myid:1] - INFO [main:Server@370] - jetty-9.4.17.v20190418; built: 2019-04-18T19:45:35.259Z; git: aa1c656c315c011c01e7b21aabb04066635b9f67; jvm 1.8.0_242-b08
2020-11-19 15:46:30,522 [myid:1] - INFO [main:DefaultSessionIdManager@365] - DefaultSessionIdManager workerName=node0
2020-11-19 15:46:30,523 [myid:1] - INFO [main:DefaultSessionIdManager@370] - No SessionScavenger set, using defaults
2020-11-19 15:46:30,525 [myid:1] - INFO [main:HouseKeeper@149] - node0 Scavenging every 660000ms
2020-11-19 15:46:30,535 [myid:1] - INFO [main:ContextHandler@855] - Started o.e.j.s.ServletContextHandler@29ca901e{/,null,AVAILABLE}
2020-11-19 15:46:30,582 [myid:1] - INFO [main:AbstractConnector@292] - Started ServerConnector@58651fd0{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
2020-11-19 15:46:30,582 [myid:1] - INFO [main:Server@410] - Started @846ms
2020-11-19 15:46:30,583 [myid:1] - INFO [main:JettyAdminServer@112] - Started AdminServer on address 0.0.0.0, port 8080 and command URL /commands
2020-11-19 15:46:30,586 [myid:1] - INFO [main:QuorumCnxManager$Listener@861] - Election port bind maximum retries is 3
2020-11-19 15:46:30,587 [myid:1] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@911] - My election bind port: zookeeper-instance-zookeeper-0.zookeeper-instance-hs.default.svc.cluster.local/10.1.190.40:3888
2020-11-19 15:46:30,595 [myid:1] - INFO [QuorumPeermyid=1(secure=disabled):QuorumPeer@1193] - LOOKING
2020-11-19 15:46:30,596 [myid:1] - INFO [QuorumPeermyid=1(secure=disabled):FastLeaderElection@885] - New election. My id = 1, proposed zxid=0x0
2020-11-19 15:46:30,598 [myid:1] - WARN [WorkerSender[myid=1]:QuorumPeer$QuorumServer@196] - Failed to resolve address: zookeeper-instance-zookeeper-1.zookeeper-instance-hs.default.svc.cluster.local
java.net.UnknownHostException: zookeeper-instance-zookeeper-1.zookeeper-instance-hs.default.svc.cluster.local
    at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at java.net.InetAddress.getByName(InetAddress.java:1077)
    at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)
    at org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:764)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:701)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)
    at java.lang.Thread.run(Thread.java:748)

Describe service
k describe services zookeeper-instance-hs
Name: zookeeper-instance-hs
Namespace: default
Labels: app=zookeeper
heritage=kudo
kudo.dev/instance=zookeeper-instance
kudo.dev/operator=zookeeper
zookeeper=zookeeper-instance
Annotations: kudo.dev/last-applied-configuration:
{"kind":"Service","apiVersion":"v1","metadata":{"name":"zookeeper-instance-hs","namespace":"default","creationTimestamp":null,"labels":{"a...
kudo.dev/last-plan-execution-uid: d585c0bb-cf9f-403d-91db-b08ea5f334e9
kudo.dev/phase: zookeeper
kudo.dev/plan: deploy
kudo.dev/step: deploy
Selector: app=zookeeper,instance=zookeeper-instance
Type: ClusterIP
IP: None
Port: server 2888/TCP
TargetPort: 2888/TCP
Endpoints:
Port: leader-election 3888/TCP
TargetPort: 3888/TCP
Endpoints:
Session Affinity: None
Events:

New Repo Structure

What would you like to be added:

I'd like to introduce a new, flatter structure that reflects the changes coming in KUDO 0.2.0, which will break the current structure. Breaking it should be fine as we are still pre-1.0.0.

Proposed structure:

/.github
/.circleci
/docs
/hack
/repository
/repository/index.yaml
/repository/kafka
/repository/kafka/0.2.0
/repository/kafka/0.2.0/kafka-framework.yaml
/repository/kafka/0.2.0/kafka-frameworkversion.yaml
/repository/kafka/0.2.0/kafka-instance.yaml
/repository/kafka/0.2.0/metadata.yaml
/repository/kafka/0.1.0
/repository/kafka/docs
/repository/kafka/docs/README.md
/repository/kafka/docs/Demo.md
/repository/kafka/tests
/repository/kafka/tests/foobar.yaml
/repository/kafka/tests/0.2.0/bar.yaml
/repository/kafka/tests/0.1.0/foo.yaml
/repository/zookeeper
/README.md

Why is this needed:

We need to change this repo structure so it will work with changes coming from KEP-0008, KEP-0009 and KEP-0010. This also relates to:

enable custom TLS for Cassandra

This feature would allow users to enable TLS in Cassandra based on a user-provided certificate.

It should follow the same approach as KUDO Kafka, where users can provide a TLS secret to the operator.

MySQL example documentation does not show results of queries executed

What happened: The MySQL example documentation does not show the results of certain queries, which is necessary to properly demonstrate intended functionality / performance.

What you expected to happen:
That the documentation should include the output of the following commands / queries:
kubectl exec -it $MYSQL_POD -- mysql -ppassword -e "show tables;" kudo
kubectl exec -it $MYSQL_POD -- mysql -ppassword -e "select * from example;" kudo (for pre-deletion, post-deletion, and post-restore)

How to reproduce it (as minimally and precisely as possible):
Examine docs: https://github.com/kudobuilder/kudo/blob/master/docs/examples/backups.md

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Kudo version (use kubectl kudo version):
  • Framework:
  • Frameworkversion:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

NodePort allocation fails due to conflicts when multiple Kafka instances expose external NodePort services

Hi team,
When I use the Kafka operator to install multiple Kafka instances in the same Kubernetes cluster, all of them exposed via external NodePort services, the nodePort assignments may conflict.
e.g.

  • kafka-1 with 3 brokers externally exposed on 3 nodePorts: 30902, 30903, 30904
  • kafka-2 with 3 brokers externally exposed on the same 3 nodePorts: 30902, 30903, 30904

That is a conflicting assignment.

Check the plan status of the instance and you will see an error similar to "Invalid value: 30902: provided port is already allocated".

I know there is an "EXTERNAL_NODE_PORT" parameter to specify the port, but it is only a starting value: the operator derives the remaining ports from the broker count, so it can still land on already-used ports and the Kafka installation fails.

Could you please help? Thanks!
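
The collision is easy to see if you sketch the derivation logic. This is my assumption of what the operator does (one NodePort per broker, counting up from the starting value); two instances installed with the same starting value request identical ports.

```shell
# Assumed derivation: one NodePort per broker, counting up from the
# EXTERNAL_NODE_PORT starting value.
start_port=30902
brokers=3
ports=""
for i in $(seq 0 $((brokers - 1))); do
  ports="$ports $((start_port + i))"
done
echo "requested nodePorts:$ports"
# Two instances run the same computation, so both request 30902 30903 30904.
```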

Determine community operator policy

We have a few operator PRs out right now - #141 and #152 immediately come to mind, that raise some questions.

What is the policy for our primary operator repository on the contents of this repository and what that implies as far as ownership and support? Is the maintainer listed in the operator.yaml the contact for issue reporting and PRs? Is the KUDO team? Someone else? What is allowed into the operators repo? What is not? What gets tested?

Helm has a large chart repo and not all charts are actively maintained. They're even moving to a more federated approach instead of having a single repository of all charts.

This policy needs to account for all operators, including the ones that the KUDO team spends its time on.

Kafka readiness probe timeout is too short

The Kafka readiness probe timeout is set to 1s, which means Kafka is never marked ready in my cluster, as the readiness probe takes 1.5s to respond:

kafka@kafka-kafka-0:/$ time /opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server=localhost:9093

kafka-kafka-0.kafka-svc.default.svc.cluster.local:9093 (id: 0 rack: null) -> (
	Produce(0): 0 to 2 [usable: 2],
	Fetch(1): 0 to 3 [usable: 3],
	Offsets(2): 0 to 1 [usable: 1],
	Metadata(3): 0 to 2 [usable: 2],
	LeaderAndIsr(4): 0 [usable: 0],
	StopReplica(5): 0 [usable: 0],
	UpdateMetadata(6): 0 to 3 [usable: 3],
	ControlledShutdown(7): 1 [usable: 1],
	OffsetCommit(8): 0 to 2 [usable: 2],
	OffsetFetch(9): 0 to 2 [usable: 2],
	GroupCoordinator(10): 0 [usable: 0],
	JoinGroup(11): 0 to 1 [usable: 1],
	Heartbeat(12): 0 [usable: 0],
	LeaveGroup(13): 0 [usable: 0],
	SyncGroup(14): 0 [usable: 0],
	DescribeGroups(15): 0 [usable: 0],
	ListGroups(16): 0 [usable: 0],
	SaslHandshake(17): 0 [usable: 0],
	ApiVersions(18): 0 [usable: 0],
	CreateTopics(19): 0 to 1 [usable: 1],
	DeleteTopics(20): 0 [usable: 0]
)

real	0m1.599s
user	0m0.952s
sys	0m0.216s
kafka@kafka-kafka-0:/$ 

Error:

  Warning  Unhealthy               43s (x61 over 10m)  kubelet, ip-10-0-129-230.us-west-2.compute.internal  Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: timeout 1s exceeded
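
Until the timeout is raised in the operator (ideally as a tunable parameter), one possible workaround is to patch the probe directly. This is a sketch: the StatefulSet name "kafka-kafka" and the container index 0 are assumptions from my cluster, and the operator may revert manual changes on the next plan run.

```shell
# Raise the readiness probe timeout from 1s to 5s on the Kafka StatefulSet.
# "kafka-kafka" and container index 0 are assumptions; adjust for your
# deployment.
kubectl patch statefulset kafka-kafka --type='json' -p='[
  {"op": "replace",
   "path": "/spec/template/spec/containers/0/readinessProbe/timeoutSeconds",
   "value": 5}
]'
```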

run tests for operators which are relevant to the changes in the PR

Right now we run the same tests for any change in the repository, and those tests only cover the ZooKeeper operator. When some other operator's logic changes, these tests are irrelevant. We should only run the tests related to that operator, or no tests if that operator has none. We need to ensure that:

  • CI runs the tests that are relevant to the changes in the PR
  • the PR status provides feedback on which tests are being run
  • we provide documentation/examples on how to add tests for an operator

In all cases, the PR checks should provide information about which tests have been run for a certain PR, so that users can write tests for their own operators that live in this repository.
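
A minimal sketch of how CI could scope tests to the operators actually touched by a PR. The repository/<name>/ layout is an assumption (it matches the structure proposed earlier in this page), and the file list is hardcoded for illustration; in CI it would come from git diff against the base branch.

```shell
# Derive the set of changed operators from the files touched in a PR.
# Hardcoded here for illustration; in CI use:
#   changed=$(git diff --name-only origin/master...HEAD)
changed="repository/kafka/0.2.0/params.yaml
repository/zookeeper/tests/foo.yaml
README.md"

# Keep only paths under repository/<operator>/ and extract the operator name.
operators=$(printf '%s\n' "$changed" | sed -n 's|^repository/\([^/]*\)/.*|\1|p' | sort -u)
echo "$operators"
```

Only the test suites for the operators listed in $operators would then be run; a change touching only README.md would run no operator tests.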
