kudobuilder / operators

Collection of Kubernetes Operators built with KUDO.

Home Page: https://kudo.dev

License: Apache License 2.0


operators's Introduction

Operators

Collection of KUDO operators.

This is a set of KUDO operators developed as the first community operators by the KUDO community. In the future, these should ideally be split into their own repositories, maintained by the operator maintainers themselves.

This is not the home of all KUDO community operators; that is the operators-index: https://github.com/kudobuilder/operators-index.

Apache Cassandra is an open-source distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

The Kafka REST Proxy provides a RESTful interface to a Kafka cluster. It makes it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. Examples of use cases include reporting data to Kafka from any frontend app built in any language, ingesting messages into a stream processing framework that doesn't yet support Kafka, and scripting administrative actions.

Schema Registry provides a serving layer for your metadata. It provides a RESTful interface for storing and retrieving Avro schemas. It stores a versioned history of all schemas, provides multiple compatibility settings and allows evolution of schemas according to the configured compatibility setting. It provides serializers that plug into Kafka clients that handle schema storage and retrieval for Kafka messages that are sent in the Avro format.

The KUDO cowsay operator is a small demo for the KUDO Pipe-Tasks.

Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.

This is an example operator as described in the KUDO documentation.

Apache Flink® provides stateful computations over data streams.

Apache Kafka® is a distributed streaming platform.

MySQL is an open-source relational database management system.

RabbitMQ is the most widely deployed open source message broker. The KUDO RabbitMQ Operator makes it easy to deploy and manage RabbitMQ on Kubernetes.

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.

Apache Spark® is a unified analytics engine for large-scale data processing.

Apache ZooKeeper is an open-source server which enables highly reliable distributed coordination.

operators's People

Contributors

akirillov, alembiewski, alenkacz, alexeygorobets, aneumann82, davidkarlsen, edgarlanting, fabianbaier, gallardot, gerred, gkleiman, harryge00, harsh-98, jbarrick-mesosphere, jieyu, kaiwalyajoshi, kensipe, kwehden, linzhaoming, porridge, realmbgl, rishabh96b, runyontr, shubhanilbag, tbaums, vibhujain, xinxin2015, y0psolo, yankcrime, zmalik


operators's Issues

[kafka] move away from gcr.io/google-samples image

We are using gcr.io/google-samples images for the Kafka operator.

The gcr.io/google-samples image is stale, and there are more issues with that image beyond being unmaintained.

ref: helm/charts#4540

We should replace the image with a community-supported image that stays as close as possible to the upstream Apache Kafka configuration, without tweaks for a specific platform.

CICD Pipeline to publish repo

What would you like to be added:

For future releases, we should start thinking about how Frameworks can be updated and published to our official repo and Google Cloud Storage, e.g. how new PRs can be tested with CircleCI and how merges into master can be published to our hosted bucket.

Why is this needed:

Right now this is a manual process and fairly slow. Automating it has the well-known benefits of making releases more stable, predictable, etc.

This also relates to:

Document how to trigger an operator release

Right now, doing an operator release is a manual process, i.e. asking KUDO engineers to release the operator.

We should document how operator developers can actually kickstart a release process for their operator.

Specify StorageClass parameter while deploying Elastic Operator

While installing the KUDO-based operator for Elastic, there should be a provision to change the StorageClass. In my GKE deployment, it used the standard StorageClass. Ideally this parameter should be configurable, as is already done for other KUDO-based operators such as Kafka.
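For illustration, such an override at install time might look like the following. The parameter name STORAGE_CLASS and the class name are assumptions, not confirmed names in the Elastic operator, and the command requires a cluster with KUDO installed:

```shell
# Hypothetical sketch: expose the storage class as an operator parameter
# so it can be overridden at install time (parameter name is assumed).
kubectl kudo install elastic -p STORAGE_CLASS=fast-ssd
```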

Spark Package Verify Warnings

go run cmd/kubectl-kudo/main.go package verify ../operators/repository/spark/operator/
Warnings
template "spark-applications-crds.yaml" is not used as a resource
template "webhook-cleanup-job.yaml" is not used as a resource
package is valid

Update or remove outdated demos

Demos are broken or outdated. E.g. the Flink financial fraud demo is supposed to use KUDO 0.7 according to its README, but the operator only works on KUDO 0.8, while depending on packages that only work on KUDO 0.7.
The Flink modifications demo doesn't use operator packages at all.

These demos need to be updated or removed.

Kafka updating properties doesn't work

When I tried to enable external access on minikube, I got an error in the Kafka console:

INFO[0000] Running kafka-utils...
INFO[0000] kubeconfig file: using InClusterConfig.
INFO[0000] kubernetes client configured.
INFO[0000] Running kafka-utils...
INFO[0000] Checking the service created for kafka-kafka-0
INFO[0000] detected NodePort
ERRO[0000] could not run the kafka utils bootstrap: no node found with name 'minikube'

I checked the nodes in Kubernetes; the cluster does have a minikube node:

kubectl get nodes --show-labels
NAME       STATUS   ROLES    AGE   VERSION   LABELS
minikube   Ready    master   44h   v1.19.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=minikube,kubernetes.io/os=linux,minikube.k8s.io/commit=1fd1f67f338cbab4b3e5a6e4c71c551f522ca138,minikube.k8s.io/name=minikube,minikube.k8s.io/updated_at=2020_09_30T16_32_03_0700,minikube.k8s.io/version=v1.13.1,node-role.kubernetes.io/master=

I tried to debug kafka-utils but couldn't find any documentation.
Is there a guide on how to use kafka-utils?

kafka install failure: Unknown task kind Toggle

To replicate:

Literally follow the runbook exactly

  1. Fresh EKS cluster
  2. k kudo init.
  3. k kudo install zookeeper
  4. k kudo install kafka

Result = install fails.

k kudo plan status --instance=kafka-instance shows:

Plan(s) for "kafka-instance" in namespace "default":
.
└── kafka-instance (Operator-Version: "kafka-1.3.0" Active-Plan: "deploy")
    ├── Plan cruise-control (serial strategy) [NOT ACTIVE]
    │   └── Phase cruise-addon (serial strategy) [NOT ACTIVE]
    │       └── Step deploy-cruise-control [NOT ACTIVE]
    ├── Plan deploy (serial strategy) [FATAL_ERROR]
    │   ├── Phase deploy-kafka (serial strategy) [COMPLETE]
    │   │   ├── Step generate-tls-certificates [COMPLETE]
    │   │   ├── Step configuration [COMPLETE]
    │   │   ├── Step service [COMPLETE]
    │   │   └── Step app [COMPLETE]
    │   └── Phase addons (parallel strategy) [FATAL_ERROR]
    │       ├── Step monitoring [FATAL_ERROR] (default/kafka-instance fatal error:  failed to build task deploy.addons.monitoring.service-monitor: unknown task kind Toggle)
    │       ├── Step access [PENDING]
    │       ├── Step mirror [PENDING]
    │       └── Step load [PENDING]
    ├── Plan external-access (serial strategy) [NOT ACTIVE]
    │   └── Phase resources (serial strategy) [NOT ACTIVE]
    │       └── Step deploy [NOT ACTIVE]
    ├── Plan kafka-connect (serial strategy) [NOT ACTIVE]
    │   └── Phase deploy-kafka-connect (serial strategy) [NOT ACTIVE]
    │       ├── Step deploy [NOT ACTIVE]
    │       └── Step setup [NOT ACTIVE]
    ├── Plan mirrormaker (serial strategy) [NOT ACTIVE]
    │   └── Phase app (serial strategy) [NOT ACTIVE]
    │       └── Step deploy [NOT ACTIVE]
    ├── Plan not-allowed (serial strategy) [NOT ACTIVE]
    │   └── Phase not-allowed (serial strategy) [NOT ACTIVE]
    │       └── Step not-allowed [NOT ACTIVE]
    ├── Plan service-monitor (serial strategy) [NOT ACTIVE]
    │   └── Phase enable-service-monitor (serial strategy) [NOT ACTIVE]
    │       └── Step deploy [NOT ACTIVE]
    ├── Plan update-instance (serial strategy) [NOT ACTIVE]
    │   └── Phase app (serial strategy) [NOT ACTIVE]
    │       ├── Step conf [NOT ACTIVE]
    │       ├── Step svc [NOT ACTIVE]
    │       └── Step sts [NOT ACTIVE]
    └── Plan user-workload (serial strategy) [NOT ACTIVE]
        └── Phase workload (serial strategy) [NOT ACTIVE]
            └── Step toggle-workload [NOT ACTIVE]

Output of k kudo version is

KUDO Version: version.Info{GitVersion:"0.12.0", GitCommit:"50a43f45", BuildDate:"2020-04-15T19:26:34Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"linux/amd64"}

Some operator YAMLs are missing the namespace specification

On checking the quick-start deployment for Kafka, I observed that some of the Kafka templates are missing the namespace (namespace: {{ .Namespace }}). I observed this in other operators' manifest YAMLs as well.

See the example below, where the namespace is missing from the Service YAML but present in the StatefulSet YAML:

Statefulsets: https://github.com/kudobuilder/operators/blob/master/repository/kafka/operator/templates/statefulset.yaml#L5

Service: https://github.com/kudobuilder/operators/blob/master/repository/kafka/operator/templates/service.yaml#L4
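A minimal sketch of the fix; everything here except the namespace line is illustrative (the name template is an assumption, not the operator's actual template):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ .Name }}-svc           # illustrative name template
  namespace: {{ .Namespace }}     # the line missing from service.yaml
```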

MySQL example should provide for / allow for replication setups

What would you like to be added:
The ability to create a MySQL installation that leverages either native or Galera-based replication

Why is this needed:
Database replication is a key requirement for high-availability and workload partitioning (read/write nodes, etc).

If we are not going to provide an easy way to leverage native MySQL HA features, we should perhaps explore performance tests on how auto-replacement of db-nodes works vs. something like haproxy + galera.

Kafka Package Verification Warnings

go run cmd/kubectl-kudo/main.go package verify ../operators/repository/kafka/operator/
Warnings
parameter "SSL_ENDPOINT_IDENTIFICATION_ENABLED" defined but not used.
parameter "CUSTOM_SERVER_PROPERTIES" defined but not used.
package is valid

RabbitMQ Package Failure

go run cmd/kubectl-kudo/main.go package create --destination ~/repo ../operators/repository/rabbitmq/operator/
Errors
plan "not-allowed" used in parameter "DISK_SIZE" is not defined
Error: found 1 package verification errors
exit status 255

Use default instance names in operator documentation

The current documentation sometimes assumes default instance names, and sometimes overrides the default, so commands in the docs don't work when copied verbatim.

For example the installation docs for Kafka override the names with --instance, but Kafka tries to connect to a ZooKeeper named zookeeper-instance, so it never comes up.

In another example, the docs for scaling Kafka sometimes assume the default instance name, and sometimes it is overridden.

My recommendation is to use the default instance names in the docs to keep things simple, and to assume default instance names in parameters such as the ZooKeeper host names. I'm happy to make a pass if people agree, but I wanted to post this first to see if work is already underway.
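A sketch of the recommendation: install both operators with their default instance names so cross-references (such as Kafka's default ZooKeeper host) line up without extra flags. These commands assume a cluster with KUDO installed:

```shell
# Install with default instance names; Kafka's default ZooKeeper host
# then resolves without any --instance overrides in the docs.
kubectl kudo install zookeeper
kubectl kudo install kafka
```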

Flink Demo Package Verify Error

go run cmd/kubectl-kudo/main.go package verify ../operators/repository/flink/docs/demo/financial-fraud/demo-operator/
Errors
parameter "download_url" in template uploader.yaml is not defined
Error: package verification errors: 1
exit status 255

MySQL example should allow for / provide for the use of xtrabackup

What would you like to be added:
The MySQL example should allow for / provide for the ability to use xtrabackup as the backup solution instead of mysqldump

Why is this needed:
Once a database grows beyond a certain size, xtrabackup becomes a more useful / performant backup and restore option. mysqldump remains a useful tool and should be readily available, but hot backups are very much a requirement for production workloads.

Change uses of .k8s.io to .kudo.dev

The .k8s.io suffix is reserved for use by the Kubernetes project. Please change your API groups, labels, and annotations to use a domain that you own, such as .kudo.dev.

Work is underway to prevent accidental misuse in the apiserver:
kubernetes/enhancements#1111

We are also adding more easily discoverable documentation regarding this.

Thanks.

MySQL Example not working

When trying to follow https://github.com/maestrosdk/maestro/blob/master/docs/Backups.md and run

maestro-demo $ kubectl apply -f mysql.yaml
framework.maestro.k8s.io/mysql created
frameworkversion.maestro.k8s.io/mysql-57 created
instance.maestro.k8s.io/mysql created

I get the following error message:

maestro-demo $ kubectl logs mysql-mysql-77fcd5d94f-bvqmx
Initializing database
2019-01-29T05:54:00.352589Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-01-29T05:54:00.353999Z 0 [ERROR] --initialize specified but the data directory has files in it. Aborting.
2019-01-29T05:54:00.354032Z 0 [ERROR] Aborting

Log:

mysql-mysql-77fcd5d94f-bvqmx   0/1   Pending   0     0s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Pending   0     0s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Pending   0     1s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Pending   0     3s
mysql-mysql-77fcd5d94f-bvqmx   0/1   ContainerCreating   0     3s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Error   0     14s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Error   1     15s
mysql-mysql-77fcd5d94f-bvqmx   0/1   CrashLoopBackOff   1     16s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Error   2     30s
mysql-mysql-77fcd5d94f-bvqmx   0/1   CrashLoopBackOff   2     38s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Error   3     54s
mysql-mysql-77fcd5d94f-bvqmx   0/1   CrashLoopBackOff   3     58s
mysql-mysql-77fcd5d94f-bvqmx   0/1   Error   4     104s
mysql-mysql-77fcd5d94f-bvqmx   0/1   CrashLoopBackOff   4     105s

advertised.listeners does not include EXTERNAL_INGRESS when a Kafka instance is exposed with an external NodePort service

Hi guys,

Referring to the doc https://github.com/kudobuilder/operators/blob/master/repository/kafka/docs/latest/external-access.md, I installed a Kafka instance with an external NodePort-type service exposed in my k8s cluster.

-p EXTERNAL_ADVERTISED_LISTENER=true \
-p EXTERNAL_ADVERTISED_LISTENER_TYPE=NodePort \

I exec'd into the Kafka pod and checked server.properties, as below:

kafka@kfk-x102-kafka-0:/opt/kafka$ cat server.properties | grep listen
listeners=INTERNAL://0.0.0.0:9093,CLIENT://0.0.0.0:9092
advertised.listeners=INTERNAL://kfk-x102-kafka-0.kfk-x102-svc.x102.svc.cluster.local:9093,CLIENT://kfk-x102-kafka-0.kfk-x102-svc.x102.svc.cluster.local:9092
listener.security.protocol.map=INTERNAL:PLAINTEXT,CLIENT:PLAINTEXT
inter.broker.listener.name=INTERNAL

Expected:
EXTERNAL_INGRESS should be configured as below (example from the above reference):

advertised.listeners=INTERNAL://kafka-kafka-2.kafka-svc.default.svc.cluster.local:9093,EXTERNAL_INGRESS://34.214.27.71:30904

Is there any mistake or misconfiguration on my part? Could you please help?
Thanks a lot!

Elastic operator does not work with current KUDO

Noticed when restructuring docs in #226.

This might be due to the fact that KUDO 0.10.x no longer prefixes pod names with instance name.

[root@master-0 elasticsearch]# curl coordinator-0.coordinator-hs:9200/_cluster/health?pretty
curl: (6) Could not resolve host: coordinator-0.coordinator-hs; Unknown error
[root@master-0 elasticsearch]# curl coordinator-hs:9200/_cluster/health?pretty
{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
[root@master-0 elasticsearch]#

Demo in blog "Running Apache Flink on Kubernetes with KUDO" did not work

I'm following the steps in the Flink blog post Running Apache Flink on Kubernetes with KUDO. After executing kubectl kudo plan status --instance flink-demo3, the output says "No plan ever run for instance - nothing to show for instance flink-demo3".

I installed from the local YAML using kubectl kudo install repository/flink/docs/demo/financial-fraud/demo-operator --instance flink-demo3, and the command output:

repository/flink/docs/demo/financial-fraud/demo-operator is a local file package
instance.kudo.dev/v1beta1/flink-demo3 created

Can anybody help me with this?

kubectl version

Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:49Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}

minikube version

minikube version: v1.6.1
commit: 42a9df4854dcea40ec187b6b8f9a910c6038f81a

kubectl kudo version

KUDO Version: version.Info{GitVersion:"0.9.0", GitCommit:"9981ed2b", BuildDate:"2019-12-17T17:27:52Z", GoVersion:"go1.13", Compiler:"gc", Platform:"darwin/amd64"}

e2e test fail with kind when operator is using pipe tasks

This PR #201 is failing with

        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:26 +0000 UTC	Normal	Pulling	Pulling image "mesosphere/kafka:1.1.0-2.4.0"
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:26 +0000 UTC	Normal	Killing	Stopping container kubernetes-zookeeper
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:37 +0000 UTC	Normal	Pulled	Successfully pulled image "mesosphere/kafka:1.1.0-2.4.0"
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:37 +0000 UTC	Normal	Created	Created container init
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:37 +0000 UTC	Normal	Started	Started container init
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:41 +0000 UTC	Normal	Pulling	Pulling image "busybox"
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:41 +0000 UTC	Normal	Pulled	Successfully pulled image "busybox"
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:41 +0000 UTC	Normal	Created	Created container waiter
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:41 +0000 UTC	Normal	Started	Started container waiter
        logger.go:37: 08:05:29 | kafka-upgrade-test | 2020-01-24 08:00:43 +0000 UTC	Warning	PipeTaskError	Error during execution: fatal error: kudo-test-composed-krill/kafka failed in deploy.deploy-kafka.generate-tls-certificates.generate-tls-certificates: failed to fetch cluster REST config: could not locate a kubeconfig
        logger.go:37: 08:05:29 | kafka-upgrade-test | Deleting namespace: kudo-test-composed-krill

The tests pass when using a live cluster. The KUDO controller is trying to read the kubeconfig from the default location rather than the kubeconfig created by kind.
When exporting KUBECONFIG inside the Makefile, KUDO also fails to read the kubeconfig file.

ZooKeeper fails because of a catch-22: DNS lookups against pods that are not yet running

The bootstrap script creates /conf/zoo.cfg with server.n entries, but the pods are not yet running, so Kubernetes never creates endpoints; the DNS names do not resolve and the pods get stuck in CrashLoopBackOff.

Because ZooKeeper startup can never resolve the other servers, it crash-loops.

It is my understanding that Services/Endpoints are only updated once the Pods are running.

server.1=zookeeper-instance-zookeeper-0.zookeeper-instance-hs.default.svc.cluster.local:2888:3888
server.2=zookeeper-instance-zookeeper-1.zookeeper-instance-hs.default.svc.cluster.local:2888:3888
server.3=zookeeper-instance-zookeeper-2.zookeeper-instance-hs.default.svc.cluster.local:2888:3888
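One common Kubernetes-level way out of this chicken-and-egg situation, offered here as a sketch and not necessarily what the operator does, is to publish DNS records for pods before they pass readiness, via the headless Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-instance-hs
spec:
  clusterIP: None                  # headless service backing the per-pod DNS names
  publishNotReadyAddresses: true   # create DNS records even for not-ready pods
  selector:
    app: zookeeper
    instance: zookeeper-instance
  ports:
    - name: server
      port: 2888
    - name: leader-election
      port: 3888
```

With publishNotReadyAddresses set, the server.n hostnames resolve as soon as the pods exist, letting the quorum form instead of crash-looping on DNS failures.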

ZooKeeper pods are stuck

❯ k get pods
NAME READY STATUS RESTARTS AGE
nginx-deployment-77cfbb848f-8qdh8 1/1 Running 3 27h
nginx-deployment-77cfbb848f-t7gtj 1/1 Running 3 27h
zookeeper-instance-zookeeper-0 0/1 CrashLoopBackOff 6 14m
zookeeper-instance-zookeeper-2 0/1 CrashLoopBackOff 6 14m
zookeeper-instance-zookeeper-1 0/1 CrashLoopBackOff 6 14m

Logs for pod
❯ k logs zookeeper-instance-zookeeper-0 | more
Zookeeper configuration...

clientPort=2181
dataDir=/var/lib/zookeeper/data
dataLogDir=/logs
tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=60
minSessionTimeout=4000
maxSessionTimeout=40000
autopurge.snapRetainCount=3
autopurge.purgeInteval=12

server.1=zookeeper-instance-zookeeper-0.zookeeper-instance-hs.default.svc.cluster.local:2888:3888
server.2=zookeeper-instance-zookeeper-1.zookeeper-instance-hs.default.svc.cluster.local:2888:3888
server.3=zookeeper-instance-zookeeper-2.zookeeper-instance-hs.default.svc.cluster.local:2888:3888
Creating ZooKeeper log4j configuration
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2020-11-19 15:46:30,107 [myid:] - INFO [main:QuorumPeerConfig@133] - Reading configuration from: /conf/zoo.cfg
2020-11-19 15:46:30,112 [myid:] - INFO [main:QuorumPeerConfig@385] - clientPortAddress is 0.0.0.0/0.0.0.0:2181
2020-11-19 15:46:30,112 [myid:] - INFO [main:QuorumPeerConfig@389] - secureClientPort is not set
2020-11-19 15:46:30,125 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2020-11-19 15:46:30,126 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2020-11-19 15:46:30,126 [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2020-11-19 15:46:30,127 [myid:1] - INFO [main:ManagedUtil@46] - Log4j found with jmx enabled.
2020-11-19 15:46:30,135 [myid:1] - INFO [main:QuorumPeerMain@141] - Starting quorum peer
2020-11-19 15:46:30,142 [myid:1] - INFO [main:ServerCnxnFactory@135] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2020-11-19 15:46:30,143 [myid:1] - INFO [main:NIOServerCnxnFactory@673] - Configuring NIO connection handler with 10s sessionless connection timeout, 1 selector thread(s), 2 worker threads, and 64 kB direct buffers.
2020-11-19 15:46:30,182 [myid:1] - INFO [main:NIOServerCnxnFactory@686] - binding to port 0.0.0.0/0.0.0.0:2181
2020-11-19 15:46:30,207 [myid:1] - INFO [main:Log@193] - Logging initialized @465ms to org.eclipse.jetty.util.log.Slf4jLog
2020-11-19 15:46:30,387 [myid:1] - WARN [main:ContextHandler@1588] - o.e.j.s.ServletContextHandler@29ca901e{/,null,UNAVAILABLE} contextPath ends with /*
2020-11-19 15:46:30,387 [myid:1] - WARN [main:ContextHandler@1599] - Empty contextPath
2020-11-19 15:46:30,398 [myid:1] - INFO [main:X509Util@79] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2020-11-19 15:46:30,401 [myid:1] - INFO [main:FileTxnSnapLog@103] - zookeeper.snapshot.trust.empty : false
2020-11-19 15:46:30,407 [myid:1] - INFO [main:QuorumPeer@1488] - Local sessions disabled
2020-11-19 15:46:30,407 [myid:1] - INFO [main:QuorumPeer@1499] - Local session upgrading disabled
2020-11-19 15:46:30,407 [myid:1] - INFO [main:QuorumPeer@1466] - tickTime set to 2000
2020-11-19 15:46:30,407 [myid:1] - INFO [main:QuorumPeer@1510] - minSessionTimeout set to 4000
2020-11-19 15:46:30,408 [myid:1] - INFO [main:QuorumPeer@1521] - maxSessionTimeout set to 40000
2020-11-19 15:46:30,408 [myid:1] - INFO [main:QuorumPeer@1536] - initLimit set to 10
2020-11-19 15:46:30,418 [myid:1] - INFO [main:ZKDatabase@117] - zookeeper.snapshotSizeFactor = 0.33
2020-11-19 15:46:30,419 [myid:1] - INFO [main:QuorumPeer@1781] - Using insecure (non-TLS) quorum communication
2020-11-19 15:46:30,419 [myid:1] - INFO [main:QuorumPeer@1787] - Port unification disabled
2020-11-19 15:46:30,420 [myid:1] - INFO [main:QuorumPeer@2154] - QuorumPeer communication is not secured! (SASL auth disabled)
2020-11-19 15:46:30,420 [myid:1] - INFO [main:QuorumPeer@2183] - quorum.cnxn.threads.size set to 20
2020-11-19 15:46:30,421 [myid:1] - INFO [main:FileSnap@83] - Reading snapshot /var/lib/zookeeper/data/version-2/snapshot.0
2020-11-19 15:46:30,481 [myid:1] - INFO [main:Server@370] - jetty-9.4.17.v20190418; built: 2019-04-18T19:45:35.259Z; git: aa1c656c315c011c01e7b21aabb04066635b9f67; jvm 1.8.0_242-b08
2020-11-19 15:46:30,522 [myid:1] - INFO [main:DefaultSessionIdManager@365] - DefaultSessionIdManager workerName=node0
2020-11-19 15:46:30,523 [myid:1] - INFO [main:DefaultSessionIdManager@370] - No SessionScavenger set, using defaults
2020-11-19 15:46:30,525 [myid:1] - INFO [main:HouseKeeper@149] - node0 Scavenging every 660000ms
2020-11-19 15:46:30,535 [myid:1] - INFO [main:ContextHandler@855] - Started o.e.j.s.ServletContextHandler@29ca901e{/,null,AVAILABLE}
2020-11-19 15:46:30,582 [myid:1] - INFO [main:AbstractConnector@292] - Started ServerConnector@58651fd0{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
2020-11-19 15:46:30,582 [myid:1] - INFO [main:Server@410] - Started @846ms
2020-11-19 15:46:30,583 [myid:1] - INFO [main:JettyAdminServer@112] - Started AdminServer on address 0.0.0.0, port 8080 and command URL /commands
2020-11-19 15:46:30,586 [myid:1] - INFO [main:QuorumCnxManager$Listener@861] - Election port bind maximum retries is 3
2020-11-19 15:46:30,587 [myid:1] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@911] - My election bind port: zookeeper-instance-zookeeper-0.zookeeper-instance-hs.default.svc.cluster.local/10.1.190.40:3888
2020-11-19 15:46:30,595 [myid:1] - INFO [QuorumPeermyid=1(secure=disabled):QuorumPeer@1193] - LOOKING
2020-11-19 15:46:30,596 [myid:1] - INFO [QuorumPeermyid=1(secure=disabled):FastLeaderElection@885] - New election. My id = 1, proposed zxid=0x0
2020-11-19 15:46:30,598 [myid:1] - WARN [WorkerSender[myid=1]:QuorumPeer$QuorumServer@196] - Failed to resolve address: zookeeper-instance-zookeeper-1.zookeeper-instance-hs.default.svc.cluster.local
java.net.UnknownHostException: zookeeper-instance-zookeeper-1.zookeeper-instance-hs.default.svc.cluster.local
    at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at java.net.InetAddress.getByName(InetAddress.java:1077)
    at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)
    at org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:764)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:701)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)
    at java.lang.Thread.run(Thread.java:748)

Describe service
k describe services zookeeper-instance-hs
Name: zookeeper-instance-hs
Namespace: default
Labels: app=zookeeper
heritage=kudo
kudo.dev/instance=zookeeper-instance
kudo.dev/operator=zookeeper
zookeeper=zookeeper-instance
Annotations: kudo.dev/last-applied-configuration:
{"kind":"Service","apiVersion":"v1","metadata":{"name":"zookeeper-instance-hs","namespace":"default","creationTimestamp":null,"labels":{"a...
kudo.dev/last-plan-execution-uid: d585c0bb-cf9f-403d-91db-b08ea5f334e9
kudo.dev/phase: zookeeper
kudo.dev/plan: deploy
kudo.dev/step: deploy
Selector: app=zookeeper,instance=zookeeper-instance
Type: ClusterIP
IP: None
Port: server 2888/TCP
TargetPort: 2888/TCP
Endpoints:
Port: leader-election 3888/TCP
TargetPort: 3888/TCP
Endpoints:
Session Affinity: None
Events:

New Repo Structure

What would you like to be added:

I'd like to introduce a new, flatter structure that reflects the changes coming in KUDO 0.2.0, which will break the current structure. Breaking it should be fine as we are still pre-1.0.0.

Proposed structure:

/.github
/.circleci
/docs
/hack
/repository
/repository/index.yaml
/repository/kafka
/repository/kafka/0.2.0
/repository/kafka/0.2.0/kafka-framework.yaml
/repository/kafka/0.2.0/kafka-frameworkversion.yaml
/repository/kafka/0.2.0/kafka-instance.yaml
/repository/kafka/0.2.0/metadata.yaml
/repository/kafka/0.1.0
/repository/kafka/docs
/repository/kafka/docs/README.md
/repository/kafka/docs/Demo.md
/repository/kafka/tests
/repository/kafka/tests/foobar.yaml
/repository/kafka/tests/0.2.0/bar.yaml
/repository/kafka/tests/0.1.0/foo.yaml
/repository/zookeeper
/README.md

Why is this needed:

We need to change this repo structure so it will work with changes coming from KEP-0008, KEP-0009 and KEP-0010. This also relates to:

enable custom TLS for Cassandra

This feature would allow users to enable TLS in Cassandra based on a user-provided certificate.

It should follow the same approach as KUDO Kafka, where users can provide a TLS secret to the operator.

MySQL example documentation does not show results of queries executed

What happened: The MySQL example documentation does not show the results of certain queries, which is necessary to properly demonstrate intended functionality / performance.

What you expected to happen:
That the documentation should include the output of the following commands / queries:
kubectl exec -it $MYSQL_POD -- mysql -ppassword -e "show tables;" kudo
kubectl exec -it $MYSQL_POD -- mysql -ppassword -e "select * from example;" kudo (for pre-deletion, post-deletion, and post-restore)

How to reproduce it (as minimally and precisely as possible):
Examine docs: https://github.com/kudobuilder/kudo/blob/master/docs/examples/backups.md

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Kudo version (use kubectl kudo version):
  • Framework:
  • Frameworkversion:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

NodePort allocation fails due to conflicts when multiple Kafka instances expose external NodePort services

Hi team,
When I use the Kafka operator to install multiple Kafka instances in the same Kubernetes cluster, all of them exposed via external NodePort services, the nodePort assignments may conflict.
e.g.

  • kafka-1 with 3 brokers externally exposed on 3 nodePorts: 30902, 30903, 30904
  • kafka-2 with 3 brokers externally exposed on the same 3 nodePorts: 30902, 30903, 30904

That is a conflicting assignment.

Check the plan status of the instance and you will see an error similar to "Invalid value: 30902: provided port is already allocated".

I know there is an "EXTERNAL_NODE_PORT" parameter to specify the port, but it is only a starting value: the operator derives the remaining ports from the broker count, so it can still land on already-used ports and the Kafka installation fails.

Could you please help? Thanks!
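
The collision is easy to see if you sketch the derivation logic. This is my assumption of what the operator does (one NodePort per broker, counting up from the starting value); two instances installed with the same starting value request identical ports.

```shell
# Assumed derivation: one NodePort per broker, counting up from the
# EXTERNAL_NODE_PORT starting value.
start_port=30902
brokers=3
ports=""
for i in $(seq 0 $((brokers - 1))); do
  ports="$ports $((start_port + i))"
done
echo "requested nodePorts:$ports"
# Two instances run the same computation, so both request 30902 30903 30904.
```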

Determine community operator policy

We have a few operator PRs out right now - #141 and #152 immediately come to mind, that raise some questions.

What is the policy for our primary operator repository on the contents of this repository and what that implies as far as ownership and support? Is the maintainer listed in the operator.yaml the contact for issue reporting and PRs? Is the KUDO team? Someone else? What is allowed into the operators repo? What is not? What gets tested?

Helm has a large chart repo and not all charts are actively maintained. They're even moving to a more federated approach instead of having a single repository of all charts.

This policy needs to account for all operators, including the ones that the KUDO team spends its time on.

Kafka readiness probe timeout is too short

The Kafka readiness probe timeout is set to 1s, which means Kafka is never marked ready in my cluster, as the readiness probe takes 1.5s to respond:

kafka@kafka-kafka-0:/$ time /opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server=localhost:9093

kafka-kafka-0.kafka-svc.default.svc.cluster.local:9093 (id: 0 rack: null) -> (
	Produce(0): 0 to 2 [usable: 2],
	Fetch(1): 0 to 3 [usable: 3],
	Offsets(2): 0 to 1 [usable: 1],
	Metadata(3): 0 to 2 [usable: 2],
	LeaderAndIsr(4): 0 [usable: 0],
	StopReplica(5): 0 [usable: 0],
	UpdateMetadata(6): 0 to 3 [usable: 3],
	ControlledShutdown(7): 1 [usable: 1],
	OffsetCommit(8): 0 to 2 [usable: 2],
	OffsetFetch(9): 0 to 2 [usable: 2],
	GroupCoordinator(10): 0 [usable: 0],
	JoinGroup(11): 0 to 1 [usable: 1],
	Heartbeat(12): 0 [usable: 0],
	LeaveGroup(13): 0 [usable: 0],
	SyncGroup(14): 0 [usable: 0],
	DescribeGroups(15): 0 [usable: 0],
	ListGroups(16): 0 [usable: 0],
	SaslHandshake(17): 0 [usable: 0],
	ApiVersions(18): 0 [usable: 0],
	CreateTopics(19): 0 to 1 [usable: 1],
	DeleteTopics(20): 0 [usable: 0]
)

real	0m1.599s
user	0m0.952s
sys	0m0.216s
kafka@kafka-kafka-0:/$ 

Error:

  Warning  Unhealthy               43s (x61 over 10m)  kubelet, ip-10-0-129-230.us-west-2.compute.internal  Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: timeout 1s exceeded
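
Until the timeout is raised in the operator (ideally as a tunable parameter), one possible workaround is to patch the probe directly. This is a sketch: the StatefulSet name "kafka-kafka" and the container index 0 are assumptions from my cluster, and the operator may revert manual changes on the next plan run.

```shell
# Raise the readiness probe timeout from 1s to 5s on the Kafka StatefulSet.
# "kafka-kafka" and container index 0 are assumptions; adjust for your
# deployment.
kubectl patch statefulset kafka-kafka --type='json' -p='[
  {"op": "replace",
   "path": "/spec/template/spec/containers/0/readinessProbe/timeoutSeconds",
   "value": 5}
]'
```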

run tests for operators which are relevant to the changes in the PR

Right now we run the same tests for any change in the repository, and those tests only cover the ZooKeeper operator. When some other operator's logic changes, these tests are irrelevant. We should only run the tests related to that operator, or no tests if that operator has none. We need to ensure that:

  • CI runs the tests that are relevant to the changes in the PR
  • the PR status provides feedback on which tests are being run
  • we provide documentation/examples on how to add tests for an operator

In all cases, the PR checks should provide information about which tests have been run for a certain PR, so that users can write tests for their own operators that live in this repository.
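
A minimal sketch of how CI could scope tests to the operators actually touched by a PR. The repository/<name>/ layout is an assumption (it matches the structure proposed earlier in this page), and the file list is hardcoded for illustration; in CI it would come from git diff against the base branch.

```shell
# Derive the set of changed operators from the files touched in a PR.
# Hardcoded here for illustration; in CI use:
#   changed=$(git diff --name-only origin/master...HEAD)
changed="repository/kafka/0.2.0/params.yaml
repository/zookeeper/tests/foo.yaml
README.md"

# Keep only paths under repository/<operator>/ and extract the operator name.
operators=$(printf '%s\n' "$changed" | sed -n 's|^repository/\([^/]*\)/.*|\1|p' | sort -u)
echo "$operators"
```

Only the test suites for the operators listed in $operators would then be run; a change touching only README.md would run no operator tests.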
