
kubebrain's Introduction

KubeBrain

English | 中文

Overview

Kubernetes is a distributed application orchestration and scheduling system. It has become the de facto standard base for cloud-native applications, but its officially documented stable operating scale is limited to 5,000 nodes. This is sufficient for most scenarios, but still falls short for deployments with millions of machine nodes. As "digitalization" and "cloud native" adoption grow, global IT infrastructure will keep expanding at an accelerating rate. A distributed application orchestration and scheduling system can adapt to this trend in two ways:

  • Horizontal scaling: building the ability to manage N clusters.
  • Vertical scaling: increasing the size of individual clusters. To scale a single cluster, the storage of meta/state information is one of the core bottlenecks, and this project aims to solve the scalability and performance problems of cluster state storage.

We investigated several existing distributed storage systems, analyzed the performance of etcd, and studied how Kubernetes uses the storage interface for state information. Inspired by the kine project, we implemented KubeBrain as the core service for Kubernetes state information storage.

Features

  • Stateless: KubeBrain implements the storage server interface required by the API Server. It only translates between that interface and the storage engine and does not persist data itself: the actual metadata lives in the underlying storage engine, and the data the API Server watches is held in the memory of the master node.
  • Extensibility: KubeBrain abstracts a key-value database interface and implements the storage interface required by the API Server on top of it. Any key-value database with the required characteristics can be adapted to the storage interface.
  • High availability: KubeBrain currently uses a master-slave architecture. The master node supports all operations, including conditional updates, reads, and event watching; slave nodes support read operations. The master is elected automatically via Kubernetes' "leaderelection" mechanism to achieve high availability.
  • Horizontal scaling: In a production environment, KubeBrain usually uses a distributed key-value database to store data. Horizontal scaling involves two levels:
    • At the KubeBrain level, concurrent read performance can be improved by adding slave nodes;
    • At the storage engine level, read and write performance and storage capacity can be improved by adding storage nodes.
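The storage-engine abstraction described above can be pictured as a small key-value contract. A minimal sketch, assuming illustrative names (this is not KubeBrain's actual interface): point reads, writes, and an atomic conditional update, which the adapter layer needs in order to offer etcd-style transactions to the API Server.

```go
package main

import (
	"fmt"
	"sync"
)

// KvStorage is an illustrative subset of the operations a backend must
// provide so KubeBrain can build etcd-style semantics on top of it.
type KvStorage interface {
	Get(key string) (value string, ok bool)
	Put(key, value string)
	// CompareAndSwap writes newValue only if the current value equals expect.
	CompareAndSwap(key, expect, newValue string) bool
}

// memKV is a toy in-memory implementation used only to exercise the interface.
type memKV struct {
	mu   sync.Mutex
	data map[string]string
}

func newMemKV() *memKV { return &memKV{data: map[string]string{}} }

func (m *memKV) Get(key string) (string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	v, ok := m.data[key]
	return v, ok
}

func (m *memKV) Put(key, value string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
}

func (m *memKV) CompareAndSwap(key, expect, newValue string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.data[key] != expect {
		return false
	}
	m.data[key] = newValue
	return true
}

func main() {
	var s KvStorage = newMemKV()
	s.Put("/registry/pods/default/nginx", "v1")
	fmt.Println(s.CompareAndSwap("/registry/pods/default/nginx", "v1", "v2")) // true
	fmt.Println(s.CompareAndSwap("/registry/pods/default/nginx", "v1", "v3")) // false: value is now "v2"
}
```

Any backend that can honor the conditional-update semantics (badger, TiKV, an in-memory skiplist) can then be plugged in behind the same interface.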

Detailed Documentation

TODO

  • Guarantee consistency in critical cases
  • Optimize storage engine interface
  • Optimize unit test code, add use cases and error injection
  • Jepsen Test
  • Implement Proxy to make it more scalable

Contribution

Please check Contributing for more details.

Code of Conduct

Please check Code of Conduct for more details.

Community

License

This project is licensed under the Apache-2.0 License.

kubebrain's People

Contributors

c4pt0r, charleszheng44, divanodestiny, lenage, siddontang, sky-big, strrl, tangmengqiu, xuchen-xiaoying


kubebrain's Issues

Quick Start Error

What happened?

make badger
./bin/kube-brain --key-prefix "/"
Is there a problem with this startup command example? Running it as-is reports an error, and I couldn't find detailed documentation. Looking at the code, this startup command can never bring the program up. Could it be replaced with a command that actually runs and demonstrates the project?

func (o *KubeBrainOption) Validate() error {
	// With this check, the program never starts if key-prefix ends with "/",
	// and the example startup command uses exactly that prefix.
	if strings.HasSuffix(o.Prefix, "/") {
		return fmt.Errorf("prefix %s is invalid, make sure it has no / suffix", o.Prefix)
	}
	return o.storageConfig.validate()
}
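The rejection can be reproduced in isolation. A sketch of the same check, where `validatePrefix` is a hypothetical stand-in for the `Validate` method above: a prefix ending in `/` is rejected, so a prefix without the trailing slash (e.g. `/registry`) should pass.

```go
package main

import (
	"fmt"
	"strings"
)

// validatePrefix mirrors the check in KubeBrainOption.Validate: a key
// prefix ending in "/" is rejected, which is why `--key-prefix "/"`
// from the quick start never starts.
func validatePrefix(prefix string) error {
	if strings.HasSuffix(prefix, "/") {
		return fmt.Errorf("prefix %s is invalid, make sure it has no / suffix", prefix)
	}
	return nil
}

func main() {
	fmt.Println(validatePrefix("/"))         // rejected, as in the quick start example
	fmt.Println(validatePrefix("/registry")) // nil: no trailing slash, passes validation
}
```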

What did you expect to happen?

no

How can we reproduce it (as minimally and precisely as possible)?

no

Software version

$ <software> version
# paste output here

Large numbers of delete and update failures during updates

What happened?

During updates, we encounter large numbers of delete failures and update failures:
I0428 17:48:58.883419 3007632 kv.go:139] "txn failed" op="update" key="/registry/leases/kube-node-lease/kwok-node-3901"

I0426 18:29:10.242165 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766090 actualRev=449345926506766094
I0426 18:29:50.393991 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766236 actualRev=449345926506766239
I0426 18:30:30.558691 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766380 actualRev=449345926506766383
I0426 18:31:10.721504 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766524 actualRev=449345926506766527
I0426 18:31:50.889791 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766662 actualRev=449345926506766666
I0426 18:32:31.060937 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766807 actualRev=449345926506766810
I0426 18:33:11.225357 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766950 actualRev=449345926506766953
I0426 18:33:51.393198 3007632 txn.go:164] "delete cas failed" expectRev=449345926506767095 actualRev=449345926506767098
I0426 18:34:31.549499 3007632 txn.go:164] "delete cas failed" expectRev=449345926506767238 actualRev=449345926506767240
I0426 18:35:11.705058 3007632 txn.go:164] "delete cas failed" expectRev=449345926506767380 actualRev=449345926506767384

Deleting pods is also very slow. What is the cause?
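The `delete cas failed` lines indicate an optimistic-concurrency conflict: the delete transaction expects the key to still be at the revision it last observed, but a concurrent write has bumped the revision, so the compare-and-swap is rejected. A toy sketch of that check (illustrative only, not KubeBrain's actual transaction code):

```go
package main

import "fmt"

// deleteCAS models the revision check behind the "delete cas failed" log:
// the delete only proceeds if the key's current revision matches the
// revision the caller observed; any concurrent update invalidates it.
func deleteCAS(current map[string]uint64, key string, expectRev uint64) bool {
	actualRev, ok := current[key]
	if !ok || actualRev != expectRev {
		fmt.Printf("delete cas failed expectRev=%d actualRev=%d\n", expectRev, actualRev)
		return false
	}
	delete(current, key)
	return true
}

func main() {
	revs := map[string]uint64{"/registry/leases/kube-node-lease/n1": 100}
	// A concurrent update (e.g. a lease renewal) bumps the revision
	// before our delete lands, so the first attempt fails.
	revs["/registry/leases/kube-node-lease/n1"] = 104
	fmt.Println(deleteCAS(revs, "/registry/leases/kube-node-lease/n1", 100)) // false: stale revision
	fmt.Println(deleteCAS(revs, "/registry/leases/kube-node-lease/n1", 104)) // true: matches current revision
}
```

Under heavy churn on hot keys (node-lease renewals are a typical case), each failed CAS forces a re-read and retry, which would explain both the log volume and the slow deletes.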

What did you expect to happen?

How can this be resolved?

How can we reproduce it (as minimally and precisely as possible)?

Very easy to reproduce.

Software version

$ <software> version
# paste output here

[Bug] Delete resource error

What happened?

Commands:

kubectl create deployment nginx --image=nginx
kubectl delete deployment nginx

Error:

Error from server: invalid DeleteRange response: [response_range:<header:<revision:84 > kvs:<key:"/registry/deployments/default/nginx" mod_revision:81 value:"k8s\000\n\025\n\007apps/v1\022\nDeployment\022\374\007\n\371\005\n\005nginx\022\000\032\007default\"\000*$c876d2c2-0701-464a-83a9-942161d4e3a52\0008\001B\010\010\264\371\345\247\006\020\000Z\014\n\003app\022\005nginx\212\001\237\005\n\016kubectl-create\022\006Update\032\007apps/v1\"\010\010\264\371\345\247\006\020\0002\010FieldsV1:\345\004\n\342\004{\"f:metadata\":{\"f:labels\":{\".\":{},\"f:app\":{}}},\"f:spec\":{\"f:progressDeadlineSeconds\":{},\"f:replicas\":{},\"f:revisionHistoryLimit\":{},\"f:selector\":{},\"f:strategy\":{\"f:rollingUpdate\":{\".\":{},\"f:maxSurge\":{},\"f:maxUnavailable\":{}},\"f:type\":{}},\"f:template\":{\"f:metadata\":{\"f:labels\":{\".\":{},\"f:app\":{}}},\"f:spec\":{\"f:containers\":{\"k:{\\\"name\\\":\\\"nginx\\\"}\":{\".\":{},\"f:image\":{},\"f:imagePullPolicy\":{},\"f:name\":{},\"f:resources\":{},\"f:terminationMessagePath\":{},\"f:terminationMessagePolicy\":{}}},\"f:dnsPolicy\":{},\"f:restartPolicy\":{},\"f:schedulerName\":{},\"f:securityContext\":{},\"f:terminationGracePeriodSeconds\":{}}}}}B\000\022\357\001\010\001\022\016\n\014\n\003app\022\005nginx\032\250\001\n\036\n\000\022\000\032\000\"\000*\0002\0008\000B\000Z\014\n\003app\022\005nginx\022\205\001\022@\n\005nginx\022\005nginx*\000B\000j\024/dev/termination-logr\006Always\200\001\000\210\001\000\220\001\000\242\001\004File\032\006Always \0362\014ClusterFirstB\000J\000R\000X\000`\000h\000r\000\202\001\000\212\001\000\232\001\021default-scheduler\302\001\000\"'\n\rRollingUpdate\022\026\n\t\010\001\020\000\032\00325%\022\t\010\001\020\000\032\00325%(\0000\n8\000H\330\004\032\014\010\000\020\000\030\000 \000(\0008\000\032\000\"\000" > > ]

What did you expect to happen?

Delete resources without error

How can we reproduce it (as minimally and precisely as possible)?

STEP1: build kube-brain with `make badger`, then start it with `./bin/kube-brain --compatible-with-etcd`
STEP2: start kube-apiserver

# build kube-apiserver, in kubernetes project path
go build ./cmd/kube-apiserver

# gen cert and start kube-apiserver, the '--etcd-servers' parameter points to kube-brain
./kube-apiserver --etcd-servers=http://127.0.0.1:2379 \
    --service-account-signing-key-file=cert/service-account-key.pem \
    --service-account-issuer=https://127.0.0.1:6443 \
    --service-account-key-file=cert/ca-key.pem \
    --token-auth-file=cert/token.csv \
    --tls-cert-file=cert/kubernetes.pem \
    --tls-private-key-file=cert/kubernetes-key.pem \
    --client-ca-file=cert/ca.pem

STEP3: delete resource

kubectl create deployment nginx --image=nginx
kubectl delete deployment nginx

Error from server: invalid DeleteRange response: [response_range:<header:<revision:675 > kvs:<key:"/registry/deployments/default/nginx" mod_revision:674 value:"k8s\000\n\025\n\007apps/v1\022\nDeployment\022\374\007\n\371\005\n\005nginx\022\000\032\007default\"\000*$431b9682-ad70-40cc-b921-636e7a99c43b2\0008\001B\010\010\212\375\345\247\006\020\000Z\014\n\003app\022\005nginx\212\001\237\005\n\016kubectl-create\022\006Update\032\007apps/v1\"\010\010\212\375\345\247\006\020\0002\010FieldsV1:\345\004\n\342\004{\"f:metadata\":{\"f:labels\":{\".\":{},\"f:app\":{}}},\"f:spec\":{\"f:progressDeadlineSeconds\":{},\"f:replicas\":{},\"f:revisionHistoryLimit\":{},\"f:selector\":{},\"f:strategy\":{\"f:rollingUpdate\":{\".\":{},\"f:maxSurge\":{},\"f:maxUnavailable\":{}},\"f:type\":{}},\"f:template\":{\"f:metadata\":{\"f:labels\":{\".\":{},\"f:app\":{}}},\"f:spec\":{\"f:containers\":{\"k:{\\\"name\\\":\\\"nginx\\\"}\":{\".\":{},\"f:image\":{},\"f:imagePullPolicy\":{},\"f:name\":{},\"f:resources\":{},\"f:terminationMessagePath\":{},\"f:terminationMessagePolicy\":{}}},\"f:dnsPolicy\":{},\"f:restartPolicy\":{},\"f:schedulerName\":{},\"f:securityContext\":{},\"f:terminationGracePeriodSeconds\":{}}}}}B\000\022\357\001\010\001\022\016\n\014\n\003app\022\005nginx\032\250\001\n\036\n\000\022\000\032\000\"\000*\0002\0008\000B\000Z\014\n\003app\022\005nginx\022\205\001\022@\n\005nginx\022\005nginx*\000B\000j\024/dev/termination-logr\006Always\200\001\000\210\001\000\220\001\000\242\001\004File\032\006Always \0362\014ClusterFirstB\000J\000R\000X\000`\000h\000r\000\202\001\000\212\001\000\232\001\021default-scheduler\302\001\000\"'\n\rRollingUpdate\022\026\n\t\010\001\020\000\032\00325%\022\t\010\001\020\000\032\00325%(\0000\n8\000H\330\004\032\014\010\000\020\000\030\000 \000(\0008\000\032\000\"\000" > > ]

Software version

branch: main
commit ID: 0cc3be740589fd51db3367c7037669602814e120

Provide more detail on reproducing the benchmark result

What would you like to be added?

Which etcd version is the benchmark using, 3.5.x or 3.4.x?
go-ycsb doesn't support the etcd protocol on its master branch, so is the benchmark using a non-release version?
Is there any TiKV tuning, or is it all default settings?
What version of TiKV is used in the benchmark report?

Why is this needed?

We need to reproduce the benchmark result to decide whether or not to replace etcd with kubebrain.

Question: more other backends support, and native watch support

I noticed that kubebrain already supported these storage "backend":

  • badger
  • in-memory memkv / skiplist
  • tikv

Do we have plans to support other KV storage backends such as Redis, MongoDB, Aerospike, and so on?

Another question: many KV databases already support a WATCH API as a basic operation. Should we consider supporting such native watch APIs?

How about pruning the dependency on `k8s.io/kubernetes`

What would you like to be added?

I found that the only usage of `k8s.io/kubernetes` is here:

utilflag.PrintFlags(cmd.Flags())

I am very pleased to contribute if my proposal is accepted.

Why is this needed?

  • It would greatly reduce the time needed to prepare the dev environment and to build kubebrain.
  • It would remove the concern about which version of `k8s.io/kubernetes` to use, along with many related Go module `replace` issues.

You know, the much better dev experience. :P

About the project

What happened?

  • Is there a WeChat group? Email cannot be delivered; a group would be more convenient.
  • Will there be a tool for migrating from etcd to kubebrain? A cluster may not be that large at the start, and migration may only become necessary as it gradually scales up.
  • Any plans to become a CNCF project later?

What did you expect to happen?

  • Provide a WeChat group or another IM group?
  • Provide a migration tool.

How can we reproduce it (as minimally and precisely as possible)?

Just some small suggestions.

Software version

No response

Question: A little confusion about the stateless, availability and scalability.

I fully agree that KubeBrain is a GREAT project, and thank you for open-sourcing it! 🎉🎉🎉

I also considered how to make Kubernetes run on TiKV, and make the adapter layer efficient and flexible. I have some questions about the design and implementation of kubebrain. Please let me know what I missed! ❤️

It seems kubebrain uses a “master-slave” architecture, and the “watch routine” only runs on the leader. So when using kubebrain with the official kube-apiserver, I can only configure the leader’s IP as the etcd server. But leadership can change, while the configuration on kube-apiserver is not easy to change. So currently, I think there is no easy way to use a multi-instance kubebrain with the official kube-apiserver.

And I noticed that a proxy for kubebrain is on the ROADMAP. Once the proxy behaves the way etcd does, I think this problem will be resolved. :)

Or are there other suggested ways to set up a multi-instance kubebrain with the official kube-apiserver?

Another issue with a “single master” is scalability: the performance of a master-slave architecture is limited by the power of the single master node. Is there a plan or idea for migrating to a “multi-master” design?
