gocrane / crane

Crane is a FinOps platform for cloud resource analytics and economics in Kubernetes clusters. The goal is not only to help users manage cloud costs more easily but also to ensure the quality of applications.

Home Page: https://gocrane.io

License: Apache License 2.0

Dockerfile 0.09% Makefile 0.49% Go 80.14% Shell 0.37% JavaScript 0.36% HTML 0.48% TypeScript 16.87% Less 1.14% SCSS 0.07%
kubernetes finops cloud-computing cloud-native cost-optimization autoscaling prediction analytics monitoring time-series

crane's People

Contributors

2456868764, borgerli, chenkaiyue, garrybest, hank6086, janeconan, jxs1211, kitianfresh, lbbniu, leeweir, liu-song, mangogoforward, mfanjie, michaelcheungdk, mtdtdev, payall4u, pmcfizz, qingtiantongxie, qmhu, reganyue, saikey0379, shijieqin, szy441687879, whitebear009, xieydd, xrmzju, yan234280533, yufeiyu, zouyee, zsnmwy

crane's Issues

document for contributor guide

Describe the feature

@mfanjie, I can't find a contributor guide in the README file. Should we add one? If that is OK, please assign this issue to me and I will try to add the document.

Thanks.

Unit tests for crane

We need unit test (UT) code for many functions, and many of them are isolated enough to be picked up by newcomers.
For example, match() in pkg/controller/analytics/analytics_controller.go. Help needed.

refactor crane agent

Describe the feature

The main crane agent framework has been merged, and we found some areas that can be enhanced. This ticket tracks the refactoring effort.

Default PromQL query syntax should not use regular expressions

Describe the bug
NodeCpuUsagePromQLFmtStr = sum(count(node_cpu_seconds_total{mode="idle",instance="%s"}) by (mode, cpu)) - sum(irate(node_cpu_seconds_total{mode="idle",instance="%s"}[%s]))
NodeMemUsagePromQLFmtStr = sum(node_memory_MemTotal_bytes{instance="%s"} - node_memory_MemAvailable_bytes{instance="%s"})
The two default queries above use a regular expression, which can cause the query result not to match what is expected.
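
A minimal sketch of why a regex matcher on the instance label can over-match. It assumes the regex form is `instance=~"%s.*"`, as it appears in another issue on this page, and it emulates Prometheus's fully anchored regex matching with Go's `regexp` package. The function name `matchesInstance` and the addresses are hypothetical and not taken from crane.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesInstance reports whether a Prometheus-style regex matcher
// (instance=~pattern) would select the given label value.
// Prometheus anchors regex matchers at both ends, which is emulated here.
func matchesInstance(pattern, value string) (bool, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false, err
	}
	return re.MatchString(value), nil
}

func main() {
	node := "10.0.0.1" // hypothetical node address
	pattern := node + ".*"

	// The second instance belongs to a different node but is still selected,
	// because "." and ".*" are regex metacharacters.
	for _, instance := range []string{"10.0.0.1:9100", "10.0.0.10:9100", "10.0.0.2:9100"} {
		ok, err := matchesInstance(pattern, instance)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-16s matched=%v\n", instance, ok)
	}
}
```

An exact matcher (`instance="..."`) avoids this, at the cost of requiring the caller to know the full label value, including any port.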
Reproduce steps

Expected behavior

Screenshots

Environment (please complete the following information):

  • K8S Version: [e.g. 1.19]
  • Crane Version: [e.g. 0.1.0]
  • Browser [e.g. chrome, safari]

how to use

Describe the feature

Do you have usage instructions?

NodeResourceController should merge real-time data from nodes to compute ext resources

Describe the feature

The current NodeResourceController calculates a Kubernetes node's ext resource only from TSP prediction data and does not update it when TSP has no data. However, in some cases TSP cannot calculate the data, and TSP is not sensitive to bursts, so we need to merge real-time data from Kubernetes nodes to help NodeResourceController calculate the ext resource.

  1. To merge real-time data into the calculation, the pieces first have to be put together: the logic of NodeResourceController is implemented in crane-agent.
  2. Crane-agent uses timeSeriesPredictionInformer to detect changes in TSP and notify NodeResourceManager. NodeResourceManager collects data from the other collectors (including the real-time data collectors), merges it with the TSP data, and takes the maximum value from the merged data to calculate the ext resource.
  3. To guard against an abnormal TSP controller leaving TSP without updates for a long time, NodeResourceManager's real-time data Collector notifies NodeResourceManager periodically. NodeResourceManager then collects and merges the data from the other Collectors (including the TSP Collector) and takes the maximum value from the merged data to calculate the ext resource.
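
The merge step described above could look roughly like the sketch below. The type `ResourceSample`, the functions `mergeMax` and `extResource`, and the rule "ext resource = capacity minus peak usage" are all assumptions made for illustration; the issue only states that the maximum of the merged data is used.

```go
package main

import "fmt"

// ResourceSample is a hypothetical type: usage per resource name,
// for example "cpu" in cores.
type ResourceSample map[string]float64

// mergeMax merges samples from several collectors (TSP, real-time, ...)
// and keeps the largest value seen for each resource.
func mergeMax(samples ...ResourceSample) ResourceSample {
	merged := ResourceSample{}
	for _, s := range samples {
		for name, v := range s {
			if cur, ok := merged[name]; !ok || v > cur {
				merged[name] = v
			}
		}
	}
	return merged
}

// extResource is an assumed rule: capacity minus peak usage, floored at zero.
func extResource(capacity, peak float64) float64 {
	if peak >= capacity {
		return 0
	}
	return capacity - peak
}

func main() {
	tsp := ResourceSample{"cpu": 2.0}      // prediction data, may be stale or empty
	realtime := ResourceSample{"cpu": 3.5} // burst seen by a real-time collector

	merged := mergeMax(tsp, realtime)
	fmt.Println("peak cpu:", merged["cpu"])
	fmt.Println("ext cpu on an 8-core node:", extResource(8, merged["cpu"]))
}
```

Taking the maximum is the conservative choice: a burst seen in real time lowers the ext resource even when the prediction has not caught up, and an empty TSP sample simply contributes nothing.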

Support cron for ehpa

Describe the feature

There is no cron-based way to scale with ehpa. Support it by way of an external metric.

modify the names of node qos ensurance examples

What version of Crane?
release 0.1.0

Describe the bug
The names of the node qos ensurance examples are not suitable.

To Reproduce
not related

Expected behavior
Replace the example names with more suitable ones.

Screenshots
not related

Environment (please complete the following information):

  • K8S Version: ALL
  • Crane Version: ALL
  • Browser ALL

Support checkpoint for percentile algorithm predictor

Describe the feature

  1. Currently there is no checkpoint for each time series in the percentile algorithm. We could describe a behavior in the evpa CRD to support restoring the algorithm model from Prometheus history data or from a checkpoint store.

Rename service for craned

Describe the feature

Rename the service for craned. The current service name is webhook-service; we need to change it to craned to support further requirements when using the craned service.

Unregister a query when related crd is in deletion.

Describe the bug
Currently, when a user deletes a TSP or recommendation, the prediction core keeps the query registered and continues computing in the background. We need to release it during the related CRD's deletion.

Reproduce steps

Expected behavior

Controllers should be able to unregister their queries.
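
One possible shape for this is a reference-counted registry, sketched below. It is not crane's prediction core: `queryRegistry` and its methods are hypothetical names, and the sketch assumes several CRDs may share the same query string.

```go
package main

import (
	"fmt"
	"sync"
)

// queryRegistry is a hypothetical reference-counted set of registered queries.
type queryRegistry struct {
	mu   sync.Mutex
	refs map[string]int
}

func newQueryRegistry() *queryRegistry {
	return &queryRegistry{refs: map[string]int{}}
}

// Register records one more user of the query.
func (r *queryRegistry) Register(query string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refs[query]++
}

// Unregister drops one user of the query. It returns true when no users
// remain, which is the point where background computation can stop.
func (r *queryRegistry) Unregister(query string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	n, ok := r.refs[query]
	if !ok {
		return false
	}
	if n <= 1 {
		delete(r.refs, query)
		return true
	}
	r.refs[query] = n - 1
	return false
}

// Active returns the number of distinct queries still registered.
func (r *queryRegistry) Active() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.refs)
}

func main() {
	reg := newQueryRegistry()
	q := `sum(node_memory_MemTotal_bytes{instance="node-1"})` // illustrative query
	reg.Register(q)                                           // e.g. from a TSP
	reg.Register(q)                                           // e.g. from a recommendation
	fmt.Println("released:", reg.Unregister(q), "active:", reg.Active())
	fmt.Println("released:", reg.Unregister(q), "active:", reg.Active())
}
```

A controller would call `Unregister` from its deletion path, for example when handling a finalizer, and stop the computation when it returns true.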

Screenshots

Environment (please complete the following information):

  • K8S Version: [e.g. 1.19]
  • Crane Version: [e.g. 0.1.0]
  • Browser [e.g. chrome, safari]

Propagation labels and annotations from ehpa to hpa

Describe the feature

We need to propagate labels and annotations from the ehpa to the hpa when the hpa is created internally.
It would be better to configure this with command-line arguments such as "--ehpa-propagation-label-prefix" and "--ehpa-propagation-annotation-prefix".
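
A minimal sketch of prefix-based propagation, assuming the two flags each carry one or more key prefixes. The function `propagateByPrefix` and the sample keys are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// propagateByPrefix returns the entries of src whose key starts with one of
// the given prefixes. Empty prefixes are ignored so that nothing is
// propagated by accident.
func propagateByPrefix(src map[string]string, prefixes []string) map[string]string {
	out := map[string]string{}
	for key, value := range src {
		for _, prefix := range prefixes {
			if prefix != "" && strings.HasPrefix(key, prefix) {
				out[key] = value
				break
			}
		}
	}
	return out
}

func main() {
	ehpaLabels := map[string]string{
		"team.example.com/owner": "payments", // hypothetical label keys
		"app":                    "checkout",
	}
	// The prefix would come from --ehpa-propagation-label-prefix.
	hpaLabels := propagateByPrefix(ehpaLabels, []string{"team.example.com/"})
	fmt.Println(hpaLabels)
}
```

The same helper can serve annotations with the prefix from "--ehpa-propagation-annotation-prefix".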

Support longer term prometheus query

Describe the bug
Prometheus supports at most 11000 data points per query, so crane cannot query over a long period of time, e.g. 14 days.

Reproduce steps

Expected behavior
The Prometheus provider should support queries that exceed the limit of 11000 points.
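
One common way to stay under the limit is to split the range into consecutive windows and query each one separately. The sketch below shows only the splitting step, using Unix seconds; `window` and `splitRange` are hypothetical names, not crane's Prometheus provider.

```go
package main

import (
	"errors"
	"fmt"
)

// window is one sub-query range in Unix seconds, inclusive on both ends.
type window struct {
	Start, End int64
}

// splitRange cuts [start, end] into consecutive windows that each contain
// at most maxPoints samples at the given step (in seconds).
func splitRange(start, end, step int64, maxPoints int) ([]window, error) {
	if step <= 0 || maxPoints <= 0 || end < start {
		return nil, errors.New("invalid range, step, or maxPoints")
	}
	// A window spanning S seconds holds S/step + 1 points,
	// so the widest allowed span is (maxPoints-1)*step.
	span := int64(maxPoints-1) * step
	var out []window
	for cur := start; cur <= end; cur += span + step {
		wEnd := cur + span
		if wEnd > end {
			wEnd = end
		}
		out = append(out, window{Start: cur, End: wEnd})
	}
	return out, nil
}

func main() {
	const day = int64(24 * 3600)
	// 14 days at a 60-second step, limited to 11000 points per query.
	windows, err := splitRange(0, 14*day, 60, 11000)
	if err != nil {
		panic(err)
	}
	for _, w := range windows {
		fmt.Printf("start=%d end=%d points=%d\n", w.Start, w.End, (w.End-w.Start)/60+1)
	}
}
```

Each window starts one step after the previous one ends, so concatenating the per-window results in order gives the full series without duplicate timestamps.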

Screenshots

Environment (please complete the following information):

  • K8S Version: [e.g. 1.19]
  • Crane Version: [e.g. 0.1.0]
  • Browser [e.g. chrome, safari]

crane-agent not work normally when using examples/ensurance

Describe the bug
1. nodeName + "_" + string(uuid.NewUUID()) is used as the node name in podList etc.
2. The NewNodeLocal collectors do not always start.
Reproduce steps

Expected behavior

Screenshots

Environment (please complete the following information):

  • K8S Version: [e.g. 1.19]
  • Crane Version: [e.g. 0.1.0]
  • Browser [e.g. chrome, safari]

Configurable percent for idle resource reallocation

Describe the feature

Currently the Node Resource Controller updates the Kubernetes node's ext resource with the predicted idle resource. All idle resources are reallocated as ext resource, which can be used by lower-priority pods, especially offline jobs. However, this can exhaust the node's resources, so some of those pods would be evicted during execution.
The request is to make the idle resource reallocation percentage configurable, e.g. 4 CPU cores are idle, but only 2 cores are reallocated.
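
The arithmetic behind the request is small enough to sketch. The function name `reallocatable`, the use of milli-CPU units, and the 0 to 100 percent range are assumptions for illustration.

```go
package main

import "fmt"

// reallocatable returns the share of idle CPU (in milli-cores) that is
// offered as ext resource when only percent of it may be reallocated.
func reallocatable(idleMilli int64, percent int) (int64, error) {
	if percent < 0 || percent > 100 {
		return 0, fmt.Errorf("percent must be between 0 and 100, got %d", percent)
	}
	if idleMilli < 0 {
		idleMilli = 0 // a negative prediction offers nothing
	}
	return idleMilli * int64(percent) / 100, nil
}

func main() {
	// 4 idle cores with a 50% setting leaves 2 cores as ext resource.
	ext, err := reallocatable(4000, 50)
	if err != nil {
		panic(err)
	}
	fmt.Printf("ext cpu: %dm\n", ext)
}
```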

Put recommendation result into target's annotation

Describe the feature

Currently the recommendation result is only present in recommendation.status. We could also put it into the target's annotation.
This feature should be an option in the recommendation's spec.

The kubernetes node CPU usage should subtract the CPU usage of services that use EXT resources

Describe the feature

  1. An ext-resource service (a service that uses EXT resources) exists to fill the idle resources of the Kubernetes node. If the CPU used by the ext-resource service is counted in the Kubernetes node's CPU usage, nodeResourceController will double-count that CPU when updating the node's ext resources (the ext resource requested by the service has already been counted into the allocation by the kubelet).
  2. Crane-agent should expose the CPU usage metrics of the ext-resource service, such as node_ext_cpu_usage_seconds_total.
  3. NodeCpuUsagePromQLFmtStr: sum(count(node_cpu_seconds_total{mode="idle",instance=~"%s.*"}) by (mode, cpu)) - sum(irate(node_cpu_seconds_total{mode="idle",instance=~"%s.*"}[%s])) - (sum(irate(node_ext_cpu_usage_seconds_total{node="%s"}[%s])) or vector(0))
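
A sketch of how the proposed format string could be filled in with `fmt.Sprintf`. The five placeholders take the instance twice, the range window, the node name, and the window again. The helper `nodeCPUUsageQuery` and the sample values are hypothetical, and the fallback is written as lowercase `vector(0)`, which is how PromQL spells that function.

```go
package main

import "fmt"

// nodeCPUUsageFmt follows the format string proposed in this issue:
// node CPU usage minus the CPU used by ext-resource services.
const nodeCPUUsageFmt = `sum(count(node_cpu_seconds_total{mode="idle",instance=~"%s.*"}) by (mode, cpu)) - sum(irate(node_cpu_seconds_total{mode="idle",instance=~"%s.*"}[%s])) - (sum(irate(node_ext_cpu_usage_seconds_total{node="%s"}[%s])) or vector(0))`

// nodeCPUUsageQuery fills the placeholders in order:
// instance, instance, window, node, window.
func nodeCPUUsageQuery(instance, node, window string) string {
	return fmt.Sprintf(nodeCPUUsageFmt, instance, instance, window, node, window)
}

func main() {
	// Hypothetical instance address, node name, and range window.
	fmt.Println(nodeCPUUsageQuery("10.0.0.1", "node-1", "5m"))
}
```

The `or vector(0)` branch keeps the subtraction defined on nodes where no ext-resource service reports the metric.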
