jthomperoo / custom-pod-autoscaler
Custom Pod Autoscaler program and base images, allowing the creation of Custom Pod Autoscalers.
License: Apache License 2.0
Each commit should result in a new image being pushed to Docker Hub.
Rather than being triggered only by the timer at set intervals, the Custom Pod Autoscaler evaluation could also be triggered manually, through a REST endpoint.
This would allow users to send an HTTP request to the Custom Pod Autoscaler and start an evaluation immediately, rather than waiting for the interval to expire.
Allow configuration of the following for the API:
2 flask vulnerabilities found in …/app/requirements.txt
Remediation
Upgrade flask to version 1.0.0 or later. For example:
flask>=1.0.0
Should have two modes, api and scaler. api is when the evaluation is triggered through the API, as a read-only event. scaler is when the evaluation is triggered by the regular scaler logic.
Implement a configurable timeout for fetching metrics/evaluations - stops the CPA being unresponsive if a script fails, e.g. due to a pod terminating/crashing halfway through an evaluation.
The CPA evaluator could decide which pods to terminate when scaling down, rather than relying on the Kubernetes decision making which bases it on how old the pod is.
This could be a list of pods with priorities assigned to them, with the lowest priority pods terminated when scaling down as needed.
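A sketch of what the evaluator's output might look like under this proposal; all field names here are hypothetical:

```yaml
targetReplicas: 2
podPriorities:
- pod: example-deployment-gpu-worker-a
  priority: 10 # higher priority, terminated last
- pod: example-deployment-gpu-worker-b
  priority: 1  # lowest priority, terminated first when scaling down
```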
Env vars should be lowercase, rather than uppercase to match the YAML config.
This will help when writing unit tests for other packages.
Hooks would be points at which a user-defined shell command is executed, to allow users to have greater control of the Custom Pod Autoscaler.
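For example, hooks might be configured like this; the hook names and fields are illustrative assumptions, not the project's actual configuration:

```yaml
preScale:
  type: "shell"
  shell: "echo 'about to scale'"
postScale:
  type: "shell"
  shell: "curl -X POST https://0.0.0.0:5000/notify"
```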
This code seems to be preventing me from scaling a deployment to zero replicas:
custom-pod-autoscaler/scale/scale.go
Line 112 in 7b1d427
My use case is a queue-based processing system with a bunch of GPUs, so scale to zero is rather important (and also why I can't use the built-in HorizontalPodAutoscaler).
At the minute, the metric is run for every pod in a deployment; however, it would be useful if you could run the metric only once per deployment, rather than once per pod.
A new configuration option, run-mode, would help this, with these options:
per-pod - runs the metric per pod.
per-deployment - runs the metric per deployment.
Metrics should be retrievable through an API endpoint:
GET /metrics
Returning:
[
{
"deployment": "example-deployment",
"metrics": [
{
"pod": "example-pod",
"value": "value"
}
]
}
]
These metrics should be calculated at request time.
Would require a change to the current configuration to support this, e.g. instead of
metric: "shell command"
It should allow you to specify which way to pass data, for shell commands:
metric:
type: "pipe"
pipe: "shell command"
For something like an HTTP request:
metric:
type: "http"
http:
method: "GET"
endpoint: "https://0.0.0.0:5000/metrics"
This would also apply to evaluations, so a more general configuration option to specify a series of different data transfer options would be useful.
This would be useful for introducing the framework.
If a new release is built on GitHub, it should result in a new image being pushed to Docker Hub and that image should be tagged as latest.
Allow setting a start time for the scaler, for example you could provide the time:
0001-01-01 00:00:15 +0000 UTC
This would cause the scaler to start only at the next round 15 seconds. It would be set up with a default of
0001-01-01 00:00:00 +0000 UTC
which would by default just start at the next full minute/hour/second etc.
At the minute, if there are no deployments, the /evaluate.py script still seems to be called with no metric JSON, causing it to error out. If there are no deployments, the evaluation should be skipped.
Depends on jthomperoo/custom-pod-autoscaler-operator#14
Relies on #75
For something like an HTTP request:
metric:
type: "http"
http:
method: "GET"
headers:
- key: value
- key: value
url: "https://0.0.0.0:5000/metrics"
parameterMethod: query
The dryRun field should be provided to the hooks and metric gathering to allow them to determine if it is a dry run.
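The value passed to a hook or metric script might then look like the following JSON; apart from dryRun, the field names are illustrative:

```json
{
  "resource": "example-deployment",
  "runType": "api",
  "dryRun": true
}
```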
Thrashing is when a deployment is scaled up and down repeatedly in a short period of time, caused by being right on the threshold of an evaluation. For example, if the number of pods in a deployment rapidly changes between 2 and 3 because of small changes in the metrics as it is directly on a boundary/threshold.
The Cooldown feature would allow setting a delay or a cool down period on when to scale, avoiding rapid changes in number of pods for minor changes in the metric being evaluated. For example, a cooldown could be set to not scale again if a deployment has been scaled in the past 5 minutes.
Add the glog logging framework to allow for logging levels with severity and verbosity. Allow log verbosity to be set by Custom Pod Autoscaler configuration.
The selector should be consistent with the HPA, using scaleTargetRef to select which object to manage; for now this should only support Deployments.
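The HPA's scaleTargetRef has this shape, which the CPA could adopt:

```yaml
scaleTargetRef:
  apiVersion: apps/v1
  kind: Deployment
  name: example-deployment
```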
At the moment, if the CPA is shut down it takes a long time to terminate, I think this is due to the goroutine still running. When the shutdown command is given, the goroutine should exit immediately.
Should be able to specify minimum and maximum replica counts. Autoscaling should be allowed to be disabled by setting the replica value to 0 in a resource's spec.
Change JSON returned to use the same naming convention as Kubernetes, with camelCase rather than snake_case.
Running golint pulls down dependencies rather than using the vendor directory, slowing down the CI.
When a release is created, the CI should deploy the binary that is built to GitHub releases.
Interval between evaluations should be configurable in environment variables. This would allow it to be specified in the Custom Pod Autoscaler YAML.
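For example, the container spec in the Custom Pod Autoscaler YAML could set it like this; the variable name and unit are assumptions:

```yaml
env:
- name: interval
  value: "15000" # evaluation interval, assumed milliseconds
```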
Currently the shell is fixed to /bin/sh; allow setting it to other values, e.g.:
/bin/bash
/usr/local/bin/python3
This allows for greater flexibility and avoids tying the CPA to a requirement for /bin/sh.
Horizontal Pod Autoscaler supports this, so the Custom Pod Autoscaler should too.
From https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment, replica set or stateful set...
Evaluations should be retrievable through an API endpoint:
GET /evaluations
Returning:
[
{
"deployment": "example-deployment",
"evaluation": {
"target_replicas": 2
}
}
]
This evaluation should be calculated at request time.
The ResourceMetrics struct has redundant data (the resource) and should be simplified by removing and replacing it.
Support JSON alongside YAML for configuration file, consistent with Kubernetes.