Not sure of the right fix for this. I was playing with the Kafka trigger again today; here's the cycle:
- Created a Kafka topic with a single partition.
- Deployed a function with KEDA. KEDA activated it, and the first instance locked the partition.
- KEDA kept scaling out (which is fine for now) until I had 4 instances. Only 1 was active (the first one). Once it caught up, KEDA scaled down to 1 instance.
- However, the instance left remaining was one of the additional instances that never got a lock. Checking the logs for that function, it was more or less dead:
```
info: Host.General[0]
      Host lock lease acquired by instance ID '000000000000000000000000448490CC'.
fail: Host.Triggers.Kafka[0]
      kafka-cp-kafka-headless:9092/bootstrap: Failed to resolve 'kafka-cp-kafka-headless:9092': Temporary failure in name resolution (after 5298ms in state CONNECT)
fail: Host.Triggers.Kafka[0]
      1/1 brokers are down
```
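For context, the scaling side was wired up with a ScaledObject roughly along these lines. This is a minimal sketch in current KEDA (v2) syntax rather than my exact manifest; the deployment, consumer group, and topic names are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-function-scaler        # placeholder name
spec:
  scaleTargetRef:
    name: kafka-function             # placeholder: the Deployment running the function
  minReplicaCount: 1
  maxReplicaCount: 4                 # illustrative; matches the 4 instances seen above
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-cp-kafka-headless:9092
        consumerGroup: my-consumer-group   # placeholder; must match the trigger's group
        topic: my-topic                    # placeholder single-partition topic
        lagThreshold: "10"
```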
I'm not sure if I really hit a reliability issue, or if this was just one of the instances that never had an available partition.
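One way to tell those two cases apart (assuming access to the Kafka CLI tools; the group name is a placeholder) is to describe the consumer group and see which member owns the partition:

```sh
# With a single partition, exactly one group member should show an
# assignment; any other instance will appear with no partition (or not
# at all), which would point at the "no available partition" case.
kafka-consumer-groups.sh --bootstrap-server kafka-cp-kafka-headless:9092 \
  --describe --group my-consumer-group
```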
A few thoughts come to mind:
- Should the Kafka trigger keep retrying to connect if it fails? I assume the runtime in general doesn't do this?
- Should Kubernetes know that this function is in a dead state so it can go into CrashLoopBackOff and restart it? If so, is there an existing health probe we should be hooking up?
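If there isn't an existing probe, I'd imagine something like a liveness probe on the function's Deployment. A rough sketch, assuming the Functions host listens on port 80 inside the container and exposes an unauthenticated ping endpoint (`/admin/host/ping` here is an assumption on my part, not a confirmed contract):

```yaml
livenessProbe:
  httpGet:
    path: /admin/host/ping   # assumed unauthenticated host endpoint
    port: 80                 # assumed container port for the Functions host
  initialDelaySeconds: 30
  periodSeconds: 15
  failureThreshold: 3        # restart after ~45s of consecutive failures
```

The catch is that the instance above was alive enough to acquire the host lock lease, so a generic host ping would probably still return 200; the probe would need to reflect trigger/broker connectivity for Kubernetes to restart it in this state.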
I realize this isn't really a KEDA issue, but I didn't know where else to put it.
/cc @ahmedelnably @fabiocav would be interested to get your thoughts here