Comments (13)
It seems that the KEDA metrics server can't reach the operator pod. Are you using network policies or anything similar to manage networking within the cluster?
from keda.
Hi @JorTurFer
KEDA is deployed with Helm on GKE, and I have allowed the traffic in the firewall.
k get apiservices
NAME                              SERVICE                                       AVAILABLE   AGE
...
v1beta1.external.metrics.k8s.io   keda-system/keda-operator-metrics-apiserver   True        40d
v1beta1.metrics.k8s.io            kube-system/metrics-server                    True        2y30d
...
I also described the APIService:
k describe apiservices v1beta1.external.metrics.k8s.io
Name: v1beta1.external.metrics.k8s.io
Namespace:
Labels: app.kubernetes.io/component=operator
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=v1beta1.external.metrics.k8s.io
app.kubernetes.io/part-of=keda-operator
app.kubernetes.io/version=2.12.1
helm.sh/chart=keda-2.12.1
Annotations: meta.helm.sh/release-namespace: keda-system
API Version: apiregistration.k8s.io/v1
Kind: APIService
Metadata:
Creation Timestamp: 2024-01-16T10:47:44Z
Resource Version: 819888502
UID: 77d55c9b-3a06-430f-972d-3aeacb7b70dc
Spec:
Ca Bundle: LS0tLSxxxxxxxxxxxxxxxxxxx
Group: external.metrics.k8s.io
Group Priority Minimum: 100
Service:
Name: keda-operator-metrics-apiserver
Namespace: keda-system
Port: 443
Version: v1beta1
Version Priority: 100
Status:
Conditions:
Last Transition Time: 2024-02-22T03:12:07Z
Message: all checks passed
Reason: Passed
Status: True
Type: Available
Events: <none>
Can you point me in a direction to troubleshoot this? I looked through the troubleshooting guide on the homepage, but it didn't really apply to my case.
from keda.
This issue happens because the KEDA pods can't communicate with each other. Do you have any network policy in the cluster blocking internal traffic? KEDA's metrics server pod must be able to reach KEDA's operator.
If you deploy a random pod in the keda-system namespace and run curl from there against keda-operator.keda-system.svc.cluster.local:9666, does it work?
from keda.
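The connectivity check suggested above can also be scripted. The following is a minimal sketch, not part of KEDA: the helper names `service_fqdn` and `probe` are made up for illustration, and it assumes Python is available in the debug pod. It reports DNS failures separately from TCP failures, and it checks only reachability, not whether the endpoint answers HTTP.

```python
import socket


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name <service>.<namespace>.svc.<cluster_domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'dns_failure', 'tcp_failure' or 'ok' for host:port."""
    try:
        # Resolve first so a DNS problem is reported separately from a blocked port.
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue  # try the next resolved address, if any
    return "tcp_failure"


# Example call from inside a pod (the result depends on the cluster):
#   probe(service_fqdn("keda-operator", "keda-system"), 9666)
```

A `dns_failure` result points at the cluster domain or DNS setup, while `tcp_failure` points at network policies or the service itself.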
Hi @JorTurFer
So I tested 2 cases in the Helm chart:
- Case 1: I used the default config:
# -- Kubernetes cluster domain
clusterDomain: cluster.local
Then I exec'd into a pod in the same namespace (keda-system) and ran nslookup and curl. The result:
/workspace # nslookup keda-operator.keda-system.svc.cluster.local
Server: 169.254.169.254
Address: 169.254.169.254#53
** server can't find keda-operator.keda-system.svc.cluster.local: NXDOMAIN
/workspace # curl keda-operator.keda-system.svc.cluster.local:9666
curl: (6) Could not resolve host: keda-operator.keda-system.svc.cluster.local
- Case 2: I changed clusterDomain to a new value, because my GKE cluster uses Cloud DNS on GCP:
# -- Kubernetes cluster domain
clusterDomain: ct.dev
The curl result:
/workspace # curl keda-operator.keda-system.svc.ct.dev:9666
curl: (6) Could not resolve host: keda-operator.keda-system.svc.ct.dev
So I checked the logs of keda-operator-metrics-apiserver:
W0226 09:42:36.952229 1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "keda-operator.keda-system.svc.ct.dev:9666", ServerName: "keda-operator.keda-system.svc.ct.dev:9666", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup keda-operator.keda-system.svc.ct.dev on 169.254.169.254:53: no such host"
Thank you bro
from keda.
So, is the service not available? What do you see as output from kubectl get svc -o wide -n keda-system?
from keda.
Hi @JorTurFer
Output
kubectl get svc -o wide -n keda-system
NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)            AGE   SELECTOR
keda-admission-webhooks           ClusterIP   10.99.193.232   <none>        443/TCP            40d   app=keda-admission-webhooks
keda-operator                     ClusterIP   10.99.194.245   <none>        9666/TCP           40d   app=keda-operator
keda-operator-metrics-apiserver   ClusterIP   10.99.202.86    <none>        443/TCP,8080/TCP   40d   app=keda-operator-metrics-apiserver
from keda.
I can see the service there, so IDK why the host can't be resolved 🤔
Maybe it's something related to DNS resolution in GKE? Could you try this: curl keda-operator.keda-system:9666 ?
The self generated certificate has these configurations:
keda/pkg/certificates/certificate_manager.go
Lines 100 to 108 in b3f5548
Why do I mention it? Because if curl works using just service.namespace, you can override the address with the metrics-service-address arg: just set --metrics-service-address=keda-operator.keda-system:9666 on the metrics server and it will use the new host without the cluster domain.
from keda.
Hi @JorTurFer, so I had misconfigured clusterDomain.
I updated it again like this:
# -- Kubernetes cluster domain
clusterDomain: gke1.ct.dev
Now I checked again with curl, telnet, and nslookup:
nettools:/workspace#
nettools:/workspace#
nettools:/workspace# curl keda-operator.keda-system.svc.gke1.ct.dev:9666
curl: (52) Empty reply from server
nettools:/workspace# nslookup keda-operator.keda-system.svc.gke1.ct.dev
Server: 169.254.169.254
Address: 169.254.169.254#53
Non-authoritative answer:
Name: keda-operator.keda-system.svc.gke1.ct.dev
Address: 10.99.194.245
nettools:/workspace# telnet keda-operator.keda-system.svc.gke1.ct.dev 9666
Connected to keda-operator.keda-system.svc.gke1.ct.dev
I think DNS works, but I don't know why curl gets an empty reply.
Any idea bro?
from keda.
The gke1 is missing in your previous message #5527 (comment) and I bet that's the root cause xD
Could you try updating KEDA to set the cluster domain to gke1.ct.dev? You may have to delete the secret kedaorg-certs within KEDA's namespace (and restart the KEDA components).
from keda.
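One way to double-check the cluster domain from inside a pod is to look at the `search` line of `/etc/resolv.conf`. The sketch below is illustrative only: the function name is made up, the sample content is hypothetical and modelled on the names in this thread, and it assumes the usual Kubernetes layout where one search entry has the form `svc.<cluster domain>`. Whether a given GKE Cloud DNS setup writes the file this way should be verified on the cluster.

```python
from typing import Optional


def cluster_domain_from_resolv_conf(text: str) -> Optional[str]:
    """Return the cluster domain inferred from the 'search' line, or None."""
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "search":
            continue
        for entry in fields[1:]:
            # Kubernetes normally lists "svc.<cluster domain>" among the search domains.
            if entry.startswith("svc."):
                return entry[len("svc."):].rstrip(".")
    return None


# Hypothetical file content, not taken from the reporter's cluster.
SAMPLE = """\
search keda-system.svc.gke1.ct.dev svc.gke1.ct.dev gke1.ct.dev
nameserver 169.254.169.254
options ndots:5
"""

print(cluster_domain_from_resolv_conf(SAMPLE))
```

In a real pod the text would come from reading `/etc/resolv.conf`; the value found there is what `clusterDomain` in the Helm values needs to match.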
Hi @JorTurFer
Nice bro. I deleted the secret kedaorg-certs and restarted the deployments, and the logs look okay now bro.
kubectl delete secret kedaorg-certs -n keda-system
secret "kedaorg-certs" deleted
- I checked the logs of keda-operator-metrics-apiserver. They look okay:
I0226 15:26:59.757886 1 provider.go:81] keda_metrics_adapter/provider "msg"="KEDA Metrics Server received request for external metrics" "metric name"="s0-prometheus" "metricSelector"="scaledobject.keda.sh/name=ct-logic-uni-ad-listing-consumer" "namespace"="default"
- But I checked the ScaledObject again and I don't see it scaling. I checked the HPA again and there were no errors, but I'm still not sure why it won't scale my deployment.
I described the ScaledObject:
π ~> kubectl describe ScaledObject ct-logic-uni-ad-listing-consumer
Name: ct-logic-uni-ad-listing-consumer
Namespace: default
Labels: app=ct-logic-uni-ad-listing-consumer
app.kubernetes.io/managed-by=Helm
Annotations: meta.helm.sh/release-name: ct-logic-uni-ad-listing-consumer
meta.helm.sh/release-namespace: default
API Version: keda.sh/v1alpha1
Kind: ScaledObject
Metadata:
Creation Timestamp: 2024-02-26T15:29:23Z
Finalizers:
finalizer.keda.sh
Generation: 1
Resource Version: 827148808
UID: f80f6f14-3015-4e74-b0e3-69a83db30c61
Spec:
Cooldown Period: 100
Max Replica Count: 9
Min Replica Count: 1
Polling Interval: 100
Scale Target Ref:
API Version: apps/v1
Kind: Deployment
Name: ct-logic-uni-ad-listing-consumer
Triggers:
Metadata:
Ignore Null Values: true
Query: sum(ad_listing_system_logic_priority_queue_tasks_counter{deployment="ct-logic-uni-ad-listing-metrics"}[2m])
Server Address: https://vmselect.domain/select/0/prometheus
Threshold: 2
Type: prometheus
Status:
Conditions:
Message: ScaledObject is defined correctly and is ready for scaling
Reason: ScaledObjectReady
Status: True
Type: Ready
Message: Scaling is not performed because triggers are not active
Reason: ScalerNotActive
Status: False
Type: Active
Message: No fallbacks are active on this scaled object
Reason: NoFallbackFound
Status: False
Type: Fallback
Status: Unknown
Type: Paused
External Metric Names:
s0-prometheus
Health:
s0-prometheus:
Number Of Failures: 0
Status: Happy
Hpa Name: keda-hpa-ct-logic-uni-ad-listing-consumer
Original Replica Count: 1
Scale Target GVKR:
Group: apps
Kind: Deployment
Resource: deployments
Version: v1
Scale Target Kind: apps/v1.Deployment
Events: <none>
I described the HPA:
π ~> kubectl describe hpa keda-hpa-ct-logic-uni-ad-listing-consumer
Name: keda-hpa-ct-logic-uni-ad-listing-consumer
Namespace: default
Labels: app=ct-logic-uni-ad-listing-consumer
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=keda-hpa-ct-logic-uni-ad-listing-consumer
Annotations: meta.helm.sh/release-name: ct-logic-uni-ad-listing-consumer
meta.helm.sh/release-namespace: default
CreationTimestamp: Mon, 26 Feb 2024 22:29:53 +0700
Reference: Deployment/ct-logic-uni-ad-listing-consumer
Metrics: ( current / target )
"s0-prometheus" (target average value): 0 / 2
Min replicas: 1
Max replicas: 9
Deployment pods: 1 current / 1 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ReadyForNewScale recommended size matches current size
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},})
ScalingLimited True TooFewReplicas the desired replica count is less than the minimum replica count
Events: <none>
I'm not sure whether any config is missing?
from keda.
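The HPA output above reports a metric of 0 against a target average value of 2, with a minimum of 1 replica. A rough sketch of the replica calculation for an average-value external metric helps explain why nothing scales in that state. This is a simplification under stated assumptions: it ignores the HPA's tolerance band and stabilization behavior, and the function name is made up for illustration.

```python
import math


def desired_replicas(metric_total: float, target_average: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate HPA sizing for an external metric with an average-value target."""
    if target_average <= 0:
        raise ValueError("target_average must be positive")
    # The total metric value is divided by the per-replica target and rounded up.
    raw = math.ceil(metric_total / target_average)
    # The result is then clamped to the configured replica bounds.
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(0, 2, 1, 9))  # values from the describe output above -> 1
print(desired_replicas(7, 2, 1, 9))  # -> 4
```

With a reported metric of 0 the raw result is 0, which is raised to the minimum of 1. That matches the TooFewReplicas condition, so the open question is why the metric itself reads 0.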
If we don't have the communication issues we are moving forward indeed! 😄
Could you try copying the exact query into your Prometheus? sum(ad_listing_system_logic_priority_queue_tasks_counter{deployment="ct-logic-uni-ad-listing-metrics"}[2m])
The picture doesn't show the same query, as it has no filters. The reason I ask is that ignoreNullValues: true can hide query errors by converting null values into 0, which may be what is happening in your case (you can try removing the property temporarily and, if I'm right, you will see errors in the KEDA operator).
from keda.
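The masking effect described above can be illustrated with a toy model. This is not KEDA's code: the function, the exception class, and the error message are hypothetical, and the sketch only assumes what the comment states, namely that an empty result is turned into 0 when `ignoreNullValues` is true and surfaces as an error otherwise.

```python
from typing import Sequence


class EmptyResultError(Exception):
    """Raised when a query returns no samples and nulls are not ignored."""


def metric_value(samples: Sequence[float], ignore_null_values: bool) -> float:
    """Toy model of turning a query result into a single metric value."""
    if not samples:
        if ignore_null_values:
            # A selector that matches nothing is indistinguishable from a real 0 here.
            return 0.0
        raise EmptyResultError("query returned no samples")
    return float(samples[0])


print(metric_value([], ignore_null_values=True))     # -> 0.0
print(metric_value([3.5], ignore_null_values=True))  # -> 3.5
```

A label filter that matches no series therefore looks like an idle workload, which is why removing the property temporarily is a quick way to tell the two apart.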
Hi @JorTurFer
I checked again, and the query was indeed incorrect bro:
sum(ad_listing_system_logic_priority_queue_tasks_counter{deployment="ct-logic-uni-ad-listing-metrics"}[2m])
I updated the query to:
sum(ad_listing_system_logic_priority_queue_tasks_counter{app="ct-logic-uni-ad-listing-metrics"}[1m])
Then I checked again and it worked as expected.
Events:
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},})
ScalingLimited True TooManyReplicas the desired replica count is more than the maximum replica count
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 16m horizontal-pod-autoscaler New size: 2; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
Normal SuccessfulRescale 7m53s horizontal-pod-autoscaler New size: 12; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) below target
Normal SuccessfulRescale 6m20s horizontal-pod-autoscaler New size: 8; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) below target
Normal SuccessfulRescale 6m6s horizontal-pod-autoscaler New size: 2; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) below target
Normal SuccessfulRescale 3m41s (x2 over 15m) horizontal-pod-autoscaler New size: 4; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
Normal SuccessfulRescale 3m25s (x2 over 15m) horizontal-pod-autoscaler New size: 8; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
Normal SuccessfulRescale 3m9s (x2 over 15m) horizontal-pod-autoscaler New size: 16; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
Normal SuccessfulRescale 2m53s (x2 over 15m) horizontal-pod-autoscaler New size: 20; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
from keda.
nice!
I'm closing the issue as it looks solved. Let me know if there is any other problem and I'll reopen it.
from keda.