Comments (13)

JorTurFer commented on June 11, 2024

It seems that the KEDA metrics server can't reach the operator pod. Are you using network policies or something similar to manage networking within the cluster?

from keda.

duydo-ct commented on June 11, 2024

Hi @JorTurFer
KEDA is deployed with Helm on GKE, and I have already allowed the traffic in the firewall.

k get apiservices
NAME                              SERVICE                                       AVAILABLE   AGE
...
v1beta1.external.metrics.k8s.io   keda-system/keda-operator-metrics-apiserver   True        40d
v1beta1.metrics.k8s.io            kube-system/metrics-server                    True        2y30d
...

And here is the describe output for the KEDA APIService:

k describe  apiservices v1beta1.external.metrics.k8s.io
Name:         v1beta1.external.metrics.k8s.io
Namespace:
Labels:       app.kubernetes.io/component=operator
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=v1beta1.external.metrics.k8s.io
              app.kubernetes.io/part-of=keda-operator
              app.kubernetes.io/version=2.12.1
              helm.sh/chart=keda-2.12.1
Annotations:  meta.helm.sh/release-namespace: keda-system
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2024-01-16T10:47:44Z
  Resource Version:    819888502
  UID:                 77d55c9b-3a06-430f-972d-3aeacb7b70dc
Spec:
  Ca Bundle:               LS0tLSxxxxxxxxxxxxxxxxxxx
  Group:                   external.metrics.k8s.io
  Group Priority Minimum:  100
  Service:
    Name:            keda-operator-metrics-apiserver
    Namespace:       keda-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2024-02-22T03:12:07Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>

Can you point me in a direction to troubleshoot this? I looked at the troubleshooting guide on the homepage, but it didn't really apply to my case.

JorTurFer commented on June 11, 2024

This issue happens because the KEDA pods can't communicate with each other. Do you have any network policy in the cluster blocking internal traffic? KEDA's metrics server pod needs to be able to reach KEDA's operator.

If you deploy a random pod in keda-system namespace and execute a curl from there to keda-operator.keda-system.svc.cluster.local:9666, does it work?
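The same check can be scripted when curl is not available in the test pod. This is a minimal Go sketch, assuming the service name, namespace, and port mentioned in this thread; `operatorAddress` and `checkReachable` are hypothetical helper names, not part of KEDA. It resolves the host first and then opens a plain TCP connection, so a DNS failure and a blocked port show up as different errors.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
	"time"
)

// operatorAddress builds host:port for a Service using the usual
// <service>.<namespace>.svc.<clusterDomain> naming scheme.
func operatorAddress(service, namespace, clusterDomain string, port int) string {
	host := fmt.Sprintf("%s.%s.svc.%s", service, namespace, clusterDomain)
	return net.JoinHostPort(host, strconv.Itoa(port))
}

// checkReachable resolves the host and then attempts a TCP connection.
func checkReachable(addr string, timeout time.Duration) error {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("bad address %q: %w", addr, err)
	}
	if _, err := net.LookupHost(host); err != nil {
		return fmt.Errorf("DNS lookup failed: %w", err)
	}
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return fmt.Errorf("TCP dial failed: %w", err)
	}
	return conn.Close()
}

func main() {
	// Values taken from the thread; adjust the cluster domain for your cluster.
	addr := operatorAddress("keda-operator", "keda-system", "cluster.local", 9666)
	if err := checkReachable(addr, 3*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, addr, "is not reachable:", err)
		os.Exit(1)
	}
	fmt.Println(addr, "is reachable")
}
```

Run it from a pod in the keda-system namespace so that it uses the same DNS configuration as the metrics server.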

duydo-ct commented on June 11, 2024

Hi @JorTurFer
I tested two cases with the Helm chart:

  • Case 1: I used the default config:
# -- Kubernetes cluster domain
clusterDomain: cluster.local

Then I exec'd into a pod in the same keda-system namespace and ran nslookup and curl:

/workspace # nslookup keda-operator.keda-system.svc.cluster.local
Server:		169.254.169.254
Address:	169.254.169.254#53

** server can't find keda-operator.keda-system.svc.cluster.local: NXDOMAIN

/workspace # curl keda-operator.keda-system.svc.cluster.local:9666
curl: (6) Could not resolve host: keda-operator.keda-system.svc.cluster.local
  • Case 2: I changed clusterDomain to a new value:
# -- Kubernetes cluster domain
clusterDomain: ct.dev

I changed it because my GKE cluster uses GCP's Cloud DNS. curl returned:

/workspace # curl keda-operator.keda-system.svc.ct.dev:9666
curl: (6) Could not resolve host: keda-operator.keda-system.svc.ct.dev

So I checked the logs of keda-operator-metrics-apiserver:

W0226 09:42:36.952229       1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "keda-operator.keda-system.svc.ct.dev:9666", ServerName: "keda-operator.keda-system.svc.ct.dev:9666", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup keda-operator.keda-system.svc.ct.dev on 169.254.169.254:53: no such host"

Thank you bro

JorTurFer commented on June 11, 2024

So, is the service not available? What do you see in the output of kubectl get svc -o wide -n keda-system?

duydo-ct commented on June 11, 2024

Hi @JorTurFer
Here is the output:

kubectl get svc -o wide -n keda-system
NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)            AGE   SELECTOR
keda-admission-webhooks           ClusterIP   10.99.193.232   <none>        443/TCP            40d   app=keda-admission-webhooks
keda-operator                     ClusterIP   10.99.194.245   <none>        9666/TCP           40d   app=keda-operator
keda-operator-metrics-apiserver   ClusterIP   10.99.202.86    <none>        443/TCP,8080/TCP   40d   app=keda-operator-metrics-apiserver

JorTurFer commented on June 11, 2024

I can see the service there, so IDK why the host can't be resolved 🤔
Maybe it's something related to DNS resolution in GKE? Could you try this: curl keda-operator.keda-system:9666?

The self-generated certificate is issued for these DNS names:

func getDNSNames(service, k8sClusterDomain string) []string {
	namespace := kedautil.GetPodNamespace()
	return []string{
		service,
		fmt.Sprintf("%s.%s", service, namespace),
		fmt.Sprintf("%s.%s.svc", service, namespace),
		fmt.Sprintf("%s.%s.svc.%s", service, namespace, k8sClusterDomain),
	}
}

Why do I mention it? Because if curl works using just service.namespace, you can override the address with the metrics-service-address arg: set --metrics-service-address=keda-operator.keda-system:9666 on the metrics server and it will use that host, without the cluster domain suffix.
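To find out which of those names actually resolves from inside the namespace, a small probe can try them in order. This is a sketch under assumptions: `candidateHosts` and `firstResolvable` are hypothetical helpers written for illustration (the first mirrors the name list above but takes the namespace as a parameter), and the lookup function is passed in so the logic can be exercised without a cluster.

```go
package main

import (
	"fmt"
	"net"
)

// candidateHosts mirrors the name list from getDNSNames, shortest first,
// but takes the namespace as a parameter instead of reading it from the pod.
func candidateHosts(service, namespace, clusterDomain string) []string {
	return []string{
		service,
		fmt.Sprintf("%s.%s", service, namespace),
		fmt.Sprintf("%s.%s.svc", service, namespace),
		fmt.Sprintf("%s.%s.svc.%s", service, namespace, clusterDomain),
	}
}

// firstResolvable returns the first host for which lookup yields an address.
func firstResolvable(hosts []string, lookup func(string) ([]string, error)) (string, bool) {
	for _, h := range hosts {
		if addrs, err := lookup(h); err == nil && len(addrs) > 0 {
			return h, true
		}
	}
	return "", false
}

func main() {
	hosts := candidateHosts("keda-operator", "keda-system", "cluster.local")
	if h, ok := firstResolvable(hosts, net.LookupHost); ok {
		// Print the flag value that could be tried on the metrics server.
		fmt.Printf("--metrics-service-address=%s:9666\n", h)
	} else {
		fmt.Println("none of the candidate names resolve from this pod")
	}
}
```

The bare service name only resolves from a pod in the same namespace, so the probe should run in keda-system.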

duydo-ct commented on June 11, 2024

Hi @JorTurFer, it turns out I had misconfigured clusterDomain.
I updated it again like this:

# -- Kubernetes cluster domain
clusterDomain: gke1.ct.dev

Now I checked again with curl, nslookup, and telnet:

nettools:/workspace# curl keda-operator.keda-system.svc.gke1.ct.dev:9666
curl: (52) Empty reply from server
nettools:/workspace# nslookup keda-operator.keda-system.svc.gke1.ct.dev
Server:		169.254.169.254
Address:	169.254.169.254#53

Non-authoritative answer:
Name:	keda-operator.keda-system.svc.gke1.ct.dev
Address: 10.99.194.245

nettools:/workspace# telnet keda-operator.keda-system.svc.gke1.ct.dev 9666
Connected to keda-operator.keda-system.svc.gke1.ct.dev

I think DNS works now, but I don't know why curl gets an empty reply.
Any idea, bro?

JorTurFer commented on June 11, 2024

The gke1 part was missing in your previous message (#5527 (comment)), and I bet that's the root cause xD

Could you try updating KEDA to set the cluster domain to gke1.ct.dev? You may have to delete the secret kedaorg-certs in KEDA's namespace (and restart the KEDA components).
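One plausible reason for deleting the secret (the thread does not spell it out) is that the stored certificate was issued for names built from the old cluster domain, following the getDNSNames list quoted earlier. The Go sketch below illustrates that idea only: it creates throwaway self-signed certificates in memory with the standard library and checks them against the new host name. `selfSigned` and `covers` are hypothetical helpers, not KEDA code.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSigned builds a throwaway self-signed certificate whose SANs are dnsNames.
func selfSigned(dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// covers reports whether a certificate with the given SANs is valid for host.
func covers(dnsNames []string, host string) bool {
	cert, err := selfSigned(dnsNames)
	if err != nil {
		return false
	}
	return cert.VerifyHostname(host) == nil
}

func main() {
	host := "keda-operator.keda-system.svc.gke1.ct.dev"
	old := []string{"keda-operator.keda-system.svc.ct.dev"}
	fresh := []string{"keda-operator.keda-system.svc.gke1.ct.dev"}
	fmt.Println("certificate issued for ct.dev covers host:", covers(old, host))
	fmt.Println("certificate issued for gke1.ct.dev covers host:", covers(fresh, host))
}
```

A certificate that lists only the ct.dev name does not match the gke1.ct.dev host, which is consistent with needing a fresh certificate after the domain change.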

duydo-ct commented on June 11, 2024

Hi @JorTurFer
Nice, bro. I deleted the secret kedaorg-certs and restarted the deployments, and the logs look okay now.

kubectl delete secret kedaorg-certs -n keda-system
secret "kedaorg-certs" deleted
  • I checked the logs of keda-operator-metrics-apiserver. They look okay:
I0226 15:26:59.757886 1 provider.go:81] keda_metrics_adapter/provider "msg"="KEDA Metrics Server received request for external metrics" "metric name"="s0-prometheus" "metricSelector"="scaledobject.keda.sh/name=ct-logic-uni-ad-listing-consumer" "namespace"="default"
  • But when I checked the ScaledObject again, I don't see it scaling. The HPA shows no errors either, so I'm still not sure why it won't scale my deployment.
    Here is the ScaledObject description:
π ~> kubectl describe  ScaledObject ct-logic-uni-ad-listing-consumer
Name:         ct-logic-uni-ad-listing-consumer
Namespace:    default
Labels:       app=ct-logic-uni-ad-listing-consumer
              app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: ct-logic-uni-ad-listing-consumer
              meta.helm.sh/release-namespace: default
API Version:  keda.sh/v1alpha1
Kind:         ScaledObject
Metadata:
  Creation Timestamp:  2024-02-26T15:29:23Z
  Finalizers:
    finalizer.keda.sh
  Generation:        1
  Resource Version:  827148808
  UID:               f80f6f14-3015-4e74-b0e3-69a83db30c61
Spec:
  Cooldown Period:    100
  Max Replica Count:  9
  Min Replica Count:  1
  Polling Interval:   100
  Scale Target Ref:
    API Version:  apps/v1
    Kind:         Deployment
    Name:         ct-logic-uni-ad-listing-consumer
  Triggers:
    Metadata:
      Ignore Null Values:  true
      Query:               sum(ad_listing_system_logic_priority_queue_tasks_counter{deployment="ct-logic-uni-ad-listing-metrics"}[2m])
      Server Address:      https://vmselect.domain/select/0/prometheus
      Threshold:           2
    Type:                  prometheus
Status:
  Conditions:
    Message:  ScaledObject is defined correctly and is ready for scaling
    Reason:   ScaledObjectReady
    Status:   True
    Type:     Ready
    Message:  Scaling is not performed because triggers are not active
    Reason:   ScalerNotActive
    Status:   False
    Type:     Active
    Message:  No fallbacks are active on this scaled object
    Reason:   NoFallbackFound
    Status:   False
    Type:     Fallback
    Status:   Unknown
    Type:     Paused
  External Metric Names:
    s0-prometheus
  Health:
    s0-prometheus:
      Number Of Failures:  0
      Status:              Happy
  Hpa Name:                keda-hpa-ct-logic-uni-ad-listing-consumer
  Original Replica Count:  1
  Scale Target GVKR:
    Group:            apps
    Kind:             Deployment
    Resource:         deployments
    Version:          v1
  Scale Target Kind:  apps/v1.Deployment
Events:               <none>

And here is the HPA description:

π ~>  kubectl describe hpa keda-hpa-ct-logic-uni-ad-listing-consumer
Name:                                      keda-hpa-ct-logic-uni-ad-listing-consumer
Namespace:                                 default
Labels:                                    app=ct-logic-uni-ad-listing-consumer
                                           app.kubernetes.io/managed-by=Helm
                                           app.kubernetes.io/name=keda-hpa-ct-logic-uni-ad-listing-consumer
Annotations:                               meta.helm.sh/release-name: ct-logic-uni-ad-listing-consumer
                                           meta.helm.sh/release-namespace: default
CreationTimestamp:                         Mon, 26 Feb 2024 22:29:53 +0700
Reference:                                 Deployment/ct-logic-uni-ad-listing-consumer
Metrics:                                   ( current / target )
  "s0-prometheus" (target average value):  0 / 2
Min replicas:                              1
Max replicas:                              9
Deployment pods:                           1 current / 1 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},})
  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
Events:           <none>
  • My metric:
    [Screenshot 2024-02-27 at 10 10 42]

I'm not sure whether I'm missing some config?

JorTurFer commented on June 11, 2024

If the communication issues are gone, we are indeed moving forward! 😄
Could you try copying the exact query into your Prometheus? sum(ad_listing_system_logic_priority_queue_tasks_counter{deployment="ct-logic-uni-ad-listing-metrics"}[2m])

The picture doesn't show the same query, as it has no filters. The reason I ask is that ignoreNullValues: true can hide query errors by converting null values into 0, which could be what is happening in your case (you can try removing the property temporarily and, if I'm right, you will see errors in the KEDA operator).
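The effect can be illustrated with a toy model. This Go sketch is not KEDA's implementation; `metricValue` is a hypothetical function that only mimics the behaviour described above: with ignoreNullValues set, an empty query result is reported as 0 with no error, so a selector that matches nothing looks the same as an idle queue.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoData is returned when the query yields no samples and nulls are not ignored.
var errNoData = errors.New("query returned no data")

// metricValue is a hypothetical stand-in for a scaler's result handling.
// samples holds whatever values the query returned.
func metricValue(samples []float64, ignoreNullValues bool) (float64, error) {
	if len(samples) == 0 {
		if ignoreNullValues {
			// An empty result is silently reported as 0.
			return 0, nil
		}
		return 0, errNoData
	}
	return samples[0], nil
}

func main() {
	// A selector that matches no series, such as a wrong label name, returns nothing.
	var empty []float64

	v, err := metricValue(empty, true)
	fmt.Println("ignoreNullValues=true: ", v, err)

	v, err = metricValue(empty, false)
	fmt.Println("ignoreNullValues=false:", v, err)
}
```

With the flag off, the empty result surfaces as an error instead of a 0, which is what makes the temporary removal a useful diagnostic.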

duydo-ct commented on June 11, 2024

Hi @JorTurFer
I checked again, and the query was indeed incorrect, bro:

sum(ad_listing_system_logic_priority_queue_tasks_counter{deployment="ct-logic-uni-ad-listing-metrics"}[2m])

I updated the query to:

sum(ad_listing_system_logic_priority_queue_tasks_counter{app="ct-logic-uni-ad-listing-metrics"}[1m])

Then I checked again and it works as expected.
HPA conditions and events:

Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},})
  ScalingLimited  True    TooManyReplicas      the desired replica count is more than the maximum replica count
Events:
  Type    Reason             Age                  From                       Message
  ----    ------             ----                 ----                       -------
  Normal  SuccessfulRescale  16m                  horizontal-pod-autoscaler  New size: 2; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
  Normal  SuccessfulRescale  7m53s                horizontal-pod-autoscaler  New size: 12; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) below target
  Normal  SuccessfulRescale  6m20s                horizontal-pod-autoscaler  New size: 8; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) below target
  Normal  SuccessfulRescale  6m6s                 horizontal-pod-autoscaler  New size: 2; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) below target
  Normal  SuccessfulRescale  3m41s (x2 over 15m)  horizontal-pod-autoscaler  New size: 4; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
  Normal  SuccessfulRescale  3m25s (x2 over 15m)  horizontal-pod-autoscaler  New size: 8; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
  Normal  SuccessfulRescale  3m9s (x2 over 15m)   horizontal-pod-autoscaler  New size: 16; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
  Normal  SuccessfulRescale  2m53s (x2 over 15m)  horizontal-pod-autoscaler  New size: 20; reason: external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: ct-logic-uni-ad-listing-consumer,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
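The TooFewReplicas and TooManyReplicas conditions seen in this thread come from the HPA clamping its computed replica count. For an external metric with an average-value target, the basic calculation is ceil(metric / target), limited to the configured minimum and maximum. The Go sketch below shows only that core arithmetic; `desiredReplicas` is a hypothetical helper, and the real controller also applies a tolerance and stabilization windows that are left out here.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies ceil(total / targetAverage) and clamps the result
// to [minReplicas, maxReplicas]. Tolerance and stabilization are left out.
func desiredReplicas(total, targetAverage float64, minReplicas, maxReplicas int) int {
	n := int(math.Ceil(total / targetAverage))
	if n < minReplicas {
		return minReplicas
	}
	if n > maxReplicas {
		return maxReplicas
	}
	return n
}

func main() {
	// Threshold 2, min 1, max 9 as in the ScaledObject shown earlier.
	for _, total := range []float64{0, 7, 50} {
		fmt.Printf("metric=%v -> replicas=%d\n", total, desiredReplicas(total, 2, 1, 9))
	}
}
```

A metric value of 0 therefore keeps the deployment at the minimum, which matches the "0 / 2" reading in the earlier HPA description.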

JorTurFer commented on June 11, 2024

Nice!
I'm closing the issue as it looks solved. Let me know if anything else comes up and I'll reopen it.
