Comments (6)

wbrewer commented on June 11, 2024

Got some news on this: if I change the datadog-cluster-agent-external-metrics-reader ClusterRole to

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    helm.fluxcd.io/antecedent: monitoring:helmrelease/datadog
  creationTimestamp: "2020-07-30T12:47:37Z"
  labels:
    app.kubernetes.io/instance: datadog
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: datadog
    app.kubernetes.io/version: "7"
    helm.sh/chart: datadog-2.4.10
  name: datadog-cluster-agent-external-metrics-reader
  resourceVersion: "139077736"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/datadog-cluster-agent-external-metrics-reader
  uid: 3be7d96a-6f83-49d1-97f7-1b6b7c84cea1
rules:
- apiGroups:
  - external.metrics.k8s.io/v1beta1
  resources:
  - '*'
  verbs:
  - list
  - get
  - watch

The HPA logs make more sense:

Name:                                                                nginxext
Namespace:                                                           default
Labels:                                                              <none>
Annotations:                                                         CreationTimestamp:  Wed, 19 Aug 2020 15:01:34 -0400
Reference:                                                           Deployment/nginx
Metrics:                                                             ( current / target )
  "datadogmetric@nginx-demo:nginx-requests" (target average value):  <unknown> / 9
Min replicas:                                                        1
Max replicas:                                                        3
Deployment pods:                                                     1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetExternalMetric  the HPA was unable to compute the replica count: unable to get external metric secureworks/datadogmetric@nginx-demo:nginx-requests/nil: unable to fetch metrics from external metrics API: datadogmetric@nginx-demo:nginx-requests.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list resource "datadogmetric@nginx-demo:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "default"
Events:
  Type     Reason                        Age                   From                       Message
  ----     ------                        ----                  ----                       -------
  Warning  FailedComputeMetricsReplicas  23m (x12 over 26m)    horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get datadogmetric@nginx-demo:nginx-requests external metric: unable to get external metric secureworks/datadogmetric@nginx-demo:nginx-requests/nil: unable to fetch metrics from external metrics API: datadogmetric@nginx-demo:nginx-requests.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list resource "datadogmetric@nginx-demo:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "default"
  Warning  FailedGetExternalMetric       11m (x58 over 26m)    horizontal-pod-autoscaler  unable to get external metric secureworks/datadogmetric@nginx-demo:nginx-requests/nil: unable to fetch metrics from external metrics API: datadogmetric@nginx-demo:nginx-requests.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list resource "datadogmetric@nginx-demo:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "default"
  Warning  FailedComputeMetricsReplicas  7m20s (x12 over 10m)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get datadogmetric@nginx-demo:nginx-requests external metric: unable to get external metric secureworks/datadogmetric@nginx-demo:nginx-requests/nil: unable to fetch metrics from external metrics API: datadogmetric@nginx-demo:nginx-requests.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list resource "datadogmetric@nginx-demo:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "secureworks"
  Warning  FailedGetExternalMetric       11s (x39 over 10m)    horizontal-pod-autoscaler  unable to get external metric secureworks/datadogmetric@nginx-demo:nginx-requests/nil: unable to fetch metrics from external metrics API: datadogmetric@nginx-demo:nginx-requests.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list resource "datadogmetric@nginx-demo:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "default"

from helm-charts.

vboulineau commented on June 11, 2024

Hi @wbrewer,

Which Kubernetes setup do you use (custom, GKE, EKS, AKS), and which version?

wbrewer commented on June 11, 2024

We use Rancher on AWS with Kubernetes v1.17.9

vboulineau commented on June 11, 2024

Hello @wbrewer,

I tried to reproduce your issue with the same setup (Rancher 2.4.5 on AWS with Kubernetes 1.17.9) and our latest chart (2.4.13, which has no changes from 2.4.10 in this area), but I could not reproduce it: autoscaling works out of the box.

However, I notice that the apiGroups value in your ClusterRole is not the expected one:

rules:
- apiGroups:
  - external.metrics.k8s.io/v1beta1 # Should be external.metrics.k8s.io
  resources:
  - '*'
  verbs:
  - list
  - get
  - watch

That said, I don't believe this v1beta1 suffix is generated by our Helm chart. This is what we generate:
https://github.com/DataDog/helm-charts/blob/datadog-2.4.10/charts/datadog/templates/hpa-external-metrics-rbac.yaml#L18

wbrewer commented on June 11, 2024

So I switched the API group as you suggested, but now I get the original error again:

external metrics API: datadogmetric@******:nginx-requests.external.metrics.k8s.io is forbidden: User "system:kube-proxy" cannot list resource "datadogmetric@*****:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "****"

I understand this might not be a DD Helm chart issue, but do you have any advice on what could be causing this? Why would system:kube-proxy be the user making the request at all?

vboulineau commented on June 11, 2024

It's a good question. We have no definitive answer, but we have had reports of this happening on older GKE clusters too. It has since been fixed there: the HPA ExternalMetrics RBAC is now created out of the box.

We were not able to pinpoint the configuration or setup option that causes this, but there is a workaround that has worked on these clusters: naming the ClusterRole and the ClusterRoleBinding exactly external-metrics-reader.
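
For illustration, a minimal sketch of that naming workaround. It assumes the rules and the subject stay the same as in the datadog-cluster-agent-external-metrics-reader objects, and that the HPA controller runs as the horizontal-pod-autoscaler service account in kube-system, as the "forbidden" messages earlier in this thread suggest. Adjust if your cluster differs.

```yaml
# Sketch only: same permissions as the existing external metrics reader,
# but with both objects named exactly "external-metrics-reader".
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-metrics-reader
rules:
- apiGroups:
  - external.metrics.k8s.io   # API group only, no version suffix
  resources:
  - '*'
  verbs:
  - list
  - get
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-metrics-reader
subjects:
# Assumption: the HPA controller identity seen in the error messages.
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system
```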

Otherwise, I believe that adding the system:kube-proxy user to the datadog-cluster-agent-external-metrics-reader ClusterRoleBinding should also work.
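
A sketch of what that binding could look like with the extra subject. The existing ServiceAccount subject is an assumption based on the user named in the HPA error messages; only the User entry is new.

```yaml
# Sketch only: the existing binding plus system:kube-proxy as a subject.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-cluster-agent-external-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-cluster-agent-external-metrics-reader
subjects:
# Assumption: the subject already present in the binding.
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system
# Added: the user that appears in the "forbidden" error.
- kind: User
  name: system:kube-proxy
  apiGroup: rbac.authorization.k8s.io
```

Since this object is managed by Helm, a manual edit may be overwritten on the next release upgrade.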
