Comments (6)
Got some news on this: If I change the cluster role for the datadog-cluster-agent-external-metrics-reader to
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    helm.fluxcd.io/antecedent: monitoring:helmrelease/datadog
  creationTimestamp: "2020-07-30T12:47:37Z"
  labels:
    app.kubernetes.io/instance: datadog
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: datadog
    app.kubernetes.io/version: "7"
    helm.sh/chart: datadog-2.4.10
  name: datadog-cluster-agent-external-metrics-reader
  resourceVersion: "139077736"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/datadog-cluster-agent-external-metrics-reader
  uid: 3be7d96a-6f83-49d1-97f7-1b6b7c84cea1
rules:
- apiGroups:
  - external.metrics.k8s.io/v1beta1
  resources:
  - '*'
  verbs:
  - list
  - get
  - watch
The HPA logs make more sense:
Name: nginxext
Namespace: default
Labels: <none>
Annotations: CreationTimestamp: Wed, 19 Aug 2020 15:01:34 -0400
Reference: Deployment/nginx
Metrics: ( current / target )
"datadogmetric@nginx-demo:nginx-requests" (target average value): <unknown> / 9
Min replicas: 1
Max replicas: 3
Deployment pods: 1 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetExternalMetric the HPA was unable to compute the replica count: unable to get external metric secureworks/datadogmetric@nginx-demo:nginx-requests/nil: unable to fetch metrics from external metrics API: datadogmetric@nginx-demo:nginx-requests.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list resource "datadogmetric@nginx-demo:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "default"
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedComputeMetricsReplicas 23m (x12 over 26m) horizontal-pod-autoscaler invalid metrics (1 invalid out of 1), first error is: failed to get datadogmetric@nginx-demo:nginx-requests external metric: unable to get external metric secureworks/datadogmetric@nginx-demo:nginx-requests/nil: unable to fetch metrics from external metrics API: datadogmetric@nginx-demo:nginx-requests.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list resource "datadogmetric@nginx-demo:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "default"
Warning FailedGetExternalMetric 11m (x58 over 26m) horizontal-pod-autoscaler unable to get external metric secureworks/datadogmetric@nginx-demo:nginx-requests/nil: unable to fetch metrics from external metrics API: datadogmetric@nginx-demo:nginx-requests.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list resource "datadogmetric@nginx-demo:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "default"
Warning FailedComputeMetricsReplicas 7m20s (x12 over 10m) horizontal-pod-autoscaler invalid metrics (1 invalid out of 1), first error is: failed to get datadogmetric@nginx-demo:nginx-requests external metric: unable to get external metric secureworks/datadogmetric@nginx-demo:nginx-requests/nil: unable to fetch metrics from external metrics API: datadogmetric@nginx-demo:nginx-requests.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list resource "datadogmetric@nginx-demo:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "secureworks"
Warning FailedGetExternalMetric 11s (x39 over 10m) horizontal-pod-autoscaler unable to get external metric secureworks/datadogmetric@nginx-demo:nginx-requests/nil: unable to fetch metrics from external metrics API: datadogmetric@nginx-demo:nginx-requests.external.metrics.k8s.io is forbidden: User "system:serviceaccount:kube-system:horizontal-pod-autoscaler" cannot list resource "datadogmetric@nginx-demo:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "default"
from helm-charts.
Hi @wbrewer,
What's the Kubernetes setup you use (custom, GKE, EKS, AKS?) and which version are you using?
from helm-charts.
We use Rancher on AWS with Kubernetes v1.17.9
from helm-charts.
Hello @wbrewer,
I tried to reproduce your issue with the same setup (Rancher 2.4.5 on AWS with Kubernetes 1.17.9) and our latest chart (2.4.13, which has no changes from 2.4.10 in this part), and I was not able to reproduce it: autoscaling works out of the box.
However, I noticed that the apiGroups value in your ClusterRole is not the expected one:
rules:
- apiGroups:
  - external.metrics.k8s.io/v1beta1  # Should be external.metrics.k8s.io
  resources:
  - '*'
  verbs:
  - list
  - get
  - watch
That said, I don't believe this v1beta1 suffix is generated by our Helm chart. This is what we generate:
https://github.com/DataDog/helm-charts/blob/datadog-2.4.10/charts/datadog/templates/hpa-external-metrics-rbac.yaml#L18
from helm-charts.
So I switched the API group as you suggested, but now I get the original error again:
external metrics API: datadogmetric@******:nginx-requests.external.metrics.k8s.io is forbidden: User "system:kube-proxy" cannot list resource "datadogmetric@*****:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "****"
I understand this might not be a DD Helm chart issue but any advice on what could be causing this? Why would system:kube-proxy be called at all?
from helm-charts.
It's a good question. We have no definitive answer, but we have had some reports of this happening on older GKE clusters too. They have since fixed it there by creating the HPA ExternalMetrics RBAC out of the box.
We were not able to pinpoint the configuration or setup option that causes this, but there is a workaround that has been working for these clusters: naming the ClusterRole/ClusterRoleBinding exactly external-metrics-reader.
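A minimal sketch of that naming workaround, assuming the same rules as the chart-generated ClusterRole and the HPA controller service account that appears in the error messages earlier in this thread. Only the name external-metrics-reader comes from the advice above; the rest is an illustration, not the chart's actual output:

```yaml
# Sketch only: ClusterRole and ClusterRoleBinding both named exactly
# "external-metrics-reader". Rules mirror the ClusterRole quoted in this thread.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-metrics-reader
rules:
- apiGroups:
  - external.metrics.k8s.io   # group name only, no version suffix
  resources:
  - '*'
  verbs:
  - list
  - get
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-metrics-reader
subjects:
# Assumption: the HPA controller runs as this service account,
# as the "forbidden" messages in this thread suggest.
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system
```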
Otherwise, I believe that adding the system:kube-proxy user to the ClusterRoleBinding datadog-cluster-agent-external-metrics-reader should also work.
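As a rough illustration of this second option, the binding could look like the following. The existing ServiceAccount subject is an assumption based on the error messages in this thread, and a later Helm upgrade may revert a manual edit like this one:

```yaml
# Sketch only: add system:kube-proxy as an extra subject on the existing binding.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-cluster-agent-external-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-cluster-agent-external-metrics-reader
subjects:
# Assumed existing subject: the HPA controller service account.
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system
# Added subject: the user named in the "forbidden" error.
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: system:kube-proxy
```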
from helm-charts.