amazon-archives / k8s-cloudwatch-adapter

An implementation of Kubernetes Custom Metrics API for Amazon CloudWatch

License: Apache License 2.0

Topics: cloudwatch-metrics, kubernetes, eks, aws-cloudwatch

k8s-cloudwatch-adapter's Introduction


Attention! This project has been archived and is no longer being worked on. If you are looking for a metrics server that can consume metrics from CloudWatch, please consider using the KEDA project instead. KEDA is a Kubernetes-based Event Driven Autoscaler. With KEDA, you can drive the scaling of any container in Kubernetes based on the number of events needing to be processed. For an overview of KEDA, see An overview of Kubernetes Event-Driven Autoscaling.

Kubernetes Custom Metrics Adapter for AWS CloudWatch

An implementation of the Kubernetes Custom Metrics API and External Metrics API for AWS CloudWatch metrics.

This adapter allows you to scale your Kubernetes deployment using the Horizontal Pod Autoscaler (HPA) with metrics from AWS CloudWatch.

Prerequisites

This adapter requires the following permissions to access metric data from Amazon CloudWatch.

  • cloudwatch:GetMetricData

You can create an IAM policy using this template, and attach it to the Service Account Role if you are using IAM Roles for Service Accounts.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}
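
If you use IAM Roles for Service Accounts on EKS, one way to wire this up is with eksctl. A minimal sketch, assuming the policy above is saved as policy.json; the cluster name and account ID are placeholders:

$ aws iam create-policy \
    --policy-name k8s-cloudwatch-adapter \
    --policy-document file://policy.json
$ eksctl create iamserviceaccount \
    --name k8s-cloudwatch-adapter \
    --namespace custom-metrics \
    --cluster <your-cluster> \
    --attach-policy-arn arn:aws:iam::<account-id>:policy/k8s-cloudwatch-adapter \
    --approve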

Deploy

Requires a Kubernetes cluster with the Metrics Server deployed; an Amazon EKS cluster works as well.

Now deploy the adapter to your Kubernetes cluster:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/deploy/adapter.yaml
namespace/custom-metrics created
clusterrolebinding.rbac.authorization.k8s.io/k8s-cloudwatch-adapter:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/k8s-cloudwatch-adapter-auth-reader created
deployment.apps/k8s-cloudwatch-adapter created
clusterrolebinding.rbac.authorization.k8s.io/k8s-cloudwatch-adapter-resource-reader created
serviceaccount/k8s-cloudwatch-adapter created
service/k8s-cloudwatch-adapter created
apiservice.apiregistration.k8s.io/v1beta1.external.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/k8s-cloudwatch-adapter:external-metrics-reader created
clusterrole.rbac.authorization.k8s.io/k8s-cloudwatch-adapter-resource-reader created
clusterrolebinding.rbac.authorization.k8s.io/k8s-cloudwatch-adapter:external-metrics-reader created
customresourcedefinition.apiextensions.k8s.io/externalmetrics.metrics.aws created
clusterrole.rbac.authorization.k8s.io/k8s-cloudwatch-adapter:crd-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/k8s-cloudwatch-adapter:crd-metrics-reader created

This creates a new namespace custom-metrics and deploys the necessary ClusterRoles, ServiceAccount, and RoleBindings, along with the Deployment of the adapter.

Alternatively, the CRD and adapter can be deployed using the Helm charts in the /charts directory:

$ helm install k8s-cloudwatch-adapter-crd ./charts/k8s-cloudwatch-adapter-crd
NAME: k8s-cloudwatch-adapter-crd
LAST DEPLOYED: Thu Sep 17 11:36:53 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ helm install k8s-cloudwatch-adapter ./charts/k8s-cloudwatch-adapter \
>   --namespace custom-metrics \
>   --create-namespace
NAME: k8s-cloudwatch-adapter
LAST DEPLOYED: Fri Aug 14 13:20:17 2020
NAMESPACE: custom-metrics
STATUS: deployed
REVISION: 1
TEST SUITE: None

Verifying the deployment

Next, you can query the API to verify that the adapter is deployed correctly:

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
  ]
}
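
Once you have defined an ExternalMetric resource, you can also query its value directly through the same API; a sketch, assuming a metric named sqs-helloworld-length in the default namespace (as used in the sample application):

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/sqs-helloworld-length" | jq .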

Deploying the sample application

There is a sample SQS application provided in this repository for you to test how the adapter works. Refer to this guide.

More docs

License

This library is licensed under the Apache 2.0 License.

Issues

Report any issues in the GitHub Issues.

k8s-cloudwatch-adapter's People

Contributors

arun-amzn, arunbhagyanath, bukashk0zzz, chankh, chaudyg, ellisvalentiner, ericlarssen-wf, homme, jicowan, jpeddicord, mattsb42-aws, ojima-h, otterley, shivam9268


k8s-cloudwatch-adapter's Issues

k8s-cloudwatch-adapter to work with multiple regions

Hi, I have an EKS cluster that uses an HPA based on external metrics (via CRD resources), similar to what is described here:
https://aws.amazon.com/blogs/compute/scaling-kubernetes-deployments-with-amazon-cloudwatch-metrics/
In my setup, SQS and EKS are in two different regions, which causes problems with the queries (nothing seems to be returned). Is the cloudwatch-adapter region dependent? If so, what is the correct way to tell the HPA where the SQS queue is located (a different region) in order to get any results for that queue? I don't see any errors, but the HPA 'targets' remain 0 while the queue keeps accumulating messages.

Why max out desired number when exceeding the targetvalue by only 1?

My targetValue is set at 6. If I bump the queue up to 7 messages, I get this:

  "sqs-helloworld-length" (target value):  7 / 6
Min replicas:                              1
Max replicas:                              10
Deployment pods:                           10 current / 10 desired

That's not good if I set my max to 100: I might add all 100 pods for a single extra message.
If I go one lower than the targetValue, I get this:

  "sqs-helloworld-length" (target value):  5 / 6
Min replicas:                              1
Max replicas:                              10
Deployment pods:                           5 current / 5 desired

Could someone explain why it's behaving this way?

Helm3 error

Hello.

I'm trying to use Helm 3 with the cloudwatch-adapter and ran into an issue. After I install the adapter, helm stops working with this message:

Error: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: the server could not find the requested resource

On helm2 everything is OK.
Thanks for helping

Add labels to the metric

Is there any way to supply a label to the external metric?
We'd like to create a few metrics and then aggregate them when setting up a HorizontalPodAutoscaler, as described here: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-more-specific-metrics

    name: `http_requests`
    selector: `verb=GET`

This selector uses the same syntax as the full Kubernetes label selectors. The monitoring pipeline determines how to collapse multiple series into a single value, if the name and selector match multiple series. The selector is additive,

Right now the metricLabels returned is null:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/myapp-perf-kond/myapp-elb-5xx-requests" | jq

"metricLabels": null

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/myapp-perf-kond/myapp-elb-5xx-requests"
  },
  "items": [
    {
      "metricName": "myapp-elb-5xx-requests",
      "metricLabels": null,
      "timestamp": "2020-01-29T18:51:25Z",
      "value": "0"
    }
  ]
}
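
For reference, the autoscaling/v2beta2 HPA API does let you attach a selector to an external metric; whether the adapter populates metricLabels so that such a selector matches anything is exactly the open question here. A minimal sketch, with hypothetical metric name and label:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: myapp-perf-kond
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: myapp-elb-5xx-requests
        selector:
          matchLabels:
            verb: GET
      target:
        type: Value
        value: "100"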

The current value is always 0

I set up the SQS-based ExternalMetric, but no metrics are coming in, even though there is no error in the HPA description or the cloudwatch-adapter logs. I have made many attempts at changing things, but still have no clues.

externalmetrics

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: push-ml-coordinator-queue-dev
  namespace: sn-push
spec:
  name: push-ml-coordinator-queue-dev
  queries:
    - id: sqs_sn_push_ml_coordinator_queue_dev
      metricStat:
        metric:
          dimensions:
            - name: QueueName
              value: "sn_push_user_incoming_queue_dev"
          metricName: "ApproximateNumberOfMessagesVisible"
          namespace: "AWS/SQS"
        period: 60
        stat: Average
        unit: Count
      returnData: true
  resource:
    resource: "deployment"

HPA

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: coordinator-hpa
  namespace: sn-push
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sn-push-ml-coordinator
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: External
    external:
      metricName: push-ml-coordinator-queue-dev
      targetAverageValue: 10

HPA Description

Name:                                                      coordinator-hpa
Namespace:                                                 sn-push
Labels:                                                    <none>
Annotations:                                               <none>
CreationTimestamp:                                         Fri, 11 Sep 2020 12:02:21 +0800
Reference:                                                 Deployment/sn-push-ml-coordinator
Metrics:                                                   ( current / target )
  "push-ml-coordinator-queue-dev" (target average value):  0 / 10
Min replicas:                                              1
Max replicas:                                              4
Deployment pods:                                           1 current / 1 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from external metric push-ml-coordinator-queue-dev(nil)
  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count

Cloudwatch Adapter logs snippet

I0911 05:57:25.228094       1 handler.go:69] externalMetricInfo: &{{ } {push-ml-coordinator-queue-dev  sn-push /apis/metrics.aws/v1alpha1/namespaces/sn-push/externalmetrics/push-ml-coordinator-queue-dev fa008218-8423-41bd-a535-6168223aad9a 165916796 1 2020-09-11 04:01:27 +0000 UTC <nil> <nil> map[] map[] [] []  []} {push-ml-coordinator-queue-dev [{  sqs_sn_push_ml_coordinator_queue_dev  {{[{QueueName sn_push_user_incoming_queue_dev}] NumberOfEmptyReceives AWS/SQS} 60 Average Count} true}]}}
I0911 05:57:25.228226       1 handler.go:114] adding to cache item 'push-ml-coordinator-queue-dev' in namespace 'sn-push'
I0911 05:57:25.228230       1 controller.go:122] successfully processed item '{sn-push/push-ml-coordinator-queue-dev ExternalMetric}'

If you need any further information, please let me know.
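
One way to narrow this down is to run the equivalent query directly with the AWS CLI (as suggested in another issue here) and confirm that datapoints come back for the same dimension value; note also that the adapter log above shows the cached query using metric NumberOfEmptyReceives rather than ApproximateNumberOfMessagesVisible, which may be worth double-checking. A sketch, with a time window around the log timestamps:

$ aws cloudwatch get-metric-statistics \
    --namespace "AWS/SQS" \
    --metric-name ApproximateNumberOfMessagesVisible \
    --dimensions Name=QueueName,Value=sn_push_user_incoming_queue_dev \
    --statistics Average \
    --period 60 \
    --unit Count \
    --start-time 2020-09-11T05:00:00Z \
    --end-time 2020-09-11T06:00:00Z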

Failed to pull image

I understand this may be an upstream/unrelated issue - but I consider it worth a shot, in case the community has experienced this.

After four months problem-free, through many teardowns and buildouts of our EKS clusters...
Failed to pull image "chankh/k8s-cloudwatch-adapter:v0.8.0": rpc error: code = Unknown desc = context canceled

Fresh, default EKS clusters with the latest eksctl, no custom/advanced setup. I've rebuilt clean 10 times today, and no matter what I do, it fails to pull.

LoadBalancer latency always shows as "0"

My external-metrics adapter looks like this:

queries:
      - id: latency
        metricLabels: "latency"
        metricStat:
          metric:
            namespace: "AWS/ELB"
            metricName: "Latency"
            dimensions:
              - name: LoadBalancerName
                value: "ab2cc236ed3a211e98a7c122360628b8"
          period: 60
          stat: Sum
          unit: Milliseconds
        returnData: true

When I run the query to fetch data from the external metrics API inside my Kubernetes cluster, I get the following response:

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/chandra/elb-latency"
  },
  "items": [
    {
      "metricName": "elb-latency",
      "metricLabels": null,
      "timestamp": "2019-09-23T12:17:19Z",
      "value": "0"
    }
  ]
}

Whereas in CloudWatch, the datapoint is never 0.

Can the adapter work with the EKS Fargate type?

I can see that the adapter uses instance metadata to retrieve the region. I tried using the adapter with an EKS Fargate cluster and got this error:

get hpa sqs-consumer-scaler -o yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2020-01-27T23:43:40Z","reason":"SucceededGetScale","message":"the HPA controller was able to get the target''s current scale"},{"type":"ScalingActive","status":"False","lastTransitionTime":"2020-01-27T23:43:40Z","reason":"FailedGetExternalMetric","message":"the HPA was unable to compute the replica count: unable to get external metric default/sqs-helloworld-length/nil: unable to fetch metrics from external metrics API: MissingRegion: could not find region configuration"}]'

How can we specify a region?

Is there anything else I can do to make it work with a Fargate cluster?
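
On Fargate there is no EC2 instance metadata endpoint for the adapter to read the region from, so it has to be supplied explicitly. One sketch is to set it as an environment variable on the adapter container, as the full manifest later in these issues does (the region value here is a placeholder):

      containers:
      - name: k8s-cloudwatch-adapter
        image: chankh/k8s-cloudwatch-adapter:v0.8.0
        env:
        - name: AWS_DEFAULT_REGION
          value: us-east-1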

Update image source location?

Would it be possible to move the image location from: image: chankh/k8s-cloudwatch-adapter:v0.8.0 to a known AWS ECR supported one? I realize @chankh was the original creator of this repo, but random image sources make my security team cranky.

Thanks!

[question]Unable to get an externalmetric from cloudwatch, possible permissions issue

I0717 03:51:59.474073       1 request.go:947] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"externalmetrics.metrics.aws is forbidden: User \"system:serviceaccount:custom-metrics:k8s-cloudwatch-adapter\" cannot list resource \"externalmetrics\" in API group \"metrics.aws\" at the cluster scope","reason":"Forbidden","details":{"group":"metrics.aws","kind":"externalmetrics"},"code":403}
E0717 03:51:59.474831       1 reflector.go:125] github.com/awslabs/k8s-cloudwatch-adapter/pkg/client/informers/externalversions/factory.go:114: Failed to list *v1alpha1.ExternalMetric: externalmetrics.metrics.aws is forbidden: User "system:serviceaccount:custom-metrics:k8s-cloudwatch-adapter" cannot list resource "externalmetrics" in API group "metrics.aws" at the cluster scope
I0717 03:52:00.234254       1 authorization.go:73] Forbidden: "/", Reason: ""

Wondering if someone has seen the above errors. I'm running this in EKS 1.14 with metrics-server running. I am not using pod-level IAM (for now), but all the necessary permissions are applied at the node level (EC2). I deployed an external metric and created the HPA, but I only ever seem to get back a 0, while the metric is reporting in CloudWatch, averaging between 1 and 5.

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/auth-overall-last-process" | jq .
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/auth-overall-last-process"
  },
  "items": [
    {
      "metricName": "auth-overall-last-process",
      "metricLabels": null,
      "timestamp": "2020-07-17T04:13:19Z",
      "value": "0"
    }
  ]
}
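
The 403 above suggests the adapter's service account is missing the k8s-cloudwatch-adapter:crd-metrics-reader ClusterRole and ClusterRoleBinding that deploy/adapter.yaml creates. A quick sketch of a check:

$ kubectl auth can-i list externalmetrics.metrics.aws \
    --as=system:serviceaccount:custom-metrics:k8s-cloudwatch-adapter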

Repeated logging of aws region

I've turned the logging verbosity down to -1 as I'm only interested in fatal issues. However, our logs are filling up with messages about which AWS region is being used, 26 times in 6 minutes (example below).

I believe the code is located at pkg/aws/client.go line 52.

I1112 10:12:01.536973 1 client.go:52] using AWS Region: eu-west-1
I1112 10:12:16.672508 1 client.go:52] using AWS Region: eu-west-1
I1112 10:12:32.547729 1 client.go:52] using AWS Region: eu-west-1
I1112 10:12:47.687460 1 client.go:52] using AWS Region: eu-west-1
I1112 10:13:03.575687 1 client.go:52] using AWS Region: eu-west-1
I1112 10:13:18.730768 1 client.go:52] using AWS Region: eu-west-1
I1112 10:13:34.742182 1 client.go:52] using AWS Region: eu-west-1
I1112 10:13:49.880865 1 client.go:52] using AWS Region: eu-west-1
I1112 10:14:05.774337 1 client.go:52] using AWS Region: eu-west-1
I1112 10:14:20.926628 1 client.go:52] using AWS Region: eu-west-1
I1112 10:14:36.059460 1 client.go:52] using AWS Region: eu-west-1
I1112 10:14:51.233399 1 client.go:52] using AWS Region: eu-west-1
I1112 10:15:06.373145 1 client.go:52] using AWS Region: eu-west-1
I1112 10:15:21.521138 1 client.go:52] using AWS Region: eu-west-1
I1112 10:15:36.768373 1 client.go:52] using AWS Region: eu-west-1
I1112 10:15:51.734150 1 client.go:52] using AWS Region: eu-west-1
I1112 10:16:06.709600 1 client.go:52] using AWS Region: eu-west-1
I1112 10:16:21.856930 1 client.go:52] using AWS Region: eu-west-1
I1112 10:16:36.989989 1 client.go:52] using AWS Region: eu-west-1
I1112 10:16:52.112803 1 client.go:52] using AWS Region: eu-west-1
I1112 10:17:07.250296 1 client.go:52] using AWS Region: eu-west-1
I1112 10:17:22.406827 1 client.go:52] using AWS Region: eu-west-1
I1112 10:17:37.573066 1 client.go:52] using AWS Region: eu-west-1
I1112 10:17:52.704800 1 client.go:52] using AWS Region: eu-west-1
I1112 10:18:07.833672 1 client.go:52] using AWS Region: eu-west-1
I1112 10:18:22.957724 1 client.go:52] using AWS Region: eu-west-1

How do you disable verbose logging of every request?

k8s-cloudwatch-adapter is verbosely logging potentially every request, which makes it difficult to see whether there is anything of value in the logs. It logs something similar to this every 3-5 seconds or so:

I0721 20:39:31.167602       1 round_trippers.go:438] POST https://172.20.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews 201 Created in 0 milliseconds
I0721 20:39:31.167642       1 round_trippers.go:444] Response Headers:
I0721 20:39:31.167678       1 round_trippers.go:447]     Audit-Id: 7dcda5db-675c-40c3-9b8f-eeaa1a896d09
I0721 20:39:31.167700       1 round_trippers.go:447]     Cache-Control: no-cache, private
I0721 20:39:31.167713       1 round_trippers.go:447]     Content-Type: application/json
I0721 20:39:31.167727       1 round_trippers.go:447]     Content-Length: 527
I0721 20:39:31.167774       1 round_trippers.go:447]     Date: Tue, 21 Jul 2020 20:39:31 GMT
I0721 20:39:31.167814       1 request.go:947] Response Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"nonResourceAttributes":{"path":"/apis/external.metrics.k8s.io/v1beta1","verb":"get"},"user":"system:serviceaccount:monitoring:cloudwatch-adapter","group":["system:serviceaccounts","system:serviceaccounts:monitoring","system:authenticated"]},"status":{"allowed":true,"reason":"RBAC: allowed by ClusterRoleBinding \"system:discovery\" of ClusterRole \"system:discovery\" to Group \"system:authenticated\""}}
I0721 20:39:31.167927       1 handler.go:143] cloudwatch-metrics-adapter: GET "/apis/external.metrics.k8s.io/v1beta1" satisfied by gorestful with webservice /apis/external.metrics.k8s.io/v1beta1
I0721 20:39:31.168033       1 wrap.go:47] GET /apis/external.metrics.k8s.io/v1beta1?timeout=32s: (2.33753ms) 200 [adapter/v0.0.0 (linux/amd64) kubernetes/$Format 10.30.36.223:36318]
I0721 20:39:31.880655       1 handler.go:143] cloudwatch-metrics-adapter: GET "/apis/external.metrics.k8s.io/v1beta1" satisfied by gorestful with webservice /apis/external.metrics.k8s.io/v1beta1
I0721 20:39:31.881368       1 wrap.go:47] GET /apis/external.metrics.k8s.io/v1beta1: (992.933µs) 200 [Go-http-client/2.0 10.30.36.223:36284]
I0721 20:39:31.882762       1 handler.go:143] cloudwatch-metrics-adapter: GET "/apis/external.metrics.k8s.io/v1beta1" satisfied by gorestful with webservice /apis/external.metrics.k8s.io/v1beta1
I0721 20:39:31.882889       1 wrap.go:47] GET /apis/external.metrics.k8s.io/v1beta1: (317.823µs) 200 [Go-http-client/2.0 10.30.84.162:40412]
I0721 20:39:31.884693       1 handler.go:143] cloudwatch-metrics-adapter: GET "/apis/external.metrics.k8s.io/v1beta1" satisfied by gorestful with webservice /apis/external.metrics.k8s.io/v1beta1
I0721 20:39:31.884773       1 wrap.go:47] GET /apis/external.metrics.k8s.io/v1beta1: (966.04µs) 200 [Go-http-client/2.0 10.30.84.162:40412]
I0721 20:39:31.885097       1 handler.go:143] cloudwatch-metrics-adapter: GET "/apis/external.metrics.k8s.io/v1beta1" satisfied by gorestful with webservice /apis/external.metrics.k8s.io/v1beta1
I0721 20:39:31.885173       1 wrap.go:47] GET /apis/external.metrics.k8s.io/v1beta1: (293.244µs) 200 [Go-http-client/2.0 10.30.84.162:40412]

The only setting I could find was this:

https://github.com/awslabs/k8s-cloudwatch-adapter/blob/92c2e89d5ff86af85197badf86a8bf4613ff2141/pkg/aws/client.go#L27-L29

and I don't have DEBUG defined (I tried setting it to "false" just in case; no change).

I was able to get rid of these logs by setting --logtostderr=false in the container args, however the container then doesn't log anything at all: it sounds like this option possibly turns off logs altogether.

Is there a way to turn off the verbose logging of requests, but still log warnings and/or errors?
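
The request/response dumps above come from klog at high verbosity; the manifest shown later in these issues ships with --v=10 in the container args, so one sketch is to lower that flag rather than disabling stderr logging entirely (warnings and errors are still emitted at low verbosity):

        args:
        - /adapter
        - --cert-dir=/tmp
        - --secure-port=6443
        - --logtostderr=true
        - --v=1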

IAM Service Accounts permission denied

I followed this guide and deployed this yaml.

The only difference from the yaml above is that I removed the creation of the ServiceAccount and used a service account that was created as an EKS IAM service account (via eksctl).

All ClusterRoleBindings and RoleBindings remained and were bound to the eksctl-generated service account name.

Attached policy is below:

    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:GetMetricData",
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:ListMetrics"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

The container logs show permission denied errors.

I0228 22:36:09.853711       1 provider_external.go:18] Received request for namespace: default, metric name: event-consumer-queue-length, metric selectors:
E0228 22:36:09.853815       1 client.go:102] err: WebIdentityErr: unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token
caused by: open /var/run/secrets/eks.amazonaws.com/serviceaccount/token: permission denied
E0228 22:36:09.853830       1 provider_external.go:32] bad request: WebIdentityErr: unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token
caused by: open /var/run/secrets/eks.amazonaws.com/serviceaccount/token: permission denied
I0228 22:36:09.853911       1 wrap.go:47] GET /apis/external.metrics.k8s.io/v1beta1/namespaces/default/event-consumer-queue-length: (1.551424ms) 400 [kube-controller-manager/v1.14.9 (linux/amd64) kubernetes/c0eccca/system:serviceaccount:kube-system:horizontal-pod-autoscaler 10.252.90.26:48954]

This is leading me to think IAM service accounts are not supported. Is this the case?
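
One thing worth comparing against the manifest shown later in these issues: the pod needs a securityContext fsGroup so the non-root adapter process can read the projected IRSA token file, which is a common cause of exactly this permission denied error. A sketch of the relevant pod spec fragment:

    spec:
      securityContext:
        fsGroup: 65534
      serviceAccountName: k8s-cloudwatch-adapter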

Doesn't work with multi-dimensions metrics

Hi there,

On CloudWatch we had a metric named queuedepth with dimensions env, app, and queue in the Sidekiq namespace. env described the environment, such as staging, production, or development; app held the application name; and queue identified the queue the data came from. We have a Lambda that gathers the data and sends it to CloudWatch.

If we try to use this metric as an ExternalMetric, as in the example below, it doesn't work.

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: queue-depth
spec:
  name: queue-depth
  resource:
    resource: "deployment"
  queries:
    - id: queue_depth
      metricStat:
        metric:
          namespace: "Sidekiq"
          metricName: "queuedepth"
          dimensions:
            - name: env
              value: staging
            - name: app
              value: appname
            - name: queue
              value: queuename
        period: 60
        stat: Average
        unit: Count
      returnData: true
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  minReplicas: 1
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: appname
  metrics:
  - type: External
    external:
      metric:
        name: queue-depth
        selector:
          matchLabels:
            env: staging
            app: appname
            queue: queuename
      target:
        type: Value
        value: 40

If we kubectl logs -f the cloudwatch adapter pod we can see that it cannot find the metric :/

To make it work, we had to change our Lambda to create another metric (depth) with a single dimension (queue).

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: queue-depth
spec:
  name: queue-depth
  resource:
    resource: "deployment"
  queries:
    - id: queue_depth
      metricStat:
        metric:
          namespace: "StagingSidekiq"
          metricName: "depth"
          dimensions:
            - name: queue
              value: queuename
        period: 60
        stat: Average
        unit: Count
      returnData: true

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  minReplicas: 1
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: appname
  metrics:
  - type: External
    external:
      metric:
        name: queue-depth
      target:
        type: AverageValue
        averageValue: 40

And, as soon as we applied this new configuration, the metrics were fetched and the HPA began scaling immediately.

Is this expected? Since dimensions is plural and accepts a list, we thought we could use multi-dimension metrics. Also, we noticed that all the examples use only single-dimension metrics.

Our cluster is on EKS 1.14 and using chankh/k8s-cloudwatch-adapter:v0.6.0.

Thanks
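
CloudWatch only matches a metric when the namespace, metric name, and full set of dimensions match exactly, so one thing worth checking is whether the same three-dimension query returns datapoints from the CLI at all; also, if the Lambda publishes datapoints without an explicit unit, asking for unit: Count can filter everything out. A sketch, with a placeholder time window:

$ aws cloudwatch get-metric-statistics \
    --namespace Sidekiq \
    --metric-name queuedepth \
    --dimensions Name=env,Value=staging Name=app,Value=appname Name=queue,Value=queuename \
    --statistics Average \
    --period 60 \
    --unit Count \
    --start-time 2020-01-01T00:00:00Z \
    --end-time 2020-01-01T01:00:00Z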

Understanding TARGETS

Hello,

Great job with this adapter! I've started playing with it and it's great; however, I don't understand how I should interpret the TARGETS column when doing a kubectl get hpa/...:

TARGETS
6262500m/30 (avg)

Would love some insights so I can understand this better. Thanks!
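
For what it's worth, the m suffix here is the standard Kubernetes milli quantity notation, so one reading of that line is:

    6262500m / 1000 = 6262.5    (current average value per pod, against a target of 30)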

Many failures related to external metrics at kube-apiserver logs.

Hi, I have recently noticed that we are getting a lot of logs related to external metrics on our EKS cluster.

kube-apiserver logs

Cluster version: 1.15
Adapter version: 0.8.0, using the latest deployment.
We checked the API services and there is no v1beta1.custom.metrics.k8s.io there, since from what I have seen it shouldn't be there anymore.

What could be the cause of these issues?

Feature Request: Provide ECR Hosted Image

Our clusters make use of Gatekeeper and OPA to prevent deployment of pods making use of non-ECR hosted images.

Currently the k8s-cloudwatch-adapter image is only provided via Dockerhub, meaning that we have to mirror it to our internal ECR registry every time a new tag is published before we can make use of it.

It would be great and save us this toil if instead this could also be hosted on an AWS ECR registry for users to use.

Cannot see metrics in AWS cloudwatch, Desired value is always 0.

Also posted on SO : https://stackoverflow.com/questions/64087291/kubernetes-hpa-hpa-not-showing-information-from-external-metrics

Hello friends,

I am working on setting up memory-based metrics, as our app is memory intensive. I have defined a Custom/ContainerInsights namespace, but I still cannot see my metrics in CloudWatch, and the HPA always shows a current value of 0. The only thing I changed since my first attempt is migrating the HPA from v1beta1 to v2beta2... no luck. What am I missing?

metrics-file :


apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: magentomemory
spec:
  name: magentomemory
  resource:
    resource: "deployment"
  queries:
    - id: magentomemory
      metricStat:
        metric:
          namespace: "Custom/ContainerInsights". ----- > tried only with ContainerInsights as well
          metricName: "pod_memory_utilization"
          dimensions:
            - name: PodName
              value: "magento-prod-deployment"
            - name: ClusterName
              value: "prod-eks"
            - name: Namespace
              value: "default"
        period: 10
        stat: Average
        unit: Percent
      returnData: true

HPA file :


apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: resize-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: magento-prod-deployment
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: External
    external:
      metricName: magentomemory
      targetValue: 60


kubectl describe externalmetrics magentomemory


kubectl describe externalmetric magentomemory
Name:         magentomemory
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  metrics.aws/v1alpha1
Kind:         ExternalMetric
Metadata:
  Creation Timestamp:  2020-09-27T10:03:12Z
  Generation:          1
  Resource Version:    21363574
  Self Link:           /apis/metrics.aws/v1alpha1/namespaces/default/externalmetrics/magentomemory
  UID:                 a025e5eb-633e-41b7-a04c-cdeabb1fd90a
Spec:
  Name:  magentomemory
  Queries:
    Id:  magentomemory
    Metric Stat:
      Metric:
        Dimensions:
          Name:       PodName
          Value:      magento-prod-deployment
          Name:       ClusterName
          Value:      prod-eks
          Name:       Namespace
          Value:      default
        Metric Name:  pod_memory_utilization
        Namespace:    ContainerInsights
      Period:         10
      Stat:           Average
      Unit:           Percent
    Return Data:      true
  Resource:
    Resource:  deployment
Events:        <none>

In the file below, the current value is always at 60.


kubectl describe hpa resize-hpa
Name:                              resize-hpa
Namespace:                         default
Labels:                            <none>
Annotations:                       <none>
CreationTimestamp:                 Sun, 27 Sep 2020 10:03:34 +0000
Reference:                         Deployment/magento-prod-deployment
Metrics:                           ( current / target )
  "magentomemory" (target value):  0 / 60
Min replicas:                      2
Max replicas:                      5
Deployment pods:                   2 current / 2 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from external metric magentomemory(nil)
  ScalingLimited  True    TooFewReplicas    the desired replica count is increasing faster than the maximum scale rate
Events:           <none>

Please note, I don't have Container Insights installed; I only installed the k8s adapter. I can see it working; logs:


I0927 15:31:26.606499       1 handler.go:67] externalMetricInfo: &{{ } {magentomemory  default /apis/metrics.aws/v1alpha1/namespaces/default/externalmetrics/magentomemory ef450c81-2c82-434f-a3c5-202d21f1f012 21408307 1 2020-09-27 14:55:51 +0000 UTC <nil>  map[] map[] [] []  []} {magentomemory   [{ magentomemory  {{[{PodName magento-prod-deployment} {ClusterName prod-eks} {Namespace default}] pod_memory_utilization Custom/ContainerInsights} 10 Average Percent} 0xc0006cf7fe}]}}
I0927 15:31:26.606598       1 handler.go:68] adding to cache item 'magentomemory' in namespace 'default'
I0927 15:31:26.606608       1 controller.go:122] successfully processed item '{default/magentomemory ExternalMetric}'
I0927 15:31:26.606617       1 controller.go:79] processing next item
I0927 15:31:26.606623       1 controller.go:86] processing item
I0927 15:31:33.009438       1 provider_external.go:18] Received request for namespace: default, metric name: magentomemory, metric selectors: 
I0927 15:31:33.009566       1 client.go:52] using AWS Region: eu-central-1
I0927 15:31:36.158493       1 trace.go:116] Trace[544514530]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/magentomemory,user-agent:kube-controller-manager/v1.15.11 (linux/amd64) kubernetes/065dcec/system:serviceaccount:kube-system:horizontal-pod-autoscaler,client:10.0.188.152 (started: 2020-09-27 15:31:33.009419307 +0000 UTC m=+21636.550396207) (total time: 3.149047824s):
Trace[544514530]: [3.148993074s] [3.148984148s] Listing from storage done
I0927 15:31:48.462973       1 provider_external.go:18] Received request for namespace: default, metric name: magentomemory, metric selectors: 
I0927 15:31:48.463088       1 client.go:52] using AWS Region: eu-central-1
I0927 15:31:51.615349       1 trace.go:116] Trace[335835293]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/default/magentomemory,user-agent:kube-controller-manager/v1.15.11 (linux/amd64) kubernetes/065dcec/system:serviceaccount:kube-system:horizontal-pod-autoscaler,client:10.0.188.152 (started: 2020-09-27 15:31:48.462954873 +0000 UTC m=+21652.003931790) (total time: 3.152369505s):
Trace[335835293]: [3.152318739s] [3.15231121s] Listing from storage done

Adapter authorization issue

When I check the logs with kubectl -n custom-metrics logs -l=app=k8s-cloudwatch-adapter, the following error message is firing off repeatedly

E0804 16:13:31.877587       1 reflector.go:153] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:209: Failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:custom-metrics:k8s-cloudwatch-adapter" cannot list resource "configmaps" in API group "" in the namespace "kube-system"

The parameter MetricDataQueries is required

Hi,

I have been following the post here: https://aws.amazon.com/blogs/compute/scaling-kubernetes-deployments-with-amazon-cloudwatch-metrics/ to try to autoscale based on a queue in sqs.

I have applied exactly the elements in this file: https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/deploy/adapter.yaml (exactly as they are defined) in my cluster.

I then applied the metric definition:

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: sqs-my-list-of-elements-queue-length
  namespace: unicorn
spec:
  name: sqs-my-list-of-elements-queue-length
  resource:
    resource: "deployment"
    queries:
      - id: sqs_my_list_of_elements_queue_length
        expression:
        metricStat:
          metric:
            namespace: "AWS/SQS"
            metricName: "ApproximateNumberOfMessagesVisible"
            dimensions:
              - name: QueueName
                value: "my-list-of-elements"
          period: 300
          stat: Average
          unit: Count
        returnData: true

Then the autoscaling element:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: my-worker-listening-to-my-queue-scaler
  namespace: unicorn
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-worker-listening-to-my-queue
  minReplicas: 5
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: sqs-my-list-of-elements-queue-length
      targetValue: 5

The only 2 differences with the original configuration are:

  • my metric & autoscaling are in a specific namespace
  • the value of spec.scaleTargetRef.apiVersion was apps/v1beta1, but since my deployment "my-worker-listening-to-my-queue" uses apiVersion apps/v1, I changed it.

The problem is that I am getting these errors from the autoscaler when I check why it's not scaling:

$kubectl describe hpa my-worker-listening-to-my-queue-scaler --namespace unicorn

Name:                                                              my-worker-listening-to-my-queue-scaler
Namespace:                                                         unicorn
Labels:                                                            <none>
Annotations:                                                       kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"my-worker-listening-to-my-queue-scaler","namespace...
CreationTimestamp:                                                 Fri, 05 Jul 2019 14:50:37 +0100
Reference:                                                         Deployment/my-worker-listening-to-my-queue
Metrics:                                                           ( current / target )
  "sqs-my-list-of-elements-queue-length" (target value):  <unknown> / 5
Min replicas:                                                      5
Max replicas:                                                      10
Deployment pods:                                                   5 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetExternalMetric  the HPA was unable to compute the replica count: unable to get external metric unicorn/sqs-my-list-of-elements-queue-length/nil: unable to fetch metrics from external metrics API: ValidationError: The parameter MetricDataQueries is required.

  status code: 400, request id: xxx

I tried to see if I could get more information from the pod logs in the custom-metrics namespace, but there didn't seem to be much:

 1 round_trippers.go:447]     Date: Fri, 05 Jul 2019 13:51:37 GMT
I0705 13:51:37.407653       1 request.go:942] Response Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"resourceAttributes":{"namespace":"unicorn","verb":"list","group":"external.metrics.k8s.io","version":"v1beta1","resource":"sqs-my-list-of-elements-queue-length"},"user":"system:serviceaccount:kube-system:horizontal-pod-autoscaler","group":["system:serviceaccounts","system:serviceaccounts:kube-system","system:authenticated"]},"status":{"allowed":true,"reason":"RBAC: allowed by ClusterRoleBinding \"k8s-cloudwatch-adapter:external-metrics-reader\" of ClusterRole \"k8s-cloudwatch-adapter:external-metrics-reader\" to ServiceAccount \"horizontal-pod-autoscaler/kube-system\""}}
I0705 13:51:37.407729       1 handler.go:143] cloudwatch-metrics-adapter: GET "/apis/external.metrics.k8s.io/v1beta1/namespaces/unicorn/sqs-my-list-of-elements-queue-length" satisfied by gorestful with webservice /apis/external.metrics.k8s.io/v1beta1
I0705 13:51:37.407763       1 provider_external.go:17] Received request for namespace: unicorn, metric name: sqs-my-list-of-elements-queue-length, metric selectors: 
E0705 13:51:37.435663       1 client.go:98] err: ValidationError: The parameter MetricDataQueries is required.

	status code: 400, request id: xxx
E0705 13:51:37.435684       1 provider_external.go:31] bad request: ValidationError: The parameter MetricDataQueries is required.

I am not sure if I made a mistake or if it's something else, would you be able to help me, please?

Thank you,
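
Comparing with the working ExternalMetric examples elsewhere in these issues, the queries block above is indented under resource instead of sitting directly under spec, which would leave the adapter with no queries to send and would match the empty MetricDataQueries error. A sketch of the same manifest with queries moved up one level (and the empty expression: line dropped):

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: sqs-my-list-of-elements-queue-length
  namespace: unicorn
spec:
  name: sqs-my-list-of-elements-queue-length
  resource:
    resource: "deployment"
  queries:
    - id: sqs_my_list_of_elements_queue_length
      metricStat:
        metric:
          namespace: "AWS/SQS"
          metricName: "ApproximateNumberOfMessagesVisible"
          dimensions:
            - name: QueueName
              value: "my-list-of-elements"
        period: 300
        stat: Average
        unit: Count
      returnData: true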

k8s-cloudwatch adapter no longer an implementor of custom.metrics API?

I noticed that in the readme, it shows some output from a deployment:

apiservice.apiregistration.k8s.io/v1beta1.custom.metrics.k8s.io created
apiservice.apiregistration.k8s.io/v1beta1.external.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/k8s-cloudwatch-adapter:custom-metrics-reader created
clusterrole.rbac.authorization.k8s.io/k8s-cloudwatch-adapter:external-metrics-reader created
clusterrole.rbac.authorization.k8s.io/k8s-cloudwatch-adapter-resource-reader created
clusterrolebinding.rbac.authorization.k8s.io/k8s-cloudwatch-adapter:custom-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/k8s-cloudwatch-adapter:external-metrics-reader created

but the manifests here no longer deploy the APIService for custom.metrics:

https://github.com/awslabs/k8s-cloudwatch-adapter/blob/master/deploy/adapter.yaml

At one point in the past, maybe it did. Is it safe to assume that the README can be updated, and additionally, if one deployed an old version of the k8s-cloudwatch-adapter and has since updated it, that this APIService should be deleted?

kubectl delete apiservice v1beta1.custom.metrics.k8s.io

Error with permissions

kubectl get hpa
sqs-consumer-scaler Deployment/<> /30 (avg) 1 10 1 4h

Warning   FailedComputeMetricsReplicas   horizontal-pod-autoscaler                               failed to get <<name>> external metric: unable to get external metric default/<<name>>/nil: unable to fetch metrics from external metrics API: <<name>>.external.metrics.k8s.io is forbidden: User "system:anonymous" cannot list resource "<<resource>>" in API group "external.metrics.k8s.io" in the namespace "default"

lots of forbidden requests

I0410 20:33:55.393355       1 authorization.go:73] Forbidden: "/", Reason: ""
I0410 20:33:55.393391       1 wrap.go:47] GET /: (80.215µs) 403 [Go-http-client/2.0 10.72.160.100:46374]

I'm seeing lots of log messages like this in one of my AWS EKS 1.14 clusters.

I'm not sure where this is coming from. My manifests mirror those in this repo, with the addition of a kube2iam annotation.

ExternalMetric reports incorrect value

We have been noticing inconsistencies between the metric value reported by the HPA and the metric value reported from CW. We are struggling to scale our system to keep up with a work queue and would appreciate some clarity.

I have the following setup for a custom metric that is posted to CW every 15 minutes. It is in OUR/NAMESPACE, has a single dimension QUEUE and is named QUEUE_SIZE.

ExternalMetric

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: <replace>-queue-length
spec:
  name: <replace>-queue-length
  resource:
    resource: "deployment"
  queries:
    - id: <replace>
      metricStat:
        metric:
          namespace: "OUR/NAMESPACE"
          metricName: "QUEUE_SIZE"
          dimensions:
            - name: QUEUE
              value: "<replace>"
        period: 1800
        stat: Average
        unit: Count
      returnData: true

HPA

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: <replace>-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: our-deployment
  minReplicas: 1
  maxReplicas: 200
  metrics:
    - type: External
      external:
        metricName: <replace>-queue-length
        targetAverageValue: 10

We run the CW query directly as suggested in another issue

aws cloudwatch get-metric-statistics --metric-name QUEUE_SIZE --start-time 2020-09-13T07:30:00z --end-time 2020-09-13T08:20:00z --period=1800 --namespace OUR/NAMESPACE --statistics Average --dimensions Name=QUEUE,Value=<replace> --unit Count
{
    "Label": "QUEUE_SIZE",
    "Datapoints": [
        {
            "Timestamp": "2020-09-13T07:30:00Z",
            "Average": 381.8333333333333,
            "Unit": "Count"
        }
    ]
}

We inspect the HPA and see the following

Name:                                                 <replace>-scaler
Namespace:                                            web
Labels:                                               <none>
Annotations:                                          <none>
CreationTimestamp:                                    Sun, 13 Sep 2020 00:49:13 -0700
Reference:                                            Deployment/our-deployment
Metrics:                                              ( current / target )
  "<replace>-queue-length" (target average value):  10778m / 10
Min replicas:                                         1
Max replicas:                                         200
Deployment pods:                                      36 current / 36 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from external metric <replace>-queue-length(nil)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range

The reported value appears to be nowhere close to the true value in CW. We follow the logs in the metrics adapter and it claims to successfully capture and report the external metric.

We would appreciate any tips to help us have the correct metric value supplied to the HPA. Thanks!
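
One detail that may explain part of the gap: with targetAverageValue the HPA divides the external metric by the current replica count and reports the per-pod average as a milli quantity, so the describe output above can be read roughly as:

    10778m = 10.778 per pod, and 10.778 × 36 pods ≈ 388

which is close to the 381.8 CloudWatch returned for the sampled window; the remaining difference is likely just the different time windows being averaged.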

CurrentAverageValue isn't an integer & CurrentValue is 0

The issue we are experiencing is that the CW adapter is able to read from CloudWatch (it appears so; there are no auth errors anywhere), but we are getting a currentValue of 0 and a currentAverageValue that is way too large and alphanumeric, like 18856m.

It is using IAM service accounts for EKS.

HPA live annotations:

autoscaling.alpha.kubernetes.io/current-metrics: >-[{"type":"External","external":{"metricName":"REPLACE-queue-length","currentValue":"0","currentAverageValue":"18556m"}}]
autoscaling.alpha.kubernetes.io/metrics: >-[{"type":"External","external":{"metricName":"REPLACE-queue-length","targetAverageValue":"40"}}]

Here are my yaml definitions:

---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: REPLACE
  labels:
    version: REPLACE
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: REPLACE
  minReplicas: 2
  maxReplicas: 1024
  metrics:
  - type: External
    external:
      metricName: REPLACE-queue-length
      targetAverageValue: 40
---
apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: REPLACE-queue-length
spec:
  name: REPLACE-queue-length
  resource:
    resource: "deployment"
  queries:
  - id: sqs_REPLACE_files
    metricStat:
      metric:
        namespace: "AWS/SQS"
        metricName: "ApproximateNumberOfMessagesVisible"
        dimensions:
          - name: QueueName
            value: REPLACE
      period: 60
      stat: Average
      unit: Count
    returnData: true

Here is cloudwatch adapter manifest:

---
apiVersion: v1
kind: Namespace
metadata:
  name: custom-metrics
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8s-cloudwatch-adapter:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: k8s-cloudwatch-adapter
  namespace: custom-metrics
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: k8s-cloudwatch-adapter-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: k8s-cloudwatch-adapter
  namespace: custom-metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: k8s-cloudwatch-adapter
  name: k8s-cloudwatch-adapter
  namespace: custom-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: k8s-cloudwatch-adapter
  template:
    metadata:
      labels:
        app: k8s-cloudwatch-adapter
      name: k8s-cloudwatch-adapter
    spec:
      securityContext:
        fsGroup: 65534
      serviceAccountName: k8s-cloudwatch-adapter
      containers:
      - name: k8s-cloudwatch-adapter
        env:
        - name: AWS_DEFAULT_REGION
          value: REPLACE
        image: chankh/k8s-cloudwatch-adapter:v0.8.0
        imagePullPolicy: "Always"
        args:
        - /adapter
        - --cert-dir=/tmp
        - --secure-port=6443
        - --logtostderr=true
        - --v=10
        ports:
        - containerPort: 6443
          name: https
        - containerPort: 8080
          name: http
        volumeMounts:
        - mountPath: /tmp
          name: temp-vol
      volumes:
      - name: temp-vol
        emptyDir: {}
      - name: token-vol
        projected:
          sources:
          - serviceAccountToken:
              path: token
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8s-cloudwatch-adapter-resource-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: k8s-cloudwatch-adapter-resource-reader
subjects:
- kind: ServiceAccount
  name: k8s-cloudwatch-adapter
  namespace: custom-metrics
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: k8s-cloudwatch-adapter
  namespace: custom-metrics
---
apiVersion: v1
kind: Service
metadata:
  name: k8s-cloudwatch-adapter
  namespace: custom-metrics
spec:
  ports:
  - name: https
    port: 443
    targetPort: 6443
  - name: http
    port: 80
    targetPort: 8080
  selector:
    app: k8s-cloudwatch-adapter
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  service:
    name: k8s-cloudwatch-adapter
    namespace: custom-metrics
  group: external.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: k8s-cloudwatch-adapter:external-metrics-reader
rules:
- apiGroups:
  - external.metrics.k8s.io
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: k8s-cloudwatch-adapter-resource-reader
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  - services
  verbs:
  - get
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8s-cloudwatch-adapter:external-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: k8s-cloudwatch-adapter:external-metrics-reader
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: externalmetrics.metrics.aws
spec:
  group: metrics.aws
  version: v1alpha1
  names:
    kind: ExternalMetric
    plural: externalmetrics
    singular: externalmetric
  scope: Namespaced
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: k8s-cloudwatch-adapter:crd-metrics-reader
  labels:
    app: k8s-cloudwatch-adapter
rules:
- apiGroups:
  - metrics.aws
  resources:
  - "externalmetrics"
  verbs:
  - list
  - get
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8s-cloudwatch-adapter:crd-metrics-reader
  labels:
    app: k8s-cloudwatch-adapter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: k8s-cloudwatch-adapter:crd-metrics-reader
subjects:
  - name: k8s-cloudwatch-adapter
    namespace: "custom-metrics"
    kind: ServiceAccount

Service account definition with EKSCTL:

    - metadata:
        name: k8s-cloudwatch-adapter
        namespace: custom-metrics
        labels: {aws-usage: "cluster-ops"}
      attachPolicy:
        Version: "2012-10-17"
        Statement:
        - Effect: Allow
          Action:
          - "cloudwatch:GetMetricData"
          - "cloudwatch:GetMetricStatistics"
          - "cloudwatch:ListMetrics"
          Resource: '*'

Not refreshing security token

After a few hours of running, the token expires, but it is not refreshed. I am setting the credentials via AWS_SHARED_CREDENTIALS_FILE environment variable.

E0507 16:17:46.926686       1 client.go:102] err: ExpiredToken: The security token included in the request is expired
	status code: 403, request id: d9a952e6-91dc-461c-b9b1-5182a45e304f
E0507 16:17:46.926709       1 provider_external.go:32] bad request: ExpiredToken: The security token included in the request is expired
	status code: 403, request id: d9a952e6-91dc-461c-b9b1-5182a45e304f

I would expect the adapter to handle the expired token by renewing it.
I'm running chankh/k8s-cloudwatch-adapter:v0.8.0

Multiple Replicas?

Can this be deployed with > 1 replica or does this need to be a "singleton"?

If so, what do you suggest for attempting HA?

Helm chart

Is a Helm chart available, to allow configuration of some of the choices made in deploy/adapter.yaml?

Also, the install instructions don't make any use of deploy/crd.yaml, which is another thing a Helm chart could handle.

SQS Metrics Appear to be Multiplied by 1000

Hi there,

I'm trying to use the ApproximateNumberOfMessagesVisible metric from an SQS queue, and for some reason the current value is being reported as the actual number * 1000. As a workaround, I've just multiplied the target value by 1000 as well, but this is unintuitive.

For reference, here are my ExternalMetric and HorizontalPodAutoscaler objects:

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: test-sqs-length
spec:
  name: test-sqs-length
  resource:
    resource: "deployment"
  queries:
    - id: sqs_test_length
      metricStat:
        metric:
          namespace: "AWS/SQS"
          metricName: "ApproximateNumberOfMessagesVisible"
          dimensions:
            - name: QueueName
              value: "test"
        period: 300
        stat: Average
        unit: Count
      returnData: true
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: test
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metricName: test-sqs-length
      targetValue: 1000000

In this case, I'm actually aiming for a targetValue of 1000. I'm not sure if this is a bug or something I'm doing wrong, but I'd appreciate some assistance either way!

The math is not working

The following configuration returns 0, while each of the metrics returns valid values:

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: elb-request-count-per-instance
spec:
  name: elb-request-count-per-instance
  resource:
    resource: "deployment"
  queries:
    - id: elb_request_count_per_instance
      expression: "elb_external_request_count / elb_external_healthy_host_count"
    - id: elb_external_request_count
      metricLabels: "elb_external_request_count"
      metricStat:
        metric:
          namespace: "AWS/ELB"
          metricName: "RequestCount"
          dimensions:
            - name: LoadBalancerName
              value: "my-load-balancer"
        period: 60
        stat: Sum
        unit: Count
      returnData: false
    - id: elb_external_healthy_host_count
      metricStat:
        metric:
          namespace: "AWS/ELB"
          metricName: "HealthyHostCount"
          dimensions:
            - name: LoadBalancerName
              value: "my-load-balancer"
        period: 60
        stat: Average
        unit: Count
      returnData: false
$ kubectl get --raw  "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/elb-request-count-per-instance" | jq .
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/elb-request-count-per-instance"
  },
  "items": [
    {
      "metricName": "elb-request-count-per-instance",
      "metricLabels": null,
      "timestamp": "2019-07-29T14:32:46Z",
      "value": "0"
    }
  ]
}

Is it possible to work with multiple adapters?

Is there a way to work with more than a single adapter? For example, using both k8s-cloudwatch-adapter and prometheus-adapter. When I tried to implement both, only the latter was exposing metrics at /apis/external.metrics.k8s.io/v1beta1.

Thanks
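
Only one APIService object can register a given group/version, so two adapters cannot both serve external.metrics.k8s.io/v1beta1; whichever registers last takes over. You can see which service currently backs the API with:

$ kubectl get apiservice v1beta1.external.metrics.k8s.io -o jsonpath='{.spec.service.namespace}/{.spec.service.name}{"\n"}'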

Please update the charts in helm hub

Hello! It looks like the charts in the Helm hub have not been updated to match what's in master. Specifically, this commit (and probably a few others) isn't present in the Helm chart releases.

Can you please update the charts in the helm hub to match the latest fixes in master? Otherwise we can't set the IAM role for the service account, and possibly other fixes are missing.

cannot get queue: AWS.SimpleQueueService.NonExistentQueue

I'm trying to use this project to migrate my applications from EC2/Auto Scaling to EKS/Kubernetes, because with Auto Scaling I use SQS metrics to scale my instances.

I tried to follow the whole README, but I didn't really know what to do for the Prerequisites. My kubectl is configured for my main AWS account, which has all the permissions needed, and I have the SQS queue created.

But when I get the logs from the pods I see the errors below. What should I do to make this work?

I find all these permissions that I have to configure just to use a simple Kubernetes cluster a bit confusing. :(

Thanks!!

using AWS Region: us-east-1
listening to queue: helloworld
cannot get queue: AWS.SimpleQueueService.NonExistentQueue: The specified queue does not exist or you do not have access to it.
status code: 400, request id: 8be62292-3a6a-5227-bc25-77cd06f9e3c9
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x7582c6]

goroutine 1 [running]:
main.main()
/go/src/consumer/main.go:57 +0x4e6
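
The panic comes from the sample consumer (consumer/main.go) exiting when the queue lookup fails, so this is not the adapter itself. The usual causes are a queue that lives in a different region than the one shown in the log, or pod credentials that are not allowed to call sqs:GetQueueUrl. A quick check from the same account and region (a sketch, assuming the queue is really named helloworld):

$ aws sqs get-queue-url --queue-name helloworld --region us-east-1

If this fails with the same error, fix the region or queue name first; if it succeeds, look at which credentials the pod is actually using.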

Pulling in metrics from multiple AWS accounts

Is it possible to pull in metrics from multiple AWS accounts? I've tried running multiple copies (using different names) connecting to different AWS accounts, but the last one that starts takes over the external metrics API, so the metrics from the other account are no longer available.

AWS ELB CloudWatch metrics - ELB name settings

Suppose we have a Kubernetes Service of type LoadBalancer running on a Kubernetes cluster on AWS; its name could be my-nginx. Now, I would like to define an external metric based on some of the load balancer CloudWatch metrics, e.g. SurgeQueueLength, RequestCount etc. In order to get this working I need to hardcode the load balancer name in the resource configuration:

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: elb-my-nginx-requests
spec:
  name: elb-my-nginx-requests
  resource:
    resource: "deployment"
  queries:
    - id: elb_my_nginx_requests
      metricStat:
        metric:
          namespace: "AWS/ELB"
          metricName: "RequestCount"
          dimensions:
            - name: LoadBalancerName
              value: "xyzxyzxyzxyz"
        period: 60
        stat: Sum
        unit: Count
      returnData: true

k8s service:

NAME         TYPE           CLUSTER-IP     EXTERNAL-IP                                                               PORT(S)        AGE
my-nginx     LoadBalancer   10.100.32.68   xyzxyzxyzxyz.us-west-2.elb.amazonaws.com   80:30099/TCP   85m

Would it be possible to get the name dynamically from the service itself?
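
The dimension value in the ExternalMetric spec is a static string, so the lookup has to happen outside the adapter, e.g. when the manifest is templated at deploy time. A sketch of that lookup, assuming a classic ELB: read the hostname from the Service status, then resolve it to the LoadBalancerName with the AWS CLI and substitute it into the manifest.

$ kubectl get svc my-nginx -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
$ aws elb describe-load-balancers \
    --query "LoadBalancerDescriptions[?DNSName=='xyzxyzxyzxyz.us-west-2.elb.amazonaws.com'].LoadBalancerName" \
    --output text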

External metrics API: ValidationError

$ kubectl describe hpa
Name: sqs-consumer-scaler
Namespace: default
Labels:
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"sqs-consumer-scaler","namespace":"default"},"...
CreationTimestamp: Sat, 18 May 2019 14:27:20 +0530
Reference: Deployment/sqs-consumer
Metrics: ( current / target )
"sqs-helloworld-length" (target value): / 30
Min replicas: 1
Max replicas: 10
Conditions:
Type Status Reason Message


AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetExternalMetric the HPA was unable to compute the replica count: unable to get external metric default/sqs-helloworld-length/nil: unable to fetch metrics from external metrics API: ValidationError: The value sqs-helloworld-length for parameter MetricDataQueries.member.1.Id is not matching the expected pattern ^[a-z][a-zA-Z0-9_]*$.

status code: 400, request id: fae93972-794a-11e9-a10c-29b005d2c12b
Events:
Type Reason Age From Message


Warning FailedGetExternalMetric 5s horizontal-pod-autoscaler unable to get external metric default/sqs-helloworld-length/nil: unable to fetch metrics from external metrics API: ValidationError: The value sqs-helloworld-length for parameter MetricDataQueries.member.1.Id is not matching the expected pattern ^[a-z][a-zA-Z0-9_]*$.
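
The validation error comes from CloudWatch GetMetricData itself: every query Id must match ^[a-z][a-zA-Z0-9_]*$, so hyphens are not allowed in the id field of the ExternalMetric query (the metric name itself can keep its hyphens). A minimal sketch of a conforming spec, assuming the queue from the blog post is named helloworld:

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: sqs-helloworld-length
spec:
  name: sqs-helloworld-length
  resource:
    resource: "deployment"
  queries:
    - id: sqs_helloworld_length   # underscores instead of hyphens
      metricStat:
        metric:
          namespace: "AWS/SQS"
          metricName: "ApproximateNumberOfMessagesVisible"
          dimensions:
            - name: QueueName
              value: "helloworld"
        period: 300
        stat: Average
        unit: Count
      returnData: true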

From time to time metrics will be exposed with a different type

Hello,

I get this weird value from time to time for the result of my external metric. This is supposed to be a metric of type "Count", but for some reason the adapter reports it with an "m" suffix.

Is there a way for me to test/see how the adapter reads the metric from CloudWatch? I'm not sure where these values come from.

I'm expecting the result to come out like this:

mock-external-metric-hpa   Deployment/sqs-consumer   120/100 (avg)   1         5         4          8s

But from time to time the hpa reads values like this:

mock-external-metric-hpa   Deployment/sqs-consumer   92750m/100 (avg)   1         5         4          8s

external metric object

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: mock-queuesize-length
spec:
  name: mock-queuesize-length
  resource:
    resource: "deployment"
  queries:
    - id: mock_queuesize_length
      metricStat:
        metric:
          namespace: "CoolApp"
          metricName: "mockQueueName"
          dimensions:
            - name: QueueName2
              value: "helloworld"
        period: 300
        stat: Average
        unit: Count
      returnData: true

hpa

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: mock-external-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: sqs-consumer
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: External
    external:
      metricName: mock-queuesize-length
      targetAverageValue: 100
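
The "m" above is not a CloudWatch unit but the Kubernetes milli suffix on resource quantities, so 92750m means 92.75. To see the raw quantity the adapter returns for this metric (a sketch, assuming the default namespace):

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/mock-queuesize-length" | jq '.items[0].value'

Comparing that value with the datapoint shown in the CloudWatch console for the same period shows whether the fractional value comes from CloudWatch (e.g. an Average over a sparse period) or from the adapter's conversion.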

Current Value is 0

I am getting the current value as 0.

Metrics: ( current / target )
"app-memory" (target value): 0 / 80
Min replicas: 1
Max replicas: 5
Deployment pods: 1 current / 1 desired

But in CloudWatch I can see 60-70% utilization. Pasting the ExternalMetric YAML below.


apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: app-memory
spec:
  name: app-memory
  resource:
    resource: "deployment"
  queries:
    - id: app_memory
      metricStat:
        metric:
          namespace: "Custom/ContainerInsights"
          metricName: "pod_memory_utilization_over_pod_limit"
          dimensions:
            - name: PodName
              value: app-memory
            - name: ClusterName
              value: dzstaging.k8s.local
            - name: Namespace
              value: dz-staging
        period: 10
        unit: Count
        stat: Average
      returnData: true

I tried changing the "Unit" to Percent, but no luck.
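
Two things are worth checking here (both are assumptions about this setup, not confirmed causes): Container Insights metrics are published at one-minute granularity, so a period of 10 seconds will usually return no datapoints, and GetMetricData treats unit as a filter, so requesting Count when the datapoints were stored with a different unit also returns nothing; either case shows up as 0 in the HPA. A sketch of the query with a 60-second period and the unit filter dropped (assuming unit is optional in the CRD, as it is in the CloudWatch API):

  queries:
    - id: app_memory
      metricStat:
        metric:
          namespace: "Custom/ContainerInsights"
          metricName: "pod_memory_utilization_over_pod_limit"
          dimensions:
            - name: PodName
              value: app-memory
            - name: ClusterName
              value: dzstaging.k8s.local
            - name: Namespace
              value: dz-staging
        period: 60      # Container Insights publishes at 1-minute granularity
        stat: Average   # no unit, so CloudWatch does not filter datapoints out
      returnData: true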

Adapter pod not able to connect to Ec2Metadata

I'm running the CloudWatch adapter with IRSA configured. While other pods running on the node are able to access the metadata API, the pod running the CloudWatch adapter throws the following error:

client.go:97] err: EC2RoleRequestError: no EC2 instance role found
caused by: RequestError: send request failed
caused by: Get http://169.254.169.254/latest/meta-data/iam/security-credentials: dial tcp 169.254.169.254:80: connect: connection refused
E0804 12:32:13.788312       1 provider_external.go:31] bad request: EC2RoleRequestError: no EC2 instance role found

We have set up IRSA with the required permissions. Since the adapter is not able to connect to the EC2 metadata API, the region is also not picked up:

I0805 07:49:09.832766       1 controller.go:57] initializing controller
E0805 07:49:09.832972       1 util.go:14] unable to get current region information, Get http://169.254.169.254/latest/meta-data/placement/availability-zone/: dial tcp 169.254.169.254:80: connect: connection refused
I0805 07:49:09.832988       1 client.go:26] using AWS Region:

Setting AWS_REGION does not solve the issue either.
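
For IRSA to work, the EKS pod identity webhook has to inject AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into the adapter container; if they are missing, the SDK falls back to the instance metadata service, which is exactly the failure above. Two quick checks (a sketch; the namespace and ServiceAccount name come from the default deploy/adapter.yaml, and the pod label is an assumption):

$ kubectl -n custom-metrics describe sa k8s-cloudwatch-adapter | grep role-arn
$ kubectl -n custom-metrics get pod -l app=k8s-cloudwatch-adapter \
    -o jsonpath='{.items[0].spec.containers[0].env[*].name}{"\n"}'

If the annotation is there but the env vars are not, recreate the pod (the webhook only mutates pods at creation time); if both are present and the error persists, the adapter image may bundle an AWS SDK too old to use web-identity credentials.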

Publish CRD Helm chart

When the CRD resources were moved into a separate chart (#56), I think you forgot to publish the new chart.

Query the metrics APIs after adapter is deployed

Hi,
I followed this guide https://aws.amazon.com/blogs/compute/scaling-kubernetes-deployments-with-amazon-cloudwatch-metrics/ to create my sqs metrics.
When I run:
$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq
Error from server (ServiceUnavailable): the server is currently unable to handle the request

If I continue deploying the following steps, the HPA does not work and shows:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
sqs-consumer-scaler Deployment/sqs-consumer <unknown>/30 1 10 1 2m

Events for the HPA:
Events:
Type Reason Age From Message


Warning FailedComputeMetricsReplicas 7m48s (x12 over 10m) horizontal-pod-autoscaler failed to get external metric sqs-dev-length: unable to get external metric default/sqs-dev-length/nil: external metrics aren't supported
Warning FailedGetExternalMetric 33s (x41 over 10m) horizontal-pod-autoscaler unable to get external metric default/sqs-dev-length/nil: external metrics aren't supported

I think I attached the correct IAM policy, and I have a metrics server that is running correctly:
$ kubectl get --raw "/apis/metrics.k8s.io/" | jq
{
  "kind": "APIGroup",
  "apiVersion": "v1",
  "name": "metrics.k8s.io",
  "versions": [
    {
      "groupVersion": "metrics.k8s.io/v1beta1",
      "version": "v1beta1"
    }
  ],
  "preferredVersion": {
    "groupVersion": "metrics.k8s.io/v1beta1",
    "version": "v1beta1"
  }
}

Environment:
kops Version 1.12.3
kubectl client: v1.15.3
kubectl server: v1.12.10

Platform:
EC2: debian-stretch
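
A ServiceUnavailable from /apis/external.metrics.k8s.io/v1beta1 means the APIService is registered but the apiserver cannot reach the adapter service behind it, and the HPA cannot fetch external metrics until that path works. A reasonable first step is to check the APIService condition and the adapter pod (a sketch, assuming the default install in the custom-metrics namespace):

$ kubectl get apiservice v1beta1.external.metrics.k8s.io
$ kubectl describe apiservice v1beta1.external.metrics.k8s.io
$ kubectl -n custom-metrics get pods

If the Available condition is False, the message on the APIService usually points at the cause (missing service endpoints, TLS, or network policy between the apiserver and the adapter pod).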
