kubernetes / kube-state-metrics

Add-on agent to generate and expose cluster-level metrics.

Home Page: https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/

License: Apache License 2.0


kube-state-metrics's Introduction

Overview

Build Status Go Report Card Go Reference govulncheck OpenSSF Scorecard

kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. (See examples in the Metrics section below.) It is not focused on the health of the individual Kubernetes components, but rather on the health of the various objects inside, such as deployments, nodes and pods.

kube-state-metrics is about generating metrics from Kubernetes API objects without modification. This ensures that features provided by kube-state-metrics have the same grade of stability as the Kubernetes API objects themselves. In turn, this means that kube-state-metrics in certain situations may not show the exact same values as kubectl, as kubectl applies certain heuristics to display comprehensible messages. kube-state-metrics exposes raw data unmodified from the Kubernetes API, this way users have all the data they require and perform heuristics as they see fit.

The metrics are exported on the HTTP endpoint /metrics on the listening port (default 8080). They are served as plaintext. They are designed to be consumed either by Prometheus itself or by a scraper that is compatible with scraping a Prometheus client endpoint. You can also open /metrics in a browser to see the raw metrics. Note that the metrics exposed on the /metrics endpoint reflect the current state of the Kubernetes cluster. When Kubernetes objects are deleted they are no longer visible on the /metrics endpoint.

Note

This README is generated from a template. Please make your changes there and run make generate-template.


Versioning

Kubernetes Version

kube-state-metrics uses client-go to talk with Kubernetes clusters. The supported Kubernetes cluster version is determined by client-go. The compatibility matrix for client-go and Kubernetes cluster can be found here. All additional compatibility is only best effort, or happens to still/already be supported.

Compatibility matrix

At most, 5 kube-state-metrics and 5 Kubernetes releases will be recorded below. Generally, it is recommended to use the latest release of kube-state-metrics. If you run a very recent version of Kubernetes, you might want to use an unreleased version to have the full range of supported resources. If you run an older version of Kubernetes, you might need to run an older version in order to have full support for all resources. Be aware that the maintainers will only support the latest release. Older versions might be supported by interested users of the community.

kube-state-metrics    Kubernetes client-go Version
v2.8.2                v1.26
v2.9.2                v1.26
v2.10.1               v1.27
v2.11.0               v1.28
v2.12.0               v1.29
main                  v1.29

Resource group version compatibility

Resources in Kubernetes can evolve, i.e., the group version for a resource may change from alpha to beta and finally GA in different Kubernetes versions. For now, kube-state-metrics will only use the oldest API available in the latest release.

Container Image

The latest container image can be found at:

  • registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.12.0 (arch: amd64, arm, arm64, ppc64le and s390x)
  • View all multi-architecture images here

Metrics Documentation

Any resources and metrics based on alpha Kubernetes APIs are excluded from any stability guarantee and may change in any given release.

See the docs directory for more information on the exposed metrics.

Conflict resolution in label names

The *_labels family of metrics exposes Kubernetes labels as Prometheus labels. As Kubernetes is more liberal than Prometheus in terms of allowed characters in label names, we automatically convert unsupported characters to underscores. For example, app.kubernetes.io/name becomes label_app_kubernetes_io_name.

This conversion can create conflicts when multiple Kubernetes labels like foo-bar and foo_bar would be converted to the same Prometheus label label_foo_bar.

Kube-state-metrics automatically adds a suffix _conflictN to resolve this conflict, so it converts the above labels to label_foo_bar_conflict1 and label_foo_bar_conflict2.

If you'd like to have more control over how this conflict is resolved, you might want to consider addressing this issue on a different level of the stack, e.g. by standardizing Kubernetes labels using an Admission Webhook that ensures that there are no possible conflicts.

Kube-state-metrics self metrics

kube-state-metrics exposes its own general process metrics under --telemetry-host and --telemetry-port (default 8081).

kube-state-metrics also exposes list and watch success and error metrics. These can be used to calculate the error rate of list or watch resources. If you encounter those errors in the metrics, it is most likely a configuration or permission issue, and the next thing to investigate would be looking at the logs of kube-state-metrics.

Example of the above mentioned metrics:

kube_state_metrics_list_total{resource="*v1.Node",result="success"} 1
kube_state_metrics_list_total{resource="*v1.Node",result="error"} 52
kube_state_metrics_watch_total{resource="*v1beta1.Ingress",result="success"} 1
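As an illustration only (the alert name and threshold below are not taken from this repository; the maintained rules live in /examples/prometheus-alerting-rules), a Prometheus alerting rule on these counters could be sketched as:

groups:
  - name: kube-state-metrics
    rules:
      - alert: KubeStateMetricsListErrors
        # Hypothetical threshold: fire when more than 1% of list operations fail.
        expr: |
          (sum(rate(kube_state_metrics_list_total{result="error"}[5m]))
            /
          sum(rate(kube_state_metrics_list_total[5m]))) > 0.01
        for: 15m
        labels:
          severity: warning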

kube-state-metrics also exposes some HTTP request metrics, for example:

http_request_duration_seconds_bucket{handler="metrics",method="get",le="2.5"} 30
http_request_duration_seconds_bucket{handler="metrics",method="get",le="5"} 30
http_request_duration_seconds_bucket{handler="metrics",method="get",le="10"} 30
http_request_duration_seconds_bucket{handler="metrics",method="get",le="+Inf"} 30
http_request_duration_seconds_sum{handler="metrics",method="get"} 0.021113919999999998
http_request_duration_seconds_count{handler="metrics",method="get"} 30

kube-state-metrics also exposes build and configuration metrics:

kube_state_metrics_build_info{branch="main",goversion="go1.15.3",revision="6c9d775d",version="v2.0.0-beta"} 1
kube_state_metrics_shard_ordinal{shard_ordinal="0"} 0
kube_state_metrics_total_shards 1

kube_state_metrics_build_info is used to expose version and other build information. For more on the info pattern, please check the blog post here. Sharding metrics expose the --shard and --total-shards flags and can be used to validate run-time configuration, see /examples/prometheus-alerting-rules.

kube-state-metrics also exposes metrics about its config file and the Custom Resource State config file:

kube_state_metrics_config_hash{filename="crs.yml",type="customresourceconfig"} 2.38272279311849e+14
kube_state_metrics_config_hash{filename="config.yml",type="config"} 2.65285922340846e+14
kube_state_metrics_last_config_reload_success_timestamp_seconds{filename="crs.yml",type="customresourceconfig"} 1.6704882592037103e+09
kube_state_metrics_last_config_reload_success_timestamp_seconds{filename="config.yml",type="config"} 1.6704882592035313e+09
kube_state_metrics_last_config_reload_successful{filename="crs.yml",type="customresourceconfig"} 1
kube_state_metrics_last_config_reload_successful{filename="config.yml",type="config"} 1

Scaling kube-state-metrics

Resource recommendation

Resource usage for kube-state-metrics changes with the number of Kubernetes objects (Pods/Nodes/Deployments/Secrets etc.) in the cluster. To some extent, the number of Kubernetes objects in a cluster is in direct proportion to the number of nodes in the cluster.

As a general rule, you should allocate:

  • 250MiB memory
  • 0.1 cores

Note that if CPU limits are set too low, kube-state-metrics' internal queues will not be able to be worked off quickly enough, resulting in increased memory consumption as the queue length grows. If you experience problems resulting from high memory allocation or CPU throttling, try increasing the CPU limits.
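Expressed as a container resource specification, that starting point might look like the sketch below; the limit values are assumptions and should be tuned per cluster, keeping the CPU-limit note above in mind:

resources:
  requests:
    memory: 250Mi
    cpu: 100m
  limits:
    memory: 250Mi
    # Keep the CPU limit generous (or unset) to avoid throttling.
    cpu: 200m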

Latency

In a 100 node cluster scaling test the latency numbers were as follows:

"Perc50": 259615384 ns,
"Perc90": 475000000 ns,
"Perc99": 906666666 ns.

A note on costing

By default, kube-state-metrics exposes several metrics for events across your cluster. If you have a large number of frequently-updating resources on your cluster, you may find that a lot of data is ingested into these metrics. This can incur high costs on some cloud providers. Please take a moment to configure what metrics you'd like to expose, as well as consult the documentation for your Kubernetes environment in order to avoid unexpectedly high costs.
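One way to limit what is exposed is to restrict the set of collected resources via the --resources flag; the selection below is purely illustrative:

args:
  - --resources=deployments,nodes,pods
  # Finer-grained metric filtering flags also exist; see
  # docs/developer/cli-arguments.md for the full list of arguments.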

kube-state-metrics vs. metrics-server

The metrics-server is a project that has been inspired by Heapster and is implemented to serve the goals of core metrics pipelines in Kubernetes monitoring architecture. It is a cluster level component which periodically scrapes metrics from all Kubernetes nodes served by Kubelet through Metrics API. The metrics are aggregated, stored in memory and served in Metrics API format. The metrics-server stores the latest values only and is not responsible for forwarding metrics to third-party destinations.

kube-state-metrics is focused on generating completely new metrics from Kubernetes' object state (e.g. metrics based on deployments, replica sets, etc.). It holds an entire snapshot of Kubernetes state in memory and continuously generates new metrics based off of it. And just like the metrics-server it too is not responsible for exporting its metrics anywhere.

Having kube-state-metrics as a separate project also enables access to these metrics from monitoring systems such as Prometheus.

Horizontal sharding

In order to shard kube-state-metrics horizontally, some automated sharding capabilities have been implemented. It is configured with the following flags:

  • --shard (zero indexed)
  • --total-shards

Sharding is done by taking an md5 sum of the Kubernetes Object's UID and performing a modulo operation on it with the total number of shards. Each shard decides whether the object is handled by the respective instance of kube-state-metrics or not. Note that this means all instances of kube-state-metrics, even if sharded, will have the network traffic and the resource consumption for unmarshaling objects for all objects, not just the ones they are responsible for. To optimize this further, the Kubernetes API would need to support sharded list/watch capabilities. In the optimal case, memory consumption for each shard will be 1/n compared to an unsharded setup. Typically, kube-state-metrics needs to be memory and latency optimized in order for it to return its metrics rather quickly to Prometheus. One way to reduce the latency between kube-state-metrics and the kube-apiserver is to run KSM with the --use-apiserver-cache flag. In addition to reducing the latency, this option will also lead to a reduction in the load on etcd.

Sharding should be used carefully, and additional monitoring should be set up to ensure that sharding is configured and functioning as expected (e.g. instances for each shard out of the total shards are configured).
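For example, a two-shard setup deployed as two otherwise identical Deployments could pass flags like the following sketch (only the container args are shown):

# Shard 0 of 2
args:
  - --shard=0
  - --total-shards=2

# Shard 1 of 2, in a second, otherwise identical Deployment
args:
  - --shard=1
  - --total-shards=2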

Automated sharding

Automatic sharding allows each shard to discover its nominal position when deployed in a StatefulSet which is useful for automatically configuring sharding. This is an experimental feature and may be broken or removed without notice.

To enable automated sharding, kube-state-metrics must be run by a StatefulSet and the pod name and namespace must be handed to the kube-state-metrics process via the --pod and --pod-namespace flags. Example manifests demonstrating the autosharding functionality can be found in /examples/autosharding.
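A minimal sketch of how these flags can be wired to the downward API inside the StatefulSet's container spec (the complete, maintained manifests are in /examples/autosharding; the environment variable names are arbitrary):

args:
  - --pod=$(POD_NAME)
  - --pod-namespace=$(POD_NAMESPACE)
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace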

This way of deploying shards is useful when you want to manage KSM shards through a single Kubernetes resource (a single StatefulSet in this case) instead of having one Deployment per shard. The advantage can be especially significant when deploying a high number of shards.

The downside of using an auto-sharded setup comes from the rollout strategy supported by StatefulSets. When managed by a StatefulSet, pods are replaced one at a time with each pod first getting terminated and then recreated. Besides such rollouts being slower, they will also lead to short downtime for each shard. If a Prometheus scrape happens during a rollout, it can miss some of the metrics exported by kube-state-metrics.

Daemonset sharding for pod metrics

Pod metrics can be sharded per node with the following flag:

  • --node=$(NODE_NAME)

Each kube-state-metrics pod uses FieldSelector (spec.nodeName) to watch/list pod metrics only on the same node.

A daemonset kube-state-metrics example:

apiVersion: apps/v1
kind: DaemonSet
spec:
  template:
    spec:
      containers:
      - image: registry.k8s.io/kube-state-metrics/kube-state-metrics:IMAGE_TAG
        name: kube-state-metrics
        args:
        - --resources=pods
        - --node=$(NODE_NAME)
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName

To track metrics for unassigned pods, you need to add an additional deployment and set --node="", as shown in the following example:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - image: registry.k8s.io/kube-state-metrics/kube-state-metrics:IMAGE_TAG
        name: kube-state-metrics
        args:
        - --resources=pods
        - --node=""

Other metrics can be sharded via Horizontal sharding.

Setup

Install this project to your $GOPATH using go get:

go get k8s.io/kube-state-metrics

Building the Docker container

Simply run the following command in this root folder, which will create a self-contained, statically-linked binary and build a Docker image:

make container

Usage

Simply build and run kube-state-metrics inside a Kubernetes pod which has a service account token that has read-only access to the Kubernetes cluster.
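An abridged sketch of that read-only RBAC wiring is shown below; the complete and maintained manifests live in examples/standard, and the resource list here is shortened for illustration:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods", "services", "namespaces"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets", "daemonsets", "statefulsets"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
  - kind: ServiceAccount
    name: kube-state-metrics
    namespace: kube-system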

For users of prometheus-operator/kube-prometheus stack

The kube-prometheus stack installs kube-state-metrics as one of its components; you do not need to install kube-state-metrics if you're using the kube-prometheus stack.

If you want to revise the default configuration for kube-prometheus, for example to enable non-default metrics, have a look at Customizing Kube-Prometheus.

Kubernetes Deployment

To deploy this project, you can simply run kubectl apply -f examples/standard and a Kubernetes service and deployment will be created. (Note: Adjust the apiVersion of some resource if your kubernetes cluster's version is not 1.8+, check the yaml file for more information).

To have Prometheus discover kube-state-metrics instances it is advised to create a specific Prometheus scrape config for kube-state-metrics that picks up both metrics endpoints. Annotation based discovery is discouraged as only one of the endpoints would be able to be selected, plus kube-state-metrics in most cases has special authentication and authorization requirements as it essentially grants read access through the metrics endpoint to most information available to it.
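A minimal sketch of such a scrape configuration, assuming the standard service name kube-state-metrics in the kube-system namespace and the default ports (adjust names and labels to match your deployment):

scrape_configs:
  - job_name: kube-state-metrics
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only the kube-state-metrics endpoints; with the standard service,
      # this picks up both the metrics port (8080) and the telemetry port (8081).
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        action: keep
        regex: kube-system;kube-state-metrics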

Note: Google Kubernetes Engine (GKE) Users - GKE has strict role permissions that will prevent the kube-state-metrics roles and role bindings from being created. To work around this, you can give your GCP identity the cluster-admin role by running the following one-liner:

kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=$(gcloud info --format='value(config.account)')

Note that your GCP identity is case sensitive but gcloud info as of Google Cloud SDK 221.0.0 is not. This means that if your IAM member contains capital letters, the above one-liner may not work for you. If you have 403 forbidden responses after running the above command and kubectl apply -f examples/standard, check the IAM member associated with your account at https://console.cloud.google.com/iam-admin/iam?project=PROJECT_ID. If it contains capital letters, you may need to set the --user flag in the command above to the case-sensitive role listed at https://console.cloud.google.com/iam-admin/iam?project=PROJECT_ID.

After running the above, if you see Clusterrolebinding "cluster-admin-binding" created, then you are able to continue with the setup of this service.

Limited privileges environment

If you want to run kube-state-metrics in an environment where you don't have the cluster-reader role, you can:

  • create a serviceaccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: your-namespace-where-kube-state-metrics-will-deployed
  • give it view privileges on specific namespaces (using roleBinding) (note: you can add this roleBinding to all the NS you want your serviceaccount to access)
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: project1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    name: kube-state-metrics
    namespace: your-namespace-where-kube-state-metrics-will-deployed
  • then specify a set of namespaces (using the --namespaces option) and a set of Kubernetes objects (using the --resources option) that your serviceaccount has access to in the kube-state-metrics deployment configuration
spec:
  template:
    spec:
      containers:
      - name: kube-state-metrics
        args:
          - '--resources=pods'
          - '--namespaces=project1'

For the full list of arguments available, see the documentation in docs/developer/cli-arguments.md

Helm Chart

Starting from the kube-state-metrics chart v2.13.3 (kube-state-metrics image v1.9.8), the official Helm chart is maintained in prometheus-community/helm-charts. Starting from kube-state-metrics chart v3.0.0, only kube-state-metrics images of v2.0.0+ are supported.

Development

When developing, test a metric dump against your local Kubernetes cluster by running:

Users can override the apiserver address in the kubeconfig file with the --apiserver command-line flag.

go install
kube-state-metrics --port=8080 --telemetry-port=8081 --kubeconfig= --apiserver=

Then curl the metrics endpoint

curl localhost:8080/metrics

To run the e2e tests locally see the documentation in tests/README.md.

Developer Contributions

When developing, there are certain code patterns to follow to improve your contributing experience and the likelihood of e2e and other CI tests passing. To learn more about them, see the documentation in docs/developer/guide.md.

kube-state-metrics's People

Contributors

andreihagiescu, andyxning, asifdxtreme, brancz, catherinef-dev, chrischdi, clamoriniere, cofyc, dalehenries, dependabot[bot], dgrisonnet, fabxc, fpetkovski, iamnoah, k8s-ci-robot, kaitoii11, lilic, mikulas, mishra-prabhakar, mrueg, mxinden, olivierlemasle, r0fls, reetasingh, rexagod, scottrigby, serializator, sylr, tariq1890, zouyee


kube-state-metrics's Issues

Docs for kube_replicaset metrics don't match implementation

The project README.md lists the ReplicaSet metrics as:

  • kube_replicaset_status_replicas
  • kube_replicaset_status_replicas_available
  • kube_replicaset_status_replicas_unavailable
  • kube_replicaset_status_replicas_updated
  • kube_replicaset_status_replicas_observed_generation
  • kube_replicaset_spec_replicas
  • kube_replicaset_spec_paused
  • kube_replicaset_metadata_generation

but the implementation defines:

  • kube_replicaset_status_replicas
  • kube_replicaset_status_fully_labeled_replicas
  • kube_replicaset_status_ready_replicas
  • kube_replicaset_status_observed_generation
  • kube_replicaset_spec_replicas
  • kube_replicaset_metadata_generation

It looks like the documentation is just a duplicate of the metrics for Deployments w/ 'replicaset' replacing 'deployment' - should the docs be updated to match? Is the implementation as desired?

Not able to get "kube_node_status_phase" metric

I am running Kubernetes on CoreOS. For monitoring, I am using Prometheus, Grafana and kube-state-metrics.
I wanted to check the status of nodes in the cluster, whether they are running or not. I thought I could get this info from the "kube_node_status_phase" metric, but it is not showing up in Prometheus. Except for this, I am able to see all metrics.
Can anyone help me sort this out?

[feature-request] Add metrics from CronJobs

Most important for monitoring: last successful run time. Last run time, duration, overrun indication, etc would be awesome too :).

CronJobs are still in alpha though, so it might be good to wait for them to at least make it to beta.

Add support for ready metrics for daemonset

Currently, the metrics exported for daemonsets don't allow detecting that a daemonset pod is not ready.
There is no equivalent of kube_replicaset_status_ready_replicas.

Would it be possible to add support for a similar metric for daemonsets?

Thanks in advance.

Yann

1.0 stabilization

As discussed in the last SIG instrumentation meeting, we plan to do a first stable release of kube-state-metrics.
As we have been mostly adding functionality for a while, rather than changing existing functionality, there's nothing fundamental to change here.

  • double-check all existing metrics for compliance with our guidelines
  • double-check that current functionality does not conflict with future plans of fetching partial metrics, i.e. only pod metrics for certain deployments
  • load test kube-state-metrics to ensure it scales with large clusters and derive resource requirements (@piosz, can you help with that?)
  • provide deployment manifest that scales with cluster size using pod nanny

NodeInfo Metric

I think it is worth adding the Node IP address to the NodeInfo metric, but I'd like feedback on whether the public, the private, or both (or all, if there are more than the two) should be added as labels.

This is the structure I see when I get a node and output it as yaml:

status:
  addresses:
  - address: X.X.X.X
    type: InternalIP
  - address: X.X.X.X
    type: ExternalIP

I'm happy to create a PR for this if it is something you think would be good to have and if I can get that bit of direction.

Grafana dashboard

I think that it might be really cool to have a dashboard using this project.

kube-state-metrics process generates uninteresting metrics about itself with go_* prefix

If you look at the output of the curl command (mentioned in the README.md), the start of the output has these:

go_gc_duration_seconds{quantile="0"} 6.1791e-05
go_gc_duration_seconds{quantile="0.25"} 6.3312e-05
go_gc_duration_seconds{quantile="0.5"} 0.000103029
go_gc_duration_seconds{quantile="0.75"} 0.000126511
go_gc_duration_seconds{quantile="1"} 0.000670194
go_gc_duration_seconds_sum 0.001212694
go_gc_duration_seconds_count 7
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 23

As far as I understand these are go framework metrics about the kube-state-metrics process, which are quite uninteresting.

The reason for that is likely the prometheus.UninstrumentedHandler() function call.
If I am not mistaken, that handler adds metrics about the process by default.

The definition says this:
// UninstrumentedHandler returns an HTTP handler for the DefaultGatherer.
//
// Deprecated: Use promhttp.Handler instead. See there for further documentation.
func UninstrumentedHandler() http.Handler {

This issue could be resolved by using the promhttp handler and separating the metrics that start with go_ from the rest. (I know this is a vague explanation. After some more investigation, I can add more details about the proposed implementation changes to this issue.)

We would like to improve the kube-state-metrics in that regard.
The primary purpose of this issue is to communicate this to other developers and avoid replicated work.

kube_node_info should add node labels

We run several node pools and use labels to differentiate them (as well as taints). One node pool in particular is built to be a canary for future versions of our underlying OS. I'd like to be able to build an alert that fires if that node goes away (presumably because some change in the lower env causes its kubelet to fail).

Being able to say sum(kube_node_info{mynodelabel="canary"}) == 0 would be awesome.

Should allow aggregation of pod/container metrics by deployment

Pods and containers export important metrics like health or container restart count. These metrics are most useful when viewed at a deployment aggregation level (i.e., summed over all pods belonging to the same deployment) or on the replica set level. Individual pods are less useful, because a pod might go away for benign reasons.

To do the aggregation, I need labels that reference the deployment. For example, for a standard pod named "foo-12345-fpgj" created by a deployment, I'd need a label "foo" that doesn't include the replica set identifier ("12345") or the pod identifier ("fpgj").

This bug is for tracking. We're already in touch with the Stackdriver and Kubernetes folks in Google who're hopefully making this happen.

Support non-ssl api server

I am not currently running k8s with TLS on either the api servers or the kubelet. I was hoping to run the kube-state-metrics deployment, but it seems to want to authenticate strictly with service tokens over HTTPS.

I have tried passing in api server directly.

docker run -i -t gcr.io/google_containers/kube-state-metrics:v0.3.0 --in-cluster=false --apiserver=http://10.101.0.1:8080
F1025 18:57:59.409874       1 main.go:86] Failed to create client: invalid configuration: no configuration has been provided
goroutine 1 [running]:
k8s.io/kube-state-metrics/vendor/github.com/golang/glog.stacks(0x1cc5700, 0xc400000000, 0x7d, 0xac)
    src/k8s.io/kube-state-metrics/vendor/github.com/golang/glog/glog.go:766 +0xa5
k8s.io/kube-state-metrics/vendor/github.com/golang/glog.(*loggingT).output(0x1ca5220, 0xc400000003, 0xc4200e6c00, 0x1c402df, 0x7, 0x56, 0x0)
    src/k8s.io/kube-state-metrics/vendor/github.com/golang/glog/glog.go:717 +0x337
k8s.io/kube-state-metrics/vendor/github.com/golang/glog.(*loggingT).printf(0x1ca5220, 0x3, 0x1475343, 0x1b, 0xc420665ec8, 0x1, 0x1)
    src/k8s.io/kube-state-metrics/vendor/github.com/golang/glog/glog.go:655 +0x14c
k8s.io/kube-state-metrics/vendor/github.com/golang/glog.Fatalf(0x1475343, 0x1b, 0xc420665ec8, 0x1, 0x1)
    src/k8s.io/kube-state-metrics/vendor/github.com/golang/glog/glog.go:1145 +0x67
main.main()
    src/k8s.io/kube-state-metrics/main.go:86 +0x20b

It's not very clear to me what exactly is expected in the --config flag, but I also provided a valid ./kube/config file for that API server but the container still complains about invalid configuration.

BTW, the currently --help indicates that flags should be in the form of --flag value, but IME, you have to specify --flag=value.

Is this a worthwhile thing to support or am I fighting upstream by not adding TLS to my api/workers?

Feature request: Add ReplicationController and Daemonsets metrics

Hello,
Is it possible to add ReplicationController and Daemonsets metrics as well?
I am using the following metrics to alert if the number of running pods is not equal to the replicas defined in the rc/ds:
rc_status_readyReplicas, rc_spec_replicas
ds_status_currentNumberScheduled, ds_status_desiredNumberScheduled

I also find that the Kubernetes API doesn't provide the rc_status_readyReplicas metric at all when it's 0 - how can this be fixed?
Thanks!

Report CPU and Memory requests and limits

We are trying to keep an eye on the overall CPU and Memory requests and limits across the cluster.

kubectl describe nodes reports this information per node, but this exporter does not seem to expose it.

Example output:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  ------------	----------	---------------	-------------
 1095m (54%)	5480m (274%)	786Mi (10%)	3418Mi (45%)

As kubectl is reporting this information it is certainly available to this exporter as well.

Is there any way to get these numbers from the already exported metrics or could someone look into exporting them?

IMHO these "Cluster Usage" metrics are the most important metrics K8s has, and right now it's a little hard (or at least unclear) how to get them.

Panic when showing flags

I find it unexpected that a simple kube-state-metrics -h will cause a stack trace to be printed.

Errors with ReplicationController metrics

Love this app! It's always worked great with zero issues, thank you.

Today, I excitedly tried out commit b68036d; however, I'm getting errors like these in the logs and no rc metrics in /metrics.

E0307 14:41:26.060729       1 reflector.go:199] github.com/kube-state-metrics/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.ReplicationController: the server could not find the requested resource
E0307 14:41:27.064053       1 reflector.go:199] github.com/kube-state-metrics/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.ReplicationController: the server could not find the requested resource
E0307 14:41:28.066296       1 reflector.go:199] github.com/kube-state-metrics/vendor/k8s.io/client-go/tools/cache/reflector.go:94: Failed to list *v1.ReplicationController: the server could not find the requested resource

Bad certificate

I get this error when I start the pod. Do I need to change my CA, or is there a workaround?

Failed to create client: ERROR communicating with apiserver: Get https://10.254.0.1:443/version: x509: certificate is valid for 10.0.1.5, 10.0.0.1, not 10.254.0.1
goroutine 1 [running]:
k8s.io/kube-state-metrics/vendor/github.com/golang/glog.stacks(0x1cc5700, 0xc400000000, 0xcd, 0x1e8)
    src/k8s.io/kube-state-metrics/vendor/github.com/golang/glog/glog.go:766 +0xa5
k8s.io/kube-state-metrics/vendor/github.com/golang/glog.(*loggingT).output(0x1ca5220, 0xc400000003, 0xc42016cc00, 0x1c402df, 0x7, 0x56, 0x0)
    src/k8s.io/kube-state-metrics/vendor/github.com/golang/glog/glog.go:717 +0x337
k8s.io/kube-state-metrics/vendor/github.com/golang/glog.(*loggingT).printf(0x1ca5220, 0x3, 0x1475343, 0x1b, 0xc420645ec8, 0x1, 0x1)
    src/k8s.io/kube-state-metrics/vendor/github.com/golang/glog/glog.go:655 +0x14c
k8s.io/kube-state-metrics/vendor/github.com/golang/glog.Fatalf(0x1475343, 0x1b, 0xc420645ec8, 0x1, 0x1)
    src/k8s.io/kube-state-metrics/vendor/github.com/golang/glog/glog.go:1145 +0x67
main.main()
    src/k8s.io/kube-state-metrics/main.go:86 +0x20b

Make metric collection synchronous

Currently metrics get updated every 10 seconds in the background. This means scrapers record metrics that are off by an unknown duration of 0 to 10 seconds. It also means that the maximum sampling interval is fixed to 10 seconds, which does not generally fit every use case.

When writing Prometheus exporters, the best practice approach is to gather the data synchronously as /metrics is accessed to guarantee accurate sample data. Sampling frequency should be defined by the clients.

I'd like this exporter to follow that approach. I understand that the current behavior is an easy way to prevent overloading (esp. with multiple clients). If development on this repo is continued, quota and auth have to be addressed anyway to defend against misbehaving clients. But we probably shouldn't limit possibilities at the core of the application.

@kubernetes/sig-instrumentation

Support for API-groups and other kinds like the openshift objects

What are the plans regarding supporting other non-core API groups?

We are using OpenShift and are seriously looking into this, but it would be nice to support kinds like DeploymentConfig, BuildConfig and ImageStreams.

Currently they exist in the oapi API server but it is changing with the introduction of API groups. openshift/origin#12986

I am at kubecon in Berlin so if any of you are here I would love to chat about this. Ping me as @bjartek on twitter or just reply here.

Failed to create the client

When I try to start kube-state-metrics in Kubernetes (1.3.7), I get the following error:

I0425 01:13:00.524520       1 main.go:139] Using default collectors
I0425 01:13:00.525263       1 main.go:186] service account token present: true
I0425 01:13:00.525332       1 main.go:187] service host: https://10.253.0.1:443
I0425 01:13:00.526352       1 main.go:213] Testing communication with server
F0425 01:13:00.550739       1 main.go:156] Failed to create client: ERROR communicating with apiserver: the server has asked for the client to provide credentials

Then I delete the rc and svc and create them again; the error becomes:

I0425 01:30:18.545484       1 main.go:139] Using default collectors
I0425 01:30:18.545707       1 main.go:186] service account token present: true
I0425 01:30:18.545721       1 main.go:187] service host: https://10.253.0.1:443
I0425 01:30:18.546196       1 main.go:213] Testing communication with server
F0425 01:30:18.547422       1 main.go:156] Failed to create client: ERROR communicating with apiserver: Get https://10.253.0.1:443/version: x509: certificate signed by unknown authority

The rc file is:

apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-state-metrics
  namespace: monitor
  labels:
    name: kube-state-metrics
spec:
  replicas: 1
  selector:
    name: kube-state-metrics
  template:
    metadata:
      labels:
        name: kube-state-metrics
      annotations:
        prometheus.io/scrape: "true"
    spec:
      containers:
      - name: kube-state-metrics
        image: gcr.io/google_containers/kube-state-metrics:v0.4.1
        ports:
        - containerPort: 8080
      nodeSelector:
        kubernetes.io/hostname: 10.22.96.10

The svc file is:

apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: monitor
  labels:
    name: kube-state-metrics
spec:
  type: NodePort
  ports:
  - name: main
    port: 8080
    nodePort: 30880
  selector:
    name: kube-state-metrics

Does anyone know this? Thank you in advance

kubeconfig flag is ignored when not in cluster

When using the exporter with --in-cluster set to false, the --kubeconfig flag is ignored. If your kubeconfig file is not at one of the standard locations, then the exporter is unable to contact the apiserver.

[feature-request] Deployment Conditions

That'd be great if kube-state-metrics exposed Deployment Conditions. My use case is to easily know if a Deployment is in progress or not, which I plan to use along with Prometheus and an inhibit rule in AlertManager.

I see Deployment Conditions were added to the go k8s client around git commit ae6775eeec5cc9e96cc5cec848f589158acf7d92, and I have tried to add this support in myself by modifying kube-state-metrics, but I'm running into issues building against the newer go client. This is at least partly because I'm hacking at it right now - I don't really know golang or the process of golang development.

I'm happy to keep trying on this, but it's possible one of you will figure it out before I do.

Makefile tip

You can get the latest git tag in the makefile as follows:

# http://stackoverflow.com/questions/1404796/how-to-get-the-latest-tag-name-in-current-branch-in-git
TAG := $(shell git describe --abbrev=0)

Missing deployment metric metadata.generation

I'm trying to use the metrics provided by this project but it's missing the metadata.generation metric to check for status of a deployment. It would be great if that could be added in!

Below is from the kubernetes docs about deployments. I'm using that as a guideline to check if deployments fail or not and then send out an alert if it does fail.

After creating or updating a Deployment, you would want to confirm whether it succeeded or not. The simplest way to do this is through kubectl rollout status.

This verifies the Deployment’s .status.observedGeneration >= .metadata.generation, and its up-to-date replicas (.status.updatedReplicas) matches the desired replicas (.spec.replicas) to determine if the rollout succeeded. If the rollout is still in progress, it watches for Deployment status changes and prints related messages.
Note that it’s impossible to know whether a Deployment will ever succeed, so if the above command doesn’t return success, you’ll need to timeout and give up at some point.
Additionally, if you set .spec.minReadySeconds, you would also want to check if the available replicas (.status.availableReplicas) matches the desired replicas too.

No logs available

I have kube-state-metrics running as a deployment via ansible on my clusters:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-state-metrics
spec:
  replicas: 4
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      containers:
      - name: kube-state-metrics
        image: gcr.io/google_containers/kube-state-metrics:v0.3.0
        ports:
        - name: metrics
          containerPort: 8080
        resources:
          requests:
            memory: {{ kube_state_mem_req }}
            cpu: 100m
          limits:
            memory: {{ kube_state_mem_lim }}
            cpu: 200m

I've had to bump kube_state_mem_(req|lim) to 800Mi in order to get the pods to stay up; the pods have started OOMKilling/CrashLoopBackoff'ing.

I'd like to know why, but the containers are basically inscrutable. There's no way to shell in and docker logs is empty.

It'd be great if there was more information on what's going on, please and thanks!

Release 0.3.0 is not pushed up into the container registry

It is currently not there.

docker pull gcr.io/google_containers/kube-state-metrics:v0.3.0
Pulling repository gcr.io/google_containers/kube-state-metrics
Tag v0.3.0 not found in repository gcr.io/google_containers/kube-state-metrics

[feature] Add metrics for v1.ComponentStatusList

Hi all!

I just stumbled across this project and was doing what you are doing in a much more sloppy way via Python. I would like to switch over to kube-state-metrics but there is one thing missing: getting the component statuses. They can be retrieved via the API. I was trying to do it myself and create a PR but I have never worked with Go before, so I am afraid I am not much of a help here.

This endpoint basically shows the internal monitoring information about the kube-controller-manager, the scheduler and the etcd servers attached. I am mainly using it to create alerts via Kapacitor. Would that be something you could implement?

Cheers,
Christian

[ingress]feature request

gentle ping @brancz
I will work on adding some metrics; here is the relevant information. Would you have some suggestions?

service

Metric name: kube_service_spec_servicetype
Metric type: Gauge
Labels/tags: clusterIP=<cluster ip>, namespace=<service-namespace>, service=<service-name>, type=<ClusterIP|NodePort|None>

pv

Metric name: kube_persistentvolume_status
Metric type: Gauge
Labels/tags: persistentvolume=<pv-name>, namespace=<pvc-namespace>, phase=<Bound|Failed|Pending|Available|Released>, volume=<pvc-namespace>

namespace

Metric name: kube_namespace_status_phase
Metric type: Gauge
Labels/tags: name=<namespace-name>, namespace=<pvc-namespace>, phase=<Terminating|Active>, create_time=<date-time the server time when this object was created>

ingress

Metric name: kube_ingress_info
Metric type: Gauge
Labels/tags: name=<ingress-name>, namespace=<ingress-namespace>, create_time=<date-time the server time when this object was created>

Metric name: kube_ingress_metadata_generation
Metric type: Gauge
Labels/tags: name=<ingress-name>, namespace=<ingress-namespace>

Metric name: kube_ingress_loadbalancer
Metric type: Gauge
Labels/tags: name=<ingress-name>, namespace=<ingress-namespace>, IP=<IP set for loadbalancer>, hostname=<loadbalancer hostname based on dns>

Panic runtime error for 0.4.0 in GKE 1.5.2

Hello,

I just tried to upgrade the kube-state-metrics pod to 0.4.0. I'm running a GKE cluster on 1.5.2.
kube-state-metrics is running in an Ubuntu 16.04 container and built using:

apt-get install -y golang git
cd /root
mkdir -p "$GOPATH"
go get github.com/kubernetes/kube-state-metrics
mv "${GOPATH}/bin/kube-state-metrics" /usr/local/bin/

Here is the stacktrace I got when I try to get /metrics:

kube-state-metrics
I0210 07:30:30.326926     175 main.go:139] Using default collectors
I0210 07:30:30.327196     175 main.go:186] service account token present: true
I0210 07:30:30.327224     175 main.go:187] service host: https://10.255.240.1:443
I0210 07:30:30.327902     175 main.go:213] Testing communication with server
I0210 07:30:30.359162     175 main.go:218] Communication with server successful
I0210 07:30:30.359497     175 main.go:263] Active collectors: resourcequotas,replicasets,daemonsets,deployments,pods,nodes
I0210 07:30:30.359533     175 main.go:227] Starting metrics server: :80
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x402578]
goroutine 81 [running]:
panic(0x18c3a00, 0xc82000a0a0)
	/usr/lib/go-1.6/src/runtime/panic.go:481 +0x3e6
main.(*deploymentCollector).collectDeployment(0xc820456490, 0xc82006e900, 0x0, 0x0, 0x0, 0x0, 0xc820495bc4, 0x6, 0x0, 0x0, ...)
	/root/go/src/github.com/kubernetes/kube-state-metrics/deployment.go:153 +0x238
main.(*deploymentCollector).Collect(0xc820456490, 0xc82006e900)
	/root/go/src/github.com/kubernetes/kube-state-metrics/deployment.go:135 +0x232
github.com/kubernetes/kube-state-metrics/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func2(0xc8204842c0, 0xc82006e900, 0x7f4865a756a8, 0xc820456490)
	/root/go/src/github.com/kubernetes/kube-state-metrics/vendor/github.com/prometheus/client_golang/prometheus/registry.go:433 +0x58
created by github.com/kubernetes/kube-state-metrics/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
	/root/go/src/github.com/kubernetes/kube-state-metrics/vendor/github.com/prometheus/client_golang/prometheus/registry.go:434 +0x360

Release 0.4

There are new changes such as Daemonset metrics and per-container limits, but the last release is 0.3, from November. Also, I have code for the Datadog agent to scrape k-s-m for the new data, but quite understandably the maintainers want to see a new official release before accepting my change.

What does it take or who needs to be convinced to make a new release?

the server has asked for the client to provide credentials

kube-state-metrics :2017-04-12T01:56:13.362556000Z F0412 01:56:13.361945 1 main.go:73] Failed to create client: ERROR communicating with apiserver: the server has asked for the client to provide credentials
2017-04-12T01:56:13.362761000Z goroutine 1 [running]:
2017-04-12T01:56:13.362917000Z k8s.io/kube-state-metrics/vendor/github.com/golang/glog.stacks(0x234fb00, 0x0, 0x0, 0x0)
2017-04-12T01:56:13.363052000Z /usr/local/google/home/pszczesniak/go/src/k8s.io/kube-state-metrics/vendor/github.com/golang/glog/glog.go:766 +0xb8
2017-04-12T01:56:13.363185000Z k8s.io/kube-state-metrics/vendor/github.com/golang/glog.(*loggingT).output(0x232f800, 0xc800000003, 0xc82028aa80, 0x2303db0, 0x7, 0x49, 0x0)
2017-04-12T01:56:13.363307000Z /usr/local/google/home/pszczesniak/go/src/k8s.io/kube-state-metrics/vendor/github.com/golang/glog/glog.go:717 +0x259
2017-04-12T01:56:13.363422000Z k8s.io/kube-state-metrics/vendor/github.com/golang/glog.(*loggingT).printf(0x232f800, 0xc800000003, 0x1a0e560, 0x1b, 0xc820555ec0, 0x1, 0x1)
2017-04-12T01:56:13.363540000Z /usr/local/google/home/pszczesniak/go/src/k8s.io/kube-state-metrics/vendor/github.com/golang/glog/glog.go:655 +0x1d4
2017-04-12T01:56:13.363669000Z k8s.io/kube-state-metrics/vendor/github.com/golang/glog.Fatalf(0x1a0e560, 0x1b, 0xc820555ec0, 0x1, 0x1)
2017-04-12T01:56:13.363784000Z /usr/local/google/home/pszczesniak/go/src/k8s.io/kube-state-metrics/vendor/github.com/golang/glog/glog.go:1145 +0x5d
2017-04-12T01:56:13.363903000Z main.main()
2017-04-12T01:56:13.364022000Z /usr/local/google/home/pszczesniak/go/src/k8s.io/kube-state-metrics/main.go:73 +0x271

yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-state-metrics-deployment
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kube-state-metrics
        version: "v0.3.0"
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8080'
    spec:
      containers:
      - name: kube-state-metrics
        image: registry.cn-shenzhen.aliyuncs.com/kim-docker/kube-state-metrics:v0.3.0
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: "kubeconfig"
          mountPath: "/etc/kubernetes/"
          readOnly: true
        args:
        # - --in-cluster=true
        # - --apiserver=https://kubernetes.default.svc
        - --kubeconfig=/etc/kubernetes/kubeconfig
      volumes:
      - name: "kubeconfig"
        hostPath:
          path: "/etc/kubernetes/"

kubeconfig:

apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    certificate-authority: /etc/kubernetes/ssl/ca.crt
    server: https://10.254.0.1
users:
- name: kubelet
  user:
    client-certificate: /etc/kubernetes/ssl/node.crt
    client-key: /etc/kubernetes/ssl/node.key
contexts:
- context:
    cluster: local
    user: kubelet
  name: kubelet-context
current-context: kubelet-context

Feature request: Add more labels to kube_pod_container_status_running

Thanks to this project I can create Prom alerts on the hosted deployment state. Big +1 in filling this gap.

Now I'm attempting to create an alert based on a deployment's "dispersion" ratio... that is, should the Kube scheduler have had little choice in picking nodes with resources (i.e. a node outage), a given deployment's replicas may end up running on the same or too few Kubernetes nodes.

I think kube_pod_container_status_running would be the right place... but I'd need to have the deployment name and the node the pod is running on as labels. Then a Prom query would compute a ratio of running pods to unique nodes. Does that make sense?

Documentation of the metrics

I can be considered pretty much new to monitoring land. It would be really cool to document the metrics with some example usages. I know that some metrics are pretty obvious from their names, but showing their usage together with other metrics would be beneficial for all. I can also try my best to provide the necessary information with my limited knowledge.

Another important thing is the compatibility of metrics with the k8s version. Maybe something like a "since" annotation might be a good idea.

Feature Request: Add a new metric for currently used resources in nodes.

Hi All,
We would like to propose a new metric addition that describes the used resources by pods in every node.
Its name could be
kube_node_current_pod_cpu_requests,
kube_node_requested_cpu_resources,
kube_node_allocated_cpu_resources.

The names extend naturally for memory

Or something else suggested by you; we are open to suggestions for names.

As the desired semantics, we propose this metric to contain the sum of requested resources of all the currently running pods in that node.

The same computation is done every time one executes kubectl describe node.
The resources table is computed every time from scratch.
For example

 Namespace          Name                        CPU Requests    CPU Limits  Memory Requests Memory Limits
  ---------         ----                        ------------    ----------  --------------- -------------
  default           k8s-master-127.0.0.1                0 (0%)      0 (0%)      0 (0%)      0 (0%)
  default           k8s-proxy-127.0.0.1             0 (0%)      0 (0%)      0 (0%)      0 (0%)
  hello-world-hbaba-build   hello-world-hbaba-989929512-6jiaf       800m (20%)  20 (500%)   100Mi (2%)  100Mi (2%)
  hello-world-hbaba-build   hello-world-hbaba-989929512-8hq9v       800m (20%)  20 (500%)   100Mi (2%)  100Mi (2%)
  hello-world-hbaba-build   hello-world-hbaba-989929512-jjd5n       800m (20%)  20 (500%)   100Mi (2%)  100Mi (2%)
  hello-world-hbaba-build   hello-world-hbaba-989929512-n00t2       800m (20%)  20 (500%)   100Mi (2%)  100Mi (2%)
  kube-system           kube-dns-v10-z4ics              310m (7%)   310m (7%)   170Mi (4%)  170Mi (4%)
Allocated resources:
  (Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
  CPU Requests  CPU Limits  Memory Requests Memory Limits
  ------------  ----------  --------------- -------------
  3510m (87%)   80310m (2007%)  570Mi (14%) 570Mi (14%)

The code for that table is in kubernetes/pkg/kubectl/describe.go file in describeNodeResource function. It iterates over a collection that is accessed like this :
nodeNonTerminatedPodsList, err := d.Core().Pods(namespace).List(api.ListOptions{FieldSelector: fieldSelector})

For the above example the printed metrics would be

kube_node_current_pod_cpu_requests{node="compute-node01"} 3510
kube_node_current_pod_memory_requests{node="compute-node01"} 80310

Our use case for this metric is the following:
We would like to understand the "utilization" of the nodes and we would like to understand how much resources have been allocated to running pods and how much is left.
We are interested in this "currentPodRequests" information at the node level as well as at the global cluster level.

We have created this issue to get any reactions to this suggestion. Have there been any similar discussions before? Would a PR adding this metric be welcomed?

Another alternative would be to extend the NodeStatus type and add a "currentPodRequests" field in the NodeStatus type in the kubernetes API.

We have also created an issue about that approach in the kubernetes repo. Since that is an API change, we cannot anticipate the reaction it is going to get.

We propose that it is valuable to add such a currentPodRequests metric for nodes into kube-state-metrics.

Thanks
