cablespaghetti / k3s-monitoring

A quick start guide for getting a full monitoring and alerting stack up and running on your k3s cluster, with Prometheus Operator and the kube-prometheus-stack Helm Chart.

prometheus-operator helm-chart grafana monitoring k3s kubernetes

k3s-monitoring's Introduction

Monitoring K3S with Prometheus Operator

I originally put this guide together for a talk at a Civo Cloud Community Meetup in July 2020; here is the video. This guide has since been updated to reflect changes in k3s and kube-prometheus-stack but not a great deal has changed since the video was recorded. Sign up for Civo's free KUBE100 beta here if you want a cluster to try out this guide on.

I seem incapable of keeping this up to date. If anyone has a fork they are maintaining, please send me a message and I'll link to it. However, I have archived this repo.

Prometheus can be complicated to get started with, which is why many people pick hosted monitoring solutions like Datadog. However, it doesn't have to be, and if you're monitoring Kubernetes, Prometheus is in my opinion the best option.

The great people over at CoreOS developed the Prometheus Operator for Kubernetes, which allows you to define your Prometheus configuration in YAML and deploy it alongside your application manifests. This makes a lot of sense if you're deploying a lot of applications, perhaps across many teams: each team can define its own monitoring and alerts.

You will need:

  • A k3s cluster like Civo Cloud (the "development" version is no longer needed, ignore what I say in the video) or maybe one installed on a Raspberry Pi with k3sup
  • kubectl installed on your machine and configured for that cluster
  • Helm 3 installed on your machine

I'm using Mailhog to receive my alerts for this demo because it's simple. However, you might choose to hook into your mail provider to send emails (see the commented settings for a Gmail example) or send a Slack message (see the Prometheus documentation). To install Mailhog:

helm repo add codecentric https://codecentric.github.io/helm-charts
helm upgrade --install mailhog codecentric/mailhog
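
For the alerts to actually land in Mailhog, Alertmanager needs an SMTP receiver pointing at the Mailhog service. Below is a minimal sketch of that part of the kube-prometheus-stack values, not the repo's exact file; the Service name, port and email addresses are assumptions about a default Mailhog install, so check them against your cluster.

alertmanager:
  config:
    global:
      # Assumes the Mailhog Service is called "mailhog" and listens on SMTP port 1025
      smtp_smarthost: "mailhog:1025"
      smtp_from: "alertmanager@example.com"
      smtp_require_tls: false
    receivers:
      - name: "email"
        email_configs:
          - to: "alerts@example.com"
            send_resolved: true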

Install Prometheus Operator

Now installing Prometheus Operator from the kube-prometheus-stack Helm chart is as simple as running:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade --install prometheus prometheus-community/kube-prometheus-stack --version 13.4.1 --values kube-prometheus-stack-values.yaml

I've picked a specific version of the Helm Chart here which I know works with my config. Feel free to remove the --version parameter to get the latest version.

This deploys Prometheus, Alert Manager and Grafana with a few options disabled which don't work on k3s. You'll get a set of default Prometheus Rules (alerts) configured which will alert you about most of the things you need to worry about when running a Kubernetes cluster.

There are a few commented-out sections, like CPU and memory resource requests and limits, which you should definitely set once you know the resources each service needs in your environment.
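
As an illustration only (the figures below are placeholders, not recommendations), the relevant sections of the values file look roughly like this once uncommented:

# Placeholder figures; size these from real usage in your cluster
prometheus:
  prometheusSpec:
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        memory: 1Gi
grafana:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      memory: 256Mi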

I also recommend setting up some Pod Priority Classes in your cluster and making the core parts of the system high priority, so that if the cluster is low on resources, Prometheus will still run and alert you.
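
For example, a PriorityClass like the sketch below (not part of this repo) could be created and then referenced from the chart values, e.g. via prometheus.prometheusSpec.priorityClassName:

# A sketch of a high PriorityClass for monitoring workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: monitoring-high-priority
value: 1000000
globalDefault: false
description: "Keeps monitoring scheduled when the cluster is short on resources."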

Under routes you will see I've sent a few of the default Prometheus Rules to the null receiver, which effectively mutes them. You might choose to remove some of these or add different alerts to the list.
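
That routing lives in the Alertmanager config section of the values file. Stripped right down, and as a sketch rather than the exact contents of the repo's values, it looks something like this:

alertmanager:
  config:
    route:
      receiver: "email"
      routes:
        # Alerts matched here go to the "null" receiver and are effectively muted
        - receiver: "null"
          match:
            alertname: Watchdog
    receivers:
      - name: "null"
      - name: "email"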

Each time you change your values file, just re-run the helm upgrade command above for Helm to apply your changes.

Accessing Prometheus, Alert Manager and Grafana

I haven't configured any Ingress or Load Balancers for access to the services in my values file. This is because Prometheus and Alert Manager don't support any authentication out of the box, and Grafana will be spun up with default credentials (username: admin, password: prom-operator). In our production environments we use oauth2-proxy to put Google authentication in front of these services. You could also set up Basic Authentication using Traefik.

This means you need to use kubectl port-forward to access the services for now. In separate terminal windows run the following commands:

kubectl port-forward svc/prometheus-grafana 8080:80
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090
kubectl port-forward svc/prometheus-kube-prometheus-alertmanager 9093

This will make Grafana accessible on http://localhost:8080, Prometheus on http://localhost:9090 and Alert Manager on http://localhost:9093.

You'll see that Grafana is already configured with lots of useful dashboards and Prometheus is configured with Rules to send alerts for pretty much everything you need to monitor in a production cluster.

The power of Prometheus Operator

Because k3s uses Traefik for ingress, we want to add monitoring to that. Prometheus "scrapes" services to get metrics rather than having metrics pushed to it like many other systems. Many "cloud native" applications will expose a port for Prometheus metrics out of the box and Traefik is no exception. For any apps you build you will need a metrics endpoint and a Kubernetes Service with that port exposed.

All we need to do to get Prometheus scraping Traefik is add a Prometheus Operator ServiceMonitor resource which tells it the details of the existing service to scrape.

kubectl apply -f traefik-servicemonitor.yaml
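
For reference, a ServiceMonitor for Traefik looks roughly like the sketch below. The namespace, label selectors and metrics port name are assumptions about a default k3s Traefik deployment, so compare them with traefik-servicemonitor.yaml and your actual Traefik Service.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik
  namespace: kube-system
  labels:
    release: prometheus   # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik   # assumed label on the Traefik Service
  endpoints:
    - port: metrics   # assumed name of the metrics port
      path: /metrics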

You can verify that Prometheus is now scraping Traefik for metrics at http://localhost:9090/targets.

You can also do something similar with Grafana dashboards. Just deploy them in a ConfigMap like this:

kubectl apply -f traefik-dashboard.yaml

This dashboard JSON is copied from Grafana's amazing dashboards site.
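
The shape of such a ConfigMap is roughly as follows. The label key and value are assumptions about the Grafana sidecar's defaults (some setups expect grafana_dashboard: "1" rather than "true"), and the JSON is a placeholder for a real dashboard export.

apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-dashboard
  labels:
    grafana_dashboard: "1"   # picked up by the Grafana dashboard sidecar
data:
  # Replace the placeholder below with the full dashboard JSON
  traefik-dashboard.json: |
    { "title": "Traefik", "panels": [] }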

Because dashboards can be deployed this way, I haven't configured Grafana with any persistent storage, so any dashboards imported or created in the UI and not put in a ConfigMap will disappear if the Pod restarts.

We can now create alerts with Prometheus Rules using the Prometheus Operator PrometheusRule:

kubectl apply -f traefik-prometheusrule.yaml
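
A PrometheusRule has the same flavour. The sketch below is illustrative only; the alert expression and threshold are made up for the example, not taken from traefik-prometheusrule.yaml.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: traefik-alerts
  labels:
    release: prometheus   # must match the Prometheus ruleSelector
spec:
  groups:
    - name: traefik
      rules:
        - alert: TraefikHighServerErrorRate
          # Example expression; adjust the metric name to your Traefik version
          expr: sum(rate(traefik_service_requests_total{code=~"5.."}[5m])) > 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Traefik is returning a sustained rate of 5xx responses"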

You can verify that Prometheus has got your rule configured at http://localhost:9090/rules.

Blackbox Exporter

I've also configured the Prometheus Blackbox Exporter on my cluster, which polls HTTP endpoints. These can be anywhere on the Internet; in this case I'm just monitoring my example website to check everything is working as expected. I've also deployed another dashboard to Grafana for it.

helm upgrade --install blackbox-exporter prometheus-community/prometheus-blackbox-exporter --version 4.10.0 --values blackbox-exporter-values.yaml
kubectl apply -f blackbox-exporter-dashboard.yaml
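
In the Blackbox Exporter values, the probing behaviour and the list of URLs to check live under config and serviceMonitor. The sketch below is an assumption about the chart's structure (exact keys vary between chart versions), and the target name and URL are placeholders, so check blackbox-exporter-values.yaml.

config:
  modules:
    http_2xx:
      prober: http
      timeout: 5s
      http:
        method: GET
serviceMonitor:
  enabled: true
  defaults:
    labels:
      release: prometheus   # must match the Prometheus serviceMonitorSelector
  targets:
    - name: example-website   # placeholder target
      url: https://example.com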

As above, I've pinned a specific version of the Helm chart which I know works with my config. Feel free to remove the --version parameter to get the latest version.

Monitoring the monitoring


But what if my cluster goes down and my monitoring goes with it? One of the alerts we have sent to the null receiver in the Prometheus Operator values is Watchdog. This is a Prometheus Rule which always fires. If you send this to somewhere outside of your cluster, you can be alerted if this "Dead Man's Switch" stops firing.
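
One way to do that is to route Watchdog to a webhook receiver pointing outside the cluster instead of the null receiver. A hedged sketch follows; the receiver name and URL are placeholders for whatever external service you use.

alertmanager:
  config:
    route:
      routes:
        # Instead of muting Watchdog, forward it outside the cluster
        - receiver: "deadmansswitch"
          match:
            alertname: Watchdog
          repeat_interval: 5m   # keep the heartbeat firing frequently
    receivers:
      - name: "deadmansswitch"
        webhook_configs:
          - url: "https://example.com/deadmansswitch"   # placeholder endpoint
            send_resolved: false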

At Pulselive we developed a simple solution for this using AWS Lambda: https://github.com/PulseInnovations/prometheus-deadmansswitch


k3s-monitoring's Issues

traefik dashboard

Hi,
I've deployed the Prometheus stack on a fresh k3s cluster (v1.21.7+k3s1), using values that are essentially the same as what you're proposing here.
I've deployed the Traefik dashboard in Grafana. By the way, it apparently needs a change to the label: grafana_dashboard: "true" does not work for me; I had to change it to grafana_dashboard: "1".

The thing is, half of the dashboard is not working right, beginning with the backend variable, which does not resolve. Looking at which Traefik metrics are available in Prometheus, I'm getting very few.

Do you know if this means I'm missing something in my config?

Question: K8s components monitoring

Hello, why did you disable https://github.com/cablespaghetti/k3s-monitoring/blob/master/kube-prometheus-stack-values.yaml#L1-L6 ?

You can use:

  kubeEtcd:
    enabled: false
  kubeControllerManager:
    endpoints:
    - 88.XX.XX.XX
  kubeScheduler:
    endpoints:
    - 88.XX.XX.XX
  kubeProxy:
    endpoints:
    - 88.XX.XX.XX

This will create Service/Endpoints objects in kube-system. What is problematic is that the IP in Endpoints does not allow an FQDN, so you have to know the IP address a priori. For example, the kubelet Service/Endpoints is created with the right IP by the operator automatically.

EDIT: it seems these are scraped via the apiserver and proxy?

Traefik metrics in latest K3s

Hi

It seems that k3s version v1.21.1+k3s1, which uses Traefik v2.4.8, does not have metrics enabled by default.

I'm trying to track down how to make the configuration change.

panic: Unable to create mmap-ed active query log

Oddly, I have had this chart running without issue on my 3 Raspberry Pis. I upgraded to "v1.20.2+k3s1" and suddenly I am seeing this error in the Prometheus container.

I've installed the helm chart as per instructions, and in this case used the same version.

level=info ts=2021-02-03T01:07:15.031Z caller=main.go:364 msg="Starting Prometheus" version="(version=2.24.0, branch=HEAD, revision=02e92236a8bad3503ff5eec3e04ac205a3b8e4fe)"
level=info ts=2021-02-03T01:07:15.032Z caller=main.go:369 build_context="(go=go1.15.6, user=root@d9f90f0b1f76, date=20210106-14:46:38)"
level=info ts=2021-02-03T01:07:15.032Z caller=main.go:370 host_details="(Linux 5.4.0-1028-raspi #31-Ubuntu SMP PREEMPT Wed Jan 20 11:30:45 UTC 2021 aarch64 prometheus-prometheus-kube-prometheus-prometheus-0 (none))"
level=info ts=2021-02-03T01:07:15.032Z caller=main.go:371 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2021-02-03T01:07:15.032Z caller=main.go:372 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2021-02-03T01:07:15.032Z caller=query_logger.go:87 component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0xffffecc5c4bb, 0xb, 0x14, 0x2942c80, 0x4000362ea0, 0x2942c80)
        /app/promql/query_logger.go:117 +0x38c
main.main()
        /app/cmd/prometheus/main.go:400 +0x47e8

Adding extra scrape configuration options does not reflect change

First of all, thank you for making this tutorial and the repo.

At the end of this file: https://raw.githubusercontent.com/cablespaghetti/k3s-monitoring/master/kube-prometheus-stack-values.yaml

I add:

extraScrapeConfigs: |
  - job_name: 'pihole'
    static_configs:
      - targets: ['localhost:9617']

But the ConfigMap doesn't list this after I upgrade my Helm chart with the new values. Any suggestions/pointers on what I'm doing incorrectly?

I'm trying to add this (https://github.com/eko/pihole-exporter) to my Prometheus targets and have a dashboard ready to read from Prometheus.

Thank you.

arm64 Support

This is awesome.

Sadly, the solution doesn't completely work on arm64, as prometheus-kube-state-metrics doesn't support that architecture. I tried to find some other images that might work, but haven't had any luck.

Can you point me in the right direction?
