
grafana-dashboards's Introduction

Grafana Dashboards

Node Exporter Full

  • For node_exporter
  • Monitor Linux system.

Only requires the default job_name: node; add as many targets as you need in /etc/prometheus/prometheus.yml.

  - job_name: node
    static_configs:
      - targets: ['localhost:9100']

The arguments --collector.systemd and --collector.processes are recommended for prometheus-node-exporter, because some graphs use their metrics.

  • timeInterval in the Grafana data source has to be set according to the scrape_interval configured in Prometheus. You can do this by navigating to Connections > Data sources > Prometheus and setting Scrape interval under Interval behaviour. When using provisioning, this is set with the attribute jsonData.timeInterval.
  • For prometheus-node-exporter v0.16 or older, use node-exporter-full-old.json
  • Thanks to the PCP project for documenting the values reported by the kernel in /proc (mainly in their /pmdas/linux/help src file).
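For reference, a minimal provisioning file using that attribute might look like the sketch below; the data source name and url are placeholders to adjust for your own setup:

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml (sketch)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://localhost:9090
    jsonData:
      timeInterval: 15s   # must match scrape_interval in prometheus.yml
```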

Node Exporter FreeBSD

  • For node_exporter in FreeBSD system
  • Monitor FreeBSD system.

Only requires a configured target under any job_name.

Haproxy Full (deprecated)

  • For haproxy_exporter
  • Monitor Haproxy service.

Only requires a configured target under any job_name.

Haproxy 2 Full

  • For Haproxy compiled with Prometheus support
  • Monitor the Haproxy service directly.

Only requires a configured target under any job_name.

Apache Full

  • Monitor Apache service.

Moved to https://github.com/grafana/jsonnet-libs

NFS Full

  • For node_exporter
  • Monitor all NFS and NFSd exported values.

Check that the process was started with the arguments --collector.nfs and --collector.nfsd.

The same as Node Exporter Full: it only requires the default job_name: node; add as many targets as you need in /etc/prometheus/prometheus.yml.
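On systemd-based distributions, one way to make sure those flags are passed is a drop-in override; this is a sketch, and the unit name and binary path vary by distribution:

```ini
# /etc/systemd/system/prometheus-node-exporter.service.d/collectors.conf
[Service]
ExecStart=
ExecStart=/usr/bin/prometheus-node-exporter --collector.nfs --collector.nfsd
```

Run `systemctl daemon-reload` and restart the service afterwards.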

BIND 9 Full

Required configuration in /etc/bind/named.conf.options:

statistics-channels {
  inet 127.0.0.1 port 8053 allow { 127.0.0.1; };
};

On Grafana, it only requires a configured target under any job_name. For example:

  - job_name: 'bind'
    static_configs:
        - targets:
           - server_hostname:9000

Unbound Full

Required configuration in /etc/unbound/unbound.conf:

server:
        extended-statistics: yes

remote-control:
        control-enable: yes
        control-interface: /run/unbound.ctl

On Grafana, it only requires a configured target under any job_name. For example:

  - job_name: 'unbound'
    static_configs:
        - targets:
           - server_hostname:9167

grafana-dashboards's People

Contributors

ava-jyothi, bananeweizen, byteborg, calestyo, cktse, cobexer, desaintmartin, dragoangel, druggeri, elboulangero, f9n, geezabiscuit, grembo, imetlenko, izeye, jellevdk, jobcespedes, jsbergbau, kibab, leewis101, maesterz, mindw, psvmcc, rfmoz, rfrail3, robbat2, semekh, smholvoet, towolf, waindor


grafana-dashboards's Issues

Port must be the same for multiple targets

I noticed that if the port is not the same on all targets, the dashboard picks only one of them as the port and displays only that one, presenting the targets with different ports as 'no data points'.

My workaround was to have a consistent node-exporter port for all hosts.

Works only on Debian servers and NOT Ubuntu

Hi,
I have installed node-exporter using apt install on all my servers, and I can ONLY get data from my Debian servers (Debian 10) and NOT from my Ubuntu servers (18.0.4) at all.
Does anyone have the same issue?
Please let me know how to make it work with Ubuntu servers.

Thanks.

HAProxy 2.0 - broken numbers

Hi,

I am a bit curious how to read the fractional numbers in this dashboard.
For example, number of connections (current) = 0.133.
Number of HTTP Responses: min, max and current are 0.067, 4.333, 0.067.
Why is the current connection count not either 0 or 1, but 0.133?

See also the below image

haproxy_grafana

Regards,

Maarten

node_exporter 0.16 has LOTS of different labels

The new version appends the unit to the metric name, i.e. node_boot_time becomes node_boot_time_seconds.
Therefore the entire dashboard stops working.
Hope you find an automated way to rename them ;)

Nothing loads on fresh install

On a fresh install, no metrics load and none of the dropdowns work.

I'm using the following versions:

node_exporter, version 0.18.1 (branch: , revision: )
prometheus, version 2.18.1 (branch: non-git, revision: non-git)
Grafana version 7.0.0

My prometheus.yml is very simple:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
    - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]

I've confirmed there are no issues between Prometheus/Grafana/node_exporter that are readily apparent - I'm able to create charts in Grafana that are in Prometheus and from node_exporter.

Attaching a screenshot of what I see.

Screen Shot 2020-05-22 at 10 29 48 AM

Incorrect graph for node_disk_io_now

In the panel "Disk Detail" is a graph "Disk IOs Current In Progress". The query it uses is:

irate(node_disk_io_now{instance=~"$node:$port",job=~"$job"}[5m])

However, node_disk_io_now is not a counter, it's a gauge:

# HELP node_disk_io_now The number of I/Os currently in progress.
# TYPE node_disk_io_now gauge

The io_now value is field 9 documented here:

Field  9 -- # of I/Os currently in progress
    The only field that should go to zero. Incremented as requests are
    given to appropriate struct request_queue and decremented as they finish.

Therefore I believe the irate(...) wrapper needs to be removed. This is already an instantaneous snapshot of the outstanding I/O requests.
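Under that reading, the corrected query would simply drop the wrapper and chart the gauge directly:

```promql
node_disk_io_now{instance=~"$node:$port",job=~"$job"}
```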

Graphs show no data (i.e. how to enable collectors)

I deployed node-exporter on Kubernetes. To enable netstat and vmstat, do I add args when starting the image? My configuration is:

- image: prom/node-exporter:latest
  imagePullPolicy: IfNotPresent
  name: prometheus-node-exporter
  args:
  - --collector.netstat.fields=(.) --collector.vmstat.fields=(.) --collector.interrupts

It starts without problems, but the graphs are still not complete. Please advise.

Not working if machine has empty api port

This is my node uname on rancher with custom hostname

node_uname_info{domainname="(none)",endpoint="https",instance="c1",job="node-exporter",machine="x86_64",namespace="monitoring",nodename="c1",pod="node-exporter-l922f",release="5.3.0-19-generic",service="node-exporter",sysname="Linux",version="#20~18.04.2-Ubuntu SMP Tue Oct 22 18:09:07 UTC 2019"}
--

Your dashboard expects all nodenames to be host:port, which means I can't view any data...

How to use with multiple scrape_configs?

My prometheus.yml looks like this:

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
  - job_name: 'node'

    # Override the global default and scrape targets from this job every 10 seconds.
    scrape_interval: 10s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
        - targets: ['localhost:9090','cadvisor:8080','node-exporter:9100']

  - job_name: 'servers'
    static_configs:
        - targets: ['server1','server2']
          labels:
            group: 'production'

When I import the dashboard, I can only select the Prometheus node itself. How can I view server1 and server2 stats with this dashboard?

no hosts match

I was able to load your dashboard after changing the datasource name but cannot get any hosts to appear.
I verified that all of my other dashboards work where the label="Host:" matches the name="host" which I also changed.
How can I debug this to get your dashboard to work?

Issue with v16

I've just tried to use v16 on a new install (node_exporter 0.18.1) - all I get is the top 'filter bar' but no graphs etc.

Everything seems to work as expected with v15.

Let me know if there's any extra debugging info I can provide...

Grafana dashboard compatibility

Hi,

I could not find whether your node_exporter_full dashboard is compatible with node_exporter 1.0.0 or higher. Can you tell me if it is compatible?

count_scalar gone from prometheus 1.8.2

In Prometheus 1.8.2's changelog it says that the function count_scalar has been removed, so some dashboards which use this function may throw an error. Which function can I use to replace it? Maybe 'absent'?

The error thrown is "grafana unknown function with name count_scalar" for one of the graphs.
The solution is to replace count_scalar(...) with scalar(count(...)).
Credit: https://github.com/grafana/grafana-plugins/issues/45
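Applied to an arbitrary expression, the migration looks like this (the selector here is only illustrative, not taken from the dashboard):

```promql
# Before (removed in Prometheus 1.8.2):
count_scalar(node_load1{job=~"$job"})
# After:
scalar(count(node_load1{job=~"$job"}))
```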

CPU metric computation

Hi,
Thanks for all your hard work! It's nice not to have to create all those panels from scratch!
I have a question about the CPU usage PromQL. I've noticed that on multi-CPU systems, if the Y-axis is set to autoscale, the CPU panels sum the CPU usage to N*100, where N is the number of CPUs.
Depending on axis scaling to make the graph come out to 100% caused me to question the whole formula, so I hunted around the web and found this blog post: https://movio.co/blog/prometheus-lighting-the-way/ with a different formula that comes out to 100(%) even when autoscaled.
So for example for 'user' time, instead of:
sum by (instance)(irate(node_cpu_seconds_total{mode="user",instance="$node",job="$job"}[5m])) * 100
this:
avg(irate(node_cpu_seconds_total{mode='user',instance="$node",job="$job"}[5m])) * 100
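A toy calculation with made-up per-CPU rates (not real metrics) shows why the two formulas diverge on multi-CPU hosts:

```python
# Toy per-CPU "user" rates (seconds of CPU per second) for a hypothetical
# 4-core host; these are made-up numbers, not real irate() output.
per_cpu_user_rate = [0.50, 0.25, 1.00, 0.25]

# sum by (instance)(...) * 100 -> can exceed 100% on multi-CPU hosts
summed = sum(per_cpu_user_rate) * 100

# avg(...) * 100 -> always stays within 0-100%
averaged = sum(per_cpu_user_rate) / len(per_cpu_user_rate) * 100

print(summed, averaged)  # 200.0 50.0
```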

Thanks again!
C.

Support meaningful instance labels

When using meaningful instance labels there is only a name, and no port number, in the instance label. Unfortunately, the existing node_exporter dashboard does not work without a port.

I suggest using the instance label directly.

Also, given that job and node are not multi-select, these can use = rather than =~ for matching (more efficient, less likely to trip up on regexp metacharacters)

I've made these changes in 12486 which is a direct fork of 1860.

Missing Copyright Info

Hi team,
I couldn't identify the copyright information, and I am not sure whether the copyright info in the License file is accurate. If it is not, would you mind providing the copyright information, maybe in a copyright notice file?

Make resolution consistent (not mixture of 1/2 and 1/4)

In node-exporter-full, some graphs have resolution 1/4 (which really hides important detail) and some have resolution 1/2; sometimes even mixed on the same graph.


Personally I'd prefer 1/1, but I can understand people wanting a faster draw time with 1/2. However if that's the case, it would still be better to make it 1/2 consistently everywhere.

I can't quite work out what's going on in the JSON. Some items have a small step, and some huge:

$ grep '"step"' node-exporter-full.json  | sort | uniq -c
   2               "step": 2
   1               "step": 20
  82               "step": 240
 107               "step": 4
   4               "step": 480
   9               "step": 8
   1           "step": 1800
  14           "step": 240
  11           "step": 900

Network Traffic Basic

The panel Network Traffic Basic does not work very well for hosts that had multiple containers running and then stopped.

Should we add sum(irate(.....)) to the queries ?

Job selector is not working

I click Job selector and all the jobs are displayed, then I click any of the jobs to get it selected, but the top one is always active in result. Any ideas?

haproxy dashboard - some connection metrics does not exist

The haproxy dashboard uses the haproxy_server_connections_total, haproxy_frontend_connections_total and haproxy_backend_connections_total metrics, but these metrics do not seem to exist.

When I query Prometheus directly I get nothing. They also don't show up in the autocomplete feature, so it's not just that the metrics are 0.

All the other metrics work fine (even stuff like haproxy_server_connection_errors_total) and the dashboard is really useful :-)

I've also searched for the metric names in the https://github.com/prometheus/haproxy_exporter source code but found nothing.

So where are these metrics coming from?

no data on dashboards

This is my yml config, what is the problem?

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['192.168.0.17:9100']
        labels:
          instance: 192.168.0.17:9100

Convert to using [$__rate_interval]

Grafana 7.2 (released 23 Sep 2020) introduces a new variable for prometheus queries, $__rate_interval - see doc link.

This is intended to get rid of the problems around irate() and rate() queries missing spikes where the graph interval skips over them.

To apply this on the node exporter full dashboard, you'd change every instance of

irate(....[5m])

to

rate(...[$__rate_interval])

The way it works: $__rate_interval is equal to the sum of the graph step (the time interval between horizontal data points) and the prometheus sampling interval (set as the sample rate in the data source definition).

Remember that rate() calculates the rate between the first and last data points contained within the window. So, say you are scraping node_exporter at 1 minute intervals. Then rate(...[6m]) contains 6 data points, and calculates the rate over the 5 minute period between the first and last point in the window.

Consider various different zoom levels for grafana for the intervals between data points on the X axis:

  • 1 minute interval: you get rate(...[2m])
  • 5 minute interval: you get rate(...[6m])
  • 1 hour interval: you get rate(...[61m])

In each case, the rate correctly calculates the average over the time period between two data points. Spikes are never missed - although of course if you're averaging a spike over a longer time period then the peak shown will be lower.
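The window arithmetic described above can be sketched in a few lines; this mirrors the issue's description of the variable (window = graph step + scrape interval) rather than Grafana's exact implementation, which also enforces a minimum window:

```python
# Sketch of the $__rate_interval arithmetic as described in this issue.
def rate_window_minutes(step_s: int, scrape_interval_s: int) -> int:
    # window = graph step + scrape interval, so the window always
    # spans at least two scraped samples
    return (step_s + scrape_interval_s) // 60

# With a 1-minute scrape interval:
print(rate_window_minutes(60, 60))    # 2  -> rate(...[2m])
print(rate_window_minutes(300, 60))   # 6  -> rate(...[6m])
print(rate_window_minutes(3600, 60))  # 61 -> rate(...[61m])
```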

The only downside I can see for doing this is that it will make the dashboard only usable with grafana 7.2 and later.

Haproxy Monitoring through Grafana

Hi,

I have imported the haproxy JSON into the Grafana server and added a Prometheus datasource to it, but that data source is unable to connect and the plugin is not fetching data. Please look at the attached screenshots.
haproxy1
haproxy2
output

Looking forward to hearing from you.
Thank you
Regards,
rlinux57

File system alerts in Grafana using Promethius

Hi,

I am using the query below for my new alerts to monitor FS usage. However, this query shows all the nodes that have the grafana client running on them. I am trying to filter only the postgres nodes. Can anyone help tweak the query below to achieve the desired results?

( 1 - (node_filesystem_free_bytes{device!="rootfs"} / node_filesystem_size_bytes{device!="rootfs"})) * 100 * on(instance) group_left(nodename) (node_uname_info)

RAM used Gauge misleading formula in node-exporter-full

"RAM used" gauge improperly reports RAM allocation.
It can make you think applications could run out of memory, which is not the case.

My proposal for the formula is the following, which aligns with the way RAM used is calculated in the "Memory basic" Panel.

(node_memory_MemTotal_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_MemFree_bytes{instance=~"$node:$port",job=~"$job"} - (node_memory_Cached_bytes{instance=~"$node:$port",job=~"$job"} + node_memory_Buffers_bytes{instance=~"$node:$port",job=~"$job"})) / (node_memory_MemTotal_bytes{instance=~"$node:$port",job=~"$job"}) * 100
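With toy numbers (not real metrics), the proposed formula reduces to the classic used = total - free - (cached + buffers) calculation:

```python
# Toy byte counts for a hypothetical host; made-up values, chosen only to
# illustrate the arithmetic of the proposed "RAM used" formula.
total, free, cached, buffers = 16_000, 2_000, 6_000, 1_000

used_pct = (total - free - (cached + buffers)) / total * 100
print(used_pct)  # 43.75
```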

filesystem fill up time

One reason why I still have the host stats dashboard is because it has this neat little table of "Filesystem Fill Up Time" which (tries to?) compute the time at which the filesystem will fill up.

I don't think it's working very well because the results are just off here. But it got me thinking about how this could be implemented and whether you'd be interested in adding this to the dashboard...

The hosts stats dashboard uses this formula:

(node_filesystem_size_bytes{job='node',instance='$instance'} - node_filesystem_free_bytes{job='node',instance='$instance'}) / deriv(node_filesystem_free_bytes{job='node',instance='$instance',fstype!='rootfs',mountpoint!~'/(run|var).*',mountpoint!=''}[3d]) > 0

This blog post suggests instead just using the derivative as a base:

(deriv(node_filesystem_free{device=~"/dev/sd.*",instance=~"$node:.*"}[4h]) > 0)

I would suggest using node_filesystem_avail_bytes in any case, as that is the user-visible metric that will detect actual failures in userspace...

I'm not very familiar with Prometheus formulas, so I'm not sure how it works. I suspect it just doesn't, because it gives me negative numbers here (they don't show up) or absurd estimates (293481462547366 year for a 99% full disk), etc.
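As a back-of-the-envelope check, the estimate reduces to available bytes divided by the fill rate; toy numbers below, assuming a positive fill rate:

```python
# Back-of-the-envelope version of the deriv()-based estimate above.
avail_bytes = 50e9            # stand-in for node_filesystem_avail_bytes
fill_rate_bytes_per_s = 1e6   # stand-in for -deriv(...[3d]); must be > 0

seconds_to_full = avail_bytes / fill_rate_bytes_per_s
print(seconds_to_full / 86400)  # ~0.58 days
```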

Yet this could be an interesting addition.

No values shown

I have a setup with node_exporter 0.15.2, Prometheus 2.1 and Grafana 4.6.3.
I am not able to display any values.

When I open Prometheus, it is able to show values from node_exporter. Other dashboards work too.

novalues

Network unit measure on Node Exporter Full

Hello,

Network speed is measured in bits per second, not bytes. The metrics collected by node_exporter are given in bytes, for example:
node_network_receive_bytes_total
This needs to be multiplied by 8 to get the real value in bits.
To be more specific, the "Network Traffic Basic" panel has the following query:
irate(node_network_receive_bytes_total{instance=~"$node:$port",job=~"$job"}[5m])
This query shows the data as it's given by node_exporter, in other words in bytes.

The query should be:
irate(node_network_receive_bytes_total{instance=~"$node:$port",job=~"$job"}[5m])*8

As a consequence, the "unit measure" used to show the data on the left Y axis needs to be changed to "bits/sec" under "data rate".

Grafana dashboard multiple series error

Hi, I'm using this dashboard (3894) and it works great. I'm running the exporter as a sidecar on Kubernetes. The problem is that when Kubernetes restarts the deployment it gives it a random name, and the Grafana dashboard gets a Multiple series error. Does anyone know how I can fix this?

Multi Series Error on Node Exporter Full

I am using Node Exporter Full on my Prometheus setup. Everything was working fine until today when, for no apparent reason, some graphs gave me a Multi Series Error; on a newly added server it works fine.

Multiple Series Error

Hi, I'm having a bit of a problem. I've imported the dashboard, but many of the graphs show the error "Multiple Series Error" and so don't load any data.

screen shot 2017-07-11 at 15 14 58

I haven't changed anything; it's a basic install of Prometheus and Grafana, and then I imported your dashboard.

Parameter correction for a fully working dashboard

To get a fully working dashboard, this is the right start command. The documentation contains brackets instead of quotation marks:

nohup ./node_exporter --collector.netstat.fields="." --collector.vmstat.fields="." --collector.interrupts &

Encounter negative values of CPU Busy during upgrade/downtime of platform

Hello,

In our use case, we imported the node-exporter-full dashboard to monitor our platform.

As shown in the screenshot, we got negative values during the upgrade/downtime.
so we are considering switching to another expression to avoid the unexpected negative values.

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle",instance=~"${instanceIP}:${PORT_NODE_EXPORTER}"}[5m])) * 100)

Do you consider this an issue or a bug on your side?
And how would you suggest solving it?

Thanks in advance!

node-exporter-full add panels for processes collector

Hi

would it be possible to add panels for the processes collector:

# HELP node_processes_max_processes Number of max PIDs limit
# TYPE node_processes_max_processes gauge
node_processes_max_processes 32768
# HELP node_processes_max_threads Limit of threads in the system
# TYPE node_processes_max_threads gauge
node_processes_max_threads 30441
# HELP node_processes_pids Number of PIDs
# TYPE node_processes_pids gauge
node_processes_pids 185
# HELP node_processes_state Number of processes in each state.
# TYPE node_processes_state gauge
node_processes_state{state="S"} 185
# HELP node_processes_threads Allocated threads in system
# TYPE node_processes_threads gauge
node_processes_threads 212

This will require starting node_exporter with --collector.processes, because this collector is disabled by default.

Is it possible to get Top URLs by Response Code?

Hi,

For the HAProxy grafana dashboard, is it possible to build a graph of the top URLs (http_request) accessed and their response codes?
From the metrics gathered by Prometheus, I can't seem to find the http_request being captured as part of the stats. How can this be enabled?

node-exporter-full: templating variable

Hey there, a more stable label to template off of might be node_exporter_build_info. There is a version label on that as well, which might be useful 🤷‍♂️

Node Exporter Full 0.16 problem when node_exporter is on different ports

hi Ricardo, first of all - thank you for this excellent dashboard.

On my servers, some of the node_exporters are on port 9100 and some are on port 80.
I discovered that when I use the Host dropdown to select a host where the metrics are on port 80, the dashboards are all empty (no data). I enabled the port dropdown with the following change:

19505c19505
<         "hide": 2,
---
>         "hide": 0,

When I do that, and select port 80 from the dropdown, then the dashboard charts get populated.
Is this an issue with the regex on line 19513 of the json file?
We basically need the port to track the host appropriately.

here is the relevant section of my prometheus.yml - you can scrape my URLs to test

  # node_exporter metrics from various servers
  - job_name: node
    scrape_interval: 60s
    static_configs:
      - targets:
        - localhost:9100

      - targets:
        - devops.fywss.com:80
        - thrash.fywss.com:80
        - home.fywss.com:80

Cheers
Steve

HAProxy Query Params Outdated

HI, thanks for your amazing Dashboard.
We use it for our HAProxy installation. Unfortunately, it seems that several metric names have changed. Therefore we had to compensate via the following config:

    metric_relabel_configs:
    - source_labels: 
        - __name__
        - proxy
      target_label: frontend
      action: replace
      regex: (haproxy_frontend_.*);(.*)
      replacement: ${2}
    - source_labels:
        - __name__
        - proxy
      target_label: backend
      action: replace
      regex: (haproxy_backend_.*);(.*)
      replacement: ${2}
    - source_labels:
        - __name__
        - proxy
      target_label: backend
      action: replace
      regex: (haproxy_server_.*);(.*)
      replacement: ${2}
    - source_labels:
        - __name__
      target_label: __name__
      regex: haproxy_process_jobs
      replacement: haproxy_up
    - source_labels:
        - __name__
      target_label: __name__
      regex: haproxy_backend_status
      replacement: haproxy_backend_up
    - source_labels:
        - __name__
      target_label: __name__
      regex: haproxy_server_connection_attempts_total
      replacement: haproxy_server_connections_total
    - source_labels:
        - __name__
      target_label: __name__
      regex: haproxy_server_status
      replacement: haproxy_server_up
    - regex: proxy
      action: labeldrop

Without this workaround your dashboard doesn't even find any hosts, backends or frontends. The main reason is that HAProxy's built-in Prometheus exporter labels "frontend" and "backend" as "proxy".

Despite relabeling, I wasn't able to provide a replacement for:

  • haproxy_server_current_session_rate
  • haproxy_server_check_duration_milliseconds (-> now called "haproxy_server_check_duration_seconds")

thank you!
