yunlzheng / prometheus-book

Prometheus Operations Guide

Home Page: https://yunlzheng.gitbook.io/prometheus-book/

prometheus book gitbook prometheus2 devops kubernetes alertmanager promql metrics grafana


prometheus-book's Issues

In resolved alert emails, the value of {{ $labels.value }} is not the value after recovery. How can this be fixed?

Alert email

[1] Firing
Labels
alertname = NodeCpuUsage
environment = infra
instance = 172.17.11.5:9100
type = cpu
Annotations
description = CPU usage is above 80%, current value: 100%
summary = 172.17.11.5:9100 CPU fault

Recovery email

[1] Resolved
Labels
alertname = NodeCpuUsage
environment = infra
instance = 172.17.11.5:9100
type = cpu
Annotations
description = CPU usage is above 80%, current value: 100%
summary = 172.17.11.5:9100 CPU fault

Rule file attached:
groups:
- name: os-cpu
  rules:
  - alert: NodeCpuUsage
    expr: ceil(100 - (avg(irate(node_cpu{mode='idle'}[5m])) by (instance) * 100)) > 90
    for: 1m
    labels:
      type: "cpu"
    annotations:
      summary: "{{ $labels.instance }} CPU fault"
      description: "CPU usage is above 90%, current value: {{ $value }}%"
This problem has been bothering me for a long time...

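Chapter 5: Data and Visualization

Section 1, Using Console Template
In "读者已经对Prometheus已经有了一个相对完成的认识", "相对完成" (relatively finished) should be "相对完整" (relatively complete).
In "但是确定也很明显", "确定" (certain) should be "缺点" (drawback).

Question: how can I return the values of several metrics (displayed in one row) and sort by the value of one of them?

How do I match variables when adding a Grafana panel?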

up{instance="172.29.50.175:9256",region="ap-southeast-1"}
namedprocess_namegroup_num_procs{instance="172.29.50.175:9256",region="ap-southeast-1"}

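I installed several exporters on the same server, each listening on a different port. In Grafana I defined a $HOST variable that extracts the instance IP. When a panel should only use port 9100, for example node_load1{instance=~"$HOST:9100"}, I decoded the generated query URL and found it has this form:
query=node_load1{instance=~"1.1.1.1|2.2.2.2|3.3.3.3:9100"}&start=1559125515&end=1559132715&step=15
What I actually want to match is:
query=node_load1{instance=~"1.1.1.1:9100|2.2.2.2:9100|3.3.3.3:9100"}&start=1559125515&end=1559132715&step=15
How should the expression be written in this case?

View this tutorial conveniently with the GitHub mini program in mobile WeChat

Hi, I recently built a GitHub mini program for WeChat that makes it easy to browse GitHub content on a phone. Scan the QR code below to read this tutorial~

The mini program is open source. If you have any questions or suggestions, open an issue directly: GitHub Trending Hub

current value: {{ $value }} is inaccurate when the alert resolves. What can be done?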

Use 'apps/v1' to replace 'extensions/v1beta1' for example-app deployment

Since the latest Kubernetes release has reached 1.18, newer versions need a new apiVersion value.

path: use-operator-manage-prometheus.md

current:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: example-app

new:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
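
One detail worth keeping in mind with this change: under apps/v1 a Deployment must set spec.selector, and the selector has to match the labels of the pod template. A minimal sketch of the updated manifest is shown below. The replica count, labels, image, and port are assumptions for illustration and are not taken from use-operator-manage-prometheus.md.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3                    # assumed replica count
  selector:                      # required in apps/v1
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app         # must match spec.selector.matchLabels
    spec:
      containers:
        - name: example-app
          image: example/instrumented-app:latest   # hypothetical image
          ports:
            - name: web
              containerPort: 8080                  # assumed port
```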

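The description in "Inhibiting alert notifications" in Chapter 3 (Alert Handling) has a serious error

The description of inhibition reads: "When an already-sent alert notification matches the target_match and target_match_re rules, and a new alert satisfies source_match or the defined matching rules, and the labels listed in equal are identical in the already-sent alert and the new alert, inhibition is triggered and the new alert is not sent."
It should be exactly the opposite: source_match matches alerts that already exist, while target_match matches the newly fired alerts that are to be inhibited.
The correct statement is: once the rule is in effect, if an existing (already-sent) alert matches the source_match and source_match_re rules, and a new alert satisfies target_match or target_match_re, and the labels listed in equal are identical in the already-sent alert and the new alert, inhibition is triggered and the new alert is not sent.

See the comments in the official Prometheus documentation: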

# Matchers that have to be fulfilled in the alerts to be muted.
target_match:
  [ <labelname>: <labelvalue>, ... ]
target_match_re:
  [ <labelname>: <regex>, ... ]

# Matchers for which one or more alerts have to exist for the
# inhibition to take effect.
source_match:
  [ <labelname>: <labelvalue>, ... ]
source_match_re:
  [ <labelname>: <regex>, ... ]

# Labels that must have an equal value in the source and target
# alert for the inhibition to take effect.
[ equal: [<labelname>, ... ] ]
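
To make the source/target roles concrete, here is a minimal sketch of an inhibit rule in alertmanager.yml. The alert name NodeDown, the severity values, and the instance label are illustrative assumptions. Newer Alertmanager releases also offer source_matchers and target_matchers, so the keys below reflect the older syntax quoted above.

```yaml
# alertmanager.yml (fragment); all label names and values are assumed examples
inhibit_rules:
  # Source: an alert that must already exist for inhibition to take effect.
  - source_match:
      alertname: NodeDown
      severity: critical
    # Target: alerts that are muted while a matching source alert exists.
    target_match:
      severity: warning
    # Source and target must carry the same value for each label listed here.
    equal: ['instance']
```

With this rule, a warning alert for an instance is muted only while a critical NodeDown alert for the same instance is firing.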

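Question about saturation

Hello,

What is the difference between saturation in Google's four golden signals and saturation in the USE method?

I still don't quite understand it from the article. Thanks.

How can I edit the Grafana dashboard for blackbox_exporter via xx-dashboard.json, or does anyone have a ready-made dashboard?

I can get the metrics as below:

The online documentation at https://yunlzheng.gitbook.io/prometheus-book/ appears to be down. Is it no longer maintained?

Prometheus HA setup

Hello, could you share your Prometheus HA setup? Thank you very much!

The binary won't run. What is the cause?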

-bash: /usr/local/bin/node_exporter: cannot execute binary file

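How to use blackbox_exporter to monitor a service that requires username/password authentication

In a k8s cluster, I want to use blackbox_exporter to monitor a service that requires username/password authentication, but I'm not sure how. Is there an example?

GitBook page styling

https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/quickstart/prometheus-quick-start
How was the GitBook-generated documentation above styled?

Please update the test data; many metrics are gone or upgraded

Please update the test data if you can; many metrics have been removed or changed.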

see https://github.com/yunlzheng/prometheus-book/blob/master/quickstart/install-prometheus-server.md#%E4%BB%8E%E4%BA%8C%E8%BF%9B%E5%88%B6%E5%8C%85%E5%AE%89%E8%A3%85

https://prometheus.io/docs/prometheus/latest/getting_started/

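The version is now 2.17.

Ugh, the method in your install-prometheus-server.md is for Mac.

The Alertmanager high-availability chapter gives an incorrect configuration example

Suppose we have three AM instances running on ports A-8081, B-8082, and C-8083. Then instance A should be configured with peers 8082 and 8083, instance B with peers 8081 and 8083, and instance C with peers 8081 and 8082.

With your configuration, the cluster can easily fail to bootstrap when instance A has a problem.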
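
The peer layout described above can be sketched as a Compose file. This is only an illustration under stated assumptions: 8081 to 8083 are treated as the cluster (gossip) ports, while the web ports 9093 to 9095, host networking, the image name, and the config path are assumptions and not taken from the book.

```yaml
# docker-compose.yml (sketch): three Alertmanager instances, each listing
# the other two as peers so that no single instance is needed to bootstrap.
services:
  alertmanager-a:
    image: prom/alertmanager
    network_mode: host
    command:
      - --config.file=/etc/alertmanager/alertmanager.yml   # assumed path
      - --web.listen-address=:9093                          # assumed web port
      - --cluster.listen-address=127.0.0.1:8081
      - --cluster.peer=127.0.0.1:8082
      - --cluster.peer=127.0.0.1:8083
  alertmanager-b:
    image: prom/alertmanager
    network_mode: host
    command:
      - --config.file=/etc/alertmanager/alertmanager.yml
      - --web.listen-address=:9094
      - --cluster.listen-address=127.0.0.1:8082
      - --cluster.peer=127.0.0.1:8081
      - --cluster.peer=127.0.0.1:8083
  alertmanager-c:
    image: prom/alertmanager
    network_mode: host
    command:
      - --config.file=/etc/alertmanager/alertmanager.yml
      - --web.listen-address=:9095
      - --cluster.listen-address=127.0.0.1:8083
      - --cluster.peer=127.0.0.1:8081
      - --cluster.peer=127.0.0.1:8082
```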

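What tool is the handbook website built with?

May I ask what this handbook was built with? The GitBook interface does not look like this.

Could a configuration example be added to "Best practices: the 4 golden signals and the USE method"?

Assuming a distributed application deployed across 10 machines, could you suggest a configuration example for reference?
Thanks

Can a Tomcat service on Windows be monitored?

I only found the Linux version of jmx-exporter, and I ran into some errors when using it on Windows.

Also, if I only need to check whether the service is running and the port is listening, blackbox_exporter (Windows) could be used. Is there installation documentation for it?

A few small typos

Typo in one md file.

In prometheus-book/exporter/custom_app_support_prometheus.md

Please change "Sring" to "Spring" in this title.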

Hobby

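How can I set up maintenance windows like in Zabbix?

How can I, as in Zabbix, define a daily maintenance window from 1:00 to 2:00 for one group or for all monitored targets, with alerts silenced during that period?

Issue with the setup example in the "Alertmanager High Availability" chapter

Thanks to the author for the clear and concise explanation of how the gossip protocol lets multiple Alertmanager instances deduplicate alerts coming from the same Prometheus instance~~
I suggest also mentioning how Alertmanager, in a full-mesh setup, deduplicates identical alerts coming from different Prometheus instances, namely: "Alertmanager treats two alerts as identical only if all of their labels are exactly the same." The reason for adding this is that two Prometheus servers backing each other up usually set external_labels to mark where their metrics come from (especially to avoid conflicts when using remote write). In that scenario, alert_relabel_configs must be configured to make the differing labels consistent before alerts are sent to Alertmanager.

Adding this would make the explanation of the mechanism more complete. Just a suggestion :D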
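
The suggestion above can be sketched as a prometheus.yml fragment for one server of an HA pair. The label names cluster and replica, their values, and the Alertmanager address are assumptions for illustration.

```yaml
# prometheus.yml (fragment) for one of two redundant Prometheus servers
global:
  external_labels:
    cluster: prod        # assumed label, identical on both servers
    replica: a           # assumed label, set to "b" on the other server
alerting:
  alert_relabel_configs:
    # Drop the replica label before alerts are sent, so both servers
    # emit identical label sets and Alertmanager can deduplicate them.
    - action: labeldrop
      regex: replica
  alertmanagers:
    - static_configs:
        - targets: ['127.0.0.1:9093']   # assumed Alertmanager address
```

The replica label still reaches remote storage, because alert_relabel_configs only affects alerts sent to Alertmanager.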

A question about Alertmanager with WeChat Work

The versions I am using are:
alertmanager-0.18.0.linux-amd64
prometheus-2.11.1.linux-amd64
My alert template wechat.tmpl is as follows:

{{ define "wechat.tmpl" }}
{{ range .Alerts }}
========start==========
Alert program: prometheus_alert
Severity: {{ .Labels.severity }}
Alert type: {{ .Labels.alertname }}
Affected host: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
========end==========
{{ end }}
{{ end }}

The log shows no errors:
level=debug ts=2019-07-17T12:53:33.495Z caller=dispatch.go:430 component=dispatcher aggrGroup="{}:{alertname="node_status"}" msg=flushing alerts=[node_status[1a2f380][active]]
level=debug ts=2019-07-17T12:53:38.496Z caller=dispatch.go:430 component=dispatcher aggrGroup="{}:{alertname="node_status"}" msg=flushing alerts=[node_status[1a2f380][active]]

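I do receive the WeChat Work alerts, but the content is empty.

I tried other templates and also downgraded Alertmanager by one version, with the same result. What could be the problem?

How can a custom exporter built with client_java be made visible only after logging in with a username and password?

As the title says. Thanks for your help.

Is it suitable for checking whether Tomcat on a given server port is running?

Hello, I need to know whether Tomcat on a given server is running. Does Prometheus support this? If so, should I use the node_exporter probe or something else?

"Built-in Prometheus support in applications" does not support Spring Boot 2

As the title says.

How do I calculate the difference between the current value of a metric and its value 5 minutes ago?

I want to monitor how much a metric fluctuates, for example by comparing current CPU usage with CPU usage 5 minutes ago and alerting when the difference exceeds a certain amount or percentage. Is there a built-in function for this, or do I have to compute the difference between the two with a PromQL expression?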
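
PromQL has an offset modifier that reads a series as it was some time ago, so a difference of this kind can be written as a subtraction. A hedged sketch as a rule file follows. The recorded series name instance:node_cpu_usage:percent, the 20-point threshold, and the for duration are assumptions. The CPU expression reuses the node_cpu metric name from the rule quoted earlier on this page (newer node_exporter releases call it node_cpu_seconds_total).

```yaml
groups:
  - name: cpu-change
    rules:
      # Record per-instance CPU usage as a gauge, in percent.
      - record: instance:node_cpu_usage:percent
        expr: 100 - (avg by (instance) (irate(node_cpu{mode="idle"}[5m])) * 100)
      # Fire when usage is more than 20 points above its value 5 minutes ago.
      - alert: CpuUsageJump
        expr: |
          instance:node_cpu_usage:percent
            - instance:node_cpu_usage:percent offset 5m
          > 20
        for: 1m
        labels:
          type: cpu
        annotations:
          description: "CPU usage rose by {{ $value }} points within 5 minutes"
```

For a relative threshold, the same difference can be divided by the offset value and multiplied by 100. For gauges, the built-in delta() function over a 5m range is another option.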

Error in the Docker installation of Prometheus

Hello!
Installing with your command keeps producing the following error:
[root@vm-ecs-104 ~]# docker run -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/etc/prometheus/prometheus.yml\\\" to rootfs \\\"/var/lib/docker/overlay2/2101fbe118b3d1f5c38d83fa80464e53a0aa1851089dcce978525f815c66c80d/merged\\\" at \\\"/var/lib/docker/overlay2/2101fbe118b3d1f5c38d83fa80464e53a0aa1851089dcce978525f815c66c80d/merged/etc/prometheus/prometheus.yml\\\" caused \\\"not a directory\\\"\"": unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.

I checked the /etc/prometheus directory on the host and found that prometheus.yml had been created as a directory rather than a file. I suggest fixing this in the instructions.

Also: I recommend using Linux for the demonstrations. I noticed that many of the install packages shown are for macOS.
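
The error above is consistent with how Docker handles -v bind mounts: when the host path does not exist, Docker creates it as a directory, which then cannot be mounted onto a file inside the container. One way around it is to create the file before starting the container. A minimal sketch of /etc/prometheus/prometheus.yml is shown below; the scrape interval and the single self-scrape job are assumptions, not the book's configuration.

```yaml
# /etc/prometheus/prometheus.yml (minimal sketch, to exist as a regular file
# on the host before running the docker command)
global:
  scrape_interval: 15s          # assumed interval
scrape_configs:
  - job_name: prometheus        # Prometheus scraping its own metrics
    static_configs:
      - targets: ['localhost:9090']
```

If the path was already created as a directory by an earlier run, it has to be removed before the file is put in place.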
