yunlzheng / prometheus-book
Prometheus Operations Guide (Prometheus操作指南)
Home Page: https://yunlzheng.gitbook.io/prometheus-book/
[1] Firing
Labels
alertname = NodeCpuUsage
environment = infra
instance = 172.17.11.5:9100
type = cpu
Annotations
description = CPU usage is above 80%, current value: 100%
summary = 172.17.11.5:9100 CPU fault
[1] Resolved
Labels
alertname = NodeCpuUsage
environment = infra
instance = 172.17.11.5:9100
type = cpu
Annotations
description = CPU usage is above 80%, current value: 100%
summary = 172.17.11.5:9100 CPU fault
Configuration file attached:
rules:
groups:
- name: os-cpu
  rules:
  - alert: NodeCpuUsage
    expr: ceil (100 - (avg(irate(node_cpu{mode='idle'}[5m])) by (instance) * 100)) > 90
    for: 1m
    labels:
      type: "cpu"
    annotations:
      summary: "{{ $labels.instance }} CPU fault"
      description: "CPU usage is above 90%, current value: {{ $value }}%"
This problem has been bothering me for a long time...
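Section 1: Using Console Templates
In "读者已经对Prometheus已经有了一个相对完成的认识", the phrase "相对完成" (relatively finished) should be "相对完整" (relatively complete).
In "但是确定也很明显", the word "确定" (certain) should be "缺点" (drawback).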
up{instance="172.29.50.175:9256",region="ap-southeast-1"}
namedprocess_namegroup_num_procs{instance="172.29.50.175:9256",region="ap-southeast-1"}
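I have installed several exporters on the same server, each listening on a different port. In Grafana I defined a $HOST variable that extracts the instance IP. When a panel should only use port 9100, for example node_load1{instance="$HOST:9100"}, I decoded the generated query URL and found that it has this form: query=node_load1{instance="1.1.1.1|2.2.2.2|3.3.3.3:9100"}&start=1559125515&end=1559132715&step=15. What I actually want to match is query=node_load1{instance=~"1.1.1.1:9100|2.2.2.2:9100|3.3.3.3:9100"}&start=1559125515&end=1559132715&step=15. How should the expression be written in this case?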
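Hi, I recently built a WeChat mini program for GitHub that makes it easy to browse GitHub content on a phone. Scan the QR code below to read this tutorial there.
The mini program is open source; if you have any questions or suggestions, please open an issue: GitHub Trending Hub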
current value: {{ $value }} When the alert is resolved, this value is inaccurate. What can be done about it?
Since the latest Kubernetes release has reached 1.18, a new apiVersion value is needed.
Use 'apps/v1' instead of 'extensions/v1beta1' for the example-app Deployment.
path: use-operator-manage-prometheus.md
current:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: example-app
new:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
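Changing only the apiVersion line is usually not enough: under apps/v1 a Deployment must declare spec.selector, and the selector has to match the pod template labels. A minimal sketch of what the updated manifest could look like follows. The replica count, labels, image name, and port are assumptions for illustration and are not taken from the book.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3                      # assumed value
  selector:                        # required field in apps/v1
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app           # must match spec.selector.matchLabels
    spec:
      containers:
      - name: example-app
        image: example/instrumented-app:latest   # hypothetical image name
        ports:
        - name: web
          containerPort: 8080      # assumed port
```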
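The description of inhibition says: "When an alert notification that has already been sent matches the target_match and target_match_re rules, and a new alert satisfies source_match or the defined matching rules, and the labels listed in equal are identical between the already-sent alert and the new alert, the inhibition mechanism is triggered and the new alert is not sent."
It should be exactly the other way around: source_match matches alerts that already exist, while target_match matches the newly fired alerts that are to be inhibited.
The correct statement is: once the rule is in effect, when an alert that already exists or has already been sent matches the source_match and source_match_re rules, and a new alert satisfies target_match or target_match_re, and the labels listed in equal are identical between the existing alert and the new alert, the inhibition mechanism is triggered and the new alert is not sent.
See the comments in the official Prometheus documentation: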
# Matchers that have to be fulfilled in the alerts to be muted.
target_match:
  [ <labelname>: <labelvalue>, ... ]
target_match_re:
  [ <labelname>: <regex>, ... ]
# Matchers for which one or more alerts have to exist for the
# inhibition to take effect.
source_match:
  [ <labelname>: <labelvalue>, ... ]
source_match_re:
  [ <labelname>: <regex>, ... ]
# Labels that must have an equal value in the source and target
# alert for the inhibition to take effect.
[ equal: [<labelname>, ... ] ]
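To make the source/target roles concrete, here is a minimal sketch of an inhibit rule. It mutes warning alerts while a critical alert with the same alertname and instance is firing. The severity label and its values are assumptions chosen for illustration; newer Alertmanager releases also accept source_matchers and target_matchers in place of the fields used here.

```yaml
inhibit_rules:
  # Source: an alert matching this must already be firing.
  - source_match:
      severity: critical           # assumed label and value
    # Target: alerts matching this are the ones that get muted.
    target_match:
      severity: warning            # assumed label and value
    # Inhibition applies only when these labels have the same value
    # in both the source alert and the target alert.
    equal: ['alertname', 'instance']
```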
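Hello,
I would like to ask what the difference is between saturation in Google's four golden signals and saturation in the USE method.
I still do not quite understand it from the article. Thanks.
Hello, could you share a high-availability (HA) setup for Prometheus? Thank you very much!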
-bash: /usr/local/bin/node_exporter: cannot execute binary file
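In a Kubernetes cluster, I want to use blackbox_exporter to monitor a Service that requires username/password authentication, but I am not sure how to do it. Is there an example?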
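One possible approach, assuming the Service uses HTTP basic authentication, is to define a blackbox_exporter module with basic_auth and point a Prometheus scrape job at it. The sketch below is illustrative only: the module name, credentials, target URL, and the blackbox-exporter:9115 address are all placeholders.

```yaml
# blackbox.yml (blackbox_exporter configuration)
modules:
  http_2xx_basic_auth:             # hypothetical module name
    prober: http
    timeout: 5s
    http:
      basic_auth:
        username: "probe-user"     # placeholder credential
        password: "change-me"      # placeholder credential
---
# prometheus.yml (scrape job that probes the Service through the exporter)
scrape_configs:
  - job_name: blackbox-auth-service
    metrics_path: /probe
    params:
      module: [http_2xx_basic_auth]
    static_configs:
      - targets:
          - http://my-service.default.svc:8080/   # hypothetical Service URL
    relabel_configs:
      # Pass the original target as the ?target= parameter of the probe.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the scrape itself to the blackbox exporter.
      - target_label: __address__
        replacement: blackbox-exporter:9115       # assumed exporter address
```

The probe result is exposed as probe_success, which can then be used in an alerting rule.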
https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/quickstart/prometheus-quick-start
How was the GitBook-generated documentation at the link above styled?
Please update the test data if you can; many metrics have been removed or renamed.
https://prometheus.io/docs/prometheus/latest/getting_started/
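The current version is 2.17.
Hey, the method described in your install-prometheus-server.md is for Mac.
Suppose we have three AM instances, A, B, and C, running on ports 8081, 8082, and 8083 respectively. Then instance A should be configured with peers 8082 and 8083, instance B with 8081 and 8083, and instance C with 8081 and 8082.
With your configuration, the cluster can easily fail to bootstrap when instance A has a problem.
What tool was used to produce this handbook? The GitBook interface does not look like this.
Assuming a distributed application deployed across 10 machines, could you suggest a sample configuration for reference?
Thanks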
In prometheus-book/exporter/custom_app_support_prometheus.md,
please change "Sring" to "Spring" in the title.
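How can I do what Zabbix does: define 01:00 to 02:00 every day as a maintenance window for a given group, or for all monitored targets, and silence alerts during that period?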
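Recent Alertmanager releases support recurring mute windows through named time intervals that a route can reference. The sketch below assumes a release that supports the top-level time_intervals key, and it uses the environment="infra" label from earlier on this page as the hypothetical "group". Times are interpreted as UTC unless a location is configured, and the root route itself cannot be muted, so the window is attached to a child route.

```yaml
time_intervals:
  - name: nightly-maintenance      # hypothetical interval name
    time_intervals:
      - times:
          - start_time: "01:00"
            end_time: "02:00"

route:
  receiver: default
  routes:
    # Alerts matching this route produce no notifications during the window.
    - matchers:
        - environment="infra"      # assumed label identifying the group
      receiver: default
      mute_time_intervals:
        - nightly-maintenance

receivers:
  - name: default
```

Alerts are still evaluated and visible during the window; only the notifications are suppressed.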
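Thanks to the author for the clear and concise explanation of how the gossip protocol lets multiple Alertmanager instances deduplicate alerts coming from the same Prometheus instance.
I suggest also mentioning how Alertmanager, in a full-mesh setup, deduplicates identical alerts coming from different Prometheus instances, namely: "Alertmanager treats two alerts as identical only if all of their labels match exactly". The reason for adding this is that two Prometheus servers acting as backups for each other usually set external_labels to mark where the metrics come from (especially to avoid conflicts when using remote write). In that scenario, alert_relabel_configs must be configured to reset the differing labels to the same value before alerts are sent to Alertmanager.
For the related discussion and explanation, see:
prometheus/alertmanager#1448
https://www.robustperception.io/high-availability-prometheus-alerting-and-notification
Adding this would help make the explanation of the mechanism more complete. Just a suggestion :D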
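The suggestion above can be sketched as a Prometheus configuration that drops the per-replica external label from alerts before they are sent, so that both replicas emit alerts with identical label sets. The label names, label values, and Alertmanager addresses are assumptions for illustration.

```yaml
global:
  external_labels:
    cluster: prod                  # assumed label, identical on both replicas
    replica: prometheus-a          # assumed label, differs per replica

alerting:
  alert_relabel_configs:
    # Remove the replica label from outgoing alerts only; it stays on
    # remote-written samples, where it is needed to tell replicas apart.
    - action: labeldrop
      regex: replica
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager-1:9093  # hypothetical addresses
            - alertmanager-2:9093
```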
I am using the following versions:
alertmanager-0.18.0.linux-amd64
prometheus-2.11.1.linux-amd64
My alert template wechat.tmpl is as follows:
{{ define "wechat.tmpl" }}
{{ range .Alerts }}
========start==========
Alert source: prometheus_alert
Severity: {{ .Labels.severity }}
Alert type: {{ .Labels.alertname }}
Affected host: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
========end==========
{{ end }}
{{ end }}
The log shows no errors:
level=debug ts=2019-07-17T12:53:33.495Z caller=dispatch.go:430 component=dispatcher aggrGroup="{}:{alertname="node_status"}" msg=flushing alerts=[node_status[1a2f380][active]]
level=debug ts=2019-07-17T12:53:38.496Z caller=dispatch.go:430 component=dispatcher aggrGroup="{}:{alertname="node_status"}" msg=flushing alerts=[node_status[1a2f380][active]]
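The WeChat Work alert does arrive, but its content is empty.
I tried other templates and also downgraded Alertmanager by one version, with the same result. What could be the problem?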
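The thread does not show the Alertmanager configuration, so the cause cannot be determined from it. One thing worth checking is how the template is wired in: the template file has to be loaded through templates, and the receiver's message has to invoke the name used in the define statement. A minimal sketch follows, with placeholder credentials and an assumed template directory.

```yaml
templates:
  - '/etc/alertmanager/templates/*.tmpl'   # assumed location of wechat.tmpl

route:
  receiver: wechat

receivers:
  - name: wechat
    wechat_configs:
      - corp_id: 'your-corp-id'            # placeholder
        agent_id: 'your-agent-id'          # placeholder
        api_secret: 'your-api-secret'      # placeholder
        to_user: '@all'
        send_resolved: true
        # Must match the name in {{ define "wechat.tmpl" }}.
        message: '{{ template "wechat.tmpl" . }}'
```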
As the title says. Thanks for any answers.
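Hello, I need to know whether Tomcat is running on a given server. Does Prometheus support this? If so, should I use node_exporter or some other exporter?
As the title says.
I want to monitor how much a metric fluctuates, for example by comparing the current CPU usage with the value from 5 minutes ago and alerting when the difference exceeds some amount or percentage. Is there a built-in function for this, or do I have to compute the difference between the two with a Prometheus query?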
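PromQL's offset modifier can express this kind of comparison. The sketch below first records CPU usage per instance and then alerts when it has risen by more than 20 percentage points compared with 5 minutes earlier. The rule names and the threshold are assumptions; the usage expression reuses the node_cpu metric from earlier on this page, which newer node_exporter releases expose as node_cpu_seconds_total.

```yaml
groups:
- name: cpu-change
  rules:
  # CPU usage in percent per instance, stored under a hypothetical name.
  - record: instance:node_cpu_usage:percent
    expr: 100 - (avg by (instance) (irate(node_cpu{mode="idle"}[5m])) * 100)
  # Fire when usage is more than 20 points above its value 5 minutes ago.
  - alert: NodeCpuUsageJump
    expr: (instance:node_cpu_usage:percent - (instance:node_cpu_usage:percent offset 5m)) > 20
    for: 1m
    labels:
      type: "cpu"
    annotations:
      description: "CPU usage changed by {{ $value }} percentage points in 5 minutes"
```

For a relative threshold, the difference can be divided by the offset value and compared against a percentage instead.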
Hello!
When I install using your command, I keep getting the following error:
[root@vm-ecs-104 ~]# docker run -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/etc/prometheus/prometheus.yml\\\" to rootfs \\\"/var/lib/docker/overlay2/2101fbe118b3d1f5c38d83fa80464e53a0aa1851089dcce978525f815c66c80d/merged\\\" at \\\"/var/lib/docker/overlay2/2101fbe118b3d1f5c38d83fa80464e53a0aa1851089dcce978525f815c66c80d/merged/etc/prometheus/prometheus.yml\\\" caused \\\"not a directory\\\"\"": unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.
I checked the local /etc/prometheus directory and found that prometheus.yml had been created as a directory rather than a file. I suggest fixing this in the instructions.
Also, I recommend using Linux for the demonstrations; many of the install packages shown are for macOS.
In that passage, isn't a word missing after "以", for example _name?
When using Prometheus federation, I found that my custom Grafana template variables cannot retrieve the host information for a cluster.
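The report does not include the federation configuration, so the cause is unknown. For reference, a federation scrape job typically looks like the sketch below: only series selected by match[] are pulled into the federating server, and honor_labels keeps the original instance and job labels, which Grafana variables usually depend on. The job name, selector, and source address are assumptions.

```yaml
scrape_configs:
  - job_name: federate
    honor_labels: true             # keep instance/job labels from the source
    metrics_path: /federate
    params:
      'match[]':
        - '{job="node"}'           # assumed selector; unmatched series are not federated
    static_configs:
      - targets:
          - source-prometheus:9090 # hypothetical address of the source server
```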
Hello, I suggest adding spaces between Chinese and English text in the documentation.