Alertmanager: outputting multiple values in an alert
【1】Display several metric values in an alert email
(1.1) How can an alert show multiple values, and show only the value itself?
In the rule file:
groups:
- name: example
  rules:
  - alert: Load alert
    expr: node_load1 > 1
    for: 5s
    labels:
      severity: page
    annotations:
      title: 'load1: {{ $value }}, load5: {{ printf `node_load5{instance="%s"}` $labels.instance | query | first | value }}, load15: {{ printf `node_load15{instance="%s"}` $labels.instance | query | first | value }}'
      summary: High load
After configuring Alertmanager and adding webhook_configs, I can capture the alert payload as follows:
{"receiver":"default","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"Load alert","instance":"127.0.0.1:9100","job":"prometheus","severity":"page"},"annotations":{"summary":"High load","title":"load1: 60.1494140625, load5: 38.009765625, load15: 23.18359375"},"startsAt":"2018-07-15T22:59:09.508199934+08:00","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://bogon:9090/graph?g0.expr=node_load1+%3E+1\u0026g0.tab=1"}],"groupLabels":{},"commonLabels":{"alertname":"Load alert","instance":"127.0.0.1:9100","job":"prometheus","severity":"page"},"commonAnnotations":{"summary":"High load","title":"load1: 60.1494140625, load5: 38.009765625, load15: 23.18359375"},"externalURL":"http://bogon:9093","version":"4","groupKey":"{}:{}"}
Final result:
We can get the values of load average in annotations:
load1: 60.1494140625, load5: 38.009765625, load15: 23.18359375
After receiving the message, we know the load average details of the machine.
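As a rough illustration of how a webhook receiver could consume this message, the sketch below parses a trimmed copy of the payload above and turns the `title` annotation back into numbers. The helper name `parse_title` and the regular expression are assumptions made for this example; they are not part of Alertmanager.

```python
import json
import re

# Trimmed copy of the webhook payload captured above (only the fields used here).
payload = json.loads("""
{"receiver": "default", "status": "firing",
 "alerts": [{"status": "firing",
             "labels": {"alertname": "Load alert", "instance": "127.0.0.1:9100"},
             "annotations": {"summary": "High load",
                             "title": "load1: 60.1494140625, load5: 38.009765625, load15: 23.18359375"}}]}
""")


def parse_title(title):
    """Turn 'load1: 1.0, load5: 2.0' into {'load1': 1.0, 'load5': 2.0} (hypothetical helper)."""
    pairs = re.findall(r"(\w+):\s*([0-9.]+)", title)
    return {name: float(value) for name, value in pairs}


for alert in payload["alerts"]:
    loads = parse_title(alert["annotations"]["title"])
    print(alert["labels"]["instance"], loads)
```

Parsing the annotation text like this only works because the rule writes the values in a fixed `name: value` layout; a receiver that needs the numbers reliably would probably query Prometheus directly instead.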
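(1.2) How to calculate
【2】How can the alert result be matched on multiple labels?
Example:
[Figure: alert message before the fix]
The usage percentage and the free space clearly do not match. In other words, the disk partition that raised the alert and the partition whose total space is displayed are not the same one.
After the fix:
The disk partition that raised the alert and the partition whose total space is displayed are now the same one.
Modified code: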
description: 'mountpoint: {{ $labels.mountpoint }}, device: {{ $labels.device }}, current usage {{ $value }}%, total space: {{ printf `node_filesystem_size_bytes{fstype=~"ext.?|xfs",instance="%s",mountpoint="%s"}/1024/1024/1024` $labels.instance $labels.mountpoint | query | first | value }} GB, currently free: {{ printf `node_filesystem_free_bytes{fstype=~"ext.?|xfs",instance="%s",mountpoint="%s"}/1024/1024/1024` $labels.instance $labels.mountpoint | query | first | value }} GB'
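To see why the extra `mountpoint` matcher matters, the sketch below imitates `query | first` in plain Python. The sample series, their values, and the helper names `query` and `first` are made up for this illustration; a real Prometheus query does not guarantee any particular result order, which is exactly why matching on `instance` alone can pick the wrong partition.

```python
# Hypothetical sample data standing in for what a PromQL query would return.
series = [
    {"labels": {"instance": "127.0.0.1:9100", "mountpoint": "/boot"}, "value": 0.8},
    {"labels": {"instance": "127.0.0.1:9100", "mountpoint": "/"}, "value": 35.2},
    {"labels": {"instance": "127.0.0.1:9100", "mountpoint": "/data"}, "value": 120.5},
]


def query(samples, **matchers):
    """Return every sample whose labels equal all of the given matchers."""
    return [
        s for s in samples
        if all(s["labels"].get(key) == want for key, want in matchers.items())
    ]


def first(result):
    """Mimic the template's `first`: take the first sample of the result."""
    return result[0]


# Labels of the alert that fired (hypothetical).
alert_labels = {"instance": "127.0.0.1:9100", "mountpoint": "/data"}

# Matching on instance only: several series match, so `first` may return another partition.
loose = first(query(series, instance=alert_labels["instance"]))

# Matching on instance and mountpoint: only the alerting partition is left.
strict = first(query(series, **alert_labels))

print("instance only:        ", loose["labels"]["mountpoint"], loose["value"])
print("instance + mountpoint:", strict["labels"]["mountpoint"], strict["value"])
```

With this sample data the loose lookup reports `/boot` while the alert is about `/data`, which is the kind of mismatch described in the example above.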
【3】How can the result value be limited to two decimal places?
Reference code: 'Very High memory usage on {{ $labels.instance }}: {{ $value | printf "%.2f" }}%'
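Note that the double quotes must be escaped when the template sits inside a double-quoted string.
[Figure: the alert message from this article before and after applying the formatting]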
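The `%.2f` verb behaves the same way in Python's `%` formatting, so the effect can be previewed outside Prometheus. The sketch below applies it to the three load values from the webhook payload earlier in this article; the function name `two_decimals` is only an illustrative choice.

```python
def two_decimals(value):
    """Format a number the way `printf "%.2f"` would in the template."""
    return "%.2f" % value


# Load values taken from the captured webhook payload.
raw = {"load1": 60.1494140625, "load5": 38.009765625, "load15": 23.18359375}

formatted = {name: two_decimals(value) for name, value in raw.items()}
print(", ".join("%s: %s" % (name, text) for name, text in formatted.items()))
# prints: load1: 60.15, load5: 38.01, load15: 23.18
```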
【Configuration best practices】
(1) node_exporter
# swap
description: 'current usage {{ $value }}%, total space: {{ printf `node_memory_SwapTotal_bytes{instance="%s"}/1024/1024/1024` $labels.instance | query | first | value | printf "%.2f" }} GB, currently free: {{ printf `node_memory_SwapFree_bytes{instance="%s"}/1024/1024/1024` $labels.instance | query | first | value | printf "%.2f" }} GB'
# disk
description: 'mountpoint: {{ $labels.mountpoint }}, device: {{ $labels.device }}, current free ratio {{ $value }}%, total space: {{ printf `node_filesystem_size_bytes{fstype=~"ext.?|xfs",instance="%s",mountpoint="%s"}/1024/1024/1024` $labels.instance $labels.mountpoint | query | first | value | printf "%.2f" }} GB, currently free: {{ printf `node_filesystem_free_bytes{fstype=~"ext.?|xfs",instance="%s",mountpoint="%s"}/1024/1024/1024` $labels.instance $labels.mountpoint | query | first | value | printf "%.2f" }} GB'
# memory
description: 'memory currently available {{ $value }}%, total space: {{ printf `node_memory_MemTotal_bytes{instance="%s"}/1024/1024/1024` $labels.instance | query | first | value | printf "%.2f" }} GB, currently available: {{ printf `node_memory_MemAvailable_bytes{instance="%s"}/1024/1024/1024` $labels.instance | query | first | value | printf "%.2f" }} GB'
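Each of these descriptions does two things: it fills label values into a query string with `%s`, and it divides a byte count by 1024 three times before formatting. A minimal Python sketch of both steps follows, assuming made-up label values and a made-up filesystem size; `build_query` and `bytes_to_gb` are hypothetical helper names.

```python
# Same format string as the disk rule above; Python's % operator fills %s like printf does.
SIZE_QUERY = (
    'node_filesystem_size_bytes{fstype=~"ext.?|xfs",'
    'instance="%s",mountpoint="%s"}/1024/1024/1024'
)


def build_query(template, *label_values):
    """Substitute the label values into the query template, in order."""
    return template % label_values


def bytes_to_gb(byte_count):
    """Apply the same /1024/1024/1024 conversion the rules use."""
    return byte_count / 1024 / 1024 / 1024


# Hypothetical labels of the alert that fired.
labels = {"instance": "127.0.0.1:9100", "mountpoint": "/data"}

expanded = build_query(SIZE_QUERY, labels["instance"], labels["mountpoint"])
print(expanded)

# Hypothetical filesystem of 53687091200 bytes.
print("%.2f GB" % bytes_to_gb(53687091200))
```

Because the divisor is 1024 rather than 1000, the figure shown as GB is strictly a GiB value; that matches what the rules above compute.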
【References】
Official repository issue: https://github.com/prometheus/alertmanager/issues/549
Reposted from: https://www.cnblogs.com/zhuangzebo/p/9315540.html