Alertmanager alerts: outputting multiple values

【1】Display several metric values in an alert email

(1.1)How do you show several values in one alert, displaying only the value itself?

In the rule file:

groups:
- name: example
  rules:
  - alert: Load alert
    expr: node_load1 > 1
    for: 5s
    labels:
      severity: page
    annotations:
      title: 'load1: {{ $value }}, load5: {{ printf `node_load5{instance="%s"}` $labels.instance | query | first | value }}, load15: {{ printf `node_load15{instance="%s"}` $labels.instance | query | first | value}}'
      summary: High load
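How the `title` annotation above gets its numbers: Prometheus renders annotations with Go's text/template, where `x | f` passes `x` as the last argument of `f`. So `printf ... $labels.instance` builds a PromQL string, `query` runs it, `first` takes the first sample and `value` extracts its number. The sketch below replays that pipeline in plain Go. `query`, `first` and `value` here are hypothetical stubs backed by a hard-coded map (the real Prometheus functions query the TSDB), and binding `$labels`/`$value` through definitions prepended to the template is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// sample is a simplified stand-in for one element of a query result.
type sample struct {
	Labels map[string]string
	Value  float64
}

// fakeDB replaces the real TSDB: it maps a PromQL string to canned results.
var fakeDB = map[string][]sample{
	`node_load5{instance="127.0.0.1:9100"}`:  {{Labels: map[string]string{"instance": "127.0.0.1:9100"}, Value: 38.009765625}},
	`node_load15{instance="127.0.0.1:9100"}`: {{Labels: map[string]string{"instance": "127.0.0.1:9100"}, Value: 23.18359375}},
}

// Hypothetical stubs that only mimic the behaviour of Prometheus's query/first/value.
var funcs = template.FuncMap{
	"query": func(q string) ([]sample, error) {
		res, ok := fakeDB[q]
		if !ok {
			return nil, fmt.Errorf("no data for %s", q)
		}
		return res, nil
	},
	"first": func(s []sample) (sample, error) {
		if len(s) == 0 {
			return sample{}, fmt.Errorf("empty query result")
		}
		return s[0], nil
	},
	"value": func(s sample) float64 { return s.Value },
}

// title is the annotation from the rule above, written as a Go string literal.
const title = "load1: {{ $value }}, load5: {{ printf `node_load5{instance=\"%s\"}` $labels.instance | query | first | value }}, load15: {{ printf `node_load15{instance=\"%s\"}` $labels.instance | query | first | value }}"

// render binds $labels and $value (an assumption of this sketch) and executes the template.
func render(text string, labels map[string]string, value float64) (string, error) {
	defs := "{{ $labels := .Labels }}{{ $value := .Value }}"
	t, err := template.New("title").Funcs(funcs).Parse(defs + text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, sample{Labels: labels, Value: value}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(title, map[string]string{"instance": "127.0.0.1:9100"}, 60.1494140625)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // load1: 60.1494140625, load5: 38.009765625, load15: 23.18359375
}
```

If an instance has no `node_load5` series, the stubbed `query` fails and the whole render fails, which is a reminder that every extra query in an annotation is another thing that can come back empty.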

After configuring Alertmanager and adding webhook_configs, I captured the alert result as follows:

{"receiver":"default","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"Load alert","instance":"127.0.0.1:9100","job":"prometheus","severity":"page"},"annotations":{"summary":"High load","title":"load1: 60.1494140625, load5: 38.009765625, load15: 23.18359375"},"startsAt":"2018-07-15T22:59:09.508199934+08:00","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://bogon:9090/graph?g0.expr=node_load1+%3E+1\u0026g0.tab=1"}],"groupLabels":{},"commonLabels":{"alertname":"Load alert","instance":"127.0.0.1:9100","job":"prometheus","severity":"page"},"commonAnnotations":{"summary":"High load","title":"load1: 60.1494140625, load5: 38.009765625, load15: 23.18359375"},"externalURL":"http://bogon:9093","version":"4","groupKey":"{}:{}"}

 

Final result:

The load average values appear in the annotations:

load1: 60.1494140625, load5: 38.009765625, load15: 23.18359375

After receiving the message, we can see the machine's full load average (1, 5 and 15 minutes) at a glance.

(1.2)How to calculate

【2】How do you make the alert result match on multiple labels?

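Example:

As shown in the figure below, the usage percentage and the remaining space did not match at all. In other words, the disk partition that triggered the alert was not the same partition whose total space we displayed.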

After the fix:

The partition that triggers the alert and the partition whose total space is displayed are now the same one.

Modified code:

 description: 'mountpoint: {{ $labels.mountpoint }}, device: {{ $labels.device }}, current usage {{ $value }}%, total space: {{ printf `node_filesystem_size_bytes{fstype=~"ext.?|xfs",instance="%s",mountpoint="%s"}/1024/1024/1024` $labels.instance $labels.mountpoint | query | first | value }} GB, currently free: {{ printf `node_filesystem_free_bytes{fstype=~"ext.?|xfs",instance="%s",mountpoint="%s"}/1024/1024/1024` $labels.instance $labels.mountpoint | query | first | value }} GB'
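Why the mountpoint matters: an instance exposes one `node_filesystem_*` series per filesystem, so a selector with only `instance` returns several series and `first` simply takes whichever comes first, which need not be the partition that fired. The Go sketch below imitates label matching and `first` over an in-memory list (hypothetical data and helper names, not the Prometheus API; "first" here just means slice order) to show that adding `mountpoint` to the selector pins the value to the alerting partition.

```go
package main

import "fmt"

// series is a simplified result element: labels plus a value in GB.
type series struct {
	Labels map[string]string
	Value  float64
}

// selectSeries mimics PromQL equality matchers: keep series whose labels match every pair.
func selectSeries(all []series, match map[string]string) []series {
	var out []series
	for _, s := range all {
		ok := true
		for k, v := range match {
			if s.Labels[k] != v {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, s)
		}
	}
	return out
}

// first mimics the template function: the first element, or an error when empty.
func first(s []series) (series, error) {
	if len(s) == 0 {
		return series{}, fmt.Errorf("empty query result")
	}
	return s[0], nil
}

func main() {
	// Hypothetical free space (GB) of two partitions on the same node.
	free := []series{
		{Labels: map[string]string{"instance": "n1:9100", "mountpoint": "/"}, Value: 12.5},
		{Labels: map[string]string{"instance": "n1:9100", "mountpoint": "/data"}, Value: 3.2},
	}
	// Labels of the alert, which fired for /data.
	alertLabels := map[string]string{"instance": "n1:9100", "mountpoint": "/data"}

	byInstance, _ := first(selectSeries(free, map[string]string{"instance": alertLabels["instance"]}))
	byBoth, _ := first(selectSeries(free, alertLabels))
	fmt.Println("instance only:", byInstance.Labels["mountpoint"], byInstance.Value)  // instance only: / 12.5
	fmt.Println("instance+mountpoint:", byBoth.Labels["mountpoint"], byBoth.Value) // instance+mountpoint: /data 3.2
}
```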

【3】How do you keep two decimal places in the result?

Reference code: 'Very High memory usage on {{ $labels.instance }}: {{ $value | printf "%.2f" }}%',
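In text/template, `{{ $value | printf "%.2f" }}` is the same call as `{{ printf "%.2f" $value }}`: the piped value becomes the last argument of `printf`, which formats like Go's `fmt.Sprintf`. The sketch below renders the reference template with the standard library only; the `renderValue` helper and the way it binds `$labels`/`$value` are assumptions of the sketch, not Prometheus code.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderValue executes an annotation-style template; binding $labels and $value
// through leading variable definitions is an assumption of this sketch.
func renderValue(text, instance string, value float64) (string, error) {
	defs := "{{ $labels := .Labels }}{{ $value := .Value }}"
	t, err := template.New("desc").Parse(defs + text)
	if err != nil {
		return "", err
	}
	data := struct {
		Labels map[string]string
		Value  float64
	}{map[string]string{"instance": instance}, value}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	raw, _ := renderValue(`{{ $value }}%`, "n1:9100", 87.123456)
	fmt.Println(raw) // 87.123456%
	rounded, _ := renderValue(`Very High memory usage on {{ $labels.instance }}: {{ $value | printf "%.2f" }}%`, "n1:9100", 87.123456)
	fmt.Println(rounded) // Very High memory usage on n1:9100: 87.12%
}
```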

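Double quotes need to be escaped when the whole annotation string is itself enclosed in double quotes (for example `\"%.2f\"`), as shown in the figure below.

Applied to the example in this article, as shown in the figure below:

Before and after: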

【Best-practice configuration】

(1)node_exporter

# swap
description: 'Current usage {{ $value }}%, total: {{ printf `node_memory_SwapTotal_bytes{instance="%s"}/1024/1024/1024` $labels.instance | query | first | value | printf "%.2f" }} GB, currently free: {{ printf `node_memory_SwapFree_bytes{instance="%s"}/1024/1024/1024` $labels.instance | query | first | value | printf "%.2f" }} GB'
# disk
description: 'mountpoint: {{ $labels.mountpoint }}, device: {{ $labels.device }}, current free ratio {{ $value }}%, total: {{ printf `node_filesystem_size_bytes{fstype=~"ext.?|xfs",instance="%s",mountpoint="%s"}/1024/1024/1024` $labels.instance $labels.mountpoint | query | first | value | printf "%.2f" }} GB, currently free: {{ printf `node_filesystem_free_bytes{fstype=~"ext.?|xfs",instance="%s",mountpoint="%s"}/1024/1024/1024` $labels.instance $labels.mountpoint | query | first | value | printf "%.2f" }} GB'
# memory
description: 'Current free memory {{ $value }}%, total: {{ printf `node_memory_MemTotal_bytes{instance="%s"}/1024/1024/1024` $labels.instance | query | first | value | printf "%.2f" }} GB, currently available: {{ printf `node_memory_MemAvailable_bytes{instance="%s"}/1024/1024/1024` $labels.instance | query | first | value | printf "%.2f" }} GB'
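These templates convert bytes with `/1024/1024/1024`, i.e. divide by 2^30, so strictly the result is in GiB even though the message says GB; a disk sold as 500 GB (500 × 10^9 bytes) shows up as about 465.66. A small Go sketch of the same arithmetic followed by the `%.2f` rounding (the `formatGiB` helper name is hypothetical):

```go
package main

import "fmt"

// bytesPerGiB is 1024*1024*1024, the divisor the templates apply as /1024/1024/1024.
const bytesPerGiB = 1024 * 1024 * 1024

// formatGiB converts a byte count the way the templates do and
// rounds to two decimals like `| printf "%.2f"`.
func formatGiB(bytes float64) string {
	return fmt.Sprintf("%.2f", bytes/bytesPerGiB)
}

func main() {
	fmt.Println(formatGiB(17179869184)) // 16.00 (16 GiB of RAM)
	fmt.Println(formatGiB(500e9))       // 465.66 (a disk sold as "500 GB")
}
```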

 

 

【References】

Official (Alertmanager GitHub issue): https://github.com/prometheus/alertmanager/issues/549

Reposted from: https://www.cnblogs.com/zhuangzebo/p/9315540.html

 

posted @ 2022-05-25 11:57  郭大侠1