Alertmanager: DingTalk alerts with a custom alert template

Preparation

  To send alerts through DingTalk, you first need a DingTalk group and a DingTalk robot.

Open the Smart Group Assistant (智能群助手) in the DingTalk group settings

Add a custom robot

Under security settings, choose "Sign" (加签)

  Save the signing secret shown here.

Finish adding the robot

  Save the webhook URL shown here.
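
For context on why both values are needed: with signing enabled, DingTalk expects each request to carry a timestamp and a signature derived from the secret. prometheus-webhook-dingtalk handles this itself once `secret` is set in its config, so the following is only a minimal Python sketch of that calculation as I understand it from DingTalk's documentation (HMAC-SHA256 over `timestamp + "\n" + secret`, Base64, then URL encoding). The function names are hypothetical, and the webhook and secret values are the placeholders used in this post.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def dingtalk_sign(secret, timestamp_ms):
    """Return the URL-encoded signature for a given secret and millisecond timestamp."""
    string_to_sign = "{}\n{}".format(timestamp_ms, secret)
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return urllib.parse.quote_plus(base64.b64encode(digest))


def signed_url(webhook_url, secret, timestamp_ms):
    """Append the timestamp and sign query parameters to the webhook URL."""
    sign = dingtalk_sign(secret, timestamp_ms)
    return "{}&timestamp={}&sign={}".format(webhook_url, timestamp_ms, sign)


# Placeholder values; substitute the webhook URL and secret saved above.
webhook = "https://oapi.dingtalk.com/robot/send?access_token=xxxxx"
secret = "SECdcc64f10xxxxxx"
print(signed_url(webhook, secret, int(time.time() * 1000)))
```

The signature depends on the current time, so it has to be recomputed for every request rather than stored.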

Deploy prometheus-webhook-dingtalk

   A ready-made project on GitHub can be used directly to push alert messages through a DingTalk robot.
   Project URL: https://github.com/timonwong/prometheus-webhook-dingtalk

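Install dingtalk

  Run the commands below in /usr/local, the directory that the systemd unit later in this post points to.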

wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.0.0/prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz

tar xf prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz 
ln -s prometheus-webhook-dingtalk-2.0.0.linux-amd64 prometheus-webhook

Prepare the configuration file

mv config.example.yml config.yml
[root@ops prometheus-webhook]# cat config.yml 
## Request timeout
timeout: 5s
# Template files
templates:
  - templates/alertmanager-dingtalk.tmpl

targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxx    # webhook URL
    # secret for signature
    secret: SECdcc64f10xxxxxx                                       # signing secret
    message:
      text: '{{ template "email.to.message" . }}'

Prepare the template file

[root@ops prometheus-webhook]# cat templates/alertmanager-dingtalk.tmpl 
{{ define "email.to.message" }}

{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts.Firing -}}

=========  **Monitoring Alert** =========  

**Alert source:**     Alertmanager   
**Alert type:**    {{ $alert.Labels.alertname }}   
**Severity:**    {{ $alert.Labels.severity }}   
**Status:**    {{ $alert.Status }}   
**Affected host:**    {{ $alert.Labels.instance }} {{ $alert.Labels.device }}   
**Summary:**    {{ $alert.Annotations.summary }}   
**Details:**    {{ $alert.Annotations.message }}{{ $alert.Annotations.description }}   
**Host labels:**    {{ range $alert.Labels.SortedPairs }}  </br> [{{ .Name }}: {{ .Value | markdown | html }} ] 
{{- end }} </br>

**Started at:**    {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}  
========= = end =  =========  
{{- end }}
{{- end }}

{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts.Resolved -}}

========= Alert Resolved =========  
**Alert source:**     Alertmanager   
**Summary:**    {{ $alert.Annotations.summary }}  
**Host:**    {{ $alert.Labels.instance }}   
**Alert type:**    {{ $alert.Labels.alertname }}  
**Severity:**    {{ $alert.Labels.severity }}   
**Status:**    {{ $alert.Status }}  
**Details:**    {{ $alert.Annotations.message }}{{ $alert.Annotations.description }}  
**Started at:**    {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}  
**Resolved at:**    {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}  

========= = **end** =  =========
{{- end }}
{{- end }}
{{- end }}
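
A note on the `Add 28800e9` calls in the template: Go durations are counted in nanoseconds, so 28800e9 is 28800 seconds, or 8 hours. The template therefore shifts each timestamp by 8 hours before formatting it, which gives UTC+8 wall-clock time if the timestamps arrive in UTC (an assumption here, not something the post states). The same arithmetic as a small Python sketch, with a hypothetical helper name and a made-up example time:

```python
from datetime import datetime, timedelta, timezone

OFFSET_NS = 28800e9  # the same constant the template passes to Add


def to_local_string(starts_at_utc):
    """Shift a UTC timestamp by the template's offset and format it like the Go layout."""
    shifted = starts_at_utc + timedelta(seconds=OFFSET_NS / 1e9)
    # Go's "2006-01-02 15:04:05" layout corresponds to this strftime pattern.
    return shifted.strftime("%Y-%m-%d %H:%M:%S")


example = datetime(2021, 11, 13, 11, 0, 0, tzinfo=timezone.utc)
print(OFFSET_NS / 1e9 / 3600)    # 8.0 (hours)
print(to_local_string(example))  # 2021-11-13 19:00:00
```

For a different time zone, the constant would change accordingly, for example 3600e9 for UTC+1.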

Prepare the systemd unit file

[root@ops prometheus-webhook]# cat /usr/lib/systemd/system/prometheus-webhook-dingtalk.service
[Unit]
Description=prometheus-webhook-dingtalk
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/local/prometheus-webhook-dingtalk-2.0.0.linux-amd64/prometheus-webhook-dingtalk  --config.file=/usr/local/prometheus-webhook-dingtalk-2.0.0.linux-amd64/config.yml
[Install]
WantedBy=default.target

Start the service

systemctl daemon-reload
systemctl start prometheus-webhook-dingtalk.service 
systemctl status prometheus-webhook-dingtalk.service 
systemctl enable prometheus-webhook-dingtalk.service 
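
Once the service is up, it can be checked on its own before Alertmanager is involved. The Python sketch below builds a minimal payload in the shape of an Alertmanager webhook notification and, if you call `send`, posts it to the service. It assumes the service listens on port 8060 and exposes the `webhook1` target, matching the URL used in the Alertmanager configuration in this post. All label and annotation values are made up, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: port 8060 and target "webhook1" as configured in this post.
DINGTALK_WEBHOOK = "http://localhost:8060/dingtalk/webhook1/send"


def build_payload(status="firing"):
    """Build a minimal Alertmanager-style webhook payload with made-up values."""
    alert = {
        "status": status,
        "labels": {"alertname": "TestAlert", "severity": "warning", "instance": "test-host:9100"},
        "annotations": {"summary": "test summary", "description": "test description"},
        "startsAt": "2021-11-13T11:00:00Z",
        # An unresolved alert carries a zero end time in this sketch.
        "endsAt": "2021-11-13T11:05:00Z" if status == "resolved" else "0001-01-01T00:00:00Z",
    }
    return {
        "version": "4",
        "status": status,
        "receiver": "webhook",
        "groupLabels": {"alertname": "TestAlert"},
        "commonLabels": alert["labels"],
        "commonAnnotations": alert["annotations"],
        "externalURL": "http://localhost:9093",
        "alerts": [alert],
    }


def send(payload, url=DINGTALK_WEBHOOK):
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


print(json.dumps(build_payload(), indent=2))
# send(build_payload())  # uncomment on the host where the service is running
```

If the message reaches the DingTalk group, the robot settings, the config file and the template are all working, and any remaining problem lies on the Alertmanager side.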

Alertmanager configuration file

route:                                      # routing tree
  group_by: ['alertname', 'app']
  group_wait: 30s
  group_interval: 40s
  repeat_interval: 1m
  receiver: webhook                 # default receiver

receivers:                                 # receiver list
- name: webhook
  webhook_configs:
  - url: http://localhost:8060/dingtalk/webhook1/send     # prometheus-webhook-dingtalk endpoint
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match_re:
      severity: '.*'
    equal: ['instance']

Testing
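
One way to trigger both notifications without waiting for a real fault is to push a synthetic alert into Alertmanager through its v2 HTTP API. The Python sketch below assumes Alertmanager listens on its default address `localhost:9093`, which this post does not state. The label values are made up and the function names are hypothetical. Posting the same labels again with an `endsAt` time marks the alert as resolved.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumed default Alertmanager address; adjust to your deployment.
ALERTMANAGER_API = "http://localhost:9093/api/v2/alerts"


def make_alert(resolved=False, now=None):
    """Return a one-element alert list in the shape the v2 alerts endpoint accepts."""
    now = now or datetime.now(timezone.utc)
    alert = {
        "labels": {
            "alertname": "TestAlert",
            "app": "demo",
            "severity": "warning",
            "instance": "test-host:9100",
        },
        "annotations": {"summary": "test summary", "description": "test description"},
        "startsAt": (now - timedelta(minutes=5)).isoformat(),
    }
    if resolved:
        # An end time that is not in the future tells Alertmanager the alert is over.
        alert["endsAt"] = now.isoformat()
    return [alert]


def post_alerts(alerts, url=ALERTMANAGER_API):
    """POST the alert list as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


print(json.dumps(make_alert(), indent=2))
# post_alerts(make_alert())               # fire the alert
# post_alerts(make_alert(resolved=True))  # later: resolve it
```

With the `group_wait: 30s` setting in this post, the first DingTalk message should arrive roughly half a minute after the alert is posted.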

Alert firing

Alert resolved

Posted 2021-11-13 19:00 by 闫世成