Prometheus DingTalk Alert Templates

Alert rules

groups:
  - name: 'node running status'
    rules:
      - alert: 'Instance Down'
        expr: 'up == 0'
        for: 5s
        annotations:
          title: 'Instance Down'
          description: "{{ $labels.instance }} down"
        labels:
          robot: 'jcss'
          severity: 'warning'
          owner: 'xxxxxxxxxxx'
  - name: 'node memory usage'
    rules:
      - alert: 'memory usage'
        expr: '((node_memory_MemTotal_bytes - node_memory_MemFree_bytes) / node_memory_MemTotal_bytes * 100) > 85'
        for: 5s
        annotations:
          title: 'Mem'
          description: '{{ $labels.instance }} Memusage {{ $value }}'
        labels:
          robot: 'jcss'
          ops: 'true'
          severity: 'warning'
          owner: "xxxxxxxxxxx"
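
The memory rule above subtracts only MemFree, so page cache and buffers count as used memory, and the value can read high on a healthy Linux host. A minimal sketch of a variant based on node_memory_MemAvailable_bytes follows. The group name, the 5m hold time, and the formatted description are assumptions for illustration, not part of the original setup.

```yaml
groups:
  - name: 'node memory usage (available-based)'   # hypothetical group name
    rules:
      - alert: 'memory usage'
        # MemAvailable estimates memory that can be claimed without swapping,
        # so reclaimable cache is not counted as used.
        expr: '(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85'
        for: 5m   # assumption: a longer hold than 5s to ride out short spikes
        annotations:
          title: 'Mem'
          description: '{{ $labels.instance }} memory usage {{ $value | printf "%.1f" }}%'
        labels:
          robot: 'jcss'
          ops: 'true'
          severity: 'warning'
          owner: 'xxxxxxxxxxx'
```

The labels are kept identical to the original rule so that the routing by robot and ops is unaffected.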

Alertmanager Router

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 1s
  repeat_interval: 30s
  receiver: 'ops'
  routes:
    - match:
        ops: 'true'
      receiver: 'ops'
      continue: true
    - match:
        robot: 'jcss'
      receiver: 'jcss'
receivers:
  - name: 'ops'
    webhook_configs:
      - url: 'http://notice.liyblog.com:8060/dingtalk/ops/send'
  - name: 'jcss'
    webhook_configs:
      - url: 'http://notice.liyblog.com:8060/dingtalk/jcss/send'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
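
In the routing tree above, an alert labelled ops: 'true' is sent to the ops receiver and, because of continue: true, is then also evaluated against the next route. The memory alert, which carries both ops and robot labels, therefore reaches both robots, while Instance Down reaches only jcss. Alertmanager 0.22 and later deprecate match, source_match, and target_match in favour of matchers. The fragment below is a sketch of the same routes and inhibit rule in that syntax, assuming such a version is installed; the receivers section stays as it is.

```yaml
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 1s
  repeat_interval: 30s
  receiver: 'ops'
  routes:
    # Alerts with ops="true" go to 'ops' and keep matching further routes.
    - matchers:
        - ops = "true"
      receiver: 'ops'
      continue: true
    # Alerts with robot="jcss" go to 'jcss'.
    - matchers:
        - robot = "jcss"
      receiver: 'jcss'
inhibit_rules:
  # A firing critical alert mutes warning alerts that share these label values.
  - source_matchers:
      - severity = "critical"
    target_matchers:
      - severity = "warning"
    equal: ['alertname', 'dev', 'instance']
```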

Prometheus-Webhook-Dingtalk

config.yml

templates:
  - contrib/templates/*.tmpl
targets:
  jcss:
    url: https://oapi.dingtalk.com/robot/send?access_token=
    secret:
    mention:
      mobiles: ['xxxxxxxxxxx']
  ops:
    url: https://oapi.dingtalk.com/robot/send?access_token=
    secret:
    message:
      title: '{{ template "ops.title" . }}'
      text: '{{ template "ops.content" . }}'
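
The jcss target has no message block, so it falls back to the default message. As far as I know, that default renders the ding.link.title and ding.link.content templates, which is why default.tmpl aliases them to default.title and default.content. The fragment below is a sketch that makes the fallback explicit instead. It assumes the default_message key from the project's example config, so check it against the installed version before relying on it.

```yaml
# Sketch only: merge into config.yml alongside the existing targets section.
templates:
  - contrib/templates/*.tmpl
# Used by every target that does not define its own `message` block.
default_message:
  title: '{{ template "default.title" . }}'
  text: '{{ template "default.content" . }}'
```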

default.tmpl

{{ define "__subject" }}
[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]
{{ end }}
{{ define "__alert_list" }}{{ range . }}
---
{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}
**Alert name**: {{ index .Annotations "title" }}
**Severity**: {{ .Labels.severity }}
**Host**: {{ .Labels.instance }}
**Description**: {{ index .Annotations "description" }}
**Started at**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}
{{ define "__resolved_list" }}{{ range . }}
---
{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}
**Alert name**: {{ index .Annotations "title" }}
**Severity**: {{ .Labels.severity }}
**Host**: {{ .Labels.instance }}
**Description**: {{ index .Annotations "description" }}
**Started at**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Resolved at**: {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}
{{ define "default.title" }}
{{ template "__subject" . }}
{{ end }}
{{ define "default.content" }}
{{ if gt (len .Alerts.Firing) 0 }}
**==== {{ .Alerts.Firing | len }} alert(s) firing ====**
{{ template "__alert_list" .Alerts.Firing }}
---
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
**==== {{ .Alerts.Resolved | len }} alert(s) resolved ====**
{{ template "__resolved_list" .Alerts.Resolved }}
{{ end }}
{{ end }}
{{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
{{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}
{{ template "default.title" . }}
{{ template "default.content" . }}

ops.tmpl

{{ define "__ops_alert_list" }}{{ range . }}
---
**Alert name**: {{ index .Annotations "title" }}
**Severity**: {{ .Labels.severity }}
**Host**: {{ .Labels.instance }}
**Description**: {{ index .Annotations "description" }}
**Started at**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}
{{ define "__ops_resolved_list" }}{{ range . }}
---
**Alert name**: {{ index .Annotations "title" }}
**Severity**: {{ .Labels.severity }}
**Host**: {{ .Labels.instance }}
**Description**: {{ index .Annotations "description" }}
**Started at**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Resolved at**: {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}
{{ define "ops.title" }}
{{ template "__subject" . }}
{{ end }}
{{ define "ops.content" }}
{{ if gt (len .Alerts.Firing) 0 }}
**==== {{ .Alerts.Firing | len }} alert(s) firing ====**
{{ template "__ops_alert_list" .Alerts.Firing }}
---
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
**==== {{ .Alerts.Resolved | len }} alert(s) resolved ====**
{{ template "__ops_resolved_list" .Alerts.Resolved }}
{{ end }}
{{ end }}

References

Prometheus official website
"alertmanager custom alert templates" by SoulChild随笔记

Posted by liy36