Prometheus DingTalk Alert Templates
Alert rules
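```yaml
groups:
  - name: 'node running status'
    rules:
      - alert: 'Instance Down'
        expr: 'up == 0'
        for: 5s   # the condition must hold this long before the alert fires
        annotations:
          title: 'Instance Down'
          description: "{{ $labels.instance }} down"
        labels:
          robot: 'jcss'
          severity: 'warning'
          owner: 'xxxxxxxxxxx'
  - name: 'node memory usage'
    rules:
      - alert: 'memory usage'
        # Total - MemFree counts reclaimable page cache and buffers as "used",
        # which overstates memory pressure; MemAvailable is the kernel's estimate
        # of memory that can be allocated without swapping.
        expr: '((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100) > 85'
        for: 5s
        annotations:
          title: 'Mem'
          description: '{{ $labels.instance }} memory usage {{ $value | printf "%.2f" }}%'
        labels:
          robot: 'jcss'
          ops: 'true'
          severity: 'warning'
          owner: "xxxxxxxxxxx"
```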
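The memory rule compares (total - unused) / total * 100 against 85, and which metric counts as "unused" changes the answer a lot: MemFree is only idle memory, while MemAvailable also includes reclaimable cache. A minimal Python sketch of that arithmetic, using made-up sample values for a hypothetical 8 GiB host (none of these numbers come from a real machine):

```python
def mem_usage_percent(total: int, unused: int) -> float:
    """Same shape as the PromQL expression: (total - unused) / total * 100.

    `unused` stands for either node_memory_MemFree_bytes or
    node_memory_MemAvailable_bytes.
    """
    return (total - unused) / total * 100


def should_alert(usage_percent: float, threshold: float = 85.0) -> bool:
    # Mirrors the rule's strict "> 85". The real rule additionally requires the
    # condition to hold for the whole `for:` duration before firing.
    return usage_percent > threshold


# Hypothetical sample values in bytes, for illustration only.
GIB = 1024 ** 3
TOTAL = 8 * GIB
MEM_FREE = GIB // 2        # truly idle pages
MEM_AVAILABLE = 4 * GIB    # idle pages plus reclaimable cache (kernel estimate)

if __name__ == "__main__":
    for name, unused in (("MemFree", MEM_FREE), ("MemAvailable", MEM_AVAILABLE)):
        pct = mem_usage_percent(TOTAL, unused)
        print(f"{name}: {pct:.2f}% used, alert={should_alert(pct)}")
```

With these sample values the MemFree-based formula reports 93.75% and would fire, while the MemAvailable-based one reports 50% and stays quiet.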
Alertmanager Router
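```yaml
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 1s     # very short; the Alertmanager default is 5m
  repeat_interval: 30s   # very short; the Alertmanager default is 4h
  receiver: 'ops'        # used when no child route matches
  routes:
    # `match` / `source_match` still work but are deprecated in newer
    # Alertmanager releases in favor of `matchers`.
    - match:
        ops: 'true'
      receiver: 'ops'
      continue: true     # keep evaluating the sibling routes below
    - match:
        robot: 'jcss'
      receiver: 'jcss'

receivers:
  - name: 'ops'
    webhook_configs:
      - url: 'http://notice.liyblog.com:8060/dingtalk/ops/send'
  - name: 'jcss'
    webhook_configs:
      - url: 'http://notice.liyblog.com:8060/dingtalk/jcss/send'

inhibit_rules:
  # A critical alert suppresses warning alerts that share these label values.
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
```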
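Because of `continue: true`, an alert carrying both `ops: 'true'` and `robot: 'jcss'` reaches two receivers. The Python sketch below is a simplified model of Alertmanager's route-tree walk under stated assumptions: equality matching only, and no grouping, timing or inhibition. The `Route` class and `receivers_for` function are hypothetical illustrations, not Alertmanager code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Route:
    receiver: str
    match: Dict[str, str] = field(default_factory=dict)
    continue_: bool = False  # `continue` is a Python keyword, hence the underscore
    routes: List["Route"] = field(default_factory=list)

    def matches(self, labels: Dict[str, str]) -> bool:
        # Equality matching like `match:`; a missing label compares as "".
        return all(labels.get(k, "") == v for k, v in self.match.items())


def receivers_for(route: Route, labels: Dict[str, str]) -> List[str]:
    """Receivers notified for `labels`, assuming `route` itself already matched."""
    found: List[str] = []
    for child in route.routes:
        if child.matches(labels):
            found.extend(receivers_for(child, labels))
            if not child.continue_:
                break  # first matching child wins unless it sets continue
    # No child matched: this node's own receiver handles the alert.
    return found or [route.receiver]


# The route tree from the Alertmanager config above.
ROOT = Route(
    receiver="ops",
    routes=[
        Route(receiver="ops", match={"ops": "true"}, continue_=True),
        Route(receiver="jcss", match={"robot": "jcss"}),
    ],
)

if __name__ == "__main__":
    memory_alert = {"alertname": "memory usage", "robot": "jcss", "ops": "true"}
    down_alert = {"alertname": "Instance Down", "robot": "jcss"}
    print(receivers_for(ROOT, memory_alert))  # ['ops', 'jcss']
    print(receivers_for(ROOT, down_alert))    # ['jcss']
```

Without `continue: true` on the first child, the memory alert would stop at `ops` and the jcss robot would never see it.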
Prometheus-Webhook-Dingtalk
config.yml
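```yaml
templates:
  - contrib/templates/*.tmpl   # loads both default.tmpl and ops.tmpl

targets:
  jcss:
    # No `message` block: this target uses the default ding.link.title /
    # ding.link.content templates, which default.tmpl redefines.
    url: https://oapi.dingtalk.com/robot/send?access_token=
    secret:            # robot signing secret, if signing is enabled
    mention:
      mobiles: ['xxxxxxxxxxx']
  ops:
    url: https://oapi.dingtalk.com/robot/send?access_token=
    secret:
    message:
      title: '{{ template "ops.title" . }}'
      text: '{{ template "ops.content" . }}'
```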
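When a DingTalk robot has request signing enabled, `secret` holds its signing key and the webhook adds the signature to each request. To test such a robot by hand, the signature follows DingTalk's documented scheme: HMAC-SHA256 over `"<timestamp_ms>\n<secret>"` keyed with the secret, Base64-encoded, URL-encoded, and sent as the `timestamp` and `sign` query parameters. A minimal Python sketch; the token and secret below are placeholders, and `signed_webhook_url` is a hypothetical helper name.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote_plus


def signed_webhook_url(webhook_url: str, secret: str, timestamp_ms: Optional[int] = None) -> str:
    """Append DingTalk's `timestamp` and `sign` query parameters to a robot URL."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # DingTalk expects milliseconds
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha256
    ).digest()
    # Base64 output may contain '+', '/' and '=', so it has to be URL-encoded.
    sign = quote_plus(base64.b64encode(digest).decode("ascii"))
    sep = "&" if "?" in webhook_url else "?"
    return f"{webhook_url}{sep}timestamp={timestamp_ms}&sign={sign}"


if __name__ == "__main__":
    # Placeholder token and secret: substitute the robot's real values.
    print(signed_webhook_url(
        "https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN",
        "SEC_YOUR_SECRET",
    ))
```

DingTalk rejects signatures whose timestamp is too far from the current time, so a signed URL should be generated right before it is used.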
default.tmpl
```
{{ define "__subject" }}
[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]
{{ end }}

{{ define "__alert_list" }}{{ range . }}
---
{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}

**Alert name**: {{ index .Annotations "title" }}

**Severity**: {{ .Labels.severity }}

**Host**: {{ .Labels.instance }}

**Message**: {{ index .Annotations "description" }}

**Alert time**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}

{{ define "__resolved_list" }}{{ range . }}
---
{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}

**Alert name**: {{ index .Annotations "title" }}

**Severity**: {{ .Labels.severity }}

**Host**: {{ .Labels.instance }}

**Message**: {{ index .Annotations "description" }}

**Alert time**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

**Recovery time**: {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}

{{ define "default.title" }}
{{ template "__subject" . }}
{{ end }}

{{ define "default.content" }}
{{ if gt (len .Alerts.Firing) 0 }}
**==== {{ .Alerts.Firing | len }} fault(s) detected ====**
{{ template "__alert_list" .Alerts.Firing }}
---
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
**==== {{ .Alerts.Resolved | len }} fault(s) recovered ====**
{{ template "__resolved_list" .Alerts.Resolved }}
{{ end }}
{{ end }}

{{/* Names used by targets that have no explicit `message` block */}}
{{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
{{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}

{{ template "default.title" . }}
{{ template "default.content" . }}
```
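To preview roughly what `default.title` and `default.content` produce without running the webhook, the Python sketch below re-implements the same formatting on a trimmed Alertmanager webhook payload. It is an approximation, not the Go template engine: it assumes only the fields the template reads (`status`, `labels`, `annotations`, `startsAt`, `endsAt`), uses a fixed UTC+8 offset in place of `Asia/Shanghai` (which has no daylight saving time), renders missing labels as empty strings, and uses English field labels. All function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

# Asia/Shanghai is UTC+8 with no DST, so a fixed offset gives the same
# wall-clock time without needing a tz database.
SHANGHAI = timezone(timedelta(hours=8))


def shanghai_time(rfc3339: str) -> str:
    """Format an Alertmanager timestamp like Go's layout "2006.01.02 15:04:05"."""
    # Drop fractional seconds (Go may emit up to 9 digits) and normalise "Z".
    s = re.sub(r"\.\d+", "", rfc3339).replace("Z", "+00:00")
    return datetime.fromisoformat(s).astimezone(SHANGHAI).strftime("%Y.%m.%d %H:%M:%S")


def _with_status(payload: Dict[str, Any], status: str) -> List[Dict[str, Any]]:
    return [a for a in payload["alerts"] if a["status"] == status]


def subject(payload: Dict[str, Any]) -> str:
    # Same shape as "__subject": [FIRING:n] or [RESOLVED].
    if payload["status"] == "firing":
        return f"[FIRING:{len(_with_status(payload, 'firing'))}]"
    return f"[{payload['status'].upper()}]"


def alert_block(alert: Dict[str, Any]) -> List[str]:
    labels, ann = alert["labels"], alert["annotations"]
    lines = ["---"]
    if labels.get("owner"):
        lines.append(f"@{labels['owner']}")
    lines += [
        f"**Alert name**: {ann.get('title', '')}",
        f"**Severity**: {labels.get('severity', '')}",
        f"**Host**: {labels.get('instance', '')}",
        f"**Message**: {ann.get('description', '')}",
        f"**Alert time**: {shanghai_time(alert['startsAt'])}",
    ]
    if alert["status"] == "resolved":
        lines.append(f"**Recovery time**: {shanghai_time(alert['endsAt'])}")
    return lines


def content(payload: Dict[str, Any]) -> str:
    firing = _with_status(payload, "firing")
    resolved = _with_status(payload, "resolved")
    out: List[str] = []
    if firing:
        out.append(f"**==== {len(firing)} fault(s) detected ====**")
        for a in firing:
            out += alert_block(a)
        out.append("---")
    if resolved:
        out.append(f"**==== {len(resolved)} fault(s) recovered ====**")
        for a in resolved:
            out += alert_block(a)
    # Blank lines between entries so Markdown renders each field separately.
    return "\n\n".join(out)
```

Feeding it a captured webhook payload (parsed with `json.loads`) gives a quick offline check of the layout before reloading the real templates.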
ops.tmpl
```
{{/* Same layout as default.tmpl but without the @owner mention.
     "__subject" comes from default.tmpl; both files are loaded by the templates glob. */}}
{{ define "__ops_alert_list" }}{{ range . }}
---
**Alert name**: {{ index .Annotations "title" }}

**Severity**: {{ .Labels.severity }}

**Host**: {{ .Labels.instance }}

**Message**: {{ index .Annotations "description" }}

**Alert time**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}

{{ define "__ops_resolved_list" }}{{ range . }}
---
**Alert name**: {{ index .Annotations "title" }}

**Severity**: {{ .Labels.severity }}

**Host**: {{ .Labels.instance }}

**Message**: {{ index .Annotations "description" }}

**Alert time**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

**Recovery time**: {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}

{{ define "ops.title" }}
{{ template "__subject" . }}
{{ end }}

{{ define "ops.content" }}
{{ if gt (len .Alerts.Firing) 0 }}
**==== {{ .Alerts.Firing | len }} fault(s) detected ====**
{{ template "__ops_alert_list" .Alerts.Firing }}
---
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
**==== {{ .Alerts.Resolved | len }} fault(s) recovered ====**
{{ template "__ops_resolved_list" .Alerts.Resolved }}
{{ end }}
{{ end }}
```
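To check `ops.tmpl` without waiting for a real alert, one option is to POST a hand-made payload in Alertmanager's webhook JSON format straight to the ops endpoint. The sketch below only builds the request; the payload is trimmed to commonly used fields, and the `externalURL`, label values and the `build_test_request` name are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Any, Dict


def build_test_request(url: str, labels: Dict[str, str],
                       annotations: Dict[str, str]) -> urllib.request.Request:
    """Build a POST carrying a minimal, hand-made Alertmanager webhook payload."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    payload: Dict[str, Any] = {
        "version": "4",
        "status": "firing",
        "receiver": "ops",
        "groupLabels": {"alertname": labels.get("alertname", "")},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://alertmanager.example:9093",  # hypothetical
        "alerts": [{
            "status": "firing",
            "labels": labels,
            "annotations": annotations,
            "startsAt": now,
            "endsAt": "0001-01-01T00:00:00Z",  # zero time: still firing
        }],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(build_test_request("http://notice.liyblog.com:8060/dingtalk/ops/send", {...}, {...}), timeout=10)`; the message that appears in the ops group shows how `ops.title` and `ops.content` render.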
References
Prometheus official website
Custom Alertmanager alert templates, by SoulChild (SoulChild随笔记)