Prometheus series [3]: Alerting

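Alerting is mainly handled by Alertmanager. Prometheus evaluates the alerting rules configured in it, and when a rule fires it sends the alert to Alertmanager, which turns it into notifications.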

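Alertmanager can notify through many channels, such as email, SMS, WeChat (WeCom) and DingTalk. SMS and DingTalk are usually connected through a webhook receiver.
This post covers WeChat and email alerts.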

Alerting configuration: alertmanager.yml

global:
  resolve_timeout: 1m

  # WeChat (WeCom) alerts
  wechat_api_corp_id: 'xxxxxx'
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_secret: 'xxxxxxx'

  # Email alerts
  smtp_smarthost: 'mail.xxx.com:25'
  smtp_from: '123456@qq.com'
  smtp_auth_username: '123456@qq.com'
  smtp_auth_password: '123456'
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 30m
  receiver: 'email'

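  routes:
  # Each child route can match a different job to use a different receiver.
  # Child routes are checked in order and matching stops at the first hit,
  # so the first route needs `continue: true` for alerts to reach email too.
  - match_re:
      job: .*
    receiver: 'wechat'
    repeat_interval: 2h
    continue: true
  - match_re:
      job: .*
    receiver: 'email'
    repeat_interval: 2h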
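
# Load the custom template file so alert-template.tmpl is actually used
# (example path; point it at wherever the file really lives)
templates:
- '/etc/alertmanager/templates/*.tmpl'
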
receivers:
- name: 'wechat'
  wechat_configs:
  - send_resolved: true
    to_party: 'Prometheus'
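    # '@all' sends the message to everyone in the application's group
    to_user: '@all'
    agent_id: xxx
    corp_id: 'xxx'
    api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
    api_secret: 'secret of the corresponding application in WeCom'
    # Use the wechat.default.message template; see alert-template.tmpl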
    message: '{{ template "wechat.default.message" . }}'

- name: 'email'
  email_configs:
  - to: 'xxx@xxx.com'
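    headers: { Subject: "[WARN] Alert email" }
    # The HTML body can be customized and can use any field of the alerts;
    # some example fields are below. $alert is bound by the range loop, and
    # the alert state is $alert.Status rather than a label.
    html: |
      {{ range $alert := .Alerts }}
      {{ index $alert.Labels "group" }}<br>
      {{ index $alert.Labels "alertname" }}<br>
      {{ $alert.Status }}<br>
      {{ index $alert.Labels "instance" }}<br>
      {{ index $alert.Labels "job" }}<br>
      {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}<br>
      {{ index $alert.Annotations "summary" }}<br>
      {{ index $alert.Annotations "description" }}<br>
      {{ end }}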

#inhibit_rules:
#  - source_match:
#      severity: 'critical'
#    target_match:
#      severity: 'warning'
#    equal: ['alertname', 'dev', 'instance']

Alert template: alert-template.tmpl

{{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
{{ define "__description" }}{{ end }}

{{ define "wechat.default.message" }}
{{ if gt (len .Alerts.Firing) 0 -}}
Alerts Firing:
{{ range .Alerts.Firing }}
Severity: {{ .Labels.severity }}
Alert name: {{ .Labels.alertname }}
Affected host: {{ .Labels.pod_name }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Fired at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
================
{{- end }}
{{- end }}
{{ if gt (len .Alerts.Resolved) 0 -}}
Alerts Resolved:
{{ range .Alerts.Resolved }}
Severity: {{ .Labels.severity }}
Alert name: {{ .Labels.alertname }}
Affected host: {{ .Labels.pod_name }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Fired at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
Resolved at: {{ .EndsAt.Format "2006-01-02 15:04:05" }}
================
{{- end }}
{{- end }}
{{- end }}


{{ define "email.html" }}
<table>
    <tr><td>Alert name</td><td>Start time</td></tr>
    {{ range $i, $alert := .Alerts }}
        <tr><td>{{ index $alert.Labels "alertname" }}</td><td>{{ $alert.StartsAt }}</td></tr>
    {{ end }}
</table>
{{ end }}




posted @ 2020-03-31 16:36  faylinn