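Prometheus alerting: sending email notifications with Alertmanager
Important: do not let notifications go out too quickly or too often, or you will be flooded with alert emails. The group_wait, group_interval and repeat_interval settings in alertmanager.yml control how often emails are sent.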
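Download Alertmanager
Download page: https://prometheus.io/download/
After downloading and extracting the archive, double-click alertmanager.exe to start it, then open http://localhost:9093 to check that it is running. Once the configuration below is finished, restart it so the changes take effect.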
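To check that Alertmanager is up without opening a browser, you can poll its /-/healthy endpoint. This is a minimal sketch using only the Python standard library, assuming the default address http://localhost:9093 used above; the function name alertmanager_is_healthy is hypothetical.

```python
import urllib.request


def alertmanager_is_healthy(base_url: str = "http://localhost:9093", timeout: float = 3.0) -> bool:
    """Return True if Alertmanager's /-/healthy endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/-/healthy"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses:
        # connection refused, DNS failure, timeout or a non-2xx status.
        return False


if __name__ == "__main__":
    print("healthy" if alertmanager_is_healthy() else "not reachable")
```

Run it after double-clicking the exe; "not reachable" usually means the process did not start or is listening on a different port.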
Edit alertmanager.yml:
global:
  resolve_timeout: 5m
  smtp_from: 'xxxxxxxx@qq.com'
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: 'xxxxxxxxxxx@qq.com'
  smtp_auth_password: 'xxxxxxxxxxxxxxx'   # QQ Mail SMTP authorization code, not the account login password
  smtp_require_tls: false                  # port 465 is already TLS-wrapped, so STARTTLS must not be required
  smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m                      # an unchanged, still-firing alert is re-sent every 5 minutes
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: 'xxxxxxxxxx@qq.com'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
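To see why the route timings matter for the flood warning at the top, here is a simplified model of when emails go out for a single alert group. It is an illustration only, not Alertmanager's exact algorithm (the real dedup step also compares the set of firing alerts and has timing jitter); RouteTiming and notification_times are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RouteTiming:
    group_wait: int       # seconds before the first notification of a new group
    group_interval: int   # seconds between re-checks of an existing group
    repeat_interval: int  # minimum seconds before re-sending an unchanged notification


def notification_times(t: RouteTiming, resolved_at: int, horizon: int) -> list[tuple[int, str]]:
    """One alert fires at t=0 and resolves at `resolved_at` (send_resolved: true).

    Returns (seconds, kind) for each email sent up to `horizon` seconds.
    """
    sent: list[tuple[int, str]] = []
    last_sent = None
    tick = t.group_wait  # first flush of the new group
    while tick <= horizon:
        if tick >= resolved_at:
            # A "resolved" email is only sent if a "firing" email went out before.
            if last_sent is not None:
                sent.append((tick, "resolved"))
            break
        if last_sent is None or tick - last_sent >= t.repeat_interval:
            sent.append((tick, "firing"))
            last_sent = tick
        tick += t.group_interval
    return sent


if __name__ == "__main__":
    current = RouteTiming(group_wait=5, group_interval=5, repeat_interval=300)
    # prints [(5, 'firing'), (305, 'firing'), (605, 'firing'), (700, 'resolved')]
    print(notification_times(current, resolved_at=700, horizon=1000))
```

With the values above, a host that stays down keeps producing an email roughly every 5 minutes. Alertmanager's documented defaults (group_wait: 30s, group_interval: 5m, repeat_interval: 4h) are much quieter.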
Edit prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093
rule_files:
  - "machine_alert_rules.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_liux_70'
    static_configs:
      - targets: ['10.0.0.70:9100']
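After restarting Prometheus you can confirm that the node_liux_70 target is scraped by querying the up metric through Prometheus' HTTP API (/api/v1/query). A minimal sketch, assuming Prometheus listens on http://localhost:9090 as in the scrape config; query_url, down_instances and fetch are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for Prometheus' /api/v1/query endpoint."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def down_instances(response: dict) -> list[str]:
    """Return the `instance` label of every series whose `up` value is 0."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    down = []
    for sample in response["data"]["result"]:
        _timestamp, value = sample["value"]  # instant vector sample: [unix_time, "value as string"]
        if float(value) == 0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return down


def fetch(base_url: str, expr: str) -> dict:
    """Run the query against a live Prometheus and decode the JSON body."""
    with urllib.request.urlopen(query_url(base_url, expr), timeout=5) as resp:
        return json.load(resp)
```

Usage against a running server: down_instances(fetch("http://localhost:9090", 'up{job="node_liux_70"}')) returns an empty list while 10.0.0.70:9100 is being scraped successfully.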
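Add machine_alert_rules.yml in the same directory as prometheus.yml (relative paths under rule_files are resolved against the directory of prometheus.yml):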
groups:
  - name: simulator-alert-rule
    rules:
      - alert: check_node_liux_70
        expr: sum(up{job="node_liux_70"}) == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "Down or offline for more than 1 minute."
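The for: 1m clause means the alert is not sent to Alertmanager the moment the expression becomes true: it first goes to "pending" and only turns "firing" once it has stayed true for at least 1 minute of rule evaluations. A simplified sketch of that state machine, assuming the 15s evaluation_interval from prometheus.yml; alert_states is a hypothetical name.

```python
def alert_states(up_samples: list[int], eval_interval: int = 15, for_seconds: int = 60) -> list[str]:
    """Simplified model of an alerting rule `expr: ... == 0` with a `for` clause.

    up_samples[i] is the value of sum(up{job=...}) at evaluation i, taken every
    `eval_interval` seconds. Returns the alert state after each evaluation.
    """
    states = []
    active_since = None  # evaluation time at which the condition first became true
    for i, value in enumerate(up_samples):
        now = i * eval_interval
        if value == 0:
            if active_since is None:
                active_since = now
            states.append("firing" if now - active_since >= for_seconds else "pending")
        else:
            active_since = None  # condition cleared: the alert resets to inactive
            states.append("inactive")
    return states


if __name__ == "__main__":
    # Host goes down after the first evaluation and comes back after 90 seconds.
    print(alert_states([1, 0, 0, 0, 0, 0, 1]))
```

So the first email arrives roughly one minute after scrapes start failing, plus Alertmanager's group_wait, and a host that recovers within that minute never triggers an email.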