Configuring Prometheus Email Alerts
1: Install and set up Alertmanager
[root@prometheus ~]# tar xf alertmanager-0.23.0.linux-amd64.tar.gz
[root@prometheus ~]# mv alertmanager-0.23.0.linux-amd64 alertmanager
[root@prometheus ~]# mv alertmanager /usr/local/
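2: Create the systemd unit file
[root@prometheus ~]# cat << eof > /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
Documentation=https://github.com/prometheus/alertmanager
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alert.yml --storage.path=/usr/local/alertmanager/data/
Restart=on-failure
[Install]
WantedBy=multi-user.target
eof
The service runs as the prometheus user, so that user must own the directory where Alertmanager keeps its data (--storage.path defaults to data/ relative to the working directory, which is / under systemd):
[root@prometheus ~]# chown -R prometheus:prometheus /usr/local/alertmanager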
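3: Write the Alertmanager configuration. The installation directory ships with a default alertmanager.yml; you can also create a new configuration file and pass it with --config.file at startup, as the unit file above does.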
[root@prometheus ~]# cat << eof > /usr/local/alertmanager/alert.yml
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'xxxxx@163.com'                 # sender address
  smtp_auth_username: 'xxxxxx@163.com'       # sender mailbox used to log in
  smtp_auth_password: 'xxxxxxxxxxxxxxxx'     # authorization code of the mailbox
  smtp_require_tls: false
templates:
  - '/alertmanager/template/*.tmpl'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: default-receiver
receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'xxxx@qq.com'                    # recipient
        html: '{{ template "alert.html" . }}'
        headers: { Subject: "[WARN] Alert email" }
eof
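smtp_smarthost: the address and port of the SMTP server of the mailbox that sends the email.
smtp_auth_password: the sending mailbox's authorization code, not its login password.
smtp_require_tls: defaults to true if not set; with true, sending through this server failed with a STARTTLS error, so for simplicity it is set to false here.
templates: the path of the email template files.
html (under receivers): the name of the template used for the email body, here "alert.html", which is defined in one of the files matched by the templates path.
headers: the email headers; Subject sets the email subject line.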
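Before involving Alertmanager at all, it can help to confirm that the SMTP server accepts the sender mailbox and authorization code. The following is a minimal sketch using only Python's standard smtplib; parse_smarthost and check_smtp_login are hypothetical helper names, and the sketch assumes the same plain (non-STARTTLS) connection that smtp_require_tls: false implies.

```python
import smtplib


def parse_smarthost(smarthost: str) -> tuple[str, int]:
    """Split an Alertmanager-style 'host:port' smarthost string."""
    host, _, port = smarthost.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {smarthost!r}")
    return host, int(port)


def check_smtp_login(smarthost: str, username: str, auth_code: str) -> None:
    """Log in over plain SMTP without STARTTLS, mirroring smtp_require_tls: false.

    Raises an smtplib.SMTPException subclass (for example
    SMTPAuthenticationError) if the server rejects the credentials.
    """
    host, port = parse_smarthost(smarthost)
    with smtplib.SMTP(host, port, timeout=10) as server:
        server.ehlo()
        server.login(username, auth_code)
```

For example, check_smtp_login("smtp.163.com:25", "xxxxxx@163.com", "<authorization code>") returning without an exception suggests the global SMTP settings are usable; if it fails, the problem lies in the mailbox settings rather than in Alertmanager.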
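4: Configure the alert rules
4.1: Create role.yml
[root@prometheus prometheus]# cat << eof > /usr/local/prometheus/role.yml
groups:
  - name: alert.yml
    rules:
      - alert: InstanceStatus                  # alert name
        expr: up{job="node_exporter"} == 0     # condition: scrapes of the target are failing
        for: 10s                               # the condition must hold for 10s before the alert fires
        labels:                                # labels attached to the alert
          severity: "critical"
        annotations:                           # extra information, not used to identify the alert
          description: Server has been down for more than 20s
          summary: Server running status
eof
The job label in expr must match a job_name in prometheus.yml; the node_exporter job is the one that scrapes 10.0.0.14, the node stopped in the verification step.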
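The for: 10s clause is easy to read as "fire 10 seconds after the server goes down", but Prometheus only evaluates the rule once per evaluation_interval (15s in prometheus.yml). The first evaluation that sees the condition marks the alert pending, and a later evaluation promotes it to firing once it has been pending for at least the for duration. Below is a simplified model of that state machine, not Prometheus's actual code; RuleState, evaluate and simulate are hypothetical names, and the model ignores scrape timing by assuming up drops to 0 the moment the node goes down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleState:
    state: str = "inactive"             # inactive -> pending -> firing
    active_at: Optional[float] = None   # when the condition was first seen


def evaluate(state: RuleState, now: float, expr_true: bool, hold: float) -> RuleState:
    """One evaluation of an alerting rule whose `for` duration is `hold` seconds."""
    if not expr_true:
        return RuleState()                        # condition cleared: back to inactive
    if state.state == "inactive":
        state = RuleState("pending", now)         # first sighting starts the clock
    if state.state == "pending" and now - state.active_at >= hold:
        state = RuleState("firing", state.active_at)
    return state


def simulate(down_from: float, hold: float, interval: float, until: float):
    """Yield (time, state) at each evaluation; the target is down from `down_from` on."""
    state = RuleState()
    t = 0.0
    while t <= until:
        state = evaluate(state, t, t >= down_from, hold)
        yield t, state.state
        t += interval
```

In this model, with a 15s evaluation interval, for: 10s and a node that goes down at t=20s, list(simulate(20, 10, 15, 60)) gives inactive at 0s and 15s, pending at 30s and firing from 45s on, which is why the alert typically appears well after the for duration itself.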
Then reference role.yml in prometheus.yml and point Prometheus at Alertmanager (this overwrites prometheus.yml with the complete file below):
[root@prometheus prometheus]# cat << eof > /usr/local/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 10.0.0.13:9093
rule_files:
  - "/usr/local/prometheus/role.yml"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090", "10.0.0.13:9100"]
  - job_name: "node_exporter"
    scrape_interval: 5s
    static_configs:
      - targets: ["10.0.0.14:9100"]
eof
Restart the Prometheus service:
[root@prometheus prometheus]# chown prometheus:prometheus /usr/local/prometheus/role.yml
[root@prometheus prometheus]# systemctl restart prometheus
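After the restart it is worth checking that the rule's expression actually matches a scraped series: up carries the job_name from prometheus.yml as its job label, so an empty result means the rule can never fire. A minimal sketch against Prometheus's HTTP API (GET /api/v1/query), assuming Prometheus listens on 10.0.0.13:9090; query_url, parse_vector and instant_query are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus instant-query URL for /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: str) -> dict[str, float]:
    """Map each series' instance label to its sample value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def instant_query(base_url: str, promql: str) -> dict[str, float]:
    """Run an instant query and return {instance: value}."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

Calling instant_query("http://10.0.0.13:9090", 'up{job="node_exporter"}') should return one entry for 10.0.0.14:9100 with value 1.0 while the node is up; an empty dict points to a job-label mismatch. Separately, promtool check config /usr/local/prometheus/prometheus.yml validates the configuration and the rule files it references before restarting.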
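5: Write the email template
Note: the file name must end in .tmpl, because alert.yml loads templates from /alertmanager/template/*.tmpl.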
[root@prometheus prometheus]# mkdir -pv /alertmanager/template/
mkdir: created directory ‘/alertmanager’
mkdir: created directory ‘/alertmanager/template/’
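[root@prometheus prometheus]# cat << 'eof' > /alertmanager/template/alert.tmpl
{{ define "alert.html" }}
<table>
  <tr><td>Alert name</td><td>Start time</td></tr>
  {{ range .Alerts }}
  <tr><td>{{ .Labels.alertname }}</td><td>{{ .StartsAt }}</td></tr>
  {{ end }}
</table>
{{ end }}
eof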
6: Start Alertmanager
[root@prometheus ~]# systemctl daemon-reload
[root@prometheus ~]# systemctl start alertmanager.service
[root@prometheus ~]# systemctl status alertmanager.service
[root@prometheus ~]# ss -tnl|grep 9093
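7: Verify
Stop the node_exporter service on node 10.0.0.14 and watch the result: the InstanceStatus alert should go from pending to firing on the Prometheus Alerts page, and after group_wait (30s) Alertmanager should send the email to the recipient.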
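Stopping node_exporter tests the whole chain. To test only the Alertmanager side (route, template and SMTP), a synthetic alert can be pushed straight into Alertmanager's API v2 endpoint POST /api/v2/alerts. A minimal sketch, assuming Alertmanager at 10.0.0.13:9093 as configured in prometheus.yml; build_test_alert, post_alerts and the alert name are hypothetical. Because the route has no matchers, any such alert goes to default-receiver.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(name: str, minutes: int = 5) -> list[dict]:
    """One alert in the JSON shape accepted by /api/v2/alerts."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": name, "severity": "critical"},
        "annotations": {"summary": "manual test alert"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),  # auto-resolves
    }]


def post_alerts(base_url: str, alerts: list[dict]) -> int:
    """POST the alerts to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

post_alerts("http://10.0.0.13:9093", build_test_alert("ManualTest")) should return 200, and the email should arrive after group_wait; if it does not, the Alertmanager log (systemctl status alertmanager.service) is the first place to look.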