Prometheus Email Alerting Configuration

1: Install and configure Alertmanager

[root@prometheus ~]# tar xf alertmanager-0.23.0.linux-amd64.tar.gz 
[root@prometheus ~]# mv alertmanager-0.23.0.linux-amd64 alertmanager
[root@prometheus ~]# mv alertmanager /usr/local/

2: Create the systemd unit file

[root@prometheus ~]# cat << eof > /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
Documentation=https://github.com/prometheus/alertmanager
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alert.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
eof
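
The unit depends on three things existing on the host: the account named in User=, the binary named in ExecStart=, and the file passed to --config.file. A minimal preflight sketch follows; the preflight function is a hypothetical helper, not part of Alertmanager, and it only reports what is missing. The configuration file is created in the next step, so it is expected to be reported until then.

```shell
# Hypothetical helper: report anything the unit file relies on that is missing.
preflight() {
  user=$1 binary=$2 config=$3
  id -u "$user" >/dev/null 2>&1 || echo "missing user: $user"
  [ -x "$binary" ] || echo "not executable: $binary"
  [ -r "$config" ] || echo "not readable: $config"
  return 0
}

# Values taken from the unit file above.
preflight prometheus /usr/local/alertmanager/alertmanager /usr/local/alertmanager/alert.yml
```

No output means all three prerequisites are in place.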

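3: The Alertmanager install directory ships with a default alertmanager.yml. You can instead create a new configuration file and point to it at startup with --config.file.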

[root@prometheus ~]# cat << eof > /usr/local/alertmanager/alert.yml
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'xxxxx@163.com' # sender address
  smtp_auth_username: 'xxxxxx@163.com'  # mailbox account
  smtp_auth_password: 'HJPEYUHCWTDQPACI' # authorization code, not the login password
  smtp_require_tls: false
templates:
  - '/alertmanager/template/*.tmpl'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: default-receiver
receivers:
- name: 'default-receiver'
  email_configs:
  - to: 'xxxx@qq.com' # recipient
    html: '{{ template "alert.html" . }}'
    headers: { Subject: "[WARN] Alert email" }
eof

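smtp_smarthost: the SMTP server address and port of the mailbox used to send mail.
smtp_auth_password: the authorization code of the sending mailbox, not its login password.
smtp_require_tls: defaults to true when unset. With true, sending failed with a starttls error in this setup, so it is set to false here for simplicity.
templates: the path (glob) of the email template files.
html (under receivers): the name of the template used for the email body. Here it is "alert.html", which is defined in one of the files under the template path.
headers: the email headers; here it sets the Subject.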
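
Before starting the service it can help to let amtool, which ships in the same tarball as alertmanager, parse the configuration. The sketch below assumes amtool sits at /usr/local/alertmanager/amtool after the move in step 1; check_am_config is a hypothetical wrapper that skips the check when the binary is not there.

```shell
# Hypothetical wrapper: validate an Alertmanager config file with amtool.
check_am_config() {
  amtool_bin=$1 config=$2
  if [ -x "$amtool_bin" ]; then
    # check-config parses the YAML and the template files it references.
    "$amtool_bin" check-config "$config"
  else
    echo "amtool not found at $amtool_bin; skipping check"
    return 0
  fi
}

# Paths are assumptions based on the install location used in this post.
check_am_config /usr/local/alertmanager/amtool /usr/local/alertmanager/alert.yml
```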

3: Configure alerting rules

1: Create role.yml
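[root@prometheus prometheus]# cat << eof > /usr/local/prometheus/role.yml
groups:
- name: alert.yml
  rules:
  - alert: InstanceStatus # alert name
    expr: up{job="node_exporter"} == 0 # firing condition; the job label must match a job_name in prometheus.yml
    for: 10s # the condition must hold for 10s before the alert fires
    labels: # labels attached to the alert
      severity: "critical"
    annotations: # extra descriptive fields; not used to identify the alert
      description: Server has been down for more than 20s
      summary: Server status
eof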
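
The annotations in the listing above do not say which server is affected. One common way to include it is Prometheus's {{ $labels.instance }} template variable; whether the original rule used it is an assumption. Because an unquoted heredoc lets the shell expand $labels to an empty string, this sketch quotes the delimiter. It writes to a scratch directory rather than /usr/local/prometheus, and it uses the node_exporter job name from the prometheus.yml shown in this post.

```shell
# Scratch location (an assumption for this sketch) so the real rule file is untouched.
rule_file="$(mktemp -d)/role.yml"

# Quoting the delimiter ('eof') stops the shell from expanding $labels.
cat > "$rule_file" << 'eof'
groups:
- name: alert.yml
  rules:
  - alert: InstanceStatus
    expr: up{job="node_exporter"} == 0
    for: 10s
    labels:
      severity: "critical"
    annotations:
      description: "Server {{ $labels.instance }} has been down for more than 10s"
      summary: "Server {{ $labels.instance }} status"
eof

# Show the result; the template braces should appear verbatim.
cat "$rule_file"
```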
      
Point prometheus.yml at role.yml (this overwrites the existing file; appending would duplicate its top-level keys):
[root@prometheus prometheus]# cat << eof > /usr/local/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 10.0.0.13:9093
rule_files:
  - "/usr/local/prometheus/role.yml"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090",'10.0.0.13:9100']

  - job_name: "node_exporter"
    scrape_interval: 5s
    static_configs:
      - targets: ["10.0.0.14:9100"]
eof

Restart the Prometheus service
[root@prometheus prometheus]# chown prometheus:prometheus /usr/local/prometheus/role.yml
[root@prometheus prometheus]# systemctl restart prometheus
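
A restart with a broken prometheus.yml or rule file leaves Prometheus down, so it may be worth validating first. The sketch below uses promtool, which ships in the Prometheus tarball; its location under /usr/local/prometheus is an assumption, and restart_if_valid is a hypothetical helper.

```shell
# Hypothetical helper: restart Prometheus only if promtool accepts the config.
restart_if_valid() {
  promtool_bin=$1 config=$2
  if [ ! -x "$promtool_bin" ]; then
    echo "promtool not found at $promtool_bin; not restarting"
    return 1
  fi
  # "check config" also validates the rule files listed under rule_files.
  "$promtool_bin" check config "$config" && systemctl restart prometheus
}

# Paths are assumptions based on the install location used in this post.
restart_if_valid /usr/local/prometheus/promtool /usr/local/prometheus/prometheus.yml || true
```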

4: Write the email template

Note: the file extension must be .tmpl so that it matches the templates glob in alert.yml
[root@prometheus prometheus]# mkdir -pv /alertmanager/template/
mkdir: created directory ‘/alertmanager’
mkdir: created directory ‘/alertmanager/template/’
[root@prometheus prometheus]# cat /alertmanager/template/alert.tmpl 
<table>
    <tr><td>Alert name</td><td>Start time</td></tr>
        <tr><td></td><td></td></tr>
</table>
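
The listing above shows only the static table skeleton: the Go template directives that name the template and fill in the rows are not visible. A minimal sketch of a complete file follows, assuming the template is named "alert.html" (as the html setting is described earlier) and that each row prints the alertname label and the StartsAt field of one alert. The exact expressions are an assumption, not the author's original. The file is written to a scratch directory; copy it to /alertmanager/template/ if it suits.

```shell
# Scratch location (an assumption for this sketch).
tmpl_file="$(mktemp -d)/alert.tmpl"

# Quoted delimiter so the shell leaves the template text untouched.
cat > "$tmpl_file" << 'eof'
{{ define "alert.html" }}
<table>
    <tr><td>Alert name</td><td>Start time</td></tr>
    {{ range .Alerts }}
        <tr><td>{{ .Labels.alertname }}</td><td>{{ .StartsAt }}</td></tr>
    {{ end }}
</table>
{{ end }}
eof

cat "$tmpl_file"
```

A receiver would reference it with html: '{{ template "alert.html" . }}'.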

5: Start Alertmanager

[root@prometheus ~]#  systemctl daemon-reload
[root@prometheus ~]#  systemctl start alertmanager.service
[root@prometheus ~]#  systemctl status alertmanager.service
[root@prometheus ~]#  ss -tnl|grep 9093
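
A listening port shows that the process is up but not that it answers requests. Alertmanager exposes a /-/healthy endpoint that can be polled for that; the sketch below assumes curl is installed and that Alertmanager listens on localhost:9093. am_healthy is a hypothetical helper.

```shell
# Hypothetical helper: succeeds only if the base URL answers /-/healthy with a 2xx status.
am_healthy() {
  curl -fsS --max-time 3 "$1/-/healthy" >/dev/null 2>&1
}

if am_healthy "http://localhost:9093"; then
  echo "Alertmanager is healthy"
else
  echo "Alertmanager is not reachable on port 9093"
fi
```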

6: Verify


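With everything running, stop the node_exporter service on node 10.0.0.14 and check the result: once the rule's for: duration has elapsed, the InstanceStatus alert should fire and the alert email should arrive.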
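
The alert state can also be checked from the command line. Prometheus reports active alerts at /api/v1/alerts, where an alert is "pending" while the for: duration runs and "firing" afterwards. The sketch below counts firing alerts without a JSON parser; it assumes curl is installed and that Prometheus is reachable at http://localhost:9090 (override with the PROM_URL variable, a name made up for this sketch).

```shell
# Count alerts in the "firing" state in a /api/v1/alerts JSON response read from stdin.
count_firing() {
  grep -o '"state":"firing"' | wc -l | tr -d ' '
}

# PROM_URL is a hypothetical variable; it defaults to a local Prometheus.
PROM_URL=${PROM_URL:-http://localhost:9090}
curl -fsS --max-time 3 "$PROM_URL/api/v1/alerts" 2>/dev/null | count_firing
```

A count of 0 shortly after stopping node_exporter usually just means the alert is still pending.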


posted @ 2022-02-14 00:01  Layzer