Alertmanager Email Alerts

Installing and configuring Alertmanager

wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
tar -zxvf alertmanager-0.21.0.linux-amd64.tar.gz -C /usr/local
cd /usr/local
mv alertmanager-0.21.0.linux-amd64/ alertmanager

Create the systemd unit file

vim /usr/lib/systemd/system/alertmanager.service

[Unit]
Description=alertmanager
Documentation=https://github.com/prometheus/alertmanager
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alert-test.yml --storage.path=/usr/local/alertmanager/data
Restart=on-failure

[Install]
WantedBy=multi-user.target

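The Alertmanager install directory ships with a default alertmanager.yml. You can instead create a new configuration file (alert-test.yml in the unit file above) and point Alertmanager at it with --config.file at startup.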

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: 'aa@qq.com'
  smtp_auth_username: 'aa@qq.com'
  smtp_auth_password: 'aa'
  smtp_require_tls: false

templates:
  - '/usr/local/alertmanager/template/*.tmpl'   # mail alert templates

# route: how alerts are grouped and dispatched to receivers
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'mail'

receivers:
- name: 'mail'
  email_configs:
    - to: 'dd5@qq.com'
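      # also send a mail when the alert is resolved
      send_resolved: true
      # template used for the mail body
      html: '{{ template "default-monitor.html" . }}'
      # mail subject; if headers is omitted, the default email.default.subject
      # template is used, and it can be redefined in a template file
      headers: {Subject: "[WARN] Alert mail test"}

  • smtp_smarthost: the SMTP server address and port of the mailbox that sends the alerts.
  • smtp_auth_password: the authorization code issued for the sending mailbox, not its login password.
  • smtp_require_tls: defaults to true, which makes Alertmanager require STARTTLS. Port 465 here uses implicit TLS rather than STARTTLS, so leaving it at true produces a starttls error; it is set to false to keep things simple.
  • templates: the path of the mail template files.
  • html (under receivers): the name of the template used for the mail body, here "default-monitor.html", which must be defined in one of the files on the template path.
  • headers: extra mail headers; here it sets the Subject.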
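Before relying on Alertmanager to deliver mail, it can help to confirm that the SMTP values themselves work. The following is a minimal Python sketch, not part of Alertmanager: the helper names parse_smarthost, uses_implicit_tls, and send_test_mail are hypothetical, and it assumes the provider accepts implicit TLS on port 465 and STARTTLS on other ports.

```python
import smtplib
import ssl
from email.message import EmailMessage


def parse_smarthost(smarthost: str) -> tuple[str, int]:
    """Split a 'host:port' string such as smtp_smarthost into its parts."""
    host, _, port = smarthost.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {smarthost!r}")
    return host, int(port)


def uses_implicit_tls(port: int) -> bool:
    """Port 465 conventionally wraps the whole session in TLS (no STARTTLS)."""
    return port == 465


def send_test_mail(smarthost: str, sender: str, auth_code: str, recipient: str) -> None:
    """Send one plain-text mail with the same values as the global section."""
    host, port = parse_smarthost(smarthost)

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SMTP settings check"
    msg.set_content("Sent with the credentials configured for Alertmanager.")

    context = ssl.create_default_context()
    if uses_implicit_tls(port):
        client = smtplib.SMTP_SSL(host, port, context=context, timeout=10)
    else:
        client = smtplib.SMTP(host, port, timeout=10)

    with client:
        if not uses_implicit_tls(port):
            # Upgrade the plain connection before sending credentials.
            client.starttls(context=context)
        client.login(sender, auth_code)
        client.send_message(msg)
```

Calling send_test_mail("smtp.qq.com:465", "aa@qq.com", "<authorization code>", "dd5@qq.com") with real credentials should deliver one message; a login failure here usually points at the authorization code rather than at Alertmanager.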

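Configuring alert rules

Create the rule file, rule.yml. Prometheus only loads it if its path matches a rule_files entry in prometheus.yml.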

groups:
- name: node_alerts
  rules:
  - alert: node-up-alert
    expr: up == 0
    for: 10s
    labels:
      severity: page
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 10s"
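The for clause interacts with how often Prometheus evaluates rules. The sketch below is a simplified Python model of that behaviour, not Prometheus code: the function alert_states is hypothetical, and it assumes an alert moves from pending to firing at the first evaluation where the expression has been true for at least the for duration. With for: 10s and the 15s evaluation_interval used in this post's prometheus.yml, the alert fires on the second matching evaluation.

```python
def alert_states(expr_true, eval_interval, hold):
    """Return a (time, state) pair per evaluation for a single alert rule.

    expr_true: one boolean per evaluation, True when the expression matches.
    eval_interval, hold: seconds; hold corresponds to the rule's `for` value.
    """
    states = []
    active_since = None  # time of the first evaluation in the current True run
    for i, matched in enumerate(expr_true):
        now = i * eval_interval
        if not matched:
            active_since = None
            states.append((now, "inactive"))
            continue
        if active_since is None:
            active_since = now
        state = "firing" if now - active_since >= hold else "pending"
        states.append((now, state))
    return states


# Target is up at t=0, then down for the next three evaluations.
timeline = alert_states([False, True, True, True], eval_interval=15, hold=10)
for t, state in timeline:
    print(f"t={t:>2}s  {state}")
```

In this model the alert is pending at t=15s and firing from t=30s, so a short for value does not make the alert fire faster than the evaluation interval allows.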

In prometheus.yml, add the Alertmanager address under alerting and the rule file path under rule_files:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - localhost:9093    # Alertmanager address (added)

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
   #- "/usr/local/prometheus/rules/*_alerts.yml"
   - "rules/*_alerts.yml"   # added

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['xxxxxxxxxxx:9090']

  - job_name: 'xxxxxxxxxx'
    static_configs:
    - targets: ['xxxxxxxxxxxxx:9100']
      labels:
        instance: test

Give the prometheus user ownership of the rule file, then restart the Prometheus service:

chown -R prometheus:prometheus /usr/local/prometheus/rule.yml
systemctl restart prometheus
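To check the path from Alertmanager to the mailbox without waiting for a target to go down, one option is to post a synthetic alert to Alertmanager's v2 HTTP API. This is a minimal Python sketch under stated assumptions: Alertmanager listens on localhost:9093 as configured above, and the alert name ManualTestAlert and the helper names build_test_alert and post_alerts are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(instance: str, minutes: int = 5) -> list:
    """Build a one-element alert list in the shape POST /api/v2/alerts expects."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {
            "alertname": "ManualTestAlert",  # hypothetical name, for testing only
            "instance": instance,
            "severity": "page",
        },
        "annotations": {
            "summary": f"{instance}: test alert sent by hand",
            "description": "Synthetic alert used to verify mail delivery.",
        },
        "startsAt": now.isoformat(),
        # After endsAt passes, Alertmanager treats the alert as resolved.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alerts(alerts: list, base_url: str = "http://localhost:9093") -> int:
    """Send the alerts to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Running post_alerts(build_test_alert("test")) against a live Alertmanager should lead to a mail after group_wait elapses, and a resolved mail once endsAt has passed if send_resolved is enabled.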

Writing the mail template

Note: the file extension must be tmpl, to match the templates path in the Alertmanager configuration.

Alert template

vi /usr/local/alertmanager/template/mail.tmpl
{{ define "default-monitor.html" }}
{{ range .Alerts }}
<pre>
=============start===========
Alert source: prometheus_alert
Severity: {{ .Labels.severity }}
Alert name: {{ .Labels.alertname }}
Host: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
==============end============
</pre>
{{ end }}
{{ end }}

Alert and recovery template

This version replaces the contents of the same file and adds a section for resolved alerts, which are sent because send_resolved is true.

vi /usr/local/alertmanager/template/mail.tmpl
{{ define "default-monitor.html" }}
{{- if gt (len .Alerts.Firing) 0 -}}{{ range .Alerts.Firing }}
@Alert
<pre>
Type: {{ .Labels.alertname }}
Instance: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
</pre>
{{ end }}{{ end -}}
{{- if gt (len .Alerts.Resolved) 0 -}}{{ range .Alerts.Resolved }}
@Resolved
<pre>
Type: {{ .Labels.alertname }}
Instance: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
Resolved at: {{ .EndsAt.Format "2006-01-02 15:04:05" }}
</pre>
{{ end }}{{ end -}}
{{- end }}