Integrating Alertmanager with DingTalk/WeChat/Email Alerts
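Grafana's built-in alerting support is fairly limited, while the alerting system that Prometheus provides is much more powerful.
Prometheus splits data collection and alerting into two modules. Alerting rules are configured on the Prometheus servers, which send alerts to Alertmanager. Alertmanager then manages those alerts, including silencing and inhibition, groups them, and sends notifications through email, PagerDuty, HipChat, Slack, and other channels.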
Getting Alertmanager up and running takes three steps:
1. Install and configure Alertmanager
2. Configure Prometheus to talk to Alertmanager
3. Create alerting rules in Prometheus
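Step 2 might look roughly like the sketch below, which also registers the rule files that step 3 will need. It assumes Alertmanager runs on the same host on its default port 9093; the `PROM_DIR` directory and the `rules/*.yml` layout are hypothetical and not taken from this article.

```shell
#!/usr/bin/env bash
# Sketch only: PROM_DIR and the rules/ layout are assumptions, not paths from this article.
PROM_DIR="${PROM_DIR:-./prometheus-demo}"
mkdir -p "$PROM_DIR/rules"

# The quoted 'EOF' keeps the shell from touching the file content.
cat > "$PROM_DIR/prometheus.yml" << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s   # how often alerting rules are evaluated

# Step 2: where Prometheus sends alerts (9093 is Alertmanager's default port)
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

# Step 3: which rule files Prometheus loads
rule_files:
  - 'rules/*.yml'

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
EOF

echo "wrote $PROM_DIR/prometheus.yml"
```

If `promtool` is installed, `promtool check config "$PROM_DIR/prometheus.yml"` can validate the file before Prometheus is restarted or reloaded.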
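During its lifecycle an alert is in one of three states:
1. inactive: the alert is neither firing nor pending
2. pending: the alert condition is true, but has not yet held for the configured duration (the rule's for clause)
3. firing: the alert condition has held for longer than the configured duration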
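The move from pending to firing is governed by the `for` clause of an alerting rule. The sketch below writes a hypothetical rule file to illustrate this: the rule name, the 5-minute duration, the `RULES_DIR` path, and the `team: operations` label are all assumptions chosen for the example.

```shell
#!/usr/bin/env bash
# Sketch only: RULES_DIR, the rule name and the labels are illustrative assumptions.
RULES_DIR="${RULES_DIR:-./prometheus-demo/rules}"
mkdir -p "$RULES_DIR"

# The quoted 'EOF' matters here: it stops the shell from expanding $labels.
cat > "$RULES_DIR/instance-down.yml" << 'EOF'
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0        # condition false -> alert is inactive
        for: 5m              # condition true for less than 5m -> pending; 5m or more -> firing
        labels:
          severity: critical
          team: operations   # label that Alertmanager routes can match on
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been unreachable for more than 5 minutes."
EOF

echo "wrote $RULES_DIR/instance-down.yml"
```

Only firing alerts are sent to Alertmanager; a pending alert is visible on the Prometheus Alerts page but produces no notification.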
Alertmanager configuration file
global:
  resolve_timeout: 5m
  # SMTP settings
  smtp_from: "prom-alert@example.com"
  smtp_smarthost: 'email-smtp.us-west-2.amazonaws.com:465'
  smtp_auth_username: "user"
  smtp_auth_password: "pass"
  # Port 465 uses implicit TLS; if delivery fails with a STARTTLS error,
  # this may need to be false (or switch to a STARTTLS port such as 587)
  smtp_require_tls: true
templates:
  - '/data/alertmanager/templates/*.tmpl'
route:
  receiver: test1
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  group_by: [alertname]
  routes:
    # ads webhook
    - receiver: test1
      group_wait: 10s
      match:
        team: ads
    # ops webhook
    - receiver: test2
      group_wait: 10s
      match:
        team: operations
receivers:
  - name: test1
    email_configs:
      - to: '9935226@qq.com'
        headers: { Subject: "[ads] Alert email" }  # subject of the alert email
    webhook_configs:
      - url: http://localhost:8060/dingtalk/ads/send
  - name: test2
    email_configs:
      - to: '9935226@qq.com,deniss.wang@gmail.com'
        send_resolved: true
        headers: { Subject: "[ops] Alert email" }  # subject of the alert email
    webhook_configs:
      - url: http://localhost:8060/dingtalk/ops/send
    # WeChat Work settings
    wechat_configs:
      - corp_id: 'wwxxxxxxxxxxxxxx'
        api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
        send_resolved: true
        to_party: '2'
        agent_id: '1000002'
        api_secret: '1FvHxuGbbG35FYsuW0YyI4czWY/.2'
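Before restarting Alertmanager, the configuration can be checked with `amtool`, which ships with Alertmanager. The following is a sketch under assumptions: the config file path is a guess based on the `/data/alertmanager` directory used in this article, and the `unknown` team value is made up to exercise the default route.

```shell
#!/usr/bin/env bash
# Sketch only: the config path is an assumption; adjust it to your setup.
CONFIG="${CONFIG:-/data/alertmanager/alertmanager.yml}"

if command -v amtool >/dev/null 2>&1 && [ -f "$CONFIG" ]; then
  # Check the whole file, including the templates it references.
  if amtool check-config "$CONFIG"; then
    VALIDATION_STATUS="ok"
  else
    VALIDATION_STATUS="failed"
  fi
  # Show which receiver each label set would be routed to.
  for team in ads operations unknown; do
    echo "team=$team ->"
    amtool config routes test --config.file="$CONFIG" team="$team"
  done
else
  VALIDATION_STATUS="skipped"
  echo "amtool or $CONFIG not found; skipping validation"
fi

echo "validation: $VALIDATION_STATUS"
```

With the routing tree in this article, `team=ads` should resolve to test1 and `team=operations` to test2, while any other value should fall back to the root receiver test1.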
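Connecting DingTalk to the Prometheus Alertmanager webhook
First, create a custom robot in a DingTalk group and note its webhook URL, which contains the access_token.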
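Once the robot exists, its webhook can be tried directly before any Prometheus component is involved. This is a minimal sketch: the `DINGTALK_TOKEN` variable is hypothetical and must hold the access_token from your robot's webhook URL, and if the robot has a keyword or signature security setting enabled, this plain request may be rejected.

```shell
#!/usr/bin/env bash
# Sketch only: DINGTALK_TOKEN is a hypothetical variable holding your robot's access_token.
DINGTALK_TOKEN="${DINGTALK_TOKEN:-}"

# A plain-text message in the format the DingTalk custom robot endpoint expects.
MESSAGE='{"msgtype": "text", "text": {"content": "Alertmanager webhook test"}}'

if [ -n "$DINGTALK_TOKEN" ]; then
  curl -fsS -m 10 -H 'Content-Type: application/json' \
    -d "$MESSAGE" \
    "https://oapi.dingtalk.com/robot/send?access_token=${DINGTALK_TOKEN}" \
    || echo "request to DingTalk failed"
else
  echo "DINGTALK_TOKEN is not set; nothing sent"
fi
```

A successful call typically returns a small JSON body whose errcode is 0, and the message appears in the group chat.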
Installing the DingTalk webhook plugin from the binary release
cd /usr/local/src/
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
tar -zxvf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
mv prometheus-webhook-dingtalk-0.3.0.linux-amd64 /data/alertmanager/webhook-dingtalk
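# Create the systemd service for webhook-dingtalk
# (the quoted 'EOF' keeps the backslash line continuations intact in the unit file)
cat > /etc/systemd/system/webhook-dingtalk.service << 'EOF'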
[Unit]
Description=webhook-dingding
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/data/alertmanager/webhook-dingtalk/prometheus-webhook-dingtalk \
--ding.profile="ads=https://oapi.dingtalk.com/robot/send?access_token=284de68124e97420a2ee8ae1b8f12fabe3213213213" \
--ding.profile="ops=https://oapi.dingtalk.com/robot/send?access_token=8bce3bd11f7040d57d44caa5b6ef9417eab24e1123123123213"
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
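# Reload systemd so it picks up the new unit, then enable and start the service
systemctl daemon-reload
systemctl enable webhook-dingtalk
systemctl start webhook-dingtalk
systemctl status webhook-dingtalk
# Check that the service is listening on port 8060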
netstat -anplt|grep 8060
tcp6 0 0 :::8060 :::* LISTEN 1635/prometheus-web
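With the webhook service listening, the whole chain can be exercised by posting a test alert to Alertmanager's v2 API. This is a sketch under assumptions: Alertmanager is taken to be on `localhost:9093` (its default port), and the alert name, severity, and payload file path are made up for the test. The `team: ads` label is chosen so that the alert matches the ads route configured earlier.

```shell
#!/usr/bin/env bash
# Sketch only: the URL, payload path and alert name are illustrative assumptions.
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://localhost:9093}"
PAYLOAD_FILE="${PAYLOAD_FILE:-./test-alert.json}"

# One alert whose labels match the "team: ads" route.
cat > "$PAYLOAD_FILE" << 'EOF'
[
  {
    "labels": {
      "alertname": "WebhookSmokeTest",
      "team": "ads",
      "severity": "warning"
    },
    "annotations": {
      "summary": "Test alert to verify the DingTalk webhook route"
    }
  }
]
EOF

# Post the alert; report instead of aborting if Alertmanager is not reachable.
curl -fsS -m 5 -X POST -H 'Content-Type: application/json' \
  --data @"$PAYLOAD_FILE" "$ALERTMANAGER_URL/api/v2/alerts" \
  || echo "could not reach Alertmanager at $ALERTMANAGER_URL"
```

If everything is wired up, the notification should reach the ads DingTalk group and the test1 mailbox shortly after the route's group_wait of 10s has passed.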