Prometheus alerting: email alerts with Alertmanager
Download and install
cd /apps
wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
tar -xvf alertmanager-0.24.0.linux-amd64.tar.gz
ln -sv /apps/alertmanager-0.24.0.linux-amd64/ /apps/alertmanager
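The download can also be wrapped in a small function, so another release can be installed (or an upgrade done) by changing only the version. This is a minimal sketch: the function names are hypothetical, and it assumes other releases use the same file naming as the v0.24.0 link above.

```shell
#!/usr/bin/env bash
# Minimal sketch; am_release_url and install_alertmanager are hypothetical helpers.
# Assumes every release follows the naming of the v0.24.0 link above.

# Print the tarball URL for a version (architecture defaults to linux-amd64).
am_release_url() {
  local ver="$1" arch="${2:-linux-amd64}"
  printf 'https://github.com/prometheus/alertmanager/releases/download/v%s/alertmanager-%s.%s.tar.gz\n' \
    "$ver" "$ver" "$arch"
}

# Download a linux-amd64 release into PREFIX, unpack it there,
# and (re)point PREFIX/alertmanager at the unpacked directory.
install_alertmanager() {
  local ver="$1" prefix="${2:-/apps}"
  local dir="alertmanager-${ver}.linux-amd64"
  wget -P "$prefix" "$(am_release_url "$ver")" || return 1
  tar -xzf "${prefix}/${dir}.tar.gz" -C "$prefix" || return 1
  # -n: replace an existing symlink instead of creating a link inside the directory it points to
  ln -sfn "${prefix}/${dir}" "${prefix}/alertmanager"
}
```

Usage: install_alertmanager 0.24.0 /apps. Since everything else refers to /apps/alertmanager, an upgrade then only needs the symlink re-pointed and the service restarted.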
Start on boot with systemd
cat /etc/systemd/system/alertmanager.service
[Unit]
Description=Prometheus alertmanager
After=network.target
[Service]
ExecStart=/apps/alertmanager/alertmanager --config.file=/apps/alertmanager/alertmanager.yml
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl restart alertmanager
systemctl enable alertmanager
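The unit file can be created with a heredoc instead of an editor. A minimal sketch: write_alertmanager_unit is a hypothetical name, and the unit body is the one shown above with the install directory turned into a parameter (default /apps/alertmanager).

```shell
#!/usr/bin/env bash
# Minimal sketch; write_alertmanager_unit is a hypothetical helper.
# Writes the unit shown above; the Alertmanager directory is a parameter.
write_alertmanager_unit() {
  local unit="${1:-/etc/systemd/system/alertmanager.service}"
  local home="${2:-/apps/alertmanager}"
  # Unquoted EOF so that ${home} is expanded inside the heredoc
  cat > "$unit" <<EOF
[Unit]
Description=Prometheus alertmanager
After=network.target
[Service]
ExecStart=${home}/alertmanager --config.file=${home}/alertmanager.yml
[Install]
WantedBy=multi-user.target
EOF
}
```

Run it as root, then systemctl daemon-reload and systemctl enable --now alertmanager. Alertmanager also serves a health endpoint, so curl -fsS http://localhost:9093/-/healthy should succeed once the service is up (9093 is the default port).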
Configure alertmanager.yml
vim /apps/alertmanager/alertmanager.yml
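global:
  resolve_timeout: 1m
  smtp_smarthost: 'smtp.qq.com:465'   # port 465 is implicit TLS (SMTPS)
  smtp_from: '760478xxx@qq.com'
  smtp_auth_username: '760478xxx@qq.com'
  smtp_auth_password: '<QQ mail SMTP authorization code>'   # the authorization code, not the QQ login password
  smtp_hello: 'qq.com'                # host name sent in EHLO/HELO
  smtp_require_tls: false             # no STARTTLS: the connection on port 465 is already TLS
route:
  group_by: ['alertname']
  # Very short intervals for testing; Alertmanager's defaults are 30s, 5m and 4h
  group_wait: 1s
  group_interval: 5s
  repeat_interval: 10s
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: '1500120xxxx@163.com'     # recipient
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']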
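An invalid alertmanager.yml is rejected when Alertmanager starts or reloads, so it is worth validating the file first. A minimal sketch, assuming the install directory above and the default port 9093: amtool check-config (amtool ships in the release tarball) and the POST /-/reload endpoint belong to Alertmanager, while the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch; am_check_and_reload is a hypothetical helper.
# Assumptions: Alertmanager installed in AM_HOME, listening on AM_URL (9093 is its default port).
AM_HOME="${AM_HOME:-/apps/alertmanager}"
AM_URL="${AM_URL:-http://localhost:9093}"

# Validate the config with amtool; ask Alertmanager to reload only if it is valid.
am_check_and_reload() {
  local config="${1:-$AM_HOME/alertmanager.yml}"
  "$AM_HOME/amtool" check-config "$config" || return 1
  # POST /-/reload makes the running Alertmanager re-read its configuration
  curl -fsS -X POST "$AM_URL/-/reload"
}
```

Running am_check_and_reload after each edit replaces a blind restart: a broken file is reported by amtool and the running instance keeps its current configuration.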
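Restart Alertmanager (systemctl restart alertmanager), then open the Alertmanager web UI in a browser (port 9093 by default) and check the Status page to confirm that the new configuration is loaded.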
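The loaded configuration shown on the Status page is also available from Alertmanager's v2 HTTP API (GET /api/v2/status), which makes the check scriptable. A minimal sketch: the helper name and default URL are assumptions, and it does a plain fixed-string search over the JSON response rather than parsing it.

```shell
#!/usr/bin/env bash
# Minimal sketch; am_config_contains is a hypothetical helper.
# Assumes Alertmanager listens on AM_URL (default port 9093).
AM_URL="${AM_URL:-http://localhost:9093}"

# Succeed if the configuration Alertmanager has actually loaded contains TEXT.
# Fixed-string search over the raw JSON, not a JSON parse.
am_config_contains() {
  curl -fsS "$AM_URL/api/v2/status" | grep -qF -- "$1"
}
```

For example, am_config_contains 'smtp.qq.com:465' succeeds only once the SMTP settings above are live.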
Edit prometheus.yml and point the targets in the alerting section at Alertmanager:
vim prometheus/prometheus.yml
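For reference, the alerting section lists Alertmanager under alertmanagers/static_configs/targets. The helper below only prints such a section to paste over the existing one in prometheus.yml; it is a sketch that assumes Alertmanager runs on the same host as Prometheus on its default port 9093, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch; prom_alerting_block is a hypothetical helper.
# Prints the alerting section for prometheus.yml; the target defaults to 127.0.0.1:9093 (an assumption).
prom_alerting_block() {
  local target="${1:-127.0.0.1:9093}"
  cat <<EOF
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - '${target}'
EOF
}
```

After pasting, promtool check config /apps/prometheus/prometheus.yml validates the file before Prometheus is restarted.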
Edit the alerting rule file
vim prometheus/rules/yzy_rules.yml
groups:
  - name: alertmanager_pod.rules
    rules:
      - alert: Pod_all_cpu_usage
        expr: (sum by(name)(rate(container_cpu_usage_seconds_total{image!=""}[5m])) * 100) > 10
        for: 2m
        labels:
          severity: critical
          service: pods
        annotations:
          description: "Container {{ $labels.name }} CPU usage is above 10% (current value is {{ $value }})"
          summary: Dev CPU load alert
      - alert: Pod_all_memory_usage
        # expr: sort_desc(avg by(name)(irate(container_memory_usage_bytes{name!=""}[5m]))*100) > 10%  # memory above 10%
        expr: sum by(name)(container_memory_usage_bytes{name!=""}) > 2147483648  # memory above 2 GiB
        for: 2m
        labels:
          severity: critical
        annotations:
          description: "Container {{ $labels.name }} memory usage is above 2G (current value is {{ $value }})"
          summary: Dev memory load alert
      - alert: Pod_all_network_receive_usage
        expr: sum by (name) (irate(container_network_receive_bytes_total{container_name="POD"}[1m])) > 50*1024*1024
        for: 2m
        labels:
          severity: critical
        annotations:
          description: "Container {{ $labels.name }} network receive rate is above 50M (current value is {{ $value }})"
      - alert: Node_memory_free
        expr: node_memory_MemFree_bytes < 4*1024*1024*1024  # deliberately wrong threshold so the alert fires for testing
        for: 2m
        labels:
          severity: critical
        annotations:
          description: Free memory on the node is below 4G
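Two quick checks for the rule file: the byte thresholds can be confirmed with shell arithmetic (2147483648 is 2 GiB, 4*1024*1024*1024 is 4 GiB), and promtool, which ships with Prometheus, validates rule syntax before Prometheus loads the file. A minimal sketch; the helper names and PROM_HOME=/apps/prometheus are assumptions.

```shell
#!/usr/bin/env bash
# Minimal sketch; gib_to_bytes and check_rules are hypothetical helpers.
PROM_HOME="${PROM_HOME:-/apps/prometheus}"   # assumed Prometheus install directory

# Convert GiB to bytes (bash arithmetic is 64-bit on common platforms).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Validate rule files with promtool; defaults to the rule file edited above.
check_rules() {
  if [ "$#" -eq 0 ]; then
    set -- "$PROM_HOME/rules/yzy_rules.yml"
  fi
  "$PROM_HOME/promtool" check rules "$@"
}
```

For example, gib_to_bytes 2 prints 2147483648, the value used in Pod_all_memory_usage.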
Add the rule file (rules/yzy_rules.yml) to rule_files in prometheus.yml:
vim /apps/prometheus/prometheus.yml
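rule_files takes a list of paths, and the last path component may be a glob, so every file in the rules directory can be loaded at once. A sketch that prints such a section; the directory is the one used above, while the glob default and the helper name are assumptions. Afterwards, promtool check config /apps/prometheus/prometheus.yml checks prometheus.yml together with the rule files it references.

```shell
#!/usr/bin/env bash
# Minimal sketch; prom_rule_files_block is a hypothetical helper.
# Prints a rule_files section; the default glob assumes rule files live in /apps/prometheus/rules/.
prom_rule_files_block() {
  local pattern="${1:-/apps/prometheus/rules/*.yml}"
  # Heredocs do not perform pathname expansion, so the * is printed literally
  cat <<EOF
rule_files:
  - '${pattern}'
EOF
}
```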
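Restart Prometheus, then open Status > Configuration in the Prometheus web UI to check that the new configuration has been loaded.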
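The same check can be done from the command line. Prometheus reloads its configuration on SIGHUP, or on POST /-/reload when it runs with --web.enable-lifecycle, and GET /api/v1/status/config returns the configuration it has loaded. A minimal sketch; the helper names and the default URL are assumptions, and the check is a fixed-string search over the JSON, not a parse.

```shell
#!/usr/bin/env bash
# Minimal sketch; prom_reload and prom_config_contains are hypothetical helpers.
# Assumes Prometheus listens on PROM_URL (9090 is its default port).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Ask Prometheus to re-read prometheus.yml and its rule files.
# Only works with --web.enable-lifecycle; otherwise send SIGHUP or restart the service.
prom_reload() {
  curl -fsS -X POST "$PROM_URL/-/reload"
}

# Succeed if the configuration Prometheus has loaded contains TEXT.
prom_config_contains() {
  curl -fsS "$PROM_URL/api/v1/status/config" | grep -qF -- "$1"
}
```

For example, prom_config_contains 9093 confirms that the Alertmanager target made it into the running configuration.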
Open the Alerts page to check whether the alerts are firing and have been sent.
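Because every rule has for: 2m, an alert first shows as Pending and only becomes Firing, and is sent to Alertmanager, after its condition has held for two minutes. To confirm the alert actually reached Alertmanager, amtool alert query lists the alerts it currently holds. A sketch; the wrapper name and defaults are assumptions.

```shell
#!/usr/bin/env bash
# Minimal sketch; am_alerts is a hypothetical wrapper around amtool.
AM_HOME="${AM_HOME:-/apps/alertmanager}"   # assumed install directory
AM_URL="${AM_URL:-http://localhost:9093}"  # assumed Alertmanager address (default port)

# List alerts currently held by Alertmanager; extra arguments are passed
# through as amtool matchers, e.g. am_alerts alertname=Pod_all_cpu_usage
am_alerts() {
  "$AM_HOME/amtool" alert query --alertmanager.url="$AM_URL" "$@"
}
```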
Check the recipient's inbox for the new alert email.
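If no email arrives, it helps to separate the Prometheus side from the email side. Posting an alert directly to Alertmanager's v2 API (POST /api/v2/alerts) exercises only the route, the receiver and the SMTP settings. A sketch; the alert name MailTest and the helper names are made up for illustration.

```shell
#!/usr/bin/env bash
# Minimal sketch; am_test_alert_payload and am_send_test_alert are hypothetical helpers.
AM_URL="${AM_URL:-http://localhost:9093}"   # assumed Alertmanager address (default port)

# Build a one-alert JSON array in the shape POST /api/v2/alerts accepts.
am_test_alert_payload() {
  local name="${1:-MailTest}"
  printf '[{"labels":{"alertname":"%s","severity":"critical"},"annotations":{"summary":"Alertmanager email test"}}]\n' "$name"
}

# Post the alert straight to Alertmanager, bypassing Prometheus and the rules.
am_send_test_alert() {
  am_test_alert_payload "$@" |
    curl -fsS -X POST -H 'Content-Type: application/json' --data-binary @- "$AM_URL/api/v2/alerts"
}
```

With the route above the email should arrive shortly after group_wait (1s here). Since nothing keeps re-posting the test alert, Alertmanager treats it as resolved after resolve_timeout (1m).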