Prometheus alerting: email notifications with Alertmanager

Download and install

wget -P /apps https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
tar -xvf /apps/alertmanager-0.24.0.linux-amd64.tar.gz -C /apps
ln -sv /apps/alertmanager-0.24.0.linux-amd64/ /apps/alertmanager

Configure start at boot

cat /etc/systemd/system/alertmanager.service 
[Unit]
Description=Prometheus alertmanager
After=network.target

[Service]
ExecStart=/apps/alertmanager/alertmanager --config.file=/apps/alertmanager/alertmanager.yml

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl restart alertmanager
systemctl enable alertmanager

Configure alertmanager.yml

vim alertmanager.yml

global:
  resolve_timeout: 1m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '760478xxx@qq.com'
  smtp_auth_username: '760478xxx@qq.com'
  smtp_auth_password: 'sxcpymhdrkenbegd'
  smtp_hello: 'qq.com'
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 1s
  group_interval: 5s
  repeat_interval: 10s
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    email_configs:
      - to: '1500120xxxx@163.com'   # recipient
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
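
The group_wait, group_interval and repeat_interval values above are very short. That suits a quick test, but with repeat_interval set to 10s the same email is resent every few seconds for as long as the alert fires. A sketch of a calmer route block follows; the numbers are the values Alertmanager documents as its defaults and are meant as a starting point, not a recommendation from the original post:

```yaml
# alertmanager.yml (excerpt): route block with less aggressive timings (assumed values)
route:
  group_by: ['alertname']
  group_wait: 30s        # wait before sending the first notification for a new group
  group_interval: 5m     # wait before notifying about new alerts added to an existing group
  repeat_interval: 4h    # wait before re-sending a notification for alerts that are still firing
  receiver: 'web.hook'
```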

Restart alertmanager, then open it in a browser (default port 9093) and check the Status page

Edit prometheus.yml and set the targets under the alerting section

vim prometheus/prometheus.yml 
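
The alerting section edited in this step typically looks like the sketch below. The target localhost:9093 is an assumption (Alertmanager running on the same host and listening on its default port 9093); substitute the real address if it runs elsewhere.

```yaml
# prometheus.yml (excerpt)
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'   # assumed address; 9093 is Alertmanager's default port
```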

Edit the rules configuration

vim prometheus/rules/yzy_rules.yml 

groups:
  - name: alertmanager_pod.rules
    rules:
    - alert: Pod_all_cpu_usage
      expr: (sum by(name)(rate(container_cpu_usage_seconds_total{image!=""}[5m]))*100) > 1
      for: 2m
      labels:
        severity: critical
        service: pods
      annotations:
        description: Container {{ $labels.name }} CPU usage is above 1% (current value is {{ $value }})
        summary: Dev CPU load alert

    - alert: Pod_all_memory_usage
      #expr: sort_desc(avg by(name)(irate(container_memory_usage_bytes{name!=""}[5m]))*100) > 10% # memory above 10%
      expr: avg by (name) (container_memory_usage_bytes{name!=""}) > 2147483648 # memory above 2 GiB
      for: 2m
      labels:
        severity: critical
      annotations:
        description: Container {{ $labels.name }} memory usage is above 2 GiB (current value is {{ $value }})
        summary: Dev memory load alert

    - alert: Pod_all_network_receive_usage
      expr: sum by (name) (irate(container_network_receive_bytes_total{container_name="POD"}[1m])) > 1
      for: 2m
      labels:
        severity: critical
      annotations:
        description: Container {{ $labels.name }} network receive rate is above 1 B/s (current value is {{ $value }})

    - alert: Node_memory_free
      expr: node_memory_MemFree_bytes < 4*1024*1024*1024 # deliberately written wrong
      for: 2m
      labels:
        severity: critical
      annotations:
        description: Free memory on the node is below 4 GiB

Reference the rules file in prometheus.yml

vim /apps/prometheus/prometheus.yml
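
A minimal sketch of the rule_files entry. It assumes the rules file edited earlier lives at /apps/prometheus/rules/yzy_rules.yml; the post only shows the relative path prometheus/rules/yzy_rules.yml, so the absolute path is an assumption. Prometheus has to be restarted or reloaded before the new rules take effect.

```yaml
# prometheus.yml (excerpt)
rule_files:
  - "/apps/prometheus/rules/yzy_rules.yml"   # assumed absolute path; glob patterns are also accepted
```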

Open the Configuration page (Status > Configuration) and check that the configuration has been loaded

Open the Alerts page and check whether the alerts have fired and been sent

 

Check the inbox for new alert emails

posted @ 2022-08-17 13:47  Maniana