
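Prometheus (4): Configuring Email Alerts with Prometheus + Alertmanager

This guide assumes Prometheus is already installed and running at 192.168.56.200.

1. Install Alertmanager

Alertmanager is installed here from the precompiled binary release, so nothing needs to be built from source. First download the Alertmanager package from: https://github.com/prometheus/alertmanager/releases/download/v0.19.0/alertmanager-0.19.0.linux-amd64.tar.gz

Once the download finishes, upload the package to the /usr/local directory on the machine that runs Prometheus (192.168.56.200).

Extract the Alertmanager package: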

#   tar -zvxf alertmanager-0.19.0.linux-amd64.tar.gz
#   mv alertmanager-0.19.0.linux-amd64/ alertmanager

Enter the extracted alertmanager directory and edit alertmanager.yml to configure alerting. The contents of alertmanager.yml are as follows:

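global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.126.com:465'
  smtp_from: '****@126.com' # mailbox used to send alert emails
  smtp_auth_username: '****@126.com'
  smtp_auth_password: '****'    # the mailbox's authorization code, not its login password
  smtp_require_tls: false

route:  # alert routing policy
  group_by: ['alertname'] # labels used to group alerts
  group_wait: 10s      # after an alert fires, wait 10s so alerts in the same group are sent together
  group_interval: 10s  # wait before notifying about new alerts added to an already-notified group
  repeat_interval: 1m  # wait before re-sending a still-firing alert, which limits duplicate emails; set to 1 minute here for testing
  receiver: 'mail'  # default receiver
  routes: # child routes: which receiver handles which alerts
  - receiver: mail

receivers:
- name: 'mail'
  email_configs:
  - to: '****@126.com' # mailbox that receives alert emails

#inhibit_rules:
#  - source_match:
#      severity: 'critical'
#    target_match:
#      severity: 'warning'
#    equal: ['alertname', 'dev', 'instance']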
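Before relying on Alertmanager to deliver mail, it can help to confirm that the SMTP server accepts the authorization code at all. The following is a minimal Python sketch, not part of the original setup: the function names `split_smarthost` and `check_smtp_login` are made up for illustration, and it assumes port 465 speaks SMTP over implicit TLS, which is what `smtplib.SMTP_SSL` expects.

```python
import smtplib
from typing import Tuple


def split_smarthost(smarthost: str) -> Tuple[str, int]:
    """Split an Alertmanager-style 'host:port' string into (host, port)."""
    host, _, port = smarthost.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {smarthost!r}")
    return host, int(port)


def check_smtp_login(smarthost: str, username: str, password: str,
                     timeout: float = 10.0) -> bool:
    """Return True if the server accepts the credentials over implicit TLS.

    Assumption: the server listens for SMTP over TLS on the given port
    (conventionally 465). Any SMTP or network error is reported as False.
    """
    host, port = split_smarthost(smarthost)
    try:
        with smtplib.SMTP_SSL(host, port, timeout=timeout) as server:
            server.login(username, password)
        return True
    except (smtplib.SMTPException, OSError):
        return False
```

Calling `check_smtp_login("smtp.126.com:465", "<mailbox>", "<authorization code>")` with the same values used for `smtp_auth_username` and `smtp_auth_password` should return True if the credentials are usable.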

Check that the alertmanager.yml configuration is valid:

# ./amtool check-config alertmanager.yml

The configuration is valid.

Start Alertmanager:

#  ./alertmanager

The output shows that the Alertmanager service has started and is listening on port 9093.

Open http://192.168.56.200:9093 (IP:9093) in a browser.

Alertmanager has started successfully.
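The same check can be done without a browser. The sketch below assumes Alertmanager exposes its `/-/healthy` endpoint at the address used above; the helper names `health_url` and `is_healthy` are illustrative only.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL for an Alertmanager base URL."""
    return base_url.rstrip("/") + "/-/healthy"


def is_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For this setup the call would be `is_healthy("http://192.168.56.200:9093")`.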

2. Configure Prometheus

Press Ctrl+C to stop the Alertmanager process, then go to the Prometheus installation directory and edit the Prometheus configuration.

#  cd /usr/local/prometheus
#  vim prometheus.yml

In prometheus.yml, edit the alerting and rule_files sections:

alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']

rule_files:  # alert rule files
  - "rule.yml"

Save the file and exit.

The complete prometheus.yml file is shown below:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rule.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'Linux'
    static_configs:
    - targets: ['192.168.56.201:9100']
      labels:
        instance: Linux

  - job_name: 'Windows'
    static_configs:
    - targets: ['192.168.56.1:9182']
      labels:
        instance: Windows

  - job_name: 'snmp'
    scrape_interval: 10s
    static_configs: 
     - targets: 
       - 172.20.2.83  # IP address of the switch
    metrics_path: /snmp
    # params:
     # module: [if_mib]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.56.100:9116 # address of the snmp_exporter service

Write the alert rule file rule.yml:

#  vim rule.yml

Write the following into the file. (For testing, the rule is set to fire when memory usage exceeds 10%.)

groups:
- name: mem-rule
  rules:
  - alert: "MemoryAlert"
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes )) / node_memory_MemTotal_bytes * 100 > 10
    for: 30s
    labels:
      severity: warning
    annotations:
      summary: "Service: {{$labels.alertname}} memory alert"
      description: "{{ $labels.alertname }} memory utilization is above 10%"
      value: "{{ $value }}"

Save the file and exit.
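To make the rule's expression concrete, here is the same arithmetic as a small Python sketch. The sample byte counts are invented for illustration and the function names are hypothetical; only the formula and the 10% threshold come from the rule.

```python
def memory_used_percent(total: float, free: float,
                        buffers: float, cached: float) -> float:
    """Mirror the rule's expr: (total - (free + buffers + cached)) / total * 100."""
    return (total - (free + buffers + cached)) / total * 100


def exceeds_threshold(used_percent: float, threshold: float = 10.0) -> bool:
    """Mirror the `> 10` comparison at the end of the expression."""
    return used_percent > threshold


GIB = 1024 ** 3
# Invented sample host: 8 GiB total, 1 GiB free, 0.5 GiB buffers, 2.5 GiB cached.
used = memory_used_percent(8 * GIB, 1 * GIB, GIB // 2, 5 * GIB // 2)
print(used, exceeds_threshold(used))
```

On such a host the expression evaluates to 50, so the condition is true. Because of `for: 30s`, the condition has to stay true for 30 seconds before the alert moves from pending to firing.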

3. Test the Alert

Restart the Prometheus service so that the alert rule takes effect:

#  systemctl restart prometheus

Go to the Alertmanager installation directory and start Alertmanager:

#  cd /usr/local/alertmanager
#  ./alertmanager

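After a short wait, log in to the mailbox configured to receive alert emails; the alert email should have arrived.

The alert is also visible in a browser at http://192.168.56.200:9093/#/alerts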
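If no email arrives, it can be useful to test the Alertmanager-to-mailbox path on its own, without waiting for Prometheus. The sketch below posts a hand-made alert to Alertmanager. It assumes the v2 HTTP API (`/api/v2/alerts`) is available in the installed version; the alert name `ManualTest` and the helper names are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(alertname: str, severity: str = "warning") -> list:
    """Build a one-element alert list in the shape the v2 alerts endpoint accepts."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return [{
        "labels": {"alertname": alertname, "severity": severity},
        "annotations": {"summary": "Manual test alert"},
        "startsAt": now,
    }]


def post_alerts(base_url: str, alerts: list, timeout: float = 5.0) -> int:
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

With this setup the call would be `post_alerts("http://192.168.56.200:9093", build_test_alert("ManualTest"))`. Since the route groups by `alertname` with `group_wait: 10s`, the email should follow roughly ten seconds after the alert is accepted.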

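4. Start Alertmanager Automatically at Boot

Press Ctrl+C to stop the Alertmanager process, then create an alertmanager service so that Alertmanager runs as a system service and starts at boot.

Add the system service: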

#  vim /etc/systemd/system/alertmanager.service

Write the following into the file:

[Unit]
Description=alertmanager
After=network.target

[Service]
WorkingDirectory=/usr/local/alertmanager
ExecStart=/usr/local/alertmanager/alertmanager --config.file=alertmanager.yml --log.level=debug --log.format=json
Restart=on-failure

[Install]
WantedBy=multi-user.target

Save the file and exit.

Start the service and enable it at boot:

#  systemctl daemon-reload
#  systemctl enable alertmanager
#  systemctl start alertmanager
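To confirm that both steps took effect, the service state can be queried with `systemctl is-active` and `systemctl is-enabled`. The following Python sketch wraps those two commands; the helper names `systemctl_state` and `describe` are hypothetical, and the unit name `alertmanager` matches the service file created above.

```python
import subprocess


def systemctl_state(action: str, unit: str = "alertmanager") -> str:
    """Run `systemctl <action> <unit>` and return its one-word answer."""
    try:
        result = subprocess.run(
            ["systemctl", action, unit],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return "unknown"  # systemctl is not installed on this machine
    return result.stdout.strip() or "unknown"


def describe(active: str, enabled: str) -> str:
    """Turn the two systemctl answers into a short human-readable summary."""
    running = "running" if active == "active" else "not running"
    boot = "starts at boot" if enabled == "enabled" else "does not start at boot"
    return f"{running}, {boot}"
```

A typical use is `describe(systemctl_state("is-active"), systemctl_state("is-enabled"))`.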

This completes the Prometheus + Alertmanager email alert configuration.

posted @ 2019-11-22 16:53 Aiden郭祥跃