Prometheus Monitoring: Blackbox Exporter (Part 1)

First, set up the environment:

Download link (Baidu Netdisk):

Link: https://pan.baidu.com/s/1xzyoDLnvs8OTq9nLopU32A  Extraction code: jz6m

Install Prometheus

cd /usr/local/src
tar -zxvf prometheus.tar.gz
cp -R prometheus-2.45.3.linux-amd64 /usr/local/prometheus

vim /usr/lib/systemd/system/prometheus.service

[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target
 
[Service]
Type=simple
User=root
WorkingDirectory=/usr/local/prometheus
ExecStart=/usr/local/prometheus/prometheus
Restart=on-failure
 
[Install]
WantedBy=multi-user.target
Reload systemd, enable the service at boot, and start it:
systemctl daemon-reload
systemctl enable prometheus.service
systemctl start prometheus.service

Open http://192.168.230.130:9090/targets?search= in a browser to check the scrape targets.
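The same target list is also available as JSON from the Prometheus HTTP API at /api/v1/targets. A minimal Python sketch (standard library only) that fetches it and summarizes the health of each active target; the base URL is the example host from this post and is an assumption to adjust for your setup:

```python
import json
import urllib.request

# Assumption: Prometheus listens on the example host used in this post.
BASE_URL = "http://192.168.230.130:9090"


def fetch_targets(base_url=BASE_URL):
    """Fetch the JSON behind the /targets page from the HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def summarize_targets(payload):
    """Return (job, instance, health, lastError) for every active target."""
    rows = []
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        rows.append((labels.get("job"), labels.get("instance"),
                     target["health"], target.get("lastError", "")))
    return rows
```

Usage: `for row in summarize_targets(fetch_targets()): print(*row)`. The health field is "up", "down" or "unknown", matching the State column on the /targets page.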

Install Grafana

yum install -y grafana.rpm
 
 
systemctl enable grafana-server
systemctl start grafana-server

Open http://192.168.230.130:3000/ in a browser, log in with the default credentials admin/admin, and add Prometheus as a data source.
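The data source can also be added through the Grafana HTTP API (POST /api/datasources) instead of the UI. A minimal Python sketch that builds that request; it assumes the default admin/admin login is still valid and that Grafana can reach Prometheus at http://127.0.0.1:9090 (both installed on the same host), so adjust these values for your setup:

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url="http://192.168.230.130:3000",
                             prometheus_url="http://127.0.0.1:9090",
                             user="admin", password="admin"):
    """Build a POST /api/datasources request that registers Prometheus."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",   # Grafana's backend queries Prometheus (server access mode)
        "isDefault": True,
    }
    # Grafana accepts HTTP basic auth with a user's login and password.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(build_datasource_request())` creates the data source; Grafana replies with a JSON body describing it.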

Install node_exporter

tar -zxvf node_exporter-1.6.1.linux-amd64.tar.gz
cp -R node_exporter-1.6.1.linux-amd64 /usr/local/node_exporter

vim /usr/lib/systemd/system/node_exporter.service

[Unit]
Description=node_exporter
After=network.target
 
[Service]
Type=simple
User=root
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
 
[Install]
WantedBy=multi-user.target
Reload systemd, enable the service at boot, and start it:
 
systemctl daemon-reload
systemctl enable node_exporter.service
systemctl start node_exporter.service
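To confirm that node_exporter is serving metrics before wiring it into Prometheus, you can read its /metrics page (Prometheus text format) on port 9100. A minimal Python sketch that fetches the page and looks up a metric without labels, such as node_load1; it deliberately ignores labelled series, which is enough for a quick smoke test:

```python
import urllib.request


def fetch_metrics(url="http://127.0.0.1:9100/metrics"):
    """Download node_exporter's metrics page as text."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def unlabelled_value(exposition_text, metric_name):
    """Return the value of a metric that has no labels, or None if absent."""
    for line in exposition_text.splitlines():
        if line.startswith("#"):
            continue  # skip "# HELP" and "# TYPE" comment lines
        parts = line.split()
        # An unlabelled sample looks like: "node_load1 0.25"
        if len(parts) >= 2 and parts[0] == metric_name:
            return float(parts[1])
    return None
```

Usage: `unlabelled_value(fetch_metrics(), "node_load1")` returns the 1-minute load average reported by the host.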

Monitor the host

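Edit prometheus.yml and add a "linux" job for node_exporter under scrape_configs (the "prometheus" job is already in the default file), then restart Prometheus with systemctl restart prometheus:

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "linux"
    static_configs:
      - targets: ["127.0.0.1:9100"]
        labels:
          instance: localhost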

In Grafana, import dashboard ID 8919 to get a host-monitoring dashboard.
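Once Prometheus has been restarted with the new job, you can check the scrape result with the PromQL expression up{job="linux"}, either on the Prometheus Graph page or through the HTTP API endpoint /api/v1/query. A minimal Python sketch of the API route; the base URL is the example host from this post and is an assumption:

```python
import json
import urllib.parse
import urllib.request


def instant_query(expr, base_url="http://192.168.230.130:9090"):
    """Run an instant PromQL query through /api/v1/query."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def up_by_instance(payload):
    """Map instance label -> 1.0 (scrape OK) or 0.0 (scrape failed)."""
    return {
        sample["metric"].get("instance"): float(sample["value"][1])
        for sample in payload["data"]["result"]  # each entry: {"metric": ..., "value": [ts, "v"]}
    }
```

Usage: `up_by_instance(instant_query('up{job="linux"}'))` should report the instance as localhost (the label set in the job above) with value 1.0 while node_exporter is reachable.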

Install Alertmanager

wget https://github.com/prometheus/alertmanager/releases/download/v0.17.0/alertmanager-0.17.0.linux-amd64.tar.gz
tar xvf alertmanager-0.17.0.linux-amd64.tar.gz
mv alertmanager-0.17.0.linux-amd64 /usr/local/alertmanager

Create a systemd unit file

vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager server daemon
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
 
[Service]
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml --storage.path=/usr/local/alertmanager/data
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
KillMode=process
Restart=on-failure
RestartSec=42s
 
[Install]
WantedBy=multi-user.target
# Start the service
systemctl daemon-reload
systemctl start alertmanager
systemctl enable alertmanager
ps -ef|grep alertmanager

Integrating Alertmanager with Prometheus

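Define alerting rules on the Prometheus server
Prometheus evaluates them, and when a monitored value crosses a rule's threshold the alert fires
Prometheus pushes the firing alert (not the rule itself) to Alertmanager
Alertmanager groups the alert, applies silences and inhibitions, and routes it to the configured receivers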
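To check the Prometheus to Alertmanager to receiver path without waiting for a real outage, you can post a hand-made alert directly to the Alertmanager v2 API (POST /api/v2/alerts), the same kind of payload Prometheus sends. A minimal Python sketch that builds such a request; the alertname ManualTest and its labels are hypothetical, and the URL assumes Alertmanager on the same host on port 9093 as configured below:

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(alertmanager_url="http://127.0.0.1:9093", minutes=5):
    """Build a POST /api/v2/alerts request carrying one hypothetical alert."""
    now = datetime.now(timezone.utc)
    alerts = [{
        # Hypothetical labels: choose ones that your routing tree will match.
        "labels": {"alertname": "ManualTest", "severity": "error",
                   "instance": "localhost"},
        "annotations": {"summary": "Manual test alert"},
        "startsAt": now.isoformat(),
        # After endsAt, Alertmanager treats the alert as resolved.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]
    return urllib.request.Request(
        f"{alertmanager_url}/api/v2/alerts",
        data=json.dumps(alerts).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(build_test_alert())` should make the alert appear on the Alertmanager web UI and be routed to your receivers.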

vim /usr/local/prometheus/prometheus.yml
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - test.yml  # file holding the alerting rules (relative paths resolve against this file's directory)
vim /usr/local/prometheus/test.yml
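groups:
  - name: general.rules # a group of related alerts; the group name for this node's rules
    rules:
    - alert: InstanceDown
      expr: up == 0 # Prometheus adds an up metric to every target: 1 = last scrape succeeded, 0 = scrape failed
      for: 1m # up == 0 must hold for 1 minute (the alert stays pending until then) before it fires
      labels: # extra labels attached to the alert
        severity: error # alert severity: error
      annotations: # human-readable text for the notification
        summary: "Instance {{ $labels.instance }} down"
        description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."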

/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
systemctl restart prometheus    
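The for: 1m clause means the rule first goes pending when up == 0 and only turns firing (and is sent to Alertmanager) once the condition has held for a full minute; a single successful scrape in between resets it. A simplified Python model of that state machine, assuming one rule evaluation every 15 s (the evaluation_interval in the default prometheus.yml):

```python
def alert_states(samples, hold_seconds=60):
    """Simplified model of an alerting rule's inactive -> pending -> firing lifecycle.

    samples: list of (eval_time_seconds, up_value) in evaluation order.
    hold_seconds: the rule's `for:` duration (60 for `for: 1m`).
    """
    states = []
    active_since = None  # evaluation time at which up == 0 first became true
    for t, up in samples:
        if up == 0:  # the expression `up == 0` returns a result
            if active_since is None:
                active_since = t
            held = t - active_since
            states.append("firing" if held >= hold_seconds else "pending")
        else:  # condition cleared: the alert resets to inactive
            active_since = None
            states.append("inactive")
    return states


# Evaluations every 15 s: the target goes down at t=15 and recovers at t=90.
samples = [(0, 1), (15, 0), (30, 0), (45, 0), (60, 0), (75, 0), (90, 1)]
print(alert_states(samples))
```

This prints ['inactive', 'pending', 'pending', 'pending', 'pending', 'firing', 'inactive']: the alert fires at t=75, 60 s after the first failed evaluation, so a real outage is reported roughly one evaluation interval plus the for duration after it starts.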

Alert suppression: silences

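A silence is a simple mechanism for muting notifications during a set time window: every alert whose labels match the silence's matchers is not sent while the silence is active.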
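Conceptually, a silence mutes an alert when the current time falls inside the silence's start/end window and every matcher agrees with the alert's labels (a label the alert lacks counts as an empty value, and regex matchers must match the whole value). A minimal Python model of that check; it is an illustration rather than Alertmanager's own code, and Python's re syntax differs slightly from the Go RE2 syntax Alertmanager uses:

```python
import re
from datetime import datetime, timezone


def silence_mutes(silence, alert_labels, now=None):
    """Return True if `silence` would suppress an alert carrying `alert_labels`."""
    if now is None:
        now = datetime.now(timezone.utc)
    # A silence is only active in the window startsAt <= now < endsAt.
    if not (silence["startsAt"] <= now < silence["endsAt"]):
        return False
    for matcher in silence["matchers"]:
        value = alert_labels.get(matcher["name"], "")  # missing label -> ""
        if matcher.get("isRegex"):
            if re.fullmatch(matcher["value"], value) is None:  # anchored match
                return False
        elif value != matcher["value"]:
            return False
    return True  # every matcher agreed with the alert's labels
```

Because all matchers must agree, adding more matchers narrows a silence; a single job matcher mutes every alert from that job.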

Add a silence
http://192.168.230.130:9093/#/alerts

Top right --> New Silence --> Start: start time --> End: end time --> Matchers:
--> Name: enter job --> Value: enter linux-node -->
Creator: enter your name --> Comment: describe the silence --> click Create
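The same silence can be created from a script through the Alertmanager v2 API (POST /api/v2/silences), which is handy for scheduled maintenance. A minimal Python sketch that builds the request; the two-hour duration, creator and comment are example values, and it assumes the v2 API is available (it was added in Alertmanager 0.16.0, so the 0.17.0 used here has it):

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_silence_request(alertmanager_url="http://192.168.230.130:9093",
                          job="linux-node", hours=2,
                          created_by="admin", comment="planned maintenance"):
    """Build a POST /api/v2/silences request muting every alert of one job."""
    start = datetime.now(timezone.utc)
    body = {
        # Same matcher as in the UI steps above: job = linux-node, no regex.
        "matchers": [{"name": "job", "value": job, "isRegex": False}],
        "startsAt": start.isoformat(),
        "endsAt": (start + timedelta(hours=hours)).isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }
    return urllib.request.Request(
        f"{alertmanager_url}/api/v2/silences",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(build_silence_request())` returns a JSON body containing the new silenceID, which can later be used to expire the silence.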

Purpose:
Suppress expected alerts, typically while a system is being rolled out or is under maintenance.

 

  

posted @ 2024-11-27 18:39  w787815