Prometheus Monitoring with Blackbox Exporter (Part 1)
First, set up the environment.
Download link for the installation packages:
Link: https://pan.baidu.com/s/1xzyoDLnvs8OTq9nLopU32A  Extraction code: jz6m
Install Prometheus

    cd /usr/local/src
    tar -zxvf prometheus.tar.gz
    cp -R prometheus-2.45.3.linux-amd64 /usr/local/prometheus
    vim /usr/lib/systemd/system/prometheus.service

    [Unit]
    Description=Prometheus
    Documentation=https://prometheus.io/
    After=network.target

    [Service]
    Type=simple
    User=root
    WorkingDirectory=/usr/local/prometheus
    ExecStart=/usr/local/prometheus/prometheus
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target
Start the service and enable it at boot:

    systemctl daemon-reload
    systemctl enable prometheus.service
    systemctl start prometheus.service
Open http://192.168.230.130:9090/targets?search= in a browser to confirm that the server is up and to see its scrape targets.
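The browser check can also be done from the command line. The sketch below polls the `/-/ready` endpoint that Prometheus exposes; the function names `prom_ready_url` and `wait_for_prometheus` are made up for this example, and the default host and port are taken from the URL above.

```shell
#!/bin/sh
# Sketch: wait for Prometheus to report ready after `systemctl start`.
# prom_ready_url and wait_for_prometheus are hypothetical helper names.

# Print the readiness URL for a host ($1) and port ($2).
prom_ready_url() {
    printf 'http://%s:%s/-/ready\n' "${1:-192.168.230.130}" "${2:-9090}"
}

# Poll the readiness endpoint once per second, up to $3 attempts (default 10).
# Returns 0 as soon as the server answers with a 2xx status, 1 otherwise.
wait_for_prometheus() {
    _url=$(prom_ready_url "$1" "$2")
    _left=${3:-10}
    while [ "$_left" -gt 0 ]; do
        if curl -fsS -o /dev/null "$_url" 2>/dev/null; then
            return 0
        fi
        _left=$((_left - 1))
        sleep 1
    done
    return 1
}
```

For example, `wait_for_prometheus 192.168.230.130 9090 30 && echo ready` waits up to roughly 30 seconds before giving up.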
Install Grafana

    yum install -y grafana.rpm
    systemctl enable grafana-server
    systemctl start grafana-server
Open http://192.168.230.130:3000/ in a browser. The default username and password are admin/admin. Then add Prometheus as a data source.
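As an alternative to clicking through the UI, the data source can probably be registered through Grafana's HTTP API (`POST /api/datasources`). This is a rough sketch only: the helper names are hypothetical, the credentials assume the default admin/admin has not been changed yet, and the JSON is built without escaping, so the arguments must not contain double quotes.

```shell
#!/bin/sh
# Sketch: register Prometheus as a Grafana data source over the HTTP API.
# grafana_datasource_json and add_grafana_datasource are hypothetical names.

# Print the JSON body. $1 = Prometheus URL, $2 = data source name.
# No JSON escaping is done, so keep quotes out of the arguments.
grafana_datasource_json() {
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
        "${2:-Prometheus}" "${1:-http://192.168.230.130:9090}"
}

# POST the body to Grafana. $1 = Grafana base URL, $2 = Prometheus URL,
# $3 = data source name. GRAFANA_AUTH (assumed variable) overrides admin:admin.
add_grafana_datasource() {
    grafana_datasource_json "$2" "$3" |
        curl -fsS -u "${GRAFANA_AUTH:-admin:admin}" \
            -H 'Content-Type: application/json' \
            -X POST --data-binary @- \
            "${1:-http://192.168.230.130:3000}/api/datasources"
}
```

Calling `add_grafana_datasource` with no arguments targets the addresses used in this article.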
Install node_exporter

    tar -zxvf node_exporter-1.6.1.linux-amd64.tar.gz
    cp -R node_exporter-1.6.1.linux-amd64 /usr/local/node_exporter
    vim /usr/lib/systemd/system/node_exporter.service

    [Unit]
    Description=node_exporter
    After=network.target

    [Service]
    Type=simple
    User=root
    ExecStart=/usr/local/node_exporter/node_exporter
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target
Start the service and enable it at boot:

    systemctl daemon-reload
    systemctl enable node_exporter.service
    systemctl start node_exporter.service
Monitor the host
Edit prometheus.yml and add a job named "linux" after the existing localhost:9090 entry:

        static_configs:
          - targets: ["localhost:9090"]

      - job_name: "linux"
        static_configs:
          - targets: ["127.0.0.1:9100"]
            labels:
              instance: localhost
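Once Prometheus has picked up the edited configuration, one way to confirm that the new job is being scraped is to query the `up` metric through the HTTP API. This is a sketch under assumptions: `up_query` and `check_job_up` are hypothetical helper names, and the base URL assumes the command runs on the Prometheus host.

```shell
#!/bin/sh
# Sketch: ask Prometheus whether the targets of one job are up.
# up_query and check_job_up are hypothetical helper names.

# Print the PromQL selector for the up metric of job $1 (default: linux).
up_query() {
    printf 'up{job="%s"}\n' "${1:-linux}"
}

# Run an instant query. $1 = job name, $2 = Prometheus base URL.
# -G sends the URL-encoded data as a query string on a GET request.
check_job_up() {
    curl -fsS -G --data-urlencode "query=$(up_query "$1")" \
        "${2:-http://127.0.0.1:9090}/api/v1/query"
}
```

In the JSON that `check_job_up linux` returns, a sample value of 1 means the last scrape succeeded and 0 means it failed.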
In Grafana, import dashboard ID 8919 as the host monitoring template.
Install Alertmanager

    wget https://github.com/prometheus/alertmanager/releases/download/v0.17.0/alertmanager-0.17.0.linux-amd64.tar.gz
    tar xvf alertmanager-0.17.0.linux-amd64.tar.gz
    mv alertmanager-0.17.0.linux-amd64 /usr/local/alertmanager
Register it as a systemd service

    vim /usr/lib/systemd/system/alertmanager.service

    [Unit]
    Description=alertmanager server daemon
    Documentation=https://prometheus.io/docs/introduction/overview/
    After=network.target

    [Service]
    ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml --storage.path=/usr/local/alertmanager/data
    ExecReload=/bin/kill -HUP $MAINPID
    ExecStop=/bin/kill -s QUIT $MAINPID
    KillMode=process
    Restart=on-failure
    RestartSec=42s

    [Install]
    WantedBy=multi-user.target
    # Start the service, enable it at boot, and check that the process is running
    systemctl daemon-reload
    systemctl start alertmanager
    systemctl enable alertmanager
    ps -ef | grep alertmanager
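Integrating Alertmanager with Prometheus
Alerting rules are defined on the Prometheus server.
When a monitored value crosses the threshold defined in a rule, an alert is raised.
Prometheus pushes the alert to Alertmanager.
Alertmanager runs it through its processing pipeline and notifies the alert receivers.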
    vim /usr/local/prometheus/prometheus.yml

    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - 127.0.0.1:9093

    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
      - test.yml   # file that holds the alerting rules
    vim /usr/local/prometheus/test.yml

    groups:
      - name: general.rules   # group name for a set of related alerts on the monitored nodes
        rules:
          - alert: InstanceDown
            expr: up == 0     # every scraped target gets an up metric: 0 means the scrape failed, 1 means the target is alive
            for: 1m           # up == 0 must hold for 1 minute before the alert fires
            labels:           # alert severity
              severity: error
            annotations:      # descriptive text attached to the alert
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

Validate the configuration, then restart Prometheus:

    /usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
    systemctl restart prometheus
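To make the effect of the `for` clause concrete, here is a deliberately simplified model of how an alert moves between inactive, pending, and firing across successive rule evaluations. It is not Prometheus code: `alert_states` is a hypothetical function, and the 15-second evaluation interval in the example call is an assumption (the real value comes from `evaluation_interval` in prometheus.yml).

```shell
#!/bin/sh
# Simplified model of the `for` clause (an illustration, not Prometheus code).
# Usage: alert_states FOR_SECONDS INTERVAL_SECONDS SAMPLE...
# Each SAMPLE is the value of `up` at one rule evaluation (0 or 1).
alert_states() {
    _for=$1
    _interval=$2
    shift 2
    _streak=0   # consecutive evaluations where up == 0
    _out=
    for _up in "$@"; do
        if [ "$_up" -eq 0 ]; then
            _streak=$((_streak + 1))
            # Seconds since the expression first became true in this streak.
            _elapsed=$(( ($_streak - 1) * $_interval ))
            if [ "$_elapsed" -ge "$_for" ]; then
                _state=firing
            else
                _state=pending
            fi
        else
            # Any successful scrape resets the alert.
            _streak=0
            _state=inactive
        fi
        _out="${_out}${_out:+ }${_state}"
    done
    printf '%s\n' "$_out"
}

# for: 1m with an assumed 15s evaluation interval
alert_states 60 15 1 0 0 0 0 0 1
# prints: inactive pending pending pending pending firing inactive
```

In this model the alert stays pending until `up == 0` has held for the full minute, and a single successful scrape resets it, which is why short blips do not produce notifications.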
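Reducing alert noise: silences
A silence is a simple mechanism for muting notifications during a given time window. It uses label matchers to select the alerts whose notifications should not be sent.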
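Add a silence
Open http://192.168.230.130:9093/#/alerts and click New Silence in the top right corner.
Set Start to the time the silence begins and End to the time it ends.
Under Matchers, enter job as the Name and linux-node as the Value.
In Creator, enter a name for the silence.
Click Create.
Purpose
A silence blocks alerts that are expected, and is typically used while a system is being rolled out or is under maintenance.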
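The same silence can probably be created without the web UI by posting to Alertmanager's v2 API (`/api/v2/silences`). This is a sketch under assumptions: `silence_json` and `post_silence` are hypothetical helper names, the timestamps in the example are placeholders in RFC 3339 format, and no JSON escaping is done, so the arguments must not contain double quotes.

```shell
#!/bin/sh
# Sketch: create a silence through the Alertmanager v2 API.
# silence_json and post_silence are hypothetical helper names.

# Print the request body.
# Usage: silence_json NAME VALUE STARTS_AT ENDS_AT CREATED_BY COMMENT
# No JSON escaping is done, so keep quotes out of the arguments.
silence_json() {
    printf '{"matchers":[{"name":"%s","value":"%s","isRegex":false}],' "$1" "$2"
    printf '"startsAt":"%s","endsAt":"%s",' "$3" "$4"
    printf '"createdBy":"%s","comment":"%s"}\n' "$5" "$6"
}

# Read the body from stdin and POST it. $1 = Alertmanager base URL.
post_silence() {
    curl -fsS -H 'Content-Type: application/json' \
        --data-binary @- "${1:-http://192.168.230.130:9093}/api/v2/silences"
}

# Example with placeholder times, creator, and comment:
# silence_json job linux-node 2024-01-01T00:00:00Z 2024-01-01T02:00:00Z \
#     ops "planned maintenance" | post_silence
```

Splitting the body builder from the HTTP call makes it possible to inspect the JSON before anything is sent.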