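Prometheus + Grafana + Alertmanager: Usage Summary
Prometheus is an open-source monitoring and alerting system with a built-in time-series database. Grafana is commonly used alongside it to visualize the data.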
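1. Monitoring System Architecture
1.1 Core Components
- Prometheus Server: scrapes and stores time-series data, and also provides querying and alert rule configuration management.
- exporters: data collectors, for example node_exporter for host metrics and MongoDB exporter for MongoDB metrics.
- alertmanager: manages alert notifications.
- Grafana: the module that visualizes monitoring data as charts.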
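2. Installing the Base Components
Because this setup is for learning and experimentation, the environment is installed quickly with Docker.
2.1 Installing Node Exporter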
- docker-compose-node-export.yml
version: '3'
services:
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
2.2 Installing Alertmanager
- docker-compose-alertmanager.yml
version: '3'
services:
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
- alertmanager.yml
global:
  smtp_smarthost: 'smtp.qq.com:25'         # QQ Mail SMTP server
  smtp_from: '793272861@qq.com'            # sender address
  smtp_auth_username: '793272861@qq.com'   # SMTP username, i.e. the sender's mailbox
  smtp_auth_password: '****************'   # SMTP password of the sender's mailbox
  smtp_require_tls: false                  # do not require TLS
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: live-monitoring
receivers:
  - name: 'live-monitoring'
    email_configs:
      - to: '793272861@qq.com'             # recipient address
2.3 Installing Prometheus
- docker-compose-prometheus.yml
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
- prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
# Scrape jobs: poll the targets periodically to pull monitoring data
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'node-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['node-exporter:9100']
2.4 Installing Grafana
- docker-compose-grafana.yml
version: '3'
services:
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
- Add a data source (Prometheus)
- Open http://localhost:3000/ (the port published in the compose file). Default username: admin, password: admin
2.5 Docker Compose Script
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - monitor
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - monitor
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
    networks:
      - monitor
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor
networks:
  monitor:
    driver: bridge
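3. Configuring Grafana Dashboards
Grafana pulls data from Prometheus with PromQL queries and renders it in panels. A Grafana dashboard is made up of a set of such panels.
3.1 Downloading Grafana Dashboard Files
Ready-made Grafana dashboard files can be downloaded from the official site and imported into your own Grafana instance for immediate use.
Recommended Grafana dashboards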
- JVM (Micrometer)
- Spring Boot 2.1 Statistics
- Basic host monitoring (CPU, memory, disk, network)
- Node Exporter for Prometheus Dashboard CN
- Druid Connection Pool Dashboard
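Importing a Grafana dashboard
3.2 Adding and Modifying Grafana Panels (Extension)
The official Spring Boot 2.1 Statistics dashboard has no charts for third-party (outbound) requests. Taking it as an example, we add two panels for third-party requests: Client Request Count and Client Response Time.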
Client Request Count
irate(http_client_requests_seconds_count{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])
Note: the meter name in the application must be http.client.requests.
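The query reads the series http_client_requests_seconds_count, while the application registers a meter named http.client.requests. The sketch below illustrates, in simplified form, how a timer's dotted name ends up as Prometheus series names: dots become underscores, the base unit (seconds) is appended, and the count and sum series each get a suffix. The class and method names are hypothetical, and this is not Micrometer's actual naming code, which covers more cases.

```java
import java.util.List;

// Hypothetical helper, not part of Micrometer: a simplified illustration of
// how a timer's meter name maps to the Prometheus series names used in PromQL.
final class PrometheusTimerNames {

    // Converts a dotted meter name into a Prometheus-style base name for a timer.
    static String baseName(String meterName) {
        String snake = meterName.replace('.', '_');
        // Timers are exported in seconds, so the unit suffix is appended once.
        return snake.endsWith("_seconds") ? snake : snake + "_seconds";
    }

    // Returns the count and sum series that the panel queries rely on.
    static List<String> seriesNames(String meterName) {
        String base = baseName(meterName);
        return List.of(base + "_count", base + "_sum");
    }

    public static void main(String[] args) {
        // prints [http_client_requests_seconds_count, http_client_requests_seconds_sum]
        System.out.println(seriesNames("http.client.requests"));
    }
}
```

If the meter were registered under a different name, the series names in the panel queries would have to change accordingly.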
Client Response Time
irate(http_client_requests_seconds_sum{instance="$instance", application="$application",uri!~".*actuator.*"}[5m]) / irate(http_client_requests_seconds_count{instance="$instance", application="$application",uri!~".*actuator.*"}[5m])
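To make the arithmetic behind the two queries concrete: irate() takes the last two samples in the range and divides the increase by the time between them. Dividing the rate of the _sum series by the rate of the _count series therefore gives the average duration of the requests completed between those two scrapes. The following is a minimal sketch with made-up sample values; the class and method names are hypothetical, and counter resets are ignored for brevity.

```java
// Hypothetical sketch of the arithmetic behind the two panel queries.
// Counter resets are deliberately not handled here.
final class IrateSketch {

    // Per-second rate of increase between the last two samples of a counter.
    static double irate(double prevTimeSec, double prevValue,
                        double lastTimeSec, double lastValue) {
        return (lastValue - prevValue) / (lastTimeSec - prevTimeSec);
    }

    // Average request duration: rate of total seconds divided by rate of requests.
    static double averageResponseSeconds(double sumRate, double countRate) {
        return sumRate / countRate;
    }

    public static void main(String[] args) {
        // Made-up samples from two scrapes taken 5 seconds apart.
        double countRate = irate(100, 40, 105, 50);     // 10 requests in 5 s
        double sumRate = irate(100, 8.0, 105, 10.5);    // 2.5 s of request time in 5 s

        System.out.println("requests per second: " + countRate);
        System.out.println("average response time (s): "
                + averageResponseSeconds(sumRate, countRate));
    }
}
```

With these values the request rate is 2 per second and the average response time is 0.25 seconds.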
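4. Integrating Micrometer with Spring Boot
Metrics (measurements, indicators)
Micrometer provides a vendor-neutral interface for timers, gauges, counters, distribution summaries, and long task timers. It has a dimensional data model: when paired with a dimensional monitoring system, a specific named metric can be accessed efficiently and then drilled into across its dimensions.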
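The idea of a dimensional data model can be illustrated without any library: a series is identified by a meter name plus a set of tags, and a query can aggregate over all tags or drill down along one of them. The sketch below uses only the Java standard library. DimensionalCounters and its methods are hypothetical names for this illustration and are not Micrometer's API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, stdlib-only illustration of a dimensional data model.
// This is not Micrometer's API; it only shows that name + tags identify a series.
final class DimensionalCounters {

    // meter name -> (tag set -> counter)
    private final Map<String, Map<Map<String, String>, LongAdder>> meters =
            new ConcurrentHashMap<>();

    // Increments the series identified by the meter name and its tags.
    void increment(String name, Map<String, String> tags) {
        meters.computeIfAbsent(name, n -> new ConcurrentHashMap<>())
              .computeIfAbsent(Map.copyOf(tags), t -> new LongAdder())
              .increment();
    }

    // Total across every tag combination of a meter.
    long total(String name) {
        long sum = 0;
        for (LongAdder adder : meters.getOrDefault(name, Map.of()).values()) {
            sum += adder.sum();
        }
        return sum;
    }

    // Drill-down: total of the series whose tag tagKey has the value tagValue.
    long totalWhere(String name, String tagKey, String tagValue) {
        long sum = 0;
        for (Map.Entry<Map<String, String>, LongAdder> e
                : meters.getOrDefault(name, Map.of()).entrySet()) {
            if (tagValue.equals(e.getKey().get(tagKey))) {
                sum += e.getValue().sum();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        DimensionalCounters registry = new DimensionalCounters();
        registry.increment("http.client.requests", Map.of("method", "GET", "status", "200"));
        registry.increment("http.client.requests", Map.of("method", "GET", "status", "200"));
        registry.increment("http.client.requests", Map.of("method", "GET", "status", "500"));
        registry.increment("http.client.requests", Map.of("method", "POST", "status", "200"));

        System.out.println("all requests: " + registry.total("http.client.requests"));
        System.out.println("status=200: "
                + registry.totalWhere("http.client.requests", "status", "200"));
    }
}
```

In PromQL the same drill-down is expressed with label matchers such as status="200" on the series.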
4.1 Adding Dependencies
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>${micrometer.version}</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
4.2 Enabling the Prometheus Endpoint
spring:
  application:
    name: spring-boot-node
management:
  metrics:
    # 1. Add global tags; they can later be used as variables when querying the data
    tags:
      application: ${spring.application.name}
  endpoints:
    web:
      exposure:
        # 2. Expose the prometheus endpoint
        include: 'health,prometheus'
4.3 Monitoring Third-Party Requests
OkHttpMetricsEventListener makes it easy to monitor the requests sent by an OkHttp client.
Configuring the OkHttp client event listener
// meterRegistry and metricsProperties are assumed to be fields injected into the enclosing configuration class.
@Bean("okHttpClient")
public OkHttpClient okHttpClient(ConnectionPool connectionPool) {
    return new OkHttpClient().newBuilder()
            .connectionPool(connectionPool)
            .connectTimeout(5, TimeUnit.SECONDS)
            .readTimeout(10, TimeUnit.SECONDS)
            .eventListener(eventListener())
            .build();
}

/**
 * Event listener: OkHttpMetricsEventListener.
 * metricsProperties.getWeb().getClient().getRequestsMetricName() returns 'http.client.requests',
 * which is used as the meter name.
 *
 * @return the listener that records metrics for OkHttp requests
 */
private EventListener eventListener() {
    return OkHttpMetricsEventListener.builder(
                    meterRegistry, metricsProperties.getWeb().getClient().getRequestsMetricName())
            .build();
}
How it works: OkHttpMetricsEventListener.java
public class OkHttpMetricsEventListener extends EventListener {
    /**
     * Header name for URI patterns which will be used for tag values.
     */
    public static final String URI_PATTERN = "URI_PATTERN";

    // Other fields and callbacks are omitted from this excerpt.

    @Override
    public void callFailed(Call call, IOException e) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.exception = e;
            // The request has finished: record the metrics
            time(state);
        }
    }

    @Override
    public void responseHeadersEnd(Call call, Response response) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.response = response;
            // The request has finished: record the metrics
            time(state);
        }
    }

    private void time(CallState state) {
        String uri = state.response == null ? "UNKNOWN" :
                (state.response.code() == 404 || state.response.code() == 301 ? "NOT_FOUND" : urlMapper.apply(state.request));

        // Define the tags (dimensions) that can be used as variables in Prometheus and Grafana
        Iterable<Tag> tags = Tags.concat(extraTags, Tags.of(
                "method", state.request != null ? state.request.method() : "UNKNOWN",
                "uri", uri,
                "status", getStatusMessage(state.response, state.exception),
                "host", state.request != null ? state.request.url().host() : "UNKNOWN"
        ));

        // Register the timer and record the elapsed time; Prometheus can then pull the data
        // from the /actuator/prometheus endpoint exposed by Spring Boot Actuator
        Timer.builder(this.requestsMetricName)
                .tags(tags)
                .description("Timer of OkHttp operation")
                .register(registry)
                .record(registry.config().clock().monotonicTime() - state.startTime, TimeUnit.NANOSECONDS);
    }
}
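The pattern in this listener can be reduced to a few lines: keep per-call state in a map, and when the call ends, remove the state and record the elapsed time only if the removal succeeded. Because remove() hands the state to exactly one caller, a call is timed once even if more than one end callback fires for it. The sketch below is a stdlib-only illustration of that pattern, assuming the start time is captured when the call begins. CallTimingSketch and its methods are hypothetical names, not OkHttp or Micrometer classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

// Hypothetical, stdlib-only sketch of the "remove state, then record once" pattern.
final class CallTimingSketch {

    private final Map<Object, Long> startTimes = new ConcurrentHashMap<>();
    private final LongSupplier nanoClock;
    private final LongAdder recordedCalls = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    CallTimingSketch(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Remembers when the call started (assumed to happen in a start callback).
    void callStart(Object call) {
        startTimes.put(call, nanoClock.getAsLong());
    }

    // Called from any end callback; only the first invocation per call records.
    void callFinished(Object call) {
        Long start = startTimes.remove(call);
        if (start != null) {
            recordedCalls.increment();
            totalNanos.add(nanoClock.getAsLong() - start);
        }
    }

    long recordedCalls() {
        return recordedCalls.sum();
    }

    long totalNanos() {
        return totalNanos.sum();
    }

    public static void main(String[] args) {
        long[] now = {0};                       // fake clock for a deterministic demo
        CallTimingSketch sketch = new CallTimingSketch(() -> now[0]);

        sketch.callStart("call-1");
        now[0] = 1_500_000;                     // 1.5 ms later
        sketch.callFinished("call-1");
        sketch.callFinished("call-1");          // second end callback is ignored

        System.out.println(sketch.recordedCalls() + " call(s), "
                + sketch.totalNanos() + " ns in total");
    }
}
```

The injected clock plays the role that registry.config().clock() plays in the excerpt above and keeps the example deterministic.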