Enabling Prometheus metrics in Spring Boot to capture HTTP request throughput, latency, and more
I. Related documentation
https://mvnrepository.com/artifact/io.micrometer/micrometer-registry-prometheus
https://github.com/micrometer-metrics/micrometer
https://micrometer.io/docs
https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#actuator.metrics.supported.spring-mvc
https://yunlzheng.gitbook.io/prometheus-book/
II. Exposing Prometheus metrics from a Spring Boot project
a. Add the dependencies to pom.xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
b. Register a registry customizer in the application class
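import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.web.servlet.ServletComponentScan;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
@ServletComponentScan
public class OneApplication {

    public static void main(String[] args) {
        SpringApplication.run(OneApplication.class, args);
    }

    // Very important: adds an "application" tag to every meter,
    // taking its value from spring.application.name.
    @Bean
    MeterRegistryCustomizer<MeterRegistry> configurer(
            @Value("${spring.application.name}") String applicationName) {
        return registry -> registry.config().commonTags("application", applicationName);
    }
}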
c. Add the following to the configuration
management.endpoints.web.exposure.include=*
management.metrics.tags.application=one
management.metrics.web.server.request.metric-name=http.server.requests
III. After starting the Spring Boot project, view the exposed metrics
http://localhost:8080/actuator/prometheus
IV. Add a scrape target in Prometheus
cd \data\prometheu && vim prometheus.yml
Add the scrape target:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  # Newly added scrape target
  - job_name: "one"
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ["localhost:8080"]
V. View the collected data in Prometheus
http://localhost:9090/
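Search for the name of a collected metric, for example http_server_requests_seconds_count.
VI. Optionally, install Grafana to view the data
1. Install Grafana and log in
2. In Grafana, add a Prometheus data source
3. Add dashboards (importing an existing one is easier)
Download a template from https://grafana.com/grafana/dashboards/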
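VII. A few special requirements
1. Customize metric names
2. Attach a custom tag to every request
3. Rename the default tags
4. Some requests return status 200 but are errors at the business level (for example, invalid request parameters); how to instrument them
5. Configure Grafana charts to show throughput, error distribution, and latency
6. Get the number of users currently online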
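For item 4, one option is a dedicated counter that is incremented whenever a request fails business validation, regardless of the HTTP status. The following is only a sketch under assumptions: the class `BusinessErrorMetrics`, the metric name `business.errors`, and the tag names `uri` and `reason` are all made up for illustration, and `SimpleMeterRegistry` stands in for the registry that Spring Boot would normally inject.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class BusinessErrorMetrics {

    private final MeterRegistry registry;

    public BusinessErrorMetrics(MeterRegistry registry) {
        this.registry = registry;
    }

    // Count one business-level failure, even though the HTTP status was 200.
    // "business.errors", "uri" and "reason" are hypothetical names.
    public void record(String uri, String reason) {
        Counter.builder("business.errors")
                .description("Requests that returned 200 but failed business validation")
                .tag("uri", uri)
                .tag("reason", reason)
                .register(registry) // returns the existing counter if one is already registered
                .increment();
    }

    public static void main(String[] args) {
        // In a Spring Boot app the MeterRegistry would be injected instead.
        MeterRegistry registry = new SimpleMeterRegistry();
        BusinessErrorMetrics metrics = new BusinessErrorMetrics(registry);

        metrics.record("/user/detail/{id}", "invalid_param");
        metrics.record("/user/detail/{id}", "invalid_param");

        double count = registry.get("business.errors")
                .tag("reason", "invalid_param")
                .counter()
                .count();
        System.out.println("business errors: " + count);
    }
}
```

With the Prometheus registry, a counter named this way would show up on /actuator/prometheus as business_errors_total, so the error distribution can be charted by the `reason` label.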
Notes:
Add a custom tag: give each endpoint a team identifier
import java.util.List;

import io.micrometer.core.annotation.Timed;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Timed
public class MyController {

    @GetMapping("/api/addresses")
    public List<Address> listAddress() {
        return ...
    }

    @GetMapping("/api/people")
    @Timed(extraTags = { "team", "test" })
    @Timed(value = "all.people", longTask = true)
    public List<Person> listPeople() {
        return ...
    }
}
Remember to add the following to the config as well
management.metrics.tags.team=""
Sample data from the metrics output
# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
# http_server_requests_seconds_count is the number of requests: 3
http_server_requests_seconds_count{application="hello",exception="None",method="POST",outcome="SUCCESS",status="200",uri="/prometheus/post/{id}",} 3.0
# http_server_requests_seconds_sum is the total response time of those 3 requests: 3.021 s
http_server_requests_seconds_sum{application="hello",exception="None",method="POST",outcome="SUCCESS",status="200",uri="/prometheus/post/{id}",} 3.0210991
http_server_requests_seconds_count{application="hello",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 110.0
http_server_requests_seconds_sum{application="hello",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 3.8388842
# HELP is the description of the metric
# HELP http_server_requests_seconds_max
# TYPE is the type of the metric
# TYPE http_server_requests_seconds_max gauge
# http_server_requests_seconds_max is the longest single response time among recent requests: 2.00 s
# (Micrometer tracks this maximum over a sliding time window)
http_server_requests_seconds_max{application="hello",exception="None",method="POST",outcome="SUCCESS",status="200",uri="/prometheus/post/{id}",} 2.0021618
http_server_requests_seconds_max{application="hello",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 0.0345458
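Average latency is not exported as its own series; it is the ratio of `_sum` to `_count`. The small worked sketch below uses the POST figures from the sample above. The class and method names are illustrative only.

```java
import java.util.Locale;

public class AverageLatency {

    // Mean response time in seconds = total seconds (_sum) / number of requests (_count).
    public static double meanSeconds(double sumSeconds, double count) {
        if (count == 0) {
            return 0.0; // no requests recorded yet, avoid dividing by zero
        }
        return sumSeconds / count;
    }

    public static void main(String[] args) {
        // Values copied from the POST /prometheus/post/{id} sample.
        double sum = 3.0210991;
        double count = 3.0;
        System.out.println(String.format(Locale.ROOT, "mean = %.3f s", meanSeconds(sum, count)));
    }
}
```

For the sample this gives roughly 1.007 s per request. On the Prometheus side the same idea is usually applied to rates over a time window, i.e. the rate of `_sum` divided by the rate of `_count`, so the average reflects recent traffic rather than the whole process lifetime.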
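Prometheus metric types: Counter, Gauge, Histogram, and Summary
Routine data a site owner needs
1. Total users currently online: a line chart of user count over time
2. Throughput: the number of requests handled at each point in time, as a line chart
3. Endpoint response time: the response time of each endpoint, as a line chart with time on the x-axis
4. Error distribution: how requests are distributed across status codes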
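For the online-user count, one possible approach is a gauge backed by an `AtomicInteger`. This is a sketch under assumptions: "online" is taken to mean "has an open session", and the class `OnlineUsers`, its methods, and the metric name `online.users` are hypothetical. In the application, `sessionOpened` and `sessionClosed` could be called from an `HttpSessionListener`; here `main` calls them directly and `SimpleMeterRegistry` stands in for the injected registry.

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.util.concurrent.atomic.AtomicInteger;

public class OnlineUsers {

    // Keep a strong reference: the registry only holds the gauge's number weakly.
    private final AtomicInteger online;

    public OnlineUsers(MeterRegistry registry) {
        // registry.gauge returns the AtomicInteger it was given and reads it on every scrape.
        this.online = registry.gauge("online.users", new AtomicInteger(0));
    }

    public void sessionOpened() {
        online.incrementAndGet();
    }

    public void sessionClosed() {
        online.decrementAndGet();
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        OnlineUsers users = new OnlineUsers(registry);

        users.sessionOpened();
        users.sessionOpened();
        users.sessionOpened();
        users.sessionClosed();

        System.out.println("online: " + registry.get("online.users").gauge().value());
    }
}
```

With the Prometheus registry this gauge would be exported as online_users, which can be plotted directly as a line chart over time.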
A few simple charts
VIII. Set a 5 s minimum and 10 s maximum expected value, and count the responses falling in the 0 ms to 0.1 s, 0.1 s to 0.5 s, and 0.5 s to 1.5 s ranges
package com.example.one.config;

import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.distribution.DistributionStatisticConfig;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.time.Duration;

@Configuration
public class MicrometerConfig {

    @Bean
    MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
        // Timer values are expressed in nanoseconds.
        double minExpected = Duration.ofMillis(5000).toNanos();
        double maxExpected = Duration.ofSeconds(10).toNanos();
        return registry -> registry.config().meterFilter(new MeterFilter() {
            @Override
            public DistributionStatisticConfig configure(Meter.Id id, DistributionStatisticConfig config) {
                // Only timers whose name starts with "http" or "hystrix" get histogram buckets.
                if (id.getType() == Meter.Type.TIMER && id.getName().matches("^(http|hystrix).*")) {
                    return DistributionStatisticConfig.builder()
                            .percentilesHistogram(true)
                            // Explicit bucket boundaries: 0.1 s, 0.5 s, 1 s, 1.5 s, 3 s, 5 s
                            .serviceLevelObjectives(
                                    Duration.ofMillis(100).toNanos(),
                                    Duration.ofMillis(500).toNanos(),
                                    Duration.ofMillis(1000).toNanos(),
                                    Duration.ofMillis(1500).toNanos(),
                                    Duration.ofSeconds(3).toNanos(),
                                    Duration.ofSeconds(5).toNanos())
                            .minimumExpectedValue(minExpected)
                            .maximumExpectedValue(maxExpected)
                            .build()
                            .merge(config);
                }
                return config;
            }
        });
    }
}
The resulting data
http_server_requests_seconds_bucket{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",le="0.1",} 0.0
http_server_requests_seconds_bucket{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",le="0.5",} 0.0
http_server_requests_seconds_bucket{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",le="1.0",} 0.0
http_server_requests_seconds_bucket{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",le="1.5",} 1.0
http_server_requests_seconds_bucket{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",le="3.0",} 2.0
http_server_requests_seconds_bucket{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",le="5.0",} 2.0
http_server_requests_seconds_bucket{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",le="5.726623061",} 2.0
http_server_requests_seconds_bucket{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",le="7.158278826",} 2.0
http_server_requests_seconds_bucket{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",le="8.589934591",} 2.0
http_server_requests_seconds_bucket{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",le="10.0",} 2.0
http_server_requests_seconds_bucket{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",le="+Inf",} 2.0
http_server_requests_seconds_count{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",} 2.0
http_server_requests_seconds_sum{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",} 3.126248
# HELP http_server_requests_seconds_max
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 0.5549248
http_server_requests_seconds_max{application="one",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/user/detail/{id}",} 2.0041966
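The data above comes from two requests: one that took about 1 s (it first appears in the le="1.5" bucket) and one that took about 2 s (it first appears in the le="3.0" bucket).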
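Because the `le` buckets are cumulative, the number of responses inside a range such as 0.5 s to 1.5 s is the difference between two neighbouring buckets, not a bucket value on its own. The sketch below works this out for the first few buckets of the sample output; the class and method names are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BucketRanges {

    // Turn cumulative `le` bucket counts into per-range counts by subtracting
    // the previous bucket. The input must be ordered by ascending upper bound.
    public static Map<String, Double> perRange(Map<String, Double> cumulative) {
        Map<String, Double> ranges = new LinkedHashMap<>();
        String previousBound = "0";
        double previousCount = 0.0;
        for (Map.Entry<String, Double> bucket : cumulative.entrySet()) {
            ranges.put(previousBound + " ~ " + bucket.getKey(), bucket.getValue() - previousCount);
            previousBound = bucket.getKey();
            previousCount = bucket.getValue();
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Upper bound (seconds) -> cumulative count, copied from the sample output.
        Map<String, Double> cumulative = new LinkedHashMap<>();
        cumulative.put("0.1", 0.0);
        cumulative.put("0.5", 0.0);
        cumulative.put("1.0", 0.0);
        cumulative.put("1.5", 1.0);
        cumulative.put("3.0", 2.0);
        cumulative.put("5.0", 2.0);

        perRange(cumulative).forEach((range, count) ->
                System.out.println(range + " s: " + count));
    }
}
```

For the sample, the only non-zero ranges are 1.0 ~ 1.5 s and 1.5 ~ 3.0 s, with one request each, which matches the two requests described above.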