Prometheus integration: simpleclient, Pushgateway, and a walkthrough of the client source code
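How metrics are ingested
Official source repository: https://github.com/prometheus/client_java
- The target collects its own metrics and exposes a port; the Prometheus server actively pulls (scrapes) the data
- The target pushes to the Pushgateway, and Prometheus pulls from the Pushgateway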
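Exposing a port on the target
There are two implementations in this mode:
- Plain metric collection with an exposed endpoint
- Using actuator with its built-in micrometer, plus the Prometheus registry (micrometer-registry-prometheus). For Spring Boot projects this makes business instrumentation more convenient; since Spring Boot 2.x, actuator integrates micrometer internally
simpleclient: usage and how it works
- Dependencies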
<dependency>
<groupId>io.prometheus</groupId>
<artifactId>simpleclient</artifactId>
<version>0.10.0</version>
</dependency>
<!-- Hotspot JVM metrics-->
<dependency>
<groupId>io.prometheus</groupId>
<artifactId>simpleclient_hotspot</artifactId>
<version>0.10.0</version>
</dependency>
<!-- Exposition HTTPServer-->
<dependency>
<groupId>io.prometheus</groupId>
<artifactId>simpleclient_httpserver</artifactId>
<version>0.10.0</version>
</dependency>
<!-- Pushgateway exposition-->
<!--
<dependency>
<groupId>io.prometheus</groupId>
<artifactId>simpleclient_pushgateway</artifactId>
<version>0.10.0</version>
</dependency>
-->
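As the comments above suggest, adding the simpleclient_pushgateway dependency is what enables pushing to the Pushgateway.
- Usage
See the README in the source repository, which comes with examples, so I won't repeat them here.
- How it works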
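- simpleclient_hotspot mainly collects JVM-related metrics
- simpleclient wraps the basic data structures and the collection logic; it is the lowest layer
- httpserver exposes the collected data and accepts requests
- Appetizer: starting from the Prometheus scrape (the httpserver module)
The httpserver module contains just one class, which exposes the endpoint and accepts requests. It is short and well crafted, worth a close read, and a good fit if you want a lightweight web service (lightweight meaning focused on core functionality).
- NamedDaemonThreadFactory: the thread factory used by the thread pool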
static class NamedDaemonThreadFactory implements ThreadFactory {
private static final AtomicInteger POOL_NUMBER = new AtomicInteger(1);
private final int poolNumber = POOL_NUMBER.getAndIncrement();
private final AtomicInteger threadNumber = new AtomicInteger(1);
private final ThreadFactory delegate;
private final boolean daemon;
/**
*
* @param delegate
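* @param daemon usually false, which creates ordinary user threads: they keep running after the main thread exits.
* If true, the threads are daemon threads: the JVM exits once only daemon threads remain, so they end with it. Garbage collection threads are daemon threads.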
*/
NamedDaemonThreadFactory(ThreadFactory delegate, boolean daemon) {
this.delegate = delegate;
this.daemon = daemon;
}
@Override
public Thread newThread(Runnable r) {
Thread t = delegate.newThread(r);
t.setName(String.format("prometheus-http-%d-%d", poolNumber, threadNumber.getAndIncrement()));
t.setDaemon(daemon);
return t;
}
static ThreadFactory defaultThreadFactory(boolean daemon) {
return new NamedDaemonThreadFactory(Executors.defaultThreadFactory(), daemon);
}
}
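To make the daemon flag and the naming scheme concrete, here is a minimal JDK-only sketch of the same idea. The class name `DaemonFactoryDemo`, the helper `named`, and the `demo-http` prefix are made up for illustration and are not part of client_java.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class DaemonFactoryDemo {
    // Hypothetical helper: wraps the JDK default factory, then names the
    // thread and sets its daemon flag, like NamedDaemonThreadFactory does.
    static ThreadFactory named(String prefix, boolean daemon) {
        ThreadFactory delegate = Executors.defaultThreadFactory();
        AtomicInteger counter = new AtomicInteger(1);
        return r -> {
            Thread t = delegate.newThread(r);
            t.setName(prefix + "-" + counter.getAndIncrement());
            t.setDaemon(daemon);
            return t;
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2, named("demo-http", true));
        // Print the name and daemon flag of the worker that runs the task.
        pool.submit(() -> {
            Thread t = Thread.currentThread();
            System.out.println(t.getName() + " daemon=" + t.isDaemon());
        }).get();
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

With daemon set to true the pool would not keep the JVM alive on its own; with false, the process keeps running until the pool is shut down.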
- LocalByteArray extends ThreadLocal and holds a per-thread buffer for the response data
private static class LocalByteArray extends ThreadLocal<ByteArrayOutputStream> {
// ByteArrayOutputStream is an output stream backed by a byte array; the initial capacity here is 1 << 20 bytes (1 MiB)
@Override
protected ByteArrayOutputStream initialValue()
{
return new ByteArrayOutputStream(1 << 20);
}
}
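The buffer-per-thread pattern can be tried out in isolation. The following is a small sketch under my own naming (`BufferReuseDemo`, `render`), using a smaller initial capacity than the original; it only shows that each thread gets one buffer and that reset() makes it reusable.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class BufferReuseDemo {
    // One buffer per thread, created lazily on the first get().
    static final ThreadLocal<ByteArrayOutputStream> BUFFER =
            ThreadLocal.withInitial(() -> new ByteArrayOutputStream(1 << 10));

    // Writes the body into the current thread's buffer and returns its size in bytes.
    static int render(String body) {
        ByteArrayOutputStream out = BUFFER.get();
        out.reset(); // discard the previous content, keep the allocated array
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        return out.size();
    }

    public static void main(String[] args) {
        System.out.println(render("first response body"));
        System.out.println(render("second"));
        // The same thread always sees the same buffer instance.
        System.out.println(BUFFER.get() == BUFFER.get());
    }
}
```

Because the worker pool is fixed in size, this trades a bounded amount of retained memory per thread for not allocating a fresh buffer on every scrape.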
- HTTPMetricHandler handles the HTTP requests
/**
* Handles Metrics collections from the given registry.
* The registry lives in the simpleclient module.
*/
static class HTTPMetricHandler implements HttpHandler {
private final CollectorRegistry registry;
// LocalByteArray is a ThreadLocal, so each worker thread reuses its own buffer
private final LocalByteArray response = new LocalByteArray();
private final static String HEALTHY_RESPONSE = "Exporter is Healthy.";
HTTPMetricHandler(CollectorRegistry registry) {
this.registry = registry;
}
@Override
public void handle(HttpExchange t) throws IOException {
String query = t.getRequestURI().getRawQuery();
String contextPath = t.getHttpContext().getPath();
ByteArrayOutputStream response = this.response.get();
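// Clear whatever the previous request on this thread left in the buffer
response.reset();
OutputStreamWriter osw = new OutputStreamWriter(response, Charset.forName("UTF-8"));
if ("/-/healthy".equals(contextPath)) {// health check; the exporter's health check is just a plain HTTP request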
osw.write(HEALTHY_RESPONSE);
} else {
String contentType = TextFormat.chooseContentType(t.getRequestHeaders().getFirst("Accept"));
t.getResponseHeaders().set("Content-Type", contentType);
// The main event: read the data from the registry and write it to the output stream.
// The registry returns an Enumeration<Collector.MetricFamilySamples>
TextFormat.writeFormat(contentType, osw,
registry.filteredMetricFamilySamples(parseQuery(query)));
}
osw.close();
// If the client's request asks for compression, gzip the response
if (shouldUseCompression(t)) {
t.getResponseHeaders().set("Content-Encoding", "gzip");
t.sendResponseHeaders(HttpURLConnection.HTTP_OK, 0);
final GZIPOutputStream os = new GZIPOutputStream(t.getResponseBody());
try {
response.writeTo(os);
} finally {
os.close();
}
} else {
t.getResponseHeaders().set("Content-Length",
String.valueOf(response.size()));
t.sendResponseHeaders(HttpURLConnection.HTTP_OK, response.size());
response.writeTo(t.getResponseBody());
}
t.close();
}
}
The handler above mentions Enumeration, the java.util interface for iterating over a collection:
public interface Enumeration<E> {
/**
* Tests if this enumeration contains more elements.
*
* @return <code>true</code> if and only if this enumeration object
* contains at least one more element to provide;
* <code>false</code> otherwise.
*/
boolean hasMoreElements();
/**
* Returns the next element of this enumeration if this enumeration
* object has at least one more element to provide.
*
* @return the next element of this enumeration.
* @exception NoSuchElementException if no more elements exist.
*/
E nextElement();
}
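For readers who have only used Iterator, a short sketch of consuming an Enumeration follows. `EnumerationDemo` and `join` are illustrative names of mine; `Collections.enumeration` is the standard JDK adapter from a collection to an Enumeration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

class EnumerationDemo {
    // Drains the enumeration, one element per output line.
    static String join(Enumeration<String> e) {
        StringBuilder sb = new StringBuilder();
        while (e.hasMoreElements()) {
            sb.append(e.nextElement()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> families = Arrays.asList("gauge", "counter_total");
        System.out.print(join(Collections.enumeration(families)));
    }
}
```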
With the members covered, let's look at the constructor:
/**
* Start a HTTP server serving Prometheus metrics from the given registry using the given {@link HttpServer}.
* The {@code httpServer} is expected to already be bound to an address
*/
public HTTPServer(HttpServer httpServer, CollectorRegistry registry, boolean daemon) throws IOException {
if (httpServer.getAddress() == null)
throw new IllegalArgumentException("HttpServer hasn't been bound to an address");
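// HttpServer is com.sun.net.httpserver.HttpServer, shipped with the JDK together with a set of classes for building a web server
server = httpServer;
// The HttpHandler is passed to the server to process requests
HttpHandler mHandler = new HTTPMetricHandler(registry);
// Map the context paths to the handler
server.createContext("/", mHandler);
server.createContext("/metrics", mHandler);
server.createContext("/-/healthy", mHandler);
// The thread pool that actually handles requests; it mainly serves Prometheus scrapes, so it does not need many threads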
executorService = Executors.newFixedThreadPool(5, NamedDaemonThreadFactory.defaultThreadFactory(daemon));
server.setExecutor(executorService);
start(daemon);
}
- For more on using com.sun.net.httpserver, see:
https://www.cnblogs.com/aspwebchh/p/8300945.html
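In short, the httpserver module exposes a port and provides a web service for Prometheus to scrape. Internally it handles requests on a thread pool; for each request the handler reads the data from the registry and sends the response, using a ThreadLocal to hold the response buffer. Starting the module is just new HTTPServer(port);
Where you start it depends on your application.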
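The flow summarized above (bind a port, register a handler, serve on a small pool, answer a scrape) can be reproduced with the JDK alone. This is only a sketch under assumptions: `MiniMetricsServer` and `serveAndScrape` are hypothetical names, the body is a fixed string instead of registry output, and the "scrape" is a single local GET.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class MiniMetricsServer {
    // Serves metricsText (must be non-empty) on /metrics at a free local port,
    // fetches it once the way a scraper would, then shuts everything down.
    static String serveAndScrape(String metricsText) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        HttpServer server = null;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/metrics", exchange -> {
                byte[] body = metricsText.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
                exchange.sendResponseHeaders(HttpURLConnection.HTTP_OK, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.setExecutor(pool);
            server.start();
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/metrics");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            try (InputStream in = conn.getInputStream()) {
                return new String(readAll(in), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) server.stop(0);
            pool.shutdown();
        }
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.print(serveAndScrape("demo_gauge 1.0\n"));
    }
}
```

The real HTTPServer differs in that the handler renders the registry contents per request and optionally gzips them; the wiring of server, context, and executor is the part this sketch mirrors.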
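simpleclient wraps the Prometheus metric types, provides the model and storage classes, and defines the Collector, the registry, and so on.
The following covers how the model classes relate to each other, how data is stored, and how it is read back out.
Remember: the client only holds the current value of each metric; the history is stored in Prometheus. Once a metric has been created, the client just keeps updating its value.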
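As a rough illustration of "current value only", here is a toy counter that keeps a single running sum and nothing else. `CurrentValueDemo` and `MiniCounter` are invented for this sketch; the real Counter has labels, children, and more, none of which is modeled here.

```java
import java.util.concurrent.atomic.DoubleAdder;

class CurrentValueDemo {
    // Toy counter: one running sum, no history, no timestamps.
    static final class MiniCounter {
        private final DoubleAdder value = new DoubleAdder();

        void inc(double amount) {
            if (amount < 0) {
                throw new IllegalArgumentException("a counter can only increase");
            }
            value.add(amount);
        }

        // What a scrape would see: only the value at this moment.
        double get() {
            return value.sum();
        }
    }

    public static void main(String[] args) {
        MiniCounter c = new MiniCounter();
        c.inc(1);
        c.inc(2);
        System.out.println(c.get());
    }
}
```

Each scrape reads the sum at that instant; turning those readings into a time series is the server's job.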
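- First, the model
Entry class: the basic types are all defined inside Collector
Interface and implementation relationships
Main classes:
Sample: a single sample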
/**
* A single Sample, with a unique name and set of labels.
* Represents one sample.
*/
public static class Sample {
public final String name;
public final List<String> labelNames;
public final List<String> labelValues; // Must have same length as labelNames.
public final double value;
// Every sample has a timestamp field, but it is nullable and is only written out when it is not null
public final Long timestampMs; // It's an epoch format with milliseconds value included (this field is subject to change).
public Sample(String name, List<String> labelNames, List<String> labelValues, double value, Long timestampMs) {
this.name = name;
this.labelNames = labelNames;
this.labelValues = labelValues;
this.value = value;
this.timestampMs = timestampMs;
}
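MetricFamilySamples: the actual metric class
/**
* A metric, and all of its samples.
* This is the actual metric class; it contains a list of samples.
* In other words, the samples of one metric family.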
*/
static public class MetricFamilySamples {
public final String name;
public final String unit;
public final Type type;
public final String help;
// The samples
public final List<Sample> samples;
public MetricFamilySamples(String name, String unit, Type type, String help, List<Sample> samples) {
if (!unit.isEmpty() && !name.endsWith("_" + unit)) {
throw new IllegalArgumentException("Metric's unit is not the suffix of the metric name: " + name);
}
if ((type == Type.INFO || type == Type.STATE_SET) && !unit.isEmpty()) {
throw new IllegalArgumentException("Metric is of a type that cannot have a unit: " + name);
}
List<Sample> mungedSamples = samples;
// Deal with _total from pre-OM automatically.
if (type == Type.COUNTER) {
if (name.endsWith("_total")) {
name = name.substring(0, name.length() - 6);
}
String withTotal = name + "_total";
mungedSamples = new ArrayList<Sample>(samples.size());
for (Sample s: samples) {
String n = s.name;
if (name.equals(n)) {
n = withTotal;
}
mungedSamples.add(new Sample(n, s.labelNames, s.labelValues, s.value, s.timestampMs));
}
}
this.name = name;
this.unit = unit;
this.type = type;
this.help = help;
this.samples = mungedSamples;
}
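The `_total` handling in the constructor above is easy to misread, so here is the same renaming rule pulled out into two small functions. This is a sketch with names of my own (`CounterNamingDemo`, `familyName`, `sampleNames`), working on plain strings rather than Sample objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CounterNamingDemo {
    private static final String SUFFIX = "_total";

    // Family name of a counter: a trailing "_total" is stripped.
    static String familyName(String name) {
        return name.endsWith(SUFFIX) ? name.substring(0, name.length() - SUFFIX.length()) : name;
    }

    // Sample names: a sample named exactly like the stripped family name gets
    // "_total" appended; every other sample (e.g. "_created") is left alone.
    static List<String> sampleNames(String rawFamilyName, List<String> rawSampleNames) {
        String family = familyName(rawFamilyName);
        List<String> out = new ArrayList<>(rawSampleNames.size());
        for (String n : rawSampleNames) {
            out.add(family.equals(n) ? family + SUFFIX : n);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(familyName("requests_total"));
        System.out.println(sampleNames("requests", Arrays.asList("requests", "requests_created")));
    }
}
```

So whether a counter is declared as `requests` or `requests_total`, the family ends up named `requests` and its value sample `requests_total`.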
- Main interfaces
Describable: the metric families a collector is expected to return
public interface Describable {
/**
* Provide a list of metric families this Collector is expected to return.
*
* These should exclude the samples. This is used by the registry to
* detect collisions and duplicate registrations.
*
* Usually custom collectors do not have to implement Describable. If
* Describable is not implemented and the CollectorRegistry was created
* with auto describe enabled (which is the case for the default registry)
* then {@link collect} will be called at registration time instead of
* describe. If this could cause problems, either implement a proper
* describe, or if that's not practical have describe return an empty
* list.
*/
List<MetricFamilySamples> describe();
}
collect: returns the metric data
public abstract List<MetricFamilySamples> collect();
public class Counter extends SimpleCollector<Counter.Child> implements Collector.Describable {
As shown, Counter and the other metric types all extend the abstract class Collector (through SimpleCollector).
- Below is an httpserver example
public class ExampleExporter {
static final Gauge g = Gauge.build().name("gauge").help("blah").register();
static final Counter c = Counter.build().name("counter").help("meh").register();
static final Summary s = Summary.build().name("summary").help("meh").register();
static final Histogram h = Histogram.build().name("histogram").help("meh").register();
static final Gauge l = Gauge.build().name("labels").help("blah").labelNames("l").register();
public static void main(String[] args) throws Exception {
new HTTPServer(1234);
g.set(1);
c.inc(2);
s.observe(3);
h.observe(4);
l.labels("foo").inc(5);
}
}
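- Reading data
As mentioned earlier, the server hands each request to the handler, and the handler reads from the registry:
// Iterate over the collection to get the metric data
TextFormat.writeFormat(contentType, osw,
// This returns an Enumeration
registry.filteredMetricFamilySamples(parseQuery(query)));
---
// Read the data and write it out in text format 0.0.4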
public static void write004(Writer writer, Enumeration<Collector.MetricFamilySamples> mfs) throws IOException {
Map<String, Collector.MetricFamilySamples> omFamilies = new TreeMap<String, Collector.MetricFamilySamples>();
/* See http://prometheus.io/docs/instrumenting/exposition_formats/
* for the output format specification. */
while(mfs.hasMoreElements()) {
Collector.MetricFamilySamples metricFamilySamples = mfs.nextElement();
String name = metricFamilySamples.name;
writer.write("# HELP ");
writer.write(name);
if (metricFamilySamples.type == Collector.Type.COUNTER) {
writer.write("_total");
}
if (metricFamilySamples.type == Collector.Type.INFO) {
writer.write("_info");
}
writer.write(' ');
writeEscapedHelp(writer, metricFamilySamples.help);
writer.write('\n');
writer.write("# TYPE ");
writer.write(name);
if (metricFamilySamples.type == Collector.Type.COUNTER) {
writer.write("_total");
}
if (metricFamilySamples.type == Collector.Type.INFO) {
writer.write("_info");
}
writer.write(' ');
writer.write(typeString(metricFamilySamples.type));
writer.write('\n');
String createdName = name + "_created";
String gcountName = name + "_gcount";
String gsumName = name + "_gsum";
for (Collector.MetricFamilySamples.Sample sample: metricFamilySamples.samples) {
/* OpenMetrics specific sample, put in a gauge at the end. */
if (sample.name.equals(createdName)
|| sample.name.equals(gcountName)
|| sample.name.equals(gsumName)) {
Collector.MetricFamilySamples omFamily = omFamilies.get(sample.name);
if (omFamily == null) {
omFamily = new Collector.MetricFamilySamples(sample.name, Collector.Type.GAUGE, metricFamilySamples.help, new ArrayList<Collector.MetricFamilySamples.Sample>());
omFamilies.put(sample.name, omFamily);
}
omFamily.samples.add(sample);
continue;
}
writer.write(sample.name);
if (sample.labelNames.size() > 0) {
writer.write('{');
for (int i = 0; i < sample.labelNames.size(); ++i) {
writer.write(sample.labelNames.get(i));
writer.write("=\"");
writeEscapedLabelValue(writer, sample.labelValues.get(i));
writer.write("\",");
}
writer.write('}');
}
writer.write(' ');
writer.write(Collector.doubleToGoString(sample.value));
if (sample.timestampMs != null){
writer.write(' ');
writer.write(sample.timestampMs.toString());
}
writer.write('\n');
}
}
// Write out any OM-specific samples.
if (!omFamilies.isEmpty()) {
write004(writer, Collections.enumeration(omFamilies.values()));
}
}
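The per-sample part of write004 boils down to "name, optional labels in braces, space, value, newline". The sketch below reproduces just that line format with hypothetical names (`TextLineDemo`, `sampleLine`, `escapeLabelValue`); it formats the value with Double.toString and ignores timestamps, HELP/TYPE lines, and special values such as infinities.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TextLineDemo {
    // Escapes backslash, double quote, and newline inside a label value.
    static String escapeLabelValue(String v) {
        StringBuilder sb = new StringBuilder();
        for (char c : v.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Renders one sample line; like the code above, every label is followed by a comma.
    static String sampleLine(String name, List<String> labelNames, List<String> labelValues, double value) {
        StringBuilder sb = new StringBuilder(name);
        if (!labelNames.isEmpty()) {
            sb.append('{');
            for (int i = 0; i < labelNames.size(); i++) {
                sb.append(labelNames.get(i)).append("=\"")
                  .append(escapeLabelValue(labelValues.get(i))).append("\",");
            }
            sb.append('}');
        }
        return sb.append(' ').append(Double.toString(value)).append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(sampleLine("gauge", Collections.<String>emptyList(), Collections.<String>emptyList(), 1.0));
        System.out.print(sampleLine("labels", Arrays.asList("l"), Arrays.asList("foo"), 5.0));
    }
}
```

For the ExampleExporter shown earlier, the two lines printed here correspond to the `gauge` and `labels` metrics.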
- Storing data
The code snippets for each step follow
static final Gauge g = Gauge.build().name("gauge").help("blah").register();
---
/**
* Create and register the Collector with the given registry.
* Returns the concrete collector, e.g. a Gauge.
*/
public C register(CollectorRegistry registry) {
// create() builds the actual metric object, e.g. a new Gauge
C sc = create();
registry.register(sc);// store it in the registry
return sc;
}
-----
public class CollectorRegistry {
/**
* The default registry.
*/
public static final CollectorRegistry defaultRegistry = new CollectorRegistry(true);
private final Object namesCollectorsLock = new Object();
private final Map<Collector, List<String>> collectorsToNames = new HashMap<Collector, List<String>>();// e.g. key: the Gauge collector, value: ["gauge"]; note that Counter, Summary, etc. also register suffixed names
private final Map<String, Collector> namesToCollectors = new HashMap<String, Collector>();// e.g. key: "gauge", value: the registered collector (the Gauge itself)
private final boolean autoDescribe;
public CollectorRegistry() {
this(false);
}
public CollectorRegistry(boolean autoDescribe) {
this.autoDescribe = autoDescribe;
}
/**
* Register a Collector.
* <p>
* A collector can be registered to multiple CollectorRegistries.
*/
public void register(Collector m) {
List<String> names = collectorNames(m);
synchronized (namesCollectorsLock) {
for (String name : names) {
if (namesToCollectors.containsKey(name)) {
throw new IllegalArgumentException("Collector already registered that provides name: " + name);
}
}
// namesToCollectors is where the collectors are ultimately stored
for (String name : names) {
namesToCollectors.put(name, m);
}
collectorsToNames.put(m, names);
}
}
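The check-then-insert logic in register can be exercised without the library. Below is a stripped-down sketch (`MiniRegistry` is a made-up class; collectors are plain Objects and the caller supplies the names) that keeps the two maps and the all-or-nothing collision check.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MiniRegistry {
    private final Object lock = new Object();
    private final Map<String, Object> namesToCollectors = new HashMap<>();
    private final Map<Object, List<String>> collectorsToNames = new HashMap<>();

    // Registers a collector under all of its names, or under none if any name is taken.
    void register(Object collector, List<String> names) {
        synchronized (lock) {
            for (String name : names) {
                if (namesToCollectors.containsKey(name)) {
                    throw new IllegalArgumentException(
                            "Collector already registered that provides name: " + name);
                }
            }
            for (String name : names) {
                namesToCollectors.put(name, collector);
            }
            collectorsToNames.put(collector, names);
        }
    }

    int size() {
        synchronized (lock) {
            return namesToCollectors.size();
        }
    }

    public static void main(String[] args) {
        MiniRegistry registry = new MiniRegistry();
        registry.register("counterA", Arrays.asList("requests_total", "requests_created"));
        try {
            registry.register("counterB", Arrays.asList("other", "requests_total"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(registry.size());
    }
}
```

Checking every name before inserting any of them is what keeps a failed registration from leaving partial entries behind.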
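The Pushgateway approach
This is essentially a wrapper around the Pushgateway API for pushing, deleting, and adding metrics: the metrics in a registry are sent over HTTP to the Pushgateway, which holds them temporarily.
Usage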
void executeBatchJob() throws Exception {
CollectorRegistry registry = new CollectorRegistry();
Gauge duration = Gauge.build()
.name("my_batch_job_duration_seconds").help("Duration of my batch job in seconds.").register(registry);
Gauge.Timer durationTimer = duration.startTimer();
try {
// Your code here.
// This is only added to the registry after success,
// so that a previous success in the Pushgateway isn't overwritten on failure.
Gauge lastSuccess = Gauge.build()
.name("my_batch_job_last_success").help("Last time my batch job succeeded, in unixtime.").register(registry);
lastSuccess.setToCurrentTime();
} finally {
durationTimer.setDuration();
PushGateway pg = new PushGateway("127.0.0.1:9091");
pg.pushAdd(registry, "my_batch_job");
}
}
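PushGateway has an HttpConnectionFactory that looks like a connection pool, but it actually creates a new HttpURLConnection for every request. It relies on HTTP/1.1 keep-alive, so performance is still fine.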
public class PushGateway {
private static final int MILLISECONDS_PER_SECOND = 1000;
// Visible for testing.
protected final String gatewayBaseURL;
// Connection factory (not a real pool: it creates a new HttpURLConnection per request)
private HttpConnectionFactory connectionFactory = new DefaultHttpConnectionFactory();
----
public class DefaultHttpConnectionFactory implements HttpConnectionFactory {
@Override
public HttpURLConnection create(String url) throws IOException {
return (HttpURLConnection) new URL(url).openConnection();
}
}
- Also, the push methods all go through doRequest, which sends the HTTP request. Here is the implementation
void doRequest(CollectorRegistry registry, String job, Map<String, String> groupingKey, String method) throws IOException {
String url = gatewayBaseURL;
if (job.contains("/")) {
url += "job@base64/" + base64url(job);
} else {
url += "job/" + URLEncoder.encode(job, "UTF-8");
}
if (groupingKey != null) {
for (Map.Entry<String, String> entry: groupingKey.entrySet()) {
if (entry.getValue().isEmpty()) {
url += "/" + entry.getKey() + "@base64/=";
} else if (entry.getValue().contains("/")) {
url += "/" + entry.getKey() + "@base64/" + base64url(entry.getValue());
} else {
url += "/" + entry.getKey() + "/" + URLEncoder.encode(entry.getValue(), "UTF-8");
}
}
}
HttpURLConnection connection = connectionFactory.create(url);
connection.setRequestProperty("Content-Type", TextFormat.CONTENT_TYPE_004);
if (!method.equals("DELETE")) {
connection.setDoOutput(true);
}
connection.setRequestMethod(method);
// The connect timeout is 10 s, and the read timeout is also 10 s
connection.setConnectTimeout(10 * MILLISECONDS_PER_SECOND);
connection.setReadTimeout(10 * MILLISECONDS_PER_SECOND);
connection.connect();
try {
if (!method.equals("DELETE")) {
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(connection.getOutputStream(), "UTF-8"));
TextFormat.write004(writer, registry.metricFamilySamples());
writer.flush();
writer.close();
}
int response = connection.getResponseCode();
if (response/100 != 2) {
String errorMessage;
InputStream errorStream = connection.getErrorStream();
if(errorStream != null) {
String errBody = readFromStream(errorStream);
errorMessage = "Response code from " + url + " was " + response + ", response body: " + errBody;
} else {
errorMessage = "Response code from " + url + " was " + response;
}
throw new IOException(errorMessage);
}
} finally {
connection.disconnect();
}
}
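The URL-building half of doRequest can be checked on its own. The sketch below follows the same branching; note the assumptions: `PushUrlDemo` and `pushUrl` are my names, the base URL is passed in by the caller (the example value is only a guess at a typical one), and `base64url` here uses the JDK's URL-safe Base64 encoder, which may not match the library's own helper byte for byte.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Collections;
import java.util.Map;

class PushUrlDemo {
    // Assumption: URL-safe Base64 from the JDK stands in for the library's base64url helper.
    static String base64url(String v) {
        return Base64.getUrlEncoder().encodeToString(v.getBytes(StandardCharsets.UTF_8));
    }

    static String encode(String v) {
        try {
            return URLEncoder.encode(v, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Values containing "/" are Base64-encoded, empty values become "=", the rest are URL-encoded.
    static String pushUrl(String baseUrl, String job, Map<String, String> groupingKey) {
        StringBuilder url = new StringBuilder(baseUrl);
        url.append(job.contains("/") ? "job@base64/" + base64url(job) : "job/" + encode(job));
        for (Map.Entry<String, String> e : groupingKey.entrySet()) {
            String v = e.getValue();
            if (v.isEmpty()) {
                url.append('/').append(e.getKey()).append("@base64/=");
            } else if (v.contains("/")) {
                url.append('/').append(e.getKey()).append("@base64/").append(base64url(v));
            } else {
                url.append('/').append(e.getKey()).append('/').append(encode(v));
            }
        }
        return url.toString();
    }

    public static void main(String[] args) {
        String base = "http://127.0.0.1:9091/metrics/"; // hypothetical base URL
        System.out.println(pushUrl(base, "my_batch_job", Collections.<String, String>emptyMap()));
        System.out.println(pushUrl(base, "a/b", Collections.singletonMap("instance", "")));
    }
}
```

The Base64 branch exists because a "/" inside a job name or label value would otherwise be read as a path separator.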
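- As shown, the connect timeout defaults to 10 s and the read timeout is also 10 s. When the gateway misbehaves, every push can block its calling thread for that long, which ties up client resources and can leave the client hanging.
- Can the data be pushed on a schedule?
Not really: by default a Gauge does not store a timestamp, so the time that ends up recorded is effectively the arrival time of the HTTP request. A screenshot of the packet capture follows
- Does opening a new connection for every push cause a performance problem?
No, because keep-alive is used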
Below is the source of the disconnect method that doRequest calls at the end of a push
public void disconnect() {
this.responseCode = -1;
if (this.pi != null) {
this.pi.finishTracking();
this.pi = null;
}
if (this.http != null) {
if (this.inputStream != null) {
HttpClient var1 = this.http;
boolean var2 = var1.isKeepingAlive();
try {
this.inputStream.close();
} catch (IOException var4) {
}
if (var2) {
var1.closeIdleConnection();
}
} else {
this.http.setDoNotRetry(true);
this.http.closeServer();
}
this.http = null;
this.connected = false;
}
this.cachedInputStream = null;
if (this.cachedHeaders != null) {
this.cachedHeaders.reset();
}
}
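As shown, the disconnect method of HttpURLConnection does not really close the connection; it clears the per-request state so the connection can be reused next time. The client and server speak HTTP/1.1, the client enables keep-alive by default, and the Pushgateway server supports it as well.
Below is a Wireshark capture of two pushes
The code that issues the requests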
public class ExamplePushGateway {
static final CollectorRegistry pushRegistry = new CollectorRegistry();
static final Gauge g = (Gauge) Gauge.build().name("gauge").help("blah").register(pushRegistry);
/**
* Example of how to use the pushgateway, pass in the host:port of a pushgateway.
*/
public static void main(String[] args) throws Exception {
PushGateway pg = new PushGateway("127.0.0.1:9091");
g.set(42);
pg.push(pushRegistry, "job");
g.set(45);
pg.push(pushRegistry,"job");
}
}
To capture traffic to a service on the local machine, use this
Filter setting
tcp.dstport == 9091 or tcp.srcport == 9091
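The capture shows that only one TCP handshake took place: disconnect did not close the connection, and keep-alive is in effect.
Below is the state of the connection for a while after a single push; the connection between the client and the server was not torn down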