Prometheus integration: simple-client, Pushgateway, and a walkthrough of the client source code

How metrics get into Prometheus

Official source repository: https://github.com/prometheus/client_java

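  • The target collects its own metrics and exposes a port; the Prometheus server actively pulls (scrapes) the data.
  • The target pushes its metrics to a Pushgateway, and Prometheus scrapes the Pushgateway.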

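Exposing a port on the target

There are two ways to implement this mode:

  • Collect metrics directly and expose an HTTP endpoint.
  • Use actuator with its built-in micrometer, plus the prometheus registry. For Spring Boot projects this makes instrumenting business code convenient. Since Spring Boot 2.x, actuator integrates micrometer internally.

How simpleclient is used and how it works

  • Integration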
<dependency>
  <groupId>io.prometheus</groupId>
  <artifactId>simpleclient</artifactId>
  <version>0.10.0</version>
</dependency>
<!-- Hotspot JVM metrics-->
<dependency>
  <groupId>io.prometheus</groupId>
  <artifactId>simpleclient_hotspot</artifactId>
  <version>0.10.0</version>
</dependency>
<!-- Exposition HTTPServer-->
<dependency>
  <groupId>io.prometheus</groupId>
  <artifactId>simpleclient_httpserver</artifactId>
  <version>0.10.0</version>
</dependency>
<!-- Pushgateway exposition-->
<!--
<dependency>
  <groupId>io.prometheus</groupId>
  <artifactId>simpleclient_pushgateway</artifactId>
  <version>0.10.0</version>
</dependency> 
-->

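As the comments above suggest, once the pushgateway dependency is added you can push to a Pushgateway.

  • Usage
    See the README in the source repository, which comes with examples; I won't repeat them here.
  • How it works
    • The simpleclient_hotspot package mainly collects JVM-related metrics.
    • simpleclient wraps the basic data structures and the collection logic; it is the lowest layer.
    • httpserver exposes the collected data and handles incoming requests.
  • Appetizer: starting from the Prometheus scrape (the httpserver part)

    The httpserver part consists of a single class whose job is to expose an endpoint and accept requests. It is short and well worth a close read, and it is a good fit if you need a lightweight web service (lightweight meaning focused on core functionality).

  • NamedDaemonThreadFactory: the thread factory for the thread pool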
    static class NamedDaemonThreadFactory implements ThreadFactory {
        private static final AtomicInteger POOL_NUMBER = new AtomicInteger(1);

        private final int poolNumber = POOL_NUMBER.getAndIncrement();
        private final AtomicInteger threadNumber = new AtomicInteger(1);
        private final ThreadFactory delegate;
        private final boolean daemon;

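        /**
         * @param delegate the factory that actually creates the threads
         * @param daemon   usually false, which creates ordinary user threads; the JVM keeps running
         *                 after main returns as long as any user thread is alive.
         *                 If true, the threads are daemon threads and do not keep the JVM alive:
         *                 they are terminated once the last user thread exits. The GC threads are daemon threads.
         */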
        NamedDaemonThreadFactory(ThreadFactory delegate, boolean daemon) {
            this.delegate = delegate;
            this.daemon = daemon;
        }

        @Override
        public Thread newThread(Runnable r) {
            Thread t = delegate.newThread(r);
            t.setName(String.format("prometheus-http-%d-%d", poolNumber, threadNumber.getAndIncrement()));
            t.setDaemon(daemon);
            return t;
        }

        static ThreadFactory defaultThreadFactory(boolean daemon) {
            return new NamedDaemonThreadFactory(Executors.defaultThreadFactory(), daemon);
        }
    }
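
To see what the naming scheme and the daemon flag do in practice, here is a small JDK-only sketch. `NamingFactorySketch` and the `sketch-http-` prefix are hypothetical stand-ins, not the library class; the sketch only assumes the same idea of delegating thread creation and then overriding the name and daemon status.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for NamedDaemonThreadFactory (not the library class).
final class NamingFactorySketch implements ThreadFactory {
    private final AtomicInteger threadNumber = new AtomicInteger(1);
    private final boolean daemon;

    NamingFactorySketch(boolean daemon) {
        this.daemon = daemon;
    }

    @Override
    public Thread newThread(Runnable r) {
        // Delegate creation, then override the name and the daemon status.
        Thread t = Executors.defaultThreadFactory().newThread(r);
        t.setName("sketch-http-" + threadNumber.getAndIncrement());
        t.setDaemon(daemon);
        return t;
    }

    // Runs one task on a pool built with this factory and returns the worker's name.
    static String workerName(boolean daemon) {
        ExecutorService pool = Executors.newFixedThreadPool(2, new NamingFactorySketch(daemon));
        try {
            Callable<String> task = () -> Thread.currentThread().getName();
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(workerName(true)); // prints "sketch-http-1"
    }
}
```

With `daemon = true` the pool threads would not keep the JVM alive on their own, which is why an exporter that should outlive `main` is normally started with `daemon = false`.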
  • LocalByteArray: extends ThreadLocal and holds the per-thread response buffer
    private static class LocalByteArray extends ThreadLocal<ByteArrayOutputStream> {
        // ByteArrayOutputStream is an output stream backed by a byte array
        @Override
        protected ByteArrayOutputStream initialValue()
        {
            return new ByteArrayOutputStream(1 << 20);
        }
    }
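
The point of the thread-local buffer is that each worker thread allocates its byte array once and reuses it for every request. A minimal sketch of that pattern, using only the JDK (the class and method names are made up for illustration, and the buffer size is arbitrary):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ThreadLocalBufferSketch {
    // One reusable buffer per thread, mirroring the LocalByteArray idea.
    private static final ThreadLocal<ByteArrayOutputStream> BUFFER =
            ThreadLocal.withInitial(() -> new ByteArrayOutputStream(1024));

    // Renders a body into the calling thread's buffer and returns a copy of the bytes.
    static byte[] render(String body) {
        ByteArrayOutputStream out = BUFFER.get();
        out.reset(); // drop whatever the previous request left behind
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        return out.toByteArray();
    }

    // True when two consecutive lookups on this thread return the same buffer object.
    static boolean sameBufferOnThisThread() {
        return BUFFER.get() == BUFFER.get();
    }

    public static void main(String[] args) {
        render("first response, fairly long");
        // The reset() call guarantees nothing from the first response leaks into the second.
        System.out.println(new String(render("second"), StandardCharsets.UTF_8)); // prints "second"
    }
}
```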
  • HTTPMetricHandler: handles the HTTP requests
    /**
     * Handles Metrics collections from the given registry.
     * The registry lives in the simpleclient module.
     */
    static class HTTPMetricHandler implements HttpHandler {
        private final CollectorRegistry registry;
        // LocalByteArray is a ThreadLocal holding the response buffer
        private final LocalByteArray response = new LocalByteArray();
        private final static String HEALTHY_RESPONSE = "Exporter is Healthy.";

        HTTPMetricHandler(CollectorRegistry registry) {
          this.registry = registry;
        }

        @Override
        public void handle(HttpExchange t) throws IOException {
            String query = t.getRequestURI().getRawQuery();

            String contextPath = t.getHttpContext().getPath();
            ByteArrayOutputStream response = this.response.get();
            // reset the buffer before reusing it
            response.reset();
            OutputStreamWriter osw = new OutputStreamWriter(response, Charset.forName("UTF-8"));
            if ("/-/healthy".equals(contextPath)) { // health check; note that it is just a plain HTTP request
                osw.write(HEALTHY_RESPONSE);
            } else {
                String contentType = TextFormat.chooseContentType(t.getRequestHeaders().getFirst("Accept"));
                t.getResponseHeaders().set("Content-Type", contentType);
                // The key step: read the data from the registry and write it to the output stream.
                // filteredMetricFamilySamples returns an Enumeration<Collector.MetricFamilySamples>.
                TextFormat.writeFormat(contentType, osw,
                        registry.filteredMetricFamilySamples(parseQuery(query)));
            }

            osw.close();
            // compress the response if the client's request says it accepts compression
            if (shouldUseCompression(t)) {
                t.getResponseHeaders().set("Content-Encoding", "gzip");
                t.sendResponseHeaders(HttpURLConnection.HTTP_OK, 0);
                final GZIPOutputStream os = new GZIPOutputStream(t.getResponseBody());
                try {
                    response.writeTo(os);
                } finally {
                    os.close();
                }
            } else {
                t.getResponseHeaders().set("Content-Length",
                        String.valueOf(response.size()));
                t.sendResponseHeaders(HttpURLConnection.HTTP_OK, response.size());
                response.writeTo(t.getResponseBody());
            }
            t.close();
        }

    }
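
The excerpt calls `shouldUseCompression(t)` without showing it. The sketch below is not that method; it is a JDK-only illustration of the general idea, under the assumption that the decision is made by looking for a `gzip` token in the `Accept-Encoding` header. It ignores `q` values (so `gzip;q=0` would still count as accepted), which a stricter implementation would handle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipNegotiationSketch {
    // Assumed rule: compress when any Accept-Encoding token is "gzip".
    static boolean acceptsGzip(String acceptEncodingHeader) {
        if (acceptEncodingHeader == null) {
            return false;
        }
        for (String token : acceptEncodingHeader.split(",")) {
            // Strip optional parameters such as ";q=0.8".
            int semi = token.indexOf(';');
            String name = (semi >= 0 ? token.substring(0, semi) : token).trim();
            if (name.equalsIgnoreCase("gzip")) {
                return true;
            }
        }
        return false;
    }

    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // the stream is closed here, so the trailer is written
    }

    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = "demo_gauge 1.0\n".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = gunzip(gzip(body));
        System.out.println(acceptsGzip("deflate, gzip;q=0.8") + " " + new String(roundTrip, StandardCharsets.UTF_8).trim());
    }
}
```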

The code above mentions Enumeration, the java.util interface used to iterate over a collection:

public interface Enumeration<E> {
    /**
     * Tests if this enumeration contains more elements.
     *
     * @return  <code>true</code> if and only if this enumeration object
     *           contains at least one more element to provide;
     *          <code>false</code> otherwise.
     */
    boolean hasMoreElements();

    /**
     * Returns the next element of this enumeration if this enumeration
     * object has at least one more element to provide.
     *
     * @return     the next element of this enumeration.
     * @exception  NoSuchElementException  if no more elements exist.
     */
    E nextElement();
}
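
As a quick illustration of how such an Enumeration is consumed, the sketch below walks one with the same `hasMoreElements()` / `nextElement()` loop that the text writer uses. The metric names are made-up sample data.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class EnumerationSketch {
    // Drains the enumeration and joins its elements with commas.
    static String joinAll(Enumeration<String> e) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        while (e.hasMoreElements()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(e.nextElement());
            first = false;
        }
        return sb.toString();
    }

    static String demo() {
        List<String> names = Arrays.asList("gauge", "counter_total", "summary");
        // Collections.enumeration adapts any Collection to the older Enumeration interface.
        return joinAll(Collections.enumeration(names));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "gauge,counter_total,summary"
    }
}
```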

That covers the members; now let's look at the constructor.

    /**
     * Start a HTTP server serving Prometheus metrics from the given registry using the given {@link HttpServer}.
     * The {@code httpServer} is expected to already be bound to an address
     */
    public HTTPServer(HttpServer httpServer, CollectorRegistry registry, boolean daemon) throws IOException {
        if (httpServer.getAddress() == null)
            throw new IllegalArgumentException("HttpServer hasn't been bound to an address");
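        // HttpServer is com.sun.net.httpserver.HttpServer, a JDK class; the package contains what is needed to build a small web server
        server = httpServer;
        // the HttpHandler is handed to the server to process requests
        HttpHandler mHandler = new HTTPMetricHandler(registry);
        // map each context path to the handler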
        server.createContext("/", mHandler);
        server.createContext("/metrics", mHandler);
        server.createContext("/-/healthy", mHandler);
        // the thread pool that actually handles requests; since it mostly serves Prometheus scrapes, a few threads are enough
        executorService = Executors.newFixedThreadPool(5, NamedDaemonThreadFactory.defaultThreadFactory(daemon));
        server.setExecutor(executorService);
        start(daemon);
    }
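  • For how to use the JDK http-server, see the following article:
    https://www.cnblogs.com/aspwebchh/p/8300945.html
    In short, the httpserver module's main job is to expose a port and provide a web service for Prometheus to scrape. Internally a thread pool handles the requests; for each request the handler reads the data from the registry and writes the response, using a ThreadLocal to hold the response buffer. Starting the module is just new HTTPServer(port);
    Where you start it depends on your application.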
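
To make the request flow concrete, here is a JDK-only sketch of the same shape: a `com.sun.net.httpserver.HttpServer` with a small executor, one `/metrics` context, and a client that scrapes it once. It does not use the Prometheus classes; the class name, the ephemeral port and the hard-coded metric text are assumptions made for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class TinyMetricsServerSketch {
    // Starts a server on an ephemeral loopback port, scrapes it once, and returns the body.
    static String serveAndScrapeOnce(String metricsText) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        HttpServer server = null;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            byte[] body = metricsText.getBytes(StandardCharsets.UTF_8);
            server.createContext("/metrics", exchange -> {
                exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.setExecutor(pool);
            server.start();

            // Play the role of the Prometheus server: a plain HTTP GET.
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/metrics");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.print(serveAndScrapeOnce("demo_gauge 1.0\n"));
    }
}
```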

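simple-client wraps the Prometheus metric types, provides the model and storage classes, and defines Collector, CollectorRegistry, and so on.

The rest of this part covers how the model classes relate to each other, and how data is stored and read back.
Keep in mind that the client only holds the current value of each metric; the history lives in Prometheus. Once a metric has been created, the client simply keeps updating its value.

  • The model first
    Entry class: the basic types are all defined inside Collector.
    Interface and implementation relationships

Main classes:
Sample: a single sample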

 /**
   * A single Sample, with a unique name and set of labels.
   */
    public static class Sample {
      public final String name;
      public final List<String> labelNames;
      public final List<String> labelValues;  // Must have same length as labelNames.
      public final double value;
      // optional timestamp; it may be null when the collector does not set one
      public final Long timestampMs;  // It's an epoch format with milliseconds value included (this field is subject to change).

      public Sample(String name, List<String> labelNames, List<String> labelValues, double value, Long timestampMs) {
        this.name = name;
        this.labelNames = labelNames;
        this.labelValues = labelValues;
        this.value = value;
        this.timestampMs = timestampMs;
      }

MetricFamilySamples: the actual metric class

/**
   * A metric, and all of its samples.
   * This is the actual metric class; it holds a list of samples
   * (the samples of one metric family).
   */
  static public class MetricFamilySamples {
    public final String name;
    public final String unit;
    public final Type type;
    public final String help;
    // the samples
    public final List<Sample> samples;

    public MetricFamilySamples(String name, String unit, Type type, String help, List<Sample> samples) {
      if (!unit.isEmpty() && !name.endsWith("_" + unit)) {
        throw new IllegalArgumentException("Metric's unit is not the suffix of the metric name: " + name);
      }
      if ((type == Type.INFO || type == Type.STATE_SET) && !unit.isEmpty()) {
        throw new IllegalArgumentException("Metric is of a type that cannot have a unit: " + name);
      }
      List<Sample> mungedSamples = samples;
      // Deal with _total from pre-OM automatically.
      if (type == Type.COUNTER) {
        if (name.endsWith("_total")) {
          name = name.substring(0, name.length() - 6);
        }
        String withTotal = name + "_total";
        mungedSamples = new ArrayList<Sample>(samples.size());
        for (Sample s: samples) {
          String n = s.name;
          if (name.equals(n)) {
            n = withTotal;
          }
          mungedSamples.add(new Sample(n, s.labelNames, s.labelValues, s.value, s.timestampMs));
        }
      }
      this.name = name;
      this.unit = unit;
      this.type = type;
      this.help = help;
      this.samples = mungedSamples;
    }

  • Main interfaces
    The metric families a collector is expected to return:
public interface Describable {
    /**
     *  Provide a list of metric families this Collector is expected to return.
     *
     *  These should exclude the samples. This is used by the registry to
     *  detect collisions and duplicate registrations.
     *
     *  Usually custom collectors do not have to implement Describable. If
     *  Describable is not implemented and the CollectorRegistry was created
     *  with auto describe enabled (which is the case for the default registry)
     *  then {@link collect} will be called at registration time instead of
     *  describe. If this could cause problems, either implement a proper
     *  describe, or if that's not practical have describe return an empty
     *  list.
     */
    List<MetricFamilySamples> describe();
  }

Returning the metric data:

  public abstract List<MetricFamilySamples> collect();
public class Counter extends SimpleCollector<Counter.Child> implements Collector.Describable {

As you can see, Counter and the other metric types all extend Collector (through SimpleCollector), so each of them provides collect().
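
The relationship between these pieces can be sketched with a trimmed-down, hypothetical model: `Sample`, `Family`, `MiniCollector` and `MiniCounter` below are stand-ins written for this example, not the library's classes. The sketch shows the two points made above: a collector only has to report its current value when `collect()` is called, and a counter's sample carries the `_total` suffix.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.DoubleAdder;

final class MiniModelSketch {
    // Hypothetical, trimmed-down counterpart of Sample.
    static final class Sample {
        final String name;
        final double value;
        Sample(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    // Hypothetical, trimmed-down counterpart of MetricFamilySamples.
    static final class Family {
        final String name;
        final String help;
        final List<Sample> samples;
        Family(String name, String help, List<Sample> samples) {
            this.name = name;
            this.help = help;
            this.samples = samples;
        }
    }

    // Counterpart of Collector: the only obligation is to report current values.
    abstract static class MiniCollector {
        abstract List<Family> collect();
    }

    static final class MiniCounter extends MiniCollector {
        private final String name;
        private final String help;
        private final DoubleAdder value = new DoubleAdder(); // only the current total is kept

        MiniCounter(String name, String help) {
            this.name = name;
            this.help = help;
        }

        void inc(double amount) {
            if (amount < 0) {
                throw new IllegalArgumentException("counters can only increase");
            }
            value.add(amount);
        }

        @Override
        List<Family> collect() {
            Sample s = new Sample(name + "_total", value.sum());
            return Collections.singletonList(new Family(name, help, Collections.singletonList(s)));
        }
    }

    // Increments a counter twice and reads the value back through collect().
    static Sample demo() {
        MiniCounter c = new MiniCounter("requests", "Handled requests.");
        c.inc(2);
        c.inc(3);
        return c.collect().get(0).samples.get(0);
    }

    public static void main(String[] args) {
        Sample s = demo();
        System.out.println(s.name + " " + s.value); // prints "requests_total 5.0"
    }
}
```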

  • Here is an httpserver example
public class ExampleExporter {

    static final Gauge g = Gauge.build().name("gauge").help("blah").register();
    static final Counter c = Counter.build().name("counter").help("meh").register();
    static final Summary s = Summary.build().name("summary").help("meh").register();
    static final Histogram h = Histogram.build().name("histogram").help("meh").register();
    static final Gauge l = Gauge.build().name("labels").help("blah").labelNames("l").register();

    public static void main(String[] args) throws Exception {
        new HTTPServer(1234);
        g.set(1);
        c.inc(2);
        s.observe(3);
        h.observe(4);
        l.labels("foo").inc(5);
    }
}
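  • Reading data
    As mentioned earlier, the server handles each request through the handler:
// iterate over the collection to get the metric data
 TextFormat.writeFormat(contentType, osw,
// this call returns an Enumeration
                        registry.filteredMetricFamilySamples(parseQuery(query)));


---
// reading the data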
 public static void write004(Writer writer, Enumeration<Collector.MetricFamilySamples> mfs) throws IOException {
    Map<String, Collector.MetricFamilySamples> omFamilies = new TreeMap<String, Collector.MetricFamilySamples>();
    /* See http://prometheus.io/docs/instrumenting/exposition_formats/
     * for the output format specification. */
    while(mfs.hasMoreElements()) {
      Collector.MetricFamilySamples metricFamilySamples = mfs.nextElement();
      String name = metricFamilySamples.name;
      writer.write("# HELP ");
      writer.write(name);
      if (metricFamilySamples.type == Collector.Type.COUNTER) {
        writer.write("_total");
      }
      if (metricFamilySamples.type == Collector.Type.INFO) {
        writer.write("_info");
      }
      writer.write(' ');
      writeEscapedHelp(writer, metricFamilySamples.help);
      writer.write('\n');

      writer.write("# TYPE ");
      writer.write(name);
      if (metricFamilySamples.type == Collector.Type.COUNTER) {
        writer.write("_total");
      }
      if (metricFamilySamples.type == Collector.Type.INFO) {
        writer.write("_info");
      }
      writer.write(' ');
      writer.write(typeString(metricFamilySamples.type));
      writer.write('\n');

      String createdName = name + "_created";
      String gcountName = name + "_gcount";
      String gsumName = name + "_gsum";
      for (Collector.MetricFamilySamples.Sample sample: metricFamilySamples.samples) {
        /* OpenMetrics specific sample, put in a gauge at the end. */
        if (sample.name.equals(createdName)
            || sample.name.equals(gcountName)
            || sample.name.equals(gsumName)) {
          Collector.MetricFamilySamples omFamily = omFamilies.get(sample.name);
          if (omFamily == null) {
            omFamily = new Collector.MetricFamilySamples(sample.name, Collector.Type.GAUGE, metricFamilySamples.help, new ArrayList<Collector.MetricFamilySamples.Sample>());
            omFamilies.put(sample.name, omFamily);
          }
          omFamily.samples.add(sample);
          continue;
        }
        writer.write(sample.name);
        if (sample.labelNames.size() > 0) {
          writer.write('{');
          for (int i = 0; i < sample.labelNames.size(); ++i) {
            writer.write(sample.labelNames.get(i));
            writer.write("=\"");
            writeEscapedLabelValue(writer, sample.labelValues.get(i));
            writer.write("\",");
          }
          writer.write('}');
        }
        writer.write(' ');
        writer.write(Collector.doubleToGoString(sample.value));
        if (sample.timestampMs != null){
          writer.write(' ');
          writer.write(sample.timestampMs.toString());
        }
        writer.write('\n');
      }
    }
    // Write out any OM-specific samples.
    if (!omFamilies.isEmpty()) {
      write004(writer, Collections.enumeration(omFamilies.values()));
    }
  }
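
To show what this loop produces, the sketch below renders one labelled gauge by hand, following the same layout as the code above, including the comma written after every label pair. It is a simplified illustration, not the library code: `TextFormatSketch` is a made-up name, only gauges are handled, and the help text is not escaped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TextFormatSketch {
    // Escapes backslash, double quote and newline inside a label value.
    static String escapeLabelValue(String v) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < v.length(); i++) {
            char c = v.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Renders the HELP line, the TYPE line and a single sample line for a gauge.
    static String renderGauge(String name, String help, Map<String, String> labels, double value) {
        StringBuilder sb = new StringBuilder();
        sb.append("# HELP ").append(name).append(' ').append(help).append('\n');
        sb.append("# TYPE ").append(name).append(" gauge\n");
        sb.append(name);
        if (!labels.isEmpty()) {
            sb.append('{');
            for (Map.Entry<String, String> e : labels.entrySet()) {
                sb.append(e.getKey()).append("=\"")
                  .append(escapeLabelValue(e.getValue())).append("\",");
            }
            sb.append('}');
        }
        sb.append(' ').append(value).append('\n');
        return sb.toString();
    }

    // Mirrors the "labels" gauge from the exporter example: label l="foo", value 5.
    static String demo() {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("l", "foo");
        return renderGauge("labels", "blah", labels, 5);
    }

    public static void main(String[] args) {
        System.out.print(demo());
        // # HELP labels blah
        // # TYPE labels gauge
        // labels{l="foo",} 5.0
    }
}
```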
  • Storing data
    The code snippets for each step follow.
 static final Gauge g = Gauge.build().name("gauge").help("blah").register();
---
    /**
     * Create and register the Collector with the given registry.
     * Returns the concrete collector, e.g. a Gauge.
     */
    public C register(CollectorRegistry registry) {
      // create() is implemented by each builder to produce the concrete metric, e.g. new Gauge(...)
      C sc = create();
      registry.register(sc); // store it in the registry
      return sc;
    }

-----
public class CollectorRegistry {
  /**
   * The default registry.
   */
  public static final CollectorRegistry defaultRegistry = new CollectorRegistry(true);

  private final Object namesCollectorsLock = new Object();
  // e.g. key: the Gauge collector, value: ["gauge"]; note that Counter, Summary, etc. also contribute suffixed names
  private final Map<Collector, List<String>> collectorsToNames = new HashMap<Collector, List<String>>();
  // e.g. key: "gauge", value: the collector registered under that name
  private final Map<String, Collector> namesToCollectors = new HashMap<String, Collector>();

  private final boolean autoDescribe;

  public CollectorRegistry() {
    this(false);
  }

  public CollectorRegistry(boolean autoDescribe) {
    this.autoDescribe = autoDescribe;
  }

  /**
   * Register a Collector.
   * <p>
   * A collector can be registered to multiple CollectorRegistries.
   */
  public void register(Collector m) {
    List<String> names = collectorNames(m);
    synchronized (namesCollectorsLock) {
      for (String name : names) {
        if (namesToCollectors.containsKey(name)) {
          throw new IllegalArgumentException("Collector already registered that provides name: " + name);
        }
      }
      // namesToCollectors is where each name ends up mapped to its collector
      for (String name : names) {
        namesToCollectors.put(name, m);
      }
      collectorsToNames.put(m, names);
    }
  }
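
The two-map bookkeeping and the duplicate check can be tried out in isolation. The sketch below is a simplified stand-in for the registry, with plain `Object` values in place of collectors; the names passed in (`requests_total`, `requests_created`) are illustrative and not taken from the library.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MiniRegistrySketch {
    private final Object lock = new Object();
    private final Map<String, Object> namesToCollectors = new HashMap<>();
    private final Map<Object, List<String>> collectorsToNames = new HashMap<>();

    // Registers a collector under all of its names, or fails without changing anything.
    void register(Object collector, List<String> names) {
        synchronized (lock) {
            // First pass: reject the whole registration if any name is taken.
            for (String name : names) {
                if (namesToCollectors.containsKey(name)) {
                    throw new IllegalArgumentException(
                            "Collector already registered that provides name: " + name);
                }
            }
            // Second pass: record the mapping in both directions.
            for (String name : names) {
                namesToCollectors.put(name, collector);
            }
            collectorsToNames.put(collector, names);
        }
    }

    int registeredNameCount() {
        synchronized (lock) {
            return namesToCollectors.size();
        }
    }

    // Returns true when a second collector with a clashing name is rejected.
    static boolean secondRegistrationRejected() {
        MiniRegistrySketch registry = new MiniRegistrySketch();
        registry.register(new Object(), Arrays.asList("requests_total", "requests_created"));
        try {
            registry.register(new Object(), Arrays.asList("other", "requests_total"));
            return false;
        } catch (IllegalArgumentException expected) {
            // "other" must not have been stored, because the check runs before any put.
            return registry.registeredNameCount() == 2;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondRegistrationRejected()); // prints "true"
    }
}
```

Checking all names before storing any of them is what keeps a failed registration from leaving a half-registered collector behind.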


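The Pushgateway approach

This module is essentially a wrapper around the Pushgateway API: it pushes, deletes, and adds metrics, sending the metrics in a registry to the Pushgateway for temporary storage over HTTP.
Usage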

  void executeBatchJob() throws Exception {
    CollectorRegistry registry = new CollectorRegistry();
    Gauge duration = Gauge.build()
        .name("my_batch_job_duration_seconds").help("Duration of my batch job in seconds.").register(registry);
    Gauge.Timer durationTimer = duration.startTimer();
    try {
      // Your code here.

      // This is only added to the registry after success,
      // so that a previous success in the Pushgateway isn't overwritten on failure.
      Gauge lastSuccess = Gauge.build()
          .name("my_batch_job_last_success").help("Last time my batch job succeeded, in unixtime.").register(registry);
      lastSuccess.setToCurrentTime();
    } finally {
      durationTimer.setDuration();
      PushGateway pg = new PushGateway("127.0.0.1:9091");
      pg.pushAdd(registry, "my_batch_job");
    }
  }

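PushGateway has what looks like an HTTP connection pool, but it is really a factory that creates a new HttpURLConnection object for every request. It relies on HTTP/1.1 keep-alive, so performance is acceptable.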

public class PushGateway {

  private static final int MILLISECONDS_PER_SECOND = 1000;

  // Visible for testing.
  protected final String gatewayBaseURL;
  // connection factory (creates a new HttpURLConnection per request)
  private HttpConnectionFactory connectionFactory = new DefaultHttpConnectionFactory();

----
public class DefaultHttpConnectionFactory implements HttpConnectionFactory {
    @Override
    public HttpURLConnection create(String url) throws IOException {
        return (HttpURLConnection) new URL(url).openConnection();
    }
}

  • In addition, the push methods all go through doRequest, which sends the HTTP request. Here is the implementation:
  void doRequest(CollectorRegistry registry, String job, Map<String, String> groupingKey, String method) throws IOException {
    String url = gatewayBaseURL;
    if (job.contains("/")) {
      url += "job@base64/" + base64url(job);
    } else {
      url += "job/" + URLEncoder.encode(job, "UTF-8");
    }

    if (groupingKey != null) {
      for (Map.Entry<String, String> entry: groupingKey.entrySet()) {
        if (entry.getValue().isEmpty()) {
          url += "/" + entry.getKey() + "@base64/=";
        } else if (entry.getValue().contains("/")) {
          url += "/" + entry.getKey() + "@base64/" + base64url(entry.getValue());
        } else {
          url += "/" + entry.getKey() + "/" + URLEncoder.encode(entry.getValue(), "UTF-8");
        }
      }
    }
    HttpURLConnection connection = connectionFactory.create(url);
    connection.setRequestProperty("Content-Type", TextFormat.CONTENT_TYPE_004);
    if (!method.equals("DELETE")) {
      connection.setDoOutput(true);
    }
    connection.setRequestMethod(method);
    // the connect timeout is 10s, and the read timeout is also 10s
    connection.setConnectTimeout(10 * MILLISECONDS_PER_SECOND);
    connection.setReadTimeout(10 * MILLISECONDS_PER_SECOND);
    connection.connect();

    try {
      if (!method.equals("DELETE")) {
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(connection.getOutputStream(), "UTF-8"));
        TextFormat.write004(writer, registry.metricFamilySamples());
        writer.flush();
        writer.close();
      }

      int response = connection.getResponseCode();
      if (response/100 != 2) {
        String errorMessage;
        InputStream errorStream = connection.getErrorStream();
        if(errorStream != null) {
          String errBody = readFromStream(errorStream);
          errorMessage = "Response code from " + url + " was " + response + ", response body: " + errBody;
        } else {
          errorMessage = "Response code from " + url + " was " + response;
        }
        throw new IOException(errorMessage);
      }
    } finally {
      connection.disconnect();
    }
  }
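
The URL-building rules at the top of doRequest are easy to miss, so here is a stand-alone sketch of them. The `base64url` helper is not shown in the excerpt; the version below assumes it is URL-safe Base64 of the UTF-8 bytes, and the base URL, job name and grouping key are made-up values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

final class PushUrlSketch {
    // Assumption: URL-safe Base64 of the UTF-8 bytes (the real helper is not shown above).
    static String base64url(String v) {
        return Base64.getUrlEncoder().encodeToString(v.getBytes(StandardCharsets.UTF_8));
    }

    static String encodeSegment(String v) {
        try {
            return URLEncoder.encode(v, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Same branching as doRequest: values containing "/" are Base64-encoded, empty values become "=".
    static String buildUrl(String base, String job, Map<String, String> groupingKey) {
        StringBuilder url = new StringBuilder(base);
        if (job.contains("/")) {
            url.append("job@base64/").append(base64url(job));
        } else {
            url.append("job/").append(encodeSegment(job));
        }
        for (Map.Entry<String, String> e : groupingKey.entrySet()) {
            String value = e.getValue();
            url.append('/').append(e.getKey());
            if (value.isEmpty()) {
                url.append("@base64/=");
            } else if (value.contains("/")) {
                url.append("@base64/").append(base64url(value));
            } else {
                url.append('/').append(encodeSegment(value));
            }
        }
        return url.toString();
    }

    static String demo() {
        Map<String, String> key = new LinkedHashMap<>();
        key.put("instance", "a/b"); // contains a slash, so it is Base64-encoded
        key.put("zone", "eu1");
        // The base URL here is an assumed example value.
        return buildUrl("http://127.0.0.1:9091/metrics/", "my_batch_job", key);
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // http://127.0.0.1:9091/metrics/job/my_batch_job/instance@base64/YS9i/zone/eu1
    }
}
```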

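  • As shown, the connect timeout is 10s and the read timeout is also 10s. If the gateway misbehaves, each push can block the calling thread for that long, which can tie up a lot of client resources and make the client hang.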
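
One possible mitigation, sketched here with JDK classes only, is to run the push on a helper thread and give the caller a shorter deadline of its own. This is an assumption about how one might guard against the problem, not something the client library does. Note that cancelling the future only frees the caller: a helper thread blocked in a socket read may still wait for the underlying timeout. The `Callable` stands in for a call such as `pg.push(...)`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedPushSketch {
    // Runs the push on the pool and waits at most maxWaitMillis; returns true on success.
    static boolean pushWithDeadline(ExecutorService pool, Callable<Void> push, long maxWaitMillis) {
        Future<Void> future = pool.submit(push);
        try {
            future.get(maxWaitMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // stop waiting; the business thread moves on
            return false;
        } catch (ExecutionException e) {
            return false; // the push itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return false;
        }
    }

    // Returns {result of a fast push, result of a push slower than its deadline}.
    static boolean[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "push-sketch");
            t.setDaemon(true); // never keep the JVM alive just for pushing
            return t;
        });
        try {
            boolean fast = pushWithDeadline(pool, () -> null, 1000);
            boolean slow = pushWithDeadline(pool, () -> {
                Thread.sleep(5000); // simulates a gateway that does not answer
                return null;
            }, 100);
            return new boolean[] {fast, slow};
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints "true false"
    }
}
```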

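  • Can data be pushed on a schedule?
    No. By default a Gauge does not carry a timestamp, so the time that gets recorded is actually the arrival time of the HTTP request.
    [Screenshot: packet capture]

  • Does creating a new connection for every push cause performance problems?
    No, because keep-alive is used.
    Below is the source of the disconnect method that doRequest calls at the end of a push: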

 public void disconnect() {
        this.responseCode = -1;
        if (this.pi != null) {
            this.pi.finishTracking();
            this.pi = null;
        }

        if (this.http != null) {
            if (this.inputStream != null) {
                HttpClient var1 = this.http;
                boolean var2 = var1.isKeepingAlive();

                try {
                    this.inputStream.close();
                } catch (IOException var4) {
                }

                if (var2) {
                    var1.closeIdleConnection();
                }
            } else {
                this.http.setDoNotRetry(true);
                this.http.closeServer();
            }

            this.http = null;
            this.connected = false;
        }

        this.cachedInputStream = null;
        if (this.cachedHeaders != null) {
            this.cachedHeaders.reset();
        }

    }

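As shown, the disconnect method of HttpURLConnection does not really close the connection; it clears the related state so that the connection can be reused next time. The client and server speak HTTP/1.1, the client enables keep-alive by default, and the Pushgateway server supports it as well.
Below is a Wireshark capture of two pushes.
The code that issues the requests: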

public class ExamplePushGateway {
  static final CollectorRegistry pushRegistry = new CollectorRegistry();
  static final Gauge g = (Gauge) Gauge.build().name("gauge").help("blah").register(pushRegistry);

  /**
   * Example of how to use the pushgateway, pass in the host:port of a pushgateway.
   */
  public static void main(String[] args) throws Exception {
    PushGateway pg = new PushGateway("127.0.0.1:9091");
    g.set(42);
    pg.push(pushRegistry, "job");

    g.set(45);
    pg.push(pushRegistry, "job");
  }
}

To capture traffic for a service on the local machine, use this

Filter setting:
tcp.dstport == 9091 or tcp.srcport == 9091

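The capture shows that only one TCP handshake took place: disconnect did not close the connection, and keep-alive was set.
Below is the connection state for a while after a single push; the connection between the client and the server was not torn down.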

posted @ 2021-02-25 17:12  rudolf_lin