Prometheus监控有所思:多标签埋点及Mbean
使用 grafana+prometheus+jmx 作为普通的监控手段,是比较有用的。我之前的文章介绍了相应的实现办法。https://www.cnblogs.com/yougewe/p/11140129.html
但是,按照之前的实现,我们更多的只能是监控 单值型的数据,如请求量,tps 等等,对于复杂组合型的指标却不容易监控。
这种情况一般带有一定的业务属性,比如想监控mq中的每个topic的消费情况,每类产品的实时订单情况等等。当然,对于看过完整的 prometheus 的监控数据的同学来说,会觉得很正常,因为你会看到如下的数据:
# HELP java_lang_MemoryPool_PeakUsage_max java.lang.management.MemoryUsage (java.lang<type=MemoryPool, name=Metaspace><PeakUsage>max) # TYPE java_lang_MemoryPool_PeakUsage_max untyped java_lang_MemoryPool_PeakUsage_max{name="Metaspace",} -1.0 java_lang_MemoryPool_PeakUsage_max{name="PS Old Gen",} 1.415053312E9 java_lang_MemoryPool_PeakUsage_max{name="PS Eden Space",} 6.96778752E8 java_lang_MemoryPool_PeakUsage_max{name="Code Cache",} 2.5165824E8 java_lang_MemoryPool_PeakUsage_max{name="Compressed Class Space",} 1.073741824E9 java_lang_MemoryPool_PeakUsage_max{name="PS Survivor Space",} 5242880.0
这里面的 name 就是普通标签嘛,同理于其他埋点咯。应该是可以实现的。
是的,prometheus 是方便实现这玩意的,但是我们之前不是使用 jmx_exportor 作为导出工具嘛,使用的埋点组件是 io.dropwizard.metrics:metrics-core 。
而它则是重在单值的监控,所以,用它我们是实现不了带指标的数据的监控了。
那怎么办呢?三个办法!
1. 直接替换原有的 metrics-core 组件为 prometheus 的client 组件,因为官方是支持这种操作的;
2. 使用 prometheus-client 组件与 metrics-core 组件配合,各自使用各自的功能;
3. 自行实现带标签的埋点,这可能是基于 MBean 的;
以上这几种方案,各有优劣。方案1可能改动太大,而且可能功能不兼容不可行; 方案2可能存在整合不了或者功能冲突情况,当然如果能整合,绝对是最好的; 方案3实现复杂度就高了,比如监控值维护、线程安全、MBean数据吐出方式等等。
好吧,不管怎么样,我们还是都看看吧。
一、 使用 prometheus-client 埋点实现带标签的监控
1. 引入 pom 依赖
<dependency> <groupId>io.prometheus</groupId> <artifactId>simpleclient</artifactId> <version>0.8.0</version> </dependency> <dependency> <groupId>io.prometheus</groupId> <artifactId>simpleclient_hotspot</artifactId> <version>0.8.0</version> </dependency> <dependency> <groupId>io.prometheus</groupId> <artifactId>simpleclient_servlet</artifactId> <version>0.8.0</version> </dependency>
2. 框架注册监控
@Configuration public class PrometheusConfig { @Bean public ServletRegistrationBean servletRegistrationBean(){ // 将埋点指标吐出到 /metrics 节点 return new ServletRegistrationBean(new MetricsServlet(), "/metrics"); } }
3. 业务埋点数据
// 注册指标实例 io.prometheus.client.Counter c = io.prometheus.client.Counter.build() .name("jmx_test_abc_ffff") .labelNames("topic") .help("topic counter usage.") .register(); public void incTopicMetric(String topic) { // c.labels("test").inc(); // for test }
4. 获取埋点数据信息
curl http://localhost:8080/metrics # 对外暴露http接口调用,结果如下 # HELP jmx_test_abc_ffff counter usage. # TYPE jmx_test_abc_ffff counter jmx_test_abc_ffff{topic="bbb",} 1.0 jmx_test_abc_ffff{topic="2",} 2.0 jmx_test_abc_ffff{topic="test",} 1.0
可以看出,效果咱们是实现了。但是,对于已经运行的东西,要改这玩意可能不是那么友好。主要有以下几点:
1. 暴露数据方式变更,原来由javaagent进行统一处理的数据,现在可能由于应用端口的不一,导致收集的配置会变更,不一定符合运维场景;
2. 需要将原来的埋点进行替换;
二、 prometheus-client 与 metrics-core 混合埋点
不处理以前的监控,将新监控带标签数据吐入到 jmx_exportor 中。
我们试着使用如上的埋点方式:
// 注册指标实例 io.prometheus.client.Counter c = io.prometheus.client.Counter.build() .name("jmx_test_abc_ffff") .labelNames("topic") .help("topic counter usage.") .register(); public void incTopicMetric(String topic) { // c.labels("test").inc(); // for test }
好像数据是不会进入的到 jmx_exportor 的。这也不奇怪,毕竟咱们也不了解其原理,难道想靠运气取胜??
细去查看 metrics-core 组件的埋点实现方案,发现其是向 MBean 中吐入数据,从而被 jmx_exportor 抓取的。
// com.codahale.metrics.jmx.JmxReporter.JmxListener#onCounterAdded @Override public void onCounterAdded(String name, Counter counter) { try { if (filter.matches(name, counter)) { final ObjectName objectName = createName("counters", name); registerMBean(new JmxCounter(counter, objectName), objectName); } } catch (InstanceAlreadyExistsException e) { LOGGER.debug("Unable to register counter", e); } catch (JMException e) { LOGGER.warn("Unable to register counter", e); } } // 向 mBeanServer 注册监控实例 // 默认情况下 mBeanServer = ManagementFactory.getPlatformMBeanServer(); private void registerMBean(Object mBean, ObjectName objectName) throws InstanceAlreadyExistsException, JMException { ObjectInstance objectInstance = mBeanServer.registerMBean(mBean, objectName); if (objectInstance != null) { // the websphere mbeanserver rewrites the objectname to include // cell, node & server info // make sure we capture the new objectName for unregistration registered.put(objectName, objectInstance.getObjectName()); } else { registered.put(objectName, objectName); } }
而 prometheus-client 则是通过 CollectorRegistry.defaultRegistry 进行注册实例的。
// io.prometheus.client.SimpleCollector.Builder#register() /** * Create and register the Collector with the default registry. */ public C register() { return register(CollectorRegistry.defaultRegistry); } /** * Create and register the Collector with the given registry. */ public C register(CollectorRegistry registry) { C sc = create(); registry.register(sc); return sc; }
所以,好像原理上来讲是不同的。至于到底为什么不能监控到数据,那还不好说。至少,你可以学习 metrics-core 使用 MBean 的形式将数据导出。这是我们下一个方案要讨论的事。
这里我可以给到一个最终简单又不失巧合的方式,实现两个监控组件的兼容,同时向 jmx_exportor 进行导出。如下:
1. 引入 javaagent 依赖包
<!-- javaagent 包,与 外部使用的 jmx_exportor 一致 --> <dependency> <groupId>io.prometheus.jmx</groupId> <artifactId>jmx_prometheus_javaagent</artifactId> <version>0.12.0</version> </dependency>
2. 使用 agent 的工具类进行埋点
因为 javaagent 里面提供一套完整的 client 工具包,所以,我们可以使用。
// 注册指标实例 // 将 io.prometheus.client.Counter 包替换为 io.prometheus.jmx.shaded.io.prometheus.client.Counter io.prometheus.client.Counter c = io.prometheus.client.Counter.build() .name("jmx_test_abc_ffff") .labelNames("topic") .help("topic counter usage.") .register(); public void incTopicMetric(String topic) { // c.labels("test").inc(); // for test }
3. 原样使用 jmx_exportor 就可以导出监控数据了
为什么换一个包这样就可以了?
因为 jmx_exportor 也是通过注册 CollectorRegistry.defaultRegistry 来进行收集数据的,我们只要保持与其实例一致,就可以做到在同一个jvm内共享数据了。
三、 基于 MBean自行实现带标签的埋点
// 测试类 public class PrometheusMbeanMetricsMain { private static ConcurrentHashMap<String, AtomicInteger> topicContainer = new ConcurrentHashMap<>(); private static MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer(); public static void main(String[] args) throws Exception { // 模拟某个topic String commingTopic = "test_topic"; AtomicInteger myTopic1Counter = getMetricCounter(commingTopic); System.out.println("jmx started!"); while(true){ System.out.println("---"); // 计数增加 myTopic1Counter.incrementAndGet(); Thread.sleep(10000); } } private static AtomicInteger getMetricCounter(String topic) throws MalformedObjectNameException, NotCompliantMBeanException, InstanceAlreadyExistsException, MBeanRegistrationException { AtomicInteger myTopic1Counter = topicContainer.get(topic); if(myTopic1Counter == null) { myTopic1Counter = new AtomicInteger(0); Hashtable<String, String> tab = new Hashtable<>(); tab.put("topic", topic); // 占位符,虽然不知道什么意思,但是感觉很厉害的样子 tab.put("_", "_value"); ObjectName objectName = new ObjectName("mydomain_test", tab); // 注册监控实例 到 MBeanServer 中 ObjectInstance objectInstance = mBeanServer.registerMBean(new JmxCounter(myTopic1Counter, objectName), objectName); } return myTopic1Counter; } } // JmxCounter, MBean 要求: 1. 接口必须定义成Public的; 2. 接口命名规范符合要求, 即接口名叫 XYZMBean ,那么实现名就必须一定是XYZ; // DynamicMBean public interface JmxCounterMBean { public Object getCount() throws Exception; } public class JmxCounter implements JmxCounterMBean { private AtomicInteger metric; private ObjectName objectName; public JmxCounter(AtomicInteger metric, ObjectName objectName) { this.objectName = objectName; this.metric = metric; } @Override public Object getCount() throws Exception { // 返回监控结果 return metric.get(); } }
最后,见证奇迹的时刻。结果如下:
# HELP mydomain_test_value_Count Attribute exposed for management (mydomain_test<_=_value, topic=b_topic><>Count) # TYPE mydomain_test_value_Count untyped mydomain_test_value_Count{topic="b_topic",} 1.0 mydomain_test_value_Count{topic="a_topic",} 88.0
很明显,这是一个糟糕的实现,不要学他。仅为了演示效果。
所以,总结下来,自然是使用方案2了。两个组件兼容,实现简单,性能也不错。如果只是为了使用,到此就可以了。不过你得明白,以上方案有取巧的成分在。
四、 原理: jmx_exportor 是如何获取数据的?
jmx_exportor 也是可以通过 http_server 暴露数据。
// io.prometheus.client.exporter.HTTPServer /** * Start a HTTP server serving Prometheus metrics from the given registry. */ public HTTPServer(InetSocketAddress addr, CollectorRegistry registry, boolean daemon) throws IOException { server = HttpServer.create(); server.bind(addr, 3); // 使用 HTTPMetricHandler 处理请求 HttpHandler mHandler = new HTTPMetricHandler(registry); // 绑定到 /metrics 地址上 server.createContext("/", mHandler); server.createContext("/metrics", mHandler); executorService = Executors.newFixedThreadPool(5, DaemonThreadFactory.defaultThreadFactory(daemon)); server.setExecutor(executorService); start(daemon); } /** * Start a HTTP server by making sure that its background thread inherit proper daemon flag. */ private void start(boolean daemon) { if (daemon == Thread.currentThread().isDaemon()) { server.start(); } else { FutureTask<Void> startTask = new FutureTask<Void>(new Runnable() { @Override public void run() { server.start(); } }, null); DaemonThreadFactory.defaultThreadFactory(daemon).newThread(startTask).start(); try { startTask.get(); } catch (ExecutionException e) { throw new RuntimeException("Unexpected exception on starting HTTPSever", e); } catch (InterruptedException e) { // This is possible only if the current tread has been interrupted, // but in real use cases this should not happen. // In any case, there is nothing to do, except to propagate interrupted flag. Thread.currentThread().interrupt(); } } }
所以,可以主要逻辑是 HTTPMetricHandler 处理。来看看。
// io.prometheus.client.exporter.HTTPServer.HTTPMetricHandler#handle public void handle(HttpExchange t) throws IOException { String query = t.getRequestURI().getRawQuery(); ByteArrayOutputStream response = this.response.get(); response.reset(); OutputStreamWriter osw = new OutputStreamWriter(response); // 主要由该 TextFormat 进行格式化输出 // registry.filteredMetricFamilySamples() 进行数据收集 TextFormat.write004(osw, registry.filteredMetricFamilySamples(parseQuery(query))); osw.flush(); osw.close(); response.flush(); response.close(); t.getResponseHeaders().set("Content-Type", TextFormat.CONTENT_TYPE_004); if (shouldUseCompression(t)) { t.getResponseHeaders().set("Content-Encoding", "gzip"); t.sendResponseHeaders(HttpURLConnection.HTTP_OK, 0); final GZIPOutputStream os = new GZIPOutputStream(t.getResponseBody()); response.writeTo(os); os.close(); } else { t.getResponseHeaders().set("Content-Length", String.valueOf(response.size())); t.sendResponseHeaders(HttpURLConnection.HTTP_OK, response.size()); // 写向客户端 response.writeTo(t.getResponseBody()); } t.close(); } }
五、 原理: jmx_exportor 是如何获取Mbean 的数据的?
jmx_exportor 有一个 JmxScraper, 专门用于处理 MBean 的值。
// io.prometheus.jmx.JmxScraper#doScrape /** * Get a list of mbeans on host_port and scrape their values. * * Values are passed to the receiver in a single thread. */ public void doScrape() throws Exception { MBeanServerConnection beanConn; JMXConnector jmxc = null; // 默认直接获取本地的 jmx 信息 // 即是通过共享 ManagementFactory.getPlatformMBeanServer() 变量来实现通信的 if (jmxUrl.isEmpty()) { beanConn = ManagementFactory.getPlatformMBeanServer(); } else { Map<String, Object> environment = new HashMap<String, Object>(); if (username != null && username.length() != 0 && password != null && password.length() != 0) { String[] credent = new String[] {username, password}; environment.put(javax.management.remote.JMXConnector.CREDENTIALS, credent); } if (ssl) { environment.put(Context.SECURITY_PROTOCOL, "ssl"); SslRMIClientSocketFactory clientSocketFactory = new SslRMIClientSocketFactory(); environment.put(RMIConnectorServer.RMI_CLIENT_SOCKET_FACTORY_ATTRIBUTE, clientSocketFactory); environment.put("com.sun.jndi.rmi.factory.socket", clientSocketFactory); } // 如果是远程获取,则会通过 rmi 进行远程通信获取 jmxc = JMXConnectorFactory.connect(new JMXServiceURL(jmxUrl), environment); beanConn = jmxc.getMBeanServerConnection(); } try { // Query MBean names, see #89 for reasons queryMBeans() is used instead of queryNames() Set<ObjectName> mBeanNames = new HashSet<ObjectName>(); for (ObjectName name : whitelistObjectNames) { for (ObjectInstance instance : beanConn.queryMBeans(name, null)) { mBeanNames.add(instance.getObjectName()); } } for (ObjectName name : blacklistObjectNames) { for (ObjectInstance instance : beanConn.queryMBeans(name, null)) { mBeanNames.remove(instance.getObjectName()); } } // Now that we have *only* the whitelisted mBeans, remove any old ones from the cache: jmxMBeanPropertyCache.onlyKeepMBeans(mBeanNames); for (ObjectName objectName : mBeanNames) { long start = System.nanoTime(); scrapeBean(beanConn, objectName); logger.fine("TIME: " + (System.nanoTime() - start) + " ns for " + objectName.toString()); } } finally { if (jmxc != null) { jmxc.close(); } } } // io.prometheus.jmx.JmxScraper#scrapeBean private void scrapeBean(MBeanServerConnection beanConn, ObjectName mbeanName) { MBeanInfo info; try { info = beanConn.getMBeanInfo(mbeanName); } catch (IOException e) { logScrape(mbeanName.toString(), "getMBeanInfo Fail: " + e); return; } catch (JMException e) { logScrape(mbeanName.toString(), "getMBeanInfo Fail: " + e); return; } MBeanAttributeInfo[] attrInfos = info.getAttributes(); Map<String, MBeanAttributeInfo> name2AttrInfo = new LinkedHashMap<String, MBeanAttributeInfo>(); for (int idx = 0; idx < attrInfos.length; ++idx) { MBeanAttributeInfo attr = attrInfos[idx]; if (!attr.isReadable()) { logScrape(mbeanName, attr, "not readable"); continue; } name2AttrInfo.put(attr.getName(), attr); } final AttributeList attributes; try { // 通过 MBean 调用对象,获取所有属性值,略去不说 attributes = beanConn.getAttributes(mbeanName, name2AttrInfo.keySet().toArray(new String[0])); } catch (Exception e) { logScrape(mbeanName, name2AttrInfo.keySet(), "Fail: " + e); return; } for (Attribute attribute : attributes.asList()) { MBeanAttributeInfo attr = name2AttrInfo.get(attribute.getName()); logScrape(mbeanName, attr, "process"); // 处理单个key的属性值, 如 topic=aaa,ip=1 将会进行再次循环处理 processBeanValue( mbeanName.getDomain(), // 获取有效的属性列表, 我们可以简单看一下过滤规则, 如下文 jmxMBeanPropertyCache.getKeyPropertyList(mbeanName), new LinkedList<String>(), attr.getName(), attr.getType(), attr.getDescription(), attribute.getValue() ); } } // 处理每个 mBean 的属性,写入到 receiver 中 // io.prometheus.jmx.JmxScraper#processBeanValue /** * Recursive function for exporting the values of an mBean. * JMX is a very open technology, without any prescribed way of declaring mBeans * so this function tries to do a best-effort pass of getting the values/names * out in a way it can be processed elsewhere easily. */ private void processBeanValue( String domain, LinkedHashMap<String, String> beanProperties, LinkedList<String> attrKeys, String attrName, String attrType, String attrDescription, Object value) { if (value == null) { logScrape(domain + beanProperties + attrName, "null"); } // 单值情况,数字型,字符串型,可以处理 else if (value instanceof Number || value instanceof String || value instanceof Boolean) { logScrape(domain + beanProperties + attrName, value.toString()); // 解析出的数据存入 receiver 中,可以是 jmx, 或者 控制台 this.receiver.recordBean( domain, beanProperties, attrKeys, attrName, attrType, attrDescription, value); } // 多值型情况 else if (value instanceof CompositeData) { logScrape(domain + beanProperties + attrName, "compositedata"); CompositeData composite = (CompositeData) value; CompositeType type = composite.getCompositeType(); attrKeys = new LinkedList<String>(attrKeys); attrKeys.add(attrName); for(String key : type.keySet()) { String typ = type.getType(key).getTypeName(); Object valu = composite.get(key); processBeanValue( domain, beanProperties, attrKeys, key, typ, type.getDescription(), valu); } } // 更复杂型对象 else if (value instanceof TabularData) { // I don't pretend to have a good understanding of TabularData. // The real world usage doesn't appear to match how they were // meant to be used according to the docs. I've only seen them // used as 'key' 'value' pairs even when 'value' is itself a // CompositeData of multiple values. logScrape(domain + beanProperties + attrName, "tabulardata"); TabularData tds = (TabularData) value; TabularType tt = tds.getTabularType(); List<String> rowKeys = tt.getIndexNames(); CompositeType type = tt.getRowType(); Set<String> valueKeys = new TreeSet<String>(type.keySet()); valueKeys.removeAll(rowKeys); LinkedList<String> extendedAttrKeys = new LinkedList<String>(attrKeys); extendedAttrKeys.add(attrName); for (Object valu : tds.values()) { if (valu instanceof CompositeData) { CompositeData composite = (CompositeData) valu; LinkedHashMap<String, String> l2s = new LinkedHashMap<String, String>(beanProperties); for (String idx : rowKeys) { Object obj = composite.get(idx); if (obj != null) { // Nested tabulardata will repeat the 'key' label, so // append a suffix to distinguish each. while (l2s.containsKey(idx)) { idx = idx + "_"; } l2s.put(idx, obj.toString()); } } for(String valueIdx : valueKeys) { LinkedList<String> attrNames = extendedAttrKeys; String typ = type.getType(valueIdx).getTypeName(); String name = valueIdx; if (valueIdx.toLowerCase().equals("value")) { // Skip appending 'value' to the name attrNames = attrKeys; name = attrName; } processBeanValue( domain, l2s, attrNames, name, typ, type.getDescription(), composite.get(valueIdx)); } } else { logScrape(domain, "not a correct tabulardata format"); } } } else if (value.getClass().isArray()) { logScrape(domain, "arrays are unsupported"); } else { // 多半会返回不支持的对象然后得不到jmx监控值 // mydomain_test{3=3, topic=aaa} java.util.Hashtable is not exported logScrape(domain + beanProperties, attrType + " is not exported"); } } // 我们看下prometheus 对 mbeanName 的转换操作,会将各种特殊字符转换为 属性列表 // io.prometheus.jmx.JmxMBeanPropertyCache#getKeyPropertyList public LinkedHashMap<String, String> getKeyPropertyList(ObjectName mbeanName) { LinkedHashMap<String, String> keyProperties = keyPropertiesPerBean.get(mbeanName); if (keyProperties == null) { keyProperties = new LinkedHashMap<String, String>(); // 转化为 string 格式 String properties = mbeanName.getKeyPropertyListString(); // 此处为 prometheus 认识的格式,已经匹配上了 Matcher match = PROPERTY_PATTERN.matcher(properties); while (match.lookingAt()) { keyProperties.put(match.group(1), match.group(2)); properties = properties.substring(match.end()); if (properties.startsWith(",")) { properties = properties.substring(1); } match.reset(properties); } keyPropertiesPerBean.put(mbeanName, keyProperties); } return keyProperties; } // io.prometheus.jmx.JmxMBeanPropertyCache#PROPERTY_PATTERN private static final Pattern PROPERTY_PATTERN = Pattern.compile( "([^,=:\\*\\?]+)" + // Name - non-empty, anything but comma, equals, colon, star, or question mark "=" + // Equals "(" + // Either "\"" + // Quoted "(?:" + // A possibly empty sequence of "[^\\\\\"]*" + // Greedily match anything but backslash or quote "(?:\\\\.)?" + // Greedily see if we can match an escaped sequence ")*" + "\"" + "|" + // Or "[^,=:\"]*" + // Unquoted - can be empty, anything but comma, equals, colon, or quote ")");
六、 原理: jmx_exportor 为什么输出的格式是这样的?
prometheus 的数据格式如下,如何从埋点数据转换?
# HELP mydomain_test_value_Count Attribute exposed for management (mydomain_test<_=_value, topic=b_topic><>Count) # TYPE mydomain_test_value_Count untyped mydomain_test_value_Count{topic="b_topic",} 1.0 mydomain_test_value_Count{topic="a_topic",} 132.0
是一个输出格式问题,也是一协议问题。
// io.prometheus.client.exporter.common.TextFormat#write004 public static void write004(Writer writer, Enumeration<Collector.MetricFamilySamples> mfs) throws IOException { /* See http://prometheus.io/docs/instrumenting/exposition_formats/ * for the output format specification. */ while(mfs.hasMoreElements()) { Collector.MetricFamilySamples metricFamilySamples = mfs.nextElement(); writer.write("# HELP "); writer.write(metricFamilySamples.name); writer.write(' '); writeEscapedHelp(writer, metricFamilySamples.help); writer.write('\n'); writer.write("# TYPE "); writer.write(metricFamilySamples.name); writer.write(' '); writer.write(typeString(metricFamilySamples.type)); writer.write('\n'); for (Collector.MetricFamilySamples.Sample sample: metricFamilySamples.samples) { writer.write(sample.name); // 带 labelNames 的,依次输出对应的标签 if (sample.labelNames.size() > 0) { writer.write('{'); for (int i = 0; i < sample.labelNames.size(); ++i) { writer.write(sample.labelNames.get(i)); writer.write("=\""); writeEscapedLabelValue(writer, sample.labelValues.get(i)); writer.write("\","); } writer.write('}'); } writer.write(' '); writer.write(Collector.doubleToGoString(sample.value)); if (sample.timestampMs != null){ writer.write(' '); writer.write(sample.timestampMs.toString()); } writer.write('\n'); } } }
done.