Druid: Using Quantiles

Background: A recent project required Druid's quantile aggregation queries. I won't go into the concept of quantiles here; look it up if needed. I had never used the feature, and the approximate-histogram extension is still flagged as experimental in Druid, so it is presumably not fully polished. I decided to try it out and see what the results actually look like.


Use case: the approximate-histograms extension, combined with the quantile/quantiles post-aggregators, lets you query e.g. the 0.95/0.98/0.99 percentiles of page load time.


Since we are just a business consumer of Druid, the service itself is not managed by our team. According to the official docs, the first step is to add the extension. The company runs Druid 0.9.2; as of this writing the latest official release is 0.12.3. So we need to:

Adding extension support

How: check that druid-histogram is present under the {DRUID}/extensions directory, then add druid-histogram to the extension load list:

druid.extensions.loadList=["druid-histogram",.....]

The nodes must be restarted to load the newly added extension:

  • On the query side, restart the historical and broker nodes.
  • On the ingestion side, restart the overlord node.

With the extension in place on the service, the next step is data ingestion.

Following the official approximate-histograms documentation: http://druid.io/docs/latest/development/extensions-core/approximate-histograms.html

Data ingestion

{
  ......
  "metricsSpec": [
    ......
    {
      "name": "pageLoad",
      "type": "longSum",
      "fieldName": "pageLoad"
    },
    {
      "type": "approxHistogramFold",
      "name": "his_pageLoad",
      "fieldName": "pageLoad",
      "resolution": 50,
      "numBuckets": 7,
      "lowerLimit": 0.0,
      "upperLimit": 10000000.0
    }
    ......
  ]
  ......
}

I only used the pageLoad field for this experiment, to compare Druid's sum and quantile calculations over the same values.

Note the resolution, numBuckets, lowerLimit, and upperLimit parameters; see the official docs for their exact meaning. Roughly, resolution is the number of centroids the histogram keeps, numBuckets is the number of output bins, and lowerLimit/upperLimit bound the approximated value range. The values above were chosen more or less on a whim. Next, the queries:

Query scripts

Script 1: sum:

{
    "queryType":"timeseries",
    "dataSource":{
        "type":"table",
        "name":"bpm_page_view"
    },
    "context":{
        "priority":7,
        "timeout":3000,
        "queryId":"f7d75164-2d53-44fe-8978-10742e102c3d"
    },
    "intervals":{
        "type":"LegacySegmentSpec",
        "intervals":[
            "2018-11-26T15:26:10.773+08:00/2018-11-26T15:56:10.773+08:00"
        ]
    },
    "descending":false,
    "filter":{
        "type":"and",
        "fields":[
            {
                "type":"selector",
                "dimension":"appCode",
                "value":"ec269367bf854639a56cb1618a097c38",
                "extractionFn":null
            }
        ]
    },
    "granularity":{
        "type":"duration",
        "duration":60000,
        "origin":"1970-01-01T08:00:00.000+08:00"
    },
    "aggregations":[
        {
            "type":"filtered",
            "aggregator":{
                "type":"longSum",
                "name":"pageLoad",
                "fieldName":"pageLoad"
            },
            "filter":{
                "type":"and",
                "fields":[
                    {
                        "type":"selector",
                        "dimension":"terminal",
                        "value":"IOS",
                        "extractionFn":null
                    }
                ]
            }
        }
    ],
    "postAggregations":null
}

Script 2: quantiles (here 90%, 95%, 99%):

{
    "queryType":"timeseries",
    "dataSource":{
        "type":"table",
        "name":"bpm_page_view"
    },
    "context":{
        "priority":7,
        "timeout":3000,
        "queryId":"f7d75164-2d53-44fe-8978-10742e102c3d"
    },
    "intervals":{
        "type":"LegacySegmentSpec",
        "intervals":[
            "2018-11-26T15:26:10.773+08:00/2018-11-26T15:56:10.773+08:00"
        ]
    },
    "descending":false,
    "filter":{
        "type":"and",
        "fields":[
            {
                "type":"selector",
                "dimension":"appCode",
                "value":"ec269367bf854639a56cb1618a097c38",
                "extractionFn":null
            }
        ]
    },
    "granularity":{
        "type":"duration",
        "duration":60000,
        "origin":"1970-01-01T08:00:00.000+08:00"
    },
    "aggregations":[
        {
            "type":"filtered",
            "aggregator":{
                "type": "approxHistogramFold",
                "name": "his_pageLoad",
                "fieldName": "his_pageLoad",
                "resolution" : null,
                "numBuckets" : null
            },
            "filter":{
                "type":"and",
                "fields":[
                    {
                        "type":"selector",
                        "dimension":"terminal",
                        "value":"IOS",
                        "extractionFn":null
                    }
                ]
            }
        }        
    ],
    "postAggregations":[
            { "type" : "quantiles", "name" : "响应时间", "fieldName" : "his_pageLoad","probabilities" : [0.9,0.95,0.99] }
        ]
}
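
Either script can be POSTed to the Broker's native query endpoint /druid/v2. Below is a minimal sketch; the host, the port (8082 is the Broker default), and the query.json file name are assumptions for a local setup:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;

public class DruidQueryRunner {
    public static void main(String[] args) throws Exception {
        // Read the timeseries query spec (script 1 or script 2) from disk.
        byte[] body = Files.readAllBytes(Paths.get("query.json"));
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:8082/druid/v2").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        // Print the JSON result array returned by the Broker.
        try (Scanner sc = new Scanner(conn.getInputStream(), "UTF-8")) {
            while (sc.hasNextLine()) {
                System.out.println(sc.nextLine());
            }
        }
    }
}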

Analysis of the aggregation results

Data is pushed to Kafka and ingested by Druid. Here is the producer, using the old Scala-client Kafka producer API:

package com.suning.ctbpm;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import java.util.Properties;

public class KafkaProducerSimple {
    public static void main(String[] args) {
        String topic = "xxxx";
        Properties props = new Properties();
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("metadata.broker.list", "xxxxxxx");
        props.put("request.required.acks", "1");
        Producer<String, String> producer = new Producer<>(new ProducerConfig(props));
        String msg;
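        // One JSON event is sent per loop iteration; pageLoad carries the loop value j.
        // The loop bound (10 / 100 / 200) and the logTime inside msg are adjusted for
        // each test below; the commented-out block overwrites a single value, which is
        // useful for checking that quantiles are computed over sorted data.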
        for (int i = 1; i <= 200; i++) {
            int j = i;
            //if (i == 10) {
             //   j = 11;
            //}
            msg = "{\n" +
                    "    \"access\":\"IE_10_0\",\n" +
                    "    \"apdexSign\":100,\n" +
                    "    \"appCode\":\"ec269367bf854639a56cb1618a097c38\",\n" +
                    "    \"area\":\"某某区\",\n" +
                    "    \"blankScreen\":11,\n" +
                    "    \"browser\":\"IE\",\n" +
                    "    \"browserVersion\":\"IE_10\",\n" +
                    "    \"cache\":30,\n" +
                    "    \"city\":\"某某城市\",\n" +
                    "    \"country\":\"zh_CN\",\n" +
                    "    \"dns\":11,\n" +
                    "    \"domParser\":211,\n" +
                    "    \"domain\":\"xxx.xxx.com\",\n" +
                    "    \"firstAction\":110,\n" +
                    "    \"firstPacket\":44,\n" +
                    "    \"firstPaint\":20,\n" +
                    "    \"htmlLoad\":187,\n" +
                    "    \"ip\":\"10.200.181.61\",\n" +
                    "    \"keyPageCode\":[\n" +
                    "\n" +
                    "    ],\n" +
                    "    \"logTime\":1543221571000,\n" +
                    "    \"net\":116,\n" +
                    "    \"operator\":\"unknown\",\n" +
                    "    \"os\":\"iOS 10 (iPhone)\",\n" +
                    "    \"pageLoad\":" + j + ",\n" +
                    "    \"pageRef\":\"http://xxx.xxx.com/broadcast/matchBefore.html\",\n" +
                    "    \"pageRender\":769,\n" +
                    "    \"processing\":765,\n" +
                    "    \"province\":\"谋省\",\n" +
                    "    \"redirect\":10,\n" +
                    "    \"request\":44,\n" +
                    "    \"resourceLoad\":558,\n" +
                    "    \"response\":101,\n" +
                    "    \"restPacket\":101,\n" +
                    "    \"slowPageSign\":10,\n" +
                    "    \"ssl\":10,\n" +
                    "    \"stalled\":10,\n" +
                    "    \"tcp\":42,\n" +
                    "    \"terminal\":\"IOS\",\n" +
                    "    \"ua\":\"Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Mobile/14E304;PPTVSports\",\n" +
                    "    \"unload\":10,\n" +
                    "    \"version\":\"V1.0.7\",\n" +
                    "    \"visitId\":\"f7f2-7f8c760d\"\n" +
                    "}";
            KeyedMessage<String, String> record = new KeyedMessage<>(topic, msg);
            producer.send(record);
        }
        producer.close();
    }
}

The msg payload is fabricated to match our own ingestion spec. Pay attention to the logTime field: to make the results easy to observe, every event in a batch carries the same logTime, so Druid aggregates them into a single point, and the intervals in the query scripts must cover that logTime.
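
To see which minute bucket a given logTime lands in, the epoch milliseconds can be converted to UTC. A small sketch, using the logTime from test 1 below:

import java.time.Instant;

public class BucketCheck {
    public static void main(String[] args) {
        long logTime = 1543233754000L;   // logTime used in test 1
        System.out.println(Instant.ofEpochMilli(logTime));   // 2018-11-26T12:02:34Z
        // With granularity.duration = 60000 the event falls into the
        // minute bucket that encloses it:
        System.out.println(Instant.ofEpochMilli(logTime - logTime % 60_000));   // 2018-11-26T12:02:00Z
    }
}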

Test 1: set logTime=1543233754000 in msg and run the for loop 10 times, i.e. send ten events with pageLoad values 1 through 10 to Kafka.

Run script 1 ("intervals":["2018-11-26T19:34:03.205+08:00/2018-11-26T20:04:03.205+08:00"]):

[
  {
    "timestamp": "2018-11-26T11:34:00.000Z",
    "result": {
      "pageLoad": 0
    }
  },
 
 ......

  {
    "timestamp": "2018-11-26T12:01:00.000Z",
    "result": {
      "pageLoad": 0
    }
  },
  {
    "timestamp": "2018-11-26T12:02:00.000Z",
    "result": {
      "pageLoad": 55
    }
  },
  {
    "timestamp": "2018-11-26T12:03:00.000Z",
    "result": {
      "pageLoad": 0
    }
  },
  {
    "timestamp": "2018-11-26T12:04:00.000Z",
    "result": {
      "pageLoad": 0
    }
  }
]

Run script 2 ("intervals":["2018-11-26T19:34:03.205+08:00/2018-11-26T20:04:03.205+08:00"]):

[
  {
    "timestamp": "2018-11-26T11:34:00.000Z",
    "result": {
      "his_pageLoad": {
        "breaks": [
          "Infinity",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "-Infinity"
        ],
        "counts": [
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN"
        ]
      },
      "响应时间": {
        "probabilities": [
          0.9,
          0.95,
          0.99
        ],
        "quantiles": [
          "NaN",
          "NaN",
          "NaN"
        ],
        "min": "Infinity",
        "max": "-Infinity"
      }
    }
  },
  
  ......
  
  {
    "timestamp": "2018-11-26T12:02:00.000Z",
    "result": {
      "his_pageLoad": {
        "breaks": [
          -0.5,
          1,
          2.5,
          4,
          5.5,
          7,
          8.5,
          10
        ],
        "counts": [
          1,
          1,
          2,
          1,
          2,
          1,
          2
        ]
      },
      "响应时间": {
        "probabilities": [
          0.9,
          0.95,
          0.99
        ],
        "quantiles": [
          9,
          9.5,
          9.9
        ],
        "min": 1,
        "max": 10
      }
    }
  },
  ......
  {
    "timestamp": "2018-11-26T12:04:00.000Z",
    "result": {
      "his_pageLoad": {
        "breaks": [
          "Infinity",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "-Infinity"
        ],
        "counts": [
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN"
        ]
      },
      "响应时间": {
        "probabilities": [
          0.9,
          0.95,
          0.99
        ],
        "quantiles": [
          "NaN",
          "NaN",
          "NaN"
        ],
        "min": "Infinity",
        "max": "-Infinity"
      }
    }
  }
]

Comparing the two queries: Druid aggregated my ten events into the point 2018-11-26T12:02:00.000Z (the events' logTime is 2018-11-26 20:02:34 local time). The ten values sum to 1+2+...+10 = 55, and TP90 = 9, TP95 = 9.5, TP99 = 9.9.
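
These quantiles can be reproduced by hand. Below is a minimal sketch (QuantileCheck and its quantile helper are my own names, not Druid APIs) that sorts the values and reads off the percentile at rank q * n, interpolating linearly when the rank is fractional. Druid's approximate histogram works on compressed centroids internally, but on these small data sets the outputs match:

import java.util.Arrays;

public class QuantileCheck {

    // Percentile by sort + linear interpolation at the (1-indexed) rank q * n.
    static double quantile(double[] values, double q) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double rank = q * n;                 // e.g. 0.9 * 10 = 9
        int lower = (int) Math.floor(rank);
        if (lower < 1) return sorted[0];
        if (lower >= n) return sorted[n - 1];
        double frac = rank - lower;
        return sorted[lower - 1] + frac * (sorted[lower] - sorted[lower - 1]);
    }

    public static void main(String[] args) {
        double[] data = new double[10];
        for (int i = 0; i < 10; i++) data[i] = i + 1;   // 1..10
        System.out.println(quantile(data, 0.90));       // ~9.0
        System.out.println(quantile(data, 0.95));       // ~9.5
        System.out.println(quantile(data, 0.99));       // ~9.9
    }
}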

Test 2: set logTime=1543234941000 and run the loop 10 times again, but with j set to 7 when i=9, so the values sent are (1, 2, 3, 4, 5, 6, 7, 8, 7, 10). The point is to verify that the quantiles are computed over sorted data.

Run script 1 ("intervals":["2018-11-26T19:53:12.89+08:00/2018-11-26T20:23:12.891+08:00"]):

[
  {
    "timestamp": "2018-11-26T11:53:00.000Z",
    "result": {
      "pageLoad": 0
    }
  },
......

  {
    "timestamp": "2018-11-26T12:02:00.000Z",
    "result": {
      "pageLoad": 55
    }
  },

  ......

  {
    "timestamp": "2018-11-26T12:22:00.000Z",
    "result": {
      "pageLoad": 53
    }
  },
  {
    "timestamp": "2018-11-26T12:23:00.000Z",
    "result": {
      "pageLoad": 0
    }
  }
]

Run script 2 ("intervals":["2018-11-26T19:53:12.89+08:00/2018-11-26T20:23:12.891+08:00"]):

[
  {
    "timestamp": "2018-11-26T11:53:00.000Z",
    "result": {
      "his_pageLoad": {
        "breaks": [
          "Infinity",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "-Infinity"
        ],
        "counts": [
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN"
        ]
      },
      "响应时间": {
        "probabilities": [
          0.9,
          0.95,
          0.99
        ],
        "quantiles": [
          "NaN",
          "NaN",
          "NaN"
        ],
        "min": "Infinity",
        "max": "-Infinity"
      }
    }
  },
  ......
  {
    "timestamp": "2018-11-26T12:02:00.000Z",
    "result": {
      "his_pageLoad": {
        "breaks": [
          -0.5,
          1,
          2.5,
          4,
          5.5,
          7,
          8.5,
          10
        ],
        "counts": [
          1,
          1,
          2,
          1,
          2,
          1,
          2
        ]
      },
      "响应时间": {
        "probabilities": [
          0.9,
          0.95,
          0.99
        ],
        "quantiles": [
          9,
          9.5,
          9.9
        ],
        "min": 1,
        "max": 10
      }
    }
  },
  ......
  {
    "timestamp": "2018-11-26T12:22:00.000Z",
    "result": {
      "his_pageLoad": {
        "breaks": [
          -0.5,
          1,
          2.5,
          4,
          5.5,
          7,
          8.5,
          10
        ],
        "counts": [
          1,
          1,
          2,
          1,
          3,
          1,
          1
        ]
      },
      "响应时间": {
        "probabilities": [
          0.9,
          0.95,
          0.99
        ],
        "quantiles": [
          8,
          9,
          9.799999
        ],
        "min": 1,
        "max": 10
      }
    }
  },
  ......
]

Comparing the two queries: Druid aggregated these ten events into the point 2018-11-26T12:22:00.000Z (logTime 2018-11-26 20:22:21 local time). The values now sum to 53 instead of 55 because the 9 was replaced by a 7, and TP90 = 8, TP95 = 9, TP99 = 9.799999. TP90 is 8 because after sorting the values are (1, 2, 3, 4, 5, 6, 7, 7, 8, 10), and the ninth value is 8; TP99 interpolates between the ninth and tenth values, 8 + 0.9*(10-8) = 9.8, which comes back as 9.799999 in floating point.
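
For reference, running the QuantileCheck sketch from test 1 on (1, 2, 3, 4, 5, 6, 7, 8, 7, 10) prints approximately 8.0, 9.0, and 9.8, matching Druid's 8, 9, and 9.799999 up to floating-point noise.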

Let's keep going.

Test 3: set logTime=1543235747000 and run the loop 100 times, i.e. send one hundred events with pageLoad values 1 through 100 to Kafka.

Run script 1 ("intervals":["2018-11-26T20:07:05.311+08:00/2018-11-26T20:37:05.311+08:00"]):

[
......
  {
    "timestamp": "2018-11-26T12:22:00.000Z",
    "result": {
      "pageLoad": 53
    }
  },
  ......
  {
    "timestamp": "2018-11-26T12:35:00.000Z",
    "result": {
      "pageLoad": 5050
    }
  },
......
]

Run script 2 ("intervals":["2018-11-26T20:07:05.311+08:00/2018-11-26T20:37:05.311+08:00"]):

[
  ......
  {
    "timestamp": "2018-11-26T12:22:00.000Z",
    "result": {
      "his_pageLoad": {
        "breaks": [
          -0.5,
          1,
          2.5,
          4,
          5.5,
          7,
          8.5,
          10
        ],
        "counts": [
          1,
          1,
          2,
          1,
          3,
          1,
          1
        ]
      },
      "响应时间": {
        "probabilities": [
          0.9,
          0.95,
          0.99
        ],
        "quantiles": [
          8,
          9,
          9.799999
        ],
        "min": 1,
        "max": 10
      }
    }
  },
  ......
  {
    "timestamp": "2018-11-26T12:35:00.000Z",
    "result": {
      "his_pageLoad": {
        "breaks": [
          -15.5,
          1,
          17.5,
          34,
          50.5,
          67,
          83.5,
          100
        ],
        "counts": [
          1,
          16,
          17,
          16,
          17,
          16,
          17
        ]
      },
      "响应时间": {
        "probabilities": [
          0.9,
          0.95,
          0.99
        ],
        "quantiles": [
          90,
          95,
          99
        ],
        "min": 1,
        "max": 100
      }
    }
  },
  ......
]

The numbers work out again: the hundred values 1 through 100 sum to 5050, and the quantiles land exactly on TP90 = 90, TP95 = 95, TP99 = 99. The breaks array has also adapted to the new value range, as the sketch below illustrates.
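
In fact the breaks arrays follow a recognizable pattern. The sketch below reproduces them under a rule inferred purely from these outputs (not taken from Druid's source): the bin width is (max - min) / (numBuckets - 1), with one extra break placed one width below min.

import java.util.Arrays;

public class BreaksCheck {
    public static void main(String[] args) {
        double min = 1, max = 100;   // observed min/max of the 12:35 point
        int numBuckets = 7;          // from the ingestion spec
        // Inferred rule: equal-width bins between min and max, plus one
        // extra break one bin-width below min.
        double width = (max - min) / (numBuckets - 1);   // 16.5
        double[] breaks = new double[numBuckets + 1];
        breaks[0] = min - width;                         // -15.5
        for (int i = 1; i <= numBuckets; i++) {
            breaks[i] = min + (i - 1) * width;
        }
        System.out.println(Arrays.toString(breaks));
        // [-15.5, 1.0, 17.5, 34.0, 50.5, 67.0, 83.5, 100.0]
    }
}

Once more, with a bigger batch: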

Test 4: set logTime=1543236350000 and run the loop 200 times, i.e. send two hundred events with pageLoad values 1 through 200 to Kafka.

Run script 1 ("intervals":["2018-11-26T20:16:40.524+08:00/2018-11-26T20:46:40.524+08:00"]):

[
  ......
  {
    "timestamp": "2018-11-26T12:22:00.000Z",
    "result": {
      "pageLoad": 53
    }
  },
  ......
  {
    "timestamp": "2018-11-26T12:35:00.000Z",
    "result": {
      "pageLoad": 5050
    }
  },
  ......
  {
    "timestamp": "2018-11-26T12:45:00.000Z",
    "result": {
      "pageLoad": 20100
    }
  },
  {
    "timestamp": "2018-11-26T12:46:00.000Z",
    "result": {
      "pageLoad": 0
    }
  }
]


Run script 2 ("intervals":["2018-11-26T20:16:40.524+08:00/2018-11-26T20:46:40.524+08:00"]):

[
  ......
  {
    "timestamp": "2018-11-26T12:22:00.000Z",
    "result": {
      "his_pageLoad": {
        "breaks": [
          -0.5,
          1,
          2.5,
          4,
          5.5,
          7,
          8.5,
          10
        ],
        "counts": [
          1,
          1,
          2,
          1,
          3,
          1,
          1
        ]
      },
      "响应时间": {
        "probabilities": [
          0.9,
          0.95,
          0.99
        ],
        "quantiles": [
          8,
          9,
          9.799999
        ],
        "min": 1,
        "max": 10
      }
    }
  },
  ......
  {
    "timestamp": "2018-11-26T12:35:00.000Z",
    "result": {
      "his_pageLoad": {
        "breaks": [
          -15.5,
          1,
          17.5,
          34,
          50.5,
          67,
          83.5,
          100
        ],
        "counts": [
          1,
          16,
          17,
          16,
          17,
          16,
          17
        ]
      },
      "响应时间": {
        "probabilities": [
          0.9,
          0.95,
          0.99
        ],
        "quantiles": [
          90,
          95,
          99
        ],
        "min": 1,
        "max": 100
      }
    }
  },
  ......
  {
    "timestamp": "2018-11-26T12:45:00.000Z",
    "result": {
      "his_pageLoad": {
        "breaks": [
          -32.16666793823242,
          1,
          34.16666793823242,
          67.33333587646484,
          100.5,
          133.6666717529297,
          166.83334350585938,
          200
        ],
        "counts": [
          1,
          33,
          33,
          33,
          33,
          33,
          34
        ]
      },
      "响应时间": {
        "probabilities": [
          0.9,
          0.95,
          0.99
        ],
        "quantiles": [
          180,
          190,
          198
        ],
        "min": 1,
        "max": 200
      }
    }
  },
  {
    "timestamp": "2018-11-26T12:46:00.000Z",
    "result": {
      "his_pageLoad": {
        "breaks": [
          "Infinity",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "-Infinity"
        ],
        "counts": [
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN",
          "NaN"
        ]
      },
      "响应时间": {
        "probabilities": [
          0.9,
          0.95,
          0.99
        ],
        "quantiles": [
          "NaN",
          "NaN",
          "NaN"
        ],
        "min": "Infinity",
        "max": "-Infinity"
      }
    }
  }
]

Checking the numbers once more: the two hundred values sum to 20100, and TP90 = 180, TP95 = 190, TP99 = 198, i.e. exactly the values 90%, 95%, and 99% of the way through the sorted data. The trailing 12:46:00.000Z point received no events, which is why its histogram comes back as all NaN with min = Infinity and max = -Infinity.

That concludes this hands-on with Druid histogram quantiles. The comparisons show that when Druid aggregates data into a point, the result behaves as if the values were sorted in ascending order and the value at the requested TP rank were taken as that point's quantile, with linear interpolation for fractional ranks. Bear in mind, though, that the histogram is an approximation, so the exact agreement seen in these small, uniform tests is not guaranteed in general.

Time to clock out...

If you repost this, please include a link to the original (gotta drive some traffic, haha): https://www.cnblogs.com/wynjauu/articles/10022863.html
