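Elasticsearch Aggregations API
Introduction: the aggregation framework computes aggregated data over the documents matched by a search query. The syntax is defined as follows:
"aggregations" : {                                      // can be shortened to "aggs"
    "<aggregation_name>" : {                            // aggregation name, a unique identifier
        "<aggregation_type>" : {                        // aggregation type
            <aggregation_body>                          // aggregation body: which fields to aggregate on
        }
        [,"meta" : { [<meta_data_body>] } ]?            // metadata
        [,"aggregations" : { [<sub_aggregation>]+ } ]?  // sub-aggregations nested in this aggregation
    }
    [,"<aggregation_name_2>" : { ... } ]*               // another aggregation
}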
Note: setting size=0 returns only the aggregation results, without the matching documents (hits).
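As an illustration of the template above, the following Java sketch assembles one concrete request body as a plain string: a `terms` aggregation named `by_ip` with a `max` sub-aggregation named `latest`, plus `"size":0`. The aggregation names and the `data.ip`/`timestamp` fields are hypothetical, chosen to match the examples later in this article; in practice the body would normally be built with a client library rather than by hand.

```java
class AggBodySketch {
    // Builds a request body following the "aggs" template: one named terms
    // aggregation with one named max sub-aggregation. Names are hypothetical.
    static String requestBody(String termsField, String maxField) {
        return "{"
                + "\"size\":0,"                                        // only return aggregation results
                + "\"aggs\":{"                                         // short form of "aggregations"
                + "\"by_ip\":{"                                        // <aggregation_name>
                + "\"terms\":{\"field\":\"" + termsField + "\"},"      // <aggregation_type> and its body
                + "\"aggs\":{"                                         // sub-aggregations of by_ip
                + "\"latest\":{\"max\":{\"field\":\"" + maxField + "\"}}"
                + "}"   // closes the sub-aggregations
                + "}"   // closes by_ip
                + "}"   // closes aggs
                + "}";  // closes the request body
    }

    public static void main(String[] args) {
        System.out.println(requestBody("data.ip", "timestamp"));
    }
}
```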
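I. Metric Aggregations: compute statistics over the documents in a bucket
1. Top Hits: returns the top matching documents, similar to LIMIT in MySQL; it is usually used as a sub-aggregation to get the top documents of each bucket
A. URL:POST /index/_search?size=0
B. Request parameters
from: the offset of the first hit to return;
size: the maximum number of matching hits to return per bucket, default 3;
sort: how the matching hits are sorted; by default they are sorted by score.
C. Kibana query
D. Java implementation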
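TopHitsAggregationBuilder aggregationBuilder = AggregationBuilders.topHits("top_hits")
        .sort("time", SortOrder.DESC)
        .size(1);
// The aggregation must be attached to the request; size(0) returns no regular hits
searchRequest.source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly to avoid problems when the index does not exist
if (aggregations != null) {
    TopHits topHits = aggregations.get("top_hits");
    SearchHit[] hits = topHits.getHits().getHits();
}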
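Conceptually, `top_hits` with `sort("time", SortOrder.DESC)` and `size(1)` orders the documents by `time` descending and keeps the first one. The plain-Java sketch below reproduces only that ordering and the `from`/`size` slicing over an in-memory list; the `Doc` class and its fields are hypothetical, and nothing here calls Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class TopHitsSketch {
    // Hypothetical minimal document: an id plus a "time" field in epoch millis.
    static final class Doc {
        final String id;
        final long time;
        Doc(String id, long time) { this.id = id; this.time = time; }
    }

    // Mimics top_hits with sort("time", DESC), from and size: order the matching
    // documents, skip `from` of them, then keep at most `size`.
    static List<String> topHits(List<Doc> docs, int from, int size) {
        return docs.stream()
                .sorted(Comparator.comparingLong((Doc d) -> d.time).reversed())
                .skip(from)
                .limit(size)
                .map(d -> d.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("a", 100L));
        docs.add(new Doc("b", 300L));
        docs.add(new Doc("c", 200L));
        System.out.println(topHits(docs, 0, 1)); // only the latest document
    }
}
```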
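2. Cardinality: counts the distinct values of a field, similar to count(distinct field) in MySQL; the result is an approximation
A. URL:POST /index/_search?size=0
B. Request parameters
field: the field whose distinct values are counted;
script: a script that produces the values to count, instead of a field.
C. Kibana query
D. Java implementation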
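CardinalityAggregationBuilder aggregationBuilder = AggregationBuilders.cardinality("cardinality").field("cid");
// The aggregation must be attached to the request; size(0) returns no regular hits
searchRequest.source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly to avoid problems when the index does not exist
if (aggregations != null) {
    Cardinality cardinality = aggregations.get("cardinality");
    long count = cardinality.getValue(); // approximate number of distinct cid values
}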
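The exact value that `count(distinct cid)` returns can be sketched in plain Java with a `HashSet`. This is only a reference point: Elasticsearch's `cardinality` aggregation does not keep every value, it estimates the count with the HyperLogLog++ algorithm, so on large data sets the result may differ slightly from the exact figure (the `precision_threshold` parameter trades memory for accuracy). The `cid` values below are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class CardinalitySketch {
    // Exact distinct count, i.e. what COUNT(DISTINCT cid) returns in MySQL.
    // Elasticsearch's cardinality aggregation only estimates this number.
    static long distinctCount(List<String> values) {
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        List<String> cids = Arrays.asList("c1", "c2", "c1", "c3", "c2");
        System.out.println(distinctCount(cids)); // distinct values among 5 entries
    }
}
```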
3. Max: returns the maximum value of a field
A. URL:POST /index/_search?size=0
B. Request parameters
field: the field to take the maximum of;
script: a script.
C. Kibana query
D. Java implementation
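MaxAggregationBuilder aggregationBuilder = AggregationBuilders.max("max").field("timestamp");
// The aggregation must be attached to the request; size(0) returns no regular hits
searchRequest.source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly to avoid problems when the index does not exist
if (aggregations != null) {
    ParsedMax max = aggregations.get("max");
    // For a date field this is the maximum rendered as a date string
    String timestamp = max.getValueAsString();
}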
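What `max` computes can be modelled in plain Java as follows. By default, documents that lack the field are simply ignored (the `missing` parameter can supply a substitute value instead). In this sketch such documents are represented by `null` entries, which is an assumption of the example, and an empty result stands for "no document had the field".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.OptionalDouble;

class MaxSketch {
    // Maximum of a numeric field; null entries stand for documents without the
    // field and are skipped. Returns an empty OptionalDouble if nothing is left.
    static OptionalDouble max(List<Double> values) {
        return values.stream()
                .filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue)
                .max();
    }

    public static void main(String[] args) {
        System.out.println(max(Arrays.asList(3.0, null, 7.5, 1.0)));
        System.out.println(max(Arrays.asList((Double) null)));
    }
}
```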
4. Min: returns the minimum value of a field
A. URL:POST /index/_search?size=0
B. Request parameters
field: the field to take the minimum of;
script: a script.
C. Kibana query
D. Java implementation
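MinAggregationBuilder aggregationBuilder = AggregationBuilders.min("min").field("timestamp");
// The aggregation must be attached to the request; size(0) returns no regular hits
searchRequest.source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly to avoid problems when the index does not exist
if (aggregations != null) {
    ParsedMin min = aggregations.get("min");
    // For a date field this is the minimum rendered as a date string
    String timestamp = min.getValueAsString();
}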
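On a date field such as `timestamp`, `min` and `max` work on the underlying epoch-millisecond values, and `getValueAsString()` returns the result rendered as a date. The plain-Java sketch below mimics that for `min`; rendering the result as an ISO-8601 UTC string is an assumption of this example, since Elasticsearch uses the field's date format (or the aggregation's `format` parameter) instead. The sample timestamps are made up.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

class MinDateSketch {
    // Takes the minimum over epoch-millisecond values, then renders it as a date,
    // here as an ISO-8601 UTC string via Instant.toString().
    static String minAsString(List<Long> epochMillis) {
        long min = epochMillis.stream()
                .mapToLong(Long::longValue)
                .min()
                .orElseThrow(() -> new IllegalArgumentException("no values"));
        return Instant.ofEpochMilli(min).toString();
    }

    public static void main(String[] args) {
        // 2020-03-10T00:00:00Z and 2020-02-03T00:00:00Z in epoch millis
        List<Long> timestamps = Arrays.asList(1583798400000L, 1580688000000L);
        System.out.println(minAsString(timestamps));
    }
}
```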
5. Sum: sums the values of a field
A. URL:POST /index/_search?size=0
B. Request parameters
field: the field to sum;
script: a script.
C. Kibana query
D. Java implementation
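SumAggregationBuilder aggregationBuilder = AggregationBuilders.sum("sum").field("low");
// The aggregation must be attached to the request; size(0) returns no regular hits
searchRequest.source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly to avoid problems when the index does not exist
if (aggregations != null) {
    // Look the result up by the aggregation name ("sum"), not by the field name ("low")
    Sum sum = aggregations.get("sum");
    double low = sum.getValue();
}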
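A plain-Java model of `sum`: documents without the field (again `null` entries, a convention of this sketch) contribute nothing, and in this sketch summing no values yields 0.0 rather than an empty result. The sample values for the `low` field are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class SumSketch {
    // Sum of a numeric field such as "low"; null entries model documents
    // without the field and are skipped. An empty input sums to 0.0.
    static double sum(List<Double> values) {
        return values.stream()
                .filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1.5, null, 2.5)));
    }
}
```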
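6. Avg: computes the average (mean) of a field
A. URL:POST /index/_search?size=0
B. Request parameters
field: the field to average;
script: a script.
C. Kibana query
D. Java implementation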
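This section has no Java code in the original. On the client side the code follows the same pattern as Max and Min, starting from `AggregationBuilders.avg(...)`. What the aggregation computes can be sketched in plain Java as the sum of the field values divided by how many values there are, skipping documents without the field (modelled as `null`, an assumption of this sketch); when no value exists, the sketch returns an empty `OptionalDouble`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.OptionalDouble;

class AvgSketch {
    // Average = sum of the field values / number of values. Documents without
    // the field (null here) are excluded from both; no values -> empty result.
    static OptionalDouble avg(List<Double> values) {
        return values.stream()
                .filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue)
                .average();
    }

    public static void main(String[] args) {
        System.out.println(avg(Arrays.asList(2.0, null, 4.0)));
    }
}
```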
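7. Stats: returns several statistics of a field in one request: count, min, max, avg and sum
A. URL:POST /index/_search?size=0
B. Request parameters
field: the field to compute statistics for;
script: a script.
C. Kibana query
D. Java implementation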
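The five numbers returned by `stats` correspond one-to-one to what Java's `DoubleSummaryStatistics` collects, so that class makes a convenient plain-Java model of the aggregation; on the client side, `AggregationBuilders.stats(...)` is the matching builder. The sketch assumes the input list contains only documents that have the field (a `null` entry would throw a `NullPointerException`); the sample values are made up.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;

class StatsSketch {
    // DoubleSummaryStatistics collects the same five numbers that the stats
    // aggregation returns: count, min, max, average and sum.
    static DoubleSummaryStatistics stats(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(Arrays.asList(2.0, 8.0, 5.0));
        System.out.printf("count=%d min=%.1f max=%.1f avg=%.1f sum=%.1f%n",
                s.getCount(), s.getMin(), s.getMax(), s.getAverage(), s.getSum());
    }
}
```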
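8. Value Count: counts the values of a field; repeated values are counted every time they occur
A. URL:POST /index/_search?size=0
B. Request parameters
field: the field whose values are counted;
script: a script.
C. Kibana query
D. Java implementation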
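ValueCountAggregationBuilder aggregationBuilder = AggregationBuilders.count("count").field("cid");
// The aggregation must be attached to the request; size(0) returns no regular hits
searchRequest.source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly to avoid problems when the index does not exist
if (aggregations != null) {
    ValueCount valueCount = aggregations.get("count");
    long count = valueCount.getValue();
}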
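As the name suggests, `value_count` counts field values rather than distinct values or documents. The plain-Java sketch below models each document as the list of its `cid` values (a representation assumed for this example): a document with two values adds two, the same value in different documents is counted again, and a document without the field adds nothing. Compare with the Cardinality section, which counts only distinct values.

```java
import java.util.Arrays;
import java.util.List;

class ValueCountSketch {
    // Each inner list holds the "cid" values of one document. The total is the
    // number of values across all documents: "c1" in two documents counts twice,
    // and a document without the field (empty list) contributes zero.
    static long valueCount(List<List<String>> docs) {
        return docs.stream().mapToLong(List::size).sum();
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("c1"),
                Arrays.asList("c1", "c2"),
                Arrays.<String>asList());
        System.out.println(valueCount(docs));
    }
}
```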
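II. Bucket Aggregations: each bucket is the set of documents that meet a particular criterion
1. Terms: groups documents by the values of a field and counts each group, similar to GROUP BY or SELECT DISTINCT column FROM table in MySQL. The counts are not always exact, because each shard only returns its own top terms
A. URL:GET /index/_search
B. Request parameters
field: the field to group by; only one field is supported;
size: the number of term buckets to return, default 10; a larger size gives more accurate counts at a higher cost;
order: the sort order of the returned buckets;
script: a script; here it is used to group by the combination of two fields (see the commented-out lines in the Java code), which has performance costs and is best avoided.
C. Kibana query
D. Java implementation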
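// Grouping by two fields with a script (works, but has a performance cost):
// Script script = new Script("doc['data.srcip'].value + '_' + doc['data.dstip'].value");
// TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("terms").script(script).size(Integer.MAX_VALUE);
// A very large size is expensive and may hit the cluster's search.max_buckets limit
TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("terms").field("data.ip").size(Integer.MAX_VALUE);
// The aggregation must be attached to the request; size(0) returns no regular hits
searchRequest.source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly to avoid problems when the index does not exist
if (aggregations != null) {
    Terms terms = aggregations.get("terms");
    for (Terms.Bucket bucket : terms.getBuckets()) {
        String key = bucket.getKeyAsString();  // the grouped value
        long docCount = bucket.getDocCount();  // number of documents in this group
    }
}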
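The inaccuracy mentioned above comes from how a terms aggregation is executed: each shard computes its own top terms, and the coordinating node merges only those partial lists. The plain-Java simulation below uses two hypothetical shards and, for simplicity, asks each shard for exactly `size` terms; Elasticsearch actually requests a larger `shard_size` per shard by default, which reduces but does not remove the effect.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TermsShardSketch {
    // Shard step: the `size` most frequent terms of one shard.
    static Map<String, Long> shardTop(Map<String, Long> shardCounts, int size) {
        return shardCounts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(size)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    // Coordinator step: add up the per-shard top lists and keep the overall top `size`.
    static List<String> mergedTop(List<Map<String, Long>> shards, int size) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> shard : shards) {
            shardTop(shard, size).forEach((term, count) -> merged.merge(term, count, Long::sum));
        }
        return merged.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(size)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> shardA = new HashMap<>();
        shardA.put("a", 5L);
        shardA.put("b", 4L);
        Map<String, Long> shardB = new HashMap<>();
        shardB.put("c", 6L);
        shardB.put("b", 4L);
        // "b" has 8 documents in total, but neither shard reports it in its top 1
        System.out.println(mergedTop(Arrays.asList(shardA, shardB), 1));
    }
}
```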
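2. Filter: narrows the documents matched by the query further, to those that also match a filter
A. URL:POST /index/_search?size=0
B. Request parameters: a single query, written in the Query DSL
C. Kibana query
D. Java implementation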
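FilterAggregationBuilder aggregationBuilder = AggregationBuilders.filter("filter",
        QueryBuilders.termsQuery("rule", "login", "auth", "cca"));
// The aggregation must be attached to the request; size(0) returns no regular hits
searchRequest.source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly to avoid problems when the index does not exist
if (aggregations != null) {
    Filter filter = aggregations.get("filter");
    long docCount = filter.getDocCount(); // documents that also match the filter
}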
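Conceptually, the `filter` aggregation keeps only those documents, among the ones the query matched, that also satisfy its own query, and reports how many there are; any sub-aggregations then run on that subset. The plain-Java sketch below models that count for the `termsQuery("rule", ...)` filter above, with each matched document reduced to its `rule` value (a simplification assumed here); the sample values are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FilterSketch {
    // Among the documents matched by the main query (given as their "rule"
    // values), count those whose rule is one of the accepted terms.
    static long filterCount(List<String> rulesOfMatchedDocs, String... terms) {
        Set<String> accepted = new HashSet<>(Arrays.asList(terms));
        return rulesOfMatchedDocs.stream().filter(accepted::contains).count();
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList("login", "view", "auth", "login", "logout");
        System.out.println(filterCount(rules, "login", "auth", "cca"));
    }
}
```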
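3. Range: counts documents per value range; each range includes its from value and excludes its to value
A. URL:GET /index/_search
B. Request parameters
field: the field to bucket on;
to value1: the range from * to value1, excluding value1;
from value1 - to value2: the range from value1 to value2, including value1 but excluding value2;
from value2: the range from value2 to *, including value2.
C. Kibana query
D. Java implementation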
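// Buckets: "1" = [*, 6), "2" = [6, 11), "3" = [11, *)
RangeAggregationBuilder aggregationBuilder = AggregationBuilders.range("range").field("level")
        .addUnboundedTo("1", 6)
        .addRange("2", 6, 11)
        .addUnboundedFrom("3", 11);
// The aggregation must be attached to the request; size(0) returns no regular hits
searchRequest.source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly to avoid problems when the index does not exist
if (aggregations != null) {
    Range range = aggregations.get("range");
}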
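The bucket boundaries in the example above can be checked with a small plain-Java model: each hypothetical `Bucket` includes its lower bound, excludes its upper bound, and treats `null` as unbounded. The `level` values in `main` are made up, and the sketch counts a value in every bucket that contains it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RangeSketch {
    // One range bucket; a null bound means unbounded on that side.
    static final class Bucket {
        final String key;
        final Double from;
        final Double to;
        Bucket(String key, Double from, Double to) { this.key = key; this.from = from; this.to = to; }

        // from is inclusive, to is exclusive
        boolean contains(double v) {
            return (from == null || v >= from) && (to == null || v < to);
        }
    }

    // Counts, per bucket key, how many values fall into that bucket.
    static Map<String, Integer> count(Bucket[] buckets, double[] values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Bucket b : buckets) counts.put(b.key, 0);
        for (double v : values) {
            for (Bucket b : buckets) {
                if (b.contains(v)) counts.merge(b.key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Bucket[] buckets = {
            new Bucket("1", null, 6.0),  // addUnboundedTo("1", 6)
            new Bucket("2", 6.0, 11.0),  // addRange("2", 6, 11)
            new Bucket("3", 11.0, null)  // addUnboundedFrom("3", 11)
        };
        System.out.println(count(buckets, new double[]{1, 5.9, 6, 10, 11, 20}));
    }
}
```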
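4. Date histogram: buckets documents by a date interval to build a histogram over time; works on date and date_range fields
A. URL:POST /index/_search?size=0
B. Request parameters
field: the date field;
format: the date format of the returned bucket keys;
calendar_interval: a calendar-aware interval given in single units only, such as 1d or 1M (multiples such as 2d are not accepted);
fixed_interval: a fixed interval, such as 1000ms;
min_doc_count: buckets with fewer documents than this value are omitted.
C. Kibana query
D. Java implementation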
DateHistogramAggregationBuilder aggregationBuilder = AggregationBuilders.dateHistogram("date_histogram")
        .field("timestamp")
        .format("yyyy-MM-dd")
        .calendarInterval(new DateHistogramInterval("1d"))
        .minDocCount(1);
// The aggregation must be attached to the request; size(0) returns no regular hits
searchRequest.source(new SearchSourceBuilder().size(0).aggregation(aggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly to avoid problems when the index does not exist
if (aggregations != null) {
    ParsedDateHistogram histogram = aggregations.get("date_histogram");
}
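To see which buckets that request produces, here is a plain-Java sketch of daily bucketing: each epoch-millisecond timestamp is mapped to its calendar day in UTC (Elasticsearch also buckets in UTC unless `time_zone` is set) and counted. Because only days that have documents get an entry, the result corresponds to `minDocCount(1)`; with `min_doc_count` 0, empty days between the first and last bucket would be returned as well. The timestamps are made up.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Map;
import java.util.TreeMap;

class DateHistogramSketch {
    // Groups epoch-millisecond timestamps into UTC calendar days, keyed as
    // yyyy-MM-dd. Days without documents never get an entry (min_doc_count = 1).
    static Map<String, Integer> perDay(long[] epochMillis) {
        Map<String, Integer> buckets = new TreeMap<>();
        for (long ms : epochMillis) {
            LocalDate day = Instant.ofEpochMilli(ms).atZone(ZoneOffset.UTC).toLocalDate();
            buckets.merge(day.toString(), 1, Integer::sum); // LocalDate.toString() is yyyy-MM-dd
        }
        return buckets;
    }

    public static void main(String[] args) {
        long[] timestamps = {
            1580688000000L, // 2020-02-03T00:00:00Z
            1580774399000L, // 2020-02-03T23:59:59Z
            1580904000000L  // 2020-02-05T12:00:00Z
        };
        System.out.println(perDay(timestamps));
    }
}
```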
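5. Date range: counts documents per range of date values
A. URL:POST /index/_search?size=0
B. Request parameters
field: the date field to bucket on;
format: the date format, e.g. yyyy-MM-dd;
to value1: the range from * to value1, excluding value1;
from value1 - to value2: the range from value1 to value2, including value1 but excluding value2;
from value2: the range from value2 to *, including value2.
C. Kibana query
D. Java implementation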
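DateRangeAggregationBuilder dateRangeAggregationBuilder = AggregationBuilders.dateRange("day_range")
        .field("day")
        .format("yyyy-MM-dd")
        .addUnboundedTo("1", "2020-02-03")          // [*, 2020-02-03)
        .addRange("2", "2020-02-03", "2020-03-10")  // [2020-02-03, 2020-03-10)
        .addUnboundedFrom("3", "2020-03-10");       // [2020-03-10, *)
// The aggregation must be attached to the request; size(0) returns no regular hits
searchRequest.source(new SearchSourceBuilder().size(0).aggregation(dateRangeAggregationBuilder));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly to avoid problems when the index does not exist
if (aggregations != null) {
    ParsedDateRange dateRange = aggregations.get("day_range");
}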
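The three keys of the example above can be mirrored with `java.time.LocalDate`, as a plain-Java sketch under the assumption that `day` holds dates in `yyyy-MM-dd` form: the lower bound of each range is inclusive and the upper bound exclusive, so `2020-02-03` falls into bucket `2` and `2020-03-10` into bucket `3`.

```java
import java.time.LocalDate;

class DateRangeSketch {
    // Range bounds from the example, parsed as yyyy-MM-dd.
    static final LocalDate FIRST_BOUND = LocalDate.parse("2020-02-03");
    static final LocalDate SECOND_BOUND = LocalDate.parse("2020-03-10");

    // Returns the key of the (contiguous, non-overlapping) range containing `day`.
    static String bucketKey(LocalDate day) {
        if (day.isBefore(FIRST_BOUND)) return "1";  // [*, 2020-02-03)
        if (day.isBefore(SECOND_BOUND)) return "2"; // [2020-02-03, 2020-03-10)
        return "3";                                 // [2020-03-10, *)
    }

    public static void main(String[] args) {
        for (String s : new String[]{"2020-02-02", "2020-02-03", "2020-03-09", "2020-03-10"}) {
            System.out.println(s + " -> " + bucketKey(LocalDate.parse(s)));
        }
    }
}
```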
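III. Pipeline Aggregations: aggregations that take the output of other aggregations, rather than a set of documents, as their input; for example, paging through groups, like paging after a GROUP BY in a database
1. Bucket Sort: sorts the buckets of its parent multi-bucket aggregation and can truncate them for paging
A. URL:POST /sales/_search?size=0
B. Request parameters
from: buckets in positions before this value are truncated; default 0. For paging, from should be a multiple of size (from = page number * size);
size: the number of buckets to return; defaults to all buckets of the parent aggregation;
sort: the sort definition; multiple fields may be given.
C. Kibana query:
D. Java implementation: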
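The original gives no Java code for this step. What `bucket_sort` does with `sort`, `from` and `size` can be sketched in plain Java: the buckets already produced by the parent aggregation are sorted by a metric, the first `from` are skipped and at most `size` are kept. The `Bucket` class and its `metric` field are hypothetical. Since `bucket_sort` only reorders and truncates buckets that the parent has already returned, the parent (for example a `terms` aggregation) must return enough buckets for all the pages you want to read.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class BucketSortSketch {
    // Hypothetical bucket from a parent multi-bucket aggregation: its key and
    // the value of a metric sub-aggregation used for sorting.
    static final class Bucket {
        final String key;
        final double metric;
        Bucket(String key, double metric) { this.key = key; this.metric = metric; }
    }

    // Sorts by metric (descending), skips `from` buckets, keeps at most `size`.
    // For 0-based page n of pageSize buckets: from = n * pageSize, size = pageSize.
    static List<String> page(List<Bucket> buckets, int from, int size) {
        return buckets.stream()
                .sorted(Comparator.comparingDouble((Bucket b) -> b.metric).reversed())
                .skip(from)
                .limit(size)
                .map(b -> b.key)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Bucket> buckets = Arrays.asList(
                new Bucket("a", 10), new Bucket("b", 30), new Bucket("c", 20),
                new Bucket("d", 40), new Bucket("e", 5));
        int pageSize = 2;
        for (int n = 0; n < 3; n++) {
            System.out.println("page " + n + ": " + page(buckets, n * pageSize, pageSize));
        }
    }
}
```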