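[ElasticSearch] ES Operations: Sum Bucket Aggregation
I've recently picked up a lot of new ES query tricks from a colleague, so here is a quick write-up.
Sum Bucket Aggregation
Use case: get the sum of a specified metric across all buckets of a grouping.
For example: group by some field and get the total document count across the top 1000 buckets.
You could do it the clumsy way: declare a variable, loop over the buckets, read each count, and add them up. That is not very elegant, though. Since ES provides this out of the box, just call it.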
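For comparison, the clumsy client-side approach described above can be sketched roughly as follows. This is only an illustration in plain Java: the map stands in for the buckets of a terms aggregation response, and the class name, method names, and the map itself are hypothetical. The counts are the doc_count values from Example 1 in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ManualBucketSum {

    // Client-side equivalent of summing doc_count over the returned buckets:
    // walk every bucket and accumulate its count.
    static long sumDocCounts(Map<String, Long> docCountByKey) {
        long total = 0;
        for (long docCount : docCountByKey.values()) {
            total += docCount;
        }
        return total;
    }

    // Hypothetical stand-in for parsed terms buckets (key -> doc_count),
    // filled with the counts shown in Example 1.
    static Map<String, Long> exampleBuckets() {
        Map<String, Long> buckets = new LinkedHashMap<>();
        buckets.put("xx", 129636L);
        buckets.put("xxx", 41586L);
        buckets.put("xxxx", 39196L);
        buckets.put("xxxxx", 38775L);
        buckets.put("xxxxxx", 23163L);
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(sumDocCounts(exampleBuckets())); // prints 272356
    }
}
```

With sum_bucket, this loop runs inside ES and the response already carries the total, so the client only reads one value.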
Link: https://xiaoxiami.gitbook.io/elasticsearch/ji-chu/36aggregationsju-he-fen-679029/363guan-dao-ju-540828-pipeline-aggregations/zong-he-tong-ju-540828-sum-bucket-aggregation
Example 1, DSL:
"aggs": {
  "all": {
    "terms": {
      "field": "topics",
      "size": 5
    }
  },
  "sum": {
    "sum_bucket": {
      "buckets_path": "all>_count"
    }
  }
}
Result:
"aggregations": {
  "all": {
    "doc_count_error_upper_bound": 11656,
    "sum_other_doc_count": 2575137,
    "buckets": [
      {
        "key": "xx",
        "doc_count": 129636
      },
      {
        "key": "xxx",
        "doc_count": 41586
      },
      {
        "key": "xxxx",
        "doc_count": 39196
      },
      {
        "key": "xxxxx",
        "doc_count": 38775
      },
      {
        "key": "xxxxxx",
        "doc_count": 23163
      }
    ]
  },
  "sum": {
    "value": 272356
  }
}
The value of sum is the sum of the doc_count values of the returned buckets.
Java, using rest-high-level-client:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
        .query(new MatchAllQueryBuilder())
        .size(0) // aggregations only, no hits
        .timeout(TimeValue.timeValueMillis(120000));
// Terms aggregation "all": top 5 buckets on the "topics" field
TermsAggregationBuilder terms = AggregationBuilders.terms("all").field("topics").size(5);
// Sibling pipeline aggregation "sum": sums _count (doc_count) over the buckets of "all"
SumBucketPipelineAggregationBuilder sumBucket = new SumBucketPipelineAggregationBuilder("sum", "all>_count");
sourceBuilder.aggregation(terms).aggregation(sumBucket);
SearchRequest request = new SearchRequest(xxIndex)
        .types(xxType)
        .source(sourceBuilder);
SearchResponse response = esClient.getClient().search(request);
Map<String, Aggregation> map = response.getAggregations().getAsMap();
double sum = ((ParsedSimpleValue) map.get("sum")).value();
Besides count, other (numeric) metrics can be summed as well. For example, sum a field within each bucket, then take the total of those sums across all buckets.
Example 2, DSL:
"aggs": {
  "all": {
    "terms": {
      "field": "topics",
      "size": 5
    },
    "aggs": {
      "friends_cnt": {
        "sum": {
          "field": "friends_cnt"
        }
      }
    }
  },
  "sum": {
    "sum_bucket": {
      "buckets_path": "all>friends_cnt"
    }
  }
}
Result:
"aggregations": {
  "all": {
    "doc_count_error_upper_bound": 11656,
    "sum_other_doc_count": 2575137,
    "buckets": [
      {
        "key": "xx",
        "doc_count": 129636,
        "friends_cnt": {
          "value": 55291503
        }
      },
      {
        "key": "xxx",
        "doc_count": 41586,
        "friends_cnt": {
          "value": 21381248
        }
      },
      {
        "key": "xxxx",
        "doc_count": 39196,
        "friends_cnt": {
          "value": 14668921
        }
      },
      {
        "key": "xxxxx",
        "doc_count": 38775,
        "friends_cnt": {
          "value": 19805247
        }
      },
      {
        "key": "xxxxxx",
        "doc_count": 23163,
        "friends_cnt": {
          "value": 10268415
        }
      }
    ]
  },
  "sum": {
    "value": 121415334
  }
}
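To make the path "all>friends_cnt" concrete, here is a small plain-Java sketch of what the pipeline computes: take the friends_cnt value of each returned bucket and add them up. The Bucket class, the method names, and the in-memory list are hypothetical stand-ins for the ES response, not part of any ES client API. The numbers are the ones from the result above.

```java
import java.util.ArrayList;
import java.util.List;

class NestedMetricSum {

    // Hypothetical stand-in for one terms bucket together with the value
    // of its "friends_cnt" sum sub-aggregation.
    static final class Bucket {
        final String key;
        final long docCount;
        final double friendsCnt;

        Bucket(String key, long docCount, double friendsCnt) {
            this.key = key;
            this.docCount = docCount;
            this.friendsCnt = friendsCnt;
        }
    }

    // Mirrors buckets_path "all>friends_cnt": one metric value per bucket, summed.
    static double sumFriendsCnt(List<Bucket> buckets) {
        double total = 0;
        for (Bucket bucket : buckets) {
            total += bucket.friendsCnt;
        }
        return total;
    }

    // The five buckets from the Example 2 result.
    static List<Bucket> exampleBuckets() {
        List<Bucket> buckets = new ArrayList<>();
        buckets.add(new Bucket("xx", 129636L, 55291503d));
        buckets.add(new Bucket("xxx", 41586L, 21381248d));
        buckets.add(new Bucket("xxxx", 39196L, 14668921d));
        buckets.add(new Bucket("xxxxx", 38775L, 19805247d));
        buckets.add(new Bucket("xxxxxx", 23163L, 10268415d));
        return buckets;
    }

    public static void main(String[] args) {
        System.out.printf("%.0f%n", sumFriendsCnt(exampleBuckets())); // prints 121415334
    }
}
```

The doc_count field is carried along only to show that it plays no role once the path points at the sub-aggregation instead of _count.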
In Java: only the first aggregation needs to change (add a sub-aggregation), and the "_count" in the sum_bucket path is replaced with the sub-aggregation's name.
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
        .query(new MatchAllQueryBuilder())
        .size(0) // aggregations only, no hits
        .timeout(TimeValue.timeValueMillis(120000));
// Terms aggregation "all" with a per-bucket sum of the "friends_cnt" field
TermsAggregationBuilder terms = AggregationBuilders.terms("all").field("topics").size(5)
        .subAggregation(new SumAggregationBuilder("friends_cnt").field("friends_cnt"));
// Sum the per-bucket "friends_cnt" values across all buckets of "all"
SumBucketPipelineAggregationBuilder sumBucket = new SumBucketPipelineAggregationBuilder("sum", "all>friends_cnt");
sourceBuilder.aggregation(terms).aggregation(sumBucket);
SearchRequest request = new SearchRequest(xxIndex)
        .types(xxType)
        .source(sourceBuilder);
SearchResponse response = esClient.getClient().search(request);
Map<String, Aggregation> map = response.getAggregations().getAsMap();
double sum = ((ParsedSimpleValue) map.get("sum")).value();
return Double.toString(sum);