ES - Aggregations
Types of aggregations
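Aggregations compute statistics, analytics, and other calculations over document data. There are three common kinds:
- Bucket aggregations: group documents
  - Terms aggregation: group by the value of a document field
  - Date Histogram: group by date interval, for example one bucket per week or per month
- Metric aggregations: compute values such as the maximum, minimum, or average
  - Avg: average
  - Max: maximum
  - Min: minimum
  - Stats: max, min, avg, sum, and so on, all at once
- Pipeline aggregations: aggregate over the results of other aggregations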
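To make the pipeline category concrete, here is a minimal plain-Java sketch (not Elasticsearch client code). It assumes an earlier bucket and metric aggregation has already produced one number per bucket, represented here as a map, and it averages those numbers. The class and method names are hypothetical; Elasticsearch's avg_bucket pipeline aggregation performs a calculation of roughly this shape.

```java
import java.util.Map;

// Plain-Java model of a pipeline-style calculation; names are hypothetical.
class PipelineAggSketch {

    /** Averages a metric that an earlier aggregation already computed for each bucket. */
    static double avgOfBucketMetric(Map<String, Double> metricPerBucket) {
        return metricPerBucket.values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN); // no buckets: the average is undefined
    }
}
```

The input is the output of another aggregation, not the documents themselves, which is the defining property of this category.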
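Fields that take part in an aggregation must not be analyzed text fields. They are typically one of the following types:
- keyword
- numeric
- date
- boolean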
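Bucket aggregations
Requirement: group the hotels by brand, then sort the buckets by document count in ascending order.
By default, a bucket aggregation counts the documents in each bucket, exposes that count as _count, and sorts the buckets by _count in descending order.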
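GET /hotel/_search
{
  "size": 0, // do not return the matching documents
  "aggs": { // define the aggregations
    "brandAgg": { // name of this aggregation
      "terms": { // aggregation type
        "field": "brand", // field to aggregate on
        "size": 10, // number of buckets to return
        "order": {
          "_count": "asc" // sort by document count, ascending
        }
      }
    }
  }
}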
Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 201,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "brandAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 130,
      "buckets" : [
        {
          "key" : "万丽",
          "doc_count" : 2
        },
        {
          "key" : "丽笙",
          "doc_count" : 2
        },
        {
          "key" : "君悦",
          "doc_count" : 4
        },
        {
          "key" : "豪生",
          "doc_count" : 6
        },
        {
          "key" : "维也纳",
          "doc_count" : 7
        },
        {
          "key" : "凯悦",
          "doc_count" : 8
        },
        {
          "key" : "希尔顿",
          "doc_count" : 10
        },
        {
          "key" : "汉庭",
          "doc_count" : 10
        },
        {
          "key" : "万豪",
          "doc_count" : 11
        },
        {
          "key" : "喜来登",
          "doc_count" : 11
        }
      ]
    }
  }
}
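The numbers in this response fit together: the ten returned buckets hold 71 documents, and 201 - 71 = 130, which is the reported sum_other_doc_count (assuming every hotel has exactly one brand value). The following plain-Java sketch models that behaviour on an in-memory list. It is not Elasticsearch client code; the class and method names are hypothetical, and breaking ties by key is an assumption of the sketch that happens to match the order shown above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java model of a terms aggregation ordered by _count ascending.
class TermsAggSketch {

    /** One bucket: a distinct field value and the number of documents that hold it. */
    static final class Bucket {
        final String key;
        final long docCount;

        Bucket(String key, long docCount) {
            this.key = key;
            this.docCount = docCount;
        }
    }

    /** The returned buckets plus the document count of the buckets that were cut off. */
    static final class Result {
        final List<Bucket> buckets;
        final long sumOtherDocCount;

        Result(List<Bucket> buckets, long sumOtherDocCount) {
            this.buckets = buckets;
            this.sumOtherDocCount = sumOtherDocCount;
        }
    }

    static Result termsByCountAsc(List<String> fieldValues, int size) {
        // Group by value and count each group; the count plays the role of _count.
        Map<String, Long> counts = fieldValues.stream()
                .collect(Collectors.groupingBy((String v) -> v, Collectors.counting()));

        List<Bucket> buckets = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            buckets.add(new Bucket(e.getKey(), e.getValue()));
        }
        // Order by count ascending; ties are broken by key (an assumption of this sketch).
        buckets.sort(Comparator.comparingLong((Bucket b) -> b.docCount)
                .thenComparing((Bucket b) -> b.key));

        // Keep only the first `size` buckets, like "size": 10 in the request.
        List<Bucket> returned =
                new ArrayList<>(buckets.subList(0, Math.min(size, buckets.size())));

        // Documents that fall into buckets that were not returned.
        long returnedDocs = 0;
        for (Bucket b : returned) {
            returnedDocs += b.docCount;
        }
        return new Result(returned, fieldValues.size() - returnedDocs);
    }
}
```

For example, calling termsByCountAsc on the values a, a, a, b, c, c with size 2 returns the buckets b (1) and c (2) and a sumOtherDocCount of 3.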
You can combine the aggregation with a query to restrict its scope, so that it does not run over too many documents:
GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "lt": 300
      }
    }
  },
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}
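Metric aggregations
Requirement: for each brand, compute the min, max, avg, and other statistics of the user score, and sort the brands by avg in descending order.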
GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "scoreAggs.avg": "desc"
        }
      },
      "aggs": { // sub-aggregation of brandAgg; note that it sits at the same level as terms
        "scoreAggs": { // name of the sub-aggregation
          "stats": { // aggregation type; stats computes min, max, avg, and so on
            "field": "score" // field to aggregate on, here score
          }
        }
      }
    }
  }
}
Response:
{
  "took" : 36,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 201,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "brandAgg" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 166,
      "buckets" : [
        {
          "key" : "万丽",
          "doc_count" : 2,
          "scoreAggs" : {
            "count" : 2,
            "min" : 46.0,
            "max" : 47.0,
            "avg" : 46.5,
            "sum" : 93.0
          }
        },
        {
          "key" : "凯悦",
          "doc_count" : 8,
          "scoreAggs" : {
            "count" : 8,
            "min" : 45.0,
            "max" : 47.0,
            "avg" : 46.25,
            "sum" : 370.0
          }
        },
        {
          "key" : "和颐",
          "doc_count" : 12,
          "scoreAggs" : {
            "count" : 12,
            "min" : 44.0,
            "max" : 47.0,
            "avg" : 46.083333333333336,
            "sum" : 553.0
          }
        },
        {
          "key" : "丽笙",
          "doc_count" : 2,
          "scoreAggs" : {
            "count" : 2,
            "min" : 46.0,
            "max" : 46.0,
            "avg" : 46.0,
            "sum" : 92.0
          }
        },
        {
          "key" : "喜来登",
          "doc_count" : 11,
          "scoreAggs" : {
            "count" : 11,
            "min" : 44.0,
            "max" : 48.0,
            "avg" : 46.0,
            "sum" : 506.0
          }
        }
      ]
    }
  }
}
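The request above nests a metric aggregation inside each bucket and then orders the buckets by one of the metric's values. A minimal plain-Java sketch of the same idea follows. It is not Elasticsearch client code: the Hotel class, the method name, and the in-memory list are hypothetical stand-ins, and the order of buckets with equal averages is left unspecified.

```java
import java.util.Comparator;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java model of a terms aggregation with a stats sub-aggregation.
class StatsSubAggSketch {

    /** Hypothetical minimal document: only the two fields the aggregation reads. */
    static final class Hotel {
        final String brand;
        final double score;

        Hotel(String brand, double score) {
            this.brand = brand;
            this.score = score;
        }
    }

    static List<Map.Entry<String, DoubleSummaryStatistics>> statsByBrandAvgDesc(
            List<Hotel> hotels, int size) {
        // Bucket step: group by brand.
        // Metric step: count, min, max, avg, and sum of score within each bucket.
        Map<String, DoubleSummaryStatistics> statsPerBrand = hotels.stream()
                .collect(Collectors.groupingBy((Hotel h) -> h.brand,
                        Collectors.summarizingDouble((Hotel h) -> h.score)));

        // Order buckets by the sub-aggregation's avg, like "scoreAggs.avg": "desc".
        Comparator<Map.Entry<String, DoubleSummaryStatistics>> byAvg =
                Comparator.comparingDouble(e -> e.getValue().getAverage());
        return statsPerBrand.entrySet().stream()
                .sorted(byAvg.reversed())
                .limit(size) // keep the first `size` buckets, like "size": 10
                .collect(Collectors.toList());
    }
}
```

With two hotels of one brand scored 46 and 47, the entry for that brand reports count 2, min 46.0, max 47.0, average 46.5, and sum 93.0, matching the shape of the first bucket in the response.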
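Implementing aggregations with RestClient
// Imports used by the test methods in this section (Java High Level REST Client)
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

// Assumes `client` is a RestHighLevelClient field initialized elsewhere in the test class
@Test // bucket aggregation
public void testBucket() throws IOException {
    // 1. Prepare the request
    SearchRequest searchRequest = new SearchRequest("hotel");
    searchRequest.source().size(0); // size 0: the document hits are not needed
    // 2. Build the DSL
    searchRequest.source()
            .aggregation(AggregationBuilders.terms("brandTerm") // custom aggregation name
                    .field("brand")
                    .size(10));
    // 3. Send the request
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
    // 4. Parse the response
    Aggregations aggregations = response.getAggregations();
    // Look up the terms aggregation result by the custom name used above
    Terms terms = aggregations.get("brandTerm");
    // Print the key of every bucket. Equivalent explicit loop:
    // for (Terms.Bucket bucket : terms.getBuckets()) {
    //     System.out.println(bucket.getKeyAsString());
    // }
    terms.getBuckets().stream().map(Terms.Bucket::getKeyAsString).forEach(System.out::println);
}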
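When writing the DSL, it is clearest to keep the structure of the response in view.
Add a query before the aggregation to restrict the documents it covers:
Requirement: find all hotels in Shenzhen and group them by brand.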
@Test // bucket aggregation restricted by a query
public void testBucketWithQuery() throws IOException {
    // 1. Prepare the request
    SearchRequest searchRequest = new SearchRequest("hotel");
    searchRequest.source().size(0); // size 0: the document hits are not needed
    // 2. Build the DSL: the query limits which documents are aggregated
    searchRequest.source().query(QueryBuilders.matchQuery("city", "深圳"));
    searchRequest.source()
            .aggregation(AggregationBuilders.terms("brandTerm") // custom aggregation name
                    .field("brand")
                    .size(10));
    // 3. Send the request
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
    // 4. Print the raw response
    System.out.println(response);
}
This article comes from cnblogs (博客园), author: chuangzhou. When reposting, please cite the original link: https://www.cnblogs.com/czzz/p/17738665.html