ES - Aggregations

Types of aggregations

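Aggregations compute statistics, analysis, and calculations over document data. There are three common kinds:

  • Bucket aggregations: group documents

    • Terms aggregation: group by the value of a document field
    • Date Histogram: group by date interval, for example one bucket per week or one per month
  • Metric aggregations: compute values such as the maximum, minimum, or average

    • Avg: average
    • Max: maximum
    • Min: minimum
    • Stats: max, min, avg, sum, and so on in a single aggregation
  • Pipeline aggregations: aggregate over the results of other aggregations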
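As a rough illustration of how the three kinds build on each other, the plain-Java sketch below (no Elasticsearch involved) groups a few records by brand (bucket), averages the score inside each group (metric), and then takes the maximum of those per-brand averages (pipeline, comparable in spirit to max_bucket). The Hotel class, the method names, and the sample data are made up for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class AggregationKindsSketch {

    // Hypothetical stand-in for an indexed document.
    static final class Hotel {
        final String brand;
        final double score;
        Hotel(String brand, double score) { this.brand = brand; this.score = score; }
    }

    // Bucket + metric: group by brand, then average the score inside each bucket.
    static Map<String, Double> avgScoreByBrand(List<Hotel> hotels) {
        return hotels.stream().collect(Collectors.groupingBy(
                (Hotel h) -> h.brand, TreeMap::new,
                Collectors.averagingDouble((Hotel h) -> h.score)));
    }

    // Pipeline: operates on the per-bucket results, not on the documents.
    static double maxOfBucketAverages(Map<String, Double> avgByBrand) {
        return avgByBrand.values().stream()
                .mapToDouble(Double::doubleValue).max().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<Hotel> hotels = List.of(
                new Hotel("A", 46), new Hotel("A", 47),
                new Hotel("B", 44), new Hotel("B", 45));
        Map<String, Double> avg = avgScoreByBrand(hotels);
        System.out.println(avg);                      // {A=46.5, B=44.5}
        System.out.println(maxOfBucketAverages(avg)); // 46.5
    }
}
```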

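A field used in an aggregation must be one of the following types; analyzed text fields are not supported by default:

  • keyword
  • numeric
  • date
  • boolean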

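Bucket aggregations

Requirement: group the documents by brand, then sort the buckets by document count in ascending order.

By default, a bucket aggregation counts the documents in each bucket, records that count as _count, and sorts the buckets by _count in descending order.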

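GET /hotel/_search
{
  "size": 0,  // do not return the documents themselves
  "aggs": {  // define the aggregations
    "brandAgg": {  // name of the aggregation
      "terms": {  // type of the aggregation
        "field": "brand",  // field to aggregate on
        "size": 10,  // number of buckets to return
        "order": {
          "_count": "asc"  // sort by document count, ascending
        }
      }
    }
  }
}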

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 201,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "brandAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 130,
      "buckets" : [
        {
          "key" : "万丽",
          "doc_count" : 2
        },
        {
          "key" : "丽笙",
          "doc_count" : 2
        },
        {
          "key" : "君悦",
          "doc_count" : 4
        },
        {
          "key" : "豪生",
          "doc_count" : 6
        },
        {
          "key" : "维也纳",
          "doc_count" : 7
        },
        {
          "key" : "凯悦",
          "doc_count" : 8
        },
        {
          "key" : "希尔顿",
          "doc_count" : 10
        },
        {
          "key" : "汉庭",
          "doc_count" : 10
        },
        {
          "key" : "万豪",
          "doc_count" : 11
        },
        {
          "key" : "喜来登",
          "doc_count" : 11
        }
      ]
    }
  }
}
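One way to read this result: count the documents per brand, keep the first `size` buckets in the requested order, and report everything that was left out in sum_other_doc_count. The ten buckets above add up to 71 documents, and 71 + 130 gives the 201 total hits. The plain-Java sketch below mimics that on an in-memory list. It only illustrates the semantics and is not how Elasticsearch computes the result internally; the TermsResult class and the sample brands are hypothetical, and ties are broken by key here simply to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TermsSketch {

    // Hypothetical container mirroring the "buckets" and "sum_other_doc_count" fields.
    static final class TermsResult {
        final List<Map.Entry<String, Long>> buckets;
        final long sumOtherDocCount;
        TermsResult(List<Map.Entry<String, Long>> buckets, long sumOtherDocCount) {
            this.buckets = buckets;
            this.sumOtherDocCount = sumOtherDocCount;
        }
    }

    // Count documents per brand, sort by count ascending (ties by key), keep `size` buckets.
    static TermsResult termsByCountAsc(List<String> brands, int size) {
        Map<String, Long> counts = brands.stream()
                .collect(Collectors.groupingBy((String b) -> b, Collectors.counting()));
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        List<Map.Entry<String, Long>> top = sorted.subList(0, Math.min(size, sorted.size()));
        // Documents that fall into buckets not returned to the caller.
        long returned = top.stream().mapToLong(Map.Entry::getValue).sum();
        return new TermsResult(top, brands.size() - returned);
    }

    public static void main(String[] args) {
        List<String> brands = List.of("A", "B", "B", "C", "C", "C");
        TermsResult r = termsByCountAsc(brands, 2);
        System.out.println(r.buckets);          // [A=1, B=2]
        System.out.println(r.sumOtherDocCount); // 3
    }
}
```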

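You can combine the aggregation with a query to limit its scope, so that the aggregation does not have to process too many documents: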

GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "lt": 300
      }
    }
  }, 
  "size":0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}
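The effect is that only documents matching the query reach the aggregation, so a brand with no matching hotel produces no bucket at all. A minimal plain-Java sketch of that "filter first, then bucket" order follows; the Hotel class, the method name, and the sample prices are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class ScopedAggregationSketch {

    // Hypothetical stand-in for a hotel document.
    static final class Hotel {
        final String brand;
        final int price;
        Hotel(String brand, int price) { this.brand = brand; this.price = price; }
    }

    // Mirrors a "range: price lt maxPrice" query followed by a terms aggregation on brand.
    static Map<String, Long> countByBrandBelow(List<Hotel> hotels, int maxPrice) {
        return hotels.stream()
                .filter((Hotel h) -> h.price < maxPrice)   // the query narrows the document set
                .collect(Collectors.groupingBy(            // only the remaining documents are bucketed
                        (Hotel h) -> h.brand, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Hotel> hotels = List.of(
                new Hotel("A", 200), new Hotel("A", 450),
                new Hotel("B", 280), new Hotel("C", 900));
        System.out.println(countByBrandBelow(hotels, 300)); // {A=1, B=1}
    }
}
```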

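Metric aggregations

Requirement: for each brand, get the min, max, avg, and related values of the user score, and sort the brands by avg in descending order.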

GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "scoreAggs.avg": "desc"
        }
      },
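      "aggs": {  // sub-aggregation of brandAgg; note that it sits at the same level as terms
        "scoreAggs": {  // name of the sub-aggregation
          "stats": {  // aggregation type; stats computes min, max, avg, and so on
            "field": "score"  // field to aggregate on, here score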
          }
        }
      }
    }
  }
}

Response:

{
  "took" : 36,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 201,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "brandAgg" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 166,
      "buckets" : [
        {
          "key" : "万丽",
          "doc_count" : 2,
          "scoreAggs" : {
            "count" : 2,
            "min" : 46.0,
            "max" : 47.0,
            "avg" : 46.5,
            "sum" : 93.0
          }
        },
        {
          "key" : "凯悦",
          "doc_count" : 8,
          "scoreAggs" : {
            "count" : 8,
            "min" : 45.0,
            "max" : 47.0,
            "avg" : 46.25,
            "sum" : 370.0
          }
        },
        {
          "key" : "和颐",
          "doc_count" : 12,
          "scoreAggs" : {
            "count" : 12,
            "min" : 44.0,
            "max" : 47.0,
            "avg" : 46.083333333333336,
            "sum" : 553.0
          }
        },
        {
          "key" : "丽笙",
          "doc_count" : 2,
          "scoreAggs" : {
            "count" : 2,
            "min" : 46.0,
            "max" : 46.0,
            "avg" : 46.0,
            "sum" : 92.0
          }
        },
        {
          "key" : "喜来登",
          "doc_count" : 11,
          "scoreAggs" : {
            "count" : 11,
            "min" : 44.0,
            "max" : 48.0,
            "avg" : 46.0,
            "sum" : 506.0
          }
        }
      ]
    }
  }
}
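Each bucket now carries its own count, min, max, avg, and sum, and the buckets are ordered by avg; for 万丽, for example, avg is 93.0 / 2 = 46.5. The same "stats per bucket, then order by avg descending" idea can be sketched in plain Java with DoubleSummaryStatistics. This is only an in-memory illustration with a hypothetical Hotel class and made-up scores, not the Elasticsearch implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class StatsSketch {

    // Hypothetical stand-in for a hotel document.
    static final class Hotel {
        final String brand;
        final double score;
        Hotel(String brand, double score) { this.brand = brand; this.score = score; }
    }

    // One bucket per brand with count/min/max/avg/sum of score,
    // ordered by the per-bucket average, descending.
    static List<Map.Entry<String, DoubleSummaryStatistics>> statsByBrandAvgDesc(List<Hotel> hotels) {
        Map<String, DoubleSummaryStatistics> stats = hotels.stream().collect(
                Collectors.groupingBy((Hotel h) -> h.brand,
                        Collectors.summarizingDouble((Hotel h) -> h.score)));
        List<Map.Entry<String, DoubleSummaryStatistics>> buckets = new ArrayList<>(stats.entrySet());
        buckets.sort(Comparator.comparingDouble(
                (Map.Entry<String, DoubleSummaryStatistics> e) -> e.getValue().getAverage())
                .reversed());
        return buckets;
    }

    public static void main(String[] args) {
        List<Hotel> hotels = List.of(
                new Hotel("A", 44), new Hotel("A", 46),
                new Hotel("B", 46), new Hotel("B", 47));
        for (Map.Entry<String, DoubleSummaryStatistics> e : statsByBrandAvgDesc(hotels)) {
            DoubleSummaryStatistics s = e.getValue();
            System.out.println(e.getKey() + " min=" + s.getMin() + " max=" + s.getMax()
                    + " avg=" + s.getAverage() + " sum=" + s.getSum());
        }
        // B min=46.0 max=47.0 avg=46.5 sum=93.0
        // A min=44.0 max=46.0 avg=45.0 sum=90.0
    }
}
```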

Implementing aggregations with RestClient

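import java.io.IOException;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.junit.jupiter.api.Test;  // or org.junit.Test on JUnit 4

// `client` is assumed to be a RestHighLevelClient field initialized elsewhere in the test class.
@Test  // bucket aggregation
public void testBucket() throws IOException {

    // 1. Prepare the request
    SearchRequest searchRequest = new SearchRequest("hotel");
    searchRequest.source().size(0);  // 0: no document data needed

    // 2. Build the DSL
    searchRequest.source()
            .aggregation(AggregationBuilders.terms("brandTerm")  // custom aggregation name
                    .field("brand")
                    .size(10));

    // 3. Send the request
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);

    // 4. Parse the response
    Aggregations aggregations = response.getAggregations();

    // Look up the terms aggregation result by its custom name
    Terms terms = aggregations.get("brandTerm");

    // Print the key of every bucket
    terms.getBuckets().stream().map(Terms.Bucket::getKeyAsString).forEach(System.out::println);

    // Equivalent loop form:
    // for (Terms.Bucket bucket : terms.getBuckets()) {
    //     System.out.println(bucket.getKeyAsString());
    // }
}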

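When writing the DSL and the parsing code, it helps to keep the structure of the returned JSON in view.

Add a query before the aggregation to limit the data being aggregated:

Requirement: find all hotels in Shenzhen (深圳) and group them by brand.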

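import java.io.IOException;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.junit.jupiter.api.Test;  // or org.junit.Test on JUnit 4

// `client` is assumed to be a RestHighLevelClient field initialized elsewhere in the test class.
@Test  // bucket aggregation restricted by a query
public void testBucketWithQuery() throws IOException {

    // 1. Prepare the request
    SearchRequest searchRequest = new SearchRequest("hotel");
    searchRequest.source().size(0);  // 0: no document data needed

    // 2. Build the DSL: the query limits which documents are aggregated
    searchRequest.source().query(QueryBuilders.matchQuery("city", "深圳"));

    searchRequest.source()
            .aggregation(AggregationBuilders.terms("brandTerm")  // custom aggregation name
                    .field("brand")
                    .size(10));

    // 3. Send the request and print the raw response
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
    System.out.println(response);
}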
posted @ 2023-10-01 11:07  chuangzhou