ES 常用聚合函数
关于常用聚合函数,ES提供了很多,具体查看官方文档,本文在ES 聚合查询的基础上,相关测试数据也在ES 聚合查询中.
1、range聚合
1.1 统计各个价格范围的食品销售情况,代码如下:
GET food/_search?size=0 { "aggs": { "price_range": { "range": { "field": "Price", "ranges": [ { "from": 0, "to": 100 }, { "from": 100, "to": 200 }, { "from": 200, "to": 300 }, { "from": 300, "to": 400 } ] } } } }
1.2 统计每个月的食品销售情况
GET food/_search?size=0 { "aggs": { "price_range": { "range": { "field": "CreateTime", "ranges": [ { "from": "2022-05-01 00:00:00", "to": "2022-06-01 00:00:00" }, { "from": "2022-06-01 00:00:00", "to": "2022-07-01 00:00:00" }, { "from": "2022-07-01 00:00:00", "to": "2022-08-01 00:00:00" }, { "from": "2022-08-01 00:00:00", "to": "2022-09-01 00:00:00" } ] } } } }
2、Histogram 柱状图统计 官方文档
2.1 统计各个价位区间的食品销售数量 间隔是100效果和1.1类似
GET food/_search?size=0 { "aggs": { "price_histogram": { "histogram": { "field": "Price", "interval": 100 } } } }
2.2 统计各个价位区间的食品销售数量 间隔是100 要求过滤掉所有区间能销售量为0的桶结果
GET food/_search?size=0 { "aggs": { "price_histogram": { "histogram": { "field": "Price", "interval": 100, "min_doc_count": 1 } } } }
2.3 统计各个价位区间的食品销售数量 间隔是100 如果区间内存在空值,统一用250替代
这里需要新增一条价格为空的数据方便演示,代码如下:
PUT food/_doc/8 { "CreateTime":"2022-04-10 13:11:11", "Desc":"猕猴桃 对身体很有好处", "Level":"高级水果", "Name":"猕猴桃", "Price":"", "Tags":["性价比","水果","保健"], "Type":"水果" }
查询代码如下:
GET food/_search?size=0 { "aggs": { "price_histogram": { "histogram": { "field": "Price", "interval": 100, "missing": 250 } } } }
搜索结果如下:
{ "took" : 106, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "price_histogram" : { "buckets" : [ { "key" : 0.0, "doc_count" : 4 }, { "key" : 100.0, "doc_count" : 2 }, { "key" : 200.0, "doc_count" : 1 }, { "key" : 300.0, "doc_count" : 1 } ] } } }
注意原先结果里面200-300区间是没有数据的,这个时候插入了一条价格为空的数据,且制定了miss条件为250,es会将所有的价格为空的值用250替换,所以结果中200-300范围的count为1.
2.4 key关键字
这里key关键字的用法只是改变了桶聚合值得展示形式通过key value形式展示,这里不在赘述.
3、Date-Histogram 官方文档
3.1 按照日期进行聚合,统计每个月所有食品得销量
GET food/_search?size=0 { "aggs": { "adate_histogram": { "date_histogram": { "field": "CreateTime", "calendar_interval": "month", //每隔一个月进行统计 "format": "yyyy-MM", //日期展示按照 年月展示, "min_doc_count": 1 //过滤掉count为0的数据 } } } }
这里用calendar_interval做时间间隔,但是需要注意其支持的单位如下:minute=>
1m,
hour=>
1h,
day=>
1d,
week=>
1w,
month=>
1M,
quarter=>
1q,
year=>
1y 最小支持到分钟,最大支持到年.
3.2 按照时间进行聚合,统计没毫秒所有食品得销量
GET food/_search?size=0 { "aggs": { "adate_histogram": { "date_histogram": { "field": "CreateTime", "fixed_interval": "1ms", //每隔1毫秒进行统计 "min_doc_count": 1 //过滤掉count为0的数据 } } } }
这里用fixed_interval做时间间隔,但是需要注意其支持的单位如下:ms,s,m,h,d 最小支持到毫秒,最大支持到天.
这里有个严重的问题,使用毫秒进行分桶时,会造成es检索出大量数据,造成es卡死,写入收到严重影响,所以要慎用,使用前必须用query或者filter等等进行时间限制
3.3 统计今年一年内的每个月食品的销售情况
这里注意,上面的按照月份统计,如果1月份没有数据,es进行分桶时并不会展示1月份的数据,那么显然不符合需求,所以需要让1月份以0显示出来,代码如下
GET food/_search?size=0 { "aggs": { "adate_histogram": { "date_histogram": { "field": "CreateTime", "calendar_interval": "month", //每隔一个月进行统计 "min_doc_count": 0, "extended_bounds": { "min": "2022-01-01 00:00:00", "max": "2022-12-31 00:00:00" } } } } }
搜索结果如下:
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "adate_histogram" : { "buckets" : [ { "key_as_string" : "2022-01-01 00:00:00", "key" : 1640995200000, "doc_count" : 0 }, { "key_as_string" : "2022-02-01 00:00:00", "key" : 1643673600000, "doc_count" : 0 }, { "key_as_string" : "2022-03-01 00:00:00", "key" : 1646092800000, "doc_count" : 0 }, { "key_as_string" : "2022-04-01 00:00:00", "key" : 1648771200000, "doc_count" : 1 }, { "key_as_string" : "2022-05-01 00:00:00", "key" : 1651363200000, "doc_count" : 0 }, { "key_as_string" : "2022-06-01 00:00:00", "key" : 1654041600000, "doc_count" : 3 }, { "key_as_string" : "2022-07-01 00:00:00", "key" : 1656633600000, "doc_count" : 4 }, { "key_as_string" : "2022-08-01 00:00:00", "key" : 1659312000000, "doc_count" : 0 }, { "key_as_string" : "2022-09-01 00:00:00", "key" : 1661990400000, "doc_count" : 0 }, { "key_as_string" : "2022-10-01 00:00:00", "key" : 1664582400000, "doc_count" : 0 }, { "key_as_string" : "2022-11-01 00:00:00", "key" : 1667260800000, "doc_count" : 0 }, { "key_as_string" : "2022-12-01 00:00:00", "key" : 1669852800000, "doc_count" : 0 } ] } } }
这里结果就是按照1月份到12月份,按照每个月份进行数量的统计.
注意:这里extended_bounds和min_doc_count的参数的混合使用,当使用extended_bounds进行间隔空白填充时,min_doc_count必须为0,上面说了min_doc_count是为了过滤count为0的风筒,如果min_doc_count为1就会过滤掉extended_bounds产生的空白填充,这就自相矛盾了.
3.4 统计今年一年内的每个月食品的销售情况,并按每个月的销售数量进行排序
GET food/_search?size=0 { "aggs": { "adate_histogram": { "date_histogram": { "field": "CreateTime", "calendar_interval": "month", //每隔一个月进行统计 "min_doc_count": 0, "extended_bounds": { "min": "2022-01-01 00:00:00", "max": "2022-12-31 00:00:00" }, "order": { "_count": "asc" } } } } }
3.5 统计今年每个月的食物的销售数量,并且计算每个月相对于上一个月的累计值(商品价格的叠加)
GET food/_search?size=0 { "aggs": { "date_histogram": { "date_histogram": { "field": "CreateTime", "calendar_interval": "month", //每隔一个月进行统计 "min_doc_count": 0, "extended_bounds": { "min": "2022-01-01 00:00:00", "max": "2022-12-31 00:00:00" } }, "aggs": { "sum_agg": { "sum": { "field": "Price" } }, "cumulative_sum_agg":{ "cumulative_sum": { "buckets_path": "sum_agg" } } } } } }
搜索结果如下:
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "date_histogram" : { "buckets" : [ { "key_as_string" : "2022-01-01 00:00:00", "key" : 1640995200000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 0.0 } }, { "key_as_string" : "2022-02-01 00:00:00", "key" : 1643673600000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 0.0 } }, { "key_as_string" : "2022-03-01 00:00:00", "key" : 1646092800000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 0.0 } }, { "key_as_string" : "2022-04-01 00:00:00", "key" : 1648771200000, "doc_count" : 1, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 0.0 } }, { "key_as_string" : "2022-05-01 00:00:00", "key" : 1651363200000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 0.0 } }, { "key_as_string" : "2022-06-01 00:00:00", "key" : 1654041600000, "doc_count" : 3, "sum_agg" : { "value" : 89.32999992370605 }, "cumulative_sum_agg" : { "value" : 89.32999992370605 } }, { "key_as_string" : "2022-07-01 00:00:00", "key" : 1656633600000, "doc_count" : 4, "sum_agg" : { "value" : 511.43998622894287 }, "cumulative_sum_agg" : { "value" : 600.7699861526489 } }, { "key_as_string" : "2022-08-01 00:00:00", "key" : 1659312000000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 600.7699861526489 } }, { "key_as_string" : "2022-09-01 00:00:00", "key" : 1661990400000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 600.7699861526489 } }, { "key_as_string" : "2022-10-01 00:00:00", "key" : 1664582400000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 600.7699861526489 } }, { "key_as_string" : "2022-11-01 00:00:00", "key" : 1667260800000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 600.7699861526489 } }, { "key_as_string" : "2022-12-01 00:00:00", "key" : 1669852800000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 600.7699861526489 } } ] } } }
从结果就可以看出,在计算出每个月销量的同时,计算出了每个月的销售额,并且通过cumulative_sum计算除了当前月份和前面所有月份的累计销售额.
4、Auto-interval date histogram 官方文档
自动直方图,自动直方图会按照指定的桶数量去计算interval,在某些场景下使用还是用方便的,比如统计今年每个月的食物的销售情况,就可以指定桶数量为180,代码如下:
GET food/_search?size=0 { "aggs": { "auto_date_histogram_aggs": { "auto_date_histogram": { "field": "CreateTime", "buckets":"12", "format":"yyyy-MM-dd " } } } }
结果如下:
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "auto_date_histogram_aggs" : { "buckets" : [ { "key_as_string" : "2022-04-01", "key" : 1648771200000, "doc_count" : 1 }, { "key_as_string" : "2022-05-01", "key" : 1651363200000, "doc_count" : 0 }, { "key_as_string" : "2022-06-01", "key" : 1654041600000, "doc_count" : 3 }, { "key_as_string" : "2022-07-01", "key" : 1656633600000, "doc_count" : 4 } ], "interval" : "1M" } } }
注意结果中Interval为1M,就是es根据桶数量自动推算出来的.
4、Percentiles 饼图统计
指定百分比计算值的范围,分别统计百分之20、百分之40、百分之60、百分之80、百分之99的商品的价格在什么值
GET food/_search?size=0 { "aggs": { "percentiles_agg": { "percentiles": { "field": "Price", "percents": [ 20, 40, 60, 80, 99 ] } } } }
结果如下:
{ "took" : 29, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "percentiles_agg" : { "values" : { "20.0" : 11.109999656677246, "40.0" : 28.309999942779555, "60.0" : 89.91000061035156, "80.0" : 120.10999908447278, "99.0" : 300.1099853515625 } } } }
结果显示百分之20的商品价格在11以内,百分之40的价格在28以内,百分之99的价格在300以内.
常用于计算接口的可靠性,假设接口相应在100ms以内,算合格,那么这里的百分之99对应的值,必须在100以内,才算达标,以此类推.
5、Percentile ranks 饼图统计
这个和Percentiles相反,两者都是饼图统计的一种,它可以计算指定范围所占的百分比,而Percentiles指定百分比计算范围
GET food/_search?size=0 { "aggs": { "percentile_ranks_agg": { "percentile_ranks": { "field": "Price", "values": [ 100, 200, 300, 400 ] } } } }
结果如下:
{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "percentile_ranks_agg" : { "values" : { "100.0" : 71.33613394088104, "200.0" : 85.69857243009302, "300.0" : 100.0, "400.0" : 100.0 } } } }
这里就计算出100以内占百分之71.....
到这里结束,其余查阅官方文档.
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· TypeScript + Deepseek 打造卜卦网站:技术与玄学的结合
· 阿里巴巴 QwQ-32B真的超越了 DeepSeek R-1吗?
· 【译】Visual Studio 中新的强大生产力特性
· 10年+ .NET Coder 心语 ── 封装的思维:从隐藏、稳定开始理解其本质意义
· 【设计模式】告别冗长if-else语句:使用策略模式优化代码结构