谷粒学院高级-118、全文检索-ElasticSearch-进阶-aggregations聚合分析
1、首先程序代码中心需要部署导入基础数据
https://github.com/NiceSeason/gulimall-learning/blob/master/docs/%E8%B0%B7%E7%B2%92%E5%95%86%E5%9F%8E%E2%80%94%E5%88%86%E5%B8%83%E5%BC%8F%E9%AB%98%E7%BA%A7.md
https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json ,导入测试数据,
POST bank/account/_bulk
(9)Aggregation(执行聚合)
聚合提供了从数据中分组和提取数据的能力。最简单的聚合方法大致等于SQL Group by和SQL聚合函数。在elasticsearch中,执行搜索返回this(命中结果),并且同时返回聚合结果,把以响应中的所有hits(命中结果)分隔开的能力。这是非常强大且有效的,你可以执行查询和多个聚合,并且在一次使用中得到各自的(任何一个的)返回结果,使用一次简洁和简化的API啦避免网络往返。
"size":0
size:0不显示搜索数据 aggs:执行聚合。聚合语法如下:
整个目录结构如下
我们来看下查询语句搜索地址中包含工厂的年龄分布以及平均年龄,但不显示这些人的详情
GET bank/_search { "query": { "match": { "address": "Mill" } }, "aggs": { "ageAgg": { "terms": { "field": "age", "size": 10 } }, "ageAvg": { "avg": { "field": "age" } }, "balanceAvg": { "avg": { "field": "balance" } } }, "size": 0 }
查询的结果为,其中ageAgg为聚合新起的名称, "field": "age",为聚合的字段,term为精确匹配,不清楚的看上一篇博客,"size": 10表示年龄的信息有很多中,这里选择10个
{ "took" : 34, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 4, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "ageAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : 38, "doc_count" : 2 }, { "key" : 28, "doc_count" : 1 }, { "key" : 32, "doc_count" : 1 } ] }, "ageAvg" : { "value" : 34.0 }, "balanceAvg" : { "value" : 25208.0 } } }
年龄为38的有2条记录,年假为28的有1条记录
上面查询的时候指定了
"size": 0
整个属性,如果不指定,除了聚合的数据之后,还会显示原始的数据值
GET bank/_search { "query": { "match": { "address": "Mill" } }, "aggs": { "ageAgg": { "terms": { "field": "age", "size": 10 } }, "ageAvg": { "avg": { "field": "age" } }, "balanceAvg": { "avg": { "field": "balance" } } } }
我们来查看下结果
{ "took" : 6, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 4, "relation" : "eq" }, "max_score" : 5.4042025, "hits" : [ { "_index" : "bank", "_type" : "account", "_id" : "970", "_score" : 5.4042025, "_source" : { "account_number" : 970, "balance" : 19648, "firstname" : "Forbes", "lastname" : "Wallace", "age" : 28, "gender" : "M", "address" : "990 Mill Road", "employer" : "Pheast", "email" : "forbeswallace@pheast.com", "city" : "Lopezo", "state" : "AK" } }, { "_index" : "bank", "_type" : "account", "_id" : "136", "_score" : 5.4042025, "_source" : { "account_number" : 136, "balance" : 45801, "firstname" : "Winnie", "lastname" : "Holland", "age" : 38, "gender" : "M", "address" : "198 Mill Lane", "employer" : "Neteria", "email" : "winnieholland@neteria.com", "city" : "Urie", "state" : "IL" } }, { "_index" : "bank", "_type" : "account", "_id" : "345", "_score" : 5.4042025, "_source" : { "account_number" : 345, "balance" : 9812, "firstname" : "Parker", "lastname" : "Hines", "age" : 38, "gender" : "M", "address" : "715 Mill Avenue", "employer" : "Baluba", "email" : "parkerhines@baluba.com", "city" : "Blackgum", "state" : "KY" } }, { "_index" : "bank", "_type" : "account", "_id" : "472", "_score" : 5.4042025, "_source" : { "account_number" : 472, "balance" : 25571, "firstname" : "Lee", "lastname" : "Long", "age" : 32, "gender" : "F", "address" : "288 Mill Street", "employer" : "Comverges", "email" : "leelong@comverges.com", "city" : "Movico", "state" : "MT" } } ] }, "aggregations" : { "ageAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : 38, "doc_count" : 2 }, { "key" : 28, "doc_count" : 1 }, { "key" : 32, "doc_count" : 1 } ] }, "ageAvg" : { "value" : 34.0 }, "balanceAvg" : { "value" : 25208.0 } } }
复杂度:按照年龄聚合,并且要求这些人的平均年龄
GET bank/_search { "query": { "match_all": {} }, "aggs": { "ageAgg": { "terms": { "field": "age", "size": 100 }, "aggs": { "ageAvg": { "avg": { "field": "balance" } } } } }, "size": 0 }
查询的结果如下
{ "took" : 88, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1001, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "ageAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : 31, "doc_count" : 61, "ageAvg" : { "value" : 28312.918032786885 } }, { "key" : 39, "doc_count" : 60, "ageAvg" : { "value" : 25269.583333333332 } }, { "key" : 26, "doc_count" : 59, "ageAvg" : { "value" : 23194.813559322032 } }, { "key" : 32, "doc_count" : 53, "ageAvg" : { "value" : 24239.528301886792 } }, { "key" : 35, "doc_count" : 52, "ageAvg" : { "value" : 22136.69230769231 } }, { "key" : 36, "doc_count" : 52, "ageAvg" : { "value" : 22174.71153846154 } }, { "key" : 22, "doc_count" : 51, "ageAvg" : { "value" : 24731.07843137255 } }, { "key" : 28, "doc_count" : 51, "ageAvg" : { "value" : 28273.882352941175 } }, { "key" : 33, "doc_count" : 50, "ageAvg" : { "value" : 25093.94 } }, { "key" : 34, "doc_count" : 49, "ageAvg" : { "value" : 26809.95918367347 } }, { "key" : 30, "doc_count" : 47, "ageAvg" : { "value" : 22841.106382978724 } }, { "key" : 21, "doc_count" : 46, "ageAvg" : { "value" : 26981.434782608696 } }, { "key" : 40, "doc_count" : 45, "ageAvg" : { "value" : 27183.17777777778 } }, { "key" : 20, "doc_count" : 44, "ageAvg" : { "value" : 27741.227272727272 } }, { "key" : 23, "doc_count" : 42, "ageAvg" : { "value" : 27314.214285714286 } }, { "key" : 24, "doc_count" : 42, "ageAvg" : { "value" : 28519.04761904762 } }, { "key" : 25, "doc_count" : 42, "ageAvg" : { "value" : 27445.214285714286 } }, { "key" : 37, "doc_count" : 42, "ageAvg" : { "value" : 27022.261904761905 } }, { "key" : 27, "doc_count" : 39, "ageAvg" : { "value" : 21471.871794871793 } }, { "key" : 38, "doc_count" : 39, "ageAvg" : { "value" : 26187.17948717949 } }, { "key" : 29, "doc_count" : 35, "ageAvg" : { "value" : 29483.14285714286 } } ] } } }
年龄为28的一个有35条记录,这35个人的平均工资为29483.14285714286
查出所有年龄分布,并且这些年龄段中M的平均薪资和F的平均薪资以及这个年龄段的总体平均薪资
代码如下
GET bank/_search { "query": { "match_all": {} }, "aggs": { "ageAgg": { "terms": { "field": "age", "size": 100 }, "aggs": { "genderAgg": { "terms": { "field": "gender.keyword" }, "aggs": { "balanceAvg": { "avg": { "field": "balance" } } } }, "ageBalanceAvg": { "avg": { "field": "balance" } } } } }, "size": 0 }
运行结果为
{ "took" : 89, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1001, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "ageAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : 31, "doc_count" : 61, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "M", "doc_count" : 35, "balanceAvg" : { "value" : 29565.628571428573 } }, { "key" : "F", "doc_count" : 26, "balanceAvg" : { "value" : 26626.576923076922 } } ] }, "ageBalanceAvg" : { "value" : 28312.918032786885 } }, { "key" : 39, "doc_count" : 60, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "F", "doc_count" : 38, "balanceAvg" : { "value" : 26348.684210526317 } }, { "key" : "M", "doc_count" : 22, "balanceAvg" : { "value" : 23405.68181818182 } } ] }, "ageBalanceAvg" : { "value" : 25269.583333333332 } }, { "key" : 26, "doc_count" : 59, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "M", "doc_count" : 32, "balanceAvg" : { "value" : 25094.78125 } }, { "key" : "F", "doc_count" : 27, "balanceAvg" : { "value" : 20943.0 } } ] }, "ageBalanceAvg" : { "value" : 23194.813559322032 } }, { "key" : 32, "doc_count" : 53, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "M", "doc_count" : 29, "balanceAvg" : { "value" : 23503.44827586207 } }, { "key" : "F", "doc_count" : 24, "balanceAvg" : { "value" : 25128.958333333332 } } ] }, "ageBalanceAvg" : { "value" : 24239.528301886792 } }, { "key" : 35, "doc_count" : 52, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "M", "doc_count" : 28, "balanceAvg" : { "value" : 24226.321428571428 } }, { "key" : "F", "doc_count" : 24, "balanceAvg" : { "value" : 19698.791666666668 } } ] }, "ageBalanceAvg" : { "value" : 22136.69230769231 } }, { "key" : 36, "doc_count" : 52, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "M", "doc_count" : 31, "balanceAvg" : { "value" : 20884.677419354837 } }, { "key" : "F", "doc_count" : 21, "balanceAvg" : { "value" : 24079.04761904762 } } ] }, "ageBalanceAvg" : { "value" : 22174.71153846154 } }, { "key" : 22, "doc_count" : 51, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "F", "doc_count" : 27, "balanceAvg" : { "value" : 22152.74074074074 } }, { "key" : "M", "doc_count" : 24, "balanceAvg" : { "value" : 27631.708333333332 } } ] }, "ageBalanceAvg" : { "value" : 24731.07843137255 } }, { "key" : 28, "doc_count" : 51, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "F", "doc_count" : 31, "balanceAvg" : { "value" : 27076.8064516129 } }, { "key" : "M", "doc_count" : 20, "balanceAvg" : { "value" : 30129.35 } } ] }, "ageBalanceAvg" : { "value" : 28273.882352941175 } }, { "key" : 33, "doc_count" : 50, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "F", "doc_count" : 26, "balanceAvg" : { "value" : 26437.615384615383 } }, { "key" : "M", "doc_count" : 24, "balanceAvg" : { "value" : 23638.291666666668 } } ] }, "ageBalanceAvg" : { "value" : 25093.94 } }, { "key" : 34, "doc_count" : 49, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "F", "doc_count" : 30, "balanceAvg" : { "value" : 26039.166666666668 } }, { "key" : "M", "doc_count" : 19, "balanceAvg" : { "value" : 28027.0 } } ] }, "ageBalanceAvg" : { "value" : 26809.95918367347 } }, { "key" : 30, "doc_count" : 47, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "F", "doc_count" : 25, "balanceAvg" : { "value" : 25316.16 } }, { "key" : "M", "doc_count" : 22, "balanceAvg" : { "value" : 20028.545454545456 } } ] }, "ageBalanceAvg" : { "value" : 22841.106382978724 } }, { "key" : 21, "doc_count" : 46, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "F", "doc_count" : 24, "balanceAvg" : { "value" : 28210.916666666668 } }, { "key" : "M", "doc_count" : 22, "balanceAvg" : { "value" : 25640.18181818182 } } ] }, "ageBalanceAvg" : { "value" : 26981.434782608696 } }, { "key" : 40, "doc_count" : 45, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "M", "doc_count" : 24, "balanceAvg" : { "value" : 26474.958333333332 } }, { "key" : "F", "doc_count" : 21, "balanceAvg" : { "value" : 27992.571428571428 } } ] }, "ageBalanceAvg" : { "value" : 27183.17777777778 } }, { "key" : 20, "doc_count" : 44, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "M", "doc_count" : 27, "balanceAvg" : { "value" : 29047.444444444445 } }, { "key" : "F", "doc_count" : 17, "balanceAvg" : { "value" : 25666.647058823528 } } ] }, "ageBalanceAvg" : { "value" : 27741.227272727272 } }, { "key" : 23, "doc_count" : 42, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "M", "doc_count" : 24, "balanceAvg" : { "value" : 27730.75 } }, { "key" : "F", "doc_count" : 18, "balanceAvg" : { "value" : 26758.833333333332 } } ] }, "ageBalanceAvg" : { "value" : 27314.214285714286 } }, { "key" : 24, "doc_count" : 42, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "F", "doc_count" : 23, "balanceAvg" : { "value" : 29414.521739130436 } }, { "key" : "M", "doc_count" : 19, "balanceAvg" : { "value" : 27435.052631578947 } } ] }, "ageBalanceAvg" : { "value" : 28519.04761904762 } }, { "key" : 25, "doc_count" : 42, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "M", "doc_count" : 23, "balanceAvg" : { "value" : 29336.08695652174 } }, { "key" : "F", "doc_count" : 19, "balanceAvg" : { "value" : 25156.263157894737 } } ] }, "ageBalanceAvg" : { "value" : 27445.214285714286 } }, { "key" : 37, "doc_count" : 42, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "M", "doc_count" : 23, "balanceAvg" : { "value" : 25015.739130434784 } }, { "key" : "F", "doc_count" : 19, "balanceAvg" : { "value" : 29451.21052631579 } } ] }, "ageBalanceAvg" : { "value" : 27022.261904761905 } }, { "key" : 27, "doc_count" : 39, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "F", "doc_count" : 21, "balanceAvg" : { "value" : 21618.85714285714 } }, { "key" : "M", "doc_count" : 18, "balanceAvg" : { "value" : 21300.38888888889 } } ] }, "ageBalanceAvg" : { "value" : 21471.871794871793 } }, { "key" : 38, "doc_count" : 39, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "F", "doc_count" : 20, "balanceAvg" : { "value" : 27931.65 } }, { "key" : "M", "doc_count" : 19, "balanceAvg" : { "value" : 24350.894736842107 } } ] }, "ageBalanceAvg" : { "value" : 26187.17948717949 } }, { "key" : 29, "doc_count" : 35, "genderAgg" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "M", "doc_count" : 23, "balanceAvg" : { "value" : 29943.17391304348 } }, { "key" : "F", "doc_count" : 12, "balanceAvg" : { "value" : 28601.416666666668 } } ] }, "ageBalanceAvg" : { "value" : 29483.14285714286 } } ] } } }
GET bank/_search{ "query": { "match_all": {} }, "aggs": { "ageAgg": { "terms": { "field": "age", "size": 100 }, "aggs": { "genderAgg": { "terms": { "field": "gender.keyword" }, "aggs": { "balanceAvg": { "avg": { "field": "balance" } } } }, "ageBalanceAvg": { "avg": { "field": "balance" } } } } }, "size": 0}
posted on 2022-07-14 15:10 luzhouxiaoshuai 阅读(150) 评论(0) 编辑 收藏 举报