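Elastic_Terms: classifying and counting content
The terms aggregation groups documents by the value of a field and computes a doc_count for each group.
A bucket aggregation is similar to GROUP BY in SQL.
Typical use: find the values of a field that occur most often, then drill into each group to compute the figures you need.
References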
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
1. Bulk-insert data
curl -XPOST 127.0.0.1:9200/cars/transactions/_bulk --data-binary @cars.json
Contents of cars.json (one JSON object per line; the file must end with a newline):
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
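The bulk file can also be generated by a script instead of being written by hand. The following is a minimal Python sketch, assuming the eight documents listed above; the helper name to_bulk_ndjson is hypothetical and is not part of any Elasticsearch client.

```python
import json

# Sample documents copied from cars.json above.
CARS = [
    {"price": 10000, "color": "red", "make": "honda", "sold": "2014-10-28"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
    {"price": 30000, "color": "green", "make": "ford", "sold": "2014-05-18"},
    {"price": 15000, "color": "blue", "make": "toyota", "sold": "2014-07-02"},
    {"price": 12000, "color": "green", "make": "toyota", "sold": "2014-08-19"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
    {"price": 80000, "color": "red", "make": "bmw", "sold": "2014-01-01"},
    {"price": 25000, "color": "blue", "make": "ford", "sold": "2014-02-12"},
]


def to_bulk_ndjson(docs):
    """Build a _bulk body: an action line followed by the document, one JSON object per line."""
    lines = []
    for doc in docs:
        # Empty action metadata: the index and type are taken from the request URL.
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_ndjson(CARS)
```

Writing body to cars.json gives a file that can be sent with the curl command above.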
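2. Which color of car sells best?
http://192.168.1.10:9200/cars/_search?search_type=count
// Only the aggregation results matter here, not the search hits, so the request uses search_type=count. (search_type=count was removed in Elasticsearch 5.0; on later versions set "size": 0 in the request body instead.)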
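{
  "aggs": {
    "color": {
      // A terms bucket on the color field. The terms bucket dynamically creates
      // a new bucket for every distinct term it encounters.
      "terms": {
        "field": "color",
        // maximum number of buckets to return
        "size": 50,
        // minimum count: only buckets with at least 1 document are returned
        "min_doc_count": 1,
        // sort order: ascending by document count, so the best-selling color comes last
        "order": { "_count": "asc" }
      }
    }
  }
}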
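// The key of each bucket is a distinct term found in the color field. Each bucket also carries a doc_count, the number of documents that contain that term.
// The response contains a list of buckets, one per distinct color (for example, red or green). Each bucket also reports how many documents "fell into" it. For example, there are 4 red cars.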
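To make the bucket logic concrete, the following pure-Python sketch emulates the request above on the sample data. It does not call Elasticsearch; the function terms_buckets is hypothetical, and breaking ties by key is an assumption made only to keep the output deterministic.

```python
from collections import Counter

# (price, color, make) tuples copied from the sample data.
CARS = [
    (10000, "red", "honda"),
    (20000, "red", "honda"),
    (30000, "green", "ford"),
    (15000, "blue", "toyota"),
    (12000, "green", "toyota"),
    (20000, "red", "honda"),
    (80000, "red", "bmw"),
    (25000, "blue", "ford"),
]


def terms_buckets(values, size=50, min_doc_count=1, ascending=True):
    """Emulate a terms aggregation: one bucket per distinct value, with its doc_count."""
    counts = Counter(values)
    buckets = [
        {"key": key, "doc_count": n}
        for key, n in counts.items()
        if n >= min_doc_count  # drop buckets below the minimum count
    ]
    # Assumption: ties on doc_count are broken by key in ascending order.
    buckets.sort(key=lambda b: b["key"])
    buckets.sort(key=lambda b: b["doc_count"], reverse=not ascending)
    return buckets[:size]


color_buckets = terms_buckets([color for _price, color, _make in CARS])
for bucket in color_buckets:
    print(bucket["key"], bucket["doc_count"])
```

Calling terms_buckets with ascending=False lists the best-selling color first, which corresponds to "order": { "_count": "desc" } in the request.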
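3. What is the average price of the cars of each color?
{
  "aggs": {
    "color": {
      "terms": {
        "field": "color",
        "size": 50,
        "min_doc_count": 1,
        // sort the color buckets by average price
        "order": { "avg_price": "asc" }
      },
      // A new aggs level: the avg metric is nested inside the terms bucket,
      // so an average is computed for each color.
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
The response gives the number of cars and the average price for each color.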
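The effect of nesting a metric inside a bucket can be checked by hand. The sketch below is plain Python, not an Elasticsearch call: it groups the sample data by color, averages the prices within each group, and sorts by that average. The function name avg_price_by_color is hypothetical.

```python
from collections import defaultdict

# (price, color, make) tuples copied from the sample data.
CARS = [
    (10000, "red", "honda"),
    (20000, "red", "honda"),
    (30000, "green", "ford"),
    (15000, "blue", "toyota"),
    (12000, "green", "toyota"),
    (20000, "red", "honda"),
    (80000, "red", "bmw"),
    (25000, "blue", "ford"),
]


def avg_price_by_color(cars):
    """One bucket per color, each carrying doc_count and the nested avg_price metric."""
    prices = defaultdict(list)
    for price, color, _make in cars:
        prices[color].append(price)
    buckets = [
        {"key": color, "doc_count": len(p), "avg_price": sum(p) / len(p)}
        for color, p in prices.items()
    ]
    # Equivalent of "order": { "avg_price": "asc" }.
    buckets.sort(key=lambda b: b["avg_price"])
    return buckets


avg_buckets = avg_price_by_color(CARS)
for bucket in avg_buckets:
    print(bucket)
```

On the sample data the averages are 20000 for blue, 21000 for green and 32500 for red.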
4. How are the manufacturers distributed within each color?
{
  "aggs": {
    "color": {
      "terms": {
        "field": "color",
        "size": 50,
        "min_doc_count": 1,
        "order": { "avg_price": "asc" }
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        // A new aggregation, make: a terms bucket nested inside the terms bucket
        // named color. Buckets are therefore created for each distinct
        // (color, make) combination in the data set.
        "make": {
          "terms": { "field": "make" }
        }
      }
    }
  }
}
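A bucket nested inside a bucket behaves like a two-level group-by. The following plain-Python sketch works on the sample data without contacting Elasticsearch and counts the cars for each (color, make) combination; the function name make_distribution is hypothetical.

```python
from collections import Counter, defaultdict

# (price, color, make) tuples copied from the sample data.
CARS = [
    (10000, "red", "honda"),
    (20000, "red", "honda"),
    (30000, "green", "ford"),
    (15000, "blue", "toyota"),
    (12000, "green", "toyota"),
    (20000, "red", "honda"),
    (80000, "red", "bmw"),
    (25000, "blue", "ford"),
]


def make_distribution(cars):
    """Outer level: color. Inner level: doc_count per make within that color."""
    nested = defaultdict(Counter)
    for _price, color, make in cars:
        nested[color][make] += 1
    return {color: dict(makes) for color, makes in nested.items()}


distribution = make_distribution(CARS)
for color, makes in distribution.items():
    print(color, makes)
```

For the sample data, the red bucket splits into 3 honda and 1 bmw, while green and blue each split into 1 ford and 1 toyota.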
5. Additionally, what are the lowest and highest prices for each manufacturer?
{
  "aggs": {
    "color": {
      "terms": {
        "field": "color",
        "size": 50,
        "min_doc_count": 1,
        "order": { "avg_price": "asc" }
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        "make": {
          "terms": { "field": "make" },
          "aggs": {
            "min_price": { "min": { "field": "price" } },
            "max_price": { "max": { "field": "price" } }
          }
        }
      }
    }
  }
}
6. Additionally, what is the list of prices for each manufacturer?
{
  "aggs": {
    "color": {
      "terms": {
        "field": "color",
        "size": 50,
        "min_doc_count": 1,
        "order": { "avg_price": "asc" }
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        "make": {
          "terms": { "field": "make" },
          "aggs": {
            "price": { "terms": { "field": "price" } },
            "min_price": { "min": { "field": "price" } },
            "max_price": { "max": { "field": "price" } }
          }
        }
      }
    }
  }
}
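The innermost level of the last two requests can be reproduced in the same way. The plain-Python sketch below, again working only on the sample data, computes for each (color, make) combination the minimum price, the maximum price, and a count per distinct price (the counterpart of the terms bucket on price). The function name price_stats is hypothetical, and the values are left as integers, whereas Elasticsearch reports min and max as floating-point numbers.

```python
from collections import Counter, defaultdict

# (price, color, make) tuples copied from the sample data.
CARS = [
    (10000, "red", "honda"),
    (20000, "red", "honda"),
    (30000, "green", "ford"),
    (15000, "blue", "toyota"),
    (12000, "green", "toyota"),
    (20000, "red", "honda"),
    (80000, "red", "bmw"),
    (25000, "blue", "ford"),
]


def price_stats(cars):
    """For each (color, make) pair: min_price, max_price and doc_count per distinct price."""
    grouped = defaultdict(list)
    for price, color, make in cars:
        grouped[(color, make)].append(price)
    return {
        key: {
            "min_price": min(prices),
            "max_price": max(prices),
            "price": dict(Counter(prices)),  # distinct price -> number of cars
        }
        for key, prices in grouped.items()
    }


stats = price_stats(CARS)
for key, value in stats.items():
    print(key, value)
```

For red honda cars, for instance, the prices range from 10000 to 20000, with one car at 10000 and two at 20000.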