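ES Series, Part 14: ES Aggregation Analysis (Introduction to Aggregations, Metric Aggregations, Bucket Aggregations)

I. Introduction to Aggregation Analysis

1. What is ES aggregation analysis?

Aggregation analysis is an important database feature: it performs aggregate computations over the data set returned by a query, such as finding the maximum or minimum of a field (or of a computed expression), or computing a sum or an average. As both a search engine and a data store, ES provides powerful aggregation capabilities as well.

Aggregations that compute metrics such as max, min, sum, and average over a data set are called metric aggregations in ES.

Relational databases, besides aggregate functions, can also group the queried rows with GROUP BY and then compute metrics per group. In ES, the equivalent of GROUP BY is called bucketing, i.e. bucket aggregations.

ES also provides matrix aggregations and pipeline aggregations, though these were still being refined at the time of writing.

2. How to write an ES aggregation query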

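Aggregations are defined under the aggregations node of the search request body, using the following syntax:

"aggregations" : {
    "<aggregation_name>" : { <!-- name of the aggregation -->
        "<aggregation_type>" : { <!-- type of the aggregation -->
            <aggregation_body> <!-- aggregation body: which fields to aggregate over -->
        }
        [,"meta" : {  [<meta_data_body>] } ]? <!-- metadata -->
        [,"aggregations" : { [<sub_aggregation>]+ } ]? <!-- sub-aggregations defined inside this aggregation -->
    }
    [,"<aggregation_name_2>" : { ... } ]* <!-- additional aggregations -->
}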

Note:

The aggregations key can also be abbreviated as aggs.
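The nesting described by the syntax above can be sketched by building a request body as a plain Python dictionary and serializing it to JSON. This is only an illustration: the helper metric_agg and the aggregation names age_terms and max_age are made up for this sketch, and no Elasticsearch client is involved.

```python
import json


def metric_agg(agg_type, field):
    """Return a single-field aggregation body, e.g. {"max": {"field": "age"}}."""
    return {agg_type: {"field": field}}


# Hypothetical request: bucket by "age", then compute the max inside each bucket.
body = {
    "size": 0,  # return no hits, only the aggregation results
    "aggs": {
        "age_terms": {                      # <aggregation_name>
            "terms": {"field": "age"},      # <aggregation_type> and its body
            "aggs": {                       # sub-aggregation nested in the bucket
                "max_age": metric_agg("max", "age"),
            },
        }
    },
}

request_json = json.dumps(body, indent=2)
print(request_json)
```

The printed JSON has the same shape as the request bodies used throughout this post and could be sent with any HTTP client.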

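3. Value sources for aggregations

The values an aggregation computes over can come from a field or from the result of a script.

II. Metric Aggregations

1. max min sum avg

Example 1: find the maximum age across all records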

POST /book1/_search?pretty

{
  "size": 0, 
  "aggs": {
    "maxage": {
      "max": {
        "field": "age"
      }
    }
  }
}

Result 1:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "maxage": {
            "value": 54
        }
    }
}

Example 2: add a query condition and find the maximum age among documents whose name contains 'test':

POST /book1/_search?pretty

{
  "query":{
     "term":{
         "name":"test"
     }    
  },
  "size": 2, 
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "maxage": {
      "max": {
        "field": "age"
      }
    }
  }
}

Result 2:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": null,
        "hits": [
            {
                "_index": "book1",
                "_type": "english",
                "_id": "6IUkUmUBRzBxBrDgFok2",
                "_score": null,
                "_source": {
                    "name": "test goog my money",
                    "age": [
                        14,
                        54,
                        45,
                        34
                    ],
                    "class": "dsfdsf",
                    "addr": "中国"
                },
                "sort": [
                    54
                ]
            },
            {
                "_index": "book1",
                "_type": "english",
                "_id": "54UiUmUBRzBxBrDgfIl9",
                "_score": null,
                "_source": {
                    "name": "test goog my money",
                    "age": [
                        11,
                        13,
                        14
                    ],
                    "class": "dsfdsf",
                    "addr": "中国"
                },
                "sort": [
                    14
                ]
            }
        ]
    },
    "aggregations": {
        "maxage": {
            "value": 54
        }
    }
}

Example 3: take the value from a script; compute the average age over all records, and the average age plus 10

POST /book1/_search?pretty
{
  "size":0,
  "aggs": {
    "avg_age": {
      "avg": {
        "script": {
          "source": "doc.age.value"
        }
      }
    },
    "avg_age10": {
      "avg": {
        "script": {
          "source": "doc.age.value + 10"
        }
      }
    }
  }
}

Result 3:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "avg_age": {
            "value": 7.585365853658536
        },
        "avg_age10": {
            "value": 17.585365853658537
        }
    }
}
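The script-based average above (about 7.59) is lower than the field-based average that stats reports later in this post (about 12.39). One reading that is consistent with the outputs shown here, offered as an assumption rather than a statement about every ES version, is that doc.age.value yields only one value per document (the smallest one for a multi-valued field) and 0 for documents without the field (Example 10 under Terms Aggregation shows a "0" bucket with 8 documents). A minimal Python sketch of that assumed behaviour, using hypothetical documents:

```python
def script_value(doc, field="age"):
    """Assumed doc.<field>.value semantics: smallest value, or 0 when absent."""
    values = doc.get(field)
    if values is None:
        return 0  # assumption: a missing numeric field reads as 0
    if isinstance(values, list):
        return min(values) if values else 0  # assumption: multi-valued -> smallest
    return values


# Hypothetical documents, not the real contents of book1.
docs = [{"age": 12}, {"age": [14, 54, 45, 34]}, {"name": "no age"}]

vals = [script_value(d) for d in docs]
avg_age = sum(vals) / len(vals)                      # every document counts once
avg_age10 = sum(v + 10 for v in vals) / len(vals)    # same, with 10 added per doc
print(vals, avg_age, avg_age10)
```

Under these assumptions every document contributes exactly one value, which is why adding 10 in the script shifts the average by exactly 10.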

Example 4: specify a field and use _value in the script to read the field's value

POST  /book1/_search?pretty
{
  "size":0,
  "aggs": {
    "sun_age": {
      "sum": {
          "field":"age",
        "script": {
          "source": "_value * 2"
        }
      }
    }
  }
}

Result 4:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "sun_age": {
            "value": 942
        }
    }
}

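Example 5: supply a default value for documents that lack the field. If missing is not specified, documents without a value for the field are ignored: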

POST /book1/_search?pretty

{
  "size":0,
  "aggs": {
    "sun_age": {
      "avg": {
          "field":"age",
        "missing":15
      }
    }
  }
}

Result 5:

{
    "took": 12,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "sun_age": {
            "value": 12.847826086956522
        }
    }
}
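The effect of missing can be checked by hand. The sketch below assumes the figures reported elsewhere in this post: 38 age values summing to 471 (from the stats result) and 8 documents without an age (from the Missing Aggregation result). The function name avg_with_missing is made up for illustration.

```python
def avg_with_missing(values_sum, values_count, docs_missing, missing):
    """Average where each document lacking the field contributes `missing`."""
    total = values_sum + docs_missing * missing
    count = values_count + docs_missing
    return total / count


# 471 and 38 come from the stats result, 8 from the missing aggregation result.
result = avg_with_missing(471, 38, 8, 15)
print(result)
```

With these inputs the computation is (471 + 8 * 15) / (38 + 8) = 591 / 46, which agrees with the value returned in Result 5.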

2. Document count (count)

Example 1: count the documents in index book1 whose age is 12

POST book1/english/_count
{
    "query":{
        "match":{
            "age":12
        }
    }
}

Result 1:

{
    "count": 16,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    }
}

3. Value count: count the values of a field

Example 1:

POST /book1/_search?size=0
{
    "aggs":{
        "age_count":{
            "value_count":{
                "field":"age"
            }
            
        }
    }
}

Result 1:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_count": {
            "value": 38
        }
    }
}

4. cardinality: count distinct values

Example 1:

POST  /book1/_search?size=0
{
    "aggs":{
        "age_count":{
            "value_count":{
                "field":"age"
            }
            
        },
        "name_count":{
            "cardinality":{
                "field":"age"
            }
        }
    }
}

Result 1:

{
    "took": 16,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "name_count": {
            "value": 11
        },
        "age_count": {
            "value": 38
        }
    }
}

Note: there are 38 values in total; after removing duplicates, 11 distinct values remain.
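The difference between the two counts can be sketched in a few lines of Python. The documents below are hypothetical, and the sketch is exact, whereas the ES cardinality aggregation is an approximate count (based on HyperLogLog++) that may deviate on high-cardinality fields.

```python
def field_values(docs, field):
    """Flatten a possibly multi-valued field; documents lacking it are skipped."""
    out = []
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        out.extend(value if isinstance(value, list) else [value])
    return out


# Hypothetical documents: one is multi-valued, one has no age at all.
docs = [{"age": 12}, {"age": 12}, {"age": [11, 13, 14]}, {"age": 1}, {"name": "x"}]

values = field_values(docs, "age")
value_count = len(values)        # counts every value, including duplicates
cardinality = len(set(values))   # counts distinct values only
print(value_count, cardinality)
```

Note that value_count counts values rather than documents, so a multi-valued document contributes more than once.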

5. stats: compute the five values count, max, min, avg, and sum

Example 1:

POST  /book1/_search?size=0
{
    "aggs":{
        "age_count":{
            "stats":{
                "field":"age"
            }
            
        }
    }
}

Result 1:

{
    "took": 12,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_count": {
            "count": 38,
            "min": 1,
            "max": 54,
            "avg": 12.394736842105264,
            "sum": 471
        }
    }
}

6. Extended stats

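Extended statistics return four more results than stats: the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations.

Example 1: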

POST /book1/_search?size=0

{
    "aggs":{
        "age_stats":{
            "extended_stats":{
                "field":"age"
            }
            
        }
    }
}

Result 1:

{
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_stats": {
            "count": 38,
            "min": 1,
            "max": 54,
            "avg": 12.394736842105264,
            "sum": 471,
            "sum_of_squares": 11049,
            "variance": 137.13365650969527,
            "std_deviation": 11.710408041981085,
            "std_deviation_bounds": {
                "upper": 35.81555292606743,
                "lower": -11.026079241856905
            }
        }
    }
}
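The extra fields can be derived from count, sum, and sum_of_squares alone. The sketch below assumes the population variance formula (mean of squares minus square of the mean) and a bound of two standard deviations; the function name extended_stats is illustrative only.

```python
import math


def extended_stats(count, total, sum_of_squares, sigma=2):
    """Derive variance, std deviation and bounds from three running totals."""
    avg = total / count
    variance = sum_of_squares / count - avg * avg  # population variance
    std = math.sqrt(variance)
    return {
        "avg": avg,
        "variance": variance,
        "std_deviation": std,
        "upper": avg + sigma * std,
        "lower": avg - sigma * std,
    }


# count, sum and sum_of_squares are taken from Result 1 above.
stats = extended_stats(38, 471, 11049)
print(stats)
```

Fed with the totals from Result 1, this reproduces the variance, standard deviation, and bounds shown there up to floating-point rounding.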

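7. Percentiles: the values at given percentile ranks

Example 1:

The values of the specified field (or script) are sorted in ascending order and the cumulative share of documents up to each value is computed (as a percentage of all matching documents); the aggregation returns the value found at each requested percentage. By default it returns the values at the [ 1, 5, 25, 50, 75, 95, 99 ] percentiles. The middle entry of the result below can be read as: 50% of the documents have an age <= 12, or conversely, documents with age <= 12 account for 50% of all matching documents.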

POST /book1/_search?size=0
{
    "aggs":{
        "age_percentiles":{
            "percentiles":{
                "field":"age"
            }
            
        }
    }
}

Result 1:

{
    "took": 16,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_percentiles": {
            "values": {
                "1.0": 1,
                "5.0": 1,
                "25.0": 1,
                "50.0": 12,
                "75.0": 13,
                "95.0": 40.600000000000016,
                "99.0": 54
            }
        }
    }
}

Example 2: specify the percentiles (what are the values at the 50th, 96th, and 99th percentiles?)

POST /book1/_search?size=0
{
    "aggs":{
        "age_percentiles":{
            "percentiles":{
                "field":"age",
                "percents" : [50,96,99]
            }
            
        }
    }
}

Result 2:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_percentiles": {
            "values": {
                "50.0": 12,
                "96.0": 44.779999999999966,
                "99.0": 54
            }
        }
    }
}

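Note: 50% of the values are <= 12, 96% of the values are <= 44.78, and 99% of the values are <= 54.

8. Percentile ranks: the percentage of values less than or equal to a given value

Example 1: compute the percentage of documents whose age is at most 12 and at most 35; this is the inverse of item 7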

POST /book1/_search?size=0
{
    "aggs":{
        "aggs_perc_rank":{
            "percentile_ranks":{
                "field":"age",
                "values" : [12,35]
            }
            
        }
    }
}

Result 1:

{
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "aggs_perc_rank": {
            "values": {
                "12.0": 71.05263157894737,
                "35.0": 92.76315789473685
            }
        }
    }
}

Note on the result: about 71.05% of the values are <= 12, and about 92.76% of the values are <= 35.
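The inverse relationship between percentiles and percentile ranks can be sketched with exact definitions on a small list. This is an illustration only: ES computes both with an approximate algorithm (TDigest by default) and interpolates, so its numbers, such as the 71.05 above, need not match an exact count. The sample ages and function names are made up.

```python
import math


def percentile_rank(values, v):
    """Exact percentage of values that are <= v."""
    return 100.0 * sum(1 for x in values if x <= v) / len(values)


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of values <= it."""
    ordered = sorted(values)
    k = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[k - 1]


ages = [1, 1, 12, 12, 12, 13, 21, 54]  # hypothetical sample

p50 = percentile(ages, 50)          # value at the 50th percentile
rank_12 = percentile_rank(ages, 12)  # share of values <= 12
print(p50, rank_12)
```

Going from a percentage to a value (percentile) and from a value back to a percentage (percentile_rank) are the two directions that items 7 and 8 cover.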

9. Geo Bounds aggregation: compute the bounding box of the geo points in the document set

See the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geobounds-aggregation.html

10. Geo Centroid aggregation: compute the centroid of the geo points

See the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geocentroid-aggregation.html

III. Bucket Aggregations

1. Terms Aggregation: group by the terms (values) of a field

Example 1:

POST /book1/_search?size=0

{
    "aggs":{
        "age_terms":{
            "terms":{
                "field":"age"
            }
        }
    }
}

Note: this is equivalent to GROUP BY age

Result 1:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1,
            "buckets": [
                {
                    "key": 12,
                    "doc_count": 16
                },
                {
                    "key": 1,
                    "doc_count": 11
                },
                {
                    "key": 13,
                    "doc_count": 2
                },
                {
                    "key": 14,
                    "doc_count": 2
                },
                {
                    "key": 11,
                    "doc_count": 1
                },
                {
                    "key": 16,
                    "doc_count": 1
                },
                {
                    "key": 21,
                    "doc_count": 1
                },
                {
                    "key": 33,
                    "doc_count": 1
                },
                {
                    "key": 34,
                    "doc_count": 1
                },
                {
                    "key": 45,
                    "doc_count": 1
                }
            ]
        }
    }
}

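Notes on the result:

"doc_count_error_upper_bound": 0: the upper bound of the error on the document counts

"sum_other_doc_count": 1: the sum of the document counts of all buckets that were not returned

By default, the top 10 buckets are returned, ordered by document count from high to low.

Example 2: size specifies how many buckets to return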

POST /book1/_search?size=0
{
    "aggs":{
        "age_terms":{
            "terms":{
                "field":"age",
                "size":5
            }
            
        }
    }
}

Result 2:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 6,
            "buckets": [
                {
                    "key": 12,
                    "doc_count": 16
                },
                {
                    "key": 1,
                    "doc_count": 11
                },
                {
                    "key": 13,
                    "doc_count": 2
                },
                {
                    "key": 14,
                    "doc_count": 2
                },
                {
                    "key": 11,
                    "doc_count": 1
                }
            ]
        }
    }
}
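How size and sum_other_doc_count interact can be sketched on a single node. The per-value counts below are read off the results shown in this post (the key 54 appears in the "_key" ordering example); the function terms_agg is a made-up stand-in that ignores shards and therefore has no counting error.

```python
from collections import Counter

# Document count per age value, taken from the results in this post.
counts = Counter({12: 16, 1: 11, 13: 2, 14: 2, 11: 1, 16: 1,
                  21: 1, 33: 1, 34: 1, 45: 1, 54: 1})


def terms_agg(counts, size=10):
    """Top `size` buckets by doc_count (ties broken by ascending key)."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top, rest = ordered[:size], ordered[size:]
    return {
        "sum_other_doc_count": sum(c for _, c in rest),
        "buckets": [{"key": k, "doc_count": c} for k, c in top],
    }


top5 = terms_agg(counts, size=5)
print(top5)
```

With size 10 one bucket is left out (sum_other_doc_count 1), with size 5 six are left out, and with size 3 the remainder sums to 9, matching the results in this section.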

Example 3: show the error bound on each bucket

POST /book1/_search?size=0
{
    "aggs":{
        "age_terms":{
            "terms":{
                "field":"age",
                "size":5,
                 "show_term_doc_count_error": true
            }
            
        }
    }
}

Result 3:

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 6,
            "buckets": [
                {
                    "key": 12,
                    "doc_count": 16,
                    "doc_count_error_upper_bound": 0
                },
                {
                    "key": 1,
                    "doc_count": 11,
                    "doc_count_error_upper_bound": 0
                },
                {
                    "key": 13,
                    "doc_count": 2,
                    "doc_count_error_upper_bound": 0
                },
                {
                    "key": 14,
                    "doc_count": 2,
                    "doc_count_error_upper_bound": 0
                },
                {
                    "key": 11,
                    "doc_count": 1,
                    "doc_count_error_upper_bound": 0
                }
            ]
        }
    }
}

Example 4: shard_size specifies how many buckets each shard returns

POST /book1/_search?size=0
{
    "aggs":{
        "age_terms":{
            "terms":{
                "field":"age",
                "size":3,
                 "shard_size": 20
            }
            
        }
    }
}

Result 4:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 9,
            "buckets": [
                {
                    "key": 12,
                    "doc_count": 16
                },
                {
                    "key": 1,
                    "doc_count": 11
                },
                {
                    "key": 13,
                    "doc_count": 2
                }
            ]
        }
    }
}

order specifies how the buckets are sorted

Example 5: sort by the bucket key "_key"

POST /book1/_search?size=0
{
    "aggs":{
        "age_terms":{
            "terms":{
                "field":"age",
                "size":3,
                 "order":{"_key":"desc"}
            }
            
        }
    }
}

Result 5:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 35,
            "buckets": [
                {
                    "key": 54,
                    "doc_count": 1
                },
                {
                    "key": 45,
                    "doc_count": 1
                },
                {
                    "key": 34,
                    "doc_count": 1
                }
            ]
        }
    }
}

Example 6: sort by document count "_count"

POST /book1/_search?size=0
{
    "aggs":{
        "age_terms":{
            "terms":{
                "field":"age",
                "size":3,
                 "order":{"_count":"desc"}
            }
            
        }
    }
}

Result 6:

{
    "took": 91,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 9,
            "buckets": [
                {
                    "key": 12,
                    "doc_count": 16
                },
                {
                    "key": 1,
                    "doc_count": 11
                },
                {
                    "key": 13,
                    "doc_count": 2
                }
            ]
        }
    }
}

Example 7: sort by a metric computed on each bucket

POST /book1/_search?size=0
{
    "aggs":{
        "age_terms":{
            "terms":{
                "field":"age",
                 "order":{"max_age":"desc"}
            },
            "aggs":{
                "max_age":{
                    "max":{
                        "field":"age"
                    }
                },
                "min_age":{
                    "min":{
                        "field":"age"
                    }
                }
            }
            
        }
    
        
    }
}

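Note: first group by age, then compute the maximum and minimum of each group, and finally sort by the maximum in descending order

Example 8: filter buckets by matching values against a regular expression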

POST book1/_search?size=0
{
    "aggs":{
        "tags":{
            "terms":{
                "field":"name",
                "include":"里*",
                "exclude":"test*"
            }
            
        }
    
        
    }
}

Result 8:

{
    "took": 22,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "里",
                    "doc_count": 13
                }
            ]
        }
    }
}

Example 9: filter buckets by an explicit list of values

POST book1/_search?size=0
{
    "aggs":{
        "Chinese":{
            "terms":{
                "field":"name",
                "include":["里","国"]
            }
            
        },
        "Test":{
            "terms":{
                "field":"name",
                "exclude":["test","the"]
            }
        }
    
        
    }
}

Result 9:

{
    "took": 23,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "Test": {
            "doc_count_error_upper_bound": 6,
            "sum_other_doc_count": 559,
            "buckets": [
                {
                    "key": "里",
                    "doc_count": 12
                },
                {
                    "key": "否",
                    "doc_count": 11
                },
                {
                    "key": "a",
                    "doc_count": 7
                },
                {
                    "key": "default",
                    "doc_count": 7
                },
                {
                    "key": "document",
                    "doc_count": 7
                },
                {
                    "key": "for",
                    "doc_count": 7
                },
                {
                    "key": "absolute",
                    "doc_count": 6
                },
                {
                    "key": "account",
                    "doc_count": 6
                },
                {
                    "key": "accurate",
                    "doc_count": 6
                },
                {
                    "key": "documents",
                    "doc_count": 6
                }
            ]
        },
        "Chinese": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "国",
                    "doc_count": 4
                }
            ]
        }
    }
}

Example 10: group by a value computed by a script

POST book1/_search?size=0
{
    "aggs":{
        "name":{
            "terms":{
                "script":{
                    "source":"doc['age'].value + doc.age.value",
                    "lang": "painless"
                }
            }
         }   
     }
}

Note: in a script, a field value can be read as doc['age'].value or doc.age.value

Result 10:

{
    "took": 18,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "24",
                    "doc_count": 16
                },
                {
                    "key": "2",
                    "doc_count": 11
                },
                {
                    "key": "0",
                    "doc_count": 8
                },
                {
                    "key": "22",
                    "doc_count": 1
                },
                {
                    "key": "26",
                    "doc_count": 1
                },
                {
                    "key": "28",
                    "doc_count": 1
                },
                {
                    "key": "32",
                    "doc_count": 1
                },
                {
                    "key": "42",
                    "doc_count": 1
                },
                {
                    "key": "66",
                    "doc_count": 1
                }
            ]
        }
    }
}

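2. filter Aggregation: aggregate over the documents that match a filter query

Example 1: from the documents matched by the query, select those that satisfy the filter and aggregate over them, i.e. filter first, then aggregate (in contrast to Example 9 above, which filters the buckets after grouping)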

POST book1/_search?size=0
{
    "aggs":{
        "age_terms":{
            "filter":{
                "match":{"name":"test"}
            },
            "aggs":{
                "avg_age":{
                    "avg":{"field":"age" }
                }
            }
        }
    }
}

Result 1:

{
    "took": 152,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_terms": {
            "doc_count": 5,
            "avg_age": {
                "value": 19.9
            }
        }
    }
}

3. Filters Aggregation: aggregate over multiple filter groups

Example 1: count the documents containing 'test' and the documents containing '里', separately

POST book1/_search?size=0
{
    "aggs":{
        "age_terms":{
            "filters":{
                "filters":{
                    "test":{
                        "match":{"name":"test"}
                    },
                    "china":{
                        "match":{"name":"里"}
                    }    
                }
            }
        }
    }
}

Result 1:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_terms": {
            "buckets": {
                "china": {
                    "doc_count": 13
                },
                "test": {
                    "doc_count": 5
                }
            }
        }
    }
}

For example: count the error and warning entries in a log index, to drive log alerting

GET logs/_search
{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": {
        "filters": {
          "errors": {
            "match": {
              "body": "error"
            }
          },
          "warnings": {
            "match": {
              "body": "warning"
            }
          }
        }
      }
    }
  }
}

Example 2: specify a key for the bucket that collects all other documents

POST book1/_search?size=0
{
    "aggs":{
        "age_terms":{
            "filters":{
                "other_bucket_key": "other_messages",
                "filters":{
                    "test":{
                        "match":{"name":"test"}
                    },
                    "china":{
                        "match":{"name":"里"}
                    }    
                }
            }
        }
    }
}

Result 2:

{
    "took": 9,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_terms": {
            "buckets": {
                "china": {
                    "doc_count": 13
                },
                "test": {
                    "doc_count": 5
                },
                "other_messages": {
                    "doc_count": 23
                }
            }
        }
    }
}
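The bucketing logic of filters with other_bucket_key can be sketched as follows. The documents, the whitespace tokenization, and the function filters_agg are assumptions made for illustration; a real match query goes through the field's analyzer instead.

```python
def filters_agg(docs, filters, other_bucket_key=None):
    """Count documents per named filter; unmatched docs go to the other bucket."""
    buckets = {name: 0 for name in filters}
    if other_bucket_key is not None:
        buckets[other_bucket_key] = 0
    for doc in docs:
        matched = False
        for name, predicate in filters.items():
            if predicate(doc):
                buckets[name] += 1   # a document may fall into several buckets
                matched = True
        if not matched and other_bucket_key is not None:
            buckets[other_bucket_key] += 1
    return buckets


def has_token(token):
    """Crude stand-in for a match query: whitespace-split token lookup."""
    return lambda doc: token in doc.get("name", "").split()


# Hypothetical documents, not the real contents of book1.
docs = [{"name": "test goog my money"}, {"name": "里 test"}, {"name": "hello"}]

result = filters_agg(
    docs,
    {"test": has_token("test"), "china": has_token("里")},
    other_bucket_key="other_messages",
)
print(result)
```

Only documents that match none of the named filters are counted in the other bucket, which is why the three counts in Result 2 add up to the total number of hits when the filters do not overlap.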

4. Range Aggregation: group by ranges

Example 1:

POST book1/_search?size=0

{
    "aggs":{
        "age_range":{
            "range":{
                "field":"age",
                "keyed":true,
                "ranges":[
                    {
                        "to":20,
                        "key":"TW"
                    },
                    {
                        "from":25,
                        "to":40,
                        "key":"TH"
                    },
                    {
                        "from":60,
                        "key":"SIX"
                    }
                ]
            }
        }
    }
}

Result 1:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "age_range": {
            "buckets": {
                "TW": {
                    "to": 20,
                    "doc_count": 31
                },
                "TH": {
                    "from": 25,
                    "to": 40,
                    "doc_count": 2
                },
                "SIX": {
                    "from": 60,
                    "doc_count": 0
                }
            }
        }
    }
}
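A range bucket includes its from value and excludes its to value, and values that fall between the configured ranges land in no bucket. The sketch below illustrates this with the same three ranges as the example and a hypothetical list of single-valued ages; the function range_agg is made up for this purpose.

```python
def range_agg(values, ranges):
    """Count values per range; `from` is inclusive, `to` is exclusive."""
    buckets = {}
    for r in ranges:
        low = r.get("from", float("-inf"))   # open-ended when "from" is absent
        high = r.get("to", float("inf"))     # open-ended when "to" is absent
        buckets[r["key"]] = sum(1 for v in values if low <= v < high)
    return buckets


ranges = [
    {"to": 20, "key": "TW"},
    {"from": 25, "to": 40, "key": "TH"},
    {"from": 60, "key": "SIX"},
]
ages = [1, 12, 19, 20, 25, 39, 40, 54]  # hypothetical, one value per document

buckets = range_agg(ages, ranges)
print(buckets)
```

Here 20, 40, and 54 are not counted anywhere, because the ranges leave gaps between 20 and 25 and between 40 and 60.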

5. Date Range Aggregation: group by date ranges

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "date",
        "format": "MM-yyy",
        "ranges": [
          {
            "to": "now-10M/M"
          },
          {
            "from": "now-10M/M"
          }
        ]
      }
    }
  }
}

Result 1:

{
  "took": 115,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "range": {
      "buckets": [
        {
          "key": "*-2017-08-01T00:00:00.000Z",
          "to": 1501545600000,
          "to_as_string": "2017-08-01T00:00:00.000Z",
          "doc_count": 0
        },
        {
          "key": "2017-08-01T00:00:00.000Z-*",
          "from": 1501545600000,
          "from_as_string": "2017-08-01T00:00:00.000Z",
          "doc_count": 0
        }
      ]
    }
  }
}

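6. Date Histogram Aggregation: time-based histogram (bar chart) aggregation

This aggregates by day, month, year, and so on. The interval can be year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), or second (1s), or a custom time interval.

Example 1: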

POST /bank/_search?size=0
{
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "month"
      }
    }
  }
}

Result 1:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sales_over_time": {
      "buckets": []
    }
  }
}
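The result above contains no buckets, so the grouping itself is not visible. A month-interval histogram can be sketched in Python by truncating each date to the first day of its month and counting. The dates and the function month_histogram are hypothetical, and unlike ES (which may also return empty buckets depending on min_doc_count) this sketch only lists months that contain documents.

```python
from collections import Counter
from datetime import date


def month_histogram(dates):
    """Group dates by calendar month and count them, in chronological order."""
    counts = Counter(d.replace(day=1) for d in dates)  # truncate to month start
    return [
        {"key_as_string": month.isoformat(), "doc_count": n}
        for month, n in sorted(counts.items())
    ]


dates = [date(2018, 1, 5), date(2018, 1, 20), date(2018, 3, 2)]  # hypothetical
buckets = month_histogram(dates)
print(buckets)
```

Each bucket key is the start of the month, which mirrors how a calendar interval aligns bucket boundaries.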

7. Missing Aggregation: bucket of documents with a missing value

Example 1: count the documents that have no value for the field

POST /book1/_search?size=0
{
    "aggs" : {
        "account_without_age" : {
            "missing" : { "field" : "age" }
        }
    }
}

Result 1:

{
    "took": 10,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 41,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "account_without_age": {
            "doc_count": 8
        }
    }
}

8. Geo Distance Aggregation: group by distance from a geo point

See the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geodistance-aggregation.html

Posted on 2018-09-03 23:49 by 小人物的奋斗
