Elasticsearch-sql针对groupby 聚合查询limit问题
在使用Elasticsearch-sql插件查询ES中,我们经常遇到多个字段group by聚合查询,例如:
select /*! IGNORE_UNAVAILABLE */ SUM(errorCount) as num from ctbpm-js-data-2018-w32,ctbpm-js-data-2018-w27,ctbpm-js-data-2018-w28, ctbpm-js-data-2018-w29,ctbpm-js-data-2018-w30,ctbpm-js-data-2018-w31 where appCode = '5f05acfc9a084d9f9a07e165a2516c18' and logTime>= '2018-07-07T09:57:15.436Z' and logTime<= '2018-08-07T09:57:15.436Z' group by pageRef,province,city,ip limit 100
解析后:
{ "from": 0, "size": 0, "query": { "bool": { "filter": [ { "bool": { "must": [ { "bool": { "must": [ { "match_phrase": { "appCode": { "query": "5f05acfc9a084d9f9a07e165a2516c18", "slop": 0, "boost": 1 } } }, { "range": { "logTime": { "from": "2018-07-07T09:57:15.436Z", "to": null, "include_lower": true, "include_upper": true, "boost": 1 } } }, { "range": { "logTime": { "from": null, "to": "2018-08-07T09:57:15.436Z", "include_lower": true, "include_upper": true, "boost": 1 } } } ], "disable_coord": false, "adjust_pure_negative": true, "boost": 1 } } ], "disable_coord": false, "adjust_pure_negative": true, "boost": 1 } } ], "disable_coord": false, "adjust_pure_negative": true, "boost": 1 } }, "_source": { "includes": [ "SUM" ], "excludes": [] }, "aggregations": { "pageRef": { "terms": { "field": "pageRef", "size": 100, "shard_size": 2000, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_term": "asc" } ] }, "aggregations": { "province": { "terms": { "field": "province", "size": 10, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_term": "asc" } ] }, "aggregations": { "city": { "terms": { "field": "city", "size": 10, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_term": "asc" } ] }, "aggregations": { "ip": { "terms": { "field": "ip", "size": 10, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_term": "asc" } ] }, "aggregations": { "num": { "sum": { "field": "errorCount" } } } } } } } } } } } }
我们看到解析后的json看到:limit 15中的15只对group by 后面的第一个字段起作用,其他的字段size其实都是10,limit并没起作用,这就是Elasticsearch-sql针对group by存在的问题。
解决方式为:使用terms(field='correspond_brand_name',size='10',alias='correspond_brand_name',include='\".*sport.*\"',exclude='\"water_.*\"')")
注意:这种方式不再添加limit关键词,另外还要注意group by后面字段的顺序不一样,因为数据的情况,查询结果条数不一样,但是整体是没有问题的。
select /*! IGNORE_UNAVAILABLE */ SUM(errorCount) as num from ctbpm-js-data-2018-w32,ctbpm-js-data-2018-w27,ctbpm-js-data-2018-w28, ctbpm-js-data-2018-w29,ctbpm-js-data-2018-w30,ctbpm-js-data-2018-w31 where appCode = '5f05acfc9a084d9f9a07e165a2516c18' and logTime>= '2018-07-07T09:57:15.436Z' and logTime<= '2018-08-07T09:57:15.436Z' group by terms(field='pageRef',size='15',alias='pageRef'), terms(field='province',size='15',alias='province'), terms(field='city',size='15',alias='city'), terms(field='ip',size='15',alias='ip')
解析后:
{ "from": 0, "size": 0, "query": { "bool": { "filter": [ { "bool": { "must": [ { "bool": { "must": [ { "match_phrase": { "appCode": { "query": "5f05acfc9a084d9f9a07e165a2516c18", "slop": 0, "boost": 1 } } }, { "range": { "logTime": { "from": "2018-07-07T09:57:15.436Z", "to": null, "include_lower": true, "include_upper": true, "boost": 1 } } }, { "range": { "logTime": { "from": null, "to": "2018-08-07T09:57:15.436Z", "include_lower": true, "include_upper": true, "boost": 1 } } } ], "disable_coord": false, "adjust_pure_negative": true, "boost": 1 } } ], "disable_coord": false, "adjust_pure_negative": true, "boost": 1 } } ], "disable_coord": false, "adjust_pure_negative": true, "boost": 1 } }, "_source": { "includes": [ "SUM" ], "excludes": [] }, "aggregations": { "pageRef": { "terms": { "field": "pageRef", "size": 15, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_term": "asc" } ] }, "aggregations": { "province": { "terms": { "field": "province", "size": 15, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_term": "asc" } ] }, "aggregations": { "city": { "terms": { "field": "city", "size": 15, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_term": "asc" } ] }, "aggregations": { "ip": { "terms": { "field": "ip", "size": 15, "min_doc_count": 1, "shard_min_doc_count": 0, "show_term_doc_count_error": false, "order": [ { "_count": "desc" }, { "_term": "asc" } ] }, "aggregations": { "num": { "sum": { "field": "errorCount" } } } } } } } } } } } }
从解析后的内容看出:四个字段的size都是15了,可以使用postman查询看看,结果是正确的。
语法来自;https://github.com/NLPchina/elasticsearch-sql中的terms用法。
补充:如果是nested(嵌套查询),比如:
select /*! IGNORE_UNAVAILABLE */ SUM(errorCount) as num from ctbpm-js-data-2018-w32,ctbpm-js-data-2018-w27,ctbpm-js-data-2018-w28,ctbpm-js-data-2018-w29,ctbpm-js-data-2018-w30,ctbpm-js-data-2018-w31 where appCode = '5f05acfc9a084d9f9a07e165a2516c18' and logTime>= '2018-07-08T06:20:13.144Z' and logTime<= '2018-08-08T06:20:13.144Z' group by pageRef,province,city,ip,nested(errors.message) limit 10
那么需要这么来查:
select /*! IGNORE_UNAVAILABLE */ SUM(errorCount) as num from ctbpm-js-data-2018-w32,ctbpm-js-data-2018-w27,ctbpm-js-data-2018-w28,ctbpm-js-data-2018-w29,ctbpm-js-data-2018-w30,ctbpm-js-data-2018-w31 where appCode = '5f05acfc9a084d9f9a07e165a2516c18' and logTime>= '2018-07-08T06:20:13.144Z' and logTime<= '2018-08-08T06:20:13.144Z' group by terms(field='pageRef',size='15',alias='pageRef'), terms(field='province',size='1',alias='province'), terms(field='city',size='2',alias='city'), terms(field='ip',size='3',alias='ip'), terms(field='errors.message',size='4',alias='errors.message',nested="errors")
对于聚合查询可以参考官方文档:https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations.html
本文版权归作者所有,欢迎转载,请在文章页面明显位置给出原文连接:https://www.cnblogs.com/wynjauu/articles/9439089.html
作者:敲完代码好睡觉