Notes on complex Elasticsearch queries
All API calls below are based on Elasticsearch 5.5.
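The notes below only show request bodies. Each body is sent as JSON to an index's _search endpoint. The sketch below builds that request with the Python standard library but does not send it. The node address http://localhost:9200 and the helper name build_search_request are assumptions for illustration.

```python
import json
import urllib.request

# Hypothetical node address; adjust for your cluster.
ES_URL = "http://localhost:9200"


def build_search_request(index, body, base_url=ES_URL):
    """Build (but do not send) a POST /<index>/_search request carrying a JSON query body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    body = {
        "query": {"match": {"traceId": "6edb691b4bc775b1"}},
        "size": 400,
        "from": 0,
        "sort": [{"timestamp": {"order": "desc", "unmapped_type": "boolean"}}],
    }
    req = build_search_request("zipkin-2017-09-06", body)
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req), then json.load the response.
```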
Sample JSON document (one span hit from a Zipkin index)
{ "_index": "zipkin-2017-09-06", "_type": "span", "_id": "AV5WSb1lKwYfgxikh_Fp", "_score": null, "_source": { "timestamp_millis": 1504686226897, "traceId": "58d858be36d2493e", "id": "eb5e8ee2ff39eaa7", "name": "close", "parentId": "47622e0c4229a48b", "timestamp": 1504686226897000, "duration": 2, "binaryAnnotations": [ { "key": "ip", "value": "127.0.0.1", "endpoint": { "serviceName": "redis", "ipv4": "127.0.0.1", "port": 20880 } }, { "key": "lc", "value": "unknown", "endpoint": { "serviceName": "redis", "ipv4": "127.0.0.1", "port": 20880 } }, { "key": "service", "value": "redis", "endpoint": { "serviceName": "redis", "ipv4": "127.0.0.1", "port": 20880 } } ] }, "fields": { "timestamp_millis": [ 1504686226897 ] }, "sort": [ 1504686226897 ] }
1. OR queries
{"query":{"bool":{"should":[{},{},{},...]}},"size":400,"from":0,"sort":[{"timestamp":{"order":"desc","unmapped_type":"boolean"}}]}
With should, a document is a hit as soon as it matches any one of the listed conditions, for example:
{"query":{"bool":{"should":[{"match":{"traceId":"6edb691b4bc775b1"}},{"match":{"traceId":"7e5b391r4bc775b1"}}]}},"size":400,"from":0,"sort":[{"timestamp":{"order":"desc","unmapped_type":"boolean"}}]}
A document is a hit if its traceId equals either of the two values.
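A minimal Python sketch of the same OR query, built as a dict. The function name build_or_trace_query and its defaults are illustrative and mirror the size/from/sort values above. Because the bool contains only should clauses, a document must match at least one of them.

```python
import json


def build_or_trace_query(trace_ids, size=400, offset=0):
    """OR query: a hit needs to match at least one of the traceId values."""
    return {
        "query": {
            "bool": {
                # One match clause per candidate value; with no must clause,
                # at least one should clause has to match.
                "should": [{"match": {"traceId": tid}} for tid in trace_ids]
            }
        },
        "size": size,    # page size
        "from": offset,  # offset of the first hit
        "sort": [{"timestamp": {"order": "desc", "unmapped_type": "boolean"}}],
    }


print(json.dumps(build_or_trace_query(["6edb691b4bc775b1", "7e5b391r4bc775b1"])))
```

If traceId is a keyword field, a single terms query on traceId with a list of values is a shorter way to express the same OR over exact values.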
2. AND queries
{"query":{"bool":{"must":[{},{},{},...]}},"size":400,"from":0,"sort":[{"timestamp":{"order":"desc","unmapped_type":"boolean"}}]}
With must, a document is a hit only if it matches every listed condition, for example:
{"query":{"bool":{"must":[{"range":{"timestamp":{"gte":1504581280866000,"lte":1504581280878000,"format":"date_time_no_millis"}}}, {"match":{"traceId":"6edb691b4bc775b1"}}],"must_not": {"exists": { "field": "parentId" } }}},"size":400,"from":0,"sort":[{"timestamp":{"order":"desc","unmapped_type":"boolean"}}]}
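A hit must have traceId=6edb691b4bc775b1 and a timestamp between 1504581280866000 and 1504581280878000 inclusive (epoch microseconds). The must_not clause in this query is explained in the next section.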
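The same AND query as a Python dict, leaving out the must_not clause so that only the two must conditions remain. build_and_query is a hypothetical helper. The range clause, including its format value, is copied from the query above.

```python
import json


def build_and_query(trace_id, start_us, end_us, size=400, offset=0):
    """AND query: every clause in must has to match."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Inclusive bounds in epoch microseconds, as in the example above;
                    # the format value is copied from the original query.
                    {"range": {"timestamp": {"gte": start_us, "lte": end_us,
                                             "format": "date_time_no_millis"}}},
                    {"match": {"traceId": trace_id}},
                ]
            }
        },
        "size": size,
        "from": offset,
        "sort": [{"timestamp": {"order": "desc", "unmapped_type": "boolean"}}],
    }


q = build_and_query("6edb691b4bc775b1", 1504581280866000, 1504581280878000)
print(json.dumps(q))
```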
3. Checking whether a key exists
"must_not": {"exists": { "field": "parentId" } }
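This restricts the results to documents that have no parentId key. Strictly speaking, exists also treats a null value as missing. In the full query it sits next to must: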
{"query":{ "bool":{"must":[{"range":{"timestamp":{"gte":1504581280866000,"lte":1504581280878000,"format":"date_time_no_millis"}}}, {"match":{"traceId":"6edb691b4bc775b1"}}],"must_not": {"exists": { "field": "parentId" } }}}, "size":400,"from":0,"sort":[{"timestamp":{"order":"desc","unmapped_type":"boolean"}}]}
PS: must, should and must_not are siblings, and all of them go directly inside bool.
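A small sketch that makes the sibling layout explicit. It uses a hypothetical bool_query helper. It writes each clause as an array, while the queries above write must_not as a single object. Elasticsearch accepts both forms.

```python
import json


def bool_query(must=(), should=(), must_not=()):
    """Assemble a bool query; must, should and must_not sit side by side inside bool."""
    clauses = {"must": list(must), "should": list(should), "must_not": list(must_not)}
    # Drop empty clause lists so the body stays minimal.
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


# Spans of one trace that have no parentId key.
q = bool_query(
    must=[{"match": {"traceId": "6edb691b4bc775b1"}}],
    must_not=[{"exists": {"field": "parentId"}}],
)
print(json.dumps(q))
```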
4. Nested queries
{"query":{ "bool":{"must":[{"range":{"timestamp":{"gte":1504581280866000,"lte":1504581280878000,"format":"date_time_no_millis"}}}, {"match":{"traceId":"6edb691b4bc775b1"}},{"nested": {"path": "binaryAnnotations" ,"query": { "bool": {"must": [{ "match": { "binaryAnnotations.key": "service" }},{ "match": { "binaryAnnotations.value": "WebRequest" }}] } }}}],"must_not": {"exists": { "field": "parentId" } }}}, "size":400,"from":0,"sort":[{"timestamp":{"order":"desc","unmapped_type":"boolean"}}]}
A nested query is a clause just like match or range, so it goes inside must, should and the other bool clauses. The nested part on its own:
{"nested": {"path": "binaryAnnotations" ,"query": { "bool": {"must": [{ "match": { "binaryAnnotations.key": "service" }},{ "match": { "binaryAnnotations.value": "WebRequest" }}] } }}}
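Our JSON document has a binaryAnnotations key whose value is an array of objects. A nested query must specify path, here binaryAnnotations, followed by a query whose syntax is the same as at the top level. Fields inside it are referenced by their full path, such as binaryAnnotations.key. This only works if binaryAnnotations is mapped with type nested.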
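A nested clause has the same shape every time, so a small helper can generate it for any key/value pair. The helper nested_annotation below is hypothetical. For service/WebRequest it reproduces the clause above.

```python
import json


def nested_annotation(key, value, path="binaryAnnotations"):
    """One nested clause: key and value must match inside the same array element."""
    return {
        "nested": {
            "path": path,  # the nested field being searched
            "query": {
                "bool": {
                    "must": [
                        # Inner fields are referenced by their full path.
                        {"match": {f"{path}.key": key}},
                        {"match": {f"{path}.value": value}},
                    ]
                }
            },
        }
    }


print(json.dumps(nested_annotation("service", "WebRequest")))
```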
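5. Multiple nested conditions
Suppose we want two conditions on binaryAnnotations at once: one element with key=service and value=WebRequest, and one element with key=ip and value=127.0.0.1. Each condition gets its own nested clause in the outer must: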
{"query":{ "bool":{"must":[{"range":{"timestamp":{"gte":1504581280866000,"lte":1504581280878000,"format":"date_time_no_millis"}}}, {"match":{"traceId":"6edb691b4bc775b1"}},{"nested": {"path": "binaryAnnotations" ,"query": { "bool": {"must": [{ "match": { "binaryAnnotations.key": "service" }},{ "match": { "binaryAnnotations.value": "WebRequest" }}] } }}},{"nested": {"path": "binaryAnnotations" ,"query": { "bool": {"must": [{ "match": { "binaryAnnotations.key": "ip" }},{ "match": { "binaryAnnotations.value": "127.0.0.1" }}] } }}}],"must_not": {"exists": { "field": "parentId" } }}}, "size":400,"from":0,"sort":[{"timestamp":{"order":"desc","unmapped_type":"boolean"}}]}
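Why use one nested clause per condition? A nested clause is evaluated against individual array elements, so its key and value must come from the same element. Separate nested clauses can be satisfied by different elements. The Python sketch below imitates that rule locally on the binaryAnnotations of the sample document. It uses the sample's own values, since the sample has service=redis rather than WebRequest. It also simplifies analyzed match to exact string equality. For contrast, it adds a "flattened" check that compares keys and values independently, which is roughly how a plain (non-nested) object array behaves. This is an illustration, not Elasticsearch code.

```python
# binaryAnnotations from the sample document (endpoint omitted for brevity).
ANNOTATIONS = [
    {"key": "ip", "value": "127.0.0.1"},
    {"key": "lc", "value": "unknown"},
    {"key": "service", "value": "redis"},
]


def nested_match(annotations, key, value):
    """Nested semantics: a single element must carry both the key and the value."""
    return any(a["key"] == key and a["value"] == value for a in annotations)


def all_nested_match(annotations, conditions):
    """Several nested clauses under must: each may be met by a different element."""
    return all(nested_match(annotations, k, v) for k, v in conditions)


def flattened_match(annotations, key, value):
    """Object (non-nested) semantics: keys and values are checked independently."""
    keys = {a["key"] for a in annotations}
    values = {a["value"] for a in annotations}
    return key in keys and value in values


if __name__ == "__main__":
    print(all_nested_match(ANNOTATIONS, [("service", "redis"), ("ip", "127.0.0.1")]))  # True
    print(nested_match(ANNOTATIONS, "service", "127.0.0.1"))     # False
    print(flattened_match(ANNOTATIONS, "service", "127.0.0.1"))  # True: a false positive
```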
6. Distinct values (deduplication)
{"query":{"bool":{"must":[ {"match":{"name":"query"}} ]}}, "aggs": {"traceId": {"terms": {"field": "traceId","size": 10 }}}, "size":10,"from":0,"sort":[{"timestamp":{"order":"desc","unmapped_type":"boolean"}}]}
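Deduplication uses an aggs section at the same level as query. This example fetches records with name=query and groups them by traceId with a terms aggregation. The distinct traceId values come back under aggregations.traceId.buckets (at most 10, the aggregation's size), each with a doc_count. The hits themselves are not deduplicated. This relies on traceId being a keyword field, so each bucket key is a whole trace ID.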
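The sketch below builds this request in Python and reads the distinct values from a parsed response. A terms aggregation response holds aggregations, then the aggregation name, then a buckets list in which each bucket has a key and a doc_count. The helper names build_distinct_query and distinct_values are hypothetical. If only the distinct values are needed, setting the top-level size to 0 skips the hits and returns just the aggregation.

```python
import json


def build_distinct_query(name, agg_field="traceId", buckets=10, size=10):
    """Hits with the given name plus a terms aggregation listing distinct agg_field values."""
    return {
        "query": {"bool": {"must": [{"match": {"name": name}}]}},
        # The outer key is the aggregation's name; "field" is the field it groups on.
        "aggs": {agg_field: {"terms": {"field": agg_field, "size": buckets}}},
        "size": size,  # set to 0 if only the distinct values are needed
        "from": 0,
        "sort": [{"timestamp": {"order": "desc", "unmapped_type": "boolean"}}],
    }


def distinct_values(response, agg_name="traceId"):
    """Read the bucket keys (the distinct values) out of a parsed _search response."""
    return [b["key"] for b in response["aggregations"][agg_name]["buckets"]]


print(json.dumps(build_distinct_query("query")))
```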