Elasticsearch Request Body Search

Preface

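An earlier note covered Elasticsearch's lightweight search (query-string search) and explained why it is not recommended. This note covers how to use Elasticsearch request body search instead.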

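Request body search handles the query itself and also lets you highlight matching fragments in the results, run aggregations over all or part of the results, and return "did you mean" suggestions that help users quickly reach the results they are looking for.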

Empty query

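The empty query specifies no parameters. It matches every document in every index, although by default only the first 10 hits are returned: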

GET /_search
{}

Response:

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 23,
    "successful" : 23,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : ".kibana_1",
        "_type" : "_doc",
        "_id" : "space:default",
        "_score" : 1.0,
        "_source" : {
          "space" : {
            "name" : "Default",
            "description" : "This is your default space!",
            "color" : "#00bfb3",
            "disabledFeatures" : [ ],
            "_reserved" : true
          },
          "type" : "space",
          "references" : [ ],
          "migrationVersion" : {
            "space" : "6.6.0"
          },
          "updated_at" : "2020-12-08T06:47:14.690Z"
        }
      },
      ...
    ]
  }
}
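To read a response like the one above from a script, a small helper can pull out the hit count and the document IDs. This is a sketch that works on the already-parsed JSON as a plain dict; the helper name `summarize_hits` is hypothetical. Note that `"relation": "gte"` means the count is only a lower bound: by default Elasticsearch stops counting hits exactly once it reaches 10,000.

```python
import json


def summarize_hits(response):
    """Return (total, is_exact, ids) from a parsed _search response.

    hits.total.relation is "eq" when the count is exact and "gte" when
    it is only a lower bound.
    """
    total = response["hits"]["total"]
    ids = [hit["_id"] for hit in response["hits"]["hits"]]
    return total["value"], total["relation"] == "eq", ids


# A trimmed copy of the response shown above
raw = """
{"hits": {"total": {"value": 10000, "relation": "gte"},
          "max_score": 1.0,
          "hits": [{"_index": ".kibana_1", "_id": "space:default", "_score": 1.0}]}}
"""
print(summarize_hits(json.loads(raw)))  # (10000, False, ['space:default'])
```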

Query DSL

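The query DSL is a flexible, expressive query language. Elasticsearch uses it to expose most of Lucene's functionality through a simple JSON interface. It is what you should use to write queries in your application: it makes them more flexible, more precise, and easier to read and debug.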

A query is passed in the query parameter:

GET /_search
{
    "query": YOUR_QUERY_HERE
}

The empty query is equivalent to a match_all query, which matches every document:

GET /_search
{
    "query": {
        "match_all": {}
    }
}
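Outside Kibana, the same request can be sent from any HTTP client. The sketch below uses only Python's standard library and assumes a cluster at `http://localhost:9200` (an assumption; adjust to your setup); `build_search_request` is a hypothetical helper. It only builds the request object, so it can be inspected without a running cluster.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local cluster


def build_search_request(query=None, index=None):
    """Build (but do not send) a _search request with a JSON body.

    With query=None the body is {}, i.e. the empty query, which behaves
    like match_all.
    """
    body = {} if query is None else {"query": query}
    path = f"/{index}/_search" if index else "/_search"
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",  # Elasticsearch also accepts POST for _search
    )


req = build_search_request({"match_all": {}})
print(req.get_method(), req.full_url, req.data)
# To actually send it against a live cluster: urllib.request.urlopen(req)
```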

Structure of a query clause

A typical query clause has this structure:

{
    QUERY_NAME: {
        ARGUMENT: VALUE,
        ARGUMENT: VALUE,...
    }
}

When the clause targets a particular field, the structure is:

{
    QUERY_NAME: {
        FIELD_NAME: {
            ARGUMENT: VALUE,
            ARGUMENT: VALUE,...
        }
    }
}

For example, a match clause can find tweets whose tweet field contains elasticsearch:

{
    "match": {
        "tweet": "elasticsearch"
    }
}

The complete request:

GET /_search
{
    "query": {
        "match": {
            "tweet": "elasticsearch"
        }
    }
}
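When queries are generated in code, the two shapes above map directly onto nested dicts. The helper below is a hypothetical sketch: given a plain value it produces the short form used in the tweet example, and given keyword arguments it produces the `FIELD_NAME: {ARGUMENT: VALUE}` form. `query` and `operator` are real parameters of the match query; the helper itself is only illustrative.

```python
import json


def field_query(query_name, field, value=None, **arguments):
    """Build {QUERY_NAME: {FIELD_NAME: ...}}.

    Pass either a single value (short form) or keyword arguments
    (long form with ARGUMENT: VALUE pairs), not both.
    """
    if arguments and value is not None:
        raise ValueError("use either value or keyword arguments")
    return {query_name: {field: arguments if arguments else value}}


short = field_query("match", "tweet", "elasticsearch")
long_form = field_query("match", "tweet", query="elasticsearch", operator="and")

print(json.dumps({"query": short}))
print(json.dumps({"query": long_form}))
```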

Combining query clauses

Query clauses are like simple building blocks that can be combined with one another to form more complex queries. They come in two forms:

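Leaf clauses (such as match) compare a query string against one field (or several fields).
Compound clauses combine other query clauses. For example, a bool clause lets you combine other clauses as must, must_not, or should matches, and it can also contain non-scoring filters: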

{
    "bool": {
        "must":     { "match": { "tweet": "elasticsearch" }},
        "must_not": { "match": { "name":  "mary" }},
        "should":   { "match": { "tweet": "full text" }},
        "filter":   { "range": { "age" : { "gt" : 30 }} }
    }
}
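A bool query is just a dict whose keys are the clause types. The hypothetical builder below leaves out clause types that are not needed; each clause may hold a single query or a list of queries, and Elasticsearch accepts both forms.

```python
def bool_query(must=None, must_not=None, should=None, filter=None):
    """Assemble a bool query, omitting clause types that were not given.

    The parameter is named `filter` to mirror the JSON key, even though it
    shadows Python's built-in inside this function.
    """
    clauses = {"must": must, "must_not": must_not, "should": should, "filter": filter}
    return {"bool": {name: q for name, q in clauses.items() if q is not None}}


# Rebuilds the example above
q = bool_query(
    must={"match": {"tweet": "elasticsearch"}},
    must_not={"match": {"name": "mary"}},
    should={"match": {"tweet": "full text"}},
    filter={"range": {"age": {"gt": 30}}},
)
```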

Common queries

Elasticsearch ships with many queries, but only a handful are used regularly. Below are brief notes on how to use the common ones.

match_all

The match_all query simply matches every document. It is the default when no query is specified:

{ "match_all": {}}

match

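match is the standard query. On an exact-value field it matches the given value exactly. On a full-text field, it first analyzes the query string with the appropriate analyzer and then runs the query: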

{ "match": { "age":    26           }}
{ "match": { "date":   "2014-09-01" }}
{ "match": { "public": true         }}
{ "match": { "tag":    "full_text"  }}

Note:

For exact-value lookups, prefer a filter clause over a scoring query: filters skip relevance scoring, and their results can be cached.
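To see why analysis matters, here is a toy model of a match query on a full-text field. The analyzer is not Elasticsearch's standard analyzer, only a simplified stand-in that lowercases and keeps runs of word characters. With the default `or` operator, a document matches if any analyzed query term appears in the field; with `and`, all of them must.

```python
import re


def toy_analyze(text):
    """Simplified analyzer: lowercase, then keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def toy_match(field_text, query_text, operator="or"):
    """Approximate a match query against one full-text field value."""
    field_terms = set(toy_analyze(field_text))
    hits = [term in field_terms for term in toy_analyze(query_text)]
    return all(hits) if operator == "and" else any(hits)


tweet = "Elasticsearch is a Full Text search engine"
print(toy_analyze("Full Text Search"))                 # ['full', 'text', 'search']
print(toy_match(tweet, "FULL text"))                   # True
print(toy_match(tweet, "full nosql", operator="and"))  # False
```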

multi_match

The multi_match query runs the same match query on several fields:

# Search for "full text search" in the title and body fields
{
    "multi_match": {
        "query":    "full text search",
        "fields":   [ "title", "body" ]
    }
}
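As far as matching goes, a multi_match over several fields behaves roughly like running the same match on each field and accepting the document if any field matches; the default `best_fields` type then scores the document by its best-matching field. The toy sketch below models only the matching part, with a simplified analyzer; the field contents are made up.

```python
import re


def toy_terms(text):
    """Simplified analyzer: lowercase and keep runs of word characters."""
    return set(re.findall(r"\w+", text.lower()))


def toy_multi_match(doc, query, fields):
    """True if any query term occurs in at least one of the given fields."""
    query_terms = toy_terms(query)
    return any(query_terms & toy_terms(doc.get(field, "")) for field in fields)


doc = {"title": "Getting started", "body": "A guide to full text search"}
print(toy_multi_match(doc, "full text search", ["title", "body"]))  # True
print(toy_multi_match(doc, "full text search", ["title"]))          # False
```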

range

The range query finds numbers or dates that fall within a given interval:

{
    "range": {
        "age": {
            "gte":  20,
            "lt":   30
        }
    }
}

range supports the following operators:

gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to
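The bounds combine as a conjunction: a value must satisfy every bound that is given. A small hypothetical Python predicate makes the inclusive/exclusive distinction concrete, for numbers as well as dates.

```python
from datetime import date


def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Mimic a range query's bounds: every bound that is set must hold."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


# The age example above: 20 <= age < 30
print([age for age in (19, 20, 29, 30) if in_range(age, gte=20, lt=30)])  # [20, 29]
print(in_range(date(2014, 9, 1), gte=date(2014, 1, 1)))  # True
```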

term

The term query matches exact values. It does not analyze the input, so the given value is looked up exactly as is:

{ "term": { "age":    26           }}
{ "term": { "date":   "2014-09-01" }}
{ "term": { "public": true         }}
{ "term": { "tag":    "full_text"  }}
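In practice this means a term query on an analyzed text field often finds nothing: the index holds the analyzed (for example lowercased) tokens, while term looks for the input verbatim. The toy inverted index below makes this visible; the lowercase-and-split analyzer is a simplified stand-in, not Elasticsearch's.

```python
import re


def toy_analyze(text):
    """Simplified analyzer: lowercase, keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


# Inverted index of an analyzed text field: token -> set of doc ids
docs = {1: "Quick Brown Fox", 2: "lazy dog"}
index = {}
for doc_id, text in docs.items():
    for token in toy_analyze(text):
        index.setdefault(token, set()).add(doc_id)


def term_query(value):
    """No analysis: look the value up exactly as given."""
    return index.get(value, set())


def match_query(text):
    """Analyze the input first, then union the postings of each term."""
    result = set()
    for token in toy_analyze(text):
        result |= index.get(token, set())
    return result


print(term_query("Quick"))   # set(), because the index only contains "quick"
print(match_query("Quick"))  # {1}
```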

terms

terms is the multi-value version of term: it lets you specify several values to match. A document matches if the field contains any one of them:

{ "terms": { "tag": [ "search", "full_text", "nosql" ] }}
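As far as matching goes, terms behaves like an OR of term queries on one field: a document qualifies if the field (which may hold several values, as tag fields often do) shares at least one value with the list. A toy check with made-up documents, comparing values exactly as on a keyword field:

```python
def toy_terms_match(field_values, wanted):
    """True if the field shares at least one exact value with `wanted`."""
    if not isinstance(field_values, list):
        field_values = [field_values]  # a single value acts like a one-element array
    return not set(field_values).isdisjoint(wanted)


wanted = ["search", "full_text", "nosql"]
docs = {
    "a": {"tag": ["search", "lucene"]},
    "b": {"tag": "database"},
    "c": {"tag": ["NoSQL"]},  # no match in this toy: values are compared exactly
}
matched = [doc_id for doc_id, doc in docs.items() if toy_terms_match(doc["tag"], wanted)]
print(matched)  # ['a']
```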

exists and missing

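The exists query finds documents that have a value in the specified field. Its counterpart, missing, found documents with no value in the field; it was removed in Elasticsearch 5.0, and the same effect is now achieved by wrapping exists in a bool must_not clause: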

{
    "exists":   {
        "field":    "title"
    }
}
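The sketch below models, on plain dicts, when a field counts as having no value (absent, null, or an empty or all-null array), which is a simplification that ignores mapping options such as `null_value`. It also builds the bool must_not exists JSON that expresses "no value" in current versions. Both helper names are hypothetical.

```python
def toy_has_value(doc, field):
    """Rough model of exists: absent, None, or an empty/all-null list means no value."""
    value = doc.get(field)
    if isinstance(value, list):
        return any(v is not None for v in value)
    return value is not None


def missing_query(field):
    """Documents without a value: bool + must_not + exists."""
    return {"bool": {"must_not": {"exists": {"field": field}}}}


docs = [{"title": "ES notes"}, {"title": None}, {"title": []}, {"body": "no title"}]
print([toy_has_value(d, "title") for d in docs])  # [True, False, False, False]
print(missing_query("title"))
```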

Compound queries

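The queries above are simple ones, but real-world requirements are rarely that simple.
In practice we use the bool query, which combines multiple queries into the Boolean query you need. It accepts the following parameters: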

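must: documents must match these clauses to be included.
must_not: documents must not match these clauses to be included.
should: if any of these clauses match, the document's _score increases; otherwise they have no effect. They are mainly used to refine each document's relevance score.
filter: must match, but is run in non-scoring filter mode.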

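The following query finds documents whose title field matches how to make millions and that are not tagged spam. Documents tagged starred, or dated 2014 or later, rank higher than the others; documents that satisfy both rank higher still: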

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }},
            { "range": { "date": { "gte": "2014-01-01" }}}
        ]
    }
}

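Note: by default, if the bool query has no must clause and no filter clause, at least one should clause must match. If it has at least one must (or filter) clause, none of the should clauses is required to match.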
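The matching rules of the four clauses can be summarized in a few lines. This is a toy model over Python predicate functions, not how Elasticsearch executes queries: it ignores scoring, and it applies `minimum_should_match` only at its default, under which should clauses are required (at least one) only when there are no must and no filter clauses. The documents and helper names are made up.

```python
def toy_bool_match(doc, must=(), must_not=(), should=(), filter=()):
    """Decide whether `doc` matches a bool query made of predicate functions.

    Scoring is ignored. The should rule follows the default
    minimum_should_match: at least one should clause must match only
    when there are no must and no filter clauses.
    """
    if not all(q(doc) for q in must):
        return False
    if not all(q(doc) for q in filter):
        return False
    if any(q(doc) for q in must_not):
        return False
    if should and not must and not filter:
        return any(q(doc) for q in should)
    return True


def title_has(word):
    return lambda d: word in d["title"].lower().split()


def tagged(tag):
    return lambda d: tag in d["tags"]


docs = [
    {"title": "How to make millions", "tags": ["starred"]},
    {"title": "How to make millions", "tags": []},
    {"title": "How to make millions", "tags": ["spam"]},
]
# With a must clause, should is optional, so the untagged doc still matches
with_must = [toy_bool_match(d, must=[title_has("millions")], must_not=[tagged("spam")],
                            should=[tagged("starred")]) for d in docs]
# With only should clauses, at least one of them is required
should_only = [toy_bool_match(d, should=[tagged("starred")]) for d in docs]
print(with_must, should_only)  # [True, True, False] [True, False, False]
```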

Filters

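If we don't want a document's date to affect its score, we can move the date condition into a filter clause. Note that this also changes which documents match: documents dated before 2014 are now excluded instead of merely ranking lower: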

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }}
        ],
        "filter": {
          "range": { "date": { "gte": "2014-01-01" }} 
        }
    }
}
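The difference between the should version and the filter version can be seen with a toy scorer that counts how many scoring clauses a document satisfies, while filter clauses only decide inclusion. Real Elasticsearch scores come from BM25 and are not simple counts; the documents and the counting rule are assumptions for illustration, and the must/must_not clauses are left out as if every document already satisfied them.

```python
def toy_search(docs, scoring, filters=()):
    """Return (id, score) pairs: filters gate inclusion, each satisfied scoring clause adds 1."""
    results = []
    for doc_id, doc in docs.items():
        if all(f(doc) for f in filters):
            results.append((doc_id, sum(1 for q in scoring if q(doc))))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


def is_starred(doc):
    return "starred" in doc["tags"]


def is_recent(doc):
    return doc["date"] >= "2014-01-01"  # ISO dates compare correctly as strings


docs = {
    "new_starred": {"date": "2015-03-01", "tags": ["starred"]},
    "new_plain": {"date": "2016-07-12", "tags": []},
    "old_starred": {"date": "2012-05-20", "tags": ["starred"]},
}
# Date as a should clause: every doc is kept, recent ones score higher
print(toy_search(docs, scoring=[is_starred, is_recent]))
# Date as a filter: the old doc is dropped and the date no longer changes scores
print(toy_search(docs, scoring=[is_starred], filters=[is_recent]))
```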

If you need to filter documents on several different criteria, a bool query can itself be used as a non-scoring query inside filter:

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }}
        ],
        "filter": {
          "bool": { 
              "must": [
                  { "range": { "date": { "gte": "2014-01-01" }}},
                  { "range": { "price": { "lte": 29.99 }}}
              ],
              "must_not": [
                  { "term": { "category": "ebooks" }}
              ]
          }
        }
    }
}

constant_score

constant_score applies the same constant score to every matching document. It is often used when you only need to run a filter and no other query:

{
    "constant_score":   {
        "filter": {
            "term": { "category": "ebooks" } 
        }
    }
}
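In other words, every document that passes the filter receives the same score, equal to the query's `boost` parameter (1.0 by default). A toy equivalent in Python, with made-up documents:

```python
def toy_constant_score(docs, predicate, boost=1.0):
    """Give every matching doc the same constant score; non-matching docs are dropped."""
    return {doc_id: boost for doc_id, doc in docs.items() if predicate(doc)}


docs = {1: {"category": "ebooks"}, 2: {"category": "paper"}, 3: {"category": "ebooks"}}
print(toy_constant_score(docs, lambda d: d["category"] == "ebooks"))  # {1: 1.0, 3: 1.0}
```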

Validating queries

When query logic gets complicated, the validate API can check whether a query is valid and report what is wrong with it:

GET /index_name/_validate/query?explain
{
   "query": {
      "match" : {
         "name" : "really powerful"
      }
   }
}

Response:

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "index_name",
      "valid" : true,
      "explanation" : "name:really name:powerful"
    }
  ]
}

If the query is invalid, an error message is returned instead:

GET /index_name/_validate/query?explain
{
   "query": {
      "test" : {
         "name" : "really powerful"
      }
   }
}

Response:

{
  "valid" : false,
  "error" : "ParsingException[unknown query [test]]; nested: NamedObjectNotFoundException[[3:16] unknown field [test]];; org.elasticsearch.common.xcontent.NamedObjectNotFoundException: [3:16] unknown field [test]"
}

Understanding the validation output

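When validating, it is recommended to add the explain parameter as above: whether or not validation succeeds, the response then includes detailed information.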

The error message above lets us pinpoint the problem quickly: test is not a known query type.

The explanation shows that the match query for really powerful was rewritten into two single-term queries on the name field, one for each term the analyzer produced from the query string.
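When validating queries from a script, the two response shapes shown above can be handled with a small helper. The sketch assumes the response has already been parsed into a dict; `validation_report` is a hypothetical name, and splitting the explanation on spaces is a simplification that only fits simple outputs such as `name:really name:powerful`.

```python
def validation_report(response):
    """Summarize a parsed _validate/query?explain response.

    Returns (True, [(field, term), ...]) for a valid query, with the pairs
    read from the Lucene-style explanation string, or
    (False, error_message) for an invalid one.
    """
    if not response.get("valid"):
        return False, response.get("error", "")
    pairs = []
    for item in response.get("explanations", []):
        for clause in item["explanation"].split():
            field, _, term = clause.partition(":")
            pairs.append((field, term))
    return True, pairs


ok = {"valid": True,
      "explanations": [{"index": "index_name", "valid": True,
                        "explanation": "name:really name:powerful"}]}
bad = {"valid": False, "error": "ParsingException[unknown query [test]]"}  # trimmed error text

print(validation_report(ok))   # (True, [('name', 'really'), ('name', 'powerful')])
print(validation_report(bad))  # (False, 'ParsingException[unknown query [test]]')
```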

posted @ 2021-02-22 17:33 红雨520
