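Elasticsearch Basic Syntax

Difference between match and match_phrase

match: a document appears in the results as long as it contains any one of the analyzed query terms; documents that match better rank higher.

match_phrase: a document appears in the results only if it contains all of the analyzed query terms, in the same order and at adjacent positions.

Example: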

GET /product_index/product/_search
{
  "query": {
    "match_phrase": {
      "product_name": "PHILIPS toothbrush"
    }
  }
}

A document is returned only if product_name contains both PHILIPS and toothbrush, next to each other and in that order.
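
To make the difference concrete, here is a minimal Python sketch that imitates the two behaviours on tokenized text. It is not how Elasticsearch is implemented: `analyze`, `match_any`, and `match_phrase` are hypothetical helpers, the lowercase-and-split tokenizer is only an assumed stand-in for a real analyzer, and the sample documents are made up.

```python
def analyze(text):
    """Assumed stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

def match_any(doc, query):
    """match with the default "or" operator: at least one query term is in the doc."""
    doc_terms = set(analyze(doc))
    return any(term in doc_terms for term in analyze(query))

def match_phrase(doc, query):
    """match_phrase with slop 0: the query terms appear adjacent and in order."""
    doc_terms, query_terms = analyze(doc), analyze(query)
    n = len(query_terms)
    return any(doc_terms[i:i + n] == query_terms
               for i in range(len(doc_terms) - n + 1))

docs = [
    "PHILIPS toothbrush HX6730",
    "Braun electric toothbrush",
    "PHILIPS electric toothbrush",
]
query = "PHILIPS toothbrush"

match_hits = [d for d in docs if match_any(d, query)]
phrase_hits = [d for d in docs if match_phrase(d, query)]
print(match_hits)
print(phrase_hits)
```

In this sketch all three documents satisfy match, while only the first satisfies match_phrase: the third contains both words, but they are not adjacent.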

Other uses of match

Require every analyzed term to match, rather than any single one as above.

GET /product_index/product/_search
{
  "query": {
    "match": {
      "product_name": {
        "query": "PHILIPS toothbrush",
        "operator": "and"
      }
    }
  }
}

Return a document as long as it matches at least 50% of the analyzed terms

GET /test_index/test/_search
{
  "query": {
    "match": {
      "product_name": {
        "query": "java 程序员 书 推荐",
        "minimum_should_match": "50%"
      }
    }
  }
}
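
As a rough illustration of what "50%" means here, the sketch below computes how many terms have to match. It assumes the analyzer produces exactly the four whitespace-separated terms of the query string, and it relies on the documented rule that a positive percentage is rounded down; `required_matches` is a hypothetical helper, not an Elasticsearch API.

```python
def required_matches(term_count, percentage):
    """Number of terms that must match for a positive percentage.

    The percentage of the term count is rounded down.
    """
    return term_count * percentage // 100

# Assumption: the analyzer yields these four terms for the query string.
terms = "java 程序员 书 推荐".split()
needed = required_matches(len(terms), 50)
print(len(terms), needed)
```

With four terms, two must match. With three terms the same setting would require only one, because 1.5 is rounded down.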

multi_match: return documents in which field a or field b contains the keyword c

GET /test_index/test/_search
{
  "query": {
    "multi_match": {
      "query": "c",
      "fields": [
        "a",
        "b"
      ]
    }
  }
}

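multi_match across multiple fields with cross_fields: the listed fields are treated as one combined field, so with operator and every analyzed term must appear in at least one of the fields, though not necessarily the same one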

GET /product_index/product/_search
{
  "query": {
    "multi_match": {
      "query": "PHILIPS toothbrush",
      "type": "cross_fields",
      "operator": "and",
      "fields": [
        "product_name",
        "product_desc"
      ]
    }
  }
}

match_phrase + slop

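Before looking at slop, note that the source data is 大吉大利, which is analyzed into at least four terms: 大, 吉, 大, 利.
As described above, match_phrase should only find a document when the query terms match it exactly, and 吉利 is clearly not an exact match.
Sometimes, though, an inexact match is exactly what we want: the terms only need to be reasonably close together, and a few other words in between are acceptable. That is what slop is for. slop = 2 means that terms separated by up to 2 positions still count as a match.
Strictly speaking it is less a gap than a number of moves: slop is how many positions the analyzed query terms have to be moved in order to line up with the doc content. So 吉利 can match the doc after all, provided slop is at least 1.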
```
GET /product_index/product/_search
{
  "query": {
    "match_phrase": {
      "product_name": {
        "query": "吉利",
        "slop": 1
      }
    }
  }
}
```
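
The "number of moves" idea can be sketched in a few lines of Python. This is a simplified model, not Lucene's actual scorer: for each way of picking one doc position per query term, it takes the shift between doc position and query position and uses the spread of those shifts as the cost. `min_slop` is a hypothetical helper, the one-character tokenization of 大吉大利 is the assumption stated above, and repeated terms inside the query are not handled.

```python
from itertools import product

def min_slop(doc_terms, query_terms):
    """Smallest slop at which query_terms would match doc_terms, or None.

    Simplified model: shift = doc position - query position for each term;
    the cost of one alignment is max(shift) - min(shift).
    """
    positions = [[i for i, t in enumerate(doc_terms) if t == q]
                 for q in query_terms]
    if any(not p for p in positions):
        return None  # a query term does not occur in the doc at all
    best = None
    for combo in product(*positions):
        shifts = [doc_pos - query_pos
                  for query_pos, doc_pos in enumerate(combo)]
        cost = max(shifts) - min(shifts)
        if best is None or cost < best:
            best = cost
    return best

doc = ["大", "吉", "大", "利"]
print(min_slop(doc, ["吉", "利"]))  # 1
print(min_slop(doc, ["大", "吉"]))  # 0
```

Under this model 吉利 needs one move (利 sits one position further away in the doc than in the query), which agrees with the slop = 1 used in the request above.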

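Using term

term is generally used on fields that are not analyzed, because it is an exact-match query. The value of an analyzed field is split into individual terms at index time, and those terms no longer correspond to the full value you are searching for.

So when you define your own mapping, fields that should not be analyzed are best declared that way.

Since Elasticsearch 5.X, a dynamically mapped text field also gets a sub-field named keyword by default. That sub-field has type keyword, is not analyzed, and by default ignores values longer than 256 characters (ignore_above: 256). If product_name is an analyzed field, product_name.keyword is its non-analyzed counterpart, and you can run exact-match queries against that sub-field.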
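
The mismatch between term and an analyzed field can be imitated with a small Python sketch. It is only a model: `analyze` is an assumed stand-in for the field's analyzer (lowercase, split on whitespace), and the two `term_matches_*` helpers are hypothetical.

```python
def analyze(text):
    """Assumed stand-in for a text field's analyzer: lowercase, split on whitespace."""
    return text.lower().split()

def term_matches_text_field(value, search):
    """term on an analyzed field: search must equal one indexed term exactly."""
    return search in analyze(value)

def term_matches_keyword_field(value, search):
    """term on a keyword field: search must equal the whole stored value."""
    return search == value

value = "PHILIPS toothbrush"
results = {
    "text field, full value": term_matches_text_field(value, "PHILIPS toothbrush"),
    "text field, single term": term_matches_text_field(value, "philips"),
    "keyword field, full value": term_matches_keyword_field(value, "PHILIPS toothbrush"),
}
for case, matched in results.items():
    print(case, matched)
```

The full value only matches on the keyword-style field; on the analyzed field, only an individual (here lowercased) term matches.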

Using terms

Similar to IN in a database

GET /product_index/product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "product_name": [
            "toothbrush",
            "shell"
          ]
        }
      }
    }
  }
}

Difference between query and filter

GET /product_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "product_name": [
              "PHILIPS",
              "toothbrush"
            ]
          }
        },
        {
          "range": {
            "price": {
              "gt": 12.00
            }
          }
        }
      ]
    }
  }
}

GET /product_index/product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 30.00
          }
        }
      }
    }
  }
}

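In terms of search results:

filter: only selects the documents that satisfy the condition and does not compute a relevance score
query: selects the documents that satisfy the condition, computes a relevance score, and sorts by score in descending order

In terms of performance:

filter (better performance, no sorting): no relevance score has to be computed, so no sorting by score is needed, and the results of the most frequently used filters are cached automatically
query (worse performance, sorted): relevance scores have to be computed and the results sorted by score in descending order, and the results are not cached
Using filter and query together gives you the characteristics of both, so choose according to your business requirements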
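
One common way to combine the two is a bool query that puts scoring clauses under must and non-scoring clauses under filter. The sketch below builds such a request body as a Python dict; `build_search_body` is a hypothetical helper, and the field names and values simply reuse the ones from the examples above.

```python
import json

def build_search_body(keyword, min_price):
    """bool query: the must clause is scored, the filter clause only includes or excludes."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"product_name": keyword}}],
                "filter": [{"range": {"price": {"gte": min_price}}}],
            }
        }
    }

body = build_search_body("toothbrush", 30.00)
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of a _search request like the ones above.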

should has one special behavior: if the bool query has no must clause, at least one of the should clauses has to match. We can also use minimum_should_match to require more of them to match.

GET /product_index/product/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "product_name": "java"
          }
        },
        {
          "match": {
            "product_name": "程序员"
          }
        },
        {
          "match": {
            "product_name": "书"
          }
        },
        {
          "match": {
            "product_name": "推荐"
          }
        }
      ],
      "minimum_should_match": 3
    }
  }
}
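
The counting behind minimum_should_match can be modeled in a few lines of Python. In this sketch each should clause is reduced to a single term that either occurs in the doc or not; `bool_should_matches` and the two sample docs are hypothetical, and the default of 1 mirrors the "no must clause" case described above.

```python
def bool_should_matches(doc_terms, should_terms, minimum_should_match=1):
    """True if at least minimum_should_match of the should clauses match the doc."""
    matched = sum(1 for term in should_terms if term in doc_terms)
    return matched >= minimum_should_match

should_terms = ["java", "程序员", "书", "推荐"]
doc_a = {"java", "程序员", "推荐"}  # satisfies 3 of the 4 clauses
doc_b = {"java", "书"}              # satisfies 2 of the 4 clauses

print(bool_should_matches(doc_a, should_terms, minimum_should_match=3))
print(bool_should_matches(doc_b, should_terms, minimum_should_match=3))
print(bool_should_matches(doc_b, should_terms))
```

With minimum_should_match set to 3, only doc_a qualifies; with the default of 1, doc_b qualifies as well.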

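Using boost

There is one more requirement when controlling relevance: for example, in a search for toothbrushes, PHILIPS toothbrush should rank higher than Braun toothbrush. We can do this: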

GET /product_index/product/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "product_name": "toothbrush"
          }
        }
      ],
      "should": [
        {
          "match": {
            "product_name": {
              "query": "PHILIPS",
              "boost": 4
            }
          }
        },
        {
          "match": {
            "product_name": {
              "query": "Braun",
              "boost": 3
            }
          }
        }
      ]
    }
  }
}

Wildcard search (poor performance: it has to scan the whole inverted index)

GET /product_index/product/_search
{
  "query": {
    "wildcard": {
      "product_name": {
        "value": "ipho*"
      }
    }
  }
}

Regexp search (poor performance: it has to scan the whole inverted index)

GET /product_index/product/_search
{
  "query": {
    "regexp": {
      "product_name": "iphone[0-9].+"
    }
  }
}
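
A regexp query has to match an entire indexed term, not just part of it. For this particular pattern Python's `re.fullmatch` behaves comparably, so it can illustrate which terms would qualify. Lucene's regular expression syntax differs from Python's in general, and the sample terms below are made up.

```python
import re

# Same pattern as in the request above: "iphone", one digit, then at least one more character.
pattern = re.compile(r"iphone[0-9].+")

terms = ["iphone8plus", "iphone8", "iphonex", "myiphone8plus"]
matching = [t for t in terms if pattern.fullmatch(t)]
print(matching)  # ['iphone8plus']
```

Note that iphone8 is rejected because `.+` requires at least one character after the digit, and myiphone8plus is rejected because the whole term has to match.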

Using range

range is used to query numeric and date/time intervals

GET /product_index/product/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 30.00
      }
    }
  }
}

posted @ 2018-11-20 11:06 半岛弥情
