Examples from "Elasticsearch: The Definitive Guide": Search in Depth
Finding exact values:
GET /my_store/products/_search
{
  "query": {
    "constant_score": {
      "filter": { "term": { "price": 20 } }
    }
  }
}
### Whether the following query returns anything depends on how the field was indexed
GET /my_store/products/_search
{
  "query": {
    "constant_score": {
      "filter": { "term": { "productID": "XHDK-A-1293-#fJ3" } }
    }
  }
}
### The field must be mapped as not_analyzed (no analysis) for the term to be found
DELETE /my_store
PUT /my_store
{
  "mappings": {
    "products": {
      "properties": {
        "productID": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
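To see why the `term` filter on `productID` can come back empty, it may help to look at what an analyzed field actually stores. The sketch below is only a rough stand-in for the standard analyzer (lowercase, then split on anything that is not an ASCII letter or digit). `approx_standard_analyze` is a hypothetical helper, not an Elasticsearch API.

```python
import re

def approx_standard_analyze(text):
    """Rough approximation of the standard analyzer: lowercase the text,
    then keep every run of ASCII letters and digits as one term."""
    return re.findall(r"[a-z0-9]+", text.lower())

product_id = "XHDK-A-1293-#fJ3"

# Terms that would be indexed if the field were analyzed.
analyzed_terms = approx_standard_analyze(product_id)
print(analyzed_terms)

# A term filter looks up the exact value it is given. Only fragments were
# indexed, so the full ID is not among the terms.
print(product_id in analyzed_terms)

# With "index": "not_analyzed" the whole value is kept as a single term.
not_analyzed_terms = [product_id]
print(product_id in not_analyzed_terms)
```

The real analyzer follows Unicode word-boundary rules, so treat the output as an illustration of the idea rather than an exact token list for arbitrary input.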
Combining filters:
### The bool filter
{
  "bool": {
    "must": [],
    "should": [],
    "must_not": []
  }
}
### Example
GET /my_store/products/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "price": 20 } },
            { "term": { "productID": "XHDK-A-1293-#fJ3" } }
          ],
          "must_not": { "term": { "price": 30 } }
        }
      }
    }
  }
}
### Nested bool filters
GET /my_store/products/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "productID": "KDKE-B-9947-#kL5" } },
            { "bool": {
                "must": [
                  { "term": { "productID": "JODL-X-1937-#pV7" } },
                  { "term": { "price": 30 } }
                ]
            } }
          ]
        }
      }
    }
  }
}
Finding multiple exact values:
### terms
{ "terms": { "price": [20, 30] } }
#### Example
GET /my_store/products/_search
{
  "query": {
    "constant_score": {
      "filter": { "terms": { "price": [20, 30] } }
    }
  }
}
### Keep in mind that term and terms are "contains" operations, not "equals" checks
### Exact equality
### The best approach is to index an additional field that stores the number of terms the field contains
{ "tags": ["search"], "tag_count": 1 }
{ "tags": ["search", "open_source"], "tag_count": 2 }
GET /my_index/my_type/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "tags": "search" } },
            { "term": { "tag_count": 1 } }
          ]
        }
      }
    }
  }
}
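The difference between "contains" and "equals" can be sketched in plain Python with the two sample documents above. `term_contains` is a hypothetical helper that only mimics the behaviour of a `term` filter on a multivalue field. It is not how Elasticsearch evaluates filters internally.

```python
docs = [
    {"tags": ["search"], "tag_count": 1},
    {"tags": ["search", "open_source"], "tag_count": 2},
]

def term_contains(doc, field, value):
    """Mimic a term filter: true if the field holds the value at all,
    no matter what other values it also holds."""
    values = doc[field]
    if not isinstance(values, list):
        values = [values]
    return value in values

# A term filter on its own matches both documents.
contains_search = [d for d in docs if term_contains(d, "tags", "search")]

# Adding the tag_count condition keeps only the document whose single tag is "search".
only_search = [
    d for d in docs
    if term_contains(d, "tags", "search") and term_contains(d, "tag_count", 1)
]

print(len(contains_search), len(only_search))
```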
Ranges:
### gt: > greater than
### lt: < less than
### gte: >= greater than or equal to
### lte: <= less than or equal to
"range": { "price": { "gte": 20, "lte": 40 } }
#### Example
GET /my_store/products/_search
{
  "query": {
    "constant_score": {
      "filter": { "range": { "price": { "gte": 20, "lt": 40 } } }
    }
  }
}
### Date ranges
"range": { "timestamp": { "gt": "2014-01-01 00:00:00", "lt": "2014-01-07 00:00:00" } }
### The past hour
"range": { "timestamp": { "gt": "now-1h" } }
### Later than 2014-01-01 and earlier than 2014-01-01 plus one month
"range": { "timestamp": { "gt": "2014-01-01 00:00:00", "lt": "2014-01-01 00:00:00||+1M" } }
### String ranges
### Find strings from a up to, but not including, b
"range": { "title": { "gte": "a", "lt": "b" } }
Dealing with null values:
### The exists filter
GET /my_index/posts/_search
{
  "query": {
    "constant_score": {
      "filter": { "exists": { "field": "tags" } }
    }
  }
}
### The missing filter
GET /my_index/posts/_search
{
  "query": {
    "constant_score": {
      "filter": { "missing": { "field": "tags" } }
    }
  }
}
### exists and missing on objects
##### Sample object
{ "name": { "first": "John", "last": "Smith" } }
### The filter
{ "exists": { "field": "name" } }
### What is actually executed
{
  "bool": {
    "should": [
      { "exists": { "field": "name.first" } },
      { "exists": { "field": "name.last" } }
    ]
  }
}
The match query:
GET /my_index/my_type/_search
{
  "query": {
    "match": { "title": "QUICK!" }
  }
}
Multiword queries:
### A multiword query
GET /my_index/my_type/_search
{
  "query": {
    "match": { "title": "BROWN DOG!" }
  }
}
### Improving precision
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": { "query": "BROWN DOG!", "operator": "and" }
    }
  }
}
### Controlling precision
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": { "query": "quick brown dog", "minimum_should_match": "75%" }
    }
  }
}
Combining queries:
### A combined query
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "quick" } },
      "must_not": { "match": { "title": "lazy" } },
      "should": [
        { "match": { "title": "brown" } },
        { "match": { "title": "dog" } }
      ]
    }
  }
}
### Controlling precision
### minimum_should_match can be set to an absolute number, but it is more commonly set to a percentage
### This query returns every document whose title field contains "brown" AND "fox", "brown" AND "dog", or "fox" AND "dog". A document that contains all three is considered more relevant than one that contains only two.
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "brown" } },
        { "match": { "title": "fox" } },
        { "match": { "title": "dog" } }
      ],
      "minimum_should_match": 2
    }
  }
}
How match uses bool:
### The following two queries are equivalent
## Query 1
{ "match": { "title": "brown fox" } }
## Query 2
{
  "bool": {
    "should": [
      { "term": { "title": "brown" } },
      { "term": { "title": "fox" } }
    ]
  }
}
### The following two queries are equivalent
## Query 3
{ "match": { "title": { "query": "brown fox", "operator": "and" } } }
## Query 4
{
  "bool": {
    "must": [
      { "term": { "title": "brown" } },
      { "term": { "title": "fox" } }
    ]
  }
}
### The following two queries are equivalent
## Query 5
{ "match": { "title": { "query": "quick brown fox", "minimum_should_match": "75%" } } }
## Query 6
### Because there are only three clauses, the match query's minimum_should_match value of 75% is rounded down to 2: at least two of the three should clauses must match.
{
  "bool": {
    "should": [
      { "term": { "title": "brown" } },
      { "term": { "title": "fox" } },
      { "term": { "title": "quick" } }
    ],
    "minimum_should_match": 2
  }
}
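The rounding in Query 5 and Query 6 can be checked with a few lines of arithmetic. This is a minimal sketch that only covers positive integers and positive percentages. `required_should_clauses` and `bool_should_matches` are hypothetical helpers, and the sample term sets are made up for illustration.

```python
def required_should_clauses(total_clauses, minimum_should_match):
    """Translate minimum_should_match into a clause count.
    A percentage string such as "75%" is rounded down."""
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        return total_clauses * percent // 100
    return int(minimum_should_match)

def bool_should_matches(doc_terms, should_terms, minimum_should_match):
    """True if enough of the should terms occur in the document."""
    matched = sum(1 for term in should_terms if term in doc_terms)
    return matched >= required_should_clauses(len(should_terms), minimum_should_match)

should_terms = ["brown", "fox", "quick"]

print(required_should_clauses(3, "75%"))                            # 3 * 0.75 = 2.25, rounded down
print(bool_should_matches({"quick", "fox"}, should_terms, "75%"))   # two of three terms present
print(bool_should_matches({"fox"}, should_terms, "75%"))            # only one term present
```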
Boosting query clauses:
GET /_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "content": { "query": "full text search", "operator": "and" }
        }
      },
      "should": [
        { "match": { "content": { "query": "Elasticsearch", "boost": 3 } } },
        { "match": { "content": { "query": "Lucene", "boost": 2 } } }
      ]
    }
  }
}
Controlling analysis:
GET /my_index/_analyze
{ "field": "my_type.title", "text": "Foxes" }
GET /my_index/my_type/_validate/query?explain
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "Foxes" } },
        { "match": { "english_title": "Foxes" } }
      ]
    }
  }
}
Multiple query strings:
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "War and Peace" } },
        { "match": { "author": "Leo Tolstoy" } }
      ]
    }
  }
}
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "War and Peace" } },
        { "match": { "author": "Leo Tolstoy" } },
        { "bool": {
            "should": [
              { "match": { "translator": "Constance Garnett" } },
              { "match": { "translator": "Louise Maude" } }
            ]
        } }
      ]
    }
  }
}
### Prioritizing clauses
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": { "query": "War and Peace", "boost": 2 } } },
        { "match": { "author": { "query": "Leo Tolstoy", "boost": 2 } } },
        { "bool": {
            "should": [
              { "match": { "translator": "Constance Garnett" } },
              { "match": { "translator": "Louise Maude" } }
            ]
        } }
      ]
    }
  }
}
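Best fields:
### The dis_max query returns every document that matches any of its queries, but uses only the score of the best-matching query as the score of the document
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Brown fox" } },
        { "match": { "body": "Brown fox" } }
      ]
    }
  }
}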
Tuning best-fields queries:
### The tie_breaker parameter also takes the scores of the other matching clauses into account
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" } },
        { "match": { "body": "Quick pets" } }
      ],
      "tie_breaker": 0.3
    }
  }
}
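How `tie_breaker` changes the final score can be sketched as follows: the best clause score is kept as is, and every other matching clause contributes its score multiplied by `tie_breaker`. The clause scores used here are made-up numbers, and `dis_max_score` is a hypothetical helper.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the scores of the
    other matching clauses. A score of 0 means the clause did not match."""
    matching = [score for score in clause_scores if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)

# Hypothetical scores for the title clause and the body clause.
title_score, body_score = 0.5, 0.2

print(dis_max_score([title_score, body_score]))                   # plain dis_max: best field only
print(dis_max_score([title_score, body_score], tie_breaker=0.3))  # other field counted at 30%
print(dis_max_score([title_score, body_score], tie_breaker=1.0))  # behaves like a plain sum
```

With `tie_breaker` at 0 only the best field counts, and at 1 all matching fields count equally, so values in between trade off the two behaviours.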
The multi_match query:
### Types: best_fields, most_fields, and cross_fields
### The following two queries are equivalent
## Query 1
{
  "dis_max": {
    "queries": [
      { "match": { "title": { "query": "Quick brown fox", "minimum_should_match": "30%" } } },
      { "match": { "body": { "query": "Quick brown fox", "minimum_should_match": "30%" } } }
    ],
    "tie_breaker": 0.3
  }
}
## Query 2
{
  "multi_match": {
    "query": "Quick brown fox",
    "type": "best_fields",
    "fields": [ "title", "body" ],
    "tie_breaker": 0.3,
    "minimum_should_match": "30%"
  }
}
### Wildcards in field names
{ "multi_match": { "query": "Quick brown fox", "fields": "*_title" } }
### Boosting an individual field
{ "multi_match": { "query": "Quick brown fox", "fields": [ "*_title", "chapter_title^2" ] } }
Most fields:
GET /my_index/_search
{
  "query": {
    "multi_match": {
      "query": "jumping rabbits",
      "type": "most_fields",
      "fields": [ "title", "title.std" ]
    }
  }
}
### Controlling the weight of each field
GET /my_index/_search
{
  "query": {
    "multi_match": {
      "query": "jumping rabbits",
      "type": "most_fields",
      "fields": [ "title^10", "title.std" ]
    }
  }
}
Cross-fields entity search:
### Query every field and add up the scores of the fields that match
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "most_fields",
      "fields": [ "street", "city", "country", "postcode" ]
    }
  }
}
Custom _all fields:
### copy_to
PUT /my_index
{
  "mappings": {
    "person": {
      "properties": {
        "first_name": { "type": "string", "copy_to": "full_name" },
        "last_name": { "type": "string", "copy_to": "full_name" },
        "full_name": { "type": "string" }
      }
    }
  }
}
cross-fields queries:
GET /books/_search
{
  "query": {
    "multi_match": {
      "query": "peter smith",
      "type": "cross_fields",
      "fields": [ "title^2", "description" ]
    }
  }
}
Phrase matching:
GET /my_index/my_type/_search
{
  "query": {
    "match_phrase": { "title": "quick brown fox" }
  }
}
Mixing it up:
### The slop parameter tells the match_phrase query how far apart terms may be while the document still counts as a match
GET /my_index/my_type/_search
{
  "query": {
    "match_phrase": {
      "title": { "query": "quick fox", "slop": 1 }
    }
  }
}
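One way to reason about `slop` is to ask how far each term sits from where the phrase expects it. The sketch below is a simplified model, not the actual Lucene implementation. It assumes every query term occurs exactly once in the document, and the title "quick brown fox" is an assumed sample value. `required_slop` is a hypothetical helper.

```python
def required_slop(query_terms, doc_terms):
    """Smallest slop at which the phrase would accept the document,
    or None if a query term is missing from the document.

    For every query term, compute how far its position in the document is
    shifted relative to its position in the query. The spread between the
    largest and smallest shift is the slop that is needed.
    """
    shifts = []
    for query_position, term in enumerate(query_terms):
        if term not in doc_terms:
            return None
        shifts.append(doc_terms.index(term) - query_position)
    return max(shifts) - min(shifts)

title = "quick brown fox".split()

print(required_slop("quick brown fox".split(), title))  # exact phrase
print(required_slop("quick fox".split(), title))        # one word sits in between
print(required_slop("fox quick".split(), title))        # reversed order costs more
```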
Multivalue fields:
### A multivalue field
PUT /my_index/groups/1
{ "names": [ "John Abraham", "Lincoln Smith" ] }
### The position_increment_gap setting tells Elasticsearch to increase the current term position by the given value for every new element in the array
PUT /my_index/_mapping/groups
{
  "properties": {
    "names": { "type": "string", "position_increment_gap": 100 }
  }
}
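The effect of `position_increment_gap` on the `names` array can be sketched by assigning term positions by hand. This is an illustration under two assumptions: tokens are split on whitespace, and positions start at 0. `token_positions` is a hypothetical helper.

```python
def token_positions(values, position_increment_gap=0):
    """Assign a position to every token of a multivalue field.
    Each value after the first starts position_increment_gap further on."""
    positions = {}
    position = -1
    for value_index, value in enumerate(values):
        if value_index > 0:
            position += position_increment_gap
        for token in value.lower().split():
            position += 1
            positions[token] = position
    return positions

names = ["John Abraham", "Lincoln Smith"]

without_gap = token_positions(names)
with_gap = token_positions(names, position_increment_gap=100)

# Without a gap, "abraham" and "lincoln" are adjacent, so the phrase
# "Abraham Lincoln" would match across the two array elements.
print(without_gap["lincoln"] - without_gap["abraham"])

# With a gap of 100 the two terms are far apart, so only a phrase query
# with a very large slop could still match across elements.
print(with_gap["lincoln"] - with_gap["abraham"])
```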
Proximity for relevance:
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": { "query": "quick brown fox", "minimum_should_match": "30%" }
        }
      },
      "should": {
        "match_phrase": {
          "title": { "query": "quick brown fox", "slop": 50 }
        }
      }
    }
  }
}
Improving performance:
### Rescoring only a window of top results (an optimization of the proximity-for-relevance query)
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": { "query": "quick brown fox", "minimum_should_match": "30%" }
    }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "match_phrase": {
          "title": { "query": "quick brown fox", "slop": 50 }
        }
      }
    }
  }
}
Finding associated words:
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "my_shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "my_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_shingle_filter" ]
        }
      }
    }
  }
}
### Testing the analyzer
GET /my_index/_analyze?analyzer=my_shingle_analyzer
Sue ate the alligator
### Using the analyzer in a multifield
PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "title": {
        "type": "string",
        "fields": {
          "shingles": { "type": "string", "analyzer": "my_shingle_analyzer" }
        }
      }
    }
  }
}
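What `my_shingle_analyzer` does to the test sentence can be approximated in a couple of lines: with both shingle sizes set to 2 and `output_unigrams` disabled, only pairs of adjacent words are emitted. `bigram_shingles` is a hypothetical helper that splits on whitespace instead of using the standard tokenizer.

```python
def bigram_shingles(text):
    """Lowercase the text, split on whitespace, and join every pair of
    adjacent tokens into one two-word shingle."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

print(bigram_shingles("Sue ate the alligator"))
```

Because each shingle is indexed as a single term, a query that shares a word pair with a document matches on that pair as a unit, which is what rewards words that appear next to each other.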
Postcodes and structured data:
PUT /my_index
{
  "mappings": {
    "address": {
      "properties": {
        "postcode": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
The prefix query:
GET /my_index/address/_search
{
  "query": {
    "prefix": { "postcode": "W1" }
  }
}
wildcard and regexp queries:
GET /my_index/address/_search
{
  "query": {
    "wildcard": { "postcode": "W?F*HW" }
  }
}
GET /my_index/address/_search
{
  "query": {
    "regexp": { "postcode": "W[0-9].+" }
  }
}
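The two patterns can be tried out against a handful of values without a cluster. The postcodes below are illustrative sample values, not data taken from this page, and Python's `fnmatch` and `re` modules only approximate the Lucene pattern syntax. For these two simple patterns the behaviour should be the same.

```python
import fnmatch
import re

# Illustrative sample values for a not_analyzed postcode field.
postcodes = ["W1V 3DG", "W2F 8HW", "W1F 7HW", "WC1N 1LZ", "SW5 0BE"]

def wildcard_matches(pattern, values):
    """? matches exactly one character, * matches any run of characters."""
    return [value for value in values if fnmatch.fnmatchcase(value, pattern)]

def regexp_matches(pattern, values):
    """The pattern has to match the whole term, hence fullmatch."""
    return [value for value in values if re.fullmatch(pattern, value)]

print(wildcard_matches("W?F*HW", postcodes))
print(regexp_matches("W[0-9].+", postcodes))
```

Both query types have to walk the term list to find candidates, so a pattern that starts with a wildcard would force a scan over every term in the field.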
Query-time search-as-you-type:
{ "match_phrase_prefix": { "brand": "johnnie walker bl" } }
{ "match_phrase_prefix": { "brand": { "query": "walker johnnie bl", "slop": 10 } } }
{ "match_phrase_prefix": { "brand": { "query": "johnnie walker bl", "max_expansions": 50 } } }
Index-time search-as-you-type:
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": { "type": "edge_ngram", "min_gram": 1, "max_gram": 20 }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "autocomplete_filter" ]
        }
      }
    }
  }
}
### Applying the analyzer
PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "name": { "type": "string", "analyzer": "autocomplete" }
    }
  }
}
### Querying
GET /my_index/my_type/_search
{
  "query": {
    "match": { "name": "brown fo" }
  }
}
### Setting the analyzer at query time
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": { "query": "brown fo", "analyzer": "standard" }
    }
  }
}
### Setting separate index and search analyzers in the mapping
PUT /my_index/my_type/_mapping
{
  "my_type": {
    "properties": {
      "name": {
        "type": "string",
        "index_analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
### Edge n-grams and postcodes
{
  "analysis": {
    "filter": {
      "postcode_filter": { "type": "edge_ngram", "min_gram": 1, "max_gram": 8 }
    },
    "analyzer": {
      "postcode_index": { "tokenizer": "keyword", "filter": [ "postcode_filter" ] },
      "postcode_search": { "tokenizer": "keyword" }
    }
  }
}
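The reason for indexing with `autocomplete` but searching with `standard` becomes clearer once the terms are written out. The sketch below approximates both analyzers with whitespace splitting, and the name "Brown foxes" is an assumed sample value. `edge_ngrams`, `autocomplete_analyze`, and `standard_analyze` are hypothetical helpers.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Prefixes of the token whose length lies between min_gram and max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]

def autocomplete_analyze(text):
    """Stand-in for the autocomplete analyzer: lowercase, split, edge n-grams."""
    terms = []
    for token in text.lower().split():
        terms.extend(edge_ngrams(token))
    return terms

def standard_analyze(text):
    """Stand-in for the standard analyzer: lowercase and split only."""
    return text.lower().split()

print(edge_ngrams("brown"))

# Terms stored at index time for the assumed sample name.
indexed_terms = set(autocomplete_analyze("Brown foxes"))

# Analyzed with standard, the partial query keeps its two terms, and both
# are found among the indexed prefixes.
query_terms = standard_analyze("brown fo")
print(all(term in indexed_terms for term in query_terms))

# Analyzed with autocomplete, the query itself explodes into prefixes such
# as "b" and "f", which would also match any name that merely starts with b or f.
print(autocomplete_analyze("brown fo"))
```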
Theory behind relevance scoring:
### Disabling term frequencies
PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "text": { "type": "string", "index_options": "docs" }
      }
    }
  }
}
### Disabling norms
PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "text": { "type": "string", "norms": { "enabled": false } }
      }
    }
  }
}
Lucene's practical scoring function:
### Disabling the coordination factor
GET /_search
{
  "query": {
    "bool": {
      "disable_coord": true,
      "should": [
        { "term": { "text": "jump" } },
        { "term": { "text": "hop" } },
        { "term": { "text": "leap" } }
      ]
    }
  }
}
Query-time boosting:
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": { "query": "quick brown fox", "boost": 2 } } },
        { "match": { "content": "quick brown fox" } }
      ]
    }
  }
}
### Boosting an index
GET /docs_2014_*/_search
{
  "indices_boost": {
    "docs_2014_10": 3,
    "docs_2014_09": 2
  },
  "query": {
    "match": { "text": "quick brown fox" }
  }
}
Manipulating relevance with query structure:
### quick OR brown OR red OR fox
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "text": "quick" } },
        { "term": { "text": "brown" } },
        { "term": { "text": "red" } },
        { "term": { "text": "fox" } }
      ]
    }
  }
}
### quick OR (brown OR red) OR fox
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "text": "quick" } },
        { "term": { "text": "fox" } },
        { "bool": {
            "should": [
              { "term": { "text": "brown" } },
              { "term": { "text": "red" } }
            ]
        } }
      ]
    }
  }
}
Not Quite Not:
### The boosting query
GET /_search
{
  "query": {
    "boosting": {
      "positive": { "match": { "text": "apple" } },
      "negative": { "match": { "text": "pie tart fruit crumble tree" } },
      "negative_boost": 0.5
    }
  }
}
Ignoring TF/IDF:
### The constant_score query
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "constant_score": { "query": { "match": { "description": "wifi" } } } },
        { "constant_score": { "query": { "match": { "description": "garden" } } } },
        { "constant_score": {
            "boost": 2,
            "query": { "match": { "description": "pool" } }
        } }
      ]
    }
  }
}
Boosting by popularity:
### Combining the number of votes with the full-text relevance score
### new_score = old_score * number_of_votes
GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": { "query": "popularity", "fields": [ "title", "content" ] }
      },
      "field_value_factor": { "field": "votes" }
    }
  }
}
### modifier
### new_score = old_score * log(1 + number_of_votes)
GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": { "query": "popularity", "fields": [ "title", "content" ] }
      },
      "field_value_factor": { "field": "votes", "modifier": "log1p" }
    }
  }
}
### factor
### new_score = old_score * log(1 + factor * number_of_votes)
GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": { "query": "popularity", "fields": [ "title", "content" ] }
      },
      "field_value_factor": { "field": "votes", "modifier": "log1p", "factor": 2 }
    }
  }
}
### boost_mode
### multiply: multiply the _score by the function value (default)
### sum: add the function value to the _score
### min: the lower of the _score and the function value
### max: the higher of the _score and the function value
### replace: replace the _score with the function value
GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": { "query": "popularity", "fields": [ "title", "content" ] }
      },
      "field_value_factor": { "field": "votes", "modifier": "log1p", "factor": 0.1 },
      "boost_mode": "sum"
    }
  }
}
### max_boost
### Whatever the field_value_factor function returns, the value it contributes will never exceed 1.5
### max_boost only caps the result of the function; it does not directly limit the final _score
GET /blogposts/post/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": { "query": "popularity", "fields": [ "title", "content" ] }
      },
      "field_value_factor": { "field": "votes", "modifier": "log1p", "factor": 0.1 },
      "boost_mode": "sum",
      "max_boost": 1.5
    }
  }
}
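The formulas in this section can be combined into one small function to see how `modifier`, `factor`, `boost_mode`, and `max_boost` interact. This is a sketch under one assumption that the notes do not spell out: the logarithm of `log1p` is taken as base 10. `popularity_score` is a hypothetical helper, and the scores and vote counts are made up.

```python
import math

def popularity_score(old_score, votes, factor=1.0, boost_mode="multiply", max_boost=None):
    """Combine a full-text score with a log1p-modified vote count."""
    # Function value: log(1 + factor * number_of_votes), assumed to be base 10.
    function_value = math.log10(1 + factor * votes)

    # max_boost caps the function value only, never the final score.
    if max_boost is not None:
        function_value = min(function_value, max_boost)

    combine = {
        "multiply": lambda score, value: score * value,
        "sum": lambda score, value: score + value,
        "min": min,
        "max": max,
        "replace": lambda score, value: value,
    }
    return combine[boost_mode](old_score, function_value)

print(popularity_score(2.0, votes=0))                    # multiply wipes out posts without votes
print(popularity_score(2.0, votes=0, boost_mode="sum"))  # sum keeps the full-text score
print(popularity_score(2.0, votes=10**6, factor=0.1, boost_mode="sum", max_boost=1.5))
```

The first two calls illustrate why `sum` is often the gentler choice: with `multiply`, a post that has no votes yet ends up with a score of zero regardless of how well it matches.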
Boosting filtered subsets:
### score_mode
### multiply: function results are multiplied together (default)
### sum: function results are added up
### avg: the average of the function results
### max: the highest function result
### min: the lowest function result
### first: use the result of the first function only (whether or not it has a filter)
GET /_search
{
  "query": {
    "function_score": {
      "filter": { "term": { "city": "Barcelona" } },
      "functions": [
        { "filter": { "term": { "features": "wifi" } }, "weight": 1 },
        { "filter": { "term": { "features": "garden" } }, "weight": 1 },
        { "filter": { "term": { "features": "pool" } }, "weight": 2 }
      ],
      "score_mode": "sum"
    }
  }
}
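How the weights of the matching filters are folded into one value can be sketched as below. Only the modes whose behaviour is unambiguous from the list above are modelled, so `avg` is left out, and the case where no function matches returns `None` rather than guessing. `combined_function_value` is a hypothetical helper.

```python
from functools import reduce

functions = [
    {"feature": "wifi", "weight": 1},
    {"feature": "garden", "weight": 1},
    {"feature": "pool", "weight": 2},
]

def combined_function_value(doc_features, functions, score_mode="sum"):
    """Collect the weights of the functions whose filter matches the
    document and reduce them according to score_mode."""
    weights = [f["weight"] for f in functions if f["feature"] in doc_features]
    if not weights:
        return None  # not modelled in this sketch
    if score_mode == "sum":
        return sum(weights)
    if score_mode == "multiply":
        return reduce(lambda a, b: a * b, weights)
    if score_mode == "max":
        return max(weights)
    if score_mode == "min":
        return min(weights)
    if score_mode == "first":
        return weights[0]
    raise ValueError("score_mode not covered by this sketch: " + score_mode)

print(combined_function_value({"wifi", "pool"}, functions))            # 1 + 2
print(combined_function_value({"wifi", "garden", "pool"}, functions))  # 1 + 1 + 2
print(combined_function_value({"garden", "pool"}, functions, "max"))   # highest weight
```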
Random scoring:
### The random_score function outputs a number between 0 and 1. With the same seed it produces the same random results each time
### Of course, if new documents that match the query are added, the order of the results changes whether or not consistent randomness is used
GET /_search
{
  "query": {
    "function_score": {
      "filter": { "term": { "city": "Barcelona" } },
      "functions": [
        { "filter": { "term": { "features": "wifi" } }, "weight": 1 },
        { "filter": { "term": { "features": "garden" } }, "weight": 1 },
        { "filter": { "term": { "features": "pool" } }, "weight": 2 },
        { "random_score": { "seed": "the users session id" } }
      ],
      "score_mode": "sum"
    }
  }
}
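The closer, the better:
### Three decay functions are supported: linear, exp, and gauss (linear, exponential, and Gaussian)
### origin: the central point, or the best possible value for the field. Documents that fall exactly on the origin get the full _score of 1.0.
### scale: the rate of decay, that is, how quickly the _score drops as a document moves away from the origin (for example, every £10 or every 100 meters).
### decay: the _score that a document receives at a distance of scale from the origin. Defaults to 0.5.
### offset: a nonzero offset widens the central point into a range instead of a single point. Every value in the range origin - offset <= value <= origin + offset gets the full _score of 1.0.
GET /_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "gauss": {
            "location": {
              "origin": { "lat": 51.5, "lon": 0.12 },
              "offset": "2km",
              "scale": "3km"
            }
          }
        },
        {
          "gauss": {
            "price": { "origin": "50", "offset": "50", "scale": "20" }
          },
          "weight": 2
        }
      ]
    }
  }
}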
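The four parameters fit together as follows for the `gauss` curve: inside the offset the score stays at 1.0, and at a further distance of `scale` beyond the offset it has dropped to `decay`. The sketch below uses the Gaussian form given in the Elasticsearch function_score documentation, applied to the `price` clause above. `gauss_decay` is a hypothetical helper name.

```python
import math

def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay: 1.0 within offset of the origin, and exactly
    `decay` at a distance of offset + scale from the origin."""
    # Distance from the origin that lies outside the flat offset zone.
    distance = max(0.0, abs(value - origin) - offset)
    # Choose the variance so that the curve passes through `decay` at `scale`.
    sigma_squared = -(scale ** 2) / (2 * math.log(decay))
    return math.exp(-(distance ** 2) / (2 * sigma_squared))

# The price clause: origin 50, offset 50, scale 20.
for price in (50, 100, 120, 140):
    print(price, round(gauss_decay(price, origin=50, scale=20, offset=50), 4))
```

With these settings every price from 0 to 100 keeps the full score, a price of 120 scores 0.5, and the score falls off quickly after that.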
Scoring with scripts:
GET /_search
{
  "query": {
    "function_score": {
      "functions": [
        { ...location clause... },
        { ...price clause... },
        {
          "script_score": {
            "params": { "threshold": 80, "discount": 0.1, "target": 10 },
            "script": "price = doc['price'].value; margin = doc['margin'].value; if (price < threshold) { return price * margin / target }; return price * (1 - discount) * margin / target;"
          }
        }
      ]
    }
  }
}
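The one-line script is easier to follow when written out. The function below is a direct Python transcription of its logic, with the `params` values as defaults. The name `profit_score` and the sample prices and margins are made up for illustration.

```python
def profit_score(price, margin, threshold=80, discount=0.1, target=10):
    """Transcription of the script_score body: profit relative to a target,
    with a discount applied to prices at or above the threshold."""
    if price < threshold:
        # Below the threshold the full price counts.
        return price * margin / target
    # At or above the threshold the price is reduced by the discount first.
    return price * (1 - discount) * margin / target

print(profit_score(price=50, margin=0.2))   # below the threshold, no discount
print(profit_score(price=100, margin=0.2))  # discounted price of 90 is used
```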
Changing similarities:
## The similarity algorithm can be set per field, simply by choosing it for each field in the mapping
PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title": { "type": "string", "similarity": "BM25" },
        "body": { "type": "string", "similarity": "default" }
      }
    }
  }
}
### Configuring BM25
PUT /my_index
{
  "settings": {
    "similarity": {
      "my_bm25": { "type": "BM25", "b": 0 }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": { "type": "string", "similarity": "my_bm25" },
        "body": { "type": "string", "similarity": "BM25" }
      }
    }
  }
}