Elasticsearch: Fuzzy Queries
Prefix search: prefix
Concept: matches terms that begin with the given string; no relevance score is computed.
##### Notes:
- A prefix search matches terms, not the whole field.
- Prefix search performs poorly.
- Prefix search results are not cached.
- Make the prefix as long as possible to improve performance.
Syntax:

```json
GET <index>/_search
{
  "query": {
    "prefix": {
      "<field>": {
        "value": "<word_prefix>"
      }
    }
  }
}
```

index_prefixes defaults: `"min_chars": 2`, `"max_chars": 5`.
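To make "matches terms, not fields" concrete, here is a minimal Python sketch — a toy analyzer and in-memory index, not the real Lucene machinery — showing that a prefix query hits any document whose analyzed terms start with the prefix, even when the field as a whole does not:

```python
def analyze(text):
    """Toy analyzer: lowercase + whitespace split (stand-in for standard/ik_max_word)."""
    return text.lower().split()

def prefix_search(docs, field, prefix):
    """Return ids of docs where ANY analyzed term of `field` starts with `prefix`."""
    hits = []
    for doc_id, doc in docs.items():
        if any(term.startswith(prefix) for term in analyze(doc[field])):
            hits.append(doc_id)
    return hits

docs = {
    1: {"text": "my english is good"},
    2: {"text": "my chinese is good"},
    3: {"text": "engine of search"},
}

# "eng" matches the terms "english" (doc 1) and "engine" (doc 3);
# it does NOT require the whole field value to start with "eng".
print(prefix_search(docs, "text", "eng"))  # [1, 3]
```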
```json
# prefix: prefix search
DELETE my_index

# elasticsearch stack
# elasticsearch search
# el
# ela
# elas
# elasticsearch
PUT my_index
{
  "mappings": {
    "properties": {
      "text": {
        "analyzer": "ik_max_word",
        "type": "text",
        "index_prefixes": {
          "min_chars": 2,
          "max_chars": 4
        },
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

GET my_index/_mapping

POST /my_index/_bulk?filter_path=items.*.error
{"index":{"_id":"1"}}
{"text":"城管打电话喊商贩去摆摊摊"}
{"index":{"_id":"2"}}
{"text":"笑果文化回应商贩老农去摆摊"}
{"index":{"_id":"3"}}
{"text":"老农耗时17年种出椅子树"}
{"index":{"_id":"4"}}
{"text":"夫妻结婚30多年AA制,被城管抓"}
{"index":{"_id":"5"}}
{"text":"黑人见义勇为阻止抢劫反被铐住"}

GET my_index/_search
GET my_index/_mapping

GET _analyze
{
  "text": ["夫妻结婚30多年AA制,被城管抓"]
}

GET my_index/_search
{
  "query": {
    "prefix": {
      "text": {
        "value": "城管"
      }
    }
  }
}
```
Wildcard: wildcard
##### Concept: a wildcard operator is a placeholder that matches one or more characters. For example, the * wildcard matches zero or more characters. You can combine wildcard operators with other characters to build wildcard patterns.
##### Notes:
- Wildcards also match terms, not fields.
Syntax:

```json
GET <index>/_search
{
  "query": {
    "wildcard": {
      "<field>": {
        "value": "<word_with_wildcard>"
      }
    }
  }
}
```
```json
DELETE my_index

POST /my_index/_bulk
{ "index": { "_id": "1"} }
{ "text": "my english" }
{ "index": { "_id": "2"} }
{ "text": "my english is good" }
{ "index": { "_id": "3"} }
{ "text": "my chinese is good" }
{ "index": { "_id": "4"} }
{ "text": "my japanese is nice" }
{ "index": { "_id": "5"} }
{ "text": "my disk is full" }

DELETE product_en

POST /product_en/_bulk
{ "index": { "_id": "1"} }
{ "title": "my english","desc" : "shouji zhong de zhandouji","price" : 3999, "tags": [ "xingjiabi", "fashao", "buka", "1"]}
{ "index": { "_id": "2"} }
{ "title": "xiaomi nfc phone","desc" : "zhichi quangongneng nfc,shouji zhong de jianjiji","price" : 4999, "tags": [ "xingjiabi", "fashao", "gongjiaoka" , "asd2fgas"]}
{ "index": { "_id": "3"} }
{ "title": "nfc phone","desc" : "shouji zhong de hongzhaji","price" : 2999, "tags": [ "xingjiabi", "fashao", "menjinka" , "as345"]}
{ "index": { "_id": "4"} }
{ "title": "xiaomi erji","desc" : "erji zhong de huangmenji","price" : 999, "tags": [ "low", "bufangshui", "yinzhicha", "4dsg" ]}
{ "index": { "_id": "5"} }
{ "title": "hongmi erji","desc" : "erji zhong de kendeji","price" : 399, "tags": [ "lowbee", "xuhangduan", "zhiliangx" , "sdg5"]}

GET my_index/_search
GET product_en/_search

GET my_index/_search
{
  "query": {
    "wildcard": {
      "text.keyword": {
        "value": "my eng*ish"
      }
    }
  }
}

GET product_en/_mapping

# exact value
GET product_en/_search
{
  "query": {
    "wildcard": {
      "tags.keyword": {
        "value": "men*inka"
      }
    }
  }
}
```
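As a rough illustration of the term-level semantics above: against a keyword field the whole value is a single term, so the pattern must cover the entire string. Python's fnmatch supports the same `*` and `?` placeholders, so a toy version (plain in-memory data, not an ES client) looks like this:

```python
from fnmatch import fnmatchcase

# Each value plays the role of one keyword term (text.keyword).
keyword_terms = {
    1: "my english",
    2: "my english is good",
    3: "my chinese is good",
}

def wildcard_search(terms, pattern):
    """Return ids whose ENTIRE term matches the wildcard pattern."""
    return [i for i, t in terms.items() if fnmatchcase(t, pattern)]

# "my eng*ish" must match the whole term, so only doc 1 qualifies;
# a trailing * is needed to also catch "my english is good".
print(wildcard_search(keyword_terms, "my eng*ish"))   # [1]
print(wildcard_search(keyword_terms, "my eng*ish*"))  # [1, 2]
```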
Regular expressions: regexp
Concept: regexp query performance varies with the regular expression you provide. To keep it fast, avoid wildcard patterns such as .* or .*?+ that have no prefix or suffix around them.
```json
GET <index>/_search
{
  "query": {
    "regexp": {
      "<field>": {
        "value": "<regex>",
        "flags": "ALL"
      }
    }
  }
}
```
```json
# regexp
GET product_en/_search

GET product_en/_search
{
  "query": {
    "regexp": {
      "title": "[\\s\\S]*nfc[\\s\\S]*"
    }
  }
}

GET product_en/_search

GET product_en/_search
{
  "query": {
    "regexp": {
      "desc": {
        "value": "zh~dng",
        "flags": "COMPLEMENT"
      }
    }
  }
}

GET product_en/_search
{
  "query": {
    "regexp": {
      "tags.keyword": {
        "value": ".*<2-3>.*",
        "flags": "INTERVAL"
      }
    }
  }
}
```
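One detail worth calling out: a Lucene regexp must match the entire term (as if anchored on both ends), which is why the query above wraps nfc in `[\s\S]*` on both sides. Python's `re.fullmatch` behaves the same way; the sketch below treats each string as a single term, as a keyword field would:

```python
import re

# Each string stands in for one indexed term.
terms = ["my english", "xiaomi nfc phone", "nfc phone"]

def regexp_search(terms, pattern):
    """Lucene-style: the pattern must match the WHOLE term, not a substring."""
    return [t for t in terms if re.fullmatch(pattern, t)]

# Without leading wildcards, "nfc.*" only matches terms that START with nfc.
print(regexp_search(terms, "nfc.*"))               # ['nfc phone']
# Padding both sides turns it into a contains-match.
print(regexp_search(terms, r"[\s\S]*nfc[\s\S]*"))  # ['xiaomi nfc phone', 'nfc phone']
```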
flags

- ALL
  Enables all optional operators.
- COMPLEMENT
  Enables the ~ operator, which negates the shortest following pattern. Example:
  a~bc # matches 'adc' and 'aec' but not 'abc'
- INTERVAL
  Enables the <> operator, which matches numeric ranges. Examples:
  foo<1-100> # matches 'foo1', 'foo2' ... 'foo99', 'foo100'
  foo<01-100> # matches 'foo01', 'foo02' ... 'foo99', 'foo100'
- INTERSECTION
  Enables the & operator, which acts as AND: the match succeeds only if both the left and right patterns match. Example:
  aaa.+&.+bbb # matches 'aaabbb'
- ANYSTRING
  Enables the @ operator, which matches any whole string. You can combine @ with the & and ~ operators to build "everything except" logic. Example:
  @&~(abc.+) # matches everything except terms beginning with 'abc'
Fuzzy query: fuzzy
Fuzzy matching tolerates typos such as: a swapped character (box → fox), a missing character (black → lack),
an extra character (sic → sick), and transposed characters (act → cat).
Syntax:
```json
GET <index>/_search
{
  "query": {
    "fuzzy": {
      "<field>": {
        "value": "<keyword>"
      }
    }
  }
}
```
Parameters:

- value: (required) the term to match
- fuzziness: the edit distance (0, 1, 2). Bigger is not better: recall goes up but the results get less accurate.
  1) The Damerau-Levenshtein distance between two pieces of text is the number of insertions, deletions, substitutions, and transpositions needed to make one string match the other.
  2) Distance formulas: Levenshtein is Lucene's classic measure; ES uses the improved variant, Damerau-Levenshtein.
     axe => aex: Levenshtein = 2, Damerau-Levenshtein = 1
- transpositions: (optional, boolean) whether edits include transpositions of two adjacent characters (ab → ba). Defaults to true.
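The axe => aex example can be verified in a few lines of Python. The `damerau_levenshtein` below is the optimal-string-alignment variant (adjacent transposition counts as one edit), written purely for illustration:

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def damerau_levenshtein(a, b):
    """Optimal-string-alignment variant: also allows adjacent transpositions."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

print(levenshtein("axe", "aex"))          # 2 (two substitutions)
print(damerau_levenshtein("axe", "aex"))  # 1 (one transposition)
```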
```json
# fuzzy: fuzzy query
GET product_en/_search

GET product_en/_search
{
  "query": {
    "fuzzy": {
      "desc": {
        "value": "quangongneng nfc",
        "fuzziness": "2"
      }
    }
  }
}

GET product_en/_search
{
  "query": {
    "match": {
      "desc": {
        "query": "nfe quasdasdasdasd",
        "fuzziness": 1
      }
    }
  }
}
```
Phrase prefix: match_phrase_prefix

match_phrase:

- match_phrase analyzes the query text
- the searched field must contain every term of the phrase, in the same order
- no other terms may appear between the phrase's terms in the field

Concept:

match_phrase_prefix works like match_phrase, with one extra feature: it allows prefix matching on the last term of the text. If the input is a single word, say a, it matches every document whose field contains a term starting with a. If it is a phrase, say "this is ma", it first runs a prefix search on ma in the inverted index, then runs a match_phrase query over the matching documents. (Some posts online claim match_phrase runs first and the prefix search second — that is incorrect.)
Parameters:

- analyzer: which analyzer tokenizes the phrase
- max_expansions: caps how many terms the final prefix may expand to
- boost: the weight of this query
- slop: the allowed gap between terms: slop tells match_phrase how far apart terms may be while the document still counts as a match. What does "how far apart" mean? It means: how many times must you move a term for the query to match the document?

In-depth analysis: https://www.elastic.co/cn/blog/found-fuzzy-search#performance-considerations
```json
#####################################
# match_phrase_prefix
GET product_en/_search
{
  "query": {
    "match_phrase": {
      "desc": "shouji zhong de"
    }
  }
}

GET product_en/_search
{
  "query": {
    "match_phrase_prefix": {
      "desc": {
        "query": "de zhong shouji hongzhaji",
        "max_expansions": 50,
        "slop": 3
      }
    }
  }
}

GET product_en/_search
{
  "query": {
    "match_phrase_prefix": {
      "desc": {
        "query": "zhong hongzhaji",
        "max_expansions": 50,
        "slop": 3
      }
    }
  }
}

# source: zhong de hongzhaji
# query:  zhong > hongzhaji

# source: shouji zhong de hongzhaji
# query:  de zhong shouji hongzhaji
# de shouji/zhong hongzhaji   move 1
# shouji/de zhong hongzhaji   move 2
# shouji zhong/de hongzhaji   move 3
# shouji zhong de hongzhaji   move 4
```
N-gram (infix and suffix) and edge n-gram (prefix)
tokenizer
```json
GET _analyze
{
  "tokenizer": "ngram",
  "text": "reba always loves me"
}
```
token filter
```json
GET _analyze
{
  "tokenizer": "ik_max_word",
  "filter": [ "ngram" ],
  "text": "reba always loves me"
}
```
min_gram: the minimum gram length used when building the index
max_gram: the maximum gram length used when building the index
ngram: emits grams starting at every character position, stepping through the token; suits prefix and infix search (resource-hungry with poor performance, rarely used)
edge_ngram: emits grams anchored at the first character only; suits prefix matching (widely used; performs better than match_phrase_prefix)
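A quick sketch of what the two filters emit for a single token (the parameter names mirror min_gram/max_gram; this is an illustration of the idea, not Lucene's implementation):

```python
def ngrams(token, min_gram, max_gram):
    """ngram: grams starting at EVERY position -> supports prefix + infix + suffix search."""
    return [token[i:i + n]
            for i in range(len(token))
            for n in range(min_gram, max_gram + 1)
            if i + n <= len(token)]

def edge_ngrams(token, min_gram, max_gram):
    """edge_ngram: grams anchored at the FIRST character -> prefix search only."""
    return [token[:n] for n in range(min_gram, max_gram + 1) if n <= len(token)]

print(edge_ngrams("reba", 1, 2))  # ['r', 're']
print(edge_ngrams("reba", 2, 3))  # ['re', 'reb']
print(ngrams("reba", 2, 2))       # ['re', 'eb', 'ba']
```

Note how ngram produces far more terms per token than edge_ngram — that is where its index-size and performance cost comes from.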
```json
#############################################
# ngram and edge-ngram

# ngram min_gram=1 max_gram=2
GET _analyze
{
  "tokenizer": "ik_max_word",
  "filter": [ "edge_ngram" ],
  "text": "reba always loves me"
}

# min_gram=1 max_gram=1
# r a l m
# min_gram=1 max_gram=2
# r a l m
# re al lo me
# min_gram=2 max_gram=3
# re al lo me
# reb alw lov me

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "2_3_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      },
      "analyzer": {
        "my_edge_ngram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "2_3_edge_ngram" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_edge_ngram",
        "search_analyzer": "standard"
      }
    }
  }
}

GET /my_index/_mapping

POST /my_index/_bulk
{ "index": { "_id": "1"} }
{ "text": "my english" }
{ "index": { "_id": "2"} }
{ "text": "my english is good" }
{ "index": { "_id": "3"} }
{ "text": "my chinese is good" }
{ "index": { "_id": "4"} }
{ "text": "my japanese is nice" }
{ "index": { "_id": "5"} }
{ "text": "my disk is full" }

GET /my_index/_search
GET /my_index/_mapping

GET /my_index/_search
{
  "query": {
    "match_phrase": {
      "text": "my eng is goo"
    }
  }
}

PUT my_index2
{
  "settings": {
    "analysis": {
      "filter": {
        "2_3_grams": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      },
      "analyzer": {
        "my_edge_ngram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "2_3_grams" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_edge_ngram",
        "search_analyzer": "standard"
      }
    }
  }
}

GET /my_index2/_mapping

POST /my_index2/_bulk
{ "index": { "_id": "1"} }
{ "text": "my english" }
{ "index": { "_id": "2"} }
{ "text": "my english is good" }
{ "index": { "_id": "3"} }
{ "text": "my chinese is good" }
{ "index": { "_id": "4"} }
{ "text": "my japanese is nice" }
{ "index": { "_id": "5"} }
{ "text": "my disk is full" }

GET /my_index2/_search
{
  "query": {
    "match_phrase": {
      "text": "my eng is goo"
    }
  }
}

GET _analyze
{
  "tokenizer": "ik_max_word",
  "filter": [ "ngram" ],
  "text": "用心做皮肤,用脚做游戏"
}
```
## Search suggestions: Suggest
Overview
Search boxes are generally expected to offer "search suggestions" (also called "search completion"): auto-completing or correcting the query while the user types, to improve match precision and the overall search experience. That is what Suggest provides.
The four suggesters

- term suggester: as the name implies, it matches suggestion candidates against the individual terms produced by the tokenizer, without considering relationships between terms.
```json
POST <index>/_search
{
  "suggest": {
    "<suggest_name>": {
      "text": "<search_content>",
      "term": {
        "suggest_mode": "<suggest_mode>",
        "field": "<field_name>"
      }
    }
  }
}
```
  Options:
  - text: the text the user searched for
  - field: the field to draw suggestion candidates from
  - analyzer: which analyzer to use
  - size: the maximum number of suggestions returned per term
  - sort: how suggested terms are sorted; only two values are allowed:
    - score: score > term frequency > the term itself
    - frequency: term frequency > score > the term itself
  - suggest_mode: the suggestion mode, also an enum:
    - missing: the default; only generate suggestions for terms that are not in the index
    - popular: only return suggestions whose document frequency is higher than that of the searched term
    - always: suggest any matching candidate for every term in the suggest text
  - max_edits: the maximum edit distance a candidate may have to count as a suggestion. Only values between 1 and 2 are allowed; any other value triggers a bad request error. Defaults to 2.
  - prefix_length: the minimum number of leading characters that must match for a candidate
  - min_word_length: the minimum length a suggested word must have
  - min_doc_freq: the minimum document frequency a candidate must have
  - max_term_freq: the maximum term frequency of the input term
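Conceptually, the term suggester's candidate selection can be sketched as follows: for each input term, keep index terms within max_edits whose leading characters match, honoring suggest_mode. This is a simplified model for illustration (the function names are my own, and plain Levenshtein stands in for the real scorer):

```python
def edit_distance(a, b):
    """Row-by-row Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def suggest(term, index_terms, max_edits=2, prefix_length=1, mode="missing"):
    """Pick candidate corrections for one term from the index vocabulary."""
    if mode == "missing" and term in index_terms:
        return []  # the term exists in the index -> nothing to correct
    return sorted(
        t for t in index_terms
        if t != term
        and t[:prefix_length] == term[:prefix_length]  # prefix_length must match
        and edit_distance(term, t) <= max_edits)       # within max_edits

index_terms = {"baoqiang", "baoqiangge", "beautiful", "birth"}
print(suggest("baoqing", index_terms))  # ['baoqiang']
```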
- phrase suggester: compared with the term suggester, the phrase suggester takes the surrounding context into account — the other tokens of the sentence, not just token-by-token distance matching — so it can pick better suggestions based on co-occurrence and frequency.
  Options:
  - real_word_error_likelihood: defaults to 0.95, which tells Elasticsearch that 5% of the terms in the index are misspelled. The lower the value, the more terms that do exist in the index will be treated as misspellings, even when they are correct.
  - max_errors: the maximum percentage of terms that may be considered misspellings in order to form a correction. Defaults to 1.
  - confidence: defaults to 1.0, which is also the maximum. Acts as a threshold on suggestion scores: only suggestions scoring above it are shown. For example, a confidence of 1.0 only returns suggestions that score higher than the input phrase itself.
  - collate: tells Elasticsearch to check each suggestion against the specified query and prune suggestions for which no matching documents exist in the index — in this case, a match query. Since it is a template query, the search query is the current suggestion, available under params inside the query; more fields can be added to the "params" object. When the "prune" parameter is set to true, the response gains an extra field, "collate_match", indicating whether any documents matched all corrected keywords of the suggestion.
  - direct_generator: the phrase suggester uses candidate generators to produce a list of possible terms for each term in the given text. A single candidate generator is like calling the term suggester for each individual term in the text; the generators' output is then scored in combination with the other suggestion candidates. Currently only one generator type, direct_generator, is supported. The suggest API accepts a list of generators under the key direct_generator, and each generator in the list is invoked once per term of the original text.
- completion suggester: auto-completion supporting three query types — prefix, fuzzy, and regex — aimed squarely at the "Auto Completion" scenario. Every character the user types fires an immediate query to the backend to find matches, so with fast typists the latency requirements on the backend are harsh. It therefore uses a different data structure than the previous two suggesters: instead of an inverted index, the analyzed data is encoded as an FST and stored alongside the index. For an open index, ES loads the entire FST into memory, making prefix lookups extremely fast. But an FST supports only prefix lookup, and that is the Completion Suggester's limitation.
  - completion: an ES-specific field type built for suggest; memory-resident and very fast.
  - prefix query: prefix-based suggestions, the most commonly used suggest query.
    - prefix: the user's search input
    - field: the field holding the suggestions
    - size: the number of suggestions to return (default 5)
    - skip_duplicates: whether to filter out duplicate suggestions; defaults to false
  - fuzzy query
    - fuzziness: the allowed edit distance, defaults to auto
    - transpositions: if true, a transposition counts as one change instead of two; defaults to true
    - min_length: the minimum input length before fuzzy suggestions are returned; defaults to 3
    - prefix_length: the length of the input prefix that is not checked for fuzzy alternatives; defaults to 1
    - unicode_aware: if true, all measures (fuzzy edit distance, transpositions, lengths) are in Unicode code points rather than bytes. Slightly slower than raw bytes, so it defaults to false.
  - regex query: lets a regular expression stand in for the prefix; not recommended.
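The FST itself is beyond a short example, but a sorted term list plus binary search captures the same "in-memory prefix lookup" idea — a common analogy, not how Lucene actually stores completions:

```python
import bisect

# Suggestion inputs kept sorted in memory, analogous to the FST's ordered keys.
terms = sorted(["宝马X5", "宝马3系", "宝马5系", "奥迪A6", "奥迪Q5", "奔驰C260"])

def complete(prefix, size=5):
    """Jump to the first term >= prefix, then scan while the prefix holds."""
    i = bisect.bisect_left(terms, prefix)
    out = []
    while i < len(terms) and terms[i].startswith(prefix) and len(out) < size:
        out.append(terms[i])
        i += 1
    return out

print(complete("宝马"))  # ['宝马3系', '宝马5系', '宝马X5']
```

Because the lookup is a jump plus a short scan over contiguous keys, it stays fast at every keystroke; but just like the FST, it can answer nothing except prefix queries.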
- context suggester: the completion suggester considers every document in the index, but in practice smart suggestions usually need to be filtered by some condition, possibly with boosted weights for certain traits.
  - contexts: context objects; several can be defined
    - name: the context's name, used to distinguish different context objects within one index; it must be specified at query time
    - type: the context object's type; currently two are supported, category and geo, used to classify suggest items and to attach geo locations respectively
    - boost: a weight used to lift the ranking
    - path: without a path, each document you index must carry the context values explicitly (e.g. a context.name field); if the mapping specifies a path, the values are read from that field instead, so they need not be repeated at index time. Since the mapping is defined once while indexing happens constantly, this simplifies the indexing code.
```json
# term suggester
DELETE news
POST _bulk
{ "index" : { "_index" : "news","_id":1 } }
{ "title": "baoqiang bought a new hat with the same color of this font, which is very beautiful baoqiangba baoqiangda baoqiangdada baoqian baoqia"}
{ "index" : { "_index" : "news","_id":2 } }
{ "title": "baoqiangge gave birth to two children, one is upstairs, one is downstairs baoqiangba baoqiangda baoqiangdada baoqian baoqia"}
{ "index" : { "_index" : "news","_id":3} }
{ "title": "baoqiangge 's money was rolled away baoqiangba baoqiangda baoqiangdada baoqian baoqia"}
{ "index" : { "_index" : "news","_id":4} }
{ "title": "baoqiangda baoqiangda baoqiangda baoqiangda baoqiangda baoqian baoqia"}

GET news/_mapping

POST _analyze
{
  "text": [
    "BaoQiang bought a new hat with the same color of this font, which is very beautiful",
    "BaoQiangGe gave birth to two children, one is upstairs, one is downstairs",
    "BaoQiangGe 's money was rolled away"
  ]
}

POST /news/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "baoqing baoqiang",
      "term": {
        "suggest_mode": "always",
        "field": "title",
        "min_doc_freq": 3
      }
    }
  }
}

GET /news/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "baoqing baoqiang",
      "term": {
        "suggest_mode": "popular",
        "field": "title"
      }
    }
  }
}

GET /news/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "baoqing baoqiang",
      "term": {
        "suggest_mode": "popular",
        "field": "title",
        "max_edits": 2,
        "max_term_freq": 1
      }
    }
  }
}

GET /news/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "baoqing baoqiang",
      "term": {
        "suggest_mode": "always",
        "field": "title",
        "max_edits": 2
      }
    }
  }
}

DELETE news2
POST _bulk
{ "index" : { "_index" : "news2","_id":1 } }
{ "title": "baoqiang4"}
{ "index" : { "_index" : "news2","_id":2 } }
{ "title": "baoqiang4 baoqiang3"}
{ "index" : { "_index" : "news2","_id":3 } }
{ "title": "baoqiang4 baoqiang3 baoqiang2"}
{ "index" : { "_index" : "news2","_id":4 } }
{ "title": "baoqiang4 baoqiang3 baoqiang2 baoqiang"}

POST /news2/_search
{
  "suggest": {
    "second-suggestion": {
      "text": "baoqian baoqiang baoqiang2 baoqiang3",
      "term": {
        "suggest_mode": "popular",
        "field": "title"
      }
    }
  }
}

# phrase suggester
DELETE test
PUT test
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "analysis": {
        "analyzer": {
          "trigram": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [ "lowercase", "shingle" ]
          }
        },
        "filter": {
          "shingle": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 3
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "trigram": {
            "type": "text",
            "analyzer": "trigram"
          }
        }
      }
    }
  }
}

GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "shingle",
      "min_shingle_size": 2,
      "max_shingle_size": 3
    }
  ],
  "text": "lucene and elasticsearch"
}

# "min_shingle_size": 2,
# "max_shingle_size": 3
GET test/_analyze
{
  "analyzer": "trigram",
  "text": "lucene and elasticsearch"
}

DELETE test
POST test/_bulk
{ "index" : { "_id":1} }
{"title": "lucene and elasticsearch"}
{ "index" : {"_id":2} }
{"title": "lucene and elasticsearhc"}
{ "index" : { "_id":3} }
{"title": "luceen and elasticsearch"}

POST test/_search
GET test/_mapping

POST test/_search
{
  "suggest": {
    "text": "Luceen and elasticsearhc",
    "simple_phrase": {
      "phrase": {
        "field": "title.trigram",
        "max_errors": 2,
        "gram_size": 1,
        "confidence": 0,
        "direct_generator": [
          {
            "field": "title.trigram",
            "suggest_mode": "always"
          }
        ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}

# completion suggester
DELETE suggest_carinfo
PUT suggest_carinfo
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "fields": {
          "suggest": {
            "type": "completion",
            "analyzer": "ik_max_word"
          }
        }
      },
      "content": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

POST _bulk
{"index":{"_index":"suggest_carinfo","_id":1}}
{"title":"宝马X5 两万公里准新车","content":"这里是宝马X5图文描述"}
{"index":{"_index":"suggest_carinfo","_id":2}}
{"title":"宝马5系","content":"这里是奥迪A6图文描述"}
{"index":{"_index":"suggest_carinfo","_id":3}}
{"title":"宝马3系","content":"这里是奔驰图文描述"}
{"index":{"_index":"suggest_carinfo","_id":4}}
{"title":"奥迪Q5 两万公里准新车","content":"这里是宝马X5图文描述"}
{"index":{"_index":"suggest_carinfo","_id":5}}
{"title":"奥迪A6 无敌车况","content":"这里是奥迪A6图文描述"}
{"index":{"_index":"suggest_carinfo","_id":6}}
{"title":"奥迪双钻","content":"这里是奔驰图文描述"}
{"index":{"_index":"suggest_carinfo","_id":7}}
{"title":"奔驰AMG 两万公里准新车","content":"这里是宝马X5图文描述"}
{"index":{"_index":"suggest_carinfo","_id":8}}
{"title":"奔驰大G 无敌车况","content":"这里是奥迪A6图文描述"}
{"index":{"_index":"suggest_carinfo","_id":9}}
{"title":"奔驰C260","content":"这里是奔驰图文描述"}
{"index":{"_index":"suggest_carinfo","_id":10}}
{"title":"nir奔驰C260","content":"这里是奔驰图文描述"}

GET suggest_carinfo/_search?pretty
{
  "suggest": {
    "car_suggest": {
      "prefix": "奥迪",
      "completion": {
        "field": "title.suggest"
      }
    }
  }
}

# 1: heavy memory cost -- the high performance is bought with lots of RAM
# 2: prefix-only search; if the user's input is not a prefix, recall may be very low
POST suggest_carinfo/_search
{
  "suggest": {
    "car_suggest": {
      "prefix": "宝马5系",
      "completion": {
        "field": "title.suggest",
        "skip_duplicates": true,
        "fuzzy": {
          "fuzziness": 2
        }
      }
    }
  }
}

GET suggest_carinfo/_doc/10

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": ["奔驰AMG 两万公里准新车"]
}

POST suggest_carinfo/_search
{
  "suggest": {
    "car_suggest": {
      "regex": "nir",
      "completion": {
        "field": "title.suggest",
        "size": 10
      }
    }
  }
}

# context suggester
# Define a category context named place_type: the categories must be sent along with the suggestions.
# Define a geo context named location: the locations must be sent along with the suggestions.
DELETE place
PUT place
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "contexts": [
          {
            "name": "place_type",
            "type": "category"
          },
          {
            "name": "location",
            "type": "geo",
            "precision": 4
          }
        ]
      }
    }
  }
}

PUT place/_doc/1
{
  "suggest": {
    "input": [ "timmy's", "starbucks", "dunkin donuts" ],
    "contexts": {
      "place_type": [ "cafe", "food" ]
    }
  }
}

PUT place/_doc/2
{
  "suggest": {
    "input": [ "monkey", "timmy's", "Lamborghini" ],
    "contexts": {
      "place_type": [ "money" ]
    }
  }
}

GET place/_search

POST place/_search?pretty
{
  "suggest": {
    "place_suggestion": {
      "prefix": "sta",
      "completion": {
        "field": "suggest",
        "size": 10,
        "contexts": {
          "place_type": [ "cafe", "restaurants" ]
        }
      }
    }
  }
}

# Suggestions in some categories can be boosted higher than others.
# The following filters suggestions by category and additionally boosts those tied to certain categories.
GET place/_search
POST place/_search?pretty
{
  "suggest": {
    "place_suggestion": {
      "prefix": "tim",
      "completion": {
        "field": "suggest",
        "contexts": {
          "place_type": [
            { "context": "cafe" },
            { "context": "money", "boost": 2 }
          ]
        }
      }
    }
  }
}

# geo filter
PUT place/_doc/3
{
  "suggest": {
    "input": "timmy's",
    "contexts": {
      "location": [
        { "lat": 43.6624803, "lon": -79.3863353 },
        { "lat": 43.6624718, "lon": -79.3873227 }
      ]
    }
  }
}

POST place/_search
{
  "suggest": {
    "place_suggestion": {
      "prefix": "tim",
      "completion": {
        "field": "suggest",
        "contexts": {
          "location": {
            "lat": 43.662,
            "lon": -79.380
          }
        }
      }
    }
  }
}

# Define a category context named place_type whose categories are read from the cat field.
# Define a geo context named location whose locations are read from the loc field.
DELETE place_path_category
PUT place_path_category
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "contexts": [
          {
            "name": "place_type",
            "type": "category",
            "path": "cat"
          },
          {
            "name": "location",
            "type": "geo",
            "precision": 4,
            "path": "loc"
          }
        ]
      },
      "loc": {
        "type": "geo_point"
      }
    }
  }
}

# If the mapping has a path, the following index request is enough to add the categories.
# These suggestions will be associated with the cafe and food categories.
# If the context mapping references another field and the categories are also indexed explicitly,
# the suggestions are indexed with both sets of categories.
PUT place_path_category/_doc/1
{
  "suggest": ["timmy's", "starbucks", "dunkin donuts"],
  "cat": ["cafe", "food"]
}

POST place_path_category/_search?pretty
{
  "suggest": {
    "place_suggestion": {
      "prefix": "tim",
      "completion": {
        "field": "suggest",
        "contexts": {
          "place_type": [
            { "context": "cafe" }
          ]
        }
      }
    }
  }
}
```
This article is from cnblogs (博客园), author: 孙龙-程序员. Please credit the original link when reposting: https://www.cnblogs.com/sunlong88/p/17399606.html