Elasticsearch: Search in Depth

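Term-based and full-text search

Term-based queries (for performance, they are usually run in a filter context so that no score is computed)

A term query example

Data preparation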

DELETE products
PUT products
{
  "settings": {
    "number_of_shards": 1
  }
}


POST /products/_bulk
{ "index": { "_id": 1 }}
{ "productID" : "XHDK-A-1293-#fJ3","desc":"iPhone" }
{ "index": { "_id": 2 }}
{ "productID" : "KDKE-B-9947-#kL5","desc":"iPad" }
{ "index": { "_id": 3 }}
{ "productID" : "JODL-X-1937-#pV7","desc":"MBP" }

GET /products

Run the query once for each value, uncommenting one line at a time

POST /products/_search
{
  "query": {
    "term": {
      "desc": {
        //"value": "iPhone"
        "value":"iphone"
      }
    }
  }
}

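The data was indexed with capital letters, yet a term query for "iPhone" finds nothing. The standard analyzer lowercases tokens at index time, and a term query does not analyze its input, so only "iphone" matches.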
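The behavior can be modeled with a toy inverted index. This is only a sketch: `analyze` is a simplified stand-in that splits on non-alphanumeric characters and lowercases (the real standard analyzer does more), and `build_index` and `term_query` are illustrative names, not Elasticsearch APIs.

```python
import re
from collections import defaultdict

def analyze(text):
    """Simplified stand-in for the standard analyzer: split and lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

def build_index(docs):
    """Map each analyzed token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

def term_query(index, value):
    """A term query looks the value up as-is, without analyzing it."""
    return sorted(index.get(value, set()))

docs = {1: "iPhone", 2: "iPad", 3: "MBP"}
index = build_index(docs)
print(term_query(index, "iPhone"))  # [] because only "iphone" was indexed
print(term_query(index, "iphone"))  # [1]
```

A match query, by contrast, would run the search text through the same analyzer first, which is why it finds the document with either spelling.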

Multi-field mapping and term queries

Use the keyword sub-field for exact matching

POST /products/_search
{
  "explain": true,
  "query": {
    "term": {
      "productID.keyword": {
        "value": "XHDK-A-1293-#fJ3"
      }
    }
  }
}

The explain output shows that a score is computed by default.

Converting the query to a filter with the compound constant_score query

POST /products/_search
{
  "explain": true,
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "productID.keyword": "XHDK-A-1293-#fJ3"
        }
      }
    }
  }
}

Full-text queries

Structured search

Structured data

Structured search in Elasticsearch

Data preparation

# Structured search, exact matching
DELETE products
POST /products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10,"avaliable":true,"date":"2018-01-01", "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20,"avaliable":true,"date":"2019-01-01", "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30,"avaliable":true, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30,"avaliable":false, "productID" : "QQPX-R-3956-#aD8" }

GET products/_mapping



Boolean values

# term query on a boolean field; scored
POST products/_search
{
  "profile": "true",
  "explain": true,
  "query": {
    "term": {
      "avaliable": true
    }
  }
}

# Boolean field, wrapped in constant_score so it runs as a filter; not scored
POST products/_search
{
  "profile": "true",
  "explain": true,
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "avaliable": true
        }
      }
    }
  }
}

Numbers

# Numeric field
POST products/_search
{
  "profile": "true",
  "explain": true,
  "query": {
    "term": {
      "price": 30
    }
  }
}

# Numeric field, matching any value in a set
POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "price": [
            "10",
            "30"
          ]
        }
      }
    }
  }
}

Numeric range

GET products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 20,
            "lte": 30
          }
        }
      }
    }
  }
}

Date range

Find documents dated no earlier than three years ago

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "date": {
            "gte": "now-3y"
          }
        }
      }
    }
  }
}

Handling missing values: some documents have no date field, so use an exists query

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "exists": {
          "field": "date"
        }
      }
    }
  }
}

Find documents that have no date field

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must_not": {
            "exists": {
              "field": "date"
            }
          }
        }
      }
    }
  }
}

Multi-valued fields: a term query tests containment, not equality

Relevance scoring

Term frequency (TF)

Inspecting TF with the explain API

Data preparation

PUT testscore
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      }
    }
  }
}

PUT testscore/_bulk
{ "index": { "_id": 1 }}
{ "content":"we use Elasticsearch to power the search" }
{ "index": { "_id": 2 }}
{ "content":"we like elasticsearch" }
{ "index": { "_id": 3 }}
{ "content":"The scoring of documents is caculated by the scoring formula" }
{ "index": { "_id": 4 }}
{ "content":"you know, for search" }

Query Elasticsearch

POST /testscore/_search
{
  //"explain": true,
  "query": {
    "match": {
      //"content":"you"
      "content": "elasticsearch"
      //"content":"the"
      //"content": "the elasticsearch"
    }
  }
}

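Searching for "elasticsearch" ranks document 2 ahead of document 1: the term occurs once in each, but document 2 is shorter, so it scores higher.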
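The length effect can be reproduced with the textbook BM25 formula, assuming the default parameters k1 = 1.2 and b = 0.75. This is a rough sketch: the token counts below were counted by hand from the sample documents, and the numbers will not match the explain output exactly because Lucene stores field lengths in a lossy form.

```python
import math

K1, B = 1.2, 0.75  # BM25 defaults

def bm25(tf, doc_len, avg_len, n_docs, doc_freq):
    """Textbook BM25 score of a single term in a single document."""
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Longer-than-average documents enlarge the denominator and lower the score.
    norm = tf * (K1 + 1) / (tf + K1 * (1 - B + B * doc_len / avg_len))
    return idf * norm

lengths = {1: 7, 2: 3, 3: 10, 4: 4}  # token counts of the four sample documents
avg_len = sum(lengths.values()) / len(lengths)

# "elasticsearch" occurs once in documents 1 and 2, so doc_freq is 2.
score_1 = bm25(tf=1, doc_len=lengths[1], avg_len=avg_len, n_docs=4, doc_freq=2)
score_2 = bm25(tf=1, doc_len=lengths[2], avg_len=avg_len, n_docs=4, doc_freq=2)
print(score_2 > score_1)  # True
```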

boosting relevance

POST testscore/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "content": "elasticsearch"
        }
      },
      "negative": {
        "term": {
          "content": "like"
        }
      },
      "negative_boost": 0.2
    }
  }
}

Query & Filtering and multi-string, multi-field queries

Combining conditions

The bool query

bool query syntax

POST /products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10,"avaliable":true,"date":"2018-01-01", "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20,"avaliable":true,"date":"2019-01-01", "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30,"avaliable":true, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30,"avaliable":false, "productID" : "QQPX-R-3956-#aD8" }

# Basic syntax
POST /products/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "price": "30"
        }
      },
      "filter": {
        "term": {
          "avaliable": "true"
        }
      },
      "must_not": {
        "range": {
          "price": {
            "lte": 10
          }
        }
      },
      "should": [
        {
          "term": {
            "productID.keyword": "JODL-X-1937-#pV7"
          }
        },
        {
          "term": {
            "productID.keyword": "XHDK-A-1293-#fJ3"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
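To see which documents the four clause types let through, here is a small sketch that evaluates the same conditions in plain Python. It models matching only, not scoring; `bool_match` and its parameter names are made up for illustration.

```python
products = {
    1: {"price": 10, "avaliable": True, "productID": "XHDK-A-1293-#fJ3"},
    2: {"price": 20, "avaliable": True, "productID": "KDKE-B-9947-#kL5"},
    3: {"price": 30, "avaliable": True, "productID": "JODL-X-1937-#pV7"},
    4: {"price": 30, "avaliable": False, "productID": "QQPX-R-3956-#aD8"},
}

def bool_match(doc, must=(), filter_=(), must_not=(), should=(), minimum_should_match=0):
    """Return True when the document satisfies the bool query's clauses."""
    required = all(clause(doc) for clause in (*must, *filter_))
    excluded = any(clause(doc) for clause in must_not)
    optional_hits = sum(1 for clause in should if clause(doc))
    return required and not excluded and optional_hits >= minimum_should_match

hits = [
    doc_id
    for doc_id, doc in products.items()
    if bool_match(
        doc,
        must=[lambda d: d["price"] == 30],
        filter_=[lambda d: d["avaliable"] is True],
        must_not=[lambda d: d["price"] <= 10],
        should=[
            lambda d: d["productID"] == "JODL-X-1937-#pV7",
            lambda d: d["productID"] == "XHDK-A-1293-#fJ3",
        ],
        minimum_should_match=1,
    )
]
print(hits)  # [3]
```

In Elasticsearch itself, must and should clauses also contribute to the score, while filter and must_not clauses only include or exclude documents.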

How to get equality rather than containment on structured data

Add a genre_count field and combine both conditions in a bool query

POST /newmovies/_bulk
{ "index": { "_id": 1 }}
{ "title" : "Father of the Bridge Part II","year":1995, "genre":"Comedy","genre_count":1 }
{ "index": { "_id": 2 }}
{ "title" : "Dave","year":1993,"genre":["Comedy","Romance"],"genre_count":2 }

# must: scored
POST /newmovies/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"genre.keyword": {"value": "Comedy"}}},
        {"term": {"genre_count": {"value": 1}}}

      ]
    }
  }
}

# filter: not scored; every hit has a score of 0
POST /newmovies/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"genre.keyword": {"value": "Comedy"}}},
        {"term": {"genre_count": {"value": 1}}}
        ]

    }
  }
}
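The difference between containment and equality can be sketched in a few lines of Python using the two newmovies documents. `term_matches` is a hypothetical helper that mimics how a term query treats a multi-valued field.

```python
def term_matches(field_value, term):
    """A term query matches when the term is one of the field's values."""
    values = field_value if isinstance(field_value, list) else [field_value]
    return term in values

movies = {
    1: {"genre": "Comedy", "genre_count": 1},
    2: {"genre": ["Comedy", "Romance"], "genre_count": 2},
}

# A term query alone: both movies contain "Comedy".
contains = [i for i, m in movies.items() if term_matches(m["genre"], "Comedy")]

# Adding the genre_count condition keeps only the movie whose sole genre is "Comedy".
only_comedy = [
    i for i, m in movies.items()
    if term_matches(m["genre"], "Comedy") and m["genre_count"] == 1
]
print(contains)     # [1, 2]
print(only_comedy)  # [1]
```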


#Filtering Context
POST _search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "avaliable": "true"
        }
      },
      "must_not": {
        "range": {
          "price": {
            "lte": 10
          }
        }
      }
    }
  }
}


#Query Context
POST /products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10,"avaliable":true,"date":"2018-01-01", "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20,"avaliable":true,"date":"2019-01-01", "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30,"avaliable":true, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30,"avaliable":false, "productID" : "QQPX-R-3956-#aD8" }

POST /products/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "productID.keyword": {
              "value": "JODL-X-1937-#pV7"
            }
          }
        },
        {
          "term": {
            "avaliable": {
              "value": true
            }
          }
        }
      ]
    }
  }
}

Nested bool queries

# Nesting a bool query implements "should not" logic
POST /products/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "price": "30"
        }
      },
      "should": [
        {
          "bool": {
            "must_not": {
              "term": {
                "avaliable": "false"
              }
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

Controlling per-field boosting

Here matches in title get a higher boost, so documents that match in title rank first.

DELETE blogs
POST /blogs/_bulk
{ "index": { "_id": 1 }}
{"title":"Apple iPad", "content":"Apple iPad,Apple iPad" }
{ "index": { "_id": 2 }}
{"title":"Apple iPad,Apple iPad", "content":"Apple iPad" }

POST blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {
          "title": {
            "query": "apple,ipad",
            "boost": 4
          }
        }},
        {"match": {
          "content": {
            "query": "apple,ipad",
            "boost":1
          }
        }}
      ]
    }
  }
}

Find documents containing "apple" (favoring recall: return as many as possible)

DELETE news
POST /news/_bulk
{ "index": { "_id": 1 }}
{ "content":"Apple Mac" }
{ "index": { "_id": 2 }}
{ "content":"Apple iPad" }
{ "index": { "_id": 3 }}
{ "content":"Apple employee like Apple Pie and Apple Juice" }


POST news/_search
{
  "query": {
    "bool": {
      "must": {
        "match":{"content":"apple"}
      }
    }
  }
}

Find documents containing "apple" but not "pie" (favoring precision)

POST news/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "content": "apple"
        }
      },
      "must_not": {
        "match": {
          "content": "pie"
        }
      }
    }
  }
}

Rank documents containing "apple" first and push those that also contain "pie" down

POST news/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "content": "apple"
        }
      },
      "negative": {
        "match": {
          "content": "pie"
        }
      },
      "negative_boost": 0.5
    }
  }
}
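The effect of negative_boost can be sketched as follows: a document keeps its positive score, multiplied by negative_boost when the negative query also matches. The positive scores here are hypothetical placeholders, not real explain output, and `boosting_score` and `rank` are illustrative names.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Keep the positive score, scaled down when the negative query also matches."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score

# Positive scores are hypothetical; document 3 mentions "apple" three times.
news = [
    {"id": 1, "content": "Apple Mac", "positive": 0.2},
    {"id": 2, "content": "Apple iPad", "positive": 0.2},
    {"id": 3, "content": "Apple employee like Apple Pie and Apple Juice", "positive": 0.3},
]

def rank(docs, negative_term, negative_boost):
    """Return document ids ordered by descending boosted score."""
    scored = []
    for doc in docs:
        matches = negative_term in doc["content"].lower().split()
        scored.append((boosting_score(doc["positive"], matches, negative_boost), doc["id"]))
    # Python's sort is stable, so documents with equal scores keep their input order.
    return [doc_id for score, doc_id in sorted(scored, key=lambda pair: -pair[0])]

print(rank(news, "pie", 0.5))  # [1, 2, 3]
print(rank(news, "pie", 1.0))  # [3, 1, 2]
```

Unlike must_not, the document about pie is still returned; it is only demoted.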

Single-string, multi-field queries: Dis Max Query

Single-string queries

A single-string query example

PUT /blogs/_doc/1
{
  "title": "Quick brown rabbits",
  "body": "Brown rabbits are commonly seen."
}

PUT /blogs/_doc/2
{
  "title": "Keeping pets healthy",
  "body": "My quick brown fox eats rabbits on a regular basis."
}

POST /blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "Brown fox"
          }
        },
        {
          "match": {
            "body": "Brown fox"
          }
        }
      ]
    }
  }
}

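Document 2 contains every search term ("brown fox" appears in its body), yet it scores lower than document 1. A bool should query adds the title score and the body score together, and document 1 matches "brown" in both fields.

How the score is computed

The Dis Max Query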

POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "Quick pets"
          }
        },
        {
          "match": {
            "body": "Quick pets"
          }
        }
      ]
    }
  }
}

Tuning best fields

Adjust with the tie_breaker parameter

POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "Quick pets"
          }
        },
        {
          "match": {
            "body": "Quick pets"
          }
        }
      ],
      "tie_breaker": 0.2
    }
  }
}
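The arithmetic behind the two approaches can be sketched as below. The per-field scores are hypothetical numbers chosen to mirror the "Brown fox" example, not real explain output: a bool should query sums the field scores, while dis_max takes the best field and adds tie_breaker times the scores of the other fields.

```python
def bool_should_score(field_scores):
    """bool/should: the scores of all matching clauses are added up."""
    return sum(field_scores)

def dis_max_score(field_scores, tie_breaker=0.0):
    """dis_max: best clause score plus tie_breaker times the remaining scores."""
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)

# Hypothetical (title, body) scores for the two blog documents.
doc1 = [0.6, 0.5]  # "brown" matches in both title and body
doc2 = [0.0, 0.9]  # "brown fox" matches only in body, but matches well

print(bool_should_score(doc1) > bool_should_score(doc2))    # True: doc 1 wins by summing
print(dis_max_score(doc2) > dis_max_score(doc1))            # True: doc 2 wins on best field
print(dis_max_score(doc1, 0.2) > dis_max_score(doc1, 0.0))  # True: other fields still count a little
```

With tie_breaker at 0 only the best field counts; at 1 dis_max behaves like a plain sum.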


Multi Match Query

POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "Quick pets"
          }
        },
        {
          "match": {
            "body": "Quick pets"
          }
        }
      ],
      "tie_breaker": 0.2
    }
  }
}

POST blogs/_search
{
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Quick pets",
      "fields": [
        "title",
        "body"
      ],
      "tie_breaker": 0.2,
      "minimum_should_match": "20%"
    }
  }
}

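A match query is split into terms and run as barking OR dogs. With the english analyzer both terms are stemmed (bark, dog), so both documents match, and the shorter document 1 scores higher even though document 2 contains the exact words.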

PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }


GET titles/_search
{
  "query": {
    "match": {
      "title": "barking dogs"
    }
  }
}

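Raising a field's weight to control the ranking of results (title.std is the standard-analyzed sub-field defined in the mapping further down)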

GET /titles/_search
{
   "query": {
        "multi_match": {
            "query":  "barking dogs",
            "type":   "most_fields",
            "fields": [ "title^10", "title.std" ]
        }
    }
}

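Solving it with most_fields matching

The english analyzer stems words so that it matches as many documents as possible. The standard analyzer does no stemming at all, so the title.std sub-field rewards exact word forms.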

DELETE /titles
PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "std": {
            "type": "text",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }

GET /titles/_search
{
   "query": {
        "multi_match": {
            "query":  "barking dogs",
            "type":   "most_fields",
            "fields": [ "title", "title.std" ]
        }
    }
}

Cross-field search

PUT address/_doc/1
{
  "street": "5 Poland Street",
  "city": "London",
  "country": "United Kingdom",
  "postcode": "W1V 3DG"
}

POST address/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "most_fields",
      "operator": "and",
      "fields": [
        "street",
        "city",
        "country",
        "postcode"
      ]
    }
  }
}

Using cross_fields

PUT address/_doc/1
{
  "street": "5 Poland Street",
  "city": "London",
  "country": "United Kingdom",
  "postcode": "W1V 3DG"
}


POST address/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "cross_fields",
      "operator": "and",
      "fields": [
        "street",
        "city",
        "country",
        "postcode"
      ]
    }
  }
}

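Multilingual text, Chinese word segmentation, and retrieval

Natural language and query recall

Challenges of mixing multiple languages

Challenges of word segmentation

The state of Chinese word segmentation

Some Chinese analyzers

HanLP analyzer

A natural language processing toolkit built for production use

Website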

https://www.hanlp.com/

Installation

./elasticsearch-plugin install https://github.com/KennFalcon/elasticsearch-analysis-hanlp/releases/download/v7.1.0/elasticsearch-analysis-hanlp-7.1.0.zip

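hanlp: HanLP default segmentation

hanlp_standard: standard segmentation

hanlp_index: index segmentation

hanlp_nlp: NLP segmentation

hanlp_n_short: N-shortest-path segmentation

hanlp_dijkstra: shortest-path segmentation

hanlp_crf: CRF segmentation (deprecated as of HanLP 1.6.6)

hanlp_speed: fast dictionary-based segmentation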

POST _analyze
{
  "analyzer": "hanlp_standard",
  "text": [
    "剑桥分析公司多位高管对卧底记者说，他们确保了唐纳德·特朗普在总统大选中获胜"
  ]
}

IK analyzer

Supports hot reloading of dictionaries

Installation

./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.1.0/elasticsearch-analysis-ik-7.1.0.zip

ik_max_word: finest-grained segmentation (emits every word it can find)

ik_smart: coarsest-grained segmentation

Pinyin analyzer

Installation

./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.1.0/elasticsearch-analysis-pinyin-7.1.0.zip

A simple example

PUT /artists/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "user_name_analyzer": {
          "tokenizer": "whitespace",
          "filter": "pinyin_first_letter_and_full_pinyin_filter"
        }
      },
      "filter": {
        "pinyin_first_letter_and_full_pinyin_filter": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_full_pinyin": false,
          "keep_none_chinese": true,
          "keep_original": false,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "trim_whitespace": true,
          "keep_none_chinese_in_first_letter": true
        }
      }
    }
  }
}

GET /artists/_analyze
{
  "text": [
    "刘德华 张学友 郭富城 黎明 四大天王"
  ],
  "analyzer": "user_name_analyzer"
}

Response

{
  "tokens" : [
    {
      "token" : "ldh",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "zxy",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "gfc",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "lm",
      "start_offset" : 12,
      "end_offset" : 14,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "sdtw",
      "start_offset" : 15,
      "end_offset" : 19,
      "type" : "word",
      "position" : 4
    }
  ]
}

A full-text search example

{
      "_source": ["title","overview"],
      "size":20,
      "query": {
          "multi_match": {
              "type": "most_fields",
              "query": "basketball with cartoon aliens",
              "fields": ["title","overview"]
          }
      },
      "highlight" : {
            "fields" : {
              "overview" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },
              "title" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] }
            }
        }
  }

Querying with Search Templates and Index Aliases

Search Template

The search engineer creates a search template

POST _scripts/tmdb
{
  "script": {
    "lang": "mustache",
    "source": {
      "_source": [
        "title",
        "overview"
      ],
      "size": 20,
      "query": {
        "multi_match": {
          "query": "{{q}}",
          "fields": [
            "title",
            "overview"
          ]
        }
      }
    }
  }
}

Developers query through the template (changes the search engineer makes to the template are none of the developer's concern)

GET _scripts/tmdb

POST tmdb/_search/template
{
    "id":"tmdb",
    "params": {
        "q": "basketball with cartoon aliens"
    }
}
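What the template call does can be pictured as placeholder substitution. The sketch below only mimics that step in Python; Elasticsearch renders Mustache on the server, and `TEMPLATE` and `render` are illustrative names rather than part of any client library.

```python
import json

# The stored template body as a string, with a Mustache-style placeholder.
TEMPLATE = (
    '{"_source": ["title", "overview"], "size": 20, '
    '"query": {"multi_match": {"query": "{{q}}", "fields": ["title", "overview"]}}}'
)

def render(template, params):
    """Substitute string parameters into the template and parse the result."""
    body = template
    for name, value in params.items():
        # json.dumps escapes quotes and backslashes; drop the quotes it wraps around the value.
        escaped = json.dumps(value)[1:-1]
        body = body.replace("{{" + name + "}}", escaped)
    return json.loads(body)

query = render(TEMPLATE, {"q": "basketball with cartoon aliens"})
print(query["query"]["multi_match"]["query"])  # basketball with cartoon aliens
```

The caller only supplies the parameter q, so the fields, size, and query type can change inside the template without touching the calling code.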

Index aliases for zero-downtime operations

Create an alias for an index, then read and write data through the alias

PUT movies-2019/_doc/1
{
  "name": "the matrix",
  "rating": 5
}

PUT movies-2019/_doc/2
{
  "name": "Speed",
  "rating": 3
}

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "movies-2019",
        "alias": "movies-latest"
      }
    }
  ]
}

POST movies-latest/_search
{
  "query": {
    "match_all": {}
  }
}

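The search returns both documents.

Adding an alias that already exists on the index overwrites its previous definition, and clients keep querying the same name, which is what makes zero-downtime changes possible. The alias below, movies-lastest-highrate, also carries a filter, so it only exposes documents rated 4 or higher.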

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "movies-2019",
        "alias": "movies-lastest-highrate",
        "filter": {
          "range": {
            "rating": {
              "gte": 4
            }
          }
        }
      }
    }
  ]
}

POST movies-lastest-highrate/_search
{
  "query": {
    "match_all": {}
  }
}

Combined ranking: tuning scores with Function Score Query

Scoring and sorting

field_value_factor (use the value of a given field in the score)

DELETE blogs
PUT /blogs/_doc/1
{
  "title": "About popularity",
  "content": "In this post we will talk about...",
  "votes": 0
}

PUT /blogs/_doc/2
{
  "title": "About popularity",
  "content": "In this post we will talk about...",
  "votes": 100
}

PUT /blogs/_doc/3
{
  "title": "About popularity",
  "content": "In this post we will talk about...",
  "votes": 1000000
}


POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [
            "title",
            "content"
          ]
        }
      },
      "field_value_factor": {
        "field": "votes"
      }
    }
  }
}

The response is as follows

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 133531.39,
    "hits" : [
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 133531.39,
        "_source" : {
          "title" : "About popularity",
          "content" : "In this post we will talk about...",
          "votes" : 1000000
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 13.353139,
        "_source" : {
          "title" : "About popularity",
          "content" : "In this post we will talk about...",
          "votes" : 100
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "title" : "About popularity",
          "content" : "In this post we will talk about...",
          "votes" : 0
        }
      }
    ]
  }
}

Smoothing the curve with modifier

The vote count swings the score far too much, so apply the log1p modifier

POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [
            "title",
            "content"
          ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p"
      }
    }
  }
}

Output

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.8011884,
    "hits" : [
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.8011884,
        "_source" : {
          "title" : "About popularity",
          "content" : "In this post we will talk about...",
          "votes" : 1000000
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.26763982,
        "_source" : {
          "title" : "About popularity",
          "content" : "In this post we will talk about...",
          "votes" : 100
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "title" : "About popularity",
          "content" : "In this post we will talk about...",
          "votes" : 0
        }
      }
    ]
  }
}
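The scores in both responses can be reproduced by hand. A minimal sketch, assuming the default boost_mode of multiply and that log1p means the base-10 logarithm of 1 plus the value; the base query score of 0.13353139 is inferred from the first response (13.353139 / 100), and the function name is made up for illustration.

```python
import math

def field_value_factor(query_score, value, factor=1.0, modifier="none"):
    """Multiply the query score by the (optionally smoothed) field value."""
    scaled = factor * value
    if modifier == "log1p":
        scaled = math.log10(1 + scaled)
    return query_score * scaled

base = 0.13353139  # query score implied by the unmodified response

for votes in (0, 100, 1000000):
    raw = field_value_factor(base, votes)
    smoothed = field_value_factor(base, votes, modifier="log1p")
    print(votes, raw, smoothed)
```

Without the modifier, a million votes outscores a hundred votes by a factor of 10,000. With log1p the gap shrinks to roughly a factor of 3.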

Introducing factor

POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [
            "title",
            "content"
          ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 0.1
      }
    }
  }
}

boost_mode and max_boost: capping the boost

POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [
            "title",
            "content"
          ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 0.1
      },
      "boost_mode": "sum",
      "max_boost": 3
    }
  }
}
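How factor, max_boost, and boost_mode interact can be sketched like this, assuming that max_boost caps the function's value before it is combined with the query score. The base score of 0.13353139 is the query score implied by the earlier responses, and `function_score` is an illustrative name.

```python
import math

def function_score(query_score, votes, factor, max_boost, boost_mode):
    """Combine the query score with a capped log1p(factor * votes) value."""
    func = min(math.log10(1 + factor * votes), max_boost)
    if boost_mode == "sum":
        return query_score + func
    if boost_mode == "multiply":
        return query_score * func
    raise ValueError("boost_mode not covered by this sketch: " + boost_mode)

base = 0.13353139
for votes in (0, 100, 1000000):
    print(votes, function_score(base, votes, factor=0.1, max_boost=3, boost_mode="sum"))
```

With one million votes the function value would be about 5, so the cap of 3 applies and the final score is the query score plus 3. With boost_mode set to sum, the document with zero votes keeps its query score instead of dropping to 0.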

Consistent random scoring

POST /blogs/_search
{
  "query": {
    "function_score": {
      "random_score": {
        "seed": 911119
      }
    }
  }
}

The order is 1 3 2

POST /blogs/_search
{
  "query": {
    "function_score": {
      "random_score": {
        "seed": 100
      }
    }
  }
}

The order is 1 3 2

Term & Phrase Suggester (suggestions)

What search suggestions are

Elasticsearch Suggester API

suggest_mode Missing Mode

DELETE articles

POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{  "body": "elasticsearch is rock solid"}


POST /articles/_search
{
  "size": 1,
  "query": {
    "match": {
      "body": "lucen rock"
    }
  },
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}


suggest_mode popular Mode

POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "popular",
        "field": "body"
      }
    }
  }
}

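Sorting by frequency & prefix length (number of leading characters that must match)

By default (prefix_length is 1) no suggestion is offered when the first letter is wrong. Setting prefix_length to 0 lifts that restriction.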

POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body",
        "prefix_length":0,
        "sort": "frequency"
      }
    }
  }
}
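The role of prefix_length can be sketched with a plain edit-distance search over the terms of the sample articles. This is a simplification: the real term suggester also weighs term frequency and has further options, and `edit_distance` and `suggest` are illustrative names.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def suggest(word, vocabulary, max_edits=2, prefix_length=1):
    """Return candidate terms within max_edits that share the required prefix."""
    candidates = []
    for term in vocabulary:
        if term == word or term[:prefix_length] != word[:prefix_length]:
            continue
        distance = edit_distance(word, term)
        if distance <= max_edits:
            candidates.append((distance, term))
    return [term for distance, term in sorted(candidates)]

vocabulary = [
    "lucene", "is", "very", "cool", "elasticsearch", "builds", "on", "top", "of",
    "rocks", "rock", "elastic", "the", "company", "behind", "elk", "stack", "solid",
]

print(suggest("lucen", vocabulary))                   # ['lucene']
print(suggest("hocks", vocabulary))                   # [] because no term starts with "h"
print(suggest("hocks", vocabulary, prefix_length=0))  # ['rocks', 'rock']
```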

Phrase suggester

POST /articles/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world ",
      "phrase": {
        "field": "body",
        "max_errors": 2,
        "confidence": 1,
        "direct_generator": [
          {
            "field": "body",
            "suggest_mode": "always"
          }
        ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}

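With confidence raised to 2, only one suggestion is returned.

Autocomplete and context-aware suggestions

Autocomplete

Steps for using the completion suggester

DELETE articles
# Define the mapping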
PUT articles
{
  "mappings": {
    "properties": {
      "title_completion":{
        "type": "completion"
      }
    }
  }
}
# Index the data
POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}

# Run the suggester
POST articles/_search?pretty
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "elk ",
      "completion": {
        "field": "title_completion"
      }
    }
  }
}

Context-aware suggestions

Implementing context-aware suggestions

Define the mapping

DELETE comments
PUT comments
PUT comments/_mapping
{
  "properties": {
    "comment_autocomplete":{
      "type": "completion",
      "contexts":[{
        "type":"category",
        "name":"comment_category"
      }]
    }
  }
}

Index the data

POST comments/_doc
{
  "comment": "I love the star war movies",
  "comment_autocomplete": {
    "input": [
      "star wars"
    ],
    "contexts": {
      "comment_category": "movies"
    }
  }
}

POST comments/_doc
{
  "comment": "Where can I find a Starbucks",
  "comment_autocomplete": {
    "input": [
      "starbucks"
    ],
    "contexts": {
      "comment_category": "coffee"
    }
  }
}

Suggestions differ by context

POST comments/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "prefix": "sta",
      "completion": {
        "field": "comment_autocomplete",
        "contexts": {
          "comment_category": "coffee"
        }
      }
    }
  }
}

Cross-cluster search

Pain points of horizontal scaling

Configuration and querying

Apply this configuration on every cluster

PUT /_cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster0": {
          "seeds": [
            "127.0.0.1:9300"
          ],
          "transport.ping_schedule": "30s"
        },
        "cluster1": {
          "seeds": [
            "127.0.0.1:9301"
          ],
          "transport.compress": true,
          "skip_unavailable": true
        },
        "cluster2": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}

Insert data

# Create test data
curl -XPOST "http://localhost:9200/users/_doc" -H 'Content-Type: application/json' -d'
{"name":"user1","age":10}'

curl -XPOST "http://localhost:9201/users/_doc" -H 'Content-Type: application/json' -d'
{"name":"user2","age":20}'

curl -XPOST "http://localhost:9202/users/_doc" -H 'Content-Type: application/json' -d'
{"name":"user3","age":30}'

Query

GET /users,cluster1:users,cluster2:users/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 20,
        "lte": 40
      }
    }
  }
}

Posted 2021-03-30 13:03 by Crazymagic
