Elasticsearch 5.x: Query Suggestions, Suggesters, and a Java API Implementation




Reference: http://www.cnblogs.com/leeSmall/p/9206646.html

Reference (recommended reading): https://elasticsearch.cn/article/142

Reference (official documentation): https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html


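I. Introduction to query suggestions

1. What are query suggestions?

Query suggestions improve the user's search experience. They mainly cover spell checking and automatic suggestion of query terms (autocompletion).

Spell checking: (image omitted)

Automatic suggestion of query terms (autocompletion): (image omitted)

2. The query suggestion API in ES

Query suggestions also go through the _search endpoint. In the DSL, the suggest node defines the suggestions you want.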

Example 1: defining a single suggestion. Under suggest, my-suggestion is the name given to the suggestion, text is the text to get suggestions for, term selects the term suggester, and field is the field the suggestions are drawn from.
POST twitter/_search
{
  "query" : {
    "match": {
      "message": "tring out Elasticsearch"
    }
  },
  "suggest" : {
    "my-suggestion" : {
      "text" : "tring out Elasticsearch",
      "term" : {
        "field" : "message"
      }
    }
  }
}


  

Example 2: defining multiple suggestions
POST _search
{
  "suggest": {
    "my-suggest-1" : {
      "text" : "tring out Elasticsearch",
      "term" : {
        "field" : "message"
      }
    },
    "my-suggest-2" : {
      "text" : "kmichy",
      "term" : {
        "field" : "user"
      }
    }
  }
}

 

Example 3: multiple suggestions can share one global suggest text
POST _search
{
  "suggest": {
    "text" : "tring out Elasticsearch",
    "my-suggest-1" : {
      "term" : {
        "field" : "message"
      }
    },
    "my-suggest-2" : {
       "term" : {
        "field" : "user"
       }
    }
  }
}

 

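II. Introduction to suggesters

1. Term suggester

The term suggester analyzes the input text into tokens and proposes suggestions for each token using fuzzy matching. By default, a token that already exists in the index gets no suggestions; for a token that does not exist, the fuzzy-match candidates are sorted and the top few are returned.

Common suggestion options: (image omitted)

Example 1 (the same request as in the API section above):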

POST twitter/_search
{
  "query" : {
    "match": {
      "message": "tring out Elasticsearch"
    }
  },
  "suggest" : {
    "my-suggestion" : {
      "text" : "tring out Elasticsearch",
      "term" : {
        "field" : "message"
      }
    }
  }
}
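
To make the idea of fuzzy matching per token more concrete, here is a minimal, self-contained sketch. It is not how Elasticsearch implements the term suggester; it only assumes that candidates are indexed terms within a small edit distance of the input token, ranked by distance and then by frequency. The class name, method names, and the sample term frequencies are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TermSuggestSketch {

    /** Classic Levenshtein distance (insert, delete, substitute) using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns up to size indexed terms within maxEdits of the token.
     * A token that is itself indexed gets no suggestions, mirroring the default described above.
     */
    static List<String> suggest(String token, Map<String, Integer> termFreqs, int maxEdits, int size) {
        List<String> candidates = new ArrayList<>();
        if (termFreqs.containsKey(token)) {
            return candidates;
        }
        for (String term : termFreqs.keySet()) {
            if (editDistance(token, term) <= maxEdits) {
                candidates.add(term);
            }
        }
        // Closest terms first, then more frequent terms, then alphabetical for a stable order.
        candidates.sort((x, y) -> {
            int byDistance = Integer.compare(editDistance(token, x), editDistance(token, y));
            if (byDistance != 0) {
                return byDistance;
            }
            int byFreq = Integer.compare(termFreqs.get(y), termFreqs.get(x));
            return byFreq != 0 ? byFreq : x.compareTo(y);
        });
        return new ArrayList<>(candidates.subList(0, Math.min(size, candidates.size())));
    }

    public static void main(String[] args) {
        // Hypothetical term frequencies standing in for the "message" field.
        Map<String, Integer> freqs = new HashMap<>();
        freqs.put("trying", 2);
        freqs.put("out", 2);
        freqs.put("elasticsearch", 2);
        freqs.put("string", 1);
        System.out.println(suggest("tring", freqs, 2, 5)); // prints [trying, string]
        System.out.println(suggest("out", freqs, 2, 5));   // prints []
    }
}
```

Both "trying" and "string" are one edit away from "tring", so the more frequent term is ranked first.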

  

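2. Phrase suggester

The phrase suggester builds on the term suggester and additionally considers how the terms relate to one another, for example whether they occur together in the indexed text, how close they are to each other, and how frequent they are.

Example: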
POST twitter/_search
{
  "query" : {
    "match": {
      "message": "tring out Elasticsearch"
    }
  },
  "suggest" : {
    "my-suggestion" : {
      "text" : "tring out Elasticsearch",
      "phrase" : {
        "field" : "message"
      }
    }
  }
}

 

Result:
{
  "took": 30,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.113083,
    "hits": [
      {
        "_index": "twitter",
        "_type": "tweet",
        "_id": "4",
        "_score": 1.113083,
        "_source": {
          "user": "kimchy",
          "postDate": "2018-07-23T07:29:57.653Z",
          "message": "trying out Elasticsearch"
        }
      },
      {
        "_index": "twitter",
        "_type": "tweet",
        "_id": "7",
        "_score": 0.98382175,
        "_source": {
          "user": "yuchen20",
          "postDate": "2018-07-23T08:12:05.604Z",
          "message": "trying out Elasticsearch"
        }
      }
    ]
  },
  "suggest": {
    "my-suggestion": [
      {
        "text": "tring out Elasticsearch",
        "offset": 0,
        "length": 23,
        "options": [
          {
            "text": "trying out elasticsearch",
            "score": 0.5118434
          }
        ]
      }
    ]
  }
}
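
As a rough illustration of why looking at neighbouring terms helps, the sketch below scores whole candidate phrases by how often their adjacent word pairs (bigrams) occur in a small corpus. This is a deliberately simplified stand-in and not Elasticsearch's actual scoring; the class, its methods, the add-one smoothing, and the sample sentences are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class PhraseSuggestSketch {

    /** Counts adjacent word pairs (bigrams) over lowercased, whitespace-split sentences. */
    static Map<String, Integer> bigramCounts(List<String> sentences) {
        Map<String, Integer> counts = new HashMap<>();
        for (String sentence : sentences) {
            String[] words = sentence.toLowerCase(Locale.ROOT).trim().split("\\s+");
            for (int i = 0; i + 1 < words.length; i++) {
                counts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Scores a phrase as the product of (bigram count + 1), i.e. add-one smoothing. */
    static double score(String phrase, Map<String, Integer> counts) {
        String[] words = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
        double score = 1.0;
        for (int i = 0; i + 1 < words.length; i++) {
            score *= counts.getOrDefault(words[i] + " " + words[i + 1], 0) + 1;
        }
        return score;
    }

    /** Returns the highest-scoring candidate; the earliest candidate wins ties. */
    static String best(List<String> candidates, Map<String, Integer> counts) {
        String best = null;
        double bestScore = -1.0;
        for (String candidate : candidates) {
            double s = score(candidate, counts);
            if (s > bestScore) {
                bestScore = s;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical corpus standing in for the indexed "message" field.
        Map<String, Integer> counts = bigramCounts(Arrays.asList(
                "trying out Elasticsearch",
                "trying out Elasticsearch",
                "a string of words"));
        List<String> candidates = Arrays.asList(
                "string out elasticsearch",
                "trying out elasticsearch");
        System.out.println(best(candidates, counts)); // prints trying out elasticsearch
    }
}
```

Taken term by term, "string" and "trying" are equally close to "tring"; the phrase-level score prefers "trying out elasticsearch" because "trying out" actually occurs in the corpus.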

  

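3. Completion suggester (autocompletion)

The completion suggester is designed for the autocompletion scenario. Here a query is sent to the backend on every keystroke, so when users type quickly the backend has to respond with very low latency. For that reason it is implemented with a different data structure from the two suggesters above: instead of using the inverted index, the analyzed input is encoded into an FST (finite state transducer) that is stored alongside the index. For an index in the open state, ES loads the whole FST into memory, which makes prefix lookups extremely fast. However, an FST can only be used for prefix lookups, and that is the main limitation of the completion suggester.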
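
The sketch below illustrates weighted prefix lookup only. It does not implement an FST: a sorted map stands in for it, on the assumption that what matters for the illustration is that every input sharing a prefix sits in one contiguous range. It also skips analysis entirely and returns matching strings rather than documents. The class and method names are hypothetical, and the sample inputs are borrowed from the music examples later in this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PrefixCompletionSketch {

    // Sorted input -> weight; a stand-in for the in-memory FST.
    private final TreeMap<String, Integer> weights = new TreeMap<>();

    /** Registers an input; if it is added twice, the higher weight is kept. */
    void add(String input, int weight) {
        weights.merge(input, weight, Math::max);
    }

    /** Returns up to size inputs starting with prefix, highest weight first. */
    List<String> complete(String prefix, int size) {
        // Every key starting with the prefix lies in [prefix, prefix + '\uffff').
        Map<String, Integer> range =
                weights.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
        List<Map.Entry<String, Integer>> hits = new ArrayList<>(range.entrySet());
        hits.sort((a, b) -> {
            int byWeight = Integer.compare(b.getValue(), a.getValue());
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(size, hits.size()); i++) {
            result.add(hits.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        PrefixCompletionSketch sketch = new PrefixCompletionSketch();
        sketch.add("lucene solr", 20);
        sketch.add("lucene so cool", 20);
        sketch.add("lucene elasticsearch", 20);
        sketch.add("lucene solr cool", 10);
        System.out.println(sketch.complete("lucene s", 5));
        // prints [lucene so cool, lucene solr, lucene solr cool]
    }
}
```

Note that a lookup for "solr" returns nothing here even though several inputs contain it, which is the prefix-only limitation described above.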

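Official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html

Example 1:

To use autocompletion, the field that supplies the completions needs a dedicated mapping: its type must be completion. Define the mapping first: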

PUT  index/
{
  "mappings":{
    "completion":{
       "properties":{
          "title": {
            "type": "text",
            "analyzer": "ik_smart"
          },
          "title_suggest": {
            "type": "completion",
            "analyzer": "ik_smart",
            "search_analyzer": "ik_smart"
          }
        }
    }
    
  }
 
}

 

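The key part is title_suggest: this is the field that completion searches run against later. Its type must be completion, and the analyzer should be chosen to suit your data.

Index the data: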

POST /index/completion/_bulk
{ "index" : { } }
{ "title": "背景天安门广场大学", "title_suggest": "背景天安门广场大学"}
{ "index" : { } }
{ "title": "北京天安门","title_suggest": "北京天安门"}
{ "index" : { } }
{ "title": "北京鸟巢","title_suggest": "北京鸟巢"}
{ "index" : { } }
{ "title": "奥林匹克公园","title_suggest": "奥林匹克公园"}
{ "index" : { } }
{ "title": "奥林匹克森林公园","title_suggest": "奥林匹克森林公园"}
{ "index" : { } }
{ "title": "北京奥林匹克公园","title_suggest": "北京奥林匹克公园"}
{ "index" : { } }
{ "title": "北京奥林匹克公园","title_suggest": {"input": "我爱中国","weight": 100}}

 

At index time you can add a weight to the suggest field to raise its ranking.

Search for completions:

POST /index/completion/_search
{
  "size": 0,
  "suggest":{
    "blog-suggest":{
      "prefix":"北京",
      "completion":{
        "field":"title_suggest"
      }
    }
  }
}

 

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "suggest": {
    "blog-suggest": [
      {
        "text": "北京",
        "offset": 0,
        "length": 2,
        "options": [
          {
            "text": "北京天安门",
            "_index": "index",
            "_type": "completion",
            "_id": "AWSRo_hn9K_aupETR6FR",
            "_score": 1,
            "_source": {
              "title": "北京天安门",
              "title_suggest": "北京天安门"
            }
          },
          {
            "text": "北京奥林匹克公园",
            "_index": "index",
            "_type": "completion",
            "_id": "AWSRo_hn9K_aupETR6FV",
            "_score": 1,
            "_source": {
              "title": "北京奥林匹克公园",
              "title_suggest": "北京奥林匹克公园"
            }
          },
          {
            "text": "北京鸟巢",
            "_index": "index",
            "_type": "completion",
            "_id": "AWSRo_hn9K_aupETR6FS",
            "_score": 1,
            "_source": {
              "title": "北京鸟巢",
              "title_suggest": "北京鸟巢"
            }
          }
        ]
      }
    ]
  }
}

 

Example 2:

Create the mapping:

PUT music
{
    "mappings": {
        "docc" : {
            "properties" : {
                "suggest" : {
                    "type" : "completion"
                },
                "title" : {
                    "type": "keyword"
                }
            }
        }
    }
}

  

input specifies the input text(s); weight specifies the ranking value (optional):
PUT music/docc/1?refresh
{
    "suggest" : {
        "input": [ "Nevermind", "Nirvana" ],
        "weight" : 34
    }
}

  

Specify a different weight for each input:

PUT music/docc/1?refresh
{
    "suggest" : [
        {
            "input": "Nevermind",
            "weight" : 10
        },
        {
            "input": "Nirvana",
            "weight" : 3
        }
    ]
}

  

Index a second document with the same inputs:

PUT music/docc/2?refresh
{
   "suggest" : {
       "input": [ "Nevermind", "Nirvana" ],
       "weight" : 20
   }
}

 

Query for suggestions by prefix:

POST music/_search?pretty
{
    "suggest": {
        "song-suggest" : {
            "prefix" : "nir", 
            "completion" : { 
                "field" : "suggest" 
            }
        }
    }
}

  

To de-duplicate the suggestion results, set "skip_duplicates": true. This option is supported in 6.x but not in 5.x.

POST music/_search?pretty
{
    "suggest": {
        "song-suggest" : {
            "prefix" : "nir", 
            "completion" : { 
                "field" : "suggest",
                "skip_duplicates": true 
            }
        }
    }
}
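
On 5.x, where skip_duplicates is not available, one possible workaround is to de-duplicate the suggestion texts on the client after the response arrives. The following is a minimal sketch of that idea; the class and method names are made up, and it assumes the option texts have already been collected into a list in ranked order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

final class SuggestionDedupSketch {

    /** Drops repeated texts, keeping the first (highest-ranked) occurrence of each. */
    static List<String> dedupe(List<String> rankedTexts) {
        // LinkedHashSet removes duplicates while preserving insertion order.
        return new ArrayList<>(new LinkedHashSet<>(rankedTexts));
    }

    public static void main(String[] args) {
        // Two documents that both contain the input "Nirvana" yield the same text twice.
        List<String> texts = Arrays.asList("Nirvana", "Nirvana", "Nevermind");
        System.out.println(dedupe(texts)); // prints [Nirvana, Nevermind]
    }
}
```

Because de-duplication happens after the fact, you may need to request more suggestions than you intend to display.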

 

Store phrases in the suggestion documents:

PUT music/docc/3?refresh
{
    "suggest" : {
        "input": [ "lucene solr", "lucene so cool","lucene elasticsearch" ],
        "weight" : 20
    }
}

PUT music/docc/4?refresh
{
    "suggest" : {
        "input": ["lucene solr cool","lucene elasticsearch" ],
        "weight" : 10
    }
}

 

Query:

POST music/_search?pretty
{
   "suggest": {
       "song-suggest" : {
           "prefix" : "lucene s", 
           "completion" : { 
               "field" : "suggest" 
           }
       }
   }

}

III. Java API

## Elasticsearch 5.x: the Java API for query suggestions and suggesters
Reference: http://www.mamicode.com/info-detail-2347270.html


package com.youlan.es.util;

import java.util.concurrent.ExecutionException;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.suggest.*;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;
import org.elasticsearch.search.suggest.phrase.PhraseSuggestion;
import org.elasticsearch.search.suggest.term.TermSuggestion;

public class SuggestDemo {

    private static Logger logger = LogManager.getRootLogger();

    // Spell checking (English)
    public static void termSuggest(TransportClient client) {

        // 1. Create the search request
        SearchRequest searchRequest = new SearchRequest("twitter");

        // 2. Build the request body with SearchSourceBuilder; it exposes the methods for constructing every kind of query.
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

        // Only the suggestions are needed, so return no hits
        sourceBuilder.size(0);

        // Term suggestion on the "message" field; the text simulates what the user typed in the search box
        SuggestionBuilder<?> termSuggestionBuilder =
                SuggestBuilders.termSuggestion("message").text("tring out Elticsearch");
        SuggestBuilder suggestBuilder = new SuggestBuilder();
        suggestBuilder.addSuggestion("suggest_user", termSuggestionBuilder);
        sourceBuilder.suggest(suggestBuilder);

        searchRequest.source(sourceBuilder);

        try {
            // 3. Send the request
            SearchResponse searchResponse = client.search(searchRequest).get();

            // 4. Handle the response
            // Check the status of the search
            if (RestStatus.OK.equals(searchResponse.status())) {
                // Read the suggestions
                Suggest suggest = searchResponse.getSuggest();
                TermSuggestion termSuggestion = suggest.getSuggestion("suggest_user");
                for (TermSuggestion.Entry entry : termSuggestion.getEntries()) {
                    logger.info("text: " + entry.getText().string());
                    for (TermSuggestion.Entry.Option option : entry) {
                        String suggestText = option.getText().string(); // the suggested term
                        logger.info("   suggest option : " + suggestText);
                    }
                }
            }

        } catch (InterruptedException | ExecutionException e) {
            logger.error(e);
        }
            /*
              "suggest": {
                "my-suggestion": [
                  {
                    "text": "tring",
                    "offset": 0,
                    "length": 5,
                    "options": [
                      {
                        "text": "trying",
                        "score": 0.8,
                        "freq": 2
                      }
                    ]
                  },
                  {
                    "text": "out",
                    "offset": 6,
                    "length": 3,
                    "options": []
                  },
                  {
                    "text": "elasticsearch",
                    "offset": 10,
                    "length": 13,
                    "options": []
                  }
                ]
              }*/

    }

    // Phrase suggestion
    public static void phraseSuggest(TransportClient client) {
        // 1. Create the search request
        SearchRequest searchRequest = new SearchRequest("twitter");

        // 2. Build the request body
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.size(0);

        SuggestionBuilder<?> phraseSuggestBuilder =
                SuggestBuilders.phraseSuggestion("message").text("tring out");

        SuggestBuilder suggestBuilder = new SuggestBuilder();
        suggestBuilder.addSuggestion("my-suggestion", phraseSuggestBuilder);

        sourceBuilder.suggest(suggestBuilder);

        searchRequest.source(sourceBuilder);

        try {
            // 3. Send the request
            SearchResponse searchResponse = client.search(searchRequest).get();
            // 4. Handle the response
            // Check the status of the search
            if (RestStatus.OK.equals(searchResponse.status())) {
                // Read the suggestions
                Suggest suggest = searchResponse.getSuggest();
                PhraseSuggestion phraseSuggestion = suggest.getSuggestion("my-suggestion");
                for (PhraseSuggestion.Entry entry : phraseSuggestion) {
                    logger.info("text: " + entry.getText().string());
                    for (PhraseSuggestion.Entry.Option option : entry) {
                        String suggestText = option.getText().string();
                        logger.info("   suggest option : " + suggestText);
                    }
                }

            }
        } catch (InterruptedException e) {
            logger.error("Request failed: " + e);
        } catch (ExecutionException e) {
            logger.error(e);
        }


    }
    // Autocompletion
    public static void completionSuggester(TransportClient client) {

        // 1. Create the search request
        SearchRequest searchRequest = new SearchRequest("music");

        // 2. Build the request body with SearchSourceBuilder; it exposes the methods for constructing every kind of query.
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

        sourceBuilder.size(0);

        // Completion suggestion, equivalent to the following request:
        /*POST music/_search?pretty
                {
                    "suggest": {
                        "song-suggest" : {
                            "prefix" : "lucene s",
                            "completion" : {
                                "field" : "suggest" ,
                                "skip_duplicates": true
                            }
                        }
                    }
                }*/

        SuggestionBuilder<?> completionSuggestionBuilder =
                SuggestBuilders.completionSuggestion("suggest").prefix("lucene s");
                       // .skipDuplicates(true) de-duplicates in 6.x
        SuggestBuilder suggestBuilder = new SuggestBuilder();
        suggestBuilder.addSuggestion("song-suggest", completionSuggestionBuilder);
        sourceBuilder.suggest(suggestBuilder);

        searchRequest.source(sourceBuilder);

        try {
            // 3. Send the request
            SearchResponse searchResponse = client.search(searchRequest).get();

            // 4. Handle the response
            // Check the status of the search
            if (RestStatus.OK.equals(searchResponse.status())) {
                // Read the suggestions
                Suggest suggest = searchResponse.getSuggest();
                CompletionSuggestion completionSuggestion = suggest.getSuggestion("song-suggest");
                for (CompletionSuggestion.Entry entry : completionSuggestion.getEntries()) {
                    logger.info("text: " + entry.getText().string());
                    for (CompletionSuggestion.Entry.Option option : entry) {
                        String suggestText = option.getText().string();
                        logger.info("   suggest option : " + suggestText);
                    }
                }
            }

        } catch (InterruptedException | ExecutionException e) {
            logger.error(e);
        }
        // Result:
        // {
        //   "took": 7,
        //   "timed_out": false,
        //   "_shards": {
        //     "total": 5,
        //     "successful": 5,
        //     "skipped": 0,
        //     "failed": 0
        //   },
        //   "hits": {
        //     "total": 0,
        //     "max_score": 0,
        //     "hits": []
        //   },
        //   "suggest": {
        //     "song-suggest": [
        //       {
        //         "text": "lucene s",
        //         "offset": 0,
        //         "length": 8,
        //         "options": [
        //           {
        //             "text": "lucene so cool",
        //             "_index": "music",
        //             "_type": "docc",
        //             "_id": "3",
        //             "_score": 20,
        //             "_source": {
        //               "suggest": {
        //                 "input": [
        //                   "lucene solr",
        //                   "lucene so cool",
        //                   "lucene elasticsearch"
        //                 ],
        //                 "weight": 20
        //               }
        //             }
        //           },
        //           {
        //             "text": "lucene solr cool",
        //             "_index": "music",
        //             "_type": "docc",
        //             "_id": "4",
        //             "_score": 10,
        //             "_source": {
        //               "suggest": {
        //                 "input": [
        //                   "lucene solr cool",
        //                   "lucene elasticsearch"
        //                 ],
        //                 "weight": 10
        //               }
        //             }
        //           }
        //         ]
        //       }
        //     ]
        //   }
        // }
    }

    public static void main(String[] args) {
        // EsClient is a helper class that is not shown in this article;
        // getConnection() is expected to return a connected TransportClient.
        EsClient esClient = new EsClient();
        try (TransportClient client = esClient.getConnection()) {
            logger.info("---------------- Spell checking: termSuggest ----------------------");
            termSuggest(client);

            logger.info("------------------ Phrase suggestion: phraseSuggest --------------------");
            phraseSuggest(client);
            logger.info("------------------ Autocompletion: completionSuggester --------------------");
            completionSuggester(client);
        } catch (Exception e) {
            logger.error(e);
        }
    }
}

  

Posted 2018-07-26 20:58 by 之恒