1-Elasticsearch - phrase suggester

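    The phrase suggester works like the term suggester, except that it suggests corrections for the whole input text rather than for individual terms.
    Prepare the data: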

    PUT s4
    {
      "mappings": {
        "doc": {
          "properties": {
            "title": {
              "type": "text",
              "analyzer": "standard"
            }
          }
        }
      }
    }
    
    PUT s4/doc/1
    {
      "title": "Lucene is cool"
    }
    
    PUT s4/doc/2
    {
      "title": "Elasticsearch builds on top of lucene"
    }
    
    PUT s4/doc/3
    {
      "title": "Elasticsearch rocks"
    }
    
    PUT s4/doc/4
    {
      "title": "Elastic is the company behind ELK stack"
    }
    
    PUT s4/doc/5
    {
      "title": "elk rocks"
    }
    
    PUT s4/doc/6
    {
      "title": "elasticsearch is rock solid"
    }
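
    The six PUT requests above could also be sent in one round trip through the _bulk endpoint. The sketch below only builds the newline-delimited body for that request; it does not contact a cluster. The helper name build_bulk_body is made up for illustration, and the _type entry assumes the same 6.x-style mapping type (doc) used in this post.

```python
import json

# The six example titles from the PUT requests above.
TITLES = [
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "elk rocks",
    "elasticsearch is rock solid",
]


def build_bulk_body(index, doc_type, titles):
    """Build an NDJSON body: one action line plus one source line per document.

    Hypothetical helper for illustration; IDs are assigned 1..n in order.
    """
    lines = []
    for doc_id, title in enumerate(titles, start=1):
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"title": title}))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("s4", "doc", TITLES)
print(bulk_body)
```

    The resulting text would be posted to _bulk with the Content-Type header set to application/x-ndjson.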
    

    Now let's see how the phrase suggester makes its suggestions:

    GET s4/doc/_search
    {
      "suggest": {
        "my_s4": {
          "text": "lucne and elasticsear rock",
          "phrase": {
            "field": "title"
          }
        }
      }
    }
    

    text is the input text, which contains misspellings, and the suggester type is now phrase. Here is the response:

    {
      "took" : 6,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 0,
        "max_score" : 0.0,
        "hits" : [ ]
      },
      "suggest" : {
        "my_s4" : [
          {
            "text" : "lucne and elasticsear rock",
            "offset" : 0,
            "length" : 26,
            "options" : [
              {
                "text" : "lucne and elasticsearch rocks",
                "score" : 0.12709484
              },
              {
                "text" : "lucne and elasticsearch rock",
                "score" : 0.10422645
              },
              {
                "text" : "lucne and elasticsear rocks",
                "score" : 0.10036137
              }
            ]
          }
        ]
      }
    }
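
    On the client side, an application usually wants just the best candidate from a response like the one above. A minimal sketch, assuming the response has already been decoded into a Python dict; the helper name best_suggestion is hypothetical, and the data is a trimmed copy of the suggest section shown above.

```python
# Trimmed copy of the "suggest" section of the response shown above.
response = {
    "suggest": {
        "my_s4": [
            {
                "text": "lucne and elasticsear rock",
                "offset": 0,
                "length": 26,
                "options": [
                    {"text": "lucne and elasticsearch rocks", "score": 0.12709484},
                    {"text": "lucne and elasticsearch rock", "score": 0.10422645},
                    {"text": "lucne and elasticsear rocks", "score": 0.10036137},
                ],
            }
        ]
    }
}


def best_suggestion(response, name):
    """Return (text, score) of the highest-scoring option, or None if there is none."""
    entries = response.get("suggest", {}).get(name, [])
    options = [opt for entry in entries for opt in entry.get("options", [])]
    if not options:
        return None
    top = max(options, key=lambda opt: opt["score"])
    return top["text"], top["score"]


print(best_suggestion(response, "my_s4"))
```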
    

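    As you can see, options returns a list of candidate phrases directly. The suggestion for lucene is poor (lucne is left uncorrected in every option), but the corrections for elasticsearch and rock are quite good. In addition, we can use highlighting to show the user which of the original terms were corrected.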

    GET s4/doc/_search
    {
      "suggest": {
        "my_s4": {
          "text": "lucne and elasticsear rock",
          "phrase": {
            "field": "title",
            "highlight":{
              "pre_tag":"<em>",
              "post_tag":"</em>"
            }
          }
        }
      }
    }
    

    Besides simple <em> tags, the highlight tags can be any custom markup:

    GET s4/doc/_search
    {
      "suggest": {
        "my_s4": {
          "text": "lucne and elasticsear rock",
          "phrase": {
            "field": "title",
            "highlight":{
              "pre_tag":"<b id='d1' class='t1' style='color:red;font-size:18px;'>",
              "post_tag":"</b>"
            }
          }
        }
      }
    }
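
    With highlighting enabled, each option carries an extra highlighted field containing the suggestion with the corrected parts wrapped in the configured tags. The sketch below pulls the corrected fragments back out of such a string. The sample value is an assumption made up for illustration (the post does not show the highlighted output, and the real response may group the tags differently); the helper names are hypothetical as well.

```python
import re

PRE_TAG = "<b id='d1' class='t1' style='color:red;font-size:18px;'>"
POST_TAG = "</b>"


def corrected_terms(highlighted, pre_tag, post_tag):
    """Return the text fragments that sit between pre_tag and post_tag."""
    # re.escape makes the tags safe to use literally inside the pattern.
    pattern = re.escape(pre_tag) + r"(.*?)" + re.escape(post_tag)
    return re.findall(pattern, highlighted)


def strip_tags(highlighted, pre_tag, post_tag):
    """Return the plain suggestion text with the highlight tags removed."""
    return highlighted.replace(pre_tag, "").replace(post_tag, "")


# Assumed example value of an option's "highlighted" field, not real output.
sample = "lucne and " + PRE_TAG + "elasticsearch rocks" + POST_TAG

print(corrected_terms(sample, PRE_TAG, POST_TAG))
print(strip_tags(sample, PRE_TAG, POST_TAG))
```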
    

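    Note that highlighting suggester results differs slightly from highlighting query results. For example, the custom tag options here are pre_tag and post_tag (singular), not pre_tags and post_tags as in a query highlight like this: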

    GET s4/doc/_search
    {
      "query": {
        "match": {
          "title": "rock"
        }
      },
      "highlight": {
        "pre_tags": "<b style='color:red'>",
        "post_tags": "</b>",
        "fields": {
          "title": {}
        }
      }
    }
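
    Because the two option names are easy to mix up, it can help to build both request bodies in one place. A minimal sketch with hypothetical helper names; it only constructs the JSON bodies and sends nothing. The query version passes pre_tags and post_tags as arrays, which the highlight API accepts.

```python
import json


def phrase_suggest_body(name, text, field, pre_tag, post_tag):
    """Body for a phrase suggester with highlighting (singular pre_tag/post_tag)."""
    return {
        "suggest": {
            name: {
                "text": text,
                "phrase": {
                    "field": field,
                    "highlight": {"pre_tag": pre_tag, "post_tag": post_tag},
                },
            }
        }
    }


def query_highlight_body(field, query_text, pre_tag, post_tag):
    """Body for a match query with highlighting (plural pre_tags/post_tags)."""
    return {
        "query": {"match": {field: query_text}},
        "highlight": {
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "fields": {field: {}},
        },
    }


suggest_body = phrase_suggest_body("my_s4", "lucne and elasticsear rock", "title", "<em>", "</em>")
query_body = query_highlight_body("title", "rock", "<b style='color:red'>", "</b>")
print(json.dumps(suggest_body, indent=2))
print(json.dumps(query_body, indent=2))
```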
    

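    The phrase suggester builds on the term suggester and additionally considers the relationships among multiple terms, such as whether they occur together in the indexed text, how close they are to each other, and how frequent they are.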
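
    To make the idea of "relationships among terms" concrete, here is a toy bigram scorer over the six example titles. It is an illustration only, not Elasticsearch's implementation: the whitespace tokenization, the backoff weight of 0.4, and the function names are all assumptions chosen for this sketch. A phrase whose adjacent words were seen together in the titles scores higher than one whose words only occur separately.

```python
from collections import Counter

TITLES = [
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "elk rocks",
    "elasticsearch is rock solid",
]

BACKOFF = 0.4  # assumed penalty applied when a word pair was never seen together


def train(titles):
    """Count single words and adjacent word pairs in the lowercased titles."""
    unigrams, bigrams = Counter(), Counter()
    for title in titles:
        tokens = title.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(phrase, unigrams, bigrams):
    """Score a phrase with bigram frequencies, backing off to unigram frequencies."""
    tokens = phrase.lower().split()
    if not tokens:
        return 0.0
    total = sum(unigrams.values())
    result = unigrams[tokens[0]] / total
    for prev, cur in zip(tokens, tokens[1:]):
        if bigrams[(prev, cur)]:
            # The pair occurs in the titles: use its relative frequency.
            result *= bigrams[(prev, cur)] / unigrams[prev]
        else:
            # Unseen pair: fall back to the penalized frequency of the word alone.
            result *= BACKOFF * unigrams[cur] / total
    return result


unigrams, bigrams = train(TITLES)
for candidate in ("elasticsearch rocks", "elasticsearch rock"):
    print(candidate, score(candidate, unigrams, bigrams))
```

    In this toy model "elasticsearch rocks" beats "elasticsearch rock" because the first pair appears verbatim in a title, while a phrase containing a word absent from the titles scores zero.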


    See also: [phrase suggester](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html). Corrections are welcome. That's all.
    posted @ 2019-04-12 11:31  听雨危楼