读《深入理解Elasticsearch》点滴-查询二次评分

理解二次评分

二次评分是指重新计算查询返回文档中指定个数文档的得分,es会截取查询返回的前N个,并使用预定义的二次评分方法来重新计算他们的得分

小结

  1. 有时候,我们需要显示查询结果,并且使得页面上靠前文档的顺序能受到一些额外的规则控制,但遗憾的是,我们并不能通过二次评分来实现,也许有些读者会想到window-size参数,然而实际上这个参数与返回列表中靠前文档并无关系,他只是制定了每个分片应该返回的文档数,而且window_size不能小于页面大小
  2. 二次评分功能并不能与排序一起使用,这是因为排序发生在二次评分之前,所以排序没有考虑后续新计算出来的文档得分
  3. 如果search_type为scan或count,二次评分不会被执行

 

 

二次评分参数配置

在resource对象中,必须配置下面的参数:

  • window_size 窗口大小,默认值是from和size参数值之和,它指定了每个分片上参与二次评分的文档个数
  • query_weight 查询权重,默认值是1,原始查询得分与二次评分的得分相加之前将乘以改值
  • rescore_query_weight 二次评分查询的权重值,默认值是1,二次评分查询得分在与原始查询得分相加之前,乘以该值
  • rescore_mode 二次评分模式,默认为total,可用的选项有total、max、min、avg和mutiply
    • total 得分是两种查询之he
    • max 两种查询中的最大值
    • min 两种查询中的最小值
    • avg 两种查询的平均值
    • multiply 两种查询的乘积

以下是官网手册(v5.1)

Rescoring can help to improve precision by reordering just the top (eg 100 - 500) documents returned by the query and post_filter phases, using a secondary (usually more costly) algorithm, instead of applying the costly algorithm to all documents in the index.

A rescore request is executed on each shard before it returns its results to be sorted by the node handling the overall search request.

Currently the rescore API has only one implementation: the query rescorer, which uses a query to tweak the scoring. In the future, alternative rescorers may be made available, for example, a pair-wise rescorer.

Note

the rescore phase is not executed when sort is used.

Note

when exposing pagination to your users, you should not change window_size as you step through each page (by passing different from values) since that can alter the top hits causing results to confusingly shift as the user steps through pages.

Query rescoreredit

The query rescorer executes a second query only on the Top-K results returned by the query and post_filter phases. The number of docs which will be examined on each shard can be controlled by the window_size parameter, which defaults to from and size.

By default the scores from the original query and the rescore query are combined linearly to produce the final _score for each document. The relative importance of the original query and of the rescore query can be controlled with the query_weight and rescore_query_weight respectively. Both default to 1.

For example:

curl -s -XPOST 'localhost:9200/_search' -d '{
   "query" : {
      "match" : {
         "field1" : {
            "operator" : "or",
            "query" : "the quick brown",
            "type" : "boolean"
         }
      }
   },
   "rescore" : {
      "window_size" : 50,
      "query" : {
         "rescore_query" : {
            "match" : {
               "field1" : {
                  "query" : "the quick brown",
                  "type" : "phrase",
                  "slop" : 2
               }
            }
         },
         "query_weight" : 0.7,
         "rescore_query_weight" : 1.2
      }
   }
}
'

The way the scores are combined can be controlled with the score_mode:

Score ModeDescription

total

Add the original score and the rescore query score.  The default.

multiply

Multiply the original score by the rescore query score.  Useful for function query rescores.

avg

Average the original score and the rescore query score.

max

Take the max of original score and the rescore query score.

min

Take the min of the original score and the rescore query score.

Multiple Rescoresedit

It is also possible to execute multiple rescores in sequence:

curl -s -XPOST 'localhost:9200/_search' -d '{
   "query" : {
      "match" : {
         "field1" : {
            "operator" : "or",
            "query" : "the quick brown",
            "type" : "boolean"
         }
      }
   },
   "rescore" : [ {
      "window_size" : 100,
      "query" : {
         "rescore_query" : {
            "match" : {
               "field1" : {
                  "query" : "the quick brown",
                  "type" : "phrase",
                  "slop" : 2
               }
            }
         },
         "query_weight" : 0.7,
         "rescore_query_weight" : 1.2
      }
   }, {
      "window_size" : 10,
      "query" : {
         "score_mode": "multiply",
         "rescore_query" : {
            "function_score" : {
               "script_score": {
                  "script": {
                    "lang": "painless",
                    "inline": "Math.log10(doc['numeric'].value + 2)"
                  }
               }
            }
         }
      }
   } ]
}
'

The first one gets the results of the query then the second one gets the results of the first, etc.  The second rescore will "see" the sorting done by the first rescore so it is possible to use a large window on the first rescore to pull documents into a smaller window for the second rescore.

posted on 2018-03-17 22:48  手握太阳  阅读(2979)  评论(0编辑  收藏  举报

导航