Reposted from: https://blog.csdn.net/w1014074794/article/details/120523550
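1 Introduction to function_score
1.1 Overview
The function_score query lets you customize how relevance scores are computed, giving fine-grained control over ranking.
In a regular Elasticsearch query, only the fields that take part in the match contribute to a document's relevance score (_score). Often, though, we also want signals such as popularity, view count, or rating to influence the score, so that the results are more useful to the user.
1.2 Official documentation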
https://www.elastic.co/guide/en/elasticsearch/reference/7.2/query-dsl-function-score-query.html#score-functions
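1.3 What users actually care about
At its core, a search engine is a matching process: finding, in a huge volume of data, the content that fits what the user needs.
Besides retrieving by the keywords the user types, it should also take into account the user's habits, browsing history, recent interests, and the popularity of each result, so that matching is smarter.
1.4 Common scenarios
1) Searching for content on Baidu or Google;
2) Searching for products on Taobao or JD.com;
3) Searching for users and short videos on Douyin.
2 Example
2.1 Create the index
Note: create a blog index with only two fields, the blog title (title) and the view count (access_num). The mapping uses the ik_max_word and ik_smart analyzers, which come from the IK analysis plugin, so that plugin must be installed.
When users search by blog title, they want the title to match as closely as possible, and they also want blogs with more views to rank higher, since a heavily viewed blog is generally of better quality. Ranking this way improves the search experience.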
DELETE /blog
PUT /blog
{
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"access_num": {
"type": "integer"
}
}
}
}
2.2 Add test data
PUT blog/_doc/2
{
"title": "java入门到精通",
"access_num":30
}
PUT blog/_doc/3
{
"title": "es入门到精通",
"access_num":50
}
PUT blog/_doc/4
{
"title": "mysql入门到精通",
"access_num":30
}
PUT blog/_doc/5
{
"title": "精通spark",
"access_num":40
}
2.3 Regular search
A plain match query ranks results only by how relevant the title field is to the search keywords.
GET /blog/_search
{
"query": {
"match": {
"title": "java入门"
}
}
}
Query result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.474477,
"hits" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.474477,
"_source" : {
"title" : "java入门到精通",
"access_num" : 30
}
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.33698124,
"_source" : {
"title" : "es入门到精通",
"access_num" : 50
}
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.33698124,
"_source" : {
"title" : "mysql入门到精通",
"access_num" : 30
}
}
]
}
}
2.4 Custom scoring with function_score
In addition to the relevance score computed by the match query, this request also derives a score from the view count access_num.
GET /blog/_search
{
"query": {
"function_score": {
"query": {
"match": {
"title": "java入门"
}
},
"functions": [
{
"script_score": {
"script": {
"params": {
"access_num_ratio": 2.5
},
"lang": "painless",
"source": "doc['access_num'].value * params.access_num_ratio "
}
}
}
]
}
}
}
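The query result is shown below.
The score of java入门到精通 was 1.474477 in the regular query above and is now 110.58578, that is, 1.474477 * 2.5 * 30 = 110.585775. The query score and the function score are multiplied because boost_mode defaults to multiply.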
{
"took" : 183,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 110.58578,
"hits" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "2",
"_score" : 110.58578,
"_source" : {
"title" : "java入门到精通",
"access_num" : 30
}
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_score" : 42.122654,
"_source" : {
"title" : "es入门到精通",
"access_num" : 50
}
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "4",
"_score" : 25.273592,
"_source" : {
"title" : "mysql入门到精通",
"access_num" : 30
}
}
]
}
}
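The arithmetic behind these scores can be checked by hand. The following is a minimal Python sketch, not part of the original article, that assumes the default boost_mode (multiply) and takes the match scores from section 2.3 as given; the helper name final_score is made up for illustration.

```python
# Reproduce the final scores from section 2.4 by hand.
# Assumption: boost_mode is left at its default ("multiply"), so
# final score = query score * script_score result.

ACCESS_NUM_RATIO = 2.5  # params.access_num_ratio in the request

# (doc id, match score reported in section 2.3, access_num)
docs = [
    ("2", 1.474477, 30),
    ("3", 0.33698124, 50),
    ("4", 0.33698124, 30),
]


def final_score(query_score, access_num, ratio=ACCESS_NUM_RATIO):
    """Hypothetical helper: query score times the script_score value."""
    function_score = access_num * ratio  # doc['access_num'].value * ratio
    return query_score * function_score


scores = {doc_id: final_score(q, n) for doc_id, q, n in docs}
for doc_id, score in scores.items():
    print(doc_id, round(score, 5))
```

The printed values agree with the _score fields above up to floating-point rounding; Elasticsearch reports scores as 32-bit floats, so the last digit can differ.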
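3 Types of score functions
3.1 Overview
The function_score query provides several types of score functions:
1) script_score: scoring with a script
2) weight: weight scoring
3) random_score: random scoring
4) field_value_factor: scoring by a field value factor
5) decay functions: gauss, linear, exp (decay functions are more complex and are not covered here)
3.2 script_score: script scoring
The script_score function lets you wrap another query and optionally customize its scoring with a script expression that computes a value from other numeric fields in the document.
Here is a simple example that uses access_num multiplied by 2.5 as the script score. Because no query is specified, every document matches, which is why all four documents appear in the result.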
GET /blog/_search
{
"query": {
"function_score": {
"script_score": {
"script": {
"params": {
"access_num_ratio": 2.5
},
"lang": "painless",
"source": "doc['access_num'].value * params.access_num_ratio "
}
}
}
}
}
Query result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 125.0,
"hits" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_score" : 125.0,
"_source" : {
"title" : "es入门到精通",
"access_num" : 50
}
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "5",
"_score" : 100.0,
"_source" : {
"title" : "精通spark",
"access_num" : 40
}
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "2",
"_score" : 75.0,
"_source" : {
"title" : "java入门到精通",
"access_num" : 30
}
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "4",
"_score" : 75.0,
"_source" : {
"title" : "mysql入门到精通",
"access_num" : 30
}
}
]
}
}
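3.3 weight: weight scoring
The weight function is the simplest one: it multiplies the score by a constant. Note that an ordinary boost field increases the score subject to normalization,
whereas the weight function really does multiply the score by the exact value given.
In the example below, documents whose title field contains the term java have their score multiplied by 1.5, and documents whose title contains the term mysql have it multiplied by 3. Documents that match neither filter keep their original score.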
GET /_search
{
"query": {
"function_score": {
"query": {
"match": { "title": "精通" }
},
"functions":[
{
"weight":1.5 ,
"filter": { "term": { "title": "java" }}
},
{
"weight":3 ,
"filter": { "term": { "title": "mysql" }}
}
]
}
}
}
Query result
{
"took" : 37,
"timed_out" : false,
"_shards" : {
"total" : 27,
"successful" : 27,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 0.2986292,
"hits" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.2986292,
"_source" : {
"title" : "mysql入门到精通",
"access_num" : 30
}
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.1493146,
"_source" : {
"title" : "java入门到精通",
"access_num" : 30
}
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.12776,
"_source" : {
"title" : "精通spark",
"access_num" : 40
}
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.099543065,
"_source" : {
"title" : "es入门到精通",
"access_num" : 50
}
}
]
}
}
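The way the filtered weights act on the scores above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Elasticsearch code: the token sets are hand-written stand-ins for the analyzed title field, the base score for documents 2 and 4 is inferred by dividing their reported score by the weight, and weight_factor is a hypothetical helper.

```python
# Sketch of filtered weight functions with the default score_mode ("multiply").
# Assumption: a function whose filter does not match contributes nothing,
# so a document matching no filter keeps a neutral factor of 1.0.

functions = [
    {"term": "java", "weight": 1.5},
    {"term": "mysql", "weight": 3.0},
]

# Hypothetical token sets; the real tokens depend on the IK analyzer.
docs = {
    "2": {"tokens": {"java", "入门", "精通"}, "query_score": 0.099543065},
    "3": {"tokens": {"es", "入门", "精通"}, "query_score": 0.099543065},
    "4": {"tokens": {"mysql", "入门", "精通"}, "query_score": 0.099543065},
    "5": {"tokens": {"精通", "spark"}, "query_score": 0.12776},
}


def weight_factor(tokens, functions):
    """Multiply together the weights of all functions whose term filter matches."""
    factor = 1.0
    for fn in functions:
        if fn["term"] in tokens:
            factor *= fn["weight"]
    return factor


final = {
    doc_id: doc["query_score"] * weight_factor(doc["tokens"], functions)
    for doc_id, doc in docs.items()
}
ranking = sorted(final, key=final.get, reverse=True)
print(ranking)
```

Under these assumptions the ordering matches the result above: the mysql document first, then java, then the two documents that match no filter.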
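3.4 random_score: random scoring
random_score generates scores that are uniformly distributed from 0 up to, but not including, 1. By default it uses the internal Lucene doc id as the source of randomness.
If you want scores to be reproducible, you can provide a seed and a field. The final score is then computed from this seed, the minimum value of the field for the document under consideration, and a salt derived from the index name and shard id, so that documents with the same value but stored in different indices get different scores.
Note that documents in the same shard with the same field value get the same score, so it is usually desirable to use a field that has a unique value for every document. A good default choice might be the _seq_no field, whose only drawback is that scores change when a document is updated, because update operations also update the value of _seq_no.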
GET /_search
{
"query": {
"function_score": {
"random_score": {
"seed": 10,
"field": "_seq_no"
}
}
}
}
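Query result (the request targets /_search without an index name, so documents from every index in the cluster are returned)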
{
"took" : 31,
"timed_out" : false,
"_shards" : {
"total" : 27,
"successful" : 27,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 96,
"relation" : "eq"
},
"max_score" : 0.9906841,
"hits" : [
{
"_index" : "hockey",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.9906841,
"_source" : {
"first" : "jiri",
"last" : "hudler",
"goals" : [
5,
34,
36
],
"assists" : [
11,
62,
42
],
"gp" : [
24,
80,
79
],
"born" : "1984/01/04"
}
},
{
"_index" : "location",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.9901967,
"_source" : {
"locationStr" : "40.13933715136454,116.63441990026217"
}
},
{
"_index" : "food",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.98539144,
"_source" : {
"CreateTime" : "2022-07-07 13:11:11",
"Desc" : "苹果 yyds 好吃 便宜 水分多 营养",
"Level" : "普通水果",
"Name" : "苹果",
"Price" : 11.11,
"Tags" : [
"性价比",
"易种植",
"水果",
"营养"
],
"Type" : "水果"
}
},
{
"_index" : ".kibana_1",
"_type" : "_doc",
"_id" : "ui-metric:kibana-user_agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
"_score" : 0.9759189,
"_source" : {
"ui-metric" : {
"count" : 1
},
"type" : "ui-metric",
"references" : [ ],
"updated_at" : "2023-02-20T06:48:35.053Z"
}
},
{
"_index" : "hockey",
"_type" : "_doc",
"_id" : "11",
"_score" : 0.96869344,
"_source" : {
"first" : "joe",
"last" : "colborne",
"goals" : [
3,
18,
13
],
"assists" : [
6,
20,
24
],
"gp" : [
26,
67,
82
],
"born" : "1990/01/30"
}
},
{
"_index" : "food",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.96792156,
"_source" : {
"CreateTime" : "2022-06-06 13:11:11",
"Desc" : "大白菜 好吃 便宜 水分多",
"Level" : "普通蔬菜",
"Name" : "大白菜",
"Price" : 12.11,
"Tags" : [
"便宜",
"好吃",
"白色蔬菜"
],
"Type" : "蔬菜"
}
},
{
"_index" : "hockey",
"_type" : "_doc",
"_id" : "8",
"_score" : 0.9667589,
"_source" : {
"first" : "tj",
"last" : "brodie",
"goals" : [
2,
14,
7
],
"assists" : [
8,
42,
30
],
"gp" : [
26,
82,
82
],
"born" : "1990/06/07"
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.95516723,
"_source" : {
"name" : "nfc phone",
"desc" : "shouji zhong de hongzhaji",
"price" : 2999,
"tags" : [
"xingjiabi",
"fashao",
"menjinka"
]
}
},
{
"_index" : "location",
"_type" : "_doc",
"_id" : "8",
"_score" : 0.95324606,
"_source" : {
"locationStr" : "40.146045218040605,116.5696251832195"
}
},
{
"_index" : "mytest",
"_type" : "blog",
"_id" : "1",
"_score" : 0.9320041,
"_source" : {
"title" : "Master Java",
"content" : "learn java",
"author" : "Tom"
}
}
]
}
}
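The reproducibility property described for random_score (same seed, field value, and index/shard salt give the same score) can be illustrated with a small Python sketch. This is not the hashing scheme Elasticsearch uses; the SHA-256 based function pseudo_random_score and its parameters are assumptions chosen only to show the idea.

```python
import hashlib


def pseudo_random_score(seed, field_value, index_name, shard_id):
    """Map (seed, field value, index/shard salt) to a float in [0, 1).

    Illustrative only: Elasticsearch uses its own internal hashing.
    """
    key = f"{seed}|{field_value}|{index_name}|{shard_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the top 53 bits so the division is exact and strictly below 1.
    top_bits = int.from_bytes(digest[:8], "big") >> 11
    return top_bits / 2**53


# Same inputs always give the same score.
a = pseudo_random_score(10, 7, "blog", 0)
b = pseudo_random_score(10, 7, "blog", 0)
# Same field value in a different index gets a different score.
c = pseudo_random_score(10, 7, "hockey", 0)
print(a == b, a != c)
```

The sketch also shows why a field with unique values matters: two documents in the same shard with the same field value would feed identical inputs to the function and receive identical scores.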
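3.5 field_value_factor: field value factor scoring
The field_value_factor function lets you use a field from the document to influence the score. It is similar to using the script_score function, but it avoids the overhead of scripting. If it is used on a multi-valued field, only the first value of the field is used in the calculation.
For example, suppose your documents are indexed with a numeric likes field and you want this field to influence the score. A query that does so looks like this: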
GET /_search
{
"query": {
"function_score": {
"field_value_factor": {
"field": "likes",
"factor": 1.2,
"modifier": "sqrt",
"missing": 1
}
}
}
}
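Score formula: sqrt(1.2 * doc['likes'].value)
Parameters:
1) field: the field to extract from the document.
2) factor: an optional factor to multiply the field value by; defaults to 1.
3) modifier: the modifier applied to the field value; one of none, log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, or reciprocal; defaults to none.
4) missing: the value to use if the document does not have the field. The modifier and factor are still applied to it as though it had been read from the document.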
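To make the formula and parameters concrete, here is a minimal Python sketch of the computation modifier(factor * value). It is an approximation written for this article, not Elasticsearch source code; the function name field_value_factor mirrors the parameter names of the query, and the likes values used are made up.

```python
import math

# Modifier names as listed above; log* use base 10, ln* use the natural log.
MODIFIERS = {
    "none": lambda x: x,
    "log": math.log10,
    "log1p": lambda x: math.log10(1 + x),
    "log2p": lambda x: math.log10(2 + x),
    "ln": math.log,
    "ln1p": math.log1p,
    "ln2p": lambda x: math.log(2 + x),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1 / x,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Return modifier(factor * value), falling back to `missing` when value is None."""
    if value is None:
        if missing is None:
            raise ValueError("document has no value and no 'missing' was given")
        value = missing
    return MODIFIERS[modifier](factor * value)


# Hypothetical document with likes = 30, using the parameters of the query above.
with_likes = field_value_factor(30, factor=1.2, modifier="sqrt", missing=1)
# Hypothetical document without a likes field: `missing` (1) is used instead.
without_likes = field_value_factor(None, factor=1.2, modifier="sqrt", missing=1)
print(with_likes, without_likes)
```

Modifiers such as sqrt or log1p are typically chosen so that a document with a very large field value does not completely dominate the ranking.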
4 Combining scores
4.1 Example
GET /_search
{
"query": {
"function_score": {
"query": { "match_all": {} },
"boost": "5",
"functions": [
{
"filter": { "match": { "test": "bar" } },
"random_score": {},
"weight": 23
},
{
"filter": { "match": { "test": "cat" } },
"weight": 42
}
],
"max_boost": 42,
"score_mode": "max",
"boost_mode": "multiply",
"min_score" : 42
}
}
}
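4.2 Parameters
1) max_boost
The new score can be restricted so that it does not exceed a certain limit by setting the max_boost parameter. The default value of max_boost is FLT_MAX.
2) min_score
By default, modifying the score does not change which documents match. To exclude documents that do not meet a certain score threshold, set the min_score parameter to that threshold.
4.3 score_mode and boost_mode
score_mode specifies how the scores computed by the functions are combined:
multiply: scores are multiplied (default)
sum: scores are summed
avg: scores are averaged
first: the score of the first function with a matching filter is used
max: the maximum score is used
min: the minimum score is used
boost_mode defines how the newly computed function score is combined with the score of the query:
multiply: query score and function score are multiplied (default)
replace: only the function score is used; the query score is ignored
sum: query score and function score are added
avg: query score and function score are averaged
max: the maximum of query score and function score is used
min: the minimum of query score and function score is used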
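How these options interact can be sketched in Python. The code below is a simplified model, not Elasticsearch's implementation: it assumes max_boost caps the combined function score before boost_mode is applied, uses an unweighted mean for avg, leaves out the top-level boost parameter, and all function names and numbers are hypothetical.

```python
import math


def combine_functions(scores, score_mode="multiply"):
    """Combine the scores of the matching functions; none matching is neutral (1.0)."""
    if not scores:
        return 1.0
    if score_mode == "multiply":
        return math.prod(scores)
    if score_mode == "sum":
        return sum(scores)
    if score_mode == "avg":
        return sum(scores) / len(scores)  # simplification: unweighted mean
    if score_mode == "first":
        return scores[0]
    if score_mode == "max":
        return max(scores)
    if score_mode == "min":
        return min(scores)
    raise ValueError(f"unknown score_mode: {score_mode}")


def apply_boost_mode(query_score, function_score, boost_mode="multiply"):
    """Combine the query score with the (already capped) function score."""
    if boost_mode == "multiply":
        return query_score * function_score
    if boost_mode == "replace":
        return function_score
    if boost_mode == "sum":
        return query_score + function_score
    if boost_mode == "avg":
        return (query_score + function_score) / 2
    if boost_mode == "max":
        return max(query_score, function_score)
    if boost_mode == "min":
        return min(query_score, function_score)
    raise ValueError(f"unknown boost_mode: {boost_mode}")


def final_score(query_score, scores, score_mode="multiply",
                boost_mode="multiply", max_boost=math.inf, min_score=None):
    """Return the final score, or None if the document falls below min_score."""
    capped = min(combine_functions(scores, score_mode), max_boost)
    result = apply_boost_mode(query_score, capped, boost_mode)
    if min_score is not None and result < min_score:
        return None  # document is excluded from the hits
    return result


# Hypothetical document: query score 2.0, two matching functions scoring 3.0 and 4.0.
print(final_score(2.0, [3.0, 4.0]))
print(final_score(2.0, [3.0, 4.0], score_mode="max", boost_mode="sum"))
print(final_score(2.0, [3.0, 4.0], max_boost=5.0))
```

Separating the two steps mirrors the two parameters: score_mode only decides how the functions are merged with each other, while boost_mode decides how that merged value meets the query score.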