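ES Tag Search and Fixing the Score-Based Ordering Problem
I. Overview
Requirements:
I have recently been working on a news project with the following requirements:
- Users manually create tags based on a video's content; the number of tags is unlimited.
- The video detail page recommends videos by tag: it searches with the current video's tags, ranking videos that match more tags first and videos that match fewer tags later.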
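To make the target ordering concrete, here is a minimal in-memory sketch of the ranking the requirement describes, independent of Elasticsearch. The function name `rank_by_shared_tags` and the sample video ids are hypothetical and used only for illustration.

```python
from typing import Dict, List, Tuple


def rank_by_shared_tags(
    current_tags: List[str], candidates: Dict[str, List[str]]
) -> List[Tuple[str, int]]:
    """Return (video_id, shared_tag_count) pairs, most shared tags first."""
    wanted = set(current_tags)
    scored = [
        (video_id, len(wanted & set(tags)))
        for video_id, tags in candidates.items()
    ]
    # Drop videos with no tag in common, then sort by count, descending.
    # sorted() is stable, so ties keep their input order.
    matched = [pair for pair in scored if pair[1] > 0]
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    videos = {
        "a": ["美国", "英国"],
        "b": ["美国"],
        "c": ["法国"],
        "d": ["英国", "美国", "法国"],
    }
    print(rank_by_shared_tags(["美国", "英国"], videos))
```

This is the ordering a search-based solution should approximate: the number of shared tags decides the rank, not tag order or document length.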
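After some analysis and research, I chose to store the tags in a single field and tried the approaches below, all of which are recorded here.
Approaches that did not work:
- Field of type keyword, with the data stored as an array: I found no query that implements this ranking.
- Field of type text, with the tags separated by spaces or stored as an array, queried with match: the scores are inaccurate.
- Field of type text, with the tags separated by spaces or stored as an array, queried with match combined with match_phrase: the scores are still inaccurate.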
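For the keyword-array case, one query shape that may be worth trying is a bool query whose should clauses each wrap a term filter in constant_score, so that every matched tag adds the same fixed amount to the score. This was not among the approaches tested in this post. The sketch below only builds the request body, and `tag_count_query` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List


def tag_count_query(field: str, tags: List[str]) -> Dict[str, Any]:
    """Build a bool query intended to score documents by number of matched tags."""
    should = [
        # constant_score ignores term frequency and IDF: each matching
        # clause contributes exactly `boost` to the document's score.
        {"constant_score": {"filter": {"term": {field: tag}}, "boost": 1.0}}
        for tag in tags
    ]
    # minimum_should_match=1 keeps out documents that match no tag at all.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    body = tag_count_query("mediaTag", ["美国", "英国"])
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Whether the resulting scores equal the exact match count may depend on the Elasticsearch version, so the output should be checked against real data before relying on it.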
Approaches that work:
1. Field of type text with the whitespace analyzer, tags separated by spaces
"mediaTag" : { "type" : "text", "analyzer": "whitespace" }
2. Field of type text with a pattern analyzer and an explicit tag delimiter; here the tags are separated by commas
PUT /es_medias_test2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "comma": { // custom analyzer name
          "type": "pattern",
          "pattern": ","
        }
      }
    }
  },
  "mappings": {
    "esmedias": {
      "properties": {
        "mediaTag": {
          "type": "text",
          "analyzer": "comma"
        }
      }
    }
  }
}
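The two analyzers above differ mainly in where they cut the tag string. The following Python sketch approximates that behaviour locally; it is not the Lucene implementation, and the function names are hypothetical. It assumes the whitespace analyzer leaves case untouched and the pattern analyzer lowercases terms by default, which matches the documented defaults as far as I know.

```python
import re
from typing import List


def whitespace_tokens(text: str) -> List[str]:
    """Approximate the whitespace analyzer: split on whitespace, keep case."""
    return text.split()


def comma_tokens(text: str, lowercase: bool = True) -> List[str]:
    """Approximate a pattern analyzer with pattern ',': split on commas only."""
    # Empty pieces (for example from ",,") are dropped. Spaces next to a
    # comma are NOT trimmed, so "a, b" yields "a" and " b".
    tokens = [piece for piece in re.split(",", text) if piece]
    return [t.lower() for t in tokens] if lowercase else tokens


if __name__ == "__main__":
    print(whitespace_tokens("美国 法国 英国"))
    print(whitespace_tokens("china,美国 英国"))
    print(comma_tokens("China,美国,英国"))
```

Either way, the stored string must use the delimiter consistently: a comma inside whitespace-separated data, or a space after a comma in comma-separated data, ends up inside a token and changes what a tag search matches.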
II. Testing the Working Approach (Using Approach 1)
1. Create the index
PUT /es_medias_test2
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "esmedias": {
      "properties": {
        "mediaTag": {
          "type": "text",
          "analyzer": "whitespace"
        }
      }
    }
  }
}
2. Add data
POST /es_medias_test2/_bulk
{"create":{"_index":"es_medias_test2","_type":"esmedias","_id":"o3kyp3YB_f4AQBwwbA7Q"}}
{"mediaTag":"美国 英国"}
{"create":{"_index":"es_medias_test2","_type":"esmedias","_id":"lHk0p3YB_f4AQBwwvxBz"}}
{"mediaTag":"英国 美国"}
{"create":{"_index":"es_medias_test2","_type":"esmedias","_id":"-Xk1p3YB_f4AQBwwNRBt"}}
{"mediaTag":"美国 法国 英国"}
{"create":{"_index":"es_medias_test2","_type":"esmedias","_id":"AXlYp3YB_f4AQBww9zDT"}}
{"mediaTag":"china 美国 英国"}
{"create":{"_index":"es_medias_test2","_type":"esmedias","_id":"13k1p3YB_f4AQBwwBxDw"}}
{"mediaTag":"美国 英国 士大夫"}
{"create":{"_index":"es_medias_test2","_type":"esmedias","_id":"PXk1p3YB_f4AQBwwfxGI"}}
{"mediaTag":"美国"}
{"create":{"_index":"es_medias_test2","_type":"esmedias","_id":"G3k1p3YB_f4AQBwwahEM"}}
{"mediaTag":"英国 船"}
{"create":{"_index":"es_medias_test2","_type":"esmedias","_id":"G3lap3YB_f4AQBwwNTEX"}}
{"mediaTag":"china 美国"}
{"create":{"_index":"es_medias_test2","_type":"esmedias","_id":"FXlLp3YB_f4AQBwwUCRf"}}
{"mediaTag":"china 美国 法国"}
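In application code the bulk body would normally be generated rather than typed by hand. Below is a minimal sketch that produces the same newline-delimited JSON layout from a mapping of document id to tag list. The helper name `build_bulk_body` and the sample id `v1` are hypothetical; the `_type` field follows the typed mapping used in this post.

```python
import json
from typing import Dict, List


def build_bulk_body(index: str, doc_type: str, docs: Dict[str, List[str]]) -> str:
    """Build an NDJSON _bulk body that stores each tag list as one space-joined string."""
    lines = []
    for doc_id, tags in docs.items():
        for tag in tags:
            # A tag that is empty or contains whitespace would be split into
            # several tokens (or none) by the whitespace analyzer.
            if tag.split() != [tag]:
                raise ValueError(f"tag must be non-empty without whitespace: {tag!r}")
        action = {"create": {"_index": index, "_type": doc_type, "_id": doc_id}}
        source = {"mediaTag": " ".join(tags)}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body("es_medias_test2", "esmedias", {"v1": ["美国", "英国"]}), end="")
```

Rejecting tags that contain whitespace at write time keeps a multi-word tag from silently turning into several tags in the index.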
3. Test
GET /es_medias_test2/_search
{
  "query": {
    "match": {
      "mediaTag": "美国 英国"
    }
  }
}
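The search above passes the current video's tags as one space-separated string, and match combines the resulting terms with OR by default, so a document matching any one tag is returned. A small sketch of building that body from a tag list follows. `build_match_query` and its `require_all` flag are hypothetical names; the flag switches the match `operator` to "and".

```python
import json
from typing import Any, Dict, List


def build_match_query(field: str, tags: List[str], require_all: bool = False) -> Dict[str, Any]:
    """Build a match query over space-joined tags."""
    return {
        "query": {
            "match": {
                field: {
                    "query": " ".join(tags),
                    # "or": any tag may match; "and": every tag must match.
                    "operator": "and" if require_all else "or",
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_match_query("mediaTag", ["美国", "英国"]), ensure_ascii=False))
```

For the recommendation use case the default OR behaviour is the relevant one, since videos that share only some of the tags should still appear, just lower in the list.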
The results come back as expected:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 10,
"max_score" : 1.8475795,
"hits" : [
{
"_index" : "es_medias_test2",
"_type" : "esmedias",
"_id" : "-Xk1p3YB_f4AQBwwNRBt",
"_score" : 1.8475795,
"_source" : {
"mediaTag" : "美国 法国 英国"
}
},
{
"_index" : "es_medias_test2",
"_type" : "esmedias",
"_id" : "FXlLp3YB_f4AQBwwUCRf",
"_score" : 1.5141833,
"_source" : {
"mediaTag" : "china 美国 法国"
}
},
{
"_index" : "es_medias_test2",
"_type" : "esmedias",
"_id" : "o3kyp3YB_f4AQBwwbA7Q",
"_score" : 0.66557413,
"_source" : {
"mediaTag" : "美国 英国"
}
},
{
"_index" : "es_medias_test2",
"_type" : "esmedias",
"_id" : "xXkyp3YB_f4AQBwwpw6Y",
"_score" : 0.66557413,
"_source" : {
"mediaTag" : "美国 英国"
}
},
{
"_index" : "es_medias_test2",
"_type" : "esmedias",
"_id" : "lHk0p3YB_f4AQBwwvxBz",
"_score" : 0.66557413,
"_source" : {
"mediaTag" : "英国 美国"
}
},
{
"_index" : "es_medias_test2",
"_type" : "esmedias",
"_id" : "13k1p3YB_f4AQBwwBxDw",
"_score" : 0.5578373,
"_source" : {
"mediaTag" : "美国 英国 士大夫"
}
},
{
"_index" : "es_medias_test2",
"_type" : "esmedias",
"_id" : "AXlYp3YB_f4AQBww9zDT",
"_score" : 0.39778596,
"_source" : {
"mediaTag" : "china,美国 英国"
}
},
{
"_index" : "es_medias_test2",
"_type" : "esmedias",
"_id" : "G3k1p3YB_f4AQBwwahEM",
"_score" : 0.39778596,
"_source" : {
"mediaTag" : "英国 船"
}
},
{
"_index" : "es_medias_test2",
"_type" : "esmedias",
"_id" : "PXk1p3YB_f4AQBwwfxGI",
"_score" : 0.33188638,
"_source" : {
"mediaTag" : "美国"
}
},
{
"_index" : "es_medias_test2",
"_type" : "esmedias",
"_id" : "G3lap3YB_f4AQBwwNTEX",
"_score" : 0.26778817,
"_source" : {
"mediaTag" : "china 美国"
}
}
]
}
}
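When checking a response like the one above, it can help to put each hit's score next to the number of query tags it actually contains. The sketch below does that for a parsed search response. `shared_tag_counts` is a hypothetical helper, and it assumes the field holds whitespace-separated tags as in approach 1.

```python
from typing import Any, Dict, List, Tuple


def shared_tag_counts(
    response: Dict[str, Any], query_tags: List[str], field: str = "mediaTag"
) -> List[Tuple[str, float, int]]:
    """Return (_id, _score, matched_tag_count) for every hit, in response order."""
    wanted = set(query_tags)
    rows = []
    for hit in response["hits"]["hits"]:
        # Split the stored string the same way the whitespace analyzer would.
        tokens = set(hit["_source"][field].split())
        rows.append((hit["_id"], hit["_score"], len(wanted & tokens)))
    return rows


if __name__ == "__main__":
    # Two hits copied from the response above.
    sample = {
        "hits": {
            "hits": [
                {"_id": "-Xk1p3YB_f4AQBwwNRBt", "_score": 1.8475795,
                 "_source": {"mediaTag": "美国 法国 英国"}},
                {"_id": "PXk1p3YB_f4AQBwwfxGI", "_score": 0.33188638,
                 "_source": {"mediaTag": "美国"}},
            ]
        }
    }
    for row in shared_tag_counts(sample, ["美国", "英国"]):
        print(row)
```

If the third column is not non-increasing from top to bottom, the relevance score is not ordering purely by match count, which is worth knowing before relying on it for recommendations.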