How ES (Elasticsearch) works
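I: When a request reaches the ES cluster, the node that receives it acts as the coordinating node and routes the request to the relevant shard. Write requests always go to the primary shard; read requests can be served by either the primary shard or one of its replica shards. If the distribution strategy is round-robin and the shard has one replica, then out of 4 incoming read requests, 2 hit the primary shard and 2 hit the replica shard.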
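The round-robin distribution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how ES implements it; the function name route_round_robin and the "primary"/"replica" labels are made up for the example, and one primary with a single replica is assumed.

```python
from collections import Counter
from itertools import cycle


def route_round_robin(shard_copies, num_requests):
    """Assign each request to the next shard copy in turn."""
    picker = cycle(shard_copies)
    return [next(picker) for _ in range(num_requests)]


# Assumption: one primary shard with exactly one replica.
copies = ["primary", "replica"]
assignments = route_round_robin(copies, 4)
counts = Counter(assignments)

print(assignments)
print(counts)
```

With two copies and four requests, each copy ends up serving two of them.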
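II: When an index is split across several shards and the data is heavily skewed, search scores can become inaccurate, because IDF is computed locally on each shard. For example, if shard1 holds much more data and a search term occurs in many of its documents, the scores for that term drop across shard1; if the same term occurs in only a few documents on shard2, the scores there come out higher. Less relevant content on shard2 can then outscore more relevant content on shard1, which distorts the ranking. The fix is to keep data distributed as evenly as possible across shards in production, so that computing scores locally does not skew the overall result.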
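To make the effect of shard-local IDF concrete, here is a small sketch that computes IDF per shard and for the index as a whole. It assumes the BM25 form of IDF, ln(1 + (N - n + 0.5) / (n + 0.5)), and the document counts are invented for the illustration.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25-style IDF: rarer terms (small doc_freq) get a larger value."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical, heavily skewed shards: the term is common on shard1, rare on shard2.
shard1 = {"doc_count": 1000, "doc_freq": 200}
shard2 = {"doc_count": 100, "doc_freq": 2}

idf_shard1 = bm25_idf(**shard1)
idf_shard2 = bm25_idf(**shard2)
# IDF computed from the combined statistics of both shards.
idf_global = bm25_idf(
    shard1["doc_count"] + shard2["doc_count"],
    shard1["doc_freq"] + shard2["doc_freq"],
)

print(f"shard1={idf_shard1:.3f} shard2={idf_shard2:.3f} global={idf_global:.3f}")
```

The same term is weighted much more heavily on shard2 than on shard1, so a hit on shard2 can outrank a comparable hit on shard1. Besides balancing the data, ES also offers search_type=dfs_query_then_fetch, which collects term statistics from all shards before scoring, at the cost of an extra round trip.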
III: ES has several strategies for computing scores:
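1) most_fields -> With a plain bool query made of should clauses, the more fields of a document the search keywords match, the higher the score. Rule: (sum of the scores of the matching match clauses) * (number of match clauses the document actually matched) / (total number of match clauses in the query). Note that the last two factors, the coordination factor, belong to the classic TF/IDF scoring of older ES versions; newer versions, which default to BM25, simply sum the scores of the matching clauses. For example: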
Both match clauses below can match a document: "spark" in title and "java" in content.
GET /index3/type3/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "spark" } },
        { "match": { "content": "java" } }
      ]
    }
  }
}
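The scoring rule stated above for should clauses can be written out as a short function. This is a sketch of the rule exactly as the text gives it, including the coordination factor, which as far as I know only applies to classic TF/IDF scoring in older ES versions; the per-clause scores are hypothetical, and None marks a clause that did not match.

```python
import math


def bool_should_score(clause_scores):
    """(sum of matching clause scores) * (matched clauses) / (total clauses)."""
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    return sum(matched) * len(matched) / len(clause_scores)


# Hypothetical clause scores for two documents: [title clause, content clause].
both_match = bool_should_score([0.4, 0.6])   # title and content both match
one_match = bool_should_score([0.4, None])   # only title matches

print(both_match, one_match)
assert math.isclose(both_match, 1.0)
assert math.isclose(one_match, 0.2)
```

A document matching both fields is rewarded twice: its clause scores are summed, and it is not scaled down by the matched/total ratio.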
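2) best_fields -> With dis_max, a document's score is determined by the highest-scoring match clause alone; in other words, the other fields do not affect it (unless tie_breaker is set)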
GET /index3/type3/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "spark" } },
        { "match": { "content": "java" } }
      ]
    }
  }
}
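The dis_max behaviour can be sketched as "take the best clause, optionally add a fraction of the rest". The function below is an illustration with hypothetical clause scores; tie_breaker is the real dis_max parameter, which defaults to 0.

```python
import math


def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the other matching clause scores."""
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    best = max(matched)
    return best + tie_breaker * (sum(matched) - best)


# Hypothetical clause scores: [title clause, content clause].
default_score = dis_max_score([0.3, 0.8])
with_tie_breaker = dis_max_score([0.3, 0.8], tie_breaker=0.5)

print(default_score, with_tie_breaker)
assert default_score == 0.8
assert math.isclose(with_tie_breaker, 0.95)
```

With the default tie_breaker of 0 the weaker clause contributes nothing, which is the "unrelated to the other fields" behaviour described above.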
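3) Besides most_fields and best_fields, cross_fields is also used fairly often. The built-in cross_fields strategy (it is not the default; multi_match defaults to best_fields) treats the fields listed in "fields": ["name","content"] as if their contents had been combined into a single field, so the full-text search effectively runs against one full field and the results are more accurate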
Search request: the terms in query are matched against both the name and content fields; because the two fields may have different mapping definitions, their scores can differ and the ranking may vary. For comparison, with best_fields a document's score equals the score of its best-matching field, so once the best match is picked the scores of other documents are not necessarily accurate, while most_fields combines the scores of every field into an overall document score.
GET /index2/type2/_search
{
  "query": {
    "multi_match": {
      "query": "happening like",
      "fields": ["name", "content"],
      "type": "cross_fields",
      "operator": "and"
    }
  }
}
Result:
{
  "took": 36,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 0.84968257,
    "hits": [
      {
        "_index": "index2",
        "_type": "type2",
        "_id": "2",
        "_score": 0.84968257,
        "_source": {
          "num": 10,
          "title": "他的名字",
          "name": "yes happening like write",
          "content": "happening like"
        }
      },
      {
        "_index": "index2",
        "_type": "type2",
        "_id": "4",
        "_score": 0.8164005,
        "_source": {
          "num": 1000,
          "title": "我的名字",
          "name": "happening like write",
          "content": "happening hello like yeas and he happening like had read a lot about happening hello like"
        }
      },
      {
        "_index": "index2",
        "_type": "type2",
        "_id": "3",
        "_score": 0.5063205,
        "_source": {
          "num": 105,
          "title": "这是谁的名字",
          "name": "happening like write",
          "content": " national treasure because of its rare number and cute appearance. Many foreign people are so crazy about pandas and they can’t watching these lovely creatures all the time. Though some action"
        }
      }
    ]
  }
}
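What "operator": "and" means under cross_fields, compared with best_fields or most_fields, can be sketched as follows. The tokens helper is a crude stand-in for an analyzer (lowercase, split on whitespace), the sample document is hypothetical, and both fields are assumed to share the same analyzer; real scoring is not modelled here, only which documents match.

```python
def tokens(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


def matches_term_centric(doc, fields, query):
    """cross_fields-style AND: every term must appear in at least one field."""
    return all(
        any(term in tokens(doc.get(field, "")) for field in fields)
        for term in tokens(query)
    )


def matches_field_centric(doc, fields, query):
    """best_fields/most_fields-style AND: one single field must hold all terms."""
    return any(tokens(query) <= tokens(doc.get(field, "")) for field in fields)


# Hypothetical document whose two query terms live in different fields.
doc = {"name": "happening write", "content": "like reading"}
fields = ["name", "content"]

term_centric = matches_term_centric(doc, fields, "happening like")
field_centric = matches_field_centric(doc, fields, "happening like")
print(term_centric, field_centric)
```

The document matches under the term-centric check because "happening" is found in name and "like" in content, while the field-centric check rejects it because no single field contains both terms.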
IV: Two ways to improve full-text search results
1) Use boost to raise the search score
In the request below, boost raises the score of the match on "from" by a factor of 5; without the boost, the document matching "foot" would score higher.
GET index3/type3/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "content": { "query": "from", "boost": 5 } } },
        { "match": { "content": { "query": "foot" } } }
      ]
    }
  }
}
Result:
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1.3150566,
    "hits": [
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "1",
        "_score": 1.3150566,
        "_source": {
          "date": "2019-01-02",
          "name": "the little",
          "content": "Half the hello book ideas in his talk were plagiarized from an article I wrote last month.",
          "no": "123"
        }
      },
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "5",
        "_score": 1.3114156,
        "_source": {
          "date": "2019-05-01",
          "name": "http litty",
          "content": "There are hello moments in life when you miss book someone so much that you just want to pick them from your dreams",
          "no": "564",
          "description": "描述"
        }
      },
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "3",
        "_score": 0.28582606,
        "_source": {
          "date": "2019-07-01",
          "name": "very tag",
          "content": "Some of our hello comrades love book to write long articles with no substance, very much like the foot bindings of a slattern, long as well as smelly",
          "no": "123"
        }
      }
    ]
  }
}
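The effect of boost on the ranking above can be replayed with a small sketch. It assumes that boost simply multiplies a clause's score and that should clauses are summed. The unboosted "from" score of document 5 (0.26228312) is the one reported in the boosting example further down; the unboosted score of document 1 (0.26301134) is inferred from that same example by dividing its result by negative_boost, so treat it as an assumption.

```python
import math

# Unboosted clause scores per document id (see the assumptions above).
raw_scores = {
    "1": {"from": 0.26301134},
    "5": {"from": 0.26228312},
    "3": {"foot": 0.28582606},
}


def rank(scores, boosts):
    """Sum boosted clause scores per document and sort best first."""
    totals = {
        doc_id: sum(score * boosts.get(term, 1.0) for term, score in clauses.items())
        for doc_id, clauses in scores.items()
    }
    order = sorted(totals, key=totals.get, reverse=True)
    return order, totals


boosted_order, boosted_totals = rank(raw_scores, {"from": 5})
plain_order, _ = rank(raw_scores, {})

print(boosted_order, plain_order)
```

With the boost, the two documents matching "from" come first and their totals agree with the result above to within rounding; without it, the document matching "foot" ranks first.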
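2) Use the positive and negative clauses of a boosting query to demote results: documents that also match the negative clause have their score lowered by multiplying it by negative_boost (0.1 in the request below)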
In the request below, positive holds the query that is matched normally; negative holds the query whose matches are demoted: a document that also matches it has its score multiplied by negative_boost.
GET index3/type3/_search
{
  "query": {
    "boosting": {
      "positive": { "match": { "content": "from" } },
      "negative": { "match": { "content": { "query": "Half" } } },
      "negative_boost": 0.1
    }
  }
}
Result:
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.26228312,
    "hits": [
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "5",
        "_score": 0.26228312,
        "_source": {
          "date": "2019-05-01",
          "name": "http litty",
          "content": "There are hello moments in life when you miss book someone so much that you just want to pick them from your dreams",
          "no": "564",
          "description": "描述"
        }
      },
      {
        "_index": "index3",
        "_type": "type3",
        "_id": "1",
        "_score": 0.026301134,
        "_source": {
          "date": "2019-01-02",
          "name": "the little",
          "content": "Half the hello book ideas in his talk were plagiarized from an article I wrote last month.",
          "no": "123"
        }
      }
    ]
  }
}
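The demotion applied by the boosting query can be sketched as a single multiplication. The function name boosting_score is made up for the illustration; the positive score of document 5 comes from the result above, while the positive score of document 1 (0.26301134) is an assumption, inferred by dividing its reported score by the negative_boost of 0.1.

```python
import math


def boosting_score(positive_score, matches_negative, negative_boost):
    """Keep the positive score, scaled by negative_boost if the negative query also matches."""
    return positive_score * negative_boost if matches_negative else positive_score


# (positive score for "from", does the document also match "Half"?)
docs = {
    "5": (0.26228312, False),
    "1": (0.26301134, True),  # assumed positive score, see the note above
}

final_scores = {
    doc_id: boosting_score(score, matches_negative, negative_boost=0.1)
    for doc_id, (score, matches_negative) in docs.items()
}
order = sorted(final_scores, key=final_scores.get, reverse=True)

print(order, final_scores)
```

Document 1 is still returned, because the negative clause demotes rather than excludes, but it drops below document 5, matching the order in the result above.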