ES Similarity Algorithm Settings (Continued)
Tuning BM25
One of the nice features of BM25 is that, unlike TF/IDF, it has two parameters that allow it to be tuned:
k1
- This parameter controls how quickly an increase in term frequency results in term-frequency saturation. The default value is 1.2. Lower values result in quicker saturation, and higher values in slower saturation.
b
- This parameter controls how much effect field-length normalization should have. A value of 0.0 disables normalization completely, and a value of 1.0 normalizes fully. The default is 0.75.
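To make the effect of k1 and b concrete, here is a minimal Python sketch of the term-frequency part of the textbook BM25 formula. It is an illustration only, not Lucene's actual implementation; the function name bm25_tf and its argument names are assumptions made for this example.

```python
def bm25_tf(freq, field_len, avg_field_len, k1=1.2, b=0.75):
    """Term-frequency component of the textbook BM25 formula.

    freq          -- how often the term occurs in the field
    field_len     -- length of this field, in terms
    avg_field_len -- average field length across the collection
    """
    # b blends between no length normalization (b=0) and full normalization (b=1).
    length_norm = 1.0 - b + b * (field_len / avg_field_len)
    # The result grows with freq but levels off toward k1 + 1.
    return freq * (k1 + 1.0) / (freq + k1 * length_norm)


if __name__ == "__main__":
    # Same field length as the average, so only k1 matters here.
    for k1 in (0.5, 1.2, 2.0):
        row = [round(bm25_tf(f, 100, 100, k1=k1), 3) for f in (1, 2, 5, 10, 50)]
        print("k1 =", k1, row)
```

With a small k1 the value gets close to its ceiling after only a few occurrences, while a larger k1 keeps rewarding additional occurrences for longer. With b set to 0.0 the field length drops out of the calculation entirely.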
The practicalities of tuning BM25 are another matter. The default values for k1 and b should be suitable for most document collections, but the optimal values really depend on the collection. Finding good values for your collection is a matter of adjusting, checking, and adjusting again.
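The adjust-and-check loop can be sketched offline as a small grid search. The following Python example is a toy under stated assumptions: it scores tokenized documents with a textbook BM25 formula (not Elasticsearch itself), uses mean reciprocal rank as the quality measure, and expects hand-made relevance judgments. The names bm25_score, reciprocal_rank, and grid_search are all hypothetical.

```python
import math
from itertools import product


def bm25_score(query_terms, doc, docs, k1, b):
    """Textbook BM25 score of one tokenized doc for a list of query terms."""
    avg_len = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for term in query_terms:
        freq = doc.count(term)
        if freq == 0:
            continue
        doc_freq = sum(1 for d in docs if term in d)
        idf = math.log(1 + (len(docs) - doc_freq + 0.5) / (doc_freq + 0.5))
        norm = 1 - b + b * len(doc) / avg_len
        score += idf * freq * (k1 + 1) / (freq + k1 * norm)
    return score


def reciprocal_rank(query_terms, relevant_idx, docs, k1, b):
    """1 / rank of the single relevant document for this query."""
    scores = [bm25_score(query_terms, d, docs, k1, b) for d in docs]
    ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return 1.0 / (ranking.index(relevant_idx) + 1)


def grid_search(judgments, docs, k1_values, b_values):
    """Try every (k1, b) pair; return (best_mrr, k1, b).

    judgments is a list of (query_terms, index_of_relevant_doc) pairs.
    On ties, the first pair tried wins.
    """
    best = None
    for k1, b in product(k1_values, b_values):
        mrr = sum(
            reciprocal_rank(q, rel, docs, k1, b) for q, rel in judgments
        ) / len(judgments)
        if best is None or mrr > best[0]:
            best = (mrr, k1, b)
    return best
```

In practice the checking step would be run against the real index with real queries; the sketch only shows the shape of the loop.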
The similarity algorithm can be set on a per-field basis. It’s just a matter of specifying the chosen algorithm in the field’s mapping:
PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "similarity": "BM25"
        },
        "body": {
          "type": "string",
          "similarity": "default"
        }
      }
    }
  }
}
The title field uses the BM25 similarity. The body field uses the default similarity.
Currently, it is not possible to change the similarity mapping for an existing field. You would need to reindex your data in order to do that.
Configuring a similarity is much like configuring an analyzer. Custom similarities can be specified when creating an index. For instance:
PUT /my_index
{
  "settings": {
    "similarity": {
      "my_bm25": {
        "type": "BM25",
        "b": 0
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "similarity": "my_bm25"
        },
        "body": {
          "type": "string",
          "similarity": "BM25"
        }
      }
    }
  }
}
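When trying several parameter combinations, it can help to generate the request body rather than edit it by hand. The following Python sketch builds a body of the same shape as the request above. The helper name build_index_body and its arguments are hypothetical, and the "doc" mapping type and "string" field type simply mirror the example above; sending the body to a cluster is left out.

```python
import json


def build_index_body(similarity_name, k1=1.2, b=0.75, field="title"):
    """Build an index-creation body with one custom BM25 similarity.

    The structure mirrors the request shown above; k1 and b are the
    two BM25 tuning parameters.
    """
    return {
        "settings": {
            "similarity": {
                similarity_name: {"type": "BM25", "k1": k1, "b": b}
            }
        },
        "mappings": {
            "doc": {
                "properties": {
                    field: {"type": "string", "similarity": similarity_name}
                }
            }
        },
    }


if __name__ == "__main__":
    # b=0.0 disables field-length normalization, as in the example above.
    print(json.dumps(build_index_body("my_bm25", b=0.0), indent=2))
```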
Reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/changing-similarities.html