bonelee - 博客园 (cnblogs)

February 27, 2017

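Introduction to LinkedIn Databus: watching a database for changes and notifying other consumer apps when new data arrives, with the new data held in memory as multiple snapshots

Summary: The overall structure is shown in the figure below. The figure shows that systems such as the Search Index and Read Replicas are Databus consumers. When a write occurs on the primary OLTP database, the relay connected to it pulls the changed data into the relay. The Databus consumer client embedded in the Search Index or the cache then pulls the data from the relay and updates the index or cache. Dat... Read more

posted @ 2017-02-27 20:29 bonelee Views (3718) Comments (0) Recommendations (1)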

ES: ignoring TF-IDF scoring with constant_score

Summary: Ignoring TF/IDF. Sometimes we just don’t care about TF/IDF. All we want to know is that a certain word... Read more

posted @ 2017-02-27 19:38 bonelee Views (5567) Comments (0) Recommendations (0)
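
As a rough illustration of the constant_score entry above, the sketch below builds a request body in the shape of the Elasticsearch Query DSL `constant_score` query, which wraps a filter and gives every matching document the same score instead of a TF/IDF-based one. The helper name `constant_score_query`, the field name `description`, and the term `wifi` are hypothetical and chosen only for illustration; the full post may use a different example.

```python
import json


def constant_score_query(field: str, value: str, boost: float = 1.0) -> dict:
    """Build a search body whose matches all receive the same score.

    The helper itself is hypothetical; the dictionary it returns follows
    the constant_score query shape (a "filter" clause plus a "boost").
    """
    return {
        "query": {
            "constant_score": {
                # The filter only decides whether a document matches.
                "filter": {"term": {field: value}},
                # Every matching document gets this score.
                "boost": boost,
            }
        }
    }


# Hypothetical field and term, for illustration only.
body = constant_score_query("description", "wifi", boost=1.0)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a search request; because the clause is a filter, term frequency and inverse document frequency play no part in the score.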

ES: setting field search weights with Query-Time Boosting

Summary: Query-Time Boosting. In Prioritizing Clauses, we explained how you could use the boost para... Read more

posted @ 2017-02-27 19:23 bonelee Views (9834) Comments (0) Recommendations (0)

Lucene's built-in scoring function

Summary: For multiterm queries, Lucene takes the Boolean model, TF/IDF, and the vector space model and combines them in a single efficient package that collect... Read more

posted @ 2017-02-27 19:16 bonelee Views (740) Comments (1) Recommendations (0)

ES search ranking, an introduction to document relevance scoring: Vector Space Model

Summary: Vector Space Model. The vector space model provides a way of comparing a multiterm query against a document. The output is a single... Read more

posted @ 2017-02-27 14:52 bonelee Views (549) Comments (1) Recommendations (0)

Summary: Theory Behind Relevance Scoring. Lucene (and thus Elast... Read more

posted @ 2017-02-27 14:46 bonelee Views (612) Comments (1) Recommendations (0)

ES search ranking, an introduction to document relevance scoring: Field-length norm

Summary: Field-length norm. How long is the field? The shorter the field, the higher the weight. If a term appears in a short field, such as a title field, it i... Read more

posted @ 2017-02-27 14:45 bonelee Views (1750) Comments (1) Recommendations (0)
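
To make the "shorter field, higher weight" idea from the field-length norm entry concrete, here is a minimal sketch that assumes the classic Lucene TF/IDF formulation, in which the norm is the inverse square root of the number of terms in the field. The function name and the term counts are made up for illustration, and the sketch ignores the lossy compression Lucene applies when it stores norms.

```python
import math


def field_length_norm(num_terms: int) -> float:
    """Inverse square root of the field length: 1 / sqrt(num_terms).

    Assumes the classic TF/IDF similarity; other similarities (BM25,
    for example) normalize field length differently.
    """
    if num_terms <= 0:
        raise ValueError("num_terms must be positive")
    return 1.0 / math.sqrt(num_terms)


# Hypothetical field lengths: a 4-term title and a 400-term body.
title_norm = field_length_norm(4)    # 1 / sqrt(4)   = 0.5
body_norm = field_length_norm(400)   # 1 / sqrt(400) = 0.05

# The same term counts for more when it appears in the shorter field.
print(title_norm, body_norm)
```

Under this assumption, a term found in the 4-term title is weighted ten times more heavily than the same term found in the 400-term body.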

ES search-result explain: examining the choice of ranking algorithm, much like database performance tuning

Summary: When we run a simple term query with explain set to true (see Understanding the Score), you will see that the only factors involved in calculating the... Read more

posted @ 2017-02-27 12:21 bonelee Views (920) Comments (0) Recommendations (0)

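Lucene's default scoring algorithm (also the ES default)

Summary: Changing Lucene's scoring model. With the release of Apache Lucene 4.0 in 2012, this great full-text search toolkit finally allowed users to change the default TF/IDF-based scoring algorithm. The Lucene API made it easier to modify and extend the scoring formula. However, when it comes to scoring documents, Lucene does not merely let users tinker with the scoring formula; Luc... Read more

posted @ 2017-02-27 11:27 bonelee Views (5286) Comments (0) Recommendations (0)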

ES similarity algorithm settings (continued)

Summary: Tuning BM25. One of the nice features of BM25 is that, unlike TF/IDF, it has two parameters that allow it to be tuned: k1: This parameter con... Read more

posted @ 2017-02-27 11:14 bonelee Views (5361) Comments (0) Recommendations (0)

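A commander stands for wisdom, trustworthiness, benevolence, courage, and strictness.

Hi, I'm Li Zhihua (李智华), a security AI algorithm expert at Huawei. Welcome to the fascinating world of security attack and defense.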
