Analysis of BooleanQuery and TermInSetQuery

BooleanQuery:
{ "bool" : { "must" : [
{ "term" : { "like" : "cooking" } },
{ "term" : { "property" : "bike" } }
] } }
TermInSetQuery:
{ "terms": {"like": [ "cooking", "fishing", "swimming"]}}

BooleanQuery
BooleanWeight
TermQuery.scorer
Looks up a single term; see the earlier query analysis post:
https://www.cnblogs.com/vsop/p/12152207.html

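Sub-query scorers are grouped by clause type: must, should, filter, must_not.
Depending on the number of should, must, and filter clauses, the query is executed as one of: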
pure conjunction.
pure disjunction.
conjunction-disjunction mix.
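
The three-way choice above can be sketched as a small dispatch on clause counts. This is only an illustration: `ClauseShapeSketch`, `Shape`, and `choose` are hypothetical names and not part of Lucene, and must_not clauses are left out because they only exclude docs from whichever shape is chosen.

```java
// Minimal sketch (not Lucene code): pick an execution shape from clause counts.
final class ClauseShapeSketch {
    enum Shape { PURE_CONJUNCTION, PURE_DISJUNCTION, CONJUNCTION_DISJUNCTION_MIX }

    // Hypothetical helper: must/filter/should are the number of clauses of each kind.
    static Shape choose(int must, int filter, int should) {
        int required = must + filter; // both kinds must match; filter just does not score
        if (required == 0 && should == 0) {
            // Assumption of this sketch: a query with no positive clause is rejected.
            throw new IllegalArgumentException("no must, filter or should clause");
        }
        if (should == 0) {
            return Shape.PURE_CONJUNCTION;      // only required clauses
        }
        if (required == 0) {
            return Shape.PURE_DISJUNCTION;      // only optional clauses
        }
        return Shape.CONJUNCTION_DISJUNCTION_MIX; // required AND (optional for scoring)
    }
}
```

With the example queries at the top, the two-term must query would map to PURE_CONJUNCTION, and a rewritten three-term should query to PURE_DISJUNCTION.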

BlockMaxConjunctionScorer
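Intersects multiple postings lists; advance uses the skipper (skip data).
approximations: the result set (iterator) of each sub-query.
BlockMaxConjunctionScorer$DocIdSetIterator.
A doc is a hit only if every approximation contains it.
score: the sum of the doc's scores across all sub-queries.
moveToNextBlock: when target > upTo (the largest doc in the current block), move to the next block.
advanceShallow(target): every sub-iterator moves to the block that contains target.
BlockImpactsDocsEnum: the skip-list implementation.
advanceShallow(target): advances to the block that contains target.
SlowImpactsEnum: used for postings with few docs, which need no skipping.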
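
To make the intersection step concrete, here is a minimal leapfrog sketch over sorted doc-ID arrays. It is not Lucene code: `ConjunctionSketch` and `Postings` are hypothetical names, a binary search stands in for the skip list, and the block-max score pruning (upTo, advanceShallow) is deliberately left out so that only the "every approximation has the doc" rule is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch (not Lucene code): leapfrog intersection of sorted postings.
final class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Toy postings list backed by a sorted array of unique doc IDs. */
    static final class Postings {
        private final int[] docs;
        private int pos = -1; // -1 means "not positioned yet"

        Postings(int... sortedDocs) { this.docs = sortedDocs; }

        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves to the first doc >= target; binary search stands in for skip data. */
        int advance(int target) {
            int from = Math.max(pos, 0);
            int idx = Arrays.binarySearch(docs, from, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1; // insertion point when target is absent
            return docID();
        }
    }

    /** Returns the docs present in the lead and in every other postings list. */
    static List<Integer> intersect(Postings lead, Postings... others) {
        List<Integer> hits = new ArrayList<>();
        int doc = lead.advance(0);
        outer:
        while (doc != NO_MORE_DOCS) {
            for (Postings other : others) {
                int otherDoc = other.docID();
                if (otherDoc < doc) {
                    otherDoc = other.advance(doc);
                }
                if (otherDoc > doc) {
                    // Candidate is missing here: jump the lead forward and restart.
                    doc = lead.advance(otherDoc);
                    continue outer;
                }
            }
            hits.add(doc); // every list is positioned on doc
            doc = lead.advance(doc + 1);
        }
        return hits;
    }
}
```

Passing the shortest list as the lead keeps the number of candidates small, which is the usual reason for ordering the iterators by cost.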

DisjunctionScorer
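Uses a heap to compute the union of multiple postings lists.
DisiPriorityQueue
Take the heap top and call updateTop; if the new top is on the same doc as the previous top, keep calling updateTop.
DisjunctionSumScorer, DisjunctionMaxScorer
After the postings are merged, these score a doc that matches several terms, using either the sum or the max of the per-term scores.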
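
The heap-based union and the sum/max scoring can be sketched as follows. This is an illustration under simplifying assumptions, not the Lucene implementation: `DisjunctionSketch` and `Cursor` are hypothetical names, each postings list carries one constant score, and `java.util.PriorityQueue` with poll/add replaces DisiPriorityQueue's in-place updateTop.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Minimal sketch (not Lucene code): union of sorted postings via a min-heap.
final class DisjunctionSketch {
    /** Toy postings cursor; every doc of one list gets the same score (assumption). */
    static final class Cursor {
        final int[] docs;
        final float score;
        int pos = 0;

        Cursor(float score, int... sortedDocs) {
            this.score = score;
            this.docs = sortedDocs;
        }

        boolean exhausted() { return pos >= docs.length; }
        int doc() { return docs[pos]; }
    }

    /** Returns doc -> combined score, in increasing doc order. */
    static Map<Integer, Float> union(boolean useMax, Cursor... cursors) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (Cursor c : cursors) {
            if (!c.exhausted()) heap.add(c);
        }
        Map<Integer, Float> result = new LinkedHashMap<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc();
            float score = 0f;
            // Drain every cursor sitting on the same doc as the top.
            while (!heap.isEmpty() && heap.peek().doc() == doc) {
                Cursor top = heap.poll();
                score = useMax ? Math.max(score, top.score) : score + top.score;
                top.pos++;                          // move past the current doc
                if (!top.exhausted()) heap.add(top); // re-insert at its new position
            }
            result.put(doc, score);
        }
        return result;
    }
}
```

The inner loop corresponds to the "keep calling updateTop while the top stays on the same doc" step; the `useMax` flag switches between the sum-style and max-style combination.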

TermInSetQuery
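When there are at most 16 terms, the query is rewritten into a boolean query of should clauses.
Otherwise a bit set is built with DocIdSetBuilder.
TermInSetQuery$ConstantScoreWeight.rewrite
DocIdSetBuilder
add(iter):
If a bitset already exists: when iter is also backed by a FixedBitSet (a plain bitmap, not a roaring bitmap), OR the two bitsets; otherwise iterate over iter and set each doc in the bitset.
If there is no bitset yet: iterate over iter and add each doc to the buffer, converting the buffer to a bitset once the threshold is reached.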
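
The buffer-then-bitset behaviour of add(iter) can be sketched with plain JDK types. This is only an approximation of the idea: `DocIdSetBuilderSketch` is a hypothetical class, `java.util.BitSet` stands in for FixedBitSet, and the threshold is a constructor parameter chosen by the caller rather than the value Lucene derives internally.

```java
import java.util.Arrays;
import java.util.BitSet;

// Minimal sketch (not Lucene code): sparse int buffer that upgrades to a bitset.
final class DocIdSetBuilderSketch {
    private final int threshold;   // assumption: max buffered docs before upgrading
    private int[] buffer = new int[8];
    private int size = 0;
    private BitSet bitSet;         // null while still in sparse (buffer) mode

    DocIdSetBuilderSketch(int threshold) { this.threshold = threshold; }

    /** Adds docs from a plain iterator, modelled here as an int array. */
    void add(int... docs) {
        for (int doc : docs) addDoc(doc);
    }

    /** Adds docs from a bitset-backed iterator. */
    void add(BitSet other) {
        if (bitSet != null) {
            bitSet.or(other); // both dense: one bulk OR
            return;
        }
        for (int d = other.nextSetBit(0); d >= 0; d = other.nextSetBit(d + 1)) {
            addDoc(d);
        }
    }

    private void addDoc(int doc) {
        if (bitSet == null && size == threshold) {
            upgradeToBitSet(); // buffer is full: switch to dense mode
        }
        if (bitSet != null) {
            bitSet.set(doc);
            return;
        }
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, size * 2);
        }
        buffer[size++] = doc;
    }

    private void upgradeToBitSet() {
        bitSet = new BitSet();
        for (int i = 0; i < size; i++) bitSet.set(buffer[i]);
        buffer = null;
        size = 0;
    }

    boolean isDense() { return bitSet != null; }

    /** Returns the sorted, de-duplicated doc IDs collected so far. */
    int[] build() {
        if (bitSet != null) return bitSet.stream().toArray();
        return Arrays.stream(buffer, 0, size).sorted().distinct().toArray();
    }
}
```

The sparse buffer may hold duplicates and unsorted docs, so build() has to sort and de-duplicate it; the dense path gets both properties for free from the bitset.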

Reference:
Lucene 8.7.0
posted @ 2020-12-27 17:03  vsop_479