Lucene update flow: source code analysis


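Update operations are buffered in DocumentsWriterDeleteQueue, and the buffered deletes are applied at flush time.
DocumentsWriterDeleteQueue stores deletes in two kinds of DeleteSlice: one global DeleteSlice, plus one DeleteSlice per DWPT (DocumentsWriterPerThread).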
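To make the "one queue, several slices" idea concrete, here is a minimal single-threaded sketch. It assumes the queue is an append-only linked list and that a slice is just a (head, tail] window over it that its owner advances independently. The classes `Node`, `DeleteQueue` and `DeleteSlice` here are hypothetical toys, not Lucene's classes, and the real queue is thread-safe while this one is not.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One buffered delete (here just a string such as "id:1")."""
    item: str
    next: Optional["Node"] = None


class DeleteQueue:
    """Append-only linked list shared by all writers (single-threaded toy)."""

    def __init__(self) -> None:
        self.tail = Node("<sentinel>")

    def add(self, item: str) -> None:
        node = Node(item)
        self.tail.next = node
        self.tail = node


class DeleteSlice:
    """A window (head, tail] over the queue that its owner advances on its own."""

    def __init__(self, queue: DeleteQueue) -> None:
        # A new slice starts empty at the current tail: it never sees older nodes.
        self.head = queue.tail
        self.tail = queue.tail

    def update(self, queue: DeleteQueue) -> None:
        # Extend the window to include everything appended so far.
        self.tail = queue.tail

    def drain(self) -> List[str]:
        # Walk head -> tail, collect the items, then reset the slice to empty.
        items: List[str] = []
        cur = self.head
        while cur is not self.tail:
            cur = cur.next
            items.append(cur.item)
        self.head = self.tail
        return items
```

A global slice and a DWPT slice created on the same queue both observe every node appended after their creation, but each drains at its own pace; a slice created later never sees nodes appended before it existed.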
DWPT DeleteSlice
Used to apply deletes to the docs in the DWPT's own unflushed segment whose docID is less than docIdUpTo.
buffer
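When IndexWriter.updateDocuments updates docs, the update term is turned into a delete TermNode.
In DWPT.updateDocuments, docsInRamBefore is recorded before consumer.processDocument runs and is used as the delete's docIdUpTo.
consumer.processDocument writes the new docs into the buffered (in-memory) segment. Since an update is a delete plus an add, the caller (e.g. Elasticsearch) must pass in the complete docs.
In finishDocuments, the deleteNode is added to both the DWPT DeleteSlice and the global DeleteSlice.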
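The ordering above (record docsInRamBefore, index the new doc, then buffer the delete with that value as docIdUpTo) can be sketched with a toy writer. `ToyDWPT` and its fields are hypothetical and heavily simplified: a doc is a plain dict, and a buffered delete is a `(field, text, docIdUpTo)` tuple.

```python
from typing import Dict, List, Tuple

Doc = Dict[str, str]


class ToyDWPT:
    """Hypothetical, heavily simplified stand-in for a DocumentsWriterPerThread."""

    def __init__(self) -> None:
        self.docs: List[Doc] = []  # in-RAM docs; list index == docID
        self.pending_deletes: List[Tuple[str, str, int]] = []  # (field, text, docIdUpTo)

    def update_document(self, field: str, text: str, doc: Doc) -> None:
        # 1. Record how many docs were in RAM before this update (docsInRamBefore).
        docs_in_ram_before = len(self.docs)
        # 2. Index the complete new doc; it receives docID == docs_in_ram_before.
        self.docs.append(doc)
        # 3. Buffer the delete term with docIdUpTo = docs_in_ram_before, so it can
        #    only hit older docs in this segment, never the doc just added.
        self.pending_deletes.append((field, text, docs_in_ram_before))
```

After updating `id:1` twice, the second buffered delete has docIdUpTo = 1, so it can remove docID 0 (the old version) but not docID 1 (the new one); the first delete, with docIdUpTo = 0, matches nothing in this segment.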
applyDeletes
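When the DWPT flushes this segment, it passes in the segment's BufferedUpdates and builds the FrozenBufferedUpdates private to that segment.
FreqProxTermsWriter.applyDeletes reads the postings of each delete term and applies the pendingUpdates to docs whose docID is less than docIDUpTo: the corresponding bits in liveDocs are set to 0.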
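A minimal sketch of this flush-time step, assuming the in-RAM postings are a map from term to ascending docIDs and the buffered deletes are a map from term to docIdUpTo. The function name `apply_deletes_at_flush` is hypothetical; it only models the "walk postings, stop at docIdUpTo, clear liveDocs bits" logic, not Lucene's actual code.

```python
from typing import Dict, List, Tuple

Term = Tuple[str, str]  # (field, text)


def apply_deletes_at_flush(
    max_doc: int,
    postings: Dict[Term, List[int]],
    delete_terms: Dict[Term, int],
) -> List[bool]:
    """Build liveDocs for a segment being flushed (toy model).

    postings:     term -> ascending docIDs, as held in RAM by the indexing chain
    delete_terms: term -> docIdUpTo recorded when the delete was buffered
    """
    live_docs = [True] * max_doc
    for term, doc_id_up_to in delete_terms.items():
        for doc_id in postings.get(term, []):
            if doc_id >= doc_id_up_to:
                # Postings are sorted, so every remaining doc was added after the delete.
                break
            live_docs[doc_id] = False  # clear this doc's liveDocs bit
    return live_docs
```

Because postings are sorted by docID, the loop can stop at the first doc at or beyond docIdUpTo instead of scanning the whole list.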

global DeleteSlice
Used to apply deletes to flushed segments.
buffer
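When IndexWriter.deleteDocuments (by Terms or by Queries) or updateDocValues is called,
the resulting TermArrayNode, QueryArrayNode, or DocValuesUpdatesNode is added directly to the global DeleteSlice.
The delete TermNode created by IndexWriter.updateDocuments is added to both the DWPT DeleteSlice and the global DeleteSlice.
In the global DeleteSlice, the deleteNode's docIdUpTo is MAX_INT (Integer.MAX_VALUE), so it is not limited to any docID range.
When the buffer grows too large, or during prepareFlush, globalBufferedUpdates is frozen into a global FrozenBufferedUpdates.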
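The global buffer can be sketched as follows, under stated simplifications: `GlobalBufferedUpdates` and `FrozenPacket` are hypothetical toy names, a query is represented by a placeholder string, and the "buffer too large" trigger is an item count (`max_buffered`), whereas Lucene tracks RAM usage. The point shown is that every global entry carries docIdUpTo = MAX_INT and that freezing produces an immutable snapshot and empties the buffer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MAX_INT = 2**31 - 1  # same value as Java's Integer.MAX_VALUE

Term = Tuple[str, str]  # (field, text)


@dataclass(frozen=True)
class FrozenPacket:
    """Immutable snapshot of the global buffer (toy FrozenBufferedUpdates)."""
    term_deletes: Tuple[Tuple[Term, int], ...]
    query_deletes: Tuple[Tuple[str, int], ...]
    dv_updates: Tuple[Tuple[Term, str, int], ...]  # (term, field, new value)


class GlobalBufferedUpdates:
    def __init__(self, max_buffered: int) -> None:
        self.max_buffered = max_buffered  # hypothetical threshold; Lucene uses RAM accounting
        self._clear()

    def _clear(self) -> None:
        self.term_deletes: Dict[Term, int] = {}
        self.query_deletes: Dict[str, int] = {}
        self.dv_updates: List[Tuple[Term, str, int]] = []

    def size(self) -> int:
        return len(self.term_deletes) + len(self.query_deletes) + len(self.dv_updates)

    # Global entries are not tied to any in-RAM segment, so docIdUpTo = MAX_INT.
    def delete_term(self, term: Term) -> Optional[FrozenPacket]:
        self.term_deletes[term] = MAX_INT
        return self._maybe_freeze()

    def delete_query(self, query: str) -> Optional[FrozenPacket]:
        self.query_deletes[query] = MAX_INT
        return self._maybe_freeze()

    def update_doc_value(self, term: Term, field: str, value: int) -> Optional[FrozenPacket]:
        self.dv_updates.append((term, field, value))
        return self._maybe_freeze()

    def _maybe_freeze(self) -> Optional[FrozenPacket]:
        # "Buffer too large": freeze automatically once the threshold is reached.
        if self.size() < self.max_buffered:
            return None
        return self.freeze()

    def freeze(self) -> FrozenPacket:
        # Also what a prepareFlush-style caller would invoke directly.
        packet = FrozenPacket(
            tuple(self.term_deletes.items()),
            tuple(self.query_deletes.items()),
            tuple(self.dv_updates),
        )
        self._clear()
        return packet
```

With `max_buffered=3`, a term delete and a query delete return `None`, and the third entry (a doc-values update) triggers the freeze and returns a packet holding all three.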
applyDeletes
After IndexWriter flushes a segment, applyAllDeletesAndUpdates
applies the global FrozenBufferedUpdates through FrozenBufferedUpdates.applyTermDeletes and FrozenBufferedUpdates.applyQueryDeletes.
FrozenBufferedUpdates.applyTermDeletes
Delete by terms: deletes the docs that contain the exact terms.
It iterates over the segments, reads each term's postings directly, and walks the postings to delete the matching docs.
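A toy version of delete-by-term over already-flushed segments, assuming each segment is an inverted index (term to docIDs) plus a liveDocs list. `ToySegment` and `apply_term_deletes` are hypothetical names; Lucene looks each term up in the segment's terms dictionary, which this sketch replaces with a dict lookup. No analysis happens, so only byte-for-byte equal terms match.

```python
from typing import Dict, List, Tuple

Term = Tuple[str, str]  # (field, text)


class ToySegment:
    """Hypothetical flushed segment: an inverted index plus a liveDocs list."""

    def __init__(self, docs: List[Dict[str, str]]) -> None:
        self.live_docs = [True] * len(docs)
        self.postings: Dict[Term, List[int]] = {}
        for doc_id, doc in enumerate(docs):
            for field, text in doc.items():
                self.postings.setdefault((field, text), []).append(doc_id)


def apply_term_deletes(segments: List[ToySegment], terms: List[Term]) -> int:
    """Delete every doc containing one of the exact terms; return the number newly deleted."""
    deleted = 0
    for seg in segments:  # visit each flushed segment
        for term in terms:
            for doc_id in seg.postings.get(term, []):  # direct postings lookup, no analysis
                if seg.live_docs[doc_id]:
                    seg.live_docs[doc_id] = False
                    deleted += 1
    return deleted
```

Deleting `("id", "1")` from two segments clears that doc in both, while a near-miss such as `("id", "2 ")` (trailing space) deletes nothing.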
FrozenBufferedUpdates.applyQueryDeletes
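Delete by queries goes through the regular query path: the query is rewritten and matched against each segment, so queries such as PhraseQuery are evaluated just as they would be in a search.
For a TermQuery, this is equivalent to applyTermDeletes.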
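A sketch of delete-by-query, assuming a query can be modeled as a per-document match predicate. Roughly speaking, Lucene builds a Weight and iterates a Scorer over each segment; here that is replaced by a plain loop over the docs. `term_query`, `phrase_query` and `apply_query_deletes` are hypothetical helpers, and the phrase matcher simply splits on whitespace.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]
Query = Callable[[Doc], bool]  # toy stand-in for a Lucene Query: a per-doc match predicate


def term_query(field: str, text: str) -> Query:
    # Exact match on one field value, i.e. the same docs a term delete would hit.
    return lambda doc: doc.get(field) == text


def phrase_query(field: str, words: List[str]) -> Query:
    # Toy phrase match: the words appear consecutively in the whitespace-split field.
    def matches(doc: Doc) -> bool:
        tokens = doc.get(field, "").split()
        n = len(words)
        return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))
    return matches


def apply_query_deletes(segments: List[List[Doc]], live: List[List[bool]], queries: List[Query]) -> int:
    """Clear liveDocs for every live doc matched by any query; return the number deleted."""
    deleted = 0
    for seg_docs, seg_live in zip(segments, live):
        for query in queries:
            for doc_id, doc in enumerate(seg_docs):  # stands in for iterating a Scorer
                if seg_live[doc_id] and query(doc):
                    seg_live[doc_id] = False
                    deleted += 1
    return deleted
```

The phrase query `"quick brown"` deletes only the doc where the two words are adjacent and in order; a term query on `id` then behaves exactly like a delete-by-term on that field.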
FrozenBufferedUpdates.

References:
Lucene 8.7.0
posted @ 2021-02-28 19:58  vsop_479