A Lucene segment contains all of the index files, such as .tim and .tip, and can be regarded as a mini, independent index.

A Lucene index segment can be viewed as a "mini" index or a shard. Each segment is a collection of all the files needed for an index, including .tim and .tip. If you list your Lucene index directory, you'll see that files belonging to the same segment share the same name prefix and differ only in their type (extension). In fact, if you force a merge, you'll end up with an index consisting of a single segment.
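As a rough illustration of the naming convention, the sketch below groups a directory listing by segment prefix. The file names in `listing` are made-up examples (the codec suffix such as `Lucene90` varies by version), and `group_by_segment` is a hypothetical helper, not part of Lucene.

```python
import re
from collections import defaultdict

# Per-segment files start with "_<segment name>", optionally followed by
# "_<suffix>", then ".<extension>". Files such as "segments_N" and
# "write.lock" belong to the index as a whole and do not match.
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)(?:_[^.]*)?\.(\w+)$")

def group_by_segment(filenames):
    """Map each segment prefix to the sorted list of file extensions it owns."""
    groups = defaultdict(set)
    for name in filenames:
        match = SEGMENT_FILE.match(name)
        if match:
            groups[match.group(1)].add(match.group(2))
    return {segment: sorted(exts) for segment, exts in groups.items()}

# Hypothetical directory listing, for illustration only.
listing = [
    "_0.si", "_0.fdt", "_0_Lucene90_0.tim", "_0_Lucene90_0.tip",
    "_1.si", "_1_Lucene90_0.tim", "_1_Lucene90_0.tip",
    "segments_2", "write.lock",
]
segments = group_by_segment(listing)
for segment, extensions in segments.items():
    print(segment, extensions)
```

Each key in `segments` corresponds to one "mini" index with its own term dictionary (.tim) and term index (.tip).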

Each segment contains an index of a subset of your document collection. Lucene usually creates a new segment when new documents are added to a working index, which avoids (or rather delays and batches) the cost of reindexing.
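The idea can be sketched with a toy model. `ToyIndexWriter` and its `max_buffered_docs` parameter are invented for illustration and greatly simplify what Lucene actually does: new documents are buffered, written out as a new immutable segment, and only combined with older segments when a merge runs.

```python
class ToyIndexWriter:
    """Toy model: buffered documents are flushed into new immutable segments."""

    def __init__(self, max_buffered_docs=2):
        self.max_buffered_docs = max_buffered_docs  # assumed flush threshold
        self.buffer = []
        self.segments = []  # each segment is an immutable tuple of documents

    def add_document(self, doc):
        self.buffer.append(doc)
        if len(self.buffer) >= self.max_buffered_docs:
            self.flush()

    def flush(self):
        # Existing segments are never rewritten here; a new one is appended.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def force_merge(self):
        # The deferred, batched cost: rewrite everything into one segment.
        self.flush()
        merged = tuple(doc for segment in self.segments for doc in segment)
        self.segments = [merged] if merged else []

writer = ToyIndexWriter(max_buffered_docs=2)
for text in ["a b", "b c", "c d", "d e", "e f"]:
    writer.add_document(text)
writer.flush()
segment_count_before = len(writer.segments)
writer.force_merge()
segment_count_after = len(writer.segments)
print(segment_count_before, segment_count_after)
```

Adding documents only ever appends segments; the expensive rewrite happens once, at merge time.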

When a search is executed, Lucene fans the query out over all segments and combines the index-wide statistics required for relevance ranking (such as idf), so from the client's perspective the ranking is the same as when searching an index with a single segment. Note that the other well-known statistic, tf, is per-document, so it is already available at the segment reader layer.
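A minimal sketch of why combining statistics makes the segment layout invisible to ranking: document counts and document frequencies are summed across segments before idf is computed. The function names and sample documents are assumptions for illustration, and the formula used is the BM25-style idf, ln(1 + (N - n + 0.5) / (n + 0.5)).

```python
import math

def segment_stats(docs, term):
    """Return (document count, document frequency of term) for one segment."""
    doc_count = len(docs)
    doc_freq = sum(1 for doc in docs if term in doc.split())
    return doc_count, doc_freq

def idf(doc_count, doc_freq):
    # BM25-style idf computed from index-wide counts.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def combined_idf(segments, term):
    """Sum per-segment counts first, then compute a single idf value."""
    total_docs = 0
    total_doc_freq = 0
    for docs in segments:
        doc_count, doc_freq = segment_stats(docs, term)
        total_docs += doc_count
        total_doc_freq += doc_freq
    return idf(total_docs, total_doc_freq)

# Hypothetical two-segment index and the same documents merged into one.
segments = [
    ["lucene index", "search engine"],
    ["lucene segment", "lucene merge", "term dictionary"],
]
merged = [[doc for docs in segments for doc in docs]]

multi = combined_idf(segments, "lucene")
single = combined_idf(merged, "lucene")
print(multi == single)
```

Because only the summed counts enter the formula, the two-segment index and the merged index produce the same idf for the term.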

Now things get more interesting when you have Lucene indexes spread across machines (as is the case in Solr Cloud, one of the distributed search services built on Lucene). For reasons of performance and complexity, Solr Cloud doesn't aggregate global stats across the cluster (yet), so each machine uses its own stats for the index it holds (which may itself consist of multiple segments :).
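To see what this means in practice, the sketch below compares shard-local idf with the value a globally aggregated index would give. The shard names and documents are invented, and the BM25-style idf formula is an assumption about the similarity in use.

```python
import math

def bm25_idf(doc_count, doc_freq):
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def local_idf(docs, term):
    """idf computed only from the documents held by one machine."""
    doc_freq = sum(1 for doc in docs if term in doc.split())
    return bm25_idf(len(docs), doc_freq)

# Hypothetical cluster where the term is unevenly distributed.
shards = {
    "shard1": ["lucene index", "lucene segment", "lucene merge"],
    "shard2": ["search engine", "term dictionary", "lucene query"],
}

local = {name: local_idf(docs, "lucene") for name, docs in shards.items()}
all_docs = [doc for docs in shards.values() for doc in docs]
global_value = local_idf(all_docs, "lucene")

for name, value in local.items():
    print(name, round(value, 3))
print("global", round(global_value, 3))
```

With skewed data the same term is weighted differently on each machine, so scores from different shards are not strictly comparable; with evenly distributed documents the local values tend to stay close to the global one.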

 

Source: https://www.quora.com/Are-the-individual-tim-and-tip-files-term-dictionaries-of-a-Lucene-index-segment-updated-when-a-new-segment-is-added-to-Lucene

posted @ bonelee