How a Document is retrieved in Lucene.Net when the index is stored with FSDirectory

The best way to avoid forgetting something is to write it down.

Here is about the simplest search code possible:

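using System.IO;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public void Search()
{
    // Open the on-disk index ("xxx" is the index folder).
    var dir = FSDirectory.Open(new DirectoryInfo("xxx"));
    // true = open the underlying IndexReader read-only.
    var searcher = new IndexSearcher(dir, true);
    var query = new TermQuery(new Term("Title", "jinzhao"));
    // Search returns a TopDocs; the hits themselves are in its ScoreDocs array.
    var tops = searcher.Search(query, 100);
    // Lucene.Net 3.x names; older 2.9.x releases expose the fields scoreDocs and doc.
    foreach (var top in tops.ScoreDocs)
    {
        var doc = searcher.Doc(top.Doc);
        Output(doc);
    }
}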

The call to searcher.Doc(...) returns a complete Document. The document comes from the IndexReader (Lucene.Net.Index.IndexReader) held inside the searcher, through the following method:

public abstract Document Document(int n, FieldSelector fieldSelector);

IndexReader has several implementations. The ones that matter here relate to each other as follows:

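MultiReader and ParallelReader each maintain a collection of IndexReaders (these may be any of the implementations below, though normally not a bare SegmentReader) and encapsulate access to multiple readers. MultiReader uses the offset scheme that is common throughout Lucene: each sub-reader is given a starting document number, and a global document ID is translated into a sub-reader plus a local ID by subtracting that start. ParallelReader differs in that its sub-readers share the same document numbering, and it merges the fields that each of them stores for a given ID.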

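DirectoryReader and the other implementations apart from SegmentReader model a directory, much like the index folder itself. DirectoryReader maintains a set of SegmentReaders and addresses them with the same offset scheme.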
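The offset scheme can be sketched as follows. This is a minimal illustration rather than Lucene.Net's own code: the class DocIdOffsets and its methods BuildStarts, ReaderIndex and LocalId are hypothetical names, and the only input assumed is the number of documents in each sub-reader.

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net: maps a global document ID
// to (sub-reader index, local document ID) using per-reader start offsets.
public static class DocIdOffsets
{
    // starts[i] is the global ID of the first document in sub-reader i;
    // the extra final entry holds the total document count.
    public static int[] BuildStarts(int[] maxDocs)
    {
        var starts = new int[maxDocs.Length + 1];
        for (int i = 0; i < maxDocs.Length; i++)
        {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
        return starts;
    }

    // Binary search for the last sub-reader i with starts[i] <= globalId.
    public static int ReaderIndex(int globalId, int[] starts)
    {
        int total = starts[starts.Length - 1];
        if (globalId < 0 || globalId >= total)
        {
            throw new ArgumentOutOfRangeException(nameof(globalId));
        }

        int lo = 0, hi = starts.Length - 2;
        while (lo < hi)
        {
            int mid = (lo + hi + 1) / 2; // round up so the loop always makes progress
            if (starts[mid] <= globalId) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    // The local ID is the global ID minus the owning sub-reader's start offset.
    public static int LocalId(int globalId, int[] starts)
    {
        return globalId - starts[ReaderIndex(globalId, starts)];
    }
}
```

With sub-readers holding 3, 5 and 2 documents, the start offsets are 0, 3 and 8, so global ID 4 resolves to sub-reader 1 with local ID 1.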

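SegmentReader is the smallest unit for reading documents and does not maintain any child IndexReaders. Once it receives a document ID, it reads that document's fields through public sealed class FieldsReader (the document is the core of Lucene, and a document is made up of a number of fields). Several loading modes are available here, including loading everything immediately, loading only specified fields immediately, and lazy loading. The method is as follows: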

public /*internal*/ Document Doc(int n, FieldSelector fieldSelector)
{
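    // Position indexStream (the .fdx index file) at the entry for document n.
    SeekIndex(n);
    // That entry holds the document's start position in fieldsStream (the .fdt data file).
    long position = indexStream.ReadLong();
    fieldsStream.Seek(position);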
     
    Document doc = new Document();
    int numFields = fieldsStream.ReadVInt();
    for (int i = 0; i < numFields; i++)
    {
        int fieldNumber = fieldsStream.ReadVInt();
        FieldInfo fi = fieldInfos.FieldInfo(fieldNumber);
        FieldSelectorResult acceptField = fieldSelector == null?FieldSelectorResult.LOAD:fieldSelector.Accept(fi.name);
         
        byte bits = fieldsStream.ReadByte();
        System.Diagnostics.Debug.Assert(bits <= FieldsWriter.FIELD_IS_COMPRESSED + FieldsWriter.FIELD_IS_TOKENIZED + FieldsWriter.FIELD_IS_BINARY);
         
        bool compressed = (bits & FieldsWriter.FIELD_IS_COMPRESSED) != 0;
        bool tokenize = (bits & FieldsWriter.FIELD_IS_TOKENIZED) != 0;
        bool binary = (bits & FieldsWriter.FIELD_IS_BINARY) != 0;
        //TODO: Find an alternative approach here if this list continues to grow beyond the
        //list of 5 or 6 currently here.  See Lucene 762 for discussion
        if (acceptField.Equals(FieldSelectorResult.LOAD))
        {
            AddField(doc, fi, binary, compressed, tokenize);
        }
        else if (acceptField.Equals(FieldSelectorResult.LOAD_FOR_MERGE))
        {
            AddFieldForMerge(doc, fi, binary, compressed, tokenize);
        }
        else if (acceptField.Equals(FieldSelectorResult.LOAD_AND_BREAK))
        {
            AddField(doc, fi, binary, compressed, tokenize);
            break; //Get out of this loop
        }
        else if (acceptField.Equals(FieldSelectorResult.LAZY_LOAD))
        {
            AddFieldLazy(doc, fi, binary, compressed, tokenize);
        }
        else if (acceptField.Equals(FieldSelectorResult.SIZE))
        {
            SkipField(binary, compressed, AddFieldSize(doc, fi, binary, compressed));
        }
        else if (acceptField.Equals(FieldSelectorResult.SIZE_AND_BREAK))
        {
            AddFieldSize(doc, fi, binary, compressed);
            break;
        }
        else
        {
            SkipField(binary, compressed);
        }
    }
     
    return doc;
}
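The first three statements of Doc follow a two-file lookup: a fixed-width index entry for document n gives the position of its variable-length record in the data file. The sketch below illustrates that idea only. TinyStore is a hypothetical class, and its layout (BinaryWriter's little-endian Int64 and Int32 values plus length-prefixed strings) is an assumption made for brevity, not Lucene's actual .fdx/.fdt format.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical, simplified stand-in for an index-file/data-file pair.
// It is NOT Lucene's real stored-fields format.
public static class TinyStore
{
    // Appends each document's fields to `data` and records the 8-byte
    // start position of every document in `index`.
    public static void Write(string[][] docs, Stream index, Stream data)
    {
        var indexWriter = new BinaryWriter(index, Encoding.UTF8, true);
        var dataWriter = new BinaryWriter(data, Encoding.UTF8, true);
        foreach (var fields in docs)
        {
            dataWriter.Flush();
            indexWriter.Write(data.Position);   // long: pointer to this record
            dataWriter.Write(fields.Length);    // int: number of fields
            foreach (var field in fields)
            {
                dataWriter.Write(field);        // length-prefixed string
            }
        }
        indexWriter.Flush();
        dataWriter.Flush();
    }

    // Reads document n: seek to its fixed-width index entry, read the
    // pointer, seek into the data stream, then read the fields.
    public static string[] Read(int n, Stream index, Stream data)
    {
        var indexReader = new BinaryReader(index, Encoding.UTF8, true);
        var dataReader = new BinaryReader(data, Encoding.UTF8, true);

        index.Seek((long)n * sizeof(long), SeekOrigin.Begin);
        long position = indexReader.ReadInt64();
        data.Seek(position, SeekOrigin.Begin);

        int numFields = dataReader.ReadInt32();
        var fields = new string[numFields];
        for (int i = 0; i < numFields; i++)
        {
            fields[i] = dataReader.ReadString();
        }
        return fields;
    }
}
```

Because every index entry has the same width, locating document n costs one seek in each stream regardless of how many documents precede it or how large they are.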

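The fieldsStream and indexStream used above are IndexInput instances, and IndexInput is what performs the actual reads. Its implementations are usually public nested classes inside the storage class; the one used in this example is implemented as follows: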

public /*protected internal*/class SimpleFSIndexInput : BufferedIndexInput, System.ICloneable
{
 
    protected internal class Descriptor : System.IO.BinaryReader
    {
        // remember if the file is open, so that we don't try to close it
        // more than once
        protected internal volatile bool isOpen;
        internal long position;
        internal long length;
 
        public Descriptor(/*FSIndexInput enclosingInstance,*/ System.IO.FileInfo file, System.IO.FileAccess mode)
            : base(new System.IO.FileStream(file.FullName, System.IO.FileMode.Open, mode, System.IO.FileShare.ReadWrite))
        {
            isOpen = true;
            length = file.Length;
        }
 
        public override void Close()
        {
            if (isOpen)
            {
                isOpen = false;
                base.Close();
            }
        }
 
        ~Descriptor()
        {
            try
            {
                Close();
            }
            finally
            {
            }
        }
    }

    // ... remaining members of SimpleFSIndexInput omitted
}

As you can see, the fields are ultimately read from the file by System.IO.BinaryReader.

The end.

posted @ today4king