Lucene.Net Beginner Tutorial and Examples

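I came across this excellent introductory Lucene.Net tutorial and am reposting it here for everyone to learn from.
I hope you will find it useful in your day-to-day work.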

I. A simple example

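// Required namespaces
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Indexing
private void Index()
{
    // Write the index to E:\Index, analyzing text with StandardAnalyzer.
    IndexWriter writer = new IndexWriter(@"E:\Index", new StandardAnalyzer());
    Document doc = new Document();
    // Store the original text and tokenize it so that it can be searched.
    doc.Add(new Field("Text", "哦耶,美丽的姑娘。", Field.Store.YES, Field.Index.TOKENIZED));
    writer.AddDocument(doc);
    writer.Close();
}

// Searching
private void Search(string words)
{
    IndexSearcher searcher = new IndexSearcher(@"E:\Index");
    // Parse the input against the "Text" field with the same analyzer used for indexing.
    Query query = new QueryParser("Text", new StandardAnalyzer()).Parse(words);
    Hits hits = searcher.Search(query);
    for (int i = 0; i < hits.Length(); i++)
        System.Console.WriteLine(hits.Doc(i).GetField("Text").StringValue());
    searcher.Close();
}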

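II. Getting to know Lucene
1. What is Lucene
Lucene is a high-performance, scalable information retrieval toolkit. It is just a Java class library, not a ready-made application (Lucene.Net is its port to .NET). It offers a simple yet powerful API on which you can quickly build capable search programs (search engines?). At the time of writing, the latest version is 2.9.2.1.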

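2. What is an index
To make searches fast, Lucene first stores the data to be processed in a data structure called an inverted index. How should an inverted index be understood? Put simply, it does not answer the question "which words does this document contain?"; it is optimized to quickly answer "which documents contain the word XX?". Just as a book needs a catalog compiled for quick lookup, Lucene has to build an optimized index file for the data to be searched, and this process is called indexing.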
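
To make the idea concrete, here is a minimal sketch of an inverted index in plain C#. It does not use Lucene.Net at all; the class name TinyInvertedIndex and its methods are made up for illustration, and the "analysis" is just lower-casing and splitting on a few separators. Lucene's real index format is far more elaborate.

```csharp
using System;
using System.Collections.Generic;

// Toy inverted index: maps each term to the set of document ids containing it.
// Illustration only; all names here are hypothetical, not Lucene.Net API.
public static class TinyInvertedIndex
{
    // Builds term -> sorted set of document ids from an array of texts.
    // The document id is simply the position in the array.
    public static Dictionary<string, SortedSet<int>> Build(string[] docs)
    {
        var index = new Dictionary<string, SortedSet<int>>();
        for (int docId = 0; docId < docs.Length; docId++)
        {
            // Naive analysis: lower-case, then split on spaces and basic punctuation.
            string[] terms = docs[docId].ToLowerInvariant().Split(
                new[] { ' ', ',', '.', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string term in terms)
            {
                if (!index.TryGetValue(term, out SortedSet<int> postings))
                {
                    postings = new SortedSet<int>();
                    index[term] = postings;
                }
                postings.Add(docId);
            }
        }
        return index;
    }

    // Answers "which documents contain this term?" with a single dictionary lookup.
    public static int[] Lookup(Dictionary<string, SortedSet<int>> index, string term)
    {
        if (index.TryGetValue(term.ToLowerInvariant(), out SortedSet<int> postings))
        {
            var result = new int[postings.Count];
            postings.CopyTo(result);
            return result;
        }
        return new int[0];
    }
}
```

For example, after building the index from the two texts "Hello world" and "hello Lucene", looking up "hello" returns both document ids (0 and 1) without scanning either text again.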

3. Lucene's core classes
Indexing process:
IndexWriter, Directory, Analyzer, Document, Field
Search process:
IndexSearcher, Term, Query, TermQuery, Hits

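III. Indexing
1. Flowchart of the indexing process:

Note: Lucene's indexing process consists of three main stages: converting the data to text, analyzing the text, and saving the analyzed text to the index.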
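
The three stages in the note above can be sketched in plain C# as three small functions. This is only an illustration under simplifying assumptions: the class IndexingStages and its method names are hypothetical, the "raw data" is assumed to be HTML-like markup, and the analysis keeps only lower-cased ASCII words. In Lucene.Net the second stage is the job of an Analyzer and the third is handled by IndexWriter.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper showing the three indexing stages one after another.
public static class IndexingStages
{
    // Stage 1: convert the raw data to plain text (here: strip HTML-like tags).
    public static string ExtractText(string rawHtml)
    {
        return Regex.Replace(rawHtml, "<[^>]+>", " ");
    }

    // Stage 2: analyze the text into normalized terms (lower-cased ASCII words).
    public static List<string> Analyze(string text)
    {
        var terms = new List<string>();
        foreach (Match m in Regex.Matches(text.ToLowerInvariant(), "[a-z0-9]+"))
            terms.Add(m.Value);
        return terms;
    }

    // Stage 3: save the analyzed terms to the index (term -> document ids).
    public static void Store(Dictionary<string, List<int>> index, int docId, List<string> terms)
    {
        foreach (string term in terms)
        {
            if (!index.TryGetValue(term, out List<int> ids))
            {
                ids = new List<int>();
                index[term] = ids;
            }
            if (!ids.Contains(docId))
                ids.Add(docId);
        }
    }
}
```

Keeping the stages separate mirrors how Lucene lets you swap the analyzer without touching the code that extracts text or writes the index.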

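2. Basic index operations
2.1 Adding to the index
Document
Field (understand Field's parameters)
Heterogeneous Documents
Appending fields
Incremental indexing
2.2 Deleting from the index
Deletion is soft: documents are only marked as deleted. They are physically removed once IndexWriter.Optimize() is called.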

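IndexReader reader = IndexReader.Open(directory);

// Delete the Document with the given number (DocId).
// Note: in Lucene.Net 2.x these two methods are named DeleteDocument(int) and DeleteDocuments(Term).
reader.Delete(123);

// Delete every Document that contains the given Term (the first argument is the field name).
reader.Delete(new Term(FieldValue, "Hello"));

// Undo all soft deletions.
reader.UndeleteAll();

reader.Close();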

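2.3 Updating the index
In fact, Lucene has no method for updating a document in the index.
Update = delete + add
Tip: when deleting and adding multiple Document objects, it is best to batch the operations, performing all the deletions first and then all the additions. This is always much faster than alternating delete and add operations.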

// Simply set the create parameter to false to add new data to an existing index.
Directory directory = FSDirectory.GetDirectory("index", false);
IndexWriter writer = new IndexWriter(directory, analyzer, false);
writer.AddDocument(doc1);
writer.AddDocument(doc2);
writer.Optimize();
writer.Close();
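
Putting "update = delete + add" and the batching tip together, a rough sketch might look like the following. It assumes the Lucene.Net 2.x API (where the deletion method is named DeleteDocuments), and it assumes every document carries a unique, untokenized "Id" field. Both the field name and the IndexUpdater class are made up for this illustration.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical helper: replaces a batch of documents in an existing index.
public static class IndexUpdater
{
    // Assumes each Document has a unique "Id" field indexed without tokenizing.
    public static void UpdateBatch(string indexPath, Document[] newDocs)
    {
        // create = false: open the existing index instead of overwriting it.
        Directory directory = FSDirectory.GetDirectory(indexPath, false);

        // Pass 1: mark the old version of every document as deleted.
        IndexReader reader = IndexReader.Open(directory);
        foreach (Document doc in newDocs)
            reader.DeleteDocuments(new Term("Id", doc.Get("Id")));
        reader.Close();

        // Pass 2: append all new versions, then optimize to purge the deleted ones.
        IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), false);
        foreach (Document doc in newDocs)
            writer.AddDocument(doc);
        writer.Optimize();
        writer.Close();
    }
}
```

Doing all deletions through one reader and all additions through one writer avoids reopening the index for every document, which is where the speed-up from batching comes from.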

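3. Boosting
You can raise the weight (Boost) of a Document or a Field so that it ranks higher in search results. By default, search results are sorted by Document.Score, and the larger the value, the higher the rank. The default Boost is 1.
Score = Score * Boost
With the formula above, we can set different weights to influence the ranking.
The following example sets different weights according to VIP level.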