Lucene [Java]: a first look at the search framework

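        A conventional index maps documents to the words they contain, whereas an inverted index maps each word to the documents that contain it. Everyday searches are usually driven by keywords, so an inverted index fits this kind of lookup much better. That is why inverted indexes are so widely used today.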
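
To make the "word to document" mapping concrete, here is a minimal sketch in plain Java. It is not how Lucene stores its index; the class name `InvertedIndexSketch`, the `build` method, and the three sample documents are all made up for illustration, and tokenization is just lowercasing and splitting on whitespace.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexSketch
{
	// Build a map from each term to the sorted ids of the documents containing it.
	static Map<String, SortedSet<Integer>> build(String[] docs)
	{
		Map<String, SortedSet<Integer>> index = new TreeMap<>();
		for (int docId = 0; docId < docs.length; docId++)
		{
			// Naive tokenization (an assumption): lowercase, split on whitespace
			for (String term : docs[docId].toLowerCase().split("\\s+"))
			{
				if (term.isEmpty())
				{
					continue;
				}
				index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
			}
		}
		return index;
	}

	public static void main(String[] args)
	{
		String[] docs = { "a b c", "a c e", "e f" };
		Map<String, SortedSet<Integer>> index = build(docs);
		// Looking up a keyword is a single map access
		System.out.println(index.get("a")); // prints [0, 1]
		System.out.println(index.get("e")); // prints [1, 2]
	}
}
```

A keyword lookup goes straight to the list of matching documents instead of scanning every document, which is the property the paragraph above describes.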

BuildIndex

import java.io.File;
import java.io.IOException;
import java.util.Date;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.demo.FileDocument;
import org.apache.lucene.index.IndexWriter;
public class BuildIndex
{
	public static void main(String[] args)
	{
		// Start time, used for timing
		Date start = new Date();
		try
		{	
			// Create the index directory; true means create a new index
			IndexWriter writer = new IndexWriter("C:\\IndexDir", new SimpleAnalyzer(),true);
		
			// The text document to index
			File file = new File("C:\\IndexData.txt");

			// Add the document to the index, then optimize the index
			System.out.println("adding " + file);
			writer.addDocument(FileDocument.Document(file));
			writer.optimize();
			writer.close();
		} catch (IOException e)
		{
			e.printStackTrace();
		}
		// End time
		Date end = new Date();
		// Print the elapsed time in milliseconds
		System.out.print(end.getTime() - start.getTime());
		System.out.println(" total milliseconds");
	}
}
 
DoSearch
import org.apache.lucene.store.*;
import org.apache.lucene.document.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.queryParser.*;
class DoSearch
{
	public static void main(String[] args)
	{
		try
		{
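			// Keep the index in memory
			Directory directory = new RAMDirectory();
			// Create the analyzer, used for tokenization and so on
			Analyzer analyzer = new SimpleAnalyzer();
			// Index writer; true means create a new index
			IndexWriter writer = new IndexWriter(directory, analyzer, true);
			// Build the index from these sample documents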
			String[] docs =
			{
			  "a b c d e",
			  "a b c d e a b c d e",
			  "a b c d e f g h i j",
			  "a c e",
			  "e c a",
			  "a c e a c e",
			  "a c e a b c"
			};
			for (int j = 0; j < docs.length; j++)
			{
				Document d = new Document();
				d.add(Field.Text("contents", docs[j]));
				writer.addDocument(d);
			}
			writer.close();
			// Create the searcher
			Searcher searcher = new IndexSearcher(directory);
			// Query strings to run; the quotes make this a phrase query
			String[] queries = {
				"\"a c e\"",
			};

			// Result set object, initialized to null
			Hits hits = null;
			// Create the QueryParser, which tokenizes the query string
			QueryParser parser = new QueryParser("contents",analyzer);

			// Turn each query string into a Query object in turn
			for (int j = 0; j < queries.length; j++)
			{
				Query query = parser.parse(queries[j]);
				System.out.println("Query: " + query.toString("contents"));
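				// Run the query and get the result set
				hits = searcher.search(query);
				// Print the total number of matching documents
				System.out.println(hits.length() + " total results");
				// Print the contents of the matching documents (at most 10)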
				for (int i = 0 ; i < hits.length() && i < 10; i++)
				{
					Document d = hits.doc(i);
					System.out.println(i + " " + hits.score(i)+ " " +d.get("contents"));
			 	}
			}
			searcher.close();
		} catch (Exception e)
		{
			System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
		}
	}
}
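
The only query in DoSearch is the quoted string "a c e", which the query parser treats as a phrase: the three terms have to appear next to each other, in that order. As a rough check of which sample documents qualify, here is a plain Java sketch that does not use Lucene at all. The names `PhraseMatchSketch`, `containsPhrase`, and `matchingDocs` are made up for illustration, and splitting on whitespace only approximates what SimpleAnalyzer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PhraseMatchSketch
{
	// True if the terms of phrase occur consecutively and in order inside text.
	static boolean containsPhrase(String text, String phrase)
	{
		List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
		List<String> wanted = Arrays.asList(phrase.trim().split("\\s+"));
		return Collections.indexOfSubList(tokens, wanted) >= 0;
	}

	// Ids (array positions) of the documents that contain the phrase.
	static List<Integer> matchingDocs(String[] docs, String phrase)
	{
		List<Integer> ids = new ArrayList<>();
		for (int i = 0; i < docs.length; i++)
		{
			if (containsPhrase(docs[i], phrase))
			{
				ids.add(i);
			}
		}
		return ids;
	}

	public static void main(String[] args)
	{
		// Same sample documents as in DoSearch
		String[] docs =
		{
		  "a b c d e",
		  "a b c d e a b c d e",
		  "a b c d e f g h i j",
		  "a c e",
		  "e c a",
		  "a c e a c e",
		  "a c e a b c"
		};
		System.out.println(matchingDocs(docs, "a c e")); // prints [3, 5, 6]
	}
}
```

By this check three of the seven documents contain the phrase, so DoSearch should report 3 total results. Lucene lists its hits by score rather than by position in the array, so the order it prints may differ from the order here.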
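These two parts are probably the most basic pieces. Taken from the book <<征服ajax和lucene框架搜索>>.
I'll come back to it when I need it. The complete PDF can be downloaded from Baidu Wenku: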
http://wenku.baidu.com/search?word=%D5%F7%B7%FEajax%20lucene&lm=0&od=0

posted on 2010-08-21 09:06  amojry