Lucene [Java] search framework: first steps
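A conventional (forward) index maps documents to words: for each document, it records the words it contains. An inverted index does the reverse and maps words to documents: for each word, it records the documents that contain it. Everyday searches are almost always made by keyword. An inverted index answers a keyword query by looking the word up directly instead of scanning every document, so it fits this kind of retrieval much better. That is why inverted indexes are so widely used today.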
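To make the word-to-document idea concrete, here is a minimal plain-Java sketch. It does not use Lucene, and the class InvertedIndexDemo and its methods are made up for illustration. It assumes the documents are split on whitespace and lowercased. It maps each word to the sorted set of IDs (array indexes) of the documents containing it, so a keyword lookup is a single map access.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical demo class (not Lucene): a toy inverted index, word -> IDs of documents containing it.
class InvertedIndexDemo {

    // Builds the word -> document-ID mapping; document IDs are the array indexes.
    // Assumption: whitespace tokenization plus lowercasing is good enough for the demo.
    static Map<String, Set<Integer>> build(String[] docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int id = 0; id < docs.length; id++) {
            for (String word : docs[id].toLowerCase().split("\\s+")) {
                if (word.isEmpty()) continue; // leading whitespace yields an empty first piece
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Keyword lookup: one map access instead of scanning every document.
    static Set<Integer> lookup(Map<String, Set<Integer>> index, String word) {
        return index.getOrDefault(word.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        String[] docs = { "lucene is a search library", "java search engine", "lucene in java" };
        Map<String, Set<Integer>> index = build(docs);
        System.out.println(lookup(index, "search")); // prints [0, 1]
        System.out.println(lookup(index, "java"));   // prints [1, 2]
    }
}
```

Lucene's real index keeps much more per term (frequencies, positions, and so on), so treat this only as the core idea.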
BuildIndex
import java.io.File;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.demo.FileDocument;
import org.apache.lucene.index.IndexWriter;

public class BuildIndex {
    public static void main(String[] args) {
        // Start time, for measuring how long indexing takes
        Date start = new Date();
        try {
            // Create the index directory (true = create a new index, overwriting any existing one)
            IndexWriter writer = new IndexWriter("C:\\IndexDir", new SimpleAnalyzer(), true);
            // The text document to index
            File file = new File("C:\\IndexData.txt");
            // Add the document to the index, then optimize and close the index
            System.out.println("adding " + file);
            writer.addDocument(FileDocument.Document(file));
            writer.optimize();
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // End time; print the elapsed milliseconds
        Date end = new Date();
        System.out.print(end.getTime() - start.getTime());
        System.out.println(" total milliseconds");
    }
}
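BuildIndex passes the text through SimpleAnalyzer before indexing. As I understand it, SimpleAnalyzer splits text at every non-letter character and lowercases the pieces. The plain-Java sketch below imitates that behavior so you can see which terms would end up in the index. SimpleAnalyzerSketch is a hypothetical name, not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical imitation of SimpleAnalyzer's tokenizing (not Lucene code):
// break at every non-letter character and lowercase each token.
class SimpleAnalyzerSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // Letters extend the current token, lowercased
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any non-letter (space, digit, punctuation) ends the current token
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Digits and punctuation are dropped; letters are lowercased.
        System.out.println(tokenize("Lucene 1.4: Hello, World!")); // prints [lucene, hello, world]
    }
}
```

CJK characters count as letters for Character.isLetter, so this sketch turns an unspaced run of Chinese text into a single token. Lucene's LetterTokenizer documentation warns about the same weakness for Asian languages, so Chinese text needs a different analyzer.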
DoSearch
import org.apache.lucene.store.*;
import org.apache.lucene.document.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.queryParser.*;

class DoSearch {
    public static void main(String[] args) {
        try {
            // Build the index in memory
            Directory directory = new RAMDirectory();
            // Analyzer used to tokenize the text
            Analyzer analyzer = new SimpleAnalyzer();
            // Index writer (true = create a new index)
            IndexWriter writer = new IndexWriter(directory, analyzer, true);
            // Index the documents: one Document with a "contents" field per string
            String[] docs = {
                "a b c d e",
                "a b c d e a b c d e",
                "a b c d e f g h i j",
                "a c e",
                "e c a",
                "a c e a c e",
                "a c e a b c"
            };
            for (int j = 0; j < docs.length; j++) {
                Document d = new Document();
                // Field.Text is the old Lucene 1.x API: stored, indexed and tokenized
                d.add(Field.Text("contents", docs[j]));
                writer.addDocument(d);
            }
            writer.close();

            // Searcher over the index just built
            Searcher searcher = new IndexSearcher(directory);
            // Query strings; the quotes make "a c e" a phrase query
            String[] queries = { "\"a c e\"" };
            // Result set, initially empty
            Hits hits = null;
            // QueryParser: parses query strings against the "contents" field, tokenizing with the same analyzer
            QueryParser parser = new QueryParser("contents", analyzer);
            // Turn each query string into a Query object and run it
            for (int j = 0; j < queries.length; j++) {
                Query query = parser.parse(queries[j]);
                System.out.println("Query: " + query.toString("contents"));
                // Run the search
                hits = searcher.search(query);
                // Print the total number of matching documents
                System.out.println(hits.length() + " total results");
                // Print the first (at most 10) matching documents with their scores
                for (int i = 0; i < hits.length() && i < 10; i++) {
                    Document d = hits.doc(i);
                    System.out.println(i + " " + hits.score(i) + " " + d.get("contents"));
                }
            }
            searcher.close();
        } catch (Exception e) {
            System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
        }
    }
}
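The query string in DoSearch is quoted, so QueryParser turns it into a phrase query. A document matches only if a, c and e appear next to each other, in that order. The sketch below shows one way to check this with a positional inverted index (term → document → positions) over the same seven documents. It is a plain-Java illustration under my own assumptions: whitespace tokenization and exact adjacency only. PhraseMatchSketch is a made-up name. Lucene's PhraseQuery also supports slop and scores the hits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not Lucene code) of exact phrase matching with a positional index:
// term -> (document ID -> positions of the term in that document).
class PhraseMatchSketch {

    static Map<String, Map<Integer, List<Integer>>> buildPositional(String[] docs) {
        Map<String, Map<Integer, List<Integer>>> index = new HashMap<>();
        for (int id = 0; id < docs.length; id++) {
            String[] words = docs[id].trim().split("\\s+");
            for (int pos = 0; pos < words.length; pos++) {
                index.computeIfAbsent(words[pos], k -> new TreeMap<>()) // TreeMap keeps doc IDs sorted
                     .computeIfAbsent(id, k -> new ArrayList<>())
                     .add(pos);
            }
        }
        return index;
    }

    // Returns IDs of documents where the phrase terms occur at consecutive positions.
    static List<Integer> matchPhrase(Map<String, Map<Integer, List<Integer>>> index, String[] phrase) {
        List<Integer> result = new ArrayList<>();
        Map<Integer, List<Integer>> first = index.get(phrase[0]);
        if (first == null) return result;
        for (Map.Entry<Integer, List<Integer>> e : first.entrySet()) {
            int doc = e.getKey();
            for (int start : e.getValue()) {
                // Term k of the phrase must sit exactly k positions after the first term
                boolean ok = true;
                for (int k = 1; k < phrase.length && ok; k++) {
                    Map<Integer, List<Integer>> postings = index.get(phrase[k]);
                    ok = postings != null && postings.containsKey(doc)
                            && postings.get(doc).contains(start + k);
                }
                if (ok) { result.add(doc); break; } // one occurrence is enough
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] docs = { "a b c d e", "a b c d e a b c d e", "a b c d e f g h i j",
                          "a c e", "e c a", "a c e a c e", "a c e a b c" };
        Map<String, Map<Integer, List<Integer>>> index = buildPositional(docs);
        System.out.println(matchPhrase(index, new String[] { "a", "c", "e" })); // prints [3, 5, 6]
    }
}
```

The printed numbers are array indexes. Document 0 ("a b c d e") contains all three terms but not side by side. Document 4 ("e c a") has them in the wrong order. Neither is a phrase match.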
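BuildIndex and DoSearch are probably the most basic building blocks. Both are taken from the book <<征服ajax和lucene框架搜索>> ("Conquering Ajax and Lucene Framework Search").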
I'll come back to the book when I need more. The full PDF can be downloaded from Baidu Wenku:
http://wenku.baidu.com/search?word=%D5%F7%B7%FEajax%20lucene&lm=0&od=0
*******************************
***** Never ever let you down. *****
*******************************