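Programming Notes: Lucene Highlighting Code
When we use a search engine such as Google or Baidu, the search terms are highlighted on the results page. The highlighting stands out and lets us quickly focus on the information we need.
Lucene's contrib packages already include modules that provide this feature.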
Highlighter
Code that highlights matches in search results:
public void testHits() throws Exception {
    IndexSearcher searcher = new IndexSearcher(TestUtil.getBookIndexDirectory());
    TermQuery query = new TermQuery(new Term("title", "action"));
    TopDocs hits = searcher.search(query, 10);

    // Score fragments against the query, restricted to the "title" field
    QueryScorer scorer = new QueryScorer(query, "title");
    Highlighter highlighter = new Highlighter(scorer);
    highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));

    Analyzer analyzer = new SimpleAnalyzer();

    for (int i = 0; i < hits.scoreDocs.length; i++) {
        Document doc = searcher.doc(hits.scoreDocs[i].doc);
        String title = doc.get("title");
        // Use stored term vectors if available, otherwise re-analyze the stored text
        TokenStream stream = TokenSources.getAnyTokenStream(
            searcher.getIndexReader(), hits.scoreDocs[i].doc, "title", doc, analyzer);
        String fragment = highlighter.getBestFragment(stream, title);
        System.out.println(fragment);
    }
}
// Output:
// JUnit in <B>Action</B>
// Lucene in <B>Action</B>
// Tapestry in <B>Action</B>
FastVectorHighlighter
As the name suggests, FastVectorHighlighter is a fast highlighting tool. Compared with Highlighter it has three advantages:
1. FastVectorHighlighter can support fields that are tokenized by n-gram tokenizers. Highlighter cannot support such fields very well.
2. FastVectorHighlighter can output highlights in different colors.
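3. FastVectorHighlighter can highlight a phrase as a unit. (For example, when searching for the phrase "lazy dog", FastVectorHighlighter produces <b>lazy dog</b>, whereas Highlighter tags each term separately: <b>lazy</b> <b>dog</b>.)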
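To make the third point concrete, here is a minimal plain-Java sketch of the difference between tagging each term and tagging a phrase as one unit. It does not use Lucene at all: the class name `PhraseHighlightSketch` and both methods are hypothetical, matching is done with regular expressions on the raw text, and only exact phrases are handled (no slop, no analysis), so treat it as an illustration of the output shape rather than of how either highlighter works internally.

```java
import java.util.regex.Pattern;

/** Plain-Java illustration (no Lucene) of per-term versus phrase-unit tagging. */
class PhraseHighlightSketch {

    /** Wraps each occurrence of any of the given terms in its own <b> tag. */
    static String highlightTerms(String text, String... terms) {
        if (terms.length == 0) {
            return text;
        }
        // Build one alternation so the text is scanned in a single pass.
        StringBuilder alternation = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(terms[i]));
        }
        Pattern p = Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
        return p.matcher(text).replaceAll("<b>$0</b>");
    }

    /** Wraps each exact occurrence of the whole phrase in a single <b> tag. */
    static String highlightPhrase(String text, String phrase) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(phrase) + "\\b", Pattern.CASE_INSENSITIVE);
        return p.matcher(text).replaceAll("<b>$0</b>");
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        System.out.println(highlightTerms(text, "lazy", "dog"));
        System.out.println(highlightPhrase(text, "lazy dog"));
    }
}
```

The first call tags "lazy" and "dog" separately, while the second wraps the two words in one tag, which is the "phrase unit" behavior described in point 3.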
FastVectorHighlighter sample code:
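// Imports assume the Lucene 3.0-era API that this sample is written against
import java.io.FileWriter;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.Field.TermVector;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder;
import org.apache.lucene.search.vectorhighlight.FastVectorHighlighter;
import org.apache.lucene.search.vectorhighlight.FieldQuery;
import org.apache.lucene.search.vectorhighlight.FragListBuilder;
import org.apache.lucene.search.vectorhighlight.FragmentsBuilder;
import org.apache.lucene.search.vectorhighlight.ScoreOrderFragmentsBuilder;
import org.apache.lucene.search.vectorhighlight.SimpleFragListBuilder;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class FastVectorHighlighterSample {

    static final String[] DOCS = {                                  // #A
        "the quick brown fox jumps over the lazy dog",              // #A
        "the quick gold fox jumped over the lazy black dog",        // #A
        "the quick fox jumps over the black dog",                   // #A
        "the red fox jumped over the lazy dark gray dog"            // #A
    };
    static final String QUERY = "quick OR fox OR \"lazy dog\"~1";   // #B
    static final String F = "f";
    static Directory dir = new RAMDirectory();
    static Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: FastVectorHighlighterSample <filename>");
            System.exit(-1);
        }
        makeIndex();                                                // #C
        searchIndex(args[0]);                                       // #D
    }

    static void makeIndex() throws IOException {
        IndexWriter writer = new IndexWriter(dir, analyzer, true, MaxFieldLength.LIMITED);
        for (String d : DOCS) {
            Document doc = new Document();
            doc.add(new Field(F, d, Store.YES, Index.ANALYZED,
                TermVector.WITH_POSITIONS_OFFSETS));                // #E
            writer.addDocument(doc);
        }
        writer.close();
    }

    static void searchIndex(String filename) throws Exception {
        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, F, analyzer);
        Query query = parser.parse(QUERY);
        FastVectorHighlighter highlighter = getHighlighter();       // #F
        FieldQuery fieldQuery = highlighter.getFieldQuery(query);   // #G
        IndexSearcher searcher = new IndexSearcher(dir);
        TopDocs docs = searcher.search(query, 10);

        FileWriter writer = new FileWriter(filename);
        writer.write("<html>");
        writer.write("<body>");
        writer.write("<p>QUERY : " + QUERY + "</p>");
        for (ScoreDoc scoreDoc : docs.scoreDocs) {
            String snippet = highlighter.getBestFragment(           // #H
                fieldQuery, searcher.getIndexReader(),              // #H
                scoreDoc.doc, F, 100);                              // #H
            if (snippet != null) {                                  // #I
                writer.write(scoreDoc.doc + " : " + snippet + "<br/>"); // #I
            }
        }
        writer.write("</body></html>");
        writer.close();
        searcher.close();
    }

    static FastVectorHighlighter getHighlighter() {
        FragListBuilder fragListBuilder = new SimpleFragListBuilder();  // #J
        FragmentsBuilder fragmentBuilder =                          // #K
            new ScoreOrderFragmentsBuilder(                         // #K
                BaseFragmentsBuilder.COLORED_PRE_TAGS,              // #K
                BaseFragmentsBuilder.COLORED_POST_TAGS);            // #K
        return new FastVectorHighlighter(true, true,                // #L
            fragListBuilder, fragmentBuilder);                      // #L
    }
}

#A Sample documents
#B Sample query
#C Build the index
#D Search and print the results
#E The field is indexed with Store.YES and TermVector.WITH_POSITIONS_OFFSETS
#F Get a FastVectorHighlighter instance
#G Create the FieldQuery
#H Highlight the best fragment
#I Write out the highlighted fragment
#J Create a SimpleFragListBuilder
#K Create a ScoreOrderFragmentsBuilder with multi-colored tags
#L Create the FastVectorHighlighter instance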
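The multi-colored tags passed to the fragments builder (#K) are what let hits for different parts of the query stand out from one another. The plain-Java sketch below only illustrates that idea and does not use Lucene: the class `ColoredTagSketch`, its `highlight` method, and the two tag strings are hypothetical choices for this example and are not copied from `BaseFragmentsBuilder.COLORED_PRE_TAGS`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java illustration (no Lucene) of giving each query term its own highlight color. */
class ColoredTagSketch {

    // Hypothetical tag set for illustration only; not Lucene's actual COLORED_PRE_TAGS values.
    static final String[] PRE_TAGS = {
        "<b style=\"background:yellow\">",
        "<b style=\"background:lightgreen\">"
    };
    static final String POST_TAG = "</b>";

    /** Wraps each matched term in the tag assigned to it; tags are reused cyclically. */
    static String highlight(String text, String... terms) {
        if (terms.length == 0) {
            return text;
        }
        Map<String, String> tagByTerm = new HashMap<>();
        StringBuilder alternation = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            // The i-th term gets the i-th tag, wrapping around when tags run out.
            tagByTerm.put(terms[i].toLowerCase(Locale.ROOT), PRE_TAGS[i % PRE_TAGS.length]);
            if (i > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(terms[i]));
        }
        Pattern p = Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String pre = tagByTerm.get(m.group().toLowerCase(Locale.ROOT));
            m.appendReplacement(out, Matcher.quoteReplacement(pre + m.group() + POST_TAG));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("the quick fox jumps over the black dog", "quick", "fox"));
    }
}
```

In this sketch "quick" is always tagged with the first color and "fox" with the second, so a reader scanning the result page can tell at a glance which part of the query each hit came from.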