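Programming Notes: Lucene Highlighting Code

When we use a search engine (such as Google or Baidu), the search terms are highlighted on the results page. The highlighting is eye-catching and lets us quickly focus on the information we need.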


Lucene's contrib area already includes modules that provide this functionality.

Highlighter

Code that highlights matches in search results:

public void testHits() throws Exception {
    IndexSearcher searcher = new IndexSearcher(TestUtil.getBookIndexDirectory());
    TermQuery query = new TermQuery(new Term("title", "action"));
    TopDocs hits = searcher.search(query, 10);

    // Score candidate fragments against the query, restricted to the "title" field.
    QueryScorer scorer = new QueryScorer(query, "title");
    Highlighter highlighter = new Highlighter(scorer);
    highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));

    Analyzer analyzer = new SimpleAnalyzer();
    for (int i = 0; i < hits.scoreDocs.length; i++) {
        Document doc = searcher.doc(hits.scoreDocs[i].doc);
        String title = doc.get("title");
        // Use term vectors if they were indexed; otherwise re-analyze the stored text.
        TokenStream stream = TokenSources.getAnyTokenStream(searcher.getIndexReader(),
                                                            hits.scoreDocs[i].doc,
                                                            "title",
                                                            doc,
                                                            analyzer);
        String fragment = highlighter.getBestFragment(stream, title);
        System.out.println(fragment);
    }
    searcher.close();
}
// Output:
// JUnit in <B>Action</B>
// Lucene in <B>Action</B>
// Tapestry in <B>Action</B>
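
To make the output above concrete without needing a Lucene index, here is a minimal plain-Java sketch of the visible effect: wrapping each whole-word, case-insensitive occurrence of a term in <B> tags. It is not how Highlighter works internally (Highlighter scores fragments over a TokenStream); the class name TermHighlightSketch and the method highlight are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: NOT Lucene's Highlighter, just its visible effect on a title.
// TermHighlightSketch and highlight() are hypothetical names.
final class TermHighlightSketch {

    // Wraps every whole-word, case-insensitive occurrence of `term` in <B>...</B>.
    static String highlight(String text, String term) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Keep the original casing of the matched text inside the tags.
            m.appendReplacement(out, Matcher.quoteReplacement("<B>" + m.group() + "</B>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] titles = { "JUnit in Action", "Lucene in Action", "Tapestry in Action" };
        for (String title : titles) {
            System.out.println(highlight(title, "action"));
        }
    }
}
```

The real Highlighter additionally splits long text into fragments and returns the best-scoring one, which does not matter for short titles like these.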

 

FastVectorHighlighter

As the name suggests, FastVectorHighlighter is a fast highlighting tool. Compared with Highlighter it has three advantages:

1. FastVectorHighlighter can support fields that are tokenized by n-gram tokenizers. Highlighter cannot support such fields very well.

2. FastVectorHighlighter can output highlights in multiple colors.

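3. FastVectorHighlighter can highlight a phrase as a unit. (For example, when searching for "lazy dog", FastVectorHighlighter produces <b>lazy dog</b>, whereas Highlighter tags the words separately, as <b>lazy</b> <b>dog</b>.)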
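
The second and third points are easier to see side by side. The following plain-Java sketch only mimics the visible difference in markup; it does not reproduce the internals of either highlighter. The class PhraseHighlightSketch, its methods, and the three-color palette are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: mimics the markup, not the internals of either highlighter.
// All names here and the COLORS palette are hypothetical.
final class PhraseHighlightSketch {

    static final String[] COLORS = { "yellow", "lawngreen", "aquamarine" };

    // Per-word style: every query word gets its own tag pair.
    static String perWord(String text, String... words) {
        StringBuilder alt = new StringBuilder();
        for (String w : words) {
            if (alt.length() > 0) alt.append('|');
            alt.append(Pattern.quote(w));
        }
        return text.replaceAll("\\b(?:" + alt + ")\\b", "<b>$0</b>");
    }

    // Phrase style: the whole phrase is wrapped once.
    static String perPhrase(String text, String phrase) {
        return text.replaceAll("\\b" + Pattern.quote(phrase) + "\\b", "<b>$0</b>");
    }

    // Multi-color style: the i-th query unit (word or phrase) gets the i-th color.
    static String colored(String text, String... units) {
        StringBuilder alt = new StringBuilder();
        for (String u : units) {
            if (alt.length() > 0) alt.append('|');
            alt.append('(').append(Pattern.quote(u)).append(')');
        }
        Matcher m = Pattern.compile("\\b(?:" + alt + ")\\b").matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int unit = 0;
            while (m.group(unit + 1) == null) unit++; // find which unit matched
            String pre = "<b style=\"background:" + COLORS[unit % COLORS.length] + "\">";
            m.appendReplacement(out, Matcher.quoteReplacement(pre + m.group() + "</b>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        System.out.println(perWord(text, "lazy", "dog"));
        System.out.println(perPhrase(text, "lazy dog"));
        System.out.println(colored(text, "quick", "fox", "lazy dog"));
    }
}
```

Running main prints the same sentence three times: once with lazy and dog tagged separately, once with lazy dog tagged as a single unit, and once with each query unit in its own background color.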


FastVectorHighlighter code:

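import java.io.FileWriter;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.Field.TermVector;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder;
import org.apache.lucene.search.vectorhighlight.FastVectorHighlighter;
import org.apache.lucene.search.vectorhighlight.FieldQuery;
import org.apache.lucene.search.vectorhighlight.FragListBuilder;
import org.apache.lucene.search.vectorhighlight.FragmentsBuilder;
import org.apache.lucene.search.vectorhighlight.ScoreOrderFragmentsBuilder;
import org.apache.lucene.search.vectorhighlight.SimpleFragListBuilder;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class FastVectorHighlighterSample {
    static final String[] DOCS = {                                  // #A
        "the quick brown fox jumps over the lazy dog",              // #A
        "the quick gold fox jumped over the lazy black dog",        // #A
        "the quick fox jumps over the black dog",                   // #A
        "the red fox jumped over the lazy dark gray dog"            // #A
    };
    static final String QUERY = "quick OR fox OR \"lazy dog\"~1";   // #B
    static final String F = "f";
    static Directory dir = new RAMDirectory();
    static Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: FastVectorHighlighterSample <filename>");
            System.exit(-1);
        }
        makeIndex();                                                // #C
        searchIndex(args[0]);                                       // #D
    }

    static void makeIndex() throws IOException {
        IndexWriter writer = new IndexWriter(dir, analyzer, true, MaxFieldLength.LIMITED);
        for (String d : DOCS) {
            Document doc = new Document();
            doc.add(new Field(F, d, Store.YES, Index.ANALYZED,      // #E
                              TermVector.WITH_POSITIONS_OFFSETS));  // #E
            writer.addDocument(doc);
        }
        writer.close();
    }

    static void searchIndex(String filename) throws Exception {
        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, F, analyzer);
        Query query = parser.parse(QUERY);
        FastVectorHighlighter highlighter = getHighlighter();       // #F
        FieldQuery fieldQuery = highlighter.getFieldQuery(query);   // #G
        IndexSearcher searcher = new IndexSearcher(dir);
        TopDocs docs = searcher.search(query, 10);

        FileWriter writer = new FileWriter(filename);
        writer.write("<html>");
        writer.write("<body>");
        writer.write("<p>QUERY : " + QUERY + "</p>");
        for (ScoreDoc scoreDoc : docs.scoreDocs) {
            String snippet = highlighter.getBestFragment(           // #H
                fieldQuery, searcher.getIndexReader(),              // #H
                scoreDoc.doc, F, 100);                              // #H
            if (snippet != null) {                                  // #I
                writer.write(scoreDoc.doc + " : " + snippet + "<br/>"); // #I
            }
        }
        writer.write("</body></html>");
        writer.close();
        searcher.close();
    }

    static FastVectorHighlighter getHighlighter() {
        FragListBuilder fragListBuilder = new SimpleFragListBuilder();  // #J
        FragmentsBuilder fragmentBuilder =                          // #K
            new ScoreOrderFragmentsBuilder(                         // #K
                BaseFragmentsBuilder.COLORED_PRE_TAGS,              // #K
                BaseFragmentsBuilder.COLORED_POST_TAGS);            // #K
        // Arguments: phraseHighlight = true, fieldMatch = true.
        return new FastVectorHighlighter(true, true,                // #L
                                         fragListBuilder, fragmentBuilder); // #L
    }
}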
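#A Sample documents
#B Sample query
#C Build the index
#D Search and write out the results
#E Store.YES together with TermVector.WITH_POSITIONS_OFFSETS
#F Obtain a FastVectorHighlighter instance
#G Create the FieldQuery
#H Highlight the best fragment
#I Write out the highlighted fragment
#J Create a SimpleFragListBuilder
#K Create a ScoreOrderFragmentsBuilder with multi-colored tags
#L Create the FastVectorHighlighter instance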
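
Note #E matters because FastVectorHighlighter works from term vectors that record positions and offsets, so it can place tags in the stored text without analyzing it again. The plain-Java sketch below shows only that last step, under the assumption that the matching character offsets are already known. OffsetHighlightSketch, Span, and applyTags are hypothetical names, not part of Lucene.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: inserting tags at known character offsets.
// OffsetHighlightSketch, Span and applyTags are hypothetical names, not Lucene classes.
final class OffsetHighlightSketch {

    // A matched region as [start, end) character offsets into the stored text.
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    // Wraps each span in pre/post tags. Spans must be sorted and must not overlap.
    static String applyTags(String text, List<Span> spans, String pre, String post) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span s : spans) {
            out.append(text, cursor, s.start);                     // untouched text before the match
            out.append(pre).append(text, s.start, s.end).append(post);
            cursor = s.end;
        }
        out.append(text, cursor, text.length());                   // untouched tail
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        // Offsets of "quick", "fox" and "lazy dog" in the text above.
        List<Span> spans = Arrays.asList(new Span(4, 9), new Span(16, 19), new Span(35, 43));
        System.out.println(applyTags(text, spans, "<b>", "</b>"));
        // prints: the <b>quick</b> brown <b>fox</b> jumps over the <b>lazy dog</b>
    }
}
```

In the real sample the offsets come from the index rather than from hand-written numbers, which is why the field is indexed with TermVector.WITH_POSITIONS_OFFSETS.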

 

LUCENE.NET QQ discussion group (81361051)

posted @ 2010-09-13 18:00  寒 刚入门