Lucene 4: Understanding Scoring with explain()
1. Requirements
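Use the explain() method to understand in detail how each search result is scored.
explain() makes the internals of the score calculation easy to inspect,
but each call costs about as much as running the query itself.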
2. Implementation
package com.clzhang.sample.lucene;

import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.apache.lucene.queryparser.classic.QueryParser;

//import org.wltea.analyzer.lucene.IKAnalyzer;
import com.chenlb.mmseg4j.Dictionary;
import com.chenlb.mmseg4j.analysis.ComplexAnalyzer;

import org.junit.Test;

/**
 * Uses explain() to understand search result scoring. The method makes the
 * internals of the score calculation easy to inspect, but each call costs
 * about as much as running the query itself.
 * @author Administrator
 *
 */
public class ExplainerDemo {
    // Path to the mmseg4j dictionary
    private static final String MMSEG4J_DICT_PATH = "C:\\solr\\news\\conf";
    private static Dictionary dictionary = Dictionary.getInstance(MMSEG4J_DICT_PATH);

    // Path to the Lucene index
    private static final String LUCENE_INDEX_DIR = "C:\\solr\\news\\data\\index";

    @Test
    public void explainIt() throws Exception {
        String keyword = "苏州";

        FSDirectory directory = FSDirectory.open(new File(LUCENE_INDEX_DIR));
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(ireader);

        // Parse the keyword against the "text" field with the mmseg4j analyzer
        QueryParser parser = new QueryParser(Version.LUCENE_41, "text", new ComplexAnalyzer(dictionary));
        Query query = parser.parse(keyword);
        System.out.println("Query: " + keyword);

        TopDocs topDocs = searcher.search(query, 10);
        for (ScoreDoc match : topDocs.scoreDocs) {
            // Generate Explanation
            Explanation explanation = searcher.explain(query, match.doc);

            Document doc = searcher.doc(match.doc);
            System.out.println("----------------------");
            System.out.println(doc.get("webTitle"));

            // Output Explanation
            System.out.println(explanation.toString());
        }

        ireader.close();
        directory.close();
    }
}
Output:
Query: 苏州
----------------------
苏州市司法局为何敢于如此明目张胆指鹿为马?
1525.8616 = (MATCH) weight(text:苏州 in 32116) [DefaultSimilarity], result of:
  1525.8616 = score(doc=32116,freq=57.0 = termFreq=57.0 ), product of:
    0.99999994 = queryWeight, product of:
      6.315791 = idf(docFreq=313, maxDocs=63907)
      0.15833329 = queryNorm
    1525.8617 = fieldWeight in 32116, product of:
      7.5498343 = tf(freq=57.0), with freq of:
        57.0 = termFreq=57.0
      6.315791 = idf(docFreq=313, maxDocs=63907)
      32.0 = fieldNorm(doc=32116)
----------------------
给中共临汾市纪检委陈国荣的实名举报信
1010.52655 = (MATCH) weight(text:苏州 in 5075) [DefaultSimilarity], result of:
  1010.52655 = score(doc=5075,freq=1.0 = termFreq=1.0 ), product of:
    0.99999994 = queryWeight, product of:
      6.315791 = idf(docFreq=313, maxDocs=63907)
      0.15833329 = queryNorm
    1010.5266 = fieldWeight in 5075, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      6.315791 = idf(docFreq=313, maxDocs=63907)
      160.0 = fieldNorm(doc=5075)
----------------------
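The numbers in each explanation tree can be reproduced by hand. The sketch below assumes the DefaultSimilarity formulas from Lucene 4: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and queryNorm = 1 / sqrt(idf^2) for a single-term query with boost 1. The class and method names are made up for illustration, and the code does not call Lucene. It works in double precision, so the last digits can differ slightly from the float values Lucene prints.

```java
// Hypothetical helper, not part of Lucene: recomputes the explain() tree by hand.
class ScoreBreakdown {

    // tf(freq) = sqrt(freq)
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // idf = 1 + ln(maxDocs / (docFreq + 1))
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log(maxDocs / (double) (docFreq + 1));
    }

    // queryNorm for a single-term query with boost 1: 1 / sqrt(idf^2)
    static double queryNorm(double idfValue) {
        return 1.0 / Math.sqrt(idfValue * idfValue);
    }

    // fieldWeight = tf * idf * fieldNorm
    static double fieldWeight(double freq, long docFreq, long maxDocs, double fieldNorm) {
        return tf(freq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    // score = queryWeight * fieldWeight, where queryWeight = idf * queryNorm
    static double score(double freq, long docFreq, long maxDocs, double fieldNorm) {
        double idfValue = idf(docFreq, maxDocs);
        double queryWeight = idfValue * queryNorm(idfValue);
        return queryWeight * fieldWeight(freq, docFreq, maxDocs, fieldNorm);
    }

    public static void main(String[] args) {
        // Inputs copied from the two hits in the output above
        System.out.println("doc 32116: " + score(57.0, 313, 63907, 32.0));
        System.out.println("doc 5075:  " + score(1.0, 313, 63907, 160.0));
    }
}
```

Because queryWeight works out to about 1 for a single-term query, the final score is effectively the fieldWeight, so the ranking here is decided by tf and fieldNorm alone.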
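Notes:
The fieldNorm values in the results above are very large because, when the index was built, fields containing negative keywords were given a high boost. With DefaultSimilarity, fieldNorm combines that index-time boost with the length normalization of the field, so a boosted field raises the score of every match in it. As a result, documents containing negative keywords are ranked first at query time: the second hit contains the term only once, yet it scores 1010.5 because its fieldNorm is 160.0.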
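To get a feel for how large those boosts must be, here is a minimal sketch of the value DefaultSimilarity computes before storing the norm, assuming the Lucene 4 formula boost * 1 / sqrt(numTerms). The class name, method name, and sample inputs are hypothetical. Lucene additionally compresses the result into a single byte, which is lossy and is why explain() reports coarse values such as 32.0 and 160.0; that encoding step is left out here.

```java
// Hypothetical helper, not part of Lucene: the norm value before byte encoding.
class NormSketch {

    // rawNorm = boost * 1 / sqrt(numTerms), where numTerms is the number of
    // terms indexed in the field and boost is the index-time field boost
    static double rawNorm(double boost, int numTerms) {
        return boost / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        // Sample inputs chosen for illustration only
        System.out.println("boost 1, 100 terms: " + rawNorm(1.0, 100));
        System.out.println("boost 8, 16 terms:  " + rawNorm(8.0, 16));
    }
}
```

An unboosted field with 100 terms yields a norm of only 0.1 under this formula, so a fieldNorm of 32.0 or 160.0 points to a boost in the hundreds or more for fields of ordinary length.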