Lucene 4: Understanding the Scoring (explain) Mechanism

1. Requirement

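Use the explain() method to understand in depth how search results are scored.

explain() makes it easy to see the inner workings of the score calculation,
but each call costs about as much as running the query itself.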

2. Implementation

package com.clzhang.sample.lucene;

import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.apache.lucene.queryparser.classic.QueryParser;

//import org.wltea.analyzer.lucene.IKAnalyzer;
import com.chenlb.mmseg4j.Dictionary;
import com.chenlb.mmseg4j.analysis.ComplexAnalyzer;

import org.junit.Test;

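/**
 * Uses explain() to understand how search results are scored. The method makes it
 * easy to see the inner workings of the score calculation, but each call costs
 * about as much as running the query itself.
 * @author Administrator
 *
 */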
public class ExplainerDemo {
    // Path to the mmseg4j dictionary
    private static final String MMSEG4J_DICT_PATH = "C:\\solr\\news\\conf";
    private static Dictionary dictionary = Dictionary.getInstance(MMSEG4J_DICT_PATH);
    
    // Directory where the Lucene index is stored
    private static final String LUCENE_INDEX_DIR = "C:\\solr\\news\\data\\index";
    
    @Test
    public void explainIt() throws Exception {
        String keyword = "苏州";

        FSDirectory directory = FSDirectory.open(new File(LUCENE_INDEX_DIR));
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(ireader);
        QueryParser parser = new QueryParser(Version.LUCENE_41, "text",
                new ComplexAnalyzer(dictionary));
        Query query = parser.parse(keyword);
        System.out.println("Query: " + keyword);

        TopDocs topDocs = searcher.search(query, 10);
        for (ScoreDoc match : topDocs.scoreDocs) {
            // Generate Explanation
            Explanation explanation = searcher.explain(query, match.doc);

            Document doc = searcher.doc(match.doc);
            System.out.println("----------------------");
            System.out.println(doc.get("webTitle"));
            // Output Explanation
            System.out.println(explanation.toString());
        }
        ireader.close();
        directory.close();
    }
}

Output:

Query: 苏州
----------------------
苏州市司法局为何敢于如此明目张胆指鹿为马?
1525.8616 = (MATCH) weight(text:苏州 in 32116) [DefaultSimilarity], result of:
  1525.8616 = score(doc=32116,freq=57.0 = termFreq=57.0
), product of:
    0.99999994 = queryWeight, product of:
      6.315791 = idf(docFreq=313, maxDocs=63907)
      0.15833329 = queryNorm
    1525.8617 = fieldWeight in 32116, product of:
      7.5498343 = tf(freq=57.0), with freq of:
        57.0 = termFreq=57.0
      6.315791 = idf(docFreq=313, maxDocs=63907)
      32.0 = fieldNorm(doc=32116)

----------------------
给中共临汾市纪检委陈国荣的实名举报信
1010.52655 = (MATCH) weight(text:苏州 in 5075) [DefaultSimilarity], result of:
  1010.52655 = score(doc=5075,freq=1.0 = termFreq=1.0
), product of:
    0.99999994 = queryWeight, product of:
      6.315791 = idf(docFreq=313, maxDocs=63907)
      0.15833329 = queryNorm
    1010.5266 = fieldWeight in 5075, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      6.315791 = idf(docFreq=313, maxDocs=63907)
      160.0 = fieldNorm(doc=5075)

----------------------

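Notes:

The fieldNorm values in the results above are very large because, when the index was built, fields containing negative keywords were given a high boost. As a result, documents that contain negative keywords are ranked first at query time.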
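The numbers in the explain output can be checked by hand. The sketch below is plain Java with no Lucene dependency; it assumes the classic DefaultSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes fieldNorm directly from the output above, since that value is stored in the index (it combines the index-time boost with the length norm). The class and method names are made up for illustration. Lucene computes these values in float, so the last digits may differ slightly.

```java
public class ScoreArithmetic {
    // Term-frequency factor as DefaultSimilarity defines it: sqrt(freq)
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))
    static double idf(long docFreq, long maxDocs) {
        return Math.log((double) maxDocs / (docFreq + 1)) + 1.0;
    }

    // fieldWeight = tf * idf * fieldNorm (fieldNorm is read from the explain output)
    static double fieldWeight(double freq, long docFreq, long maxDocs, double fieldNorm) {
        return tf(freq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    // For a single-term query with boost 1, queryNorm = 1 / idf,
    // so queryWeight = idf * queryNorm is approximately 1
    static double queryWeight(long docFreq, long maxDocs) {
        double idf = idf(docFreq, maxDocs);
        double queryNorm = 1.0 / Math.sqrt(idf * idf);
        return idf * queryNorm;
    }

    public static void main(String[] args) {
        long docFreq = 313;
        long maxDocs = 63907;

        // First hit: doc 32116, freq=57, fieldNorm=32
        double first = queryWeight(docFreq, maxDocs) * fieldWeight(57.0, docFreq, maxDocs, 32.0);
        // Second hit: doc 5075, freq=1, fieldNorm=160
        double second = queryWeight(docFreq, maxDocs) * fieldWeight(1.0, docFreq, maxDocs, 160.0);

        System.out.println("tf(57)      = " + tf(57.0));
        System.out.println("idf         = " + idf(docFreq, maxDocs));
        System.out.println("score 32116 = " + first);
        System.out.println("score 5075  = " + second);
    }
}
```

The comparison shows why the second document scores so high even though the term occurs only once in it: its fieldNorm of 160 is five times the first document's 32, which nearly makes up for the much smaller tf (1.0 versus about 7.55).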

posted @ 2013-01-25 16:42  那些年的事儿