Creating the Weight object

The Weight object is created by IndexSearcher.createNormalizedWeight(Query query):

 

 1 public Weight createNormalizedWeight(Query query) throws IOException {
 2     query = rewrite(query);
 3     Weight weight = query.createWeight(this);
 4     float v = weight.getValueForNormalization();
 5     float norm = getSimilarity().queryNorm(v);
 6     if (Float.isInfinite(norm) || Float.isNaN(norm)) {
 7       norm = 1.0f;
 8     }
 9     weight.normalize(norm, 1.0f);
10     return weight;
11   }

 

1. Tracing the creation of the Weight object, using TermQuery as an example

TermQuery.createWeight(IndexSearcher searcher)

public Weight createWeight(IndexSearcher searcher) throws IOException {
    final IndexReaderContext context = searcher.getTopReaderContext();
    final TermContext termState;
    if (perReaderTermState == null || perReaderTermState.topReaderContext != context) {
      // make TermQuery single-pass if we don't have a PRTS or if the context differs!
      termState = TermContext.build(context, term);
    } else {
      // PRTS was pre-build for this IS
      termState = this.perReaderTermState;
    }

    // we must not ignore the given docFreq - if set use the given value (lie)
    if (docFreq != -1)
      termState.setDocFreq(docFreq);

    return new TermWeight(searcher, termState);
  }

 

Here termState holds the statistics for the term.

The TermWeight constructor:

 1 public TermWeight(IndexSearcher searcher, TermContext termStates)
 2       throws IOException {
 3       assert termStates != null : "TermContext must not be null";
 4       this.termStates = termStates;
 5       this.similarity = searcher.getSimilarity();
 6       this.stats = similarity.computeWeight(
 7           getBoost(), 
 8           searcher.collectionStatistics(term.field()), 
 9           searcher.termStatistics(term, termStates));
10     }

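In the TermWeight constructor, note lines 8 and 9: line 8 obtains the CollectionStatistics and line 9 the TermStatistics. Both are statistics related to the term. The difference is that CollectionStatistics covers all terms in the field that the query term belongs to, while TermStatistics covers only the query term itself.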
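To make the field-level versus term-level distinction concrete, here is a minimal sketch in plain Java that does not use Lucene at all. It assumes a tiny hypothetical "index" of three documents for a single field, each represented as a list of tokens. The class `StatsSketch` and its helper methods are invented for illustration; only the names of the quantities (sumTotalTermFreq, sumDocFreq, docFreq, totalTermFreq) follow the accessors on Lucene's CollectionStatistics and TermStatistics.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Hypothetical stand-in for an index: one field, each document is a list of tokens.
final class StatsSketch {

    // Field-level: total number of tokens across all documents in the field.
    static long sumTotalTermFreq(List<List<String>> docs) {
        long total = 0;
        for (List<String> doc : docs) total += doc.size();
        return total;
    }

    // Field-level: sum over all documents of the number of distinct terms in each.
    static long sumDocFreq(List<List<String>> docs) {
        long total = 0;
        for (List<String> doc : docs) total += new HashSet<>(doc).size();
        return total;
    }

    // Term-level: number of documents that contain the term at least once.
    static long docFreq(List<List<String>> docs, String term) {
        long n = 0;
        for (List<String> doc : docs) if (doc.contains(term)) n++;
        return n;
    }

    // Term-level: total number of occurrences of the term across all documents.
    static long totalTermFreq(List<List<String>> docs, String term) {
        long n = 0;
        for (List<String> doc : docs) n += Collections.frequency(doc, term);
        return n;
    }

    // Made-up sample data for the sketch.
    static List<List<String>> sampleDocs() {
        return Arrays.asList(
            Arrays.asList("lucene", "search", "lucene"),
            Arrays.asList("search", "engine"),
            Arrays.asList("index"));
    }

    public static void main(String[] args) {
        List<List<String>> docs = sampleDocs();
        System.out.println("field: sumTotalTermFreq=" + sumTotalTermFreq(docs)
            + " sumDocFreq=" + sumDocFreq(docs));
        System.out.println("term 'lucene': docFreq=" + docFreq(docs, "lucene")
            + " totalTermFreq=" + totalTermFreq(docs, "lucene"));
    }
}
```

The field-level numbers do not change when a different term is queried, whereas docFreq and totalTermFreq are specific to the term.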

similarity is DefaultSimilarity, the default Similarity of IndexSearcher. Its computeWeight method:

public final SimWeight computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats) {
    final Explanation idf = termStats.length == 1
    ? idfExplain(collectionStats, termStats[0])
    : idfExplain(collectionStats, termStats);
    return new IDFStats(collectionStats.field(), idf, queryBoost);
  }

Note the idf computed here:

public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats) {
    final long df = termStats.docFreq();
    final long max = collectionStats.maxDoc();
    final float idf = idf(df, max);
    return new Explanation(idf, "idf(docFreq=" + df + ", maxDocs=" + max + ")");
  }

This idf value is passed as an argument to the IDFStats constructor:

public IDFStats(String field, Explanation idf, float queryBoost) {
      // TODO: Validate?
      this.field = field;
      this.idf = idf;
      this.queryBoost = queryBoost;
      this.queryWeight = idf.getValue() * queryBoost; // compute query weight
    }

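queryWeight is the product of idf and queryBoost. queryBoost is the boost passed in at query time, and idf is a score factor based on the term's document frequency. It can be thought of roughly as (total number of documents / docFreq): the larger df is, the less the term influences the score, so the larger idf is, the more the term influences the score.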

Below is the idf implementation in DefaultSimilarity:

public float idf(long docFreq, long numDocs) {
    return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
  }
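As a quick illustration of how this formula behaves, the following standalone sketch copies the expression into a plain Java class (no Lucene dependency) and evaluates it for a few document frequencies. The class name `IdfDemo` and the collection size of 1000 are assumptions made up for the example.

```java
// Standalone sketch: mirrors the idf expression from the listing above.
final class IdfDemo {

    // Same arithmetic as the DefaultSimilarity.idf listing: log(numDocs / (docFreq + 1)) + 1.
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        long numDocs = 1000; // hypothetical collection size
        for (long docFreq : new long[] {1, 10, 100, 999}) {
            // A rarer term (smaller docFreq) yields a larger idf.
            System.out.printf("docFreq=%d idf=%.4f%n", docFreq, idf(docFreq, numDocs));
        }
    }
}
```

The idf decreases as docFreq grows. With docFreq = 999 and numDocs = 1000 the logarithm's argument is exactly 1, so idf is 1.0, the floor contributed by the `+ 1.0` term.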

 

 

2. Line 4 of the createNormalizedWeight method

public float getValueForNormalization() {
      return stats.getValueForNormalization();
    }

stats is the object built in the TermWeight constructor.

DefaultSimilarity extends TFIDFSimilarity, and the getValueForNormalization method is defined in TFIDFSimilarity:

public float getValueForNormalization() {
      // TODO: (sorta LUCENE-1907) make non-static class and expose this squaring via a nice method to subclasses?
      return queryWeight * queryWeight;  // sum of squared weights
    }

So the value of v on line 4 of createNormalizedWeight is queryWeight * queryWeight.

3. Line 5 of the createNormalizedWeight method

public float queryNorm(float sumOfSquaredWeights) {
    return (float)(1.0 / Math.sqrt(sumOfSquaredWeights));
  }
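The sketch below reproduces this formula together with the infinite/NaN guard from lines 6 to 8 of createNormalizedWeight, to show when the guard fires. It is plain Java with no Lucene dependency. The class `QueryNormDemo` and the helper `safeNorm` are hypothetical names used only here.

```java
// Standalone sketch of queryNorm plus the guard applied in createNormalizedWeight.
final class QueryNormDemo {

    // Same arithmetic as the queryNorm listing above.
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // Hypothetical helper: applies the fallback to 1.0f used in createNormalizedWeight.
    static float safeNorm(float sumOfSquaredWeights) {
        float norm = queryNorm(sumOfSquaredWeights);
        if (Float.isInfinite(norm) || Float.isNaN(norm)) {
            norm = 1.0f;
        }
        return norm;
    }

    public static void main(String[] args) {
        System.out.println(safeNorm(4.0f));  // queryWeight = 2, so the norm is 1/2
        System.out.println(safeNorm(0.0f));  // 1/sqrt(0) is infinite, so the guard returns 1.0
    }
}
```

A sum of squared weights of zero (for example, when the boost is 0) would otherwise produce an infinite norm, which is why the caller replaces it with 1.0f.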

 

4. Line 9 of the createNormalizedWeight method

 public void normalize(float queryNorm, float topLevelBoost) {
      stats.normalize(queryNorm, topLevelBoost);
    }

 

public void normalize(float queryNorm, float topLevelBoost) {
      this.queryNorm = queryNorm * topLevelBoost;
      queryWeight *= this.queryNorm;              // normalize query weight
      value = queryWeight * idf.getValue();         // idf for document
    }
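Putting the four steps together, here is a minimal end-to-end sketch for a single-term query. It is plain Java that imitates the arithmetic of the listings above, not Lucene's actual classes. `TermWeightSketch` and `createNormalized` are hypothetical names, and the idf value 2.0 is chosen only because it keeps the floating-point results exact.

```java
// Standalone sketch of the arithmetic in IDFStats and createNormalizedWeight.
final class TermWeightSketch {
    final float idf;
    final float queryBoost;
    float queryWeight;
    float queryNorm;
    float value;

    TermWeightSketch(float idf, float queryBoost) {
        this.idf = idf;
        this.queryBoost = queryBoost;
        this.queryWeight = idf * queryBoost; // as in the IDFStats constructor
    }

    float getValueForNormalization() {
        return queryWeight * queryWeight; // sum of squared weights (one term here)
    }

    void normalize(float queryNorm, float topLevelBoost) {
        this.queryNorm = queryNorm * topLevelBoost;
        queryWeight *= this.queryNorm;   // normalize query weight
        value = queryWeight * idf;       // idf for document
    }

    // Mirrors the step order of createNormalizedWeight for a single term.
    static TermWeightSketch createNormalized(float idf, float queryBoost) {
        TermWeightSketch w = new TermWeightSketch(idf, queryBoost);
        float v = w.getValueForNormalization();
        float norm = (float) (1.0 / Math.sqrt(v));
        if (Float.isInfinite(norm) || Float.isNaN(norm)) {
            norm = 1.0f;
        }
        w.normalize(norm, 1.0f);
        return w;
    }

    public static void main(String[] args) {
        TermWeightSketch w = createNormalized(2.0f, 1.0f);
        System.out.printf("queryNorm=%.2f queryWeight=%.2f value=%.2f%n",
            w.queryNorm, w.queryWeight, w.value);
    }
}
```

With idf = 2.0 and boost = 1.0, queryWeight starts at 2.0, v is 4.0, and the norm is 0.5. After normalization queryWeight becomes 1.0 and value equals idf. In this sketch, for a query with a single term and a positive boost, normalization cancels the original queryWeight, so the final value is essentially the idf (up to floating-point rounding).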

 

posted on 2014-07-03 17:59  ukouryou