huangfox

Three feet of ice does not form in a single cold day!


Lucene problem roundup:

Summary of common Lucene problems

 

 

From the API documentation we learn:

The fields used to determine sort order must be carefully chosen. Documents must contain a single term in such a field, and the value of the term should indicate the document's relative position in a given sort order. The field must be indexed, but should not be tokenized, and does not need to be stored (unless you happen to want it back with the rest of your document data). In other words:

document.add(new Field("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));

In short, a field used for sorting must be indexed but must not be tokenized.
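The "single term" requirement can be pictured with a toy model. The sketch below is plain Java, not Lucene code: `String.split` is only a stand-in for a tokenizing analyzer, and the class and method names (`SingleTermDemo`, `analyzed`, `notAnalyzed`) are made up for illustration. It shows that a tokenized value turns into several terms, so there is no single value left to sort the document by.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: String.split stands in for an analyzer; this is not Lucene code.
class SingleTermDemo {

    // Roughly what a tokenizing analyzer does: break the value into several terms.
    static List<String> analyzed(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    // Roughly what NOT_ANALYZED does: keep the whole value as one term.
    static List<String> notAnalyzed(String value) {
        return Arrays.asList(value);
    }

    public static void main(String[] args) {
        String title = "lucene in action";
        System.out.println(analyzed(title));     // [lucene, in, action]
        System.out.println(notAnalyzed(title));  // [lucene in action]
    }
}
```

With three terms in the field, it is unclear which one should decide the document's position; with one term, the sort key is unambiguous.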

 

 

Adding one extra parameter to the regular search method is all it takes to get sorted results.

-  TopFieldDocs search(Query query, Filter filter, int n, Sort sort)
          Search implementation with arbitrary sorting.

-  Sort(SortField field)
          Sorts by the criteria in the given SortField.

-  SortField(String field, int type)
          Creates a sort by terms in the given field with the type of term values explicitly given.

 

Code example:

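       // Sort on field "f", interpreting its term values as integers.
       SortField sortF = new SortField("f", SortField.INT);

       Sort sort = new Sort(sortF);

       // No filter (null); return the top 10 hits in sorted order.
       TopFieldDocs docs = searcher.search(query, null, 10, sort);

       ScoreDoc[] docs2 = docs.scoreDocs;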

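Suppose the index currently holds 5 documents whose f field values are -2, 0, 1, 5 and 10. Running the search above (SortField.INT) returns them in the order -2, 0, 1, 5, 10.

After switching to SortField.STRING, the order becomes -2, 0, 1, 10, 5, because the values are then compared as strings and "10" sorts before "5".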

 

The experiment shows that choosing the right SortField type matters, especially when sorting fields that hold numbers (or dates).
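The two orderings can be reproduced without an index. The following is a minimal sketch in plain Java, not Lucene code: it assumes that an integer sort amounts to parsing the terms and comparing them numerically, and that a string sort matches `String.compareTo` for these ASCII values. The class and method names (`SortTypeDemo`, `sortAsInt`, `sortAsString`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration (no Lucene involved) of numeric vs. string ordering.
class SortTypeDemo {

    // Orders the raw term values numerically: parse each term, then compare as int.
    static List<String> sortAsInt(List<String> terms) {
        List<String> copy = new ArrayList<String>(terms);
        Collections.sort(copy, new Comparator<String>() {
            public int compare(String a, String b) {
                return Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
            }
        });
        return copy;
    }

    // Orders the same values lexicographically via String.compareTo.
    static List<String> sortAsString(List<String> terms) {
        List<String> copy = new ArrayList<String>(terms);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("5", "10", "-2", "1", "0");
        System.out.println(sortAsInt(terms));    // [-2, 0, 1, 5, 10]
        System.out.println(sortAsString(terms)); // [-2, 0, 1, 10, 5]
    }
}
```

In the string ordering, "10" lands before "5" because the comparison stops at the first differing character, and '1' is smaller than '5'.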

 

P.S.

Sort field types:

Field Summary
static int BYTE
          Sort using term values as encoded Bytes.
static int CUSTOM
          Sort using a custom Comparator.
static int DOC
          Sort by document number (index order).
static int DOUBLE
          Sort using term values as encoded Doubles.
static SortField FIELD_DOC
          Represents sorting by document number (index order).
static SortField FIELD_SCORE
          Represents sorting by document score (relevancy).
static int FLOAT
          Sort using term values as encoded Floats.
static int INT
          Sort using term values as encoded Integers.
static int LONG
          Sort using term values as encoded Longs.
static int SCORE
          Sort by document score (relevancy).
static int SHORT
          Sort using term values as encoded Shorts.
static int STRING
          Sort using term values as Strings.
static int STRING_VAL
          Sort using term values as Strings, but comparing by value (using String.compareTo) for all comparisons.

 

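Note:

As mentioned above, sorting with the INT type and sorting with the STRING type give different results. There is one more thing to watch: efficiency. When the data volume is large, it is best to sort numeric fields with an appropriate numeric type rather than indiscriminately sorting everything as STRING.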

 

 

posted on 2010-10-14 13:51  huangfox