Custom sorting
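IndexSearcher.java (example use case: compute at search time how far each stored restaurant is from a given location, then sort hits from nearest to farthest or the reverse)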
/** Expert: Low-level search implementation with arbitrary sorting. Finds
* the top <code>n</code> hits for <code>query</code>, applying
* <code>filter</code> if non-null, and sorting the hits by the criteria in
* <code>sort</code>.
*
* <p>Applications should usually call {@link
* Searcher#search(Query,Filter,int,Sort)} instead.
*
* @throws BooleanQuery.TooManyClauses
*/
@Override
public TopFieldDocs search(Weight weight, Filter filter,
final int nDocs, Sort sort) throws IOException {
return search(weight, filter, nDocs, sort, true);
}
SortField.java
/** Creates a sort with a custom comparison function.
* @param field Name of field to sort by; cannot be <code>null</code>.
* @param comparator Returns a comparator for sorting hits.
*/
public SortField(String field, FieldComparatorSource comparator) {
initFieldType(field, CUSTOM);
this.comparatorSource = comparator;
}
FieldComparatorSource.java
/**
* Provides a {@link FieldComparator} for custom field sorting.
*
* @lucene.experimental
*
*/
public abstract class FieldComparatorSource implements Serializable {
/**
* Creates a comparator for the field in the given index.
*
* @param fieldname
* Name of the field to create comparator for.
* @return FieldComparator.
* @throws IOException
* If an error occurs reading the index.
*/
public abstract FieldComparator<?> newComparator(String fieldname, int numHits, int sortPos, boolean reversed)
throws IOException;
}
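The restaurant use case above comes down to a comparison that cannot be stored in the index, because it depends on the query location. The following is a minimal, library-free sketch of that comparison only. The Restaurant class, the integer coordinates, and the sortByDistance helper are all hypothetical and are not Lucene types; in Lucene the equivalent logic would presumably live in the FieldComparator returned by a FieldComparatorSource, with the reversed flag playing the same role as here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Library-free sketch: none of these types are Lucene classes.
class DistanceSortSketch {

    // Hypothetical stand-in for an indexed document with stored coordinates.
    static final class Restaurant {
        final String name;
        final int x;
        final int y;

        Restaurant(String name, int x, int y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }
    }

    // Straight-line distance from the restaurant to the query location (qx, qy).
    static double distance(Restaurant r, int qx, int qy) {
        int dx = r.x - qx;
        int dy = r.y - qy;
        return Math.sqrt((double) dx * dx + (double) dy * dy);
    }

    // Returns restaurant names nearest first, or farthest first when reversed is true.
    static List<String> sortByDistance(List<Restaurant> hits, int qx, int qy, boolean reversed) {
        Comparator<Restaurant> byDistance = Comparator.comparingDouble(r -> distance(r, qx, qy));
        if (reversed) {
            byDistance = byDistance.reversed();
        }
        List<Restaurant> sorted = new ArrayList<>(hits);
        sorted.sort(byDistance);
        List<String> names = new ArrayList<>();
        for (Restaurant r : sorted) {
            names.add(r.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Restaurant> hits = new ArrayList<>();
        hits.add(new Restaurant("A", 10, 10));
        hits.add(new Restaurant("B", 1, 1));
        hits.add(new Restaurant("C", 4, 5));
        System.out.println(sortByDistance(hits, 0, 0, false));
        System.out.println(sortByDistance(hits, 0, 0, true));
    }
}
```

The distance is recomputed for every query, so the same index can serve any query location.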
Further computation or processing of search results
Collector.java
* <p><b>NOTE:</b> The doc that is passed to the collect
* method is relative to the current reader. If your
* collector needs to resolve this to the docID space of the
* Multi*Reader, you must re-base it by recording the
* docBase from the most recent setNextReader call. Here's
* a simple example showing how to collect docIDs into a
* BitSet:</p>
*
* <pre>
* Searcher searcher = new IndexSearcher(indexReader);
* final BitSet bits = new BitSet(indexReader.maxDoc());
* searcher.search(query, new Collector() {
* private int docBase;
*
* <em>// ignore scorer</em>
* public void setScorer(Scorer scorer) {
* }
*
* <em>// accept docs out of order (for a BitSet it doesn't matter)</em>
* public boolean acceptsDocsOutOfOrder() {
* return true;
* }
*
* public void collect(int doc) {
* bits.set(doc + docBase);
* }
*
* public void setNextReader(IndexReader reader, int docBase) {
* this.docBase = docBase;
* }
* });
* </pre>
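To make the re-basing rule in the Javadoc above concrete, here is a small library-free simulation. BitSetCollector and collectAll are hypothetical names that only mimic the setNextReader and collect callbacks; the segment sizes and hit lists are made-up inputs, and nothing here calls Lucene.

```java
import java.util.BitSet;

// Library-free sketch: mimics the Collector callbacks without using Lucene types.
class DocBaseSketch {

    // Hypothetical collector mirroring setNextReader/collect from the Javadoc.
    static final class BitSetCollector {
        private final BitSet bits;
        private int docBase;

        BitSetCollector(int maxDoc) {
            this.bits = new BitSet(maxDoc);
        }

        // Called once per segment, before any collect call for that segment.
        void setNextReader(int docBase) {
            this.docBase = docBase;
        }

        // doc is relative to the current segment, so re-base it to a global ID.
        void collect(int doc) {
            bits.set(doc + docBase);
        }

        BitSet bits() {
            return bits;
        }
    }

    // Simulates a search: segmentSizes[i] is the doc count of segment i and
    // segmentHits[i] holds the matching segment-relative doc IDs.
    static BitSet collectAll(int[] segmentSizes, int[][] segmentHits) {
        int maxDoc = 0;
        for (int size : segmentSizes) {
            maxDoc += size;
        }
        BitSetCollector collector = new BitSetCollector(maxDoc);
        int docBase = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            collector.setNextReader(docBase);
            for (int doc : segmentHits[i]) {
                collector.collect(doc);
            }
            docBase += segmentSizes[i]; // the next segment starts after this one
        }
        return collector.bits();
    }

    public static void main(String[] args) {
        // Segment 0 has 3 docs (hits 0 and 2); segment 1 has 4 docs (hit 1).
        BitSet bits = collectAll(new int[] {3, 4}, new int[][] {{0, 2}, {1}});
        System.out.println(bits);
    }
}
```

Without the docBase offset, the hit in the second segment would be recorded as doc 1 instead of doc 4 and would point at the wrong document.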
Extending QueryParser
1. Disabling fuzzy queries and wildcard queries
/**
* Builds a new FuzzyQuery instance
* @param term Term
* @param minimumSimilarity minimum similarity
* @param prefixLength prefix length
* @return new FuzzyQuery Instance
*/
protected Query newFuzzyQuery(Term term, float minimumSimilarity, int prefixLength) {
// FuzzyQuery doesn't yet allow constant score rewrite
return new FuzzyQuery(term,minimumSimilarity,prefixLength); // to disable fuzzy queries, override this method and throw an exception instead
}
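The override pattern can be sketched without Lucene as follows. MiniParser and NoFuzzyParser are hypothetical stand-ins for QueryParser and a subclass of it, queries are represented as plain strings, and UnsupportedOperationException is only a placeholder for whatever exception the application prefers. In Lucene 3.x, QueryParser also has getFuzzyQuery and getWildcardQuery hooks that declare ParseException, which may be a more natural place to reject the syntax.

```java
// Library-free sketch: MiniParser is a hypothetical stand-in for QueryParser.
class DisableFuzzySketch {

    static class MiniParser {
        // Factory hook, analogous in spirit to newFuzzyQuery above.
        protected String newFuzzyQuery(String term, float minimumSimilarity) {
            return term + "~" + minimumSimilarity;
        }

        // Factory hook for an ordinary term query.
        protected String newTermQuery(String term) {
            return term;
        }

        // Treats a trailing '~' as fuzzy syntax; everything else is a plain term.
        String parse(String text) {
            if (text.endsWith("~")) {
                String term = text.substring(0, text.length() - 1);
                return newFuzzyQuery(term, 0.5f);
            }
            return newTermQuery(text);
        }
    }

    // Subclass that rejects fuzzy syntax instead of building the query.
    static class NoFuzzyParser extends MiniParser {
        @Override
        protected String newFuzzyQuery(String term, float minimumSimilarity) {
            throw new UnsupportedOperationException("Fuzzy queries are disabled: " + term);
        }
    }

    public static void main(String[] args) {
        System.out.println(new MiniParser().parse("lucene~"));
        System.out.println(new NoFuzzyParser().parse("lucene"));
    }
}
```

Because the rejection sits in the factory hook, every code path that would build a fuzzy query fails the same way, and the rest of the parser is unchanged.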
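Custom filters. Some properties of a search result change frequently, so a document may need to be filtered out during one period and not during another. Adding such a property to the index as a field can produce inaccurate results, because the indexed value goes stale. A better approach is to define a filter that applies specific rules at search time. Take best-selling books as an example: a book may be a bestseller for a while and then stop being one, so storing a "bestseller" flag as an indexed field is a poor fit. Instead, a custom Filter can decide at search time whether each doc should be filtered out.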
/**
* Abstract base class for restricting which documents may
* be returned during searching.
*/
public abstract class Filter implements java.io.Serializable {
/**
* Creates a {@link DocIdSet} enumerating the documents that should be
* permitted in search results. <b>NOTE:</b> null can be
* returned if no documents are accepted by this Filter.
* <p>
* Note: This method will be called once per segment in
* the index during searching. The returned {@link DocIdSet}
* must refer to document IDs for that segment, not for
* the top-level reader.
*
* @param reader a {@link IndexReader} instance opened on the index currently
* searched on. Note, it is likely that the provided reader does not
* represent the whole underlying index i.e. if the index has more than
* one segment the given reader only represents a single segment.
*
* @return a DocIdSet that provides the documents which should be permitted or
* prohibited in search results. <b>NOTE:</b> null can be returned if
* no documents will be accepted by this Filter.
*
* @see DocIdBitSet
*/
public abstract DocIdSet getDocIdSet(IndexReader reader) throws IOException;
}
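A DocIdSet can be thought of as a set of bits in which each bit position corresponds to a doc ID: if a bit is set to 1, that document may appear in the search results; otherwise it will not.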
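As an illustration of the best-selling book example, the sketch below builds the set of permitted doc IDs from a list that is maintained outside the index. It is library-free: permittedDocs, the isbnsByDocId array, and the hotIsbns set are hypothetical, and a java.util.BitSet stands in for the DocIdSet that a real getDocIdSet implementation would return for one segment.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Library-free sketch: a BitSet stands in for the DocIdSet of one segment.
class HotBookFilterSketch {

    // isbnsByDocId[i] is the ISBN of segment-relative doc i (hypothetical data);
    // hotIsbns is the current bestseller list, maintained outside the index.
    static BitSet permittedDocs(String[] isbnsByDocId, Set<String> hotIsbns) {
        BitSet bits = new BitSet(isbnsByDocId.length);
        for (int docId = 0; docId < isbnsByDocId.length; docId++) {
            if (hotIsbns.contains(isbnsByDocId[docId])) {
                bits.set(docId); // bit set to 1: the doc may appear in results
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] isbnsByDocId = {"111", "222", "333"};
        Set<String> hotIsbns = new HashSet<>();
        hotIsbns.add("222");
        System.out.println(permittedDocs(isbnsByDocId, hotIsbns));

        // The bestseller list changes; the "index" array is left untouched.
        hotIsbns.clear();
        hotIsbns.add("111");
        hotIsbns.add("333");
        System.out.println(permittedDocs(isbnsByDocId, hotIsbns));
    }
}
```

Only the external list changes between the two calls, which is the point of keeping the volatile property out of the index.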
FilteredQuery.java wraps a query together with a filter; the two are combined into the final query expression that is actually executed.
Performance issues:
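1. Internally, Lucene rewrites a RangeQuery into a BooleanQuery whose clauses are ORed together, one clause per matching term.
If the range expands to more than 1024 terms (the default maximum clause count), a TooManyClauses exception is thrown.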
/** Thrown when an attempt is made to add more than {@link
* #getMaxClauseCount()} clauses. This typically happens if
* a PrefixQuery, FuzzyQuery, WildcardQuery, or TermRangeQuery
* is expanded to many terms during search.
*/
public static class TooManyClauses extends RuntimeException {
public TooManyClauses() {
super("maxClauseCount is set to " + maxClauseCount);
}
}
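The expansion that triggers this exception can be sketched as follows. This is a library-free simulation, not Lucene's rewrite code: expandRange, TooManyClausesException, and the in-memory term set are hypothetical, and MAX_CLAUSE_COUNT simply mirrors the default limit of 1024.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Library-free sketch of a range being expanded into one OR clause per term.
class RangeExpansionSketch {

    // Mirrors the default clause limit mentioned above.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Hypothetical stand-in for BooleanQuery.TooManyClauses.
    static final class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int max) {
            super("maxClauseCount is set to " + max);
        }
    }

    // Collects every indexed term in [lower, upper] as a separate clause and
    // fails as soon as the expansion would exceed the limit.
    static List<String> expandRange(NavigableSet<String> indexedTerms, String lower, String upper) {
        List<String> clauses = new ArrayList<>();
        for (String term : indexedTerms.subSet(lower, true, upper, true)) {
            if (clauses.size() >= MAX_CLAUSE_COUNT) {
                throw new TooManyClausesException(MAX_CLAUSE_COUNT);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 2000; i++) {
            terms.add(String.format("%04d", i)); // zero-padded so string order matches numeric order
        }
        System.out.println(expandRange(terms, "0000", "0009").size());
        try {
            expandRange(terms, "0000", "1999");
        } catch (TooManyClausesException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The cost depends on how many distinct terms fall inside the range, not on how many documents match, so a narrow range over a high-cardinality field can still hit the limit.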