Solr 4.8.0 Source Code Analysis (18): The Caching Mechanism (Part 1)

2014-11-26 23:02 追风的蓝宝

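The previous post, while covering commit, walked through the implementation of getSearcher() and mentioned Solr's warming step (warm). Starting with this post, we look at Solr's caching mechanism in detail.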

1. Introduction

Solr currently supports four built-in cache types, each serving one kind of query.

filterCache
documentCache
fieldValueCache
queryResultCache

Solr ships two implementations of the SolrCache interface:

solr.search.LRUCache
solr.search.FastLRUCache

FastLRUCache was introduced in Solr 1.4 and is, in general, faster than LRUCache.
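To make the eviction policy concrete, here is a minimal sketch of least-recently-used eviction built on java.util.LinkedHashMap in access order. The class name SimpleLruCache and the sample keys are made up for illustration; this is not Solr's implementation, only a small model of what "LRU" means for both classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class, not part of Solr.
class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    SimpleLruCache(int maxSize, int initialSize) {
        // accessOrder=true: iteration runs from least to most recently used
        super(initialSize, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the cache is over capacity
        return size() > maxSize;
    }

    public static void main(String[] args) {
        SimpleLruCache<String, Integer> cache = new SimpleLruCache<>(2, 2);
        cache.put("type:book", 1);
        cache.put("inStock:true", 2);
        cache.get("type:book");          // touch: now the most recently used
        cache.put("price:[0 TO 10]", 3); // over capacity: evicts inStock:true
        System.out.println(cache.keySet());
    }
}
```

The two constructor arguments mirror the size and initialSize attributes that appear in the cache configuration later in this post.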

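Starting with this post we go through all of the above in detail. The focus of this post is the cache life cycle: initialization, warming, insertion, and destruction.

2. Cache life cycle

The life cycle of every cache is managed by SolrIndexSearcher. Whenever the SolrIndexSearcher that owns a cache is rebuilt, the cache objects currently in use become invalid.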
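The ownership relation can be pictured with a small sketch. SearcherSketch and its fields are hypothetical stand-ins, not Solr classes, and the sketch assumes that no request is still using the old searcher when it is closed (real Solr tracks this with reference counting).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a searcher that owns its caches; not Solr code.
class SearcherSketch implements AutoCloseable {
    final int generation;
    final Map<String, String> filterCache = new HashMap<>();
    private boolean closed = false;

    SearcherSketch(int generation) {
        this.generation = generation;
    }

    boolean isClosed() {
        return closed;
    }

    /** Opens the next generation. Its caches are new objects and start empty. */
    SearcherSketch reopen() {
        return new SearcherSketch(generation + 1);
    }

    @Override
    public void close() {
        filterCache.clear(); // the cache contents die with the searcher
        closed = true;
    }

    public static void main(String[] args) {
        SearcherSketch current = new SearcherSketch(1);
        current.filterCache.put("type:book", "docset-1");

        SearcherSketch next = current.reopen();
        current.close(); // assumption: nothing still uses the old searcher

        System.out.println("new generation: " + next.generation);
        System.out.println("new cache size: " + next.filterCache.size());
        System.out.println("old closed: " + current.isClosed());
    }
}
```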

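2.1 Cache initialization

The previous post, "Solr 4.8.0 Source Code Analysis (17): SolrCloud Indexing in Depth (4)", explained that when incremental updates are committed through DirectUpdateHandler2.commit(CommitUpdateCommand cmd), getSearcher() obtains a SolrIndexSearcher. If forceNew=true, a new SolrIndexSearcher is opened, and Solr then warms it. The cache-related initialization code in SolrIndexSearcher is shown below. Note that cachingEnabled is the key switch: if it is false, no cache is created at all.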

    if (cachingEnabled) {
      ArrayList<SolrCache> clist = new ArrayList<>();
      fieldValueCache = solrConfig.fieldValueCacheConfig==null ? null : solrConfig.fieldValueCacheConfig.newInstance();
      if (fieldValueCache!=null) clist.add(fieldValueCache);
      filterCache= solrConfig.filterCacheConfig==null ? null : solrConfig.filterCacheConfig.newInstance();
      if (filterCache!=null) clist.add(filterCache);
      queryResultCache = solrConfig.queryResultCacheConfig==null ? null : solrConfig.queryResultCacheConfig.newInstance();
      if (queryResultCache!=null) clist.add(queryResultCache);
      documentCache = solrConfig.documentCacheConfig==null ? null : solrConfig.documentCacheConfig.newInstance();
      if (documentCache!=null) clist.add(documentCache);

      if (solrConfig.userCacheConfigs == null) {
        cacheMap = noGenericCaches;
      } else {
        cacheMap = new HashMap<>(solrConfig.userCacheConfigs.length);
        for (CacheConfig userCacheConfig : solrConfig.userCacheConfigs) {
          SolrCache cache = null;
          if (userCacheConfig != null) cache = userCacheConfig.newInstance();
          if (cache != null) {
            cacheMap.put(cache.name(), cache);
            clist.add(cache);
          }
        }
      }

      cacheList = clist.toArray(new SolrCache[clist.size()]);
    } else {
      filterCache=null;
      queryResultCache=null;
      documentCache=null;
      fieldValueCache=null;
      cacheMap = noGenericCaches;
      cacheList= noCaches;
    }

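As the code shows, each cache is instantiated from its cache configuration. The implementation class, in practice either LRUCache or FastLRUCache, is chosen per cache in solrconfig.xml.

In that configuration, size is the maximum number of entries and initialSize is the initial capacity. autowarmCount is the most important parameter: it states how many entries a background thread should preload into the new cache each time a new SolrIndexSearcher is built. autowarmCount accepts either an absolute number such as 300 or a percentage such as autowarmCount="50%". An autowarmCount of 0 disables warming.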
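The two forms of autowarmCount can be resolved to a concrete number of entries roughly as follows. This is a sketch with a hypothetical helper (AutowarmSpec.resolve), not Solr's own parsing code, and it assumes that a percentage is taken relative to the current size of the old cache.

```java
// Hypothetical helper, not Solr's own parsing code.
class AutowarmSpec {
    /**
     * Resolves an autowarmCount setting to a number of entries.
     * spec is either an absolute count ("300") or a percentage ("50%").
     * Assumption: a percentage refers to the old cache's current size.
     */
    static int resolve(String spec, int oldCacheSize) {
        String s = spec.trim();
        int count;
        if (s.endsWith("%")) {
            int percent = Integer.parseInt(s.substring(0, s.length() - 1).trim());
            count = (int) ((long) oldCacheSize * percent / 100);
        } else {
            count = Integer.parseInt(s);
        }
        // Never warm more entries than the old cache holds, never fewer than zero
        return Math.max(0, Math.min(count, oldCacheSize));
    }

    public static void main(String[] args) {
        System.out.println(resolve("300", 120)); // absolute count, capped at 120
        System.out.println(resolve("50%", 120)); // half of the old cache: 60
        System.out.println(resolve("0", 120));   // warming disabled: 0
    }
}
```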

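Is a larger autowarm count always better? That depends on the workload. In a near-real-time application the index reader is reopened at very short intervals, and by default Solr opens a new SolrIndexSearcher each time. The more entries the caches have to warm, the longer it takes before the new SolrIndexSearcher is available, which hurts freshness. If the count is very small, on the other hand, every new SolrIndexSearcher starts with nearly empty caches, and fq and facet queries will rarely hit the cache at first. Real-time applications need to weigh this carefully.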

    <query>
      <filterCache      class="solr.LRUCache"     size="300"  initialSize="10"  autowarmCount="300"/>
      <queryResultCache class="solr.LRUCache"     size="300"  initialSize="10"  autowarmCount="300"/>
      <fieldValueCache  class="solr.FastLRUCache" size="300"  initialSize="10"  autowarmCount="300"/>
      <documentCache    class="solr.FastLRUCache" size="5000" initialSize="512" autowarmCount="300"/>
      <!-- key setting that decides whether filterCache can be used for sorted queries -->
      <useFilterForSortedQuery>true</useFilterForSortedQuery>
      <!-- window size that controls how many results are cached per queryResultCache entry -->
      <queryResultWindowSize>50</queryResultWindowSize>
      <!-- whether stored fields are loaded lazily -->
      <enableLazyFieldLoading>false</enableLazyFieldLoading>
    </query>

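2.2 Cache warming

Warming may be an unfamiliar term, so here is a short explanation. Suppose a cache already holds a good number of entries. When a new IndexSearcher is built, the new cache has to accumulate data from scratch, so search performance drops sharply for a while, and the hit rate only recovers once the cache has filled up again. That is usually unacceptable. What we can do instead is take the keys of the old cache and, before the newly built SolrIndexSearcher is handed out, load the values for those keys from disk into the new cache. The SolrIndexSearcher that is exposed then does not come with an empty cache but with one whose contents were prepared behind the scenes.

The previous post already mentioned warming in getSearcher(); let's look at that source again.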

    future = searcherExecutor.submit(new Callable() {
      @Override
      public Object call() throws Exception {
        try {
          newSearcher.warm(currSearcher);
        } catch (Throwable e) {
          SolrException.log(log, e);
          if (e instanceof Error) {
            throw (Error) e;
          }
        }
        return null;
      }
    });

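The actual warming happens in SolrIndexSearcher.warm. As the code shows, SolrIndexSearcher warms each cache (filterCache, documentCache, and so on) in turn, and each cache delegates to the warm method of its implementation class. For example, if solrconfig.xml sets the filterCache class to FastLRUCache, Solr calls FastLRUCache.warm.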

    for (int i=0; i<cacheList.length; i++) {
      if (debug) log.debug("autowarming " + this + " from " + old + "\n\t" + old.cacheList[i]);


      SolrQueryRequest req = new LocalSolrQueryRequest(core,params) {
        @Override public SolrIndexSearcher getSearcher() { return SolrIndexSearcher.this; }
        @Override public void close() { }
      };

      SolrQueryResponse rsp = new SolrQueryResponse();
      SolrRequestInfo.setRequestInfo(new SolrRequestInfo(req, rsp));
      try {
        this.cacheList[i].warm(this, old.cacheList[i]);
      } finally {
        try {
          req.close();
        } finally {
          SolrRequestInfo.clearRequestInfo();
        }
      }

      if (debug) log.debug("autowarming result for " + this + "\n\t" + this.cacheList[i]);
    }

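We leave the differences between FastLRUCache and LRUCache for a later post. What their warm implementations have in common is the call to CacheRegenerator.regenerateItem(), which warms a single cached entry, that is, one cached query.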

    for (int i=0; i<keys.length; i++) {
      try {
        boolean continueRegen = regenerator.regenerateItem(searcher, this, old, keys[i], vals[i]);
        if (!continueRegen) break;
      }
      catch (Exception e) {
        SolrException.log(log,"Error during auto-warming of key:" + keys[i], e);
      }
    }
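The overall shape of this step can be modeled outside Solr. The sketch below is an illustration under stated assumptions, not Solr's code: WarmingSketch, warm, and recompute are hypothetical names, the old cache is assumed to iterate from oldest to newest entry, and recompute stands in for a regenerator that re-runs a query against the new index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of autowarming; names are illustrative, not Solr's API.
class WarmingSketch {
    /**
     * Builds a new cache by re-running up to autowarmCount of the newest
     * keys of oldCache through recompute. Old values are never copied,
     * because they were computed against the old index.
     */
    static <K, V> Map<K, V> warm(LinkedHashMap<K, V> oldCache, int autowarmCount,
                                 Function<K, V> recompute) {
        Map<K, V> newCache = new LinkedHashMap<>();
        List<K> keys = new ArrayList<>(oldCache.keySet()); // oldest first
        int start = Math.max(0, keys.size() - autowarmCount);
        for (K key : keys.subList(start, keys.size())) {
            try {
                newCache.put(key, recompute.apply(key));
            } catch (RuntimeException e) {
                // A failing key is reported and skipped; warming continues
                System.err.println("Error during auto-warming of key: " + key);
            }
        }
        return newCache;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> old = new LinkedHashMap<>();
        old.put("type:book", 10);
        old.put("inStock:true", 20);
        old.put("price:[0 TO 10]", 30);

        // Pretend every query matches one more document in the new index
        Map<String, Integer> warmed = warm(old, 2, key -> old.get(key) + 1);
        System.out.println(warmed);
    }
}
```

With autowarmCount set to 0 the sublist is empty and the new cache starts cold, which matches the configuration rule described earlier.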

There are currently three CacheRegenerators, one each for fieldValueCache, filterCache, and queryResultCache. They are set up when SolrCore is initialized.

    public static void initRegenerators(SolrConfig solrConfig) {
      if (solrConfig.fieldValueCacheConfig != null && solrConfig.fieldValueCacheConfig.getRegenerator() == null) {
        solrConfig.fieldValueCacheConfig.setRegenerator(
                new CacheRegenerator() {
                  @Override
                  public boolean regenerateItem(SolrIndexSearcher newSearcher, SolrCache newCache, SolrCache oldCache, Object oldKey, Object oldVal) throws IOException {
                    if (oldVal instanceof UnInvertedField) {
                      UnInvertedField.getUnInvertedField((String)oldKey, newSearcher);
                    }
                    return true;
                  }
                }
        );
      }

      if (solrConfig.filterCacheConfig != null && solrConfig.filterCacheConfig.getRegenerator() == null) {
        solrConfig.filterCacheConfig.setRegenerator(
                new CacheRegenerator() {
                  @Override
                  public boolean regenerateItem(SolrIndexSearcher newSearcher, SolrCache newCache, SolrCache oldCache, Object oldKey, Object oldVal) throws IOException {
                    newSearcher.cacheDocSet((Query)oldKey, null, false);
                    return true;
                  }
                }
        );
      }

      if (solrConfig.queryResultCacheConfig != null && solrConfig.queryResultCacheConfig.getRegenerator() == null) {
        final int queryResultWindowSize = solrConfig.queryResultWindowSize;
        solrConfig.queryResultCacheConfig.setRegenerator(
                new CacheRegenerator() {
                  @Override
                  public boolean regenerateItem(SolrIndexSearcher newSearcher, SolrCache newCache, SolrCache oldCache, Object oldKey, Object oldVal) throws IOException {
                    QueryResultKey key = (QueryResultKey)oldKey;
                    int nDocs=1;
                    // request 1 doc and let caching round up to the next window size...
                    // unless the window size is <=1, in which case we will pick
                    // the minimum of the number of documents requested last time and
                    // a reasonable number such as 40.
                    // TODO: make more configurable later...

                    if (queryResultWindowSize<=1) {
                      DocList oldList = (DocList)oldVal;
                      int oldnDocs = oldList.offset() + oldList.size();
                      // 40 has factors of 2,4,5,10,20
                      nDocs = Math.min(oldnDocs,40);
                    }

                    int flags=NO_CHECK_QCACHE | key.nc_flags;
                    QueryCommand qc = new QueryCommand();
                    qc.setQuery(key.query)
                      .setFilterList(key.filters)
                      .setSort(key.sort)
                      .setLen(nDocs)
                      .setSupersetMaxDoc(nDocs)
                      .setFlags(flags);
                    QueryResult qr = new QueryResult();
                    newSearcher.getDocListC(qr,qc);
                    return true;
                  }
                }
        );
      }
    }

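Warming of fieldValueCache will be covered in the post on facet queries. This section looks at regenerateItem for filterCache and queryResultCache.
filterCache: during warming, each old key (a cached query) is executed again against the new searcher, and the result is put into the new cache.
queryResultCache: solrconfig.xml defines queryResultWindowSize, the unit in which result lists are fetched and cached. With queryResultWindowSize set to 50, the number of cached results is rounded up to a multiple of 50. During warming, queryResultCache re-runs each cached query, requesting a single document so that caching rounds it up to one full window of queryResultWindowSize results, and puts that result into the cache.
documentCache: stores <id, document> pairs. It is not carried over during warming, because internal Lucene document ids can change between searchers, so it starts empty after every commit that opens a new searcher.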
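The rounding to a multiple of queryResultWindowSize is plain integer arithmetic. The helper below is a hypothetical illustration (WindowRounding.supersetMaxDoc is not a Solr method), assuming that the quantity being rounded is the highest document position a request asks for, that is, offset plus rows.

```java
// Hypothetical helper illustrating the window rounding; not copied from Solr.
class WindowRounding {
    /** Rounds the highest requested document position up to a window multiple. */
    static int supersetMaxDoc(int offset, int rows, int windowSize) {
        int maxDocRequested = offset + rows;
        if (windowSize <= 1) {
            return maxDocRequested; // a window of 1 or less means no rounding
        }
        // Integer division rounds down, so shift by one to round up
        return ((maxDocRequested - 1) / windowSize + 1) * windowSize;
    }

    public static void main(String[] args) {
        System.out.println(supersetMaxDoc(0, 10, 50));  // first page of 10: 50
        System.out.println(supersetMaxDoc(45, 10, 50)); // positions 45..54: 100
        System.out.println(supersetMaxDoc(0, 1, 50));   // warming asks for 1 doc: 50
    }
}
```

This is why a user who pages through the first few pages of a result list can often be served from a single cached window.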

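As the code above shows, warming essentially means re-running the queries cached by the old searcher at commit time and putting the results into the caches of the new searcher. The more such queries there are, the longer warming takes, which is why autowarmCount matters so much: it controls how many queries from the old searcher are replayed, and a value of 0 means no warming at all. If autowarmCount is large, warming takes a long time and new index changes become visible much later. If autowarmCount is small, warming is quick, but the cache hit rate is low at first and query performance suffers.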

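2.3 Adding entries to the cache

Each time a query is executed, Solr caches the query key together with its result, provided the result is small enough. Take queryResultCache as an example.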

    // lastly, put the superset in the cache if the size is less than or equal
    // to queryResultMaxDocsCached
    if (key != null && superset.size() <= queryResultMaxDocsCached && !qr.isPartialResults()) {
      queryResultCache.put(key, superset);
    }
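The same look-up-then-store pattern, including the size guard, can be sketched in isolation. ResultCacheSketch is a hypothetical class, the String key and the list of document ids are simplifications, and the partial-results check from the Solr snippet is left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a guarded result cache; not Solr code.
class ResultCacheSketch {
    private final Map<String, List<Integer>> cache = new HashMap<>();
    private final int maxDocsCached;
    int computations = 0; // counts cache misses, for illustration only

    ResultCacheSketch(int maxDocsCached) {
        this.maxDocsCached = maxDocsCached;
    }

    List<Integer> search(String key, Supplier<List<Integer>> runQuery) {
        List<Integer> hit = cache.get(key);
        if (hit != null) {
            return hit; // served from the cache, no query executed
        }
        computations++;
        List<Integer> result = runQuery.get();
        // Only cache results that are small enough to be worth the memory
        if (result.size() <= maxDocsCached) {
            cache.put(key, result);
        }
        return result;
    }

    public static void main(String[] args) {
        ResultCacheSketch searcher = new ResultCacheSketch(3);
        searcher.search("q=solr", () -> Arrays.asList(1, 2, 3));
        searcher.search("q=solr", () -> Arrays.asList(1, 2, 3));
        System.out.println("queries executed: " + searcher.computations);
    }
}
```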

2.4 Cache destruction

Caches are destroyed together with their SolrIndexSearcher when it is closed; see SolrIndexSearcher.close():

    @Override
    public void close() throws IOException {
      if (debug) {
        if (cachingEnabled) {
          StringBuilder sb = new StringBuilder();
          sb.append("Closing ").append(name);
          for (SolrCache cache : cacheList) {
            sb.append("\n\t");
            sb.append(cache);
          }
          log.debug(sb.toString());
        } else {
          if (debug) log.debug("Closing " + name);
        }
      }

      core.getInfoRegistry().remove(name);

      // super.close();
      // can't use super.close() since it just calls reader.close() and that may only be called once
      // per reader (even if incRef() was previously called).

      long cpg = reader.getIndexCommit().getGeneration();
      try {
        if (closeReader) reader.decRef();
      } catch (Exception e) {
        SolrException.log(log, "Problem dec ref'ing reader", e);
      }

      if (directoryFactory.searchersReserveCommitPoints()) {
        core.getDeletionPolicy().releaseCommitPoint(cpg);
      }

      for (SolrCache cache : cacheList) {
        cache.close();
      }

      if (reserveDirectory) {
        directoryFactory.release(getIndexReader().directory());
      }
      if (createdDirectory) {
        directoryFactory.release(getIndexReader().directory());
      }


      // do this at the end so it only gets done if there are no exceptions
      numCloses.incrementAndGet();
    }

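Summary:

This post covered how caches are created, warmed, populated, and destroyed, with an emphasis on how warming works. The next post will cover the structure of the individual caches, their use cases, and hit-rate monitoring.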
