Solr 4.8.0 Source Code Analysis (15): SolrCloud Indexing in Depth (2)

2014-11-11 23:55 追风的蓝宝

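The previous post covered the overall flow of SolrCloud distributed indexing and how the update processor chain is built. Starting with this post, I go through the three indexing stages one by one: LogUpdateProcessor, DistributedUpdateProcessor, and DirectUpdateHandler2. This post looks at LogUpdateProcessor and DistributedUpdateProcessor.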

1. LogUpdateProcessor

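The previous post described how LogUpdateProcessor is instantiated, as shown below. getInstance shows that LogUpdateProcessor does not necessarily take part in SolrCloud indexing: it is instantiated only when INFO logging is enabled for it. Otherwise getInstance returns null and the processor is not added to the chain.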

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return LogUpdateProcessor.log.isInfoEnabled() ? new LogUpdateProcessor(req, rsp, this, next) : null;
  }
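
To make it concrete why a null return removes the processor from the chain, here is a minimal sketch of a chain assembled from the last factory to the first. The names StepSketch, StepFactory, and ChainSketch are hypothetical stand-ins rather than Solr classes, and the real chain-building code in Solr may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an update processor: a name plus the next link. */
final class StepSketch {
  final String name;
  final StepSketch next;

  StepSketch(String name, StepSketch next) {
    this.name = name;
    this.next = next;
  }
}

/** Hypothetical factory; it may return null to opt out of the chain. */
interface StepFactory {
  StepSketch getInstance(StepSketch next);
}

final class ChainSketch {
  /** Builds the chain from the last factory to the first, skipping null results. */
  static StepSketch build(List<StepFactory> factories) {
    StepSketch last = null;
    for (int i = factories.size() - 1; i >= 0; i--) {
      StepSketch step = factories.get(i).getInstance(last);
      if (step != null) {
        last = step; // a null result leaves the chain as it was
      }
    }
    return last;
  }

  /** Lists the step names from head to tail. */
  static List<String> names(StepSketch head) {
    List<String> out = new ArrayList<>();
    for (StepSketch s = head; s != null; s = s.next) {
      out.add(s.name);
    }
    return out;
  }

  /** Example chain whose log step exists only when infoEnabled is true. */
  static List<String> example(boolean infoEnabled) {
    StepFactory log = next -> infoEnabled ? new StepSketch("log", next) : null;
    StepFactory distrib = next -> new StepSketch("distrib", next);
    StepFactory run = next -> new StepSketch("run", next);
    return names(build(Arrays.asList(log, distrib, run)));
  }
}
```

With infoEnabled set to false, the chain in this sketch starts directly at the distrib step, which mirrors the behaviour described for LogUpdateProcessor.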

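So what does LogUpdateProcessor actually do? A look at its source shows that it simply logs the Solr update process, which is why this step disappears when the log level is stricter than INFO (WARN or ERROR, for example). Take processAdd() and finish() as examples: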

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    if (logDebug) { log.debug("PRE_UPDATE " + cmd.toString() + " " + req); }

    // call delegate first so we can log things like the version that get set later
    if (next != null) next.processAdd(cmd);

    // Add a list of added id's to the response
    if (adds == null) {
      adds = new ArrayList<>();
      toLog.add("add",adds);
    }

    if (adds.size() < maxNumToLog) {
      long version = cmd.getVersion();
      String msg = cmd.getPrintableId();
      if (version != 0) msg = msg + " (" + version + ')';
      adds.add(msg);
    }

    numAdds++;
  }

  @Override
  public void finish() throws IOException {
    if (logDebug) { log.debug("PRE_UPDATE FINISH " + req); }
    if (next != null) next.finish();

    // LOG A SUMMARY WHEN ALL DONE (INFO LEVEL)

    if (log.isInfoEnabled()) {
      StringBuilder sb = new StringBuilder(rsp.getToLogAsString(req.getCore().getLogId()));

      rsp.getToLog().clear();   // make it so SolrCore.exec won't log this again

      // if id lists were truncated, show how many more there were
      if (adds != null && numAdds > maxNumToLog) {
        adds.add("... (" + numAdds + " adds)");
      }
      if (deletes != null && numDeletes > maxNumToLog) {
        deletes.add("... (" + numDeletes + " deletes)");
      }
      long elapsed = rsp.getEndTime() - req.getStartTime();

      sb.append(toLog).append(" 0 ").append(elapsed);
      log.info(sb.toString());
    }
  }
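
The id-truncation logic in processAdd() and finish() can be isolated as follows. This is a simplified sketch rather than the Solr class: AddLogSketch and its methods are hypothetical, and maxNumToLog is passed in explicitly instead of being read from configuration.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of how the logged id list is capped. */
final class AddLogSketch {
  private final int maxNumToLog;
  private final List<String> adds = new ArrayList<>();
  private int numAdds;

  AddLogSketch(int maxNumToLog) {
    this.maxNumToLog = maxNumToLog;
  }

  /** Records one added document; only the first maxNumToLog ids are kept. */
  void processAdd(String id, long version) {
    if (adds.size() < maxNumToLog) {
      String msg = id;
      if (version != 0) {
        msg = msg + " (" + version + ')';
      }
      adds.add(msg);
    }
    numAdds++; // counted even when the id itself is not kept
  }

  /** Returns the summary; a trailing marker shows the total if ids were truncated. */
  List<String> finish() {
    List<String> summary = new ArrayList<>(adds);
    if (numAdds > maxNumToLog) {
      summary.add("... (" + numAdds + " adds)");
    }
    return summary;
  }
}
```

Note that the marker reports the total number of adds, not the number of ids that were left out, which matches the string built in finish() above.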

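I used to confuse the log in LogUpdateProcessor with the UpdateLog, so here is the difference:

LogUpdateProcessor's log is the application's runtime log, that is, the ordinary operational log with levels such as INFO, WARN, and ERROR. It also includes information about the updated documents, and it is configured in log4j.properties.
UpdateLog, which the next post covers in detail together with DirectUpdateHandler2, stores the content of each request. It is the file Solr uses internally to back up and restore data, so it also contains the updated documents. Its file path can be configured in solrconfig.xml.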

2. Overall flow of DistributedUpdateProcessor

This is a diagram I drew of the DistributedUpdateProcessor flow. Using the add path as the example, it mainly shows how DistributedUpdateProcessor distributes an update:

A document is distributed in one of the following phases: DistribPhase.NONE, DistribPhase.TOLEADER, or DistribPhase.FROMLEADER.

  public static enum DistribPhase {
    NONE, TOLEADER, FROMLEADER;

    public static DistribPhase parseParam(final String param) {
      if (param == null || param.trim().isEmpty()) {
        return NONE;
      }
      try {
        return valueOf(param);
      } catch (IllegalArgumentException e) {
        throw new SolrException
          (SolrException.ErrorCode.BAD_REQUEST, "Illegal value for " +
           DISTRIB_UPDATE_PARAM + ": " + param, e);
      }
    }
  }
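
A short, self-contained sketch shows how parseParam treats different inputs. DistribPhaseSketch is a hypothetical copy of the enum in which a plain IllegalArgumentException stands in for SolrException, so that it runs without Solr on the classpath.

```java
/** Hypothetical copy of the enum; IllegalArgumentException replaces SolrException. */
enum DistribPhaseSketch {
  NONE, TOLEADER, FROMLEADER;

  static DistribPhaseSketch parseParam(String param) {
    if (param == null || param.trim().isEmpty()) {
      return NONE; // missing or blank parameter maps to NONE
    }
    try {
      // valueOf is case-sensitive and does not trim its argument
      return valueOf(param);
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException("Illegal distrib phase value: " + param, e);
    }
  }
}
```

A missing or blank parameter yields NONE, an exact constant name such as "TOLEADER" yields that phase, and anything else, including the lower-case "toleader", is rejected.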

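Besides these phases, there is also the case of recovering data from the ulog. A later post on SolrCloud disaster recovery covers that separately.
DistribPhase.NONE means the SolrJ client sent the request directly to this node, rather than another node forwarding it. There are two cases:
- If this node is the leader:
  - Set the forwardToLeader flag to false, meaning the request does not need to be sent on to the leader.
  - Get the node information of all replicas.
  - Take the current system time and bit-shift it to produce the version value.
  - Add the _version_ field to the documents of the update request, with the version as its value.
  - Move on to the next stage, DirectUpdateHandler2, which writes the documents of this update into the Lucene index.
  - Forward the update request to each replica.
- If this node is not the leader:
  - Look up the leader.
  - Set the forwardToLeader flag to true, meaning the request must be sent to the leader.
  - Forward the update request to the leader.
DistribPhase.TOLEADER means the update request was sent from a replica to the leader, so normally this node is the leader.
- If this node is the leader, the steps are the same as for the leader under DistribPhase.NONE.
- If this node is not the leader, something unusual is going on: this normally happens only while a shard split or a recovery is in progress. The later post on disaster recovery covers this case.
DistribPhase.FROMLEADER means the update request was sent from the leader to a replica, so normally this node is a replica.
- Set isLeader to false, meaning this node is not the leader.
- Set the forwardToLeader flag to false, meaning the request does not need to be sent on to the leader.
- Read the version field from the documents of the update; the leader added it before forwarding.
- Look up in the ulog the latest version recorded for that unique_id.
- Compare the update version with lastVersion. If the update version is greater than lastVersion, continue; if it is lower, abandon the update.
- Move on to the next stage, DirectUpdateHandler2, which writes the documents of this update into the Lucene index.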
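
The three phases above boil down to a small decision table, sketched here. RoutingSketch and its Action values are hypothetical names for illustration only. The sketch leaves out the split and recovery cases, and treating an equal version as stale is an assumption, since the list above only covers the greater and lower cases.

```java
/** Hypothetical model of the routing decision described above; not Solr's code. */
final class RoutingSketch {
  enum Phase { NONE, TOLEADER, FROMLEADER }

  enum Action { INDEX_THEN_SEND_TO_REPLICAS, FORWARD_TO_LEADER, INDEX_LOCALLY, DROP }

  /**
   * @param updateVersion version carried by the update (only used for FROMLEADER)
   * @param lastVersion   latest version known locally for the id, or null if none
   */
  static Action decide(Phase phase, boolean isLeader, long updateVersion, Long lastVersion) {
    if (phase == Phase.FROMLEADER) {
      // Replica side: never forward; only apply updates newer than what is known.
      // Assumption: an equal version counts as a repeat and is dropped.
      boolean stale = lastVersion != null && updateVersion <= lastVersion;
      return stale ? Action.DROP : Action.INDEX_LOCALLY;
    }
    // NONE and TOLEADER behave the same: the leader indexes and fans out,
    // any other node forwards the request to the leader.
    return isLeader ? Action.INDEX_THEN_SEND_TO_REPLICAS : Action.FORWARD_TO_LEADER;
  }
}
```
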
The DistributedUpdateProcessor stage mainly forwards documents and generates and compares versions. DirectUpdateHandler2 is what actually writes the request to the update log and the Lucene index, which the next post covers.
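
Version generation from a shifted timestamp can be sketched as follows. VersionClockSketch is a hypothetical class; the 20-bit shift width and the increment used when two calls land on the same millisecond are assumptions of this sketch, not values taken from the text above.

```java
/** Hypothetical sketch of a time-based, strictly increasing version clock. */
final class VersionClockSketch {
  private long lastVersion;

  /**
   * Shifts the millisecond timestamp left by 20 bits (assumed width). If the
   * result would not be greater than the previous version, it uses the
   * previous version plus one, so versions never repeat or go backwards.
   */
  synchronized long next(long nowMillis) {
    long candidate = nowMillis << 20;
    if (candidate <= lastVersion) {
      candidate = lastVersion + 1;
    }
    lastVersion = candidate;
    return candidate;
  }

  /** Convenience overload that reads the system clock. */
  long next() {
    return next(System.currentTimeMillis());
  }
}
```

The shift leaves the low bits free, so many versions can be handed out within one millisecond while later timestamps still sort after all of them.
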
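This post focuses on the add path. The commit and delete paths are much the same, so I do not describe them here; the next post touches on them in passing.
For reasons of space I will not walk through the source of DistributedUpdateProcessor's processAdd(); here it is with brief comments: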

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    updateCommand = cmd;
    // cluster (ZooKeeper) mode
    if (zkEnabled) {
      zkCheck();
      // work out, from the request's DistribPhase, which nodes to forward to
      nodes = setupRequest(cmd.getHashableId(), cmd.getSolrInputDocument());
    } else {
      // standalone mode
      isLeader = getNonZkLeaderAssumption(req);
    }

    boolean dropCmd = false;
    if (!forwardToLeader) {
      // handle the version: add it to the request and call the next stage, DirectUpdateHandler2
      dropCmd = versionAdd(cmd);
    }
    // drop the request if the update version is lower than lastVersion
    if (dropCmd) {
      // TODO: do we need to add anything to the response?
      return;
    }
    // forward the data to the leader or the replicas, depending on DistribPhase
    if (zkEnabled && isLeader && !isSubShardLeader)  {
      DocCollection coll = zkController.getClusterState().getCollection(collection);
      List<Node> subShardLeaders = getSubShardLeaders(coll, cloudDesc.getShardId(), cmd.getHashableId(), cmd.getSolrInputDocument());
      // the list<node> will actually have only one element for an add request
      if (subShardLeaders != null && !subShardLeaders.isEmpty()) {
        ModifiableSolrParams params = new ModifiableSolrParams(filterParams(req.getParams()));
        params.set(DISTRIB_UPDATE_PARAM, DistribPhase.FROMLEADER.toString());
        params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl(
            zkController.getBaseUrl(), req.getCore().getName()));
        params.set(DISTRIB_FROM_PARENT, req.getCore().getCoreDescriptor().getCloudDescriptor().getShardId());
        for (Node subShardLeader : subShardLeaders) {
          cmdDistrib.distribAdd(cmd, Collections.singletonList(subShardLeader), params, true);
        }
      }
      List<Node> nodesByRoutingRules = getNodesByRoutingRules(zkController.getClusterState(), coll, cmd.getHashableId(), cmd.getSolrInputDocument());
      if (nodesByRoutingRules != null && !nodesByRoutingRules.isEmpty())  {
        ModifiableSolrParams params = new ModifiableSolrParams(filterParams(req.getParams()));
        params.set(DISTRIB_UPDATE_PARAM, DistribPhase.FROMLEADER.toString());
        params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl(
            zkController.getBaseUrl(), req.getCore().getName()));
        params.set(DISTRIB_FROM_COLLECTION, req.getCore().getCoreDescriptor().getCloudDescriptor().getCollectionName());
        params.set(DISTRIB_FROM_SHARD, req.getCore().getCoreDescriptor().getCloudDescriptor().getShardId());
        for (Node nodesByRoutingRule : nodesByRoutingRules) {
          cmdDistrib.distribAdd(cmd, Collections.singletonList(nodesByRoutingRule), params, true);
        }
      }
    }

    ModifiableSolrParams params = null;
    if (nodes != null) {

      params = new ModifiableSolrParams(filterParams(req.getParams()));
      params.set(DISTRIB_UPDATE_PARAM,
                 (isLeader || isSubShardLeader ?
                  DistribPhase.FROMLEADER.toString() :
                  DistribPhase.TOLEADER.toString()));
      params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl(
          zkController.getBaseUrl(), req.getCore().getName()));

      cmdDistrib.distribAdd(cmd, nodes, params);
    }

    // TODO: what to do when no idField?
    if (returnVersions && rsp != null && idField != null) {
      if (addsResponse == null) {
        addsResponse = new NamedList<String>();
        rsp.add("adds",addsResponse);
      }
      if (scratch == null) scratch = new CharsRef();
      idField.getType().indexedToReadable(cmd.getIndexedId(), scratch);
      addsResponse.add(scratch.toString(), cmd.getVersion());
    }

    // TODO: keep track of errors?  needs to be done at a higher level though since
    // an id may fail before it gets to this processor.
    // Given that, it may also make sense to move the version reporting out of this
    // processor too.
  }

3. A possible version bug in DistributedUpdateProcessor

While reading DistributedUpdateProcessor, I found what looks like a bug in how SolrCloud uses versions:

First, look at the VersionBucket class, which mainly holds the highest version seen among updates.

  // TODO: make inner?
  // TODO: store the highest possible in the index on a commit (but how to not block adds?)
  // TODO: could also store highest possible in the transaction log after a commit.
  // Or on a new index, just scan "version" for the max?
  /** @lucene.internal */
  public class VersionBucket {
    public long highest;

    public void updateHighest(long val) {
      if (highest != 0) {
        highest = Math.max(highest, Math.abs(val));
      }
    }
  }

When a request arrives, SolrCloud creates the VersionBucket array based on the number of documents in the request, and each bucket's highest starts at 0.

    buckets = new VersionBucket[ BitUtil.nextHighestPowerOfTwo(nBuckets) ];
    for (int i=0; i<buckets.length; i++) {
      buckets[i] = new VersionBucket();
    }

During an update, SolrCloud computes a hash from each unique_id and uses that hash to find the matching VersionBucket in the array, from which it reads the highest version.

    int bucketHash = Hash.murmurhash3_x86_32(idBytes.bytes, idBytes.offset, idBytes.length, 0);
    VersionBucket bucket = vinfo.bucket(bucketHash);
    long bucketVersion = bucket.highest;
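
The lookup can be pictured with the sketch below. BucketLookupSketch is hypothetical: String.hashCode stands in for murmurhash3, and choosing the slot by masking the hash with the array length minus one is an assumption that fits the power-of-two array size used above.

```java
/** Hypothetical sketch of picking a version bucket from a hash of the id. */
final class BucketLookupSketch {
  /** Minimal bucket: highest starts at 0, like VersionBucket. */
  static final class Bucket {
    long highest;
  }

  private final Bucket[] buckets;

  BucketLookupSketch(int nBuckets) {
    // Round up to a power of two so that a bit mask can replace the modulo.
    int size = 1;
    while (size < nBuckets) {
      size <<= 1;
    }
    buckets = new Bucket[size];
    for (int i = 0; i < buckets.length; i++) {
      buckets[i] = new Bucket();
    }
  }

  /** The mask keeps only the low bits, so negative hashes map to a valid slot too. */
  Bucket bucket(int hash) {
    return buckets[hash & (buckets.length - 1)];
  }

  /** Stand-in hash: String.hashCode instead of murmurhash3. */
  Bucket bucketFor(String id) {
    return bucket(id.hashCode());
  }

  int size() {
    return buckets.length;
  }
}
```

Because many ids share one bucket, a bucket can only hold an upper bound for the versions of the ids that hash to it, not an exact version per document.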

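Here is the problem. Although VersionBucket's highest field is public, SolrCloud never assigns a value to it directly; it only changes highest through the updateHighest method. But updateHighest updates the field only when (highest != 0), so highest stays 0 forever and can never take any other value. The version kept in the bucket is therefore purely decorative and has no effect at all. I am not sure whether I have misunderstood something or whether SolrCloud really has this bug.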
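
The behaviour described above is easy to reproduce in isolation. VersionBucketSketch below copies the two members of VersionBucket under a hypothetical name so that it can be run without Solr.

```java
/** Copy of the VersionBucket logic under a hypothetical name, for tracing its behaviour. */
final class VersionBucketSketch {
  long highest; // defaults to 0

  void updateHighest(long val) {
    // The guard means a bucket that is still 0 is never updated.
    if (highest != 0) {
      highest = Math.max(highest, Math.abs(val));
    }
  }

  /** Applies the given versions to a fresh bucket and returns the resulting highest. */
  static long afterUpdates(long... versions) {
    VersionBucketSketch bucket = new VersionBucketSketch();
    for (long v : versions) {
      bucket.updateHighest(v);
    }
    return bucket.highest;
  }
}
```

Calling VersionBucketSketch.afterUpdates(42L, 100L) returns 0, because the guard never lets a zero bucket change. Only if some other code first assigns a non-zero value to highest does updateHighest start tracking the maximum.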

4. Summary

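This post took a closer look at the first two of the three steps in the SolrCloud indexing chain, LogUpdateProcessor and DistributedUpdateProcessor. It focused on how DistributedUpdateProcessor distributes update requests and described the suspected bug in the version comparison.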
