ZooKeeper Source Code Reading (2): Data Storage

Preface

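Before writing about the concrete logic, we need a deeper understanding of how zk manages its data and how it logs and persists transactions. There is quite a lot of material here, so the next few posts will all cover related topics.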

In-Memory Data

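zk's data model is a tree of ZNodes. Internally, ZK keeps the content of the whole tree in memory, much like an in-memory database, and periodically writes it to disk.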

zk's in-memory data lives in DataTree, the core of zk's in-memory storage, which is itself a tree structure.

/**
 * This class maintains the tree data structure. It doesn't have any networking
 * or client connection code in it so that it can be tested in a stand alone
 * way.
 * <p>
 * The tree maintains two parallel data structures: a hashtable that maps from
 * full paths to DataNodes and a tree of DataNodes. All accesses to a path is
 * through the hashtable. The tree is traversed only when serializing to disk.
 */
public class DataTree {
    private static final Logger LOG = LoggerFactory.getLogger(DataTree.class);

    /**
     * This hashtable provides a fast lookup to the datanodes. The tree is the
     * source of truth and is where all the locking occurs
     */
    private final ConcurrentHashMap<String, DataNode> nodes =
        new ConcurrentHashMap<String, DataNode>();

    private final WatchManager dataWatches = new WatchManager();

    private final WatchManager childWatches = new WatchManager();

    /** the root of zookeeper tree */
    private static final String rootZookeeper = "/";

    /** the zookeeper nodes that acts as the management and status node **/
    private static final String procZookeeper = Quotas.procZookeeper;

    /** this will be the string thats stored as a child of root */
    private static final String procChildZookeeper = procZookeeper.substring(1);

    /**
     * the zookeeper quota node that acts as the quota management node for
     * zookeeper
     */
    private static final String quotaZookeeper = Quotas.quotaZookeeper;

    /** this will be the string thats stored as a child of /zookeeper */
    private static final String quotaChildZookeeper = quotaZookeeper
            .substring(procZookeeper.length() + 1);

    /**
     * the path trie that keeps track fo the quota nodes in this datatree
     */
    private final PathTrie pTrie = new PathTrie();

    /**
     * This hashtable lists the paths of the ephemeral nodes of a session.
     */
    private final Map<Long, HashSet<String>> ephemerals =
        new ConcurrentHashMap<Long, HashSet<String>>();

    private final ReferenceCountedACLCache aclCache = new ReferenceCountedACLCache();
    ...

As you can see, DataTree is mainly associated with four classes: DataNode, Quotas, PathTrie and StatsTrack. We will go through them one by one.

DataNode

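The DataNode class is the smallest unit of data storage in ZooKeeper. In DataTree, all DataNodes are kept in a single ConcurrentHashMap, private final ConcurrentHashMap<String, DataNode> nodes = new ConcurrentHashMap<String, DataNode>();, with the path as the key and the DataNode as the value. Every operation on a znode in zk is, underneath, an operation on this map.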
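To make the "path is the key" idea concrete, here is a minimal sketch. MiniDataTree and its methods are hypothetical names used only for illustration, not ZooKeeper's API, and the real DataTree does far more (stat, ACLs, watches, quotas). The sketch only shows that a read is a single hash lookup, while a create also records the child's name in its parent.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, heavily simplified stand-in for DataTree (not ZooKeeper's API):
// every node is reachable through a flat path -> node map, and each node also
// keeps the names (last path component only) of its children.
class MiniDataTree {
    static class Node {
        final byte[] data;
        final Set<String> children = new HashSet<>();
        Node(byte[] data) { this.data = data; }
    }

    private final ConcurrentHashMap<String, Node> nodes = new ConcurrentHashMap<>();

    MiniDataTree() {
        nodes.put("/", new Node(new byte[0])); // the root always exists
    }

    // Create a node; like ZooKeeper's create, the parent must already exist.
    boolean create(String path, byte[] data) {
        if (!path.startsWith("/") || path.equals("/")) {
            return false; // only absolute, non-root paths
        }
        int slash = path.lastIndexOf('/');
        String parentPath = slash == 0 ? "/" : path.substring(0, slash);
        String childName = path.substring(slash + 1);
        Node parent = nodes.get(parentPath);
        if (parent == null || nodes.putIfAbsent(path, new Node(data)) != null) {
            return false; // missing parent, or the node already exists
        }
        synchronized (parent) {
            parent.children.add(childName); // link the child name into its parent
        }
        return true;
    }

    // A read is a single hash lookup; the tree is never walked.
    byte[] getData(String path) {
        Node n = nodes.get(path);
        return n == null ? null : n.data;
    }

    // Returns a copy of the child names, or null if the node does not exist.
    Set<String> getChildren(String path) {
        Node n = nodes.get(path);
        if (n == null) {
            return null;
        }
        synchronized (n) {
            return new HashSet<>(n.children);
        }
    }
}
```

The parent/child links are only needed for listing children and, as the DataTree javadoc above says, for traversing the tree when serializing it to disk.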

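In addition, all ephemeral nodes are tracked in a dedicated map, private final Map<Long, HashSet<String>> ephemerals = new ConcurrentHashMap<Long, HashSet<String>>();, keyed by session ID. This makes them easy to look up at any time and easy to clean up in one batch when the session ends.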
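The per-session bookkeeping can be sketched as follows, assuming a hypothetical EphemeralIndex class (not part of ZooKeeper). It uses the same Map<Long, HashSet<String>> shape as the ephemerals field: session ID to the paths of the ephemeral nodes that session created, so ending a session yields every node to delete in a single map operation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the bookkeeping behind DataTree's `ephemerals` map:
// session ID -> set of paths of the ephemeral nodes that session created.
class EphemeralIndex {
    private final Map<Long, HashSet<String>> ephemerals = new ConcurrentHashMap<>();

    // Record that `sessionId` created the ephemeral node at `path`.
    void add(long sessionId, String path) {
        HashSet<String> paths = ephemerals.computeIfAbsent(sessionId, id -> new HashSet<>());
        synchronized (paths) {
            paths.add(path);
        }
    }

    // Paths currently owned by a session (a copy, safe to iterate).
    Set<String> pathsOf(long sessionId) {
        HashSet<String> paths = ephemerals.get(sessionId);
        if (paths == null) {
            return Collections.emptySet();
        }
        synchronized (paths) {
            return new HashSet<>(paths);
        }
    }

    // When a session ends, drop its entry and return every path that must
    // now be deleted from the tree, in one batch.
    Set<String> killSession(long sessionId) {
        HashSet<String> paths = ephemerals.remove(sessionId);
        if (paths == null) {
            return Collections.emptySet();
        }
        synchronized (paths) {
            return new HashSet<>(paths);
        }
    }
}
```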

Here is the code of the DataNode class:

public class DataNode implements Record {
    /** the parent of this datanode */
    DataNode parent;

    /** the data for this datanode */
    byte data[];

    /**
     * the acl map long for this datanode. the datatree has the map
     */
    Long acl;

    /**
     * the stat for this node that is persisted to disk.
     */
    public StatPersisted stat;

    /**
     * the list of children for this node. note that the list of children string
     * does not contain the parent path -- just the last part of the path. This
     * should be synchronized on except deserializing (for speed up issues).
     */
    private Set<String> children = null;

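As you can see, DataNode stores three kinds of information: the data content data[], the ACL (as a Long key into a map held by the DataTree), and the node status stat. The data content and the stat are exactly what a client gets back from getData. DataNode also records its parent and its list of children, and provides methods to operate on that list.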
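Storing acl as a Long instead of the ACL list itself lets many nodes share one copy of an identical ACL, with the mapping kept in the DataTree (the aclCache field of type ReferenceCountedACLCache shown earlier). The sketch below illustrates that idea with reference counting. AclCache, keyFor, aclFor and release are hypothetical names, ACLs are modeled as List<String>, and this is not the real class's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified version of the idea behind ReferenceCountedACLCache:
// identical ACL lists are stored once, and each node keeps only a Long key.
// ACLs are modeled here as plain List<String> instead of ZooKeeper's ACL class.
class AclCache {
    private final Map<Long, List<String>> longToAcl = new HashMap<>();
    private final Map<List<String>, Long> aclToLong = new HashMap<>();
    private final Map<Long, Integer> refCounts = new HashMap<>();
    private long nextId = 0;

    // Return the key for this ACL list, registering it if it is new,
    // and count one more node that references it.
    synchronized Long keyFor(List<String> acl) {
        Long key = aclToLong.get(acl);
        if (key == null) {
            key = nextId++;
            aclToLong.put(acl, key);
            longToAcl.put(key, acl);
        }
        refCounts.merge(key, 1, Integer::sum);
        return key;
    }

    // Resolve a node's key back to its ACL list (null if unknown).
    synchronized List<String> aclFor(Long key) {
        return longToAcl.get(key);
    }

    // Called when a node using `key` goes away; the entry is dropped at zero.
    synchronized void release(Long key) {
        Integer count = refCounts.get(key);
        if (count == null) {
            return;
        }
        if (count <= 1) {
            refCounts.remove(key);
            aclToLong.remove(longToAcl.remove(key));
        } else {
            refCounts.put(key, count - 1);
        }
    }
}
```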

Adding a child:

/**
 * Method that inserts a child into the children set
 * 
 * @param child
 *            to be inserted
 * @return true if this set did not already contain the specified element
 */
public synchronized boolean addChild(String child) {
    if (children == null) {
        // let's be conservative on the typical number of children
        children = new HashSet<String>(8); // lazily create the set on the first child
    }
    return children.add(child); // add to the set; false if already present
}

Removing a child:

/**
 * Method that removes a child from the children set
 * 
 * @param child
 * @return true if this set contained the specified element
 */
public synchronized boolean removeChild(String child) {
    if (children == null) {
        return false;
    }
    return children.remove(child); // remove the child from the set
}

get/set:

/**
 * convenience method for setting the children for this datanode
 * 
 * @param children
 */
public synchronized void setChildren(HashSet<String> children) {
    this.children = children;
}

/**
 * convenience methods to get the children
 * 
 * @return the children of this datanode
 */
public synchronized Set<String> getChildren() { // get and set are both synchronized to avoid race conditions on the shared field
    if (children == null) {
        return EMPTY_SET;
    }

    return Collections.unmodifiableSet(children);
}

These are all simple methods and should be easy to follow together with the comments.

Quotas

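Before reading further, I strongly recommend the article "ZK Permission Management and Quotas". A quota is a limit on the number of nodes and the amount of data under a ZNode; exceeding it only produces a warning in the log and does not actually block the operation.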
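Because the quota is only advisory, checking it amounts to "update the counters, then warn if a limit is exceeded". Here is a minimal sketch assuming a hypothetical SoftQuota class. A returned string stands in for the log warning, and -1 means "no limit", matching how the setquota code later in this post passes -1 for the limit that was not specified.

```java
// Hypothetical sketch of a "soft" quota: exceeding a limit is reported
// (a returned warning string stands in for a log line) but the write itself
// is never rejected. Class and field names are illustrative only.
class SoftQuota {
    final long countLimit; // -1 means "no limit", as passed by setquota
    final long bytesLimit; // -1 means "no limit"
    long count;
    long bytes;

    SoftQuota(long countLimit, long bytesLimit) {
        this.countLimit = countLimit;
        this.bytesLimit = bytesLimit;
    }

    // Apply a change in node count / data size. Returns a warning message if
    // a limit is now exceeded, or null if everything is within bounds.
    String update(int countDiff, long bytesDiff) {
        count += countDiff;
        bytes += bytesDiff;
        if (countLimit > -1 && count > countLimit) {
            return "Quota exceeded: count=" + count + " limit=" + countLimit;
        }
        if (bytesLimit > -1 && bytes > bytesLimit) {
            return "Quota exceeded: bytes=" + bytes + " limit=" + bytesLimit;
        }
        return null;
    }
}
```

Note that the counters are updated before the comparison, so the change always takes effect; the limit only decides whether a warning is produced.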

public class Quotas {

    /** the zookeeper nodes that acts as the management and status node **/
    public static final String procZookeeper = "/zookeeper";

    /** the zookeeper quota node that acts as the quota
     * management node for zookeeper */
    public static final String quotaZookeeper = "/zookeeper/quota";

    /**
     * the limit node that has the limit of
     * a subtree
     */
    public static final String limitNode = "zookeeper_limits";

    /**
     * the stat node that monitors the limit of
     * a subtree.
     */
    public static final String statNode = "zookeeper_stats";

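The difference between limitNode and statNode: the former holds the limits given to setquota, the latter holds the actual usage. This will come up again when we discuss PathTrie. One point worth stating here: for every node on which a quota is successfully set, a mirrored tree structure is built under /zookeeper/quota, and the mirrored node for that path gets two children, path + "/zookeeper_limits" and path + "/zookeeper_stats", corresponding to limitNode and statNode above. Note that "successfully" is a real condition: if a parent node or a child node (more precisely, any ancestor or descendant) already has a quota, setting the quota fails.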

public static String quotaPath(String path) {
    return quotaZookeeper + path +
    "/" + limitNode;//limitnode
}

public static String statPath(String path) {
    return quotaZookeeper + path + "/" +
    statNode;//statnode
}

These two methods build the paths of the limit node (quotaPath) and the stat node (statPath).
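For example, for a znode /app/db the two helpers produce the paths below. QuotaPaths is a hypothetical standalone copy of the constants and methods shown above, so that it runs without the ZooKeeper jar.

```java
// Hypothetical standalone copy of the Quotas constants and path helpers
// shown above, to illustrate where the quota metadata for a znode lives.
final class QuotaPaths {
    static final String quotaZookeeper = "/zookeeper/quota";
    static final String limitNode = "zookeeper_limits";
    static final String statNode = "zookeeper_stats";

    // Path of the node holding the configured limits for `path`.
    static String quotaPath(String path) {
        return quotaZookeeper + path + "/" + limitNode;
    }

    // Path of the node holding the actual usage for `path`.
    static String statPath(String path) {
        return quotaZookeeper + path + "/" + statNode;
    }
}
```

So the quota metadata for /app/db ends up under /zookeeper/quota/app/db, mirroring the original path: /zookeeper/quota/app/db/zookeeper_limits and /zookeeper/quota/app/db/zookeeper_stats.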

PathTrie

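For an introduction to tries, see the article "A Brief Introduction to Tries". My own simple understanding: if words share a common prefix (starting from the first letter), that prefix is shared, and new nodes are created only for the remaining part.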

public class PathTrie {
    /**
     * the logger for this class
     */
    private static final Logger LOG = LoggerFactory.getLogger(PathTrie.class);
    
    /**
     * the root node of PathTrie
     */
    private final TrieNode rootNode ;
    
    static class TrieNode {
        boolean property = false; // whether a quota is set on this node
        final HashMap<String, TrieNode> children;
        TrieNode parent = null;

The structure is simple, a typical tree, with the static nested class TrieNode as the node type.

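As mentioned earlier, if a parent node or a child node already has a quota, setting a quota fails. Where this rule is enforced deserves a close look; many blog posts I read did not mention it at all, so keep it in mind.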

As the three screenshots above show, once a quota has been set on a node, setting a quota on its parent or on its child both fail.

The reason:

public void addPath(String path) {
    if (path == null) {
        return;
    }
    String[] pathComponents = path.split("/"); // split the path on '/'
    TrieNode parent = rootNode;
    String part = null;
    if (pathComponents.length <= 1) {
        throw new IllegalArgumentException("Invalid path " + path);
    }
    for (int i=1; i<pathComponents.length; i++) { // walk down one path component at a time
        part = pathComponents[i];
        if (parent.getChild(part) == null) {
            parent.addChild(part, new TrieNode(parent)); // component missing: create its node here
        }
        parent = parent.getChild(part);
    }
    parent.setProperty(true);
}
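addPath only marks where quotas exist. What the trie is used for is the opposite lookup: given a node whose count or size changed, find the nearest ancestor that carries a quota, so that its statistics can be updated. ZooKeeper's PathTrie provides findMaxPrefix for this. The sketch below is a hypothetical QuotaTrie that reproduces the idea (a child map plus the property flag) in a self-contained form; it is not the real class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical QuotaTrie mirroring PathTrie's idea: nodes are path components,
// and `property == true` marks a path on which a quota was set.
class QuotaTrie {
    private static class TrieNode {
        boolean property = false;
        final Map<String, TrieNode> children = new HashMap<>();
    }

    private final TrieNode root = new TrieNode();

    // Same walk as addPath above: create missing components, flag the last one.
    void addPath(String path) {
        String[] parts = path.split("/");
        if (parts.length <= 1) {
            throw new IllegalArgumentException("Invalid path " + path);
        }
        TrieNode cur = root;
        for (int i = 1; i < parts.length; i++) {
            cur = cur.children.computeIfAbsent(parts[i], k -> new TrieNode());
        }
        cur.property = true;
    }

    // Longest prefix of `path` that has a quota, or "/" if none does.
    String findMaxPrefix(String path) {
        String[] parts = path.split("/");
        TrieNode cur = root;
        StringBuilder prefix = new StringBuilder();
        String best = "/";
        for (int i = 1; i < parts.length; i++) {
            cur = cur.children.get(parts[i]);
            if (cur == null) {
                break; // no quota can exist deeper than this
            }
            prefix.append('/').append(parts[i]);
            if (cur.property) {
                best = prefix.toString();
            }
        }
        return best;
    }
}
```

Note that this addPath accepts both /app and /app/db without complaint, just like the real one: the trie itself does not reject nested quotas.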

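So insertion does follow the ordinary trie rules, and addPath never rejects a valid path. The check is made instead where zk handles client commands, in the processCMD method of ZooKeeperMain: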

if (cmd.equals("setquota") && args.length >= 4) {
    String option = args[1];
    String val = args[2];
    path = args[3];
    System.err.println("Comment: the parts are " +
                       "option " + option +
                       " val " + val +
                       " path " + path);
    if ("-b".equals(option)) {
        // we are setting the bytes quota
        createQuota(zk, path, Long.parseLong(val), -1); // this is what actually creates the quota nodes for setquota
    } else if ("-n".equals(option)) {
        // we are setting the num quota
        createQuota(zk, path, -1L, Integer.parseInt(val));
    } else {
        usage();
    }

}

Here you can see that setquota calls a createQuota method, which contains:

// check for more than 2 children --
// if zookeeper_stats and zookeeper_qutoas
// are not the children then this path
// is an ancestor of some path that
// already has quota
String realPath = Quotas.quotaZookeeper + path;
// check whether any child (descendant) already has a quota
try {
    List<String> children = zk.getChildren(realPath, false);
    for (String child: children) {
        if (!child.startsWith("zookeeper_")) {
            throw new IllegalArgumentException(path + " has child " +
                    child + " which has a quota");
        }
    }
} catch(KeeperException.NoNodeException ne) {
    // this is fine
}

//check for any parent that has been quota
// check whether any ancestor already has a quota; the logic is similar to the child check above
checkIfParentQuota(zk, path);

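This check is what causes the behavior I described earlier: a quota cannot be added when a parent or child node already has one. Note that although the check is made by the client, it is not purely local: the call List<String> children = zk.getChildren(realPath, false); is a request to the server. If neither the parents nor the children have a quota, the client then sends requests to the server to create the nodes (the code is in ZooKeeperMain), as shown below: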

if (zk.exists(quotaPath, false) == null) {
    try {
        // create() sends the request to the server internally
        zk.create(Quotas.procZookeeper, null, Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);
        zk.create(Quotas.quotaZookeeper, null, Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);
    } catch(KeeperException.NodeExistsException ne) {
        // do nothing
    }
}
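The two checks in createQuota amount to an ancestor/descendant test. The sketch below models them locally, assuming the existing quota paths are already known as a Set<String>; the real client discovers them by reading the /zookeeper/quota tree from the server (zk.getChildren in the excerpt above). QuotaConflictCheck and its error messages are hypothetical.

```java
import java.util.Set;

// Hypothetical local model of the checks createQuota performs before creating
// anything: given the paths that already have quotas, reject `path` if one of
// them is a strict ancestor or a strict descendant of it.
final class QuotaConflictCheck {
    // True if `ancestor` lies strictly above `path` in the znode tree.
    static boolean isStrictAncestor(String ancestor, String path) {
        if (ancestor.equals(path)) {
            return false;
        }
        // Append '/' so that /app/d is not treated as an ancestor of /app/db.
        String prefix = ancestor.endsWith("/") ? ancestor : ancestor + "/";
        return path.startsWith(prefix);
    }

    static void check(Set<String> existingQuotas, String path) {
        for (String q : existingQuotas) {
            if (isStrictAncestor(path, q)) {
                throw new IllegalArgumentException(path + " has child " + q + " which has a quota");
            }
            if (isStrictAncestor(q, path)) {
                throw new IllegalArgumentException(path + " has a parent " + q + " which has a quota");
            }
        }
    }
}
```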

If you want to see how paths are added to and removed from the trie, take a look at the article "ZK Data Model: Quotas".

StatsTrack

StatsTrack simply holds a count value and a bytes value for a node's quota.

/**
 * a class that represents the stats associated with quotas
 */
public class StatsTrack {
    private int count;
    private long bytes;
    private String countStr = "count";
    private String byteStr = "bytes";

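StatsTrack is really just a value class: its string form is the content of the zookeeper_stats node under /zookeeper/quota, and, as the code below shows, of the zookeeper_limits node as well. The following code, which creates these nodes, converts a StatsTrack to a string and stores it as the node content: the limit node gets the configured limits, and the stat node starts at zero.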

StatsTrack strack = new StatsTrack(null);
strack.setBytes(bytes);
strack.setCount(numNodes);
try {
    zk.create(quotaPath, strack.toString().getBytes(),
            Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    StatsTrack stats = new StatsTrack(null);
    stats.setBytes(0L);
    stats.setCount(0);
    zk.create(statPath, stats.toString().getBytes(),
            Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
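} catch (KeeperException.NodeExistsException ne) {
    // the limit node already exists; the real createQuota updates the
    // existing limits in this branch (omitted here)
}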
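The content written to these nodes is just the StatsTrack string. The hypothetical SimpleStatsTrack below shows the round trip. It assumes the text format count=<n>,bytes=<m>, which is what I understand StatsTrack.toString() to produce (the countStr and byteStr fields above point that way), so treat the exact format as an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified StatsTrack-like value class. The "count=<n>,bytes=<m>"
// text format is an assumption about what ZooKeeper's StatsTrack produces.
final class SimpleStatsTrack {
    int count;
    long bytes;

    SimpleStatsTrack(int count, long bytes) {
        this.count = count;
        this.bytes = bytes;
    }

    // Parse "count=3,bytes=100" back into an object; malformed pairs are skipped.
    static SimpleStatsTrack parse(String s) {
        int count = 0;
        long bytes = 0L;
        for (String kv : s.split(",")) {
            String[] pair = kv.split("=");
            if (pair.length != 2) {
                continue;
            }
            if (pair[0].equals("count")) {
                count = Integer.parseInt(pair[1]);
            } else if (pair[0].equals("bytes")) {
                bytes = Long.parseLong(pair[1]);
            }
        }
        return new SimpleStatsTrack(count, bytes);
    }

    @Override
    public String toString() {
        return "count=" + count + ",bytes=" + bytes;
    }

    // The znode content is the encoded toString() (UTF-8 in this sketch).
    byte[] toBytes() {
        return toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

For a bytes-only quota the limit node would hold something like count=-1,bytes=1024, since setquota -b passes -1 as the node-count limit.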

Thoughts:

When I have time, I would like to study how quotas work in more detail.

Open questions:

Why is adding a quota rejected when a parent or child node already has one? Why was it designed this way?

posted @ 2018-09-11 21:45 SmallMushroom
