




ZooKeeper: A Distributed Coordination Service for Distributed Applications

ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C.

Coordination services are notoriously hard to get right. They are especially prone to errors such as race conditions and deadlock. The motivation behind ZooKeeper is to relieve distributed applications of the responsibility of implementing coordination services from scratch.






ZooKeeper is very fast and very simple. Since its goal, though, is to be a basis for the construction of more complicated services, such as synchronization, it provides a set of guarantees. These are:

  • Sequential Consistency - Updates from a client will be applied in the order that they were sent.
  • Atomicity - Updates either succeed or fail. No partial results.
  • Single System Image - A client will see the same view of the service regardless of the server that it connects to.
  • Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
  • Timeliness - The client's view of the system is guaranteed to be up-to-date within a certain time bound.










sh zkServer.sh start
sh zkServer.sh status
# Mode: standalone  means the server is running in standalone (non-clustered) mode


sh zkCli.sh -server


Run ls / to list the paths under the root node:

[zk: localhost:2181(CONNECTED) 0] ls /

Run get /zookeeper to view the stat information for the path /zookeeper:

[zk: localhost:2181(CONNECTED) 2] get /zookeeper

cZxid = 0x0
ctime = Thu Jan 01 08:00:00 CST 1970
mZxid = 0x0
mtime = Thu Jan 01 08:00:00 CST 1970
pZxid = 0x0
cversion = -1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 1


Next, create a new znode by running create /zk_test my_data. This creates a new znode and associates the string "my_data" with the node. You should see:

[zkshell: 9] create /zk_test my_data
Created /zk_test

Issue another ls / command to see what the directory looks like:

[zkshell: 11] ls /
[zookeeper, zk_test]

Notice that the zk_test directory has now been created.

Next, verify that the data was associated with the znode by running the get command, as in:

[zkshell: 12] get /zk_test
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 5
mtime = Fri Jun 05 13:57:06 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0
dataLength = 7
numChildren = 0

We can change the data associated with zk_test by issuing the set command, as in:

[zkshell: 14] set /zk_test junk
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 6
mtime = Fri Jun 05 14:01:52 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0
dataLength = 4
numChildren = 0
[zkshell: 15] get /zk_test
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 6
mtime = Fri Jun 05 14:01:52 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0
dataLength = 4
numChildren = 0

Notice that we ran get after setting the data, and it did indeed change.

Finally, let's delete the node by issuing:

[zkshell: 16] delete /zk_test
[zkshell: 17] ls /
[zkshell: 18]



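import java.io.IOException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

/**
 * @author to_be_continued
 * @date 2020/6/22 10:36
 */
public class TestZk {
    private static ZooKeeper zooKeeper;

    public static void main(String[] args) throws IOException {
        // The connect string (host:port) is empty in the source; supply your server's address.
        zooKeeper = new ZooKeeper("", 5000, new TestZkWatch());
    }
}

/**
 * Watcher used for testing.
 * @author to_be_continued
 */
class TestZkWatch implements Watcher {
    @Override
    public void process(WatchedEvent watchedEvent) {
        System.out.println("testZkWatch watchedEvent: " + watchedEvent);
    }
}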



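// Additional imports needed: org.apache.zookeeper.CreateMode, KeeperException, ZooDefs
/**
 * Creates a node.
 * @param path path of the node to create
 * @param data data to store in the node
 * @return the actual path of the created node
 */
public static String create(String path, byte[] data) throws KeeperException, InterruptedException {
    // OPEN_ACL_UNSAFE grants all permissions to everyone; a PERSISTENT node outlives the session.
    return zooKeeper.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
}

public static void main(String[] args) throws IOException, KeeperException, InterruptedException {
    zooKeeper = new ZooKeeper("", 5000, new TestZkWatch());

    create("/testZk", "testCreateData".getBytes());
}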



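// Additional import needed: org.apache.zookeeper.data.Stat
public static void main(String[] args) throws IOException, KeeperException, InterruptedException {
    zooKeeper = new ZooKeeper("", 5000, new TestZkWatch());

//        create("/testZk", "testCreateData".getBytes());

    Stat stat = new Stat();
    System.out.println(new String(getData("/testZk", stat)));
}

/**
 * Reads the data of a node.
 * @param path path of the node
 * @param stat receives the node's metadata (versions, timestamps, and so on)
 * @return the data stored in the node
 * @throws KeeperException if the server reports an error
 * @throws InterruptedException if the call is interrupted
 */
public static byte[] getData(String path, Stat stat) throws KeeperException, InterruptedException {
    // watch = true registers the default watcher (TestZkWatch) on this node
    return zooKeeper.getData(path, true, stat);
}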


public static void main(String[] args) throws IOException, KeeperException, InterruptedException {
    zooKeeper = new ZooKeeper("", 5000, new TestZkWatch());

//        create("/testZk", "testCreateData".getBytes());

    Stat stat = new Stat();
    System.out.println(new String(getData("/testZk", stat)));

    // Pass the version just read so the update fails if the node changed in between.
    System.out.println(update("/testZk", "updateData".getBytes(), stat.getVersion()));

    System.out.println(new String(getData("/testZk", stat)));
}

/**
 * Updates the data of the given node.
 * @param path path of the node
 * @param data new data for the node
 * @param version expected data version; this works as an optimistic lock
 * @return the node's stat after the update
 * @throws KeeperException if the server reports an error, such as a version mismatch
 * @throws InterruptedException if the call is interrupted
 */
public static Stat update(String path, byte[] data, int version) throws KeeperException, InterruptedException {
    return zooKeeper.setData(path, data, version);
}
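
The version parameter above acts as an optimistic lock: the write is applied only if the node's data version still equals the version the caller read earlier, and a version of -1 skips the check. The following is a minimal in-memory sketch of that compare-then-write rule. VersionedNode is a hypothetical stand-in written for this illustration, not part of the ZooKeeper client, and it throws IllegalStateException where the real client would report a bad-version error.

```java
/** Hypothetical in-memory stand-in for one znode; not the ZooKeeper API. */
final class VersionedNode {
    private byte[] data;
    private int version = 0; // plays the role of dataVersion in the stat output

    VersionedNode(byte[] initial) {
        this.data = initial.clone();
    }

    synchronized int version() {
        return version;
    }

    synchronized byte[] data() {
        return data.clone();
    }

    /**
     * Writes newData only when expectedVersion equals the current version,
     * or when expectedVersion is -1 (no check). Returns the new version.
     */
    synchronized int setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            throw new IllegalStateException(
                "version mismatch: expected " + expectedVersion + ", current " + version);
        }
        data = newData.clone();
        version++;
        return version;
    }

    public static void main(String[] args) {
        VersionedNode node = new VersionedNode("testCreateData".getBytes());
        int seen = node.version();                   // writers A and B both read version 0
        node.setData("fromA".getBytes(), seen);      // A succeeds; the version becomes 1
        try {
            node.setData("fromB".getBytes(), seen);  // B still holds the stale version 0
        } catch (IllegalStateException e) {
            System.out.println("B rejected: " + e.getMessage());
        }
        System.out.println(new String(node.data()));
    }
}
```

A caller whose write is rejected would normally re-read the node to obtain the current version and then decide whether to retry.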


public static void main(String[] args) throws IOException, KeeperException, InterruptedException {
    zooKeeper = new ZooKeeper("", 5000, new TestZkWatch());

//        create("/testZk", "testCreateData".getBytes());

    Stat stat = new Stat();
    System.out.println(new String(getData("/testZk", stat)));

    // Delete the node only if its version still matches the one just read.
    delete("/testZk", stat.getVersion());

//        // update the data of the testZk node
//        System.out.println(update("/testZk", "updateData".getBytes(), stat.getVersion()));
//        System.out.println(new String(getData("/testZk", stat)));
//        System.out.println(stat);
}

/**
 * Deletes the given node.
 * @param path path of the node
 * @param version expected data version of the node
 * @throws KeeperException if the server reports an error, such as a version mismatch
 * @throws InterruptedException if the call is interrupted
 */
public static void delete(String path, int version) throws KeeperException, InterruptedException {
    zooKeeper.delete(path, version);
}



ACL: ZooKeeper supports access control on znodes.

ACL Permissions

ZooKeeper supports the following permissions:

  • CREATE: you can create a child node
  • READ: you can get data from a node and list its children.
  • WRITE: you can set data for a node
  • DELETE: you can delete a child node
  • ADMIN: you can set permissions
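
In the Java client these permissions are int bit flags (the constants in ZooDefs.Perms), so a single ACL entry can grant several of them by combining flags with bitwise OR. The sketch below mirrors those flag values in a standalone class to show how a combined permission set is built and checked. The class AclPerms and its helper methods are hypothetical names used only for this illustration; the one-letter abbreviations are the ones the command-line client uses when setting ACLs.

```java
/** Standalone illustration of permission bit flags; values mirror ZooDefs.Perms. */
final class AclPerms {
    static final int READ = 1 << 0;
    static final int WRITE = 1 << 1;
    static final int CREATE = 1 << 2;
    static final int DELETE = 1 << 3;
    static final int ADMIN = 1 << 4;
    static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN;

    /** True when every bit in required is also set in granted. */
    static boolean allows(int granted, int required) {
        return (granted & required) == required;
    }

    /** Renders a permission set with one letter per flag: c, d, r, w, a. */
    static String letters(int perms) {
        StringBuilder sb = new StringBuilder();
        if ((perms & CREATE) != 0) sb.append('c');
        if ((perms & DELETE) != 0) sb.append('d');
        if ((perms & READ) != 0) sb.append('r');
        if ((perms & WRITE) != 0) sb.append('w');
        if ((perms & ADMIN) != 0) sb.append('a');
        return sb.toString();
    }

    public static void main(String[] args) {
        int readOnly = READ;
        int editor = READ | WRITE | CREATE;
        System.out.println("editor perms: " + letters(editor));
        System.out.println("editor may write: " + allows(editor, WRITE));
        System.out.println("readOnly may write: " + allows(readOnly, WRITE));
    }
}
```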


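  • EPHEMERAL: an ephemeral node; it is deleted when the session ends.

  • EPHEMERAL_SEQUENTIAL: an ephemeral sequential node; the znode is deleted when the session ends, and a monotonically increasing number is appended to its name.

  • PERSISTENT: a persistent node; it is not deleted when the session ends.

  • PERSISTENT_SEQUENTIAL: a persistent sequential node; the znode is not deleted when the session ends, and a monotonically increasing number is appended to its name.

  • TTL (added in 3.6.0): a node type, new in 3.6, that supports an expiration time. It can be thought of as an attribute of the node, and it can only be set on PERSISTENT and PERSISTENT_SEQUENTIAL znodes.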

    TTL Nodes

    Added in 3.6.0

    When creating PERSISTENT or PERSISTENT_SEQUENTIAL znodes, you can optionally set a TTL in milliseconds for the znode. If the znode is not modified within the TTL and has no children it will become a candidate to be deleted by the server at some point in the future.

    Note: TTL Nodes must be enabled via System property as they are disabled by default. See the Administrator's Guide for details. If you attempt to create TTL Nodes without the proper System property set the server will throw KeeperException.UnimplementedException.

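  • CONTAINER (added in 3.6.0): a special node type, new in 3.6. Once all of a container node's children have been deleted, the container itself becomes eligible for deletion by the server.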

    Container Nodes

    Added in 3.6.0

    ZooKeeper has the notion of container znodes. Container znodes are special purpose znodes useful for recipes such as leader, lock, etc. When the last child of a container is deleted, the container becomes a candidate to be deleted by the server at some point in the future.

    Given this property, you should be prepared to get KeeperException.NoNodeException when creating children inside of container znodes. i.e. when creating child znodes inside of container znodes always check for KeeperException.NoNodeException and recreate the container znode when it occurs.
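
For the two sequential modes in the list above (EPHEMERAL_SEQUENTIAL and PERSISTENT_SEQUENTIAL), the server appends a counter to the requested path, formatted as ten zero-padded decimal digits. Because of the padding, plain string sorting of sibling names matches creation order, which recipes such as locks rely on. A minimal sketch of the naming rule follows. SequentialNames and the lock- prefix are hypothetical, and in a real ensemble the counter is assigned by the server, not by the client.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that imitates how sequential znode names are formed. */
final class SequentialNames {
    /** Appends the counter as 10 zero-padded digits to the prefix. */
    static String withSequence(String prefix, int counter) {
        return String.format(Locale.ROOT, "%s%010d", prefix, counter);
    }

    /** Returns the child with the lowest sequence number (same prefix assumed). */
    static String lowest(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted); // zero padding makes string order equal numeric order
        return sorted.get(0);
    }

    public static void main(String[] args) {
        List<String> children = new ArrayList<>();
        children.add(withSequence("lock-", 12));
        children.add(withSequence("lock-", 3));
        children.add(withSequence("lock-", 7));
        System.out.println(children);
        System.out.println("lowest = " + lowest(children));
    }
}
```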


What is Curator?

Curator n ˈkyoor͝ˌātər: a keeper or custodian of a museum or other collection - A ZooKeeper Keeper.

Apache Curator is a Java/JVM client library for Apache ZooKeeper, a distributed coordination service. It includes a high-level API framework and utilities to make using Apache ZooKeeper much easier and more reliable. It also includes recipes for common use cases and extensions such as service discovery and a Java 8 asynchronous DSL.




create and getData
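import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

/**
 * Using the Curator API avoids many inconveniences of the native API, such as
 * creating multi-level nodes and handling many declared checked exceptions.
 * @author tu
 * @date 2020/6/22 11:22
 */
public class TestCurator {
    public static void main(String[] args) throws Exception {
        // The connect string (host:port) is empty in the source; supply your server's address.
        CuratorFramework curatorFramework = CuratorFrameworkFactory.newClient("", new ExponentialBackoffRetry(1000, 3));
        curatorFramework.start(); // the client must be started before use
        String path = "/testCurator";
        System.out.println(curatorFramework.create().forPath(path, "testCurator".getBytes()));
        System.out.println(new String(curatorFramework.getData().forPath(path)));
        curatorFramework.close();
    }
}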
Distributed Lock

The Curator API can be used to implement a distributed lock.

InterProcessMutex lock = new InterProcessMutex(client, lockPath);
if (lock.acquire(maxWait, waitUnit)) {
    try {
        // do some work inside of the critical section here
    } finally {
        lock.release(); // always release, even if the work throws
    }
}

Leader Election

The Curator API can be used to implement leader election.

LeaderSelectorListener listener = new LeaderSelectorListenerAdapter() {
    @Override
    public void takeLeadership(CuratorFramework client) throws Exception {
        // this callback will get called when you are the leader
        // do whatever leader work you need to and only exit
        // this method when you want to relinquish leadership
    }
};

LeaderSelector selector = new LeaderSelector(client, path, listener);
selector.autoRequeue();  // not required, but this is behavior that you will probably expect
selector.start();




If you want to test multiple servers on a single machine, specify the servername as localhost with unique quorum & leader election ports (i.e. 2888:3888, 2889:3889, 2890:3890 in the example above) for each server.X in that server's config file. Of course separate dataDirs and distinct clientPorts are also necessary (in the above replicated example, running on a single localhost, you would still have three config files).

Please be aware that setting up multiple servers on a single machine will not create any redundancy. If something were to happen which caused the machine to die, all of the zookeeper servers would be offline. Full redundancy requires that each server have its own machine. It must be a completely separate physical server. Multiple virtual machines on the same physical host are still vulnerable to the complete failure of that host.


For replicated mode, a minimum of three servers are required, and it is strongly recommended that you have an odd number of servers. If you only have two servers, then you are in a situation where if one of them fails, there are not enough machines to form a majority quorum. Two servers are inherently less stable than a single server, because there are two single points of failure.
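
The arithmetic behind this recommendation is short: an ensemble of n servers stays available only while a strict majority, n/2 + 1 with integer division, can communicate, so it tolerates n minus that many failures. The sketch below tabulates this for small ensembles. QuorumMath is a hypothetical helper written for this illustration.

```java
/** Hypothetical helper that computes majority-quorum sizes. */
final class QuorumMath {
    /** Smallest number of servers that forms a strict majority of n. */
    static int majority(int n) {
        return n / 2 + 1;
    }

    /** How many servers can fail while a majority is still reachable. */
    static int toleratedFailures(int n) {
        return n - majority(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " servers: majority=" + majority(n)
                + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

Two servers tolerate zero failures, the same as one server, and four tolerate one failure, the same as three, which is why odd ensemble sizes are recommended.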





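server is the fixed prefix of the setting. The .1 that follows it is the ZooKeeper server ID; the ID is stored in a file named myid under dataDir, and that file contains only the ID value. The value after the equals sign is ip:quorum (synchronization) port:leader election port. For other settings, see Configuration Parameters.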
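
To make the layout of such an entry concrete, the sketch below splits one server.N line into its parts. The host name zk1.example.com and the ServerEntry class are hypothetical, and the parser handles only the basic host:port:port form described here, not any optional suffixes.

```java
/** Hypothetical parser for one server.N entry of the configuration file. */
final class ServerEntry {
    final int id;            // must match the number stored in that server's myid file
    final String host;
    final int quorumPort;    // used by followers to synchronize with the leader
    final int electionPort;  // used for leader election

    private ServerEntry(int id, String host, int quorumPort, int electionPort) {
        this.id = id;
        this.host = host;
        this.quorumPort = quorumPort;
        this.electionPort = electionPort;
    }

    /** Parses a line of the form server.N=host:quorumPort:electionPort. */
    static ServerEntry parse(String line) {
        String[] kv = line.trim().split("=", 2);
        if (kv.length != 2 || !kv[0].startsWith("server.")) {
            throw new IllegalArgumentException("not a server entry: " + line);
        }
        int id = Integer.parseInt(kv[0].substring("server.".length()));
        String[] parts = kv[1].split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected host:port:port in: " + line);
        }
        return new ServerEntry(id, parts[0],
            Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
    }

    public static void main(String[] args) {
        ServerEntry e = parse("server.1=zk1.example.com:2888:3888");
        System.out.println("id=" + e.id + " host=" + e.host
            + " quorumPort=" + e.quorumPort + " electionPort=" + e.electionPort);
    }
}
```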

Once all three machines are configured, starting them completes the cluster deployment. Running sh zkServer.sh status shows each server's mode: follower, leader, or observer.

