Understanding ZooKeeper in Depth (Part 2): Simulating Application Scenarios
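1. Implementing a distributed lock
A lock can be built on an ephemeral node. Several processes all try to create the ephemeral node /lock, but only one process P succeeds. The processes that fail can set a watch on /lock, which amounts to waiting for the lock to be released. Once P finishes its work and disconnects, /lock is deleted automatically, the other processes are notified, and they try to create /lock again to compete for the lock.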
Steps:
Open one client and create the ephemeral lock node
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 0] create -e /lock "lock"
Created /lock
Open a second client and try to create the same ephemeral node. The command fails, which shows that the lock node already exists
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 2] create -e /lock "lock"
Node already exists: /lock
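Close the first client, wait a few seconds, then list the znode tree from the second client. The lock node is gone
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 9] ls /
[zookeeper]
The second client can now create the lock node
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 10] create -e /lock "lock"
Created /lock
These commands simulate, in a simple form, several agents sharing a distributed lock.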
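The same sequence can be replayed in a few lines of Python. This is only a sketch: FakeZk is a hypothetical in-memory stand-in written for this example, not the ZooKeeper client API, and it models just two rules from the steps above: creating an existing node fails, and closing a session deletes the ephemeral nodes that session owns.

```python
class FakeZk:
    """Hypothetical in-memory stand-in for an ensemble (not a real client)."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session id)

    def create_ephemeral(self, session, path, data):
        """Create an ephemeral node; return False if the path exists."""
        if path in self.nodes:
            return False  # mirrors "Node already exists: /lock"
        self.nodes[path] = (data, session)
        return True

    def close_session(self, session):
        """Closing a session removes every ephemeral node it owns."""
        owned = [p for p, (_, owner) in self.nodes.items() if owner == session]
        for path in owned:
            del self.nodes[path]


def try_lock(zk, session):
    """Try to take the lock by creating the ephemeral node /lock."""
    return zk.create_ephemeral(session, "/lock", "lock")


zk = FakeZk()
first = try_lock(zk, "client-1")   # first client creates /lock
second = try_lock(zk, "client-2")  # second client is refused
zk.close_session("client-1")       # /lock disappears with the session
retry = try_lock(zk, "client-2")   # second client now gets the lock
```

Tying the lock to the session rather than to an explicit unlock call is what makes a crashed holder release the lock without any cleanup code.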
2. Implementing Master-Worker
The first session creates an ephemeral node named /master
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 3] create -e /master "master1.example.com:2223"
Created /master
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 4] ls /
[zookeeper, master]
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 5] get /master
"master1.example.com:2223"
cZxid = 0x20000000f
ctime = Wed Mar 16 11:28:18 CST 2016
mZxid = 0x20000000f
mtime = Wed Mar 16 11:28:18 CST 2016
pZxid = 0x20000000f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x3537d63b3340001
dataLength = 26
numChildren = 0
Now suppose another process, acting as the backup master, tries to create the master node. It is told that the node already exists
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 12] create -e /master "master2.example.com:2223"
Node already exists: /master
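The primary master may crash at any moment, and when it does the backup master should take over immediately, so the backup needs to set a watch on /master
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 13] stat /master true
cZxid = 0x20000000f
ctime = Wed Mar 16 11:28:18 CST 2016
mZxid = 0x20000000f
mtime = Wed Mar 16 11:28:18 CST 2016
pZxid = 0x20000000f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x3537d63b3340001
dataLength = 26
numChildren = 0
The stat command returns the node's attributes and can also watch the node. The trailing argument true sets the watch (this is the older CLI syntax used throughout this walkthrough; ZooKeeper 3.5 and later use stat -w /master instead). If the primary master, the first session, now crashes and disconnects, the second session is notified that /master has been deleted and can immediately take over as primary.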
Quit the client on the primary
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 7] quit
Quitting...
2016-03-16 11:31:48,954 [myid:] - INFO [main:ZooKeeper@684] - Session: 0x3537d63b3340001 closed
2016-03-16 11:31:48,955 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@512] - EventThread shut down
[root@zookeeper1 ~]#
At the same moment, the backup receives the watch event
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 14]
WATCHER::
WatchedEvent state:SyncConnected type:NodeDeleted path:/master
Listing the znode tree confirms that /master is gone, so the backup creates it and becomes the primary
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 14] ls /
[zookeeper]
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 15] create -e /master "master2.example.com:2223"
Created /master
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 16] ls /
[zookeeper, master]
3. Workers, Tasks and Assignments
First create the nodes that hold the workers, tasks and assignments: /workers, /tasks and /assign. (Note that these are persistent nodes.)
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 0] create /workers ""
Created /workers
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 1] create /tasks ""
Created /tasks
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 2] create /assign ""
Created /assign
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 3] ls /
[zookeeper, workers, tasks, assign]
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 4]
The master now needs to watch the children of /workers and /tasks so that it can assign tasks to workers
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 4] ls /workers true
[]
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 5] ls /tasks true
[]
On the worker side
Open another session and suppose one worker has become available
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 18] create -e /workers/worker1.example.com "worker1.example.com:2224"
Created /workers/worker1.example.com
The master is then notified that the children of /workers have changed
WATCHER::
WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/workers
To receive assigned tasks, the worker creates the node /assign/worker1.example.com and watches it for changes to its children
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 20] create /assign/worker1.example.com ""
Created /assign/worker1.example.com
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 21] ls /assign/worker1.example.com true
[]
On the client side
Now suppose a client submits a task. It must also watch the task node it created, because it needs to know whether the task has been executed and whether it succeeded
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 0] create -s /tasks/task- "cmd"
Created /tasks/task-0000000000
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 1] ls /tasks/task-0000000000 true
[]
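Here we created a persistent sequential node, so an increasing integer, 0000000000, was appended to its name. At this point the master learns that a new task has been submitted, and it will assign the task to worker1
WATCHER::
WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/tasks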
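The suffix on a sequential node is a counter printed as ten zero-padded decimal digits. The following sketch reproduces only that naming rule; make_sequencer is a hypothetical helper for illustration, and in a real ensemble the counter is kept by the parent znode on the server, not by the client.

```python
import itertools


def make_sequencer(prefix):
    """Return a function that yields prefix + a 10-digit, zero-padded counter.

    Assumption for this sketch: one private counter per sequencer. ZooKeeper
    itself keeps the counter on the parent node, shared by all its children.
    """
    counter = itertools.count()

    def next_name():
        return f"{prefix}{next(counter):010d}"

    return next_name


next_task = make_sequencer("/tasks/task-")
first = next_task()   # same name as in the transcript above
second = next_task()  # the next submission gets the next number
```

Since the names sort in creation order, the master can process the children of /tasks in the order the clients submitted them.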
The master then checks for new tasks and available workers, and assigns the task to a worker
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 7] ls /tasks
[task-0000000000]
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 8] ls /workers
[worker1.example.com]
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 9] create /assign/worker1.example.com/task-0000000000 ""
Created /assign/worker1.example.com/task-0000000000
The worker is notified that a task has been assigned to it, and checks its assignments
WATCHER::
WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/assign/worker1.example.com
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 24] ls /assign/worker1.example.com
[task-0000000000]
Once the worker has finished the task, it adds a status node under the corresponding task
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 25] create /tasks/task-0000000000/status "done"
Created /tasks/task-0000000000/status
The client is then notified and checks the result of the task
WATCHER::
WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/tasks/task-0000000000
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 2] get /tasks/task-0000000000
"cmd"
cZxid = 0x20000001d
ctime = Wed Mar 16 13:28:05 CST 2016
mZxid = 0x20000001d
mtime = Wed Mar 16 13:28:05 CST 2016
pZxid = 0x20000001f
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 5
numChildren = 1
[zk: 127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184(CONNECTED) 3] get /tasks/task-0000000000/status
"done"
cZxid = 0x20000001f
ctime = Wed Mar 16 13:35:33 CST 2016
mZxid = 0x20000001f
mtime = Wed Mar 16 13:35:33 CST 2016
pZxid = 0x20000001f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 6
numChildren = 0
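The client now knows the outcome of the task. Here the result is "done", meaning the task was executed successfully.
That is the core mechanism of the Master-Worker architecture. Although this is only a simulation, it helps a great deal in understanding how Master-Worker operates, and it lays the groundwork for studying a code implementation later.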
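As a bridge toward such a code implementation, the whole exchange can be replayed in memory. This is a sketch, not a client: FakeZk, simulate and the callback names are hypothetical, all nodes are treated as persistent, and watch events are queued and delivered explicitly by deliver() to imitate the fact that notifications arrive after the triggering command returns.

```python
class FakeZk:
    """Hypothetical in-memory znode tree with one-shot child watches."""

    def __init__(self):
        self.nodes = {"/": ""}   # path -> data
        self.child_watches = {}  # parent path -> one-shot callbacks
        self.seq = {}            # parent path -> next sequence number
        self.pending = []        # queued (callback, parent) notifications

    def create(self, path, data="", sequential=False):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise ValueError(f"no parent for {path}")
        if sequential:
            n = self.seq.get(parent, 0)
            self.seq[parent] = n + 1
            path = f"{path}{n:010d}"
        if path in self.nodes:
            raise ValueError(f"node already exists: {path}")
        self.nodes[path] = data
        # Queue NodeChildrenChanged for the parent; pop() makes it one-shot.
        for callback in self.child_watches.pop(parent, []):
            self.pending.append((callback, parent))
        return path

    def children(self, path, watch=None):
        if watch is not None:
            self.child_watches.setdefault(path, []).append(watch)
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.nodes
                      if p != path and p.startswith(prefix)
                      and "/" not in p[len(prefix):])

    def deliver(self):
        """Run queued notifications until none are left."""
        while self.pending:
            callback, parent = self.pending.pop(0)
            callback(parent)


def simulate():
    zk, events = FakeZk(), []
    for path in ("/workers", "/tasks", "/assign"):
        zk.create(path)

    def on_tasks_changed(parent):
        # Master: assign every task to the first available worker.
        events.append("master: tasks changed")
        worker = zk.children("/workers")[0]
        for task in zk.children("/tasks"):
            zk.create(f"/assign/{worker}/{task}")

    def on_assigned(parent):
        # Worker: "run" each assigned task and record its status.
        events.append("worker: got assignment")
        for task in zk.children(parent):
            zk.create(f"/tasks/{task}/status", "done")

    def on_task_changed(parent):
        # Client: read the status node that appeared under its task.
        events.append("client: " + zk.nodes[parent + "/status"])

    # Master watches /workers and /tasks.
    zk.children("/workers",
                watch=lambda p: events.append("master: workers changed"))
    zk.children("/tasks", watch=on_tasks_changed)

    # Worker registers and watches its assignment node.
    zk.create("/workers/worker1.example.com", "worker1.example.com:2224")
    zk.create("/assign/worker1.example.com")
    zk.children("/assign/worker1.example.com", watch=on_assigned)
    zk.deliver()

    # Client submits a task and watches it.
    task = zk.create("/tasks/task-", "cmd", sequential=True)
    zk.children(task, watch=on_task_changed)
    zk.deliver()
    return events


events = simulate()
```

The simulated master handles only the first notification on /tasks because the watch is consumed when it fires; a longer-running version would set the watch again inside on_tasks_changed.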
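4. Summary
The sections above covered three main ways of using ZooKeeper. To recap briefly:
1) Distributed lock
Clients compete to create the ephemeral node /lock. Whoever creates it first holds the lock, whoever holds the lock releases it when finished, and if the holder crashes the lock is released automatically.
2) Master-Worker
This is a commonly used primary/backup scheme. The primary starts and claims the ephemeral node /master. The backup starts, cannot claim /master, and sets a watch on it instead. When the primary crashes, the backup is notified and immediately claims /master.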
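3) Workers, tasks and assignments
The persistent nodes /workers, /tasks and /assign hold the workers, tasks and assignments.
The master watches /workers and /tasks.
When a worker is created, the master is notified.
The worker creates its own node under /assign and watches it.
The client creates a task node and watches it.
The master sees the new task, determines which workers are available, and assigns the task to a worker.
The worker sees the assignment and, once the task is processed, adds a status node under the task.
The client sees that the task has been processed and reads the final result.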