Simulating manager node failure and failover in a swarm cluster with 3 manager nodes


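If your swarm cluster has multiple manager nodes, say 3, what is the point of having them?

Obviously, it is so that when one manager goes down, the cluster fails over to another. But have you ever actually watched that failover happen?

If not, follow the steps below and simulate one.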

 

To start with, the cluster has 3 manager nodes:

 

[root@nccztsjb-node-01 ~]# docker node ls
ID                            HOSTNAME           STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
gxfkhuc95br6ltkhorpw1q4tq *   nccztsjb-node-01   Ready     Active         Leader           20.10.17
8zjicf39fk28jn106symk1g5e     nccztsjb-node-02   Ready     Active                          20.10.17
7d59usghrgq05k0yh4lbykw5v     nccztsjb-node-04   Ready     Active         Reachable        20.10.17
wnd24l698iruhhp1xw0y3iyig     nccztsjb-node-05   Ready     Active         Reachable        20.10.17

 

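Node nccztsjb-node-01 is a manager node and currently holds the Leader role.

The other two manager nodes, nccztsjb-node-04 and nccztsjb-node-05, are both currently Reachable. (nccztsjb-node-02 has an empty MANAGER STATUS column, so it is a worker.)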
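Swarm managers coordinate through Raft, which needs a majority of managers to stay up. As a rough sketch of why 3 managers can survive the loss of 1, the arithmetic looks like this (the function names `quorum` and `fault_tolerance` are made up for illustration and are not docker commands):

```shell
# Raft majority arithmetic for a swarm with N manager nodes.

# Minimum number of managers that must be up to keep a quorum.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Maximum number of managers that can fail while a quorum remains.
fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 3 5; do
  echo "managers=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

For 3 managers this gives a quorum of 2 and 1 tolerated failure, which is the situation simulated in this post.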

 

 

Next, take the manager node nccztsjb-node-01 down by stopping its Docker engine directly:

[root@nccztsjb-node-01 ~]# systemctl stop docker 
Warning: Stopping docker.service, but it can still be activated by:
  docker.socket
[root@nccztsjb-node-01 ~]# systemctl stop docker.socket
[root@nccztsjb-node-01 ~]# docker node ls
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
[root@nccztsjb-node-01 ~]# 

 

 

Now check the status from the other manager nodes:

 

nccztsjb-node-04:

 

[root@nccztsjb-node-04 ~]# docker node ls
ID                            HOSTNAME           STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
gxfkhuc95br6ltkhorpw1q4tq     nccztsjb-node-01   Down      Active         Unreachable      20.10.17
8zjicf39fk28jn106symk1g5e     nccztsjb-node-02   Ready     Active                          20.10.17
7d59usghrgq05k0yh4lbykw5v *   nccztsjb-node-04   Ready     Active         Reachable        20.10.17
wnd24l698iruhhp1xw0y3iyig     nccztsjb-node-05   Ready     Active         Leader           20.10.17
[root@nccztsjb-node-04 ~]# 

 

 

nccztsjb-node-01 is now Down and Unreachable. More importantly, a new Leader has already been elected: nccztsjb-node-05.

 

nccztsjb-node-05:

 

[root@nccztsjb-node-05 ~]# docker node ls
ID                            HOSTNAME           STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
gxfkhuc95br6ltkhorpw1q4tq     nccztsjb-node-01   Down      Active         Unreachable      20.10.17
8zjicf39fk28jn106symk1g5e     nccztsjb-node-02   Ready     Active                          20.10.17
7d59usghrgq05k0yh4lbykw5v     nccztsjb-node-04   Ready     Active         Reachable        20.10.17
wnd24l698iruhhp1xw0y3iyig *   nccztsjb-node-05   Ready     Active         Leader           20.10.17
[root@nccztsjb-node-05 ~]# 

 

 

This node is now the Leader.
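If you would rather not read the table by eye, the Leader can be picked out with a small filter. This is only a sketch: `find_leader` is a hypothetical helper name, and the sample input is copied from the nccztsjb-node-05 output above so the snippet runs without a cluster.

```shell
# Print the hostname of the node whose manager status is "Leader".
# Input: one "HOSTNAME MANAGER_STATUS" pair per line on stdin
# (workers have an empty manager status, so they have only one field).
find_leader() {
  awk '$2 == "Leader" { print $1 }'
}

# Sample input copied from the output on nccztsjb-node-05 above.
sample='nccztsjb-node-01 Unreachable
nccztsjb-node-02
nccztsjb-node-04 Reachable
nccztsjb-node-05 Leader'

printf '%s\n' "$sample" | find_leader

# Against a live cluster, run this on any reachable manager:
#   docker node ls --format '{{.Hostname}} {{.ManagerStatus}}' | find_leader
```

With the sample data this prints nccztsjb-node-05.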

 

As the output above shows, the manager failover happened without any manual intervention, and a new Leader was elected.
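The failover worked because 2 of the 3 managers were still up, which is a majority. A minimal sketch of that check, using the node states from the failure output above as sample data (the helper names `healthy_managers` and `total_managers` are assumptions made for this example):

```shell
# Count managers that can still take part in Raft (Leader or Reachable).
healthy_managers() {
  awk '$2 == "Leader" || $2 == "Reachable" { n++ } END { print n + 0 }'
}

# Count all managers, whatever their state (any non-empty manager status).
total_managers() {
  awk 'NF >= 2 { n++ } END { print n + 0 }'
}

# "HOSTNAME MANAGER_STATUS" pairs taken from the output after node-01 was stopped.
sample='nccztsjb-node-01 Unreachable
nccztsjb-node-02
nccztsjb-node-04 Reachable
nccztsjb-node-05 Leader'

healthy=$(printf '%s\n' "$sample" | healthy_managers)
total=$(printf '%s\n' "$sample" | total_managers)
needed=$(( total / 2 + 1 ))   # majority of all managers

if [ "$healthy" -ge "$needed" ]; then
  echo "quorum held: $healthy of $total managers healthy (need $needed)"
else
  echo "quorum lost: $healthy of $total managers healthy (need $needed)"
fi
```

On a live cluster the same pairs could be produced with `docker node ls --format '{{.Hostname}} {{.ManagerStatus}}'` on a reachable manager.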

 

Is that the end of it? Of course not.

What happens when the failed node comes back?

 

[root@nccztsjb-node-01 ~]# systemctl start docker
[root@nccztsjb-node-01 ~]# 
[root@nccztsjb-node-01 ~]# docker node ls
ID                            HOSTNAME           STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
gxfkhuc95br6ltkhorpw1q4tq *   nccztsjb-node-01   Ready     Active         Reachable        20.10.17
8zjicf39fk28jn106symk1g5e     nccztsjb-node-02   Ready     Active                          20.10.17
7d59usghrgq05k0yh4lbykw5v     nccztsjb-node-04   Ready     Active         Reachable        20.10.17
wnd24l698iruhhp1xw0y3iyig     nccztsjb-node-05   Ready     Active         Leader           20.10.17
[root@nccztsjb-node-01 ~]# 

 

 

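As you can see, after recovery the node is still a manager, but its status is Reachable. It does not get the Leader role back: nccztsjb-node-05 remains the Leader, because a returning manager simply rejoins as a follower and no new election is triggered.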

 

OK, that is a simulated failure and failover in a cluster with 3 manager nodes. Does it make sense now?

posted @ 2022-09-13 14:39  Zhai_David