Simulating Automatic Failover When a Redis Cluster Node Becomes Unavailable

In the previous post we built a Redis Cluster with Docker; now let's simulate a node becoming unavailable.

First, take a look at the current cluster processes:

[root@new2 docker-redis-cluster]# ps -ef | grep redis | grep -v 'grep'
polkitd 21836 21810 0 21:14 pts/0 00:00:00 redis-server 0.0.0.0:6380 [cluster]
polkitd 21851 21826 0 21:14 pts/0 00:00:00 redis-server 0.0.0.0:6381 [cluster]
polkitd 21925 21890 0 21:14 pts/0 00:00:00 redis-server 0.0.0.0:6383 [cluster]
polkitd 21958 21930 0 21:14 pts/0 00:00:00 redis-server 0.0.0.0:6382 [cluster]
polkitd 22007 21953 0 21:14 pts/0 00:00:00 redis-server 0.0.0.0:6379 [cluster]
polkitd 22083 21972 0 21:14 pts/0 00:00:00 redis-server 0.0.0.0:6384 [cluster]
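
Since the nodes run in Docker, it can also help to map each port to its container name before taking anything down. This step is not in the original transcript, and the container names depend on the compose setup from the previous post; only the slave-1 ↔ 6382 mapping is confirmed later when that container is restarted:

# List container names alongside their published ports
docker ps --format "table {{.Names}}\t{{.Ports}}"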

Then check the master/replica relationships of each node; note that port 6379 is a replica of port 6382:

[root@new2 docker-redis-cluster]# redis-cli --cluster check 127.0.0.1:6379
127.0.0.1:6381 (512aa1e3...) -> 0 keys | 5461 slots | 1 slaves.
127.0.0.1:6383 (86a1463a...) -> 2 keys | 5462 slots | 1 slaves.
127.0.0.1:6382 (98197290...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 2 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 127.0.0.1:6379)
S: eb834627d2caa946f0b921d5b0e73f18f3df9f25 127.0.0.1:6379
   slots: (0 slots) slave
   replicates 98197290ea49812b2c75aae5c7363be4d1a0b31c
M: 512aa1e33ca008d05cc7b4c8cc3b829ebea9f1d1 127.0.0.1:6381
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: ad23e9bf7b5168b511fd5b787a4cbf092a6e29c0 127.0.0.1:6384
   slots: (0 slots) slave
   replicates 512aa1e33ca008d05cc7b4c8cc3b829ebea9f1d1
M: 86a1463a2571582619deebdfc0cba09c942c0ec8 127.0.0.1:6383
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 95518dcd45a85f3788feb9c5ef85ff36cc8564c1 127.0.0.1:6380
   slots: (0 slots) slave
   replicates 86a1463a2571582619deebdfc0cba09c942c0ec8
M: 98197290ea49812b2c75aae5c7363be4d1a0b31c 127.0.0.1:6382
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
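
The same relationship can also be confirmed from the node itself. This check is not part of the original transcript, but INFO replication is standard redis-cli usage:

# Run against the 6379 instance; while 6382 is still the master, the output
# should contain role:slave and master_port:6382
redis-cli -p 6379 info replication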


Now kill the master on port 6382 and observe the cluster again: port 6379 has been promoted to master (master re-election introduces a short delay of about 1-2 seconds). A sketch of how the node can be taken down is shown below, followed by the new cluster state.
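
The original transcript does not include the kill command itself. Since the port-6382 instance runs in the container named slave-1 (it is restarted by that name below), one way to take it down, assuming that container layout, is:

# Stop the container running the 6382 master (name taken from the later `docker start slave-1`)
docker stop slave-1

# Alternatively, kill the redis-server process directly using its PID from the ps output above
kill 21958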

[root@new2 docker-redis-cluster]# redis-cli --cluster check 127.0.0.1:6379
Could not connect to Redis at 127.0.0.1:6382: Connection refused
127.0.0.1:6379 (eb834627...) -> 0 keys | 5461 slots | 0 slaves.
127.0.0.1:6381 (512aa1e3...) -> 0 keys | 5461 slots | 1 slaves.
127.0.0.1:6383 (86a1463a...) -> 2 keys | 5462 slots | 1 slaves.
[OK] 2 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 127.0.0.1:6379)
M: eb834627d2caa946f0b921d5b0e73f18f3df9f25 127.0.0.1:6379
   slots:[0-5460] (5461 slots) master
M: 512aa1e33ca008d05cc7b4c8cc3b829ebea9f1d1 127.0.0.1:6381
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: ad23e9bf7b5168b511fd5b787a4cbf092a6e29c0 127.0.0.1:6384
   slots: (0 slots) slave
   replicates 512aa1e33ca008d05cc7b4c8cc3b829ebea9f1d1
M: 86a1463a2571582619deebdfc0cba09c942c0ec8 127.0.0.1:6383
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 95518dcd45a85f3788feb9c5ef85ff36cc8564c1 127.0.0.1:6380
   slots: (0 slots) slave
   replicates 86a1463a2571582619deebdfc0cba09c942c0ec8
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
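
As an extra sanity check that is not in the original transcript, a write and read through the cluster confirms that the slot range taken over by 6379 is still being served (the key name failover_test here is arbitrary):

# -c enables cluster mode so redis-cli follows MOVED/ASK redirections to the owning master
redis-cli -c -p 6379 set failover_test hello
redis-cli -c -p 6379 get failover_test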


Now start the port-6382 service again and print the cluster state once more (the container for port 6382 is named slave-1). Port 6382 has been demoted to a replica of port 6379 (it takes about 1-2 seconds for the rejoined node to show up in this role):

[root@new2 docker-redis-cluster]# docker start slave-1
slave-1
[root@new2 docker-redis-cluster]# redis-cli --cluster check 127.0.0.1:6379
127.0.0.1:6379 (eb834627...) -> 0 keys | 5461 slots | 1 slaves.
127.0.0.1:6381 (512aa1e3...) -> 0 keys | 5461 slots | 1 slaves.
127.0.0.1:6383 (86a1463a...) -> 2 keys | 5462 slots | 1 slaves.
[OK] 2 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 127.0.0.1:6379)
M: eb834627d2caa946f0b921d5b0e73f18f3df9f25 127.0.0.1:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: 512aa1e33ca008d05cc7b4c8cc3b829ebea9f1d1 127.0.0.1:6381
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: ad23e9bf7b5168b511fd5b787a4cbf092a6e29c0 127.0.0.1:6384
   slots: (0 slots) slave
   replicates 512aa1e33ca008d05cc7b4c8cc3b829ebea9f1d1
M: 86a1463a2571582619deebdfc0cba09c942c0ec8 127.0.0.1:6383
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 95518dcd45a85f3788feb9c5ef85ff36cc8564c1 127.0.0.1:6380
   slots: (0 slots) slave
   replicates 86a1463a2571582619deebdfc0cba09c942c0ec8
S: 98197290ea49812b2c75aae5c7363be4d1a0b31c 127.0.0.1:6382
   slots: (0 slots) slave
   replicates eb834627d2caa946f0b921d5b0e73f18f3df9f25
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.


In addition, based on material found online, I also simulated the three scenarios in which the whole cluster enters the fail state (the output is not reproduced here; a quick way to check the state is sketched after this list):

A. A master and all of its replicas go down: the cluster enters the fail state.

B. More than half of the masters go down: the cluster enters the fail state, regardless of whether they have replicas.

C. Any master goes down and it has no replica: the cluster enters the fail state.
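
These scenarios were not captured above, but the overall state can be checked at any time with CLUSTER INFO; a minimal sketch, run against any reachable node:

# cluster_state shows ok in normal operation and fail in the scenarios above
# (with the default cluster-require-full-coverage yes)
redis-cli -p 6379 cluster info | grep cluster_state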


posted @ 2020-09-12 21:26  潮起潮落中看星辰大海