Building a Redis cluster with Docker
Environment: CentOS 7
docker pull redis:5.0.5
vim redis-init.sh
touch redis-run.sh redis-start.sh redis-stop.sh
cat << EOF >redis-run.sh
docker run --name redis-node1 --net host -v /usr/local/redis-data/node1:/data -d redis:5.0.5 --cluster-enabled yes --cluster-config-file nodes-node-1.conf --port 7001 --appendonly yes
docker run --name redis-node2 --net host -v /usr/local/redis-data/node2:/data -d redis:5.0.5 --cluster-enabled yes --cluster-config-file nodes-node-2.conf --port 7002 --appendonly yes
docker run --name redis-node3 --net host -v /usr/local/redis-data/node3:/data -d redis:5.0.5 --cluster-enabled yes --cluster-config-file nodes-node-3.conf --port 7003 --appendonly yes
docker run --name redis-node4 --net host -v /usr/local/redis-data/node4:/data -d redis:5.0.5 --cluster-enabled yes --cluster-config-file nodes-node-4.conf --port 7004 --appendonly yes
docker run --name redis-node5 --net host -v /usr/local/redis-data/node5:/data -d redis:5.0.5 --cluster-enabled yes --cluster-config-file nodes-node-5.conf --port 7005 --appendonly yes
docker run --name redis-node6 --net host -v /usr/local/redis-data/node6:/data -d redis:5.0.5 --cluster-enabled yes --cluster-config-file nodes-node-6.conf --port 7006 --appendonly yes
EOF
cat <<EOF >redis-start.sh
docker start redis-node1 redis-node2 redis-node3 redis-node4 redis-node5 redis-node6
EOF
cat <<EOF >redis-stop.sh
docker stop redis-node1 redis-node2 redis-node3 redis-node4 redis-node5 redis-node6
EOF
chmod 755 redis-run.sh redis-start.sh redis-stop.sh
./redis-run.sh
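Writing six nearly identical docker run lines by hand becomes error-prone when the number of nodes changes. Below is a minimal sketch, assuming bash and the same data path, image tag and ports as above, that prints the same script contents with a loop. gen_run_cmds and gen_lifecycle_cmd are hypothetical helper names; they only print commands and run nothing.

```shell
# Hypothetical helpers that print the contents of redis-run.sh,
# redis-start.sh and redis-stop.sh instead of hard-coding six lines.
# DATA_ROOT, IMAGE and BASE_PORT mirror the values used above.
DATA_ROOT=/usr/local/redis-data
IMAGE=redis:5.0.5
BASE_PORT=7000

# Print one "docker run" line per node; node i listens on BASE_PORT + i.
gen_run_cmds() {
  local count=${1:-6} i
  for ((i = 1; i <= count; i++)); do
    echo "docker run --name redis-node$i --net host" \
      "-v $DATA_ROOT/node$i:/data -d $IMAGE" \
      "--cluster-enabled yes --cluster-config-file nodes-node-$i.conf" \
      "--port $((BASE_PORT + i)) --appendonly yes"
  done
}

# Print a single "docker start" or "docker stop" line for all node containers.
gen_lifecycle_cmd() {
  local action=$1 count=${2:-6} i
  local -a names=()
  for ((i = 1; i <= count; i++)); do
    names+=("redis-node$i")
  done
  echo "docker $action ${names[*]}"
}
```

For example, gen_run_cmds > redis-run.sh, gen_lifecycle_cmd start > redis-start.sh and gen_lifecycle_cmd stop > redis-stop.sh recreate the three scripts; gen_run_cmds 8 would also cover the extra nodes 7007 and 7008 used later.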
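With Redis started, open a shell in one of the Redis instances and create the cluster. 192.168.209.160 is the host's IP; because the containers run with --net host, each node listens directly on the host's ports 7001-7006.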
docker exec -it redis-node1 /bin/bash
redis-cli --cluster create 192.168.209.160:7001 192.168.209.160:7002 192.168.209.160:7003 192.168.209.160:7004 192.168.209.160:7005 192.168.209.160:7006 --cluster-replicas 1
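redis-cli refuses to create a cluster with fewer than three masters; it takes the master count as the number of nodes divided by (replicas + 1), so with --cluster-replicas 1 six nodes is the minimum. A small sketch, assuming bash, that builds the same create command from a host IP, a first port and a node count, and checks that rule first. build_create_cmd is a hypothetical helper that only prints the command.

```shell
# build_create_cmd HOST FIRST_PORT COUNT [REPLICAS]
# Prints the redis-cli --cluster create command; it does not run it.
build_create_cmd() {
  local host=$1 first=$2 count=$3 replicas=${4:-1} i
  local -a addrs=()
  # Masters = COUNT / (REPLICAS + 1) (integer division); at least 3 are needed.
  if (( count / (replicas + 1) < 3 )); then
    echo "need at least $((3 * (replicas + 1))) nodes for --cluster-replicas $replicas" >&2
    return 1
  fi
  for ((i = 0; i < count; i++)); do
    addrs+=("$host:$((first + i))")
  done
  echo "redis-cli --cluster create ${addrs[*]} --cluster-replicas $replicas"
}
```

build_create_cmd 192.168.209.160 7001 6 1 prints exactly the create command used above, while a 4-node layout with one replica each is rejected before redis-cli is ever called.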
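//Check the current state of the cluster (note: when connecting with redis-cli, add the -c option so that it connects in cluster mode and follows redirects to the node that owns each key)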
redis-cli -h 192.168.209.160 -p 7001 -c
cluster info
//Show basic information about each node
cluster nodes
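The later steps (reshard, del-node, cluster replicate) all need node IDs copied out of the cluster nodes output. Each line of that output starts with the node ID, followed by the address in the form ip:port@cluster-bus-port. A small sketch, assuming bash and awk, that looks up the ID for a given port; node_id_by_port is a hypothetical helper.

```shell
# node_id_by_port PORT  (reads `cluster nodes` output on stdin)
# Prints the ID of the node whose client port is PORT.
node_id_by_port() {
  awk -v port="$1" '{
    addr = $2
    sub(/@.*/, "", addr)          # drop the cluster bus port: ip:port@cport
    n = split(addr, parts, ":")   # the last piece is the client port
    if (parts[n] == port) print $1
  }'
}
```

For example: redis-cli -p 7001 cluster nodes | node_id_by_port 7007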
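Adding a master node to the cluster
- Create a new instance on port 7007 in the same way as the existing nodes, and start it.
- Add 7007 to the cluster:
redis-cli --cluster add-node 127.0.0.1:7007 127.0.0.1:7001
The second argument, 127.0.0.1:7001, is a node that is already in the cluster. Any available node of the cluster will do; it does not have to be the first one.
The new node must not contain any data, otherwise the command fails with: [ERR] Node 127.0.0.1:7007 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0. This usually happens when the directory of a Redis instance that is already in use was copied. Connect to the new node with redis-cli and run flushall, then cluster reset.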
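The error above names two conditions: the new node must not know any other node, and it must hold no keys in database 0. A rough pre-check, assuming bash, reads cluster_known_nodes from cluster info on stdin and takes the dbsize result as an argument. node_looks_empty is a hypothetical helper, and it only approximates the check redis-cli performs itself.

```shell
# node_looks_empty DBSIZE  (reads `cluster info` output on stdin)
# Succeeds when the node knows only itself and holds no keys.
node_looks_empty() {
  local keys=$1 known
  # redis-cli output may end lines with \r; strip it before parsing.
  known=$(tr -d '\r' | awk -F: '$1 == "cluster_known_nodes" { print $2 }')
  [ "$known" = "1" ] && [ "$keys" -eq 0 ]
}
```

For example: redis-cli -p 7007 cluster info | node_looks_empty "$(redis-cli -p 7007 dbsize)" || echo "run flushall and cluster reset on 7007 first"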
//Check the node information after the operation
redis-cli -h 127.0.0.1 -c -p 7007
cluster nodes
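//7007 now has no hash slots after "connected". The newly added node is a master, but when the cluster needs to promote a slave to a new master, this new node will not be chosen and will not take part in the election. Assign hash slots to the new node: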
#The argument 127.0.0.1:7001 only tells redis-cli which cluster to connect to; the node to operate on is chosen at the prompts that follow
redis-cli --cluster reshard 127.0.0.1:7001
Output:
>>> Performing Cluster Check (using node 127.0.0.1:7001)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
#Enter the number of slots to move (1000 here)
How many slots do you want to move (from 1 to 16384)? 1000
#Enter the ID of the node that receives these slots (7007 here)
What is the receiving node ID? a70d7fff6d6dde511cb7cb632a347be82dd34643
#Choose the slot sources: "all" redistributes slots from every master,
#or enter the IDs of the masters to take slots from (7001 here), then finish with done
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:3bcdfbed858bbdd92dd760632b9cb4c649947fed
Source node #2:done
#After the slots to be moved are printed, type yes to start moving the slots and their data
Do you want to proceed with the proposed reshard plan (yes/no)? yes
#Done
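The same reshard can also be run without the interactive prompts, since redis-cli 5 accepts --cluster-from, --cluster-to, --cluster-slots and --cluster-yes for the reshard subcommand. A sketch, assuming bash, that validates the inputs and prints the command for the 1000-slot move above; reshard_cmd is a hypothetical helper that does not execute anything.

```shell
# reshard_cmd ENTRY FROM_ID TO_ID SLOTS
# Prints a non-interactive reshard command; it does not run it.
reshard_cmd() {
  local entry=$1 from=$2 to=$3 slots=$4
  local id_re='^[0-9a-f]{40}$'   # node IDs are 40 lowercase hex characters
  if ! [[ $from =~ $id_re && $to =~ $id_re ]]; then
    echo "node IDs must be 40 hex characters" >&2
    return 1
  fi
  # 10# forces base 10 so a value like 0100 is not read as octal.
  if ! [[ $slots =~ ^[0-9]+$ ]] || (( 10#$slots < 1 || 10#$slots > 16384 )); then
    echo "slot count must be between 1 and 16384" >&2
    return 1
  fi
  echo "redis-cli --cluster reshard $entry --cluster-from $from" \
    "--cluster-to $to --cluster-slots $slots --cluster-yes"
}
```

Printing instead of running keeps the helper safe to try; once the output looks right it can be pasted, or run with eval.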
Check the result again:
redis-cli -c -p 7001 cluster nodes
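In the cluster nodes output, each master line ends with the slot ranges it serves, for example 0-999. A sketch, assuming bash and awk, that totals the slots per master, which makes it easy to confirm that 7007 now holds 1000 slots and that the masters together still cover all 16384. slot_counts is a hypothetical helper.

```shell
# slot_counts  (reads `cluster nodes` output on stdin)
# Prints "address slot-count" for every master.
slot_counts() {
  awk '{
    total = 0
    for (i = 9; i <= NF; i++) {          # slot fields start at field 9
      if ($i ~ /^\[/) continue           # skip [slot->-id] migration entries
      n = split($i, r, "-")
      total += (n == 2) ? r[2] - r[1] + 1 : 1
    }
    addr = $2; sub(/@.*/, "", addr)
    if ($3 ~ /master/) print addr, total
  }'
}
```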
Adding a slave node to the cluster
- Create a new instance on port 7008 in the same way as the existing nodes, and start it.
- Add 7008 to the cluster:
redis-cli --cluster add-node 127.0.0.1:7008 127.0.0.1:7001 --cluster-slave
Because no master is specified, redis-cli picks one automatically (the master with the fewest slaves); in this run it chose 7001 as the master of 7008.
Note: the 127.0.0.1:7001 after add-node does not mean that 7001 becomes the new node's master.
Output:
>>> Adding node 127.0.0.1:7008 to cluster 127.0.0.1:7001
>>> Performing Cluster Check (using node 127.0.0.1:7001)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Automatically selected master 127.0.0.1:7001
>>> Send CLUSTER MEET to node 127.0.0.1:7008 to make it join the cluster.
Waiting for the cluster to join
>>> Configure node as replica of 127.0.0.1:7001.
[OK] New node added correctly.
- You can also specify the master when adding the node; --cluster-master-id takes the node ID of the master:
redis-cli --cluster add-node 127.0.0.1:7008 127.0.0.1:7001 --cluster-slave --cluster-master-id 2a8f29e22ec38f56e062f588e5941da24a2bafa0
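Whichever way 7008 was added, its line in cluster nodes shows which master it follows: for a slave, field 4 holds the master's node ID. A sketch, assuming bash and awk, that prints "slave-port master-port" pairs; replica_of is a hypothetical helper.

```shell
# replica_of  (reads `cluster nodes` output on stdin)
# Prints "slave-port master-port" for every slave; order is not guaranteed.
replica_of() {
  awk '{
    addr = $2; sub(/@.*/, "", addr)
    n = split(addr, p, ":")
    port[$1] = p[n]                                  # node ID -> client port
    if (("," $3 ",") ~ /,slave,/) master[$1] = $4    # slave ID -> master ID
  }
  END {
    for (id in master) print port[id], port[master[id]]
  }'
}
```

For example: redis-cli -p 7001 cluster nodes | replica_of | grep '^7008 '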
- Change the master of 7008 to 7002:
redis-cli -h 127.0.0.1 -p 7008
127.0.0.1:7008> cluster replicate 9b022d79cf860c87dc2190cdffc55b282dd60e42
OK
Deleting a slave node
#redis-trib del-node ip:port '<node-id>' (redis-cli --cluster del-node takes the same arguments); here 7008 is removed
redis-cli --cluster del-node 127.0.0.1:7001 74957282ffa94c828925c4f7026baac04a67e291
Output:
>>> Removing node 74957282ffa94c828925c4f7026baac04a67e291 from cluster 127.0.0.1:7001
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
Deleting a master node
Before deleting a master node, first use reshard to move all of its slots away, then delete the node (with reshard, the slots of the master being deleted can only be moved to a single receiving node at a time).
redis-cli --cluster reshard 127.0.0.1:7007
#Enter the number of slots to move (7007 holds 1000 slots; move all of them)
How many slots do you want to move (from 1 to 16384)? 1000
#Enter the ID of the node that receives these slots
What is the receiving node ID? 3bcdfbed858bbdd92dd760632b9cb4c649947fed
#Choose the slot sources: "all" redistributes slots from every master,
#or enter the IDs of the masters to take slots from, then finish with done
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:a70d7fff6d6dde511cb7cb632a347be82dd34643
Source node #2:done
#After the slots to be moved are printed, type yes to start moving the slots and their data
Do you want to proceed with the proposed reshard plan (yes/no)? yes
#Done
#Delete the now-empty master node
redis-cli --cluster del-node 127.0.0.1:7007 'a70d7fff6d6dde511cb7cb632a347be82dd34643'
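redis-cli refuses to delete a node that still owns slots, which is why 7007 has to be emptied with reshard first. A sketch of a guard, assuming bash and awk, that checks the node's line in cluster nodes has no slot fields before calling del-node; owns_no_slots is a hypothetical helper.

```shell
# owns_no_slots NODE_ID  (reads `cluster nodes` output on stdin)
# Succeeds only if NODE_ID is present and its line has no slot fields
# (slots, or [..] migration entries, start at field 9).
owns_no_slots() {
  awk -v id="$1" '
    $1 == id { found = 1; if (NF > 8) busy = 1 }
    END { exit !(found && !busy) }'
}
```

For example: redis-cli -p 7001 cluster nodes | owns_no_slots a70d7fff6d6dde511cb7cb632a347be82dd34643 && redis-cli --cluster del-node 127.0.0.1:7001 a70d7fff6d6dde511cb7cb632a347be82dd34643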
Failure testing
Start a cluster in which node 7004 is a slave of node 7003:
127.0.0.1:7003> cluster nodes
ea4e0dcf8dbf6d4611659b5abbd6563926224f0f 127.0.0.1:7004@17004 slave e852e07181f20dd960407e5b08f7122870f67c89 0 1542793126295 4 connected
3bcdfbed858bbdd92dd760632b9cb4c649947fed 127.0.0.1:7000@17000 master - 0 1542793125260 1 connected 1000-5460
2a8f29e22ec38f56e062f588e5941da24a2bafa0 127.0.0.1:7001@17001 master - 0 1542793124000 2 connected 5461-10922
e852e07181f20dd960407e5b08f7122870f67c89 127.0.0.1:7003@17003 myself,master - 0 1542793124000 4 connected 0-999
9b022d79cf860c87dc2190cdffc55b282dd60e42 127.0.0.1:7002@17002 master - 0 1542793126000 3 connected 10923-16383
Add the key "hello" to the cluster; it is stored on node 7003:
127.0.0.1:7002> set hello world
-> Redirected to slot [866] located at 127.0.0.1:7003
OK
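The "Redirected to slot [866]" line comes from how Redis Cluster maps keys to slots: CRC16 of the key (the XMODEM variant, polynomial 0x1021, initial value 0) modulo 16384; if the key contains a non-empty {...} hash tag, only the tag is hashed. A pure-bash sketch of that mapping; key_slot is a hypothetical helper, and the server's own answer can always be checked with the CLUSTER KEYSLOT command.

```shell
# key_slot KEY: print the hash slot Redis Cluster assigns to KEY.
key_slot() {
  local key=$1 rest tag crc=0 byte i
  # Hash tag: hash only the text between the first "{" and the next "}",
  # provided that text is not empty.
  if [[ $key == *"{"* ]]; then
    rest=${key#*\{}              # text after the first "{"
    if [[ $rest == *"}"* ]]; then
      tag=${rest%%\}*}           # text up to the first "}" after it
      [[ -n $tag ]] && key=$tag
    fi
  fi
  # od prints each byte of the key as an unsigned decimal number.
  for byte in $(printf '%s' "$key" | od -An -tu1 -v); do
    crc=$(( crc ^ (byte << 8) ))
    for ((i = 0; i < 8; i++)); do
      if (( crc & 0x8000 )); then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
    done
  done
  echo $(( crc % 16384 ))
}
```

key_slot hello prints 866, matching the redirect above, and keys such as {user1000}.following and {user1000}.followers land in the same slot because only user1000 is hashed.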
Crash node 7003:
redis-cli -p 7003 debug segfault
Error: Server closed the connection
Check the node states: 7003 is marked fail, and 7004 has been promoted to master.
127.0.0.1:7000> cluster nodes
9b022d79cf860c87dc2190cdffc55b282dd60e42 127.0.0.1:7002@17002 master - 0 1542793571000 3 connected 10923-16383
3bcdfbed858bbdd92dd760632b9cb4c649947fed 127.0.0.1:7000@17000 myself,master - 0 1542793570000 1 connected 1000-5460
2a8f29e22ec38f56e062f588e5941da24a2bafa0 127.0.0.1:7001@17001 master - 0 1542793571422 2 connected 5461-10922
ea4e0dcf8dbf6d4611659b5abbd6563926224f0f 127.0.0.1:7004@17004 master - 0 1542793572442 5 connected 0-999
e852e07181f20dd960407e5b08f7122870f67c89 127.0.0.1:7003@17003 master,fail - 1542793477237 1542793474000 4 disconnected
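A failed node carries fail in the flags field (field 3) of cluster nodes, and a node that is only suspected of failing carries fail?. A sketch, assuming bash and awk, that lists such nodes with their IDs; failed_nodes is a hypothetical helper.

```shell
# failed_nodes  (reads `cluster nodes` output on stdin)
# Prints "address node-id" for nodes flagged fail or fail?.
failed_nodes() {
  awk '{
    flags = "," $3 ","
    if (flags ~ /,fail[?]?,/) {
      addr = $2; sub(/@.*/, "", addr)
      print addr, $1
    }
  }'
}
```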
Get "hello" from the cluster; the request is redirected to 7004:
127.0.0.1:7000> get hello
-> Redirected to slot [866] located at 127.0.0.1:7004
"world"
Restart 7003: it rejoins the cluster automatically, now as a slave.
127.0.0.1:7004> cluster nodes
ea4e0dcf8dbf6d4611659b5abbd6563926224f0f 127.0.0.1:7004@17004 myself,master - 0 1542793764000 5 connected 0-999
3bcdfbed858bbdd92dd760632b9cb4c649947fed 127.0.0.1:7000@17000 master - 0 1542793765000 1 connected 1000-5460
2a8f29e22ec38f56e062f588e5941da24a2bafa0 127.0.0.1:7001@17001 master - 0 1542793765560 2 connected 5461-10922
9b022d79cf860c87dc2190cdffc55b282dd60e42 127.0.0.1:7002@17002 master - 0 1542793764529 3 connected 10923-16383
e852e07181f20dd960407e5b08f7122870f67c89 127.0.0.1:7003@17003 slave ea4e0dcf8dbf6d4611659b5abbd6563926224f0f 0 1542793766585 5 connected
Get "hello": it is redirected to the master, 7004:
127.0.0.1:7002> get hello
-> Redirected to slot [866] located at 127.0.0.1:7004
"world"
Take both 7003 and 7004 offline, then get "hello" again: the error says the cluster is down.
127.0.0.1:7000> cluster nodes
9b022d79cf860c87dc2190cdffc55b282dd60e42 127.0.0.1:7002@17002 master - 0 1542794095233 3 connected 10923-16383
3bcdfbed858bbdd92dd760632b9cb4c649947fed 127.0.0.1:7000@17000 myself,master - 0 1542794094000 1 connected 1000-5460
2a8f29e22ec38f56e062f588e5941da24a2bafa0 127.0.0.1:7001@17001 master - 0 1542794096261 2 connected 5461-10922
ea4e0dcf8dbf6d4611659b5abbd6563926224f0f 127.0.0.1:7004@17004 master,fail - 1542794075628 1542794074000 5 disconnected 0-999
e852e07181f20dd960407e5b08f7122870f67c89 127.0.0.1:7003@17003 slave,fail ea4e0dcf8dbf6d4611659b5abbd6563926224f0f 1542794070058 1542794067000 5 disconnected
127.0.0.1:7000> get hello
(error) CLUSTERDOWN The cluster is down
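Start 7003: it reports an error, because 7003 was a slave of 7004 when it went down, and after starting it tries by default to sync data from its master:
Connecting to MASTER 127.0.0.1:7004
MASTER <-> REPLICA sync started
Error condition on socket for SYNC: Connection refused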
Start 7004 and get "hello": everything is back to normal.
127.0.0.1:7002> get hello
-> Redirected to slot [866] located at 127.0.0.1:7004
"world"
Repairing a cluster failure
Now there is a four-node cluster, 7000-7003, in which slots 0-999 are on 7003, and "set hello world" was stored on 7003.
127.0.0.1:7000> cluster nodes
e852e07181f20dd960407e5b08f7122870f67c89 127.0.0.1:7003@17003 master - 0 1542852396911 6 connected 0-999
9b022d79cf860c87dc2190cdffc55b282dd60e42 127.0.0.1:7002@17002 master - 0 1542852395887 3 connected 10923-16383
2a8f29e22ec38f56e062f588e5941da24a2bafa0 127.0.0.1:7001@17001 master - 0 1542852394863 2 connected 5461-10922
3bcdfbed858bbdd92dd760632b9cb4c649947fed 127.0.0.1:7000@17000 myself,master - 0 1542852395000 1 connected 1000-5460
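After 7003 is shut down, every read or write in the cluster fails with (error) CLUSTERDOWN The cluster is down. The reason is that slots 0-999 live only on 7003, which has no slave to take over, so Redis considers the slot coverage incomplete and rejects requests (the default behavior while cluster-require-full-coverage is yes).
127.0.0.1:7000> get hello
(error) CLUSTERDOWN The cluster is down
127.0.0.1:7000> get foo
(error) CLUSTERDOWN The cluster is down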
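The same diagnosis can be read from cluster nodes: any slot that is not served by a master without the fail flag is uncovered, and with cluster-require-full-coverage yes (the default) a single uncovered slot is enough for CLUSTERDOWN. A sketch, assuming bash and awk, that prints the uncovered ranges; uncovered_slots is a hypothetical helper.

```shell
# uncovered_slots  (reads `cluster nodes` output on stdin)
# Prints the slot ranges that no healthy master serves, one per line.
uncovered_slots() {
  awk '
  {
    flags = "," $3 ","
    # Only masters without the "fail" flag count as serving their slots.
    if (flags !~ /,master,/ || flags ~ /,fail,/) next
    for (i = 9; i <= NF; i++) {
      if ($i ~ /^\[/) continue            # skip [slot->-id] migration entries
      n = split($i, r, "-")
      lo = r[1] + 0
      hi = (n == 2 ? r[2] : r[1]) + 0
      for (s = lo; s <= hi; s++) covered[s] = 1
    }
  }
  END {
    start = -1
    # Walk slots 0..16383; 16384 acts as a sentinel that closes an open range.
    for (s = 0; s <= 16384; s++) {
      if (s < 16384 && !(s in covered)) {
        if (start < 0) start = s
        continue
      }
      if (start >= 0) {
        out = (start == s - 1) ? start : start "-" (s - 1)
        print out
        start = -1
      }
    }
  }'
}
```

Empty output means every slot has a healthy owner; in the scenario above it would print 0-999 until 7003 (or a slave of it) is back.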
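Running the fix command fails with a connection error, because 7003 has already been shut down [Facepalm]
redis-cli --cluster fix 127.0.0.1:7003
Could not connect to Redis at 127.0.0.1:7003: Connection refused
Starting 7003 again brings the cluster back to normal.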
Thanks to: https://my.oschina.net/tongyufu/blog/406829