Deploying a Redis Cluster High-Availability Test Environment with Docker
Background:
A previous post covered deploying a single Redis instance with Docker. This post deploys a 6-node (3 masters, 3 replicas) Redis Cluster high-availability test environment with Docker.
Environment and configuration:
1. Create the directory layout
[root@localhost dir_redis_cluster]# tree .
.
├── m7026
│   ├── data
│   │   ├── appendonly.aof
│   │   ├── dump.rdb
│   │   ├── nodes-7021.conf
│   │   ├── redis_1.log
│   │   ├── redis_1.pid
│   │   └── redis_1.sock
│   └── redis.cnf
├── m7027
│   ├── data
│   │   ├── appendonly.aof
│   │   ├── dump.rdb
│   │   ├── nodes-7021.conf
│   │   ├── redis_1.log
│   │   ├── redis_1.pid
│   │   └── redis_1.sock
│   └── redis.cnf
├── m7028
│   ├── data
│   │   ├── appendonly.aof
│   │   ├── dump.rdb
│   │   ├── nodes-7021.conf
│   │   ├── redis_1.log
│   │   ├── redis_1.pid
│   │   └── redis_1.sock
│   └── redis.cnf
├── s7026
│   ├── data
│   │   ├── appendonly.aof
│   │   ├── dump.rdb
│   │   ├── nodes-7021.conf
│   │   ├── redis_1.log
│   │   ├── redis_1.pid
│   │   └── redis_1.sock
│   └── redis.cnf
├── s7027
│   ├── data
│   │   ├── appendonly.aof
│   │   ├── dump.rdb
│   │   ├── nodes-7021.conf
│   │   ├── redis_1.log
│   │   ├── redis_1.pid
│   │   └── redis_1.sock
│   └── redis.cnf
└── s7028
    ├── data
    │   ├── appendonly.aof
    │   ├── dump.rdb
    │   ├── nodes-7021.conf
    │   ├── redis_1.log
    │   ├── redis_1.pid
    │   └── redis_1.sock
    └── redis.cnf
2. Edit the Redis configuration file redis.cnf; on top of the earlier single-instance configuration, add the cluster settings below.
################################ REDIS CLUSTER ###############################
# Cluster switch; cluster mode is disabled by default.
cluster-enabled yes
# Name of the cluster config file. Each node keeps its own cluster config file that persists cluster state.
# It is not edited by hand: Redis generates and updates it. Each cluster node needs its own file, so make
# sure the name does not collide with another instance running on the same host.
cluster-config-file nodes-7021.conf
# Node interconnect timeout threshold: the cluster node timeout, in milliseconds.
cluster-node-timeout 30000
# During failover every replica asks to be promoted to master, but a replica that has been disconnected
# from its master for too long holds data that is too stale and should not be promoted. This parameter
# controls how that is judged: the replica's disconnection time is compared against
# (node-timeout * cluster-slave-validity-factor) + repl-ping-slave-period.
# With a node timeout of 30 seconds, a factor of 10, and an assumed default repl-ping-slave-period of
# 10 seconds, a replica disconnected for more than 310 seconds will not attempt failover.
# This can leave a failed master with no replica able to take over, so the cluster cannot work normally;
# in that case the cluster only recovers once the original master rejoins.
# Set to 0 to let replicas attempt promotion no matter how long they have been disconnected from the master.
cluster-slave-validity-factor 10
# A replica migrates to an orphaned master only if its own master keeps more than this many working
# replicas. For example, with a value of 2 a replica only attempts migration when its master still has
# 2 working replicas: this is the minimum number of replicas a master must retain before one of its
# replicas may migrate away.
# cluster-migration-barrier 1
# By default the cluster state is ok, and the cluster serves requests, only when all slots are assigned
# to nodes. Setting this to no allows serving requests while some slots are unassigned. That is not
# recommended: during a partition the masters in the minority partition would keep accepting writes,
# causing long-lasting inconsistency.
# When the nodes holding some keys are unreachable: with "yes" (the default) the whole cluster stops
# accepting operations; with "no" the cluster still serves reads for keys on reachable nodes.
cluster-require-full-coverage yes
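As a quick sanity check on the failover eligibility window described in the comments above, the cutoff is (cluster-node-timeout x cluster-slave-validity-factor) + repl-ping-slave-period. A small sketch with the example numbers (the function name is illustrative, not a Redis API):

```python
# Eligibility window for replica failover, per the comments above.
def max_disconnection_time(node_timeout_ms: int,
                           validity_factor: int,
                           ping_period_s: int) -> float:
    """Seconds a replica may stay disconnected and still attempt failover."""
    return (node_timeout_ms / 1000) * validity_factor + ping_period_s

# The example from the comments: 30s timeout, factor 10, 10s ping period.
print(max_disconnection_time(30_000, 10, 10))   # 310.0 seconds

# With this article's redis.cnf, which sets repl-ping-slave-period 5:
print(max_disconnection_time(30_000, 10, 5))    # 305.0 seconds
```

Note that with this article's actual `repl-ping-slave-period 5`, the window is 305 seconds rather than the 310 of the worked example.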
The complete configuration file:
[root@localhost dir_redis_cluster]# cat m7026/redis.cnf
daemonize no
protected-mode yes
pidfile "/data/data/redis_1.pid"
port 7026
tcp-backlog 511
bind 0.0.0.0
unixsocket "/data/data/redis_1.sock"
timeout 0
tcp-keepalive 0
loglevel notice
logfile "/data/data/redis_1.log"
databases 16
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum no
dbfilename "dump.rdb"
dir "/data/data"
masterauth "redis"
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-ping-slave-period 5
repl-timeout 60
repl-disable-tcp-nodelay no
repl-backlog-size 32mb
repl-backlog-ttl 3600
slave-priority 100
requirepass "redis"
rename-command FLUSHDB REDIS_FLUSHDB
rename-command FLUSHALL REDIS_FLUSHALL
rename-command KEYS REDIS_KEYS
maxmemory 128mb
maxmemory-policy allkeys-lru
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite yes
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 1000
latency-monitor-threshold 0
notify-keyspace-events "e"
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
# Generated by CONFIG REWRITE
################################ REDIS CLUSTER ###############################
cluster-enabled yes
cluster-config-file nodes-7021.conf
cluster-node-timeout 30000
cluster-slave-validity-factor 10
# cluster-migration-barrier 1
cluster-require-full-coverage yes
3. Create the docker-compose file:
[root@localhost data]# cat compose_redis_cluster.yaml
version: '2'

networks:
  redisnet:
    external: true

services:
  redis_m7026:
    image: redis:4.0.8
    container_name: redis_m7026
    command: redis-server /data/redis.cnf
    ports:
      - 17026:7026
    volumes:
      - /data/data/dir_redis_cluster/m7026:/data
    networks:
      redisnet:
        ipv4_address: 162.29.0.26

  redis_m7027:
    image: redis:4.0.8
    container_name: redis_m7027
    command: redis-server /data/redis.cnf
    ports:
      - 17027:7027
    volumes:
      - /data/data/dir_redis_cluster/m7027:/data
    networks:
      redisnet:
        ipv4_address: 162.29.0.27

  redis_m7028:
    image: redis:4.0.8
    container_name: redis_m7028
    command: redis-server /data/redis.cnf
    ports:
      - 17028:7028
    volumes:
      - /data/data/dir_redis_cluster/m7028:/data
    networks:
      redisnet:
        ipv4_address: 162.29.0.28

  redis_s7026:
    image: redis:4.0.8
    container_name: redis_s7026
    command: redis-server /data/redis.cnf
    ports:
      - 27026:7026
    volumes:
      - /data/data/dir_redis_cluster/s7026:/data
    networks:
      redisnet:
        ipv4_address: 162.29.0.126

  redis_s7027:
    image: redis:4.0.8
    container_name: redis_s7027
    command: redis-server /data/redis.cnf
    ports:
      - 27027:7027
    volumes:
      - /data/data/dir_redis_cluster/s7027:/data
    networks:
      redisnet:
        ipv4_address: 162.29.0.127

  redis_s7028:
    image: redis:4.0.8
    container_name: redis_s7028
    command: redis-server /data/redis.cnf
    ports:
      - 27028:7028
    volumes:
      - /data/data/dir_redis_cluster/s7028:/data
    networks:
      redisnet:
        ipv4_address: 162.29.0.128
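The six service blocks differ only in the node name, host port, and static IP. As a side note, that mapping is compact enough to generate programmatically; the sketch below is illustrative and was not part of the original deployment (it prints the structure as JSON for inspection rather than as the YAML Compose reads):

```python
# Generate the six near-identical compose services from a node table.
# Names, ports, and addresses mirror the compose file above.
import json

nodes = {
    "m7026": ("162.29.0.26",  17026, 7026),
    "m7027": ("162.29.0.27",  17027, 7027),
    "m7028": ("162.29.0.28",  17028, 7028),
    "s7026": ("162.29.0.126", 27026, 7026),
    "s7027": ("162.29.0.127", 27027, 7027),
    "s7028": ("162.29.0.128", 27028, 7028),
}

services = {
    f"redis_{name}": {
        "image": "redis:4.0.8",
        "container_name": f"redis_{name}",
        "command": "redis-server /data/redis.cnf",
        "ports": [f"{host_port}:{container_port}"],
        "volumes": [f"/data/data/dir_redis_cluster/{name}:/data"],
        "networks": {"redisnet": {"ipv4_address": ip}},
    }
    for name, (ip, host_port, container_port) in nodes.items()
}

compose = {"version": "2",
           "networks": {"redisnet": {"external": True}},
           "services": services}
print(json.dumps(compose, indent=2))
```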
Installation
1. Start the containers
docker-compose -f compose_redis_cluster.yaml up -d
2. Verify that the containers started
# To remove the stopped containers later:
# docker-compose -f compose_redis_cluster.yaml rm
[root@localhost data]# docker-compose -f compose_redis_cluster.yaml ps
   Name                  Command               State              Ports
----------------------------------------------------------------------------------------
redis_m7026   docker-entrypoint.sh redis ...   Up      6379/tcp, 0.0.0.0:17026->7026/tcp
redis_m7027   docker-entrypoint.sh redis ...   Up      6379/tcp, 0.0.0.0:17027->7027/tcp
redis_m7028   docker-entrypoint.sh redis ...   Up      6379/tcp, 0.0.0.0:17028->7028/tcp
redis_s7026   docker-entrypoint.sh redis ...   Up      6379/tcp, 0.0.0.0:27026->7026/tcp
redis_s7027   docker-entrypoint.sh redis ...   Up      6379/tcp, 0.0.0.0:27027->7027/tcp
redis_s7028   docker-entrypoint.sh redis ...   Up      6379/tcp, 0.0.0.0:27028->7028/tcp
3. Check the cluster state before creating the cluster
Check the cluster state:
[root@localhost data]# docker exec -it 7c0f844b2998 /bin/bash
root@7c0f844b2998:/data# redis-cli -h 127.0.0.1 -p 7028
127.0.0.1:7028> cluster info
cluster_state:fail          ### the state is fail: the cluster has not been created yet
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
4. Create the cluster with --cluster create
####### Running it the old way first reports an error; redis-trib.rb is the legacy tool:
./redis-trib.rb create --replicas 1 162.29.0.26:7026 162.29.0.27:7027 162.29.0.28:7028 162.29.0.126:7026 162.29.0.127:7027 162.29.0.128:7028
WARNING: redis-trib.rb is not longer available!
You should use redis-cli instead.
All commands and features belonging to redis-trib.rb have been moved
to redis-cli.
In order to use them you should call redis-cli with the --cluster
option followed by the subcommand name, arguments and options.

Use the following syntax:
redis-cli --cluster SUBCOMMAND [ARGUMENTS] [OPTIONS]

Example:
redis-cli --cluster create 162.29.0.26:7026 162.29.0.27:7027 162.29.0.28:7028 162.29.0.126:7026 162.29.0.127:7027 162.29.0.128:7028 --cluster-replicas 1

To get help about all subcommands, type:
redis-cli --cluster help

[root@localhost data]# redis-cli --cluster help

# In this example, create the cluster the new way:
[root@localhost data]# redis-cli --cluster create 162.29.0.26:7026 162.29.0.27:7027 162.29.0.28:7028 162.29.0.126:7026 162.29.0.127:7027 162.29.0.128:7028 --cluster-replicas 1 -a redis
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 162.29.0.127:7027 to 162.29.0.26:7026
Adding replica 162.29.0.128:7028 to 162.29.0.27:7027
Adding replica 162.29.0.126:7026 to 162.29.0.28:7028
M: ecca804919cd72765814b87c1e91bb5189f9814f 162.29.0.26:7026
   slots:[0-5460] (5461 slots) master
M: c96261692b546c5d9b5a45b8596c65461b6acd1a 162.29.0.27:7027
   slots:[5461-10922] (5462 slots) master
M: b4776a521a9c88c5eef4c76822a33fb0e23a4fac 162.29.0.28:7028
   slots:[10923-16383] (5461 slots) master
S: 868cfd0950e603077616c11fa54b457bffe170ba 162.29.0.126:7026
   replicates b4776a521a9c88c5eef4c76822a33fb0e23a4fac
S: e00e5d5403fa099491883059a0e4ebc9cd9a801e 162.29.0.127:7027
   replicates ecca804919cd72765814b87c1e91bb5189f9814f
S: 2632e6bee15a0aac561c797d882e3f1215f1ff33 162.29.0.128:7028
   replicates c96261692b546c5d9b5a45b8596c65461b6acd1a
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
......
>>> Performing Cluster Check (using node 162.29.0.26:7026)
M: ecca804919cd72765814b87c1e91bb5189f9814f 162.29.0.26:7026
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: c96261692b546c5d9b5a45b8596c65461b6acd1a 162.29.0.27:7027
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 868cfd0950e603077616c11fa54b457bffe170ba 162.29.0.126:7026
   slots: (0 slots) slave
   replicates b4776a521a9c88c5eef4c76822a33fb0e23a4fac
S: e00e5d5403fa099491883059a0e4ebc9cd9a801e 162.29.0.127:7027
   slots: (0 slots) slave
   replicates ecca804919cd72765814b87c1e91bb5189f9814f
M: b4776a521a9c88c5eef4c76822a33fb0e23a4fac 162.29.0.28:7028
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 2632e6bee15a0aac561c797d882e3f1215f1ff33 162.29.0.128:7028
   slots: (0 slots) slave
   replicates c96261692b546c5d9b5a45b8596c65461b6acd1a
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
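The slot allocation printed above (0-5460, 5461-10922, 10923-16383) follows from splitting the 16384 hash slots evenly across the masters. A minimal sketch that reproduces those ranges with a simple rounding rule; the real tool additionally spreads replicas across hosts, which is not modeled here:

```python
# Split the 16384 Redis Cluster hash slots across N masters.
TOTAL_SLOTS = 16384

def split_slots(n_masters: int):
    """Return one (first, last) slot range per master."""
    slots_per_node = TOTAL_SLOTS / n_masters
    return [(round(i * slots_per_node), round((i + 1) * slots_per_node) - 1)
            for i in range(n_masters)]

print(split_slots(3))  # [(0, 5460), (5461, 10922), (10923, 16383)]
```

Note that the split is not perfectly even: with 3 masters, the middle one gets 5462 slots and the other two 5461 each, exactly as the create output shows.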
5. Check the cluster state after creation
[root@localhost data]# redis-cli --cluster check 162.29.0.26:7026 -a redis
162.29.0.26:7026 (ecca8049...) -> 0 keys | 5461 slots | 1 slaves.
162.29.0.27:7027 (c9626169...) -> 0 keys | 5462 slots | 1 slaves.
162.29.0.28:7028 (b4776a52...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 162.29.0.26:7026)
M: ecca804919cd72765814b87c1e91bb5189f9814f 162.29.0.26:7026
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: c96261692b546c5d9b5a45b8596c65461b6acd1a 162.29.0.27:7027
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 868cfd0950e603077616c11fa54b457bffe170ba 162.29.0.126:7026
   slots: (0 slots) slave
   replicates b4776a521a9c88c5eef4c76822a33fb0e23a4fac
S: e00e5d5403fa099491883059a0e4ebc9cd9a801e 162.29.0.127:7027
   slots: (0 slots) slave
   replicates ecca804919cd72765814b87c1e91bb5189f9814f
M: b4776a521a9c88c5eef4c76822a33fb0e23a4fac 162.29.0.28:7028
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 2632e6bee15a0aac561c797d882e3f1215f1ff33 162.29.0.128:7028
   slots: (0 slots) slave
   replicates c96261692b546c5d9b5a45b8596c65461b6acd1a
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

[root@localhost data]# redis-cli --cluster info 162.29.0.26:7026 -a redis
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
162.29.0.26:7026 (ecca8049...) -> 0 keys | 5461 slots | 1 slaves.
162.29.0.27:7027 (c9626169...) -> 0 keys | 5462 slots | 1 slaves.
162.29.0.28:7028 (b4776a52...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
6. Log in to one of the Redis nodes and inspect the cluster
[root@localhost data]# redis-cli -h 162.29.0.26 -p 7026 -a redis
162.29.0.26:7026> info cluster
# Cluster
cluster_enabled:1
162.29.0.26:7026> cluster info
cluster_state:ok                 ###### the cluster state is healthy
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6            ####### 6 nodes in total
cluster_size:3                   ####### 3 master instances
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:454
cluster_stats_messages_pong_sent:471
cluster_stats_messages_sent:925
cluster_stats_messages_ping_received:466
cluster_stats_messages_pong_received:454
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:925
162.29.0.26:7026> cluster nodes
ecca804919cd72765814b87c1e91bb5189f9814f 162.29.0.26:7026@17026 myself,master - 0 1595960257000 1 connected 0-5460
c96261692b546c5d9b5a45b8596c65461b6acd1a 162.29.0.27:7027@17027 master - 0 1595960259352 2 connected 5461-10922
868cfd0950e603077616c11fa54b457bffe170ba 162.29.0.126:7026@17026 slave b4776a521a9c88c5eef4c76822a33fb0e23a4fac 0 1595960258000 4 connected
e00e5d5403fa099491883059a0e4ebc9cd9a801e 162.29.0.127:7027@17027 slave ecca804919cd72765814b87c1e91bb5189f9814f 0 1595960259000 5 connected
b4776a521a9c88c5eef4c76822a33fb0e23a4fac 162.29.0.28:7028@17028 master - 0 1595960257334 3 connected 10923-16383
2632e6bee15a0aac561c797d882e3f1215f1ff33 162.29.0.128:7028@17028 slave c96261692b546c5d9b5a45b8596c65461b6acd1a 0 1595960257000 6 connected
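The cluster nodes output above shows each master's slot range. Which master serves a given key is determined by CRC16(key) mod 16384, hashing only the substring inside the first {...} when the key contains a non-empty hash tag. A minimal sketch of that rule, following the reference algorithm in the Redis Cluster specification:

```python
# Key -> hash slot mapping used by Redis Cluster:
# CRC16 (CCITT/XMODEM variant: poly 0x1021, init 0) modulo 16384.
def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:          # non-empty hash tag: hash only its content
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(key_slot("foo"))
# Keys sharing a hash tag land in the same slot (useful for multi-key ops):
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))  # True
```

Comparing the printed slot with the ranges in cluster nodes tells you which master owns the key; CLUSTER KEYSLOT <key> returns the same value server-side.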
Testing node addition
To add nodes, copy the configuration files and start two more containers:
docker run -itd --name addm7026 -v /data/data/dir_redis_cluster/add_m7026:/data --net redisnet -p 17029:7026 --ip 162.29.0.29 redis:4.0.8 redis-server /data/redis.cnf
docker run -itd --name adds7026 -v /data/data/dir_redis_cluster/add_s7026:/data --net redisnet -p 27029:7026 --ip 162.29.0.129 redis:4.0.8 redis-server /data/redis.cnf
Add the master node 162.29.0.29:7026:
####### The old (redis-trib.rb) way:
#./redis-trib.rb add-node 192.168.100.134:17022 192.168.100.134:17021

[root@localhost add_s7026]# redis-cli --cluster add-node 162.29.0.29:7026 162.29.0.26:7026 -a redis
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 162.29.0.29:7026 to cluster 162.29.0.26:7026
>>> Send CLUSTER MEET to node 162.29.0.29:7026 to make it join the cluster.
[OK] New node added correctly.

# Inspect the newly added node:
[root@localhost add_s7026]# redis-cli --cluster info 162.29.0.26:7026 -a redis
162.29.0.26:7026 (ecca8049...) -> 0 keys | 5461 slots | 1 slaves.
162.29.0.27:7027 (c9626169...) -> 0 keys | 5462 slots | 1 slaves.
162.29.0.29:7026 (addfe99e...) -> 0 keys | 0 slots | 0 slaves.
162.29.0.28:7028 (b4776a52...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 4 masters.
0.00 keys per slot on average.
Add 162.29.0.129:7026 as a replica of the node just added; the cluster-master-id can be looked up with redis-cli --cluster check.
####### The old (redis-trib.rb) way:
#./redis-trib.rb add-node --slave --master-id 7fa64d250b595d8ac21a42477af5ac8c07c35d83 192.168.100.135:17022 192.168.100.134:17021
[root@localhost add_s7026]# redis-cli --cluster add-node 162.29.0.129:7026 162.29.0.26:7026 -a redis --cluster-slave --cluster-master-id addfe99eb1187be11d474b0034e59081a418db0f
>>> Adding node 162.29.0.129:7026 to cluster 162.29.0.26:7026
>>> Send CLUSTER MEET to node 162.29.0.129:7026 to make it join the cluster.
Waiting for the cluster to join
>>> Configure node as replica of 162.29.0.29:7026.
Migrating some slots to the new node
[root@localhost add_s7026]# redis-cli --cluster reshard 162.29.0.26:7026 -a redis
How many slots do you want to move (from 1 to 16384)? 10        ##### number of slots to migrate
What is the receiving node ID? addfe99eb1187be11d474b0034e59081a418db0f    ###### ID of the new node
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: all     ###### 'all' migrates 10 slots in total to the new node, taking a few from each master

Ready to move 10 slots.
  Source nodes:
    M: ecca804919cd72765814b87c1e91bb5189f9814f 162.29.0.26:7026
       slots:[0-5460] (5461 slots) master
       1 additional replica(s)
    M: c96261692b546c5d9b5a45b8596c65461b6acd1a 162.29.0.27:7027
       slots:[5461-10922] (5462 slots) master
       1 additional replica(s)
    M: b4776a521a9c88c5eef4c76822a33fb0e23a4fac 162.29.0.28:7028
       slots:[10923-16383] (5461 slots) master
       1 additional replica(s)
  Destination node:
    M: addfe99eb1187be11d474b0034e59081a418db0f 162.29.0.29:7026
       slots: (0 slots) master
       1 additional replica(s)
  Resharding plan:
    Moving slot 5461 from c96261692b546c5d9b5a45b8596c65461b6acd1a
    Moving slot 5462 from c96261692b546c5d9b5a45b8596c65461b6acd1a
    Moving slot 5463 from c96261692b546c5d9b5a45b8596c65461b6acd1a
    Moving slot 5464 from c96261692b546c5d9b5a45b8596c65461b6acd1a
    Moving slot 0 from ecca804919cd72765814b87c1e91bb5189f9814f
    Moving slot 1 from ecca804919cd72765814b87c1e91bb5189f9814f
    Moving slot 2 from ecca804919cd72765814b87c1e91bb5189f9814f
    Moving slot 10923 from b4776a521a9c88c5eef4c76822a33fb0e23a4fac
    Moving slot 10924 from b4776a521a9c88c5eef4c76822a33fb0e23a4fac
    Moving slot 10925 from b4776a521a9c88c5eef4c76822a33fb0e23a4fac
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 5461 from 162.29.0.27:7027 to 162.29.0.29:7026:
Moving slot 5462 from 162.29.0.27:7027 to 162.29.0.29:7026:
Moving slot 5463 from 162.29.0.27:7027 to 162.29.0.29:7026:
Moving slot 5464 from 162.29.0.27:7027 to 162.29.0.29:7026:
Moving slot 0 from 162.29.0.26:7026 to 162.29.0.29:7026:
Moving slot 1 from 162.29.0.26:7026 to 162.29.0.29:7026:
Moving slot 2 from 162.29.0.26:7026 to 162.29.0.29:7026:
Moving slot 10923 from 162.29.0.28:7028 to 162.29.0.29:7026:
Moving slot 10924 from 162.29.0.28:7028 to 162.29.0.29:7026:
Moving slot 10925 from 162.29.0.28:7028 to 162.29.0.29:7026:

[root@localhost add_s7026]# redis-cli --cluster reshard 162.29.0.26:7026 -a redis
How many slots do you want to move (from 1 to 16384)? 10
What is the receiving node ID? addfe99eb1187be11d474b0034e59081a418db0f
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: ecca804919cd72765814b87c1e91bb5189f9814f    ###### enter a source node ID, then 'done': migrate 10 slots from that node only
Source node #2: done

Ready to move 10 slots.
  Source nodes:
    M: ecca804919cd72765814b87c1e91bb5189f9814f 162.29.0.26:7026
       slots:[3-5460] (5458 slots) master
       1 additional replica(s)
  Destination node:
    M: addfe99eb1187be11d474b0034e59081a418db0f 162.29.0.29:7026
       slots:[0-2],[5461-5464],[10923-10925] (10 slots) master
       1 additional replica(s)
  Resharding plan:
    Moving slot 3 from ecca804919cd72765814b87c1e91bb5189f9814f
    Moving slot 4 from ecca804919cd72765814b87c1e91bb5189f9814f
    Moving slot 5 from ecca804919cd72765814b87c1e91bb5189f9814f
    Moving slot 6 from ecca804919cd72765814b87c1e91bb5189f9814f
    Moving slot 7 from ecca804919cd72765814b87c1e91bb5189f9814f
    Moving slot 8 from ecca804919cd72765814b87c1e91bb5189f9814f
    Moving slot 9 from ecca804919cd72765814b87c1e91bb5189f9814f
    Moving slot 10 from ecca804919cd72765814b87c1e91bb5189f9814f
    Moving slot 11 from ecca804919cd72765814b87c1e91bb5189f9814f
    Moving slot 12 from ecca804919cd72765814b87c1e91bb5189f9814f
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 3 from 162.29.0.26:7026 to 162.29.0.29:7026:
Moving slot 4 from 162.29.0.26:7026 to 162.29.0.29:7026:
Moving slot 5 from 162.29.0.26:7026 to 162.29.0.29:7026:
Moving slot 6 from 162.29.0.26:7026 to 162.29.0.29:7026:
Moving slot 7 from 162.29.0.26:7026 to 162.29.0.29:7026:
Moving slot 8 from 162.29.0.26:7026 to 162.29.0.29:7026:
Moving slot 9 from 162.29.0.26:7026 to 162.29.0.29:7026:
Moving slot 10 from 162.29.0.26:7026 to 162.29.0.29:7026:
Moving slot 11 from 162.29.0.26:7026 to 162.29.0.29:7026:
Moving slot 12 from 162.29.0.26:7026 to 162.29.0.29:7026:
At this point the slot distribution is uneven, so run a rebalance:
[root@localhost add_s7026]# redis-cli --cluster rebalance 162.29.0.26:7026 -a redis
>>> Rebalancing across 4 nodes. Total weight = 4.00
Moving 1362 slots from 162.29.0.28:7028 to 162.29.0.29:7026
Moving 1362 slots from 162.29.0.27:7027 to 162.29.0.29:7026
Moving 1352 slots from 162.29.0.26:7026 to 162.29.0.29:7026
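The move counts in the rebalance output can be reproduced by hand: with equal weights, each of the 4 masters should end up with 16384 / 4 = 4096 slots, and every overloaded master ships its surplus to the new node. A minimal sketch (the slot counts below are the state after the two reshards above: 162.29.0.26 gave away 3 + 10 slots, the others 4 and 3 respectively, and the new node holds 20):

```python
# Equal-weight rebalance arithmetic: surplus over the per-node target.
def rebalance_moves(slot_counts: dict) -> dict:
    """Map each overloaded node to the number of slots it must give up."""
    target = sum(slot_counts.values()) // len(slot_counts)
    return {node: count - target
            for node, count in slot_counts.items() if count > target}

counts = {"162.29.0.26:7026": 5448, "162.29.0.27:7027": 5458,
          "162.29.0.28:7028": 5458, "162.29.0.29:7026": 20}
print(rebalance_moves(counts))
# {'162.29.0.26:7026': 1352, '162.29.0.27:7027': 1362, '162.29.0.28:7028': 1362}
```

These are exactly the 1352/1362/1362 moves redis-cli printed; the real tool additionally supports per-node weights and a --cluster-threshold below which it skips the rebalance.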
Cluster failover
1. Automatic failover after a simulated machine failure
Kill one of the instances:
### Simulate a failure of master3
[root@localhost add_s7026]# docker kill 92595c9ca5e0

### Watch the logs
## master1 log
[root@localhost dir_redis_cluster]# tail -f m7026/data/redis_1.log
1:M 29 Jul 10:38:30.696 * Marking node b4776a521a9c88c5eef4c76822a33fb0e23a4fac as failing (quorum reached).
1:M 29 Jul 10:38:30.696 # Cluster state changed: fail
1:M 29 Jul 10:38:31.510 # Failover auth granted to 868cfd0950e603077616c11fa54b457bffe170ba for epoch 8
1:M 29 Jul 10:38:31.514 # Cluster state changed: ok

## master2 log
[root@localhost dir_redis_cluster]# tail -f m7027/data/redis_1.log
1:M 29 Jul 10:38:30.698 * FAIL message received from ecca804919cd72765814b87c1e91bb5189f9814f about b4776a521a9c88c5eef4c76822a33fb0e23a4fac
1:M 29 Jul 10:38:30.698 # Cluster state changed: fail
1:M 29 Jul 10:38:31.510 # Failover auth granted to 868cfd0950e603077616c11fa54b457bffe170ba for epoch 8
1:M 29 Jul 10:38:31.515 # Cluster state changed: ok

## slave1 log
[root@localhost dir_redis_cluster]# tail -f s7026/data/redis_1.log
1:S 29 Jul 10:38:30.696 * FAIL message received from ecca804919cd72765814b87c1e91bb5189f9814f about b4776a521a9c88c5eef4c76822a33fb0e23a4fac
1:S 29 Jul 10:38:30.696 # Cluster state changed: fail
1:S 29 Jul 10:38:30.794 # Start of election delayed for 646 milliseconds (rank #0, offset 13832).
1:S 29 Jul 10:38:31.508 # Starting a failover election for epoch 8.
1:S 29 Jul 10:38:31.511 # Failover election won: I'm the new master.
1:S 29 Jul 10:38:31.511 # configEpoch set to 8 after successful failover
1:M 29 Jul 10:38:31.511 # Setting secondary replication ID to adc9070f7647046e340d29e4808dec78a3fa68ba, valid up to offset: 13833. New replication ID is fc257229c6d869841043ee7aefc9af00e656eb57
1:M 29 Jul 10:38:31.511 * Discarding previously cached master state.
1:M 29 Jul 10:38:31.511 # Cluster state changed: ok

## slave2 log
[root@localhost dir_redis_cluster]# tail -f s7027/data/redis_1.log
1:S 29 Jul 10:38:30.698 * FAIL message received from ecca804919cd72765814b87c1e91bb5189f9814f about b4776a521a9c88c5eef4c76822a33fb0e23a4fac
1:S 29 Jul 10:38:30.698 # Cluster state changed: fail
1:S 29 Jul 10:38:31.513 # Cluster state changed: ok

## slave3 log
[root@localhost dir_redis_cluster]# tail -f s7028/data/redis_1.log
1:S 29 Jul 10:38:30.697 * FAIL message received from ecca804919cd72765814b87c1e91bb5189f9814f about b4776a521a9c88c5eef4c76822a33fb0e23a4fac
1:S 29 Jul 10:38:30.697 # Cluster state changed: fail
1:S 29 Jul 10:38:31.513 # Cluster state changed: ok
Check the cluster state:
[root@localhost add_s7026]# redis-cli --cluster check 162.29.0.26:7026 -a redis
Could not connect to Redis at 162.29.0.28:7028: No route to host
162.29.0.26:7026 (ecca8049...) -> 0 keys | 4096 slots | 1 slaves.
162.29.0.27:7027 (c9626169...) -> 0 keys | 4096 slots | 1 slaves.
162.29.0.29:7026 (addfe99e...) -> 0 keys | 4096 slots | 1 slaves.
162.29.0.126:7026 (868cfd09...) -> 0 keys | 4096 slots | 0 slaves.   ### 0.126 has been promoted to master
[OK] 0 keys in 4 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 162.29.0.26:7026)
M: ecca804919cd72765814b87c1e91bb5189f9814f 162.29.0.26:7026
   slots:[1365-5460] (4096 slots) master
   1 additional replica(s)
S: c7d5176346824f59cd01e481e4fd69fb5e015eed 162.29.0.129:7026
   slots: (0 slots) slave
   replicates addfe99eb1187be11d474b0034e59081a418db0f
S: e00e5d5403fa099491883059a0e4ebc9cd9a801e 162.29.0.127:7027
   slots: (0 slots) slave
   replicates ecca804919cd72765814b87c1e91bb5189f9814f
M: c96261692b546c5d9b5a45b8596c65461b6acd1a 162.29.0.27:7027
   slots:[6827-10922] (4096 slots) master
   1 additional replica(s)
M: addfe99eb1187be11d474b0034e59081a418db0f 162.29.0.29:7026
   slots:[0-1364],[5461-6826],[10923-12287] (4096 slots) master
   1 additional replica(s)
M: 868cfd0950e603077616c11fa54b457bffe170ba 162.29.0.126:7026
   slots:[12288-16383] (4096 slots) master
S: 2632e6bee15a0aac561c797d882e3f1215f1ff33 162.29.0.128:7028
   slots: (0 slots) slave
   replicates c96261692b546c5d9b5a45b8596c65461b6acd1a
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Restart the original master instance; it rejoins the cluster automatically, now as a replica.
2. Manual failover
A chosen replica initiates the failover flow: the master and replica swap roles, the replica becomes the new master and serves traffic, and the old master becomes its replica.

cluster failover: manually promote the replica to master.

cluster failover force: for when the master is down and automatic failover could not complete. On receiving cluster failover force the replica starts an election immediately, without confirming the replication offset with its master (any data the replica has not yet replicated is lost); once it wins the election it replaces the master and broadcasts the new cluster configuration.

cluster failover takeover: for when more than half of the masters have failed. Replicas then cannot gather a majority of master votes, so no election can succeed. cluster failover takeover forces the switch anyway; because the takeover happens without a leader election, it can produce conflicting config epochs.

For manual failover, when the situation allows, prefer the options in this order: cluster failover > cluster failover force > cluster failover takeover.
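The recommended order can be summed up as a small decision rule. The helper below is purely illustrative (not a Redis API): it picks the command to run on the replica given two facts about the cluster:

```python
# Illustrative decision helper encoding the recommended priority:
# plain FAILOVER when the master is alive, FORCE when only the master is
# down, TAKEOVER when a majority of masters is lost and no election can win.
def failover_command(master_reachable: bool, majority_masters_up: bool) -> str:
    if master_reachable:
        return "CLUSTER FAILOVER"           # safest: offsets confirmed first, no data loss
    if majority_masters_up:
        return "CLUSTER FAILOVER FORCE"     # skips the offset check; unreplicated data is lost
    return "CLUSTER FAILOVER TAKEOVER"      # skips the election; config-epoch conflicts possible

print(failover_command(True, True))    # CLUSTER FAILOVER
print(failover_command(False, True))   # CLUSTER FAILOVER FORCE
print(failover_command(False, False))  # CLUSTER FAILOVER TAKEOVER
```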
## Log in to the replica 0.28:7028 and trigger a manual switchover
162.29.0.28:7028> cluster failover
## Check the cluster state
[root@localhost add_s7026]# redis-cli --cluster check 162.29.0.26:7026 -a redis
162.29.0.26:7026 (ecca8049...) -> 0 keys | 4096 slots | 1 slaves.
162.29.0.27:7027 (c9626169...) -> 0 keys | 4096 slots | 1 slaves.
162.29.0.29:7026 (addfe99e...) -> 0 keys | 4096 slots | 1 slaves.
162.29.0.28:7028 (b4776a52...) -> 0 keys | 4096 slots | 1 slaves.   ### 0.28:7028 is a master again
[OK] 0 keys in 4 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 162.29.0.26:7026)
M: ecca804919cd72765814b87c1e91bb5189f9814f 162.29.0.26:7026
   slots:[1365-5460] (4096 slots) master
   1 additional replica(s)
S: c7d5176346824f59cd01e481e4fd69fb5e015eed 162.29.0.129:7026
   slots: (0 slots) slave
   replicates addfe99eb1187be11d474b0034e59081a418db0f
S: e00e5d5403fa099491883059a0e4ebc9cd9a801e 162.29.0.127:7027
   slots: (0 slots) slave
   replicates ecca804919cd72765814b87c1e91bb5189f9814f
M: c96261692b546c5d9b5a45b8596c65461b6acd1a 162.29.0.27:7027
   slots:[6827-10922] (4096 slots) master
   1 additional replica(s)
M: addfe99eb1187be11d474b0034e59081a418db0f 162.29.0.29:7026
   slots:[0-1364],[5461-6826],[10923-12287] (4096 slots) master
   1 additional replica(s)
S: 868cfd0950e603077616c11fa54b457bffe170ba 162.29.0.126:7026
   slots: (0 slots) slave
   replicates b4776a521a9c88c5eef4c76822a33fb0e23a4fac
S: 2632e6bee15a0aac561c797d882e3f1215f1ff33 162.29.0.128:7028
   slots: (0 slots) slave
   replicates c96261692b546c5d9b5a45b8596c65461b6acd1a
M: b4776a521a9c88c5eef4c76822a33fb0e23a4fac 162.29.0.28:7028
   slots:[12288-16383] (4096 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Removing cluster nodes
### Removing a cluster node: del-node ip:port <node_id>. Only a node with no slots assigned can be
### deleted; after removal from the cluster, its instance is shut down.

### Remove a replica node
[root@localhost add_s7026]# redis-cli --cluster del-node 162.29.0.26:7026 e00e5d5403fa099491883059a0e4ebc9cd9a801e -a redis
>>> Removing node e00e5d5403fa099491883059a0e4ebc9cd9a801e from cluster 162.29.0.26:7026
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.

### Verify with check that it is gone
[root@localhost add_s7026]# redis-cli --cluster check 162.29.0.26:7026 -a redis

### Add the node back; added this way, it joins as a master
redis-cli --cluster add-node 162.29.0.127:7027 162.29.0.26:7026 -a redis
## Use cluster replicate to manually turn it into a replica of a given node, or add it as a replica in the first place (--cluster-slave):
redis-cli -h 162.29.0.127 -p 7027 -a redis
>cluster replicate ecca804919cd72765814b87c1e91bb5189f9814f
CLUSTER command reference:
CLUSTER info: print cluster information.
CLUSTER nodes: list all nodes currently known to the cluster and their details.
CLUSTER meet <ip> <port>: add the node at ip:port to the cluster.
CLUSTER addslots <slot> [slot ...]: assign one or more slots to the current node.
CLUSTER delslots <slot> [slot ...]: remove the assignment of one or more slots from the current node.
CLUSTER slots: list slot and node information.
CLUSTER slaves <node_id>: list the replicas of the given node.
CLUSTER replicate <node_id>: make the current node a replica of the given node.
CLUSTER saveconfig: save the cluster config file manually; by default the cluster saves it automatically whenever the configuration changes.
CLUSTER keyslot <key>: show which slot the key is placed in.
CLUSTER flushslots: remove all slots assigned to the current node, turning it into a node with no assigned slots.
CLUSTER countkeysinslot <slot>: return the number of key-value pairs currently in the slot.
CLUSTER getkeysinslot <slot> <count>: return count keys from the slot.
CLUSTER setslot <slot> node <node_id>: assign the slot to the given node; if the slot is already assigned to another node, that node must drop it before the new assignment.
CLUSTER setslot <slot> migrating <node_id>: migrate the slot from this node to the given node.
CLUSTER setslot <slot> importing <node_id>: import slot slot from the node identified by node_id into this node.
CLUSTER setslot <slot> stable: cancel an import or migration of slot slot.
CLUSTER failover: perform a manual failover.
CLUSTER forget <node_id>: remove the given node from the cluster so the handshake cannot complete; the ban expires after 60 s, after which the two nodes will complete the handshake again.
CLUSTER reset [HARD|SOFT]: reset cluster state; SOFT clears the information about other nodes without changing this node's ID, HARD also regenerates the node's ID; with no argument, SOFT is used.
CLUSTER count-failure-reports <node_id>: return the length of the failure report list for the given node.
CLUSTER SET-CONFIG-EPOCH: set the node's config epoch; only allowed before the node has joined a cluster.