mysql从节点自动故障转移(mysql8.0)mysql Async Replication Auto failover

【1】概念

(1.1)本文说的自动故障转移到底是什么?

从MySQL 8.0.23开始,复制结构中,增加了从节点自动故障转移功能。

这个自动故障转移的概念得说清楚不然有歧义;

假设有 ABC,A为主,B/C 为从节点,那么当A出问题后,新主库变成了B,那么C会自动把主库变成B,这就是我们本文中的自动故障转移;

 

(1.2)测试

     从MySQL 8.0.23开始,复制结构中,增加了从节点自动故障转移功能。测试一下起功能:

     用dbdeployer快速安装测试环境,我这边安装的是:dbdeployer1.8.5   

复制代码
-- 安装脚本
dbdeployer deploy replication 8.0.25 --bind-address=0.0.0.0 --base-port=18610 --sandbox-home=/data1/db/mydb --sandbox-binary=/usr/local  
--native-auth-plugin --db-password="cc.123" --db-user="sa" --remote-access="%"
--post-grants-sql="grant all on *.* to 'sa'@'%' WITH GRANT OPTION " --gtid --nodes=4 ; --删除脚本 dbdeployer delete rsandbox_8_0_25 --sandbox-home=/data1/db/mydb --sandbox-binary=/usr/local;
复制代码

    安装1主3从库,基于gtid的主从库。

        主库:端口18611;3个从库: 18612,18613,18614

     stop slave,配置自动转移:SOURCE_CONNECTION_AUTO_FAILOVER=1

复制代码
change master to SOURCE_HOST='127.0.0.1',
  SOURCE_USER='sa',
  SOURCE_PASSWORD='cc.123',
  SOURCE_PORT=18611,
  SOURCE_AUTO_POSITION=1,
SOURCE_CONNECTION_AUTO_FAILOVER = 1
复制代码

  再start slave,查看主从同步正常后:

      在18613执行:

SELECT asynchronous_connection_failover_add_source('', '127.0.0.1', 18612, '', 100);

    查看结果:

   select * from mysql.replication_asynchronous_connection_failover

    

   表示已经设置完成,如果要删除从节点自动故障转移,可以执行:  

select asynchronous_connection_failover_delete_source('', '127.0.0.1', 18612, '');

   cd到主库目录:执行./stop,模拟主库故障,再看18613的slave status,:

  

      完成了主库故障,从库漂移到指定从库做主库。

【2】实践参考

(2.1)创建复制通道

CHANGE REPLICATION SOURCE TO
MASTER_HOST='172.16.16.10', MASTER_PORT=3306, 
MASTER_USER='repl', MASTER_PASSWORD='repl',
MASTER_AUTO_POSITION=1,
SOURCE_CONNECTION_AUTO_FAILOVER=1, #这里是关键,表示开启自动故障转移
MASTER_RETRY_COUNT=3, #最多重试3次
MASTER_CONNECT_RETRY=10 #每次重试间隔10秒
FOR CHANNEL 'MGR-A';

#简单解释下几个参数
- SOURCE_CONNECTION_AUTO_FAILOVER=1  #这里是关键,表示开启自动故障转移
- MASTER_RETRY_COUNT=3  #表示最多重试3次,默认是是86400次
- MASTER_CONNECT_RETRY=10 #表示每次重试间隔10秒,默认是60秒

核验复制通道效果

SELECT * FROM performance_schema.replication_applier_status\G
*************************** 1. row ***************************
              CHANNEL_NAME: mgr-a
             SERVICE_STATE: ON
           REMAINING_DELAY: NULL
COUNT_TRANSACTIONS_RETRIES: 0
*************************** 2. row ***************************
              CHANNEL_NAME: group_replication_applier
             SERVICE_STATE: ON
           REMAINING_DELAY: NULL
COUNT_TRANSACTIONS_RETRIES: 0

(2.2)为复制通道添加多个复制源

[root@GreatSQL mgrB-1][(none)]> SELECT asynchronous_connection_failover_add_source('MGR-A','172.16.16.10',3306,null,60);
[root@GreatSQL mgrB-1][(none)]> SELECT asynchronous_connection_failover_add_source('MGR-A','172.16.16.11',3306,null,60);
[root@GreatSQL mgrB-1][(none)]> SELECT asynchronous_connection_failover_add_source('MGR-A','172.16.16.12',3306,null,60);

#简单解释下几个参数
MGR-A  #表示复制通道,和上面的复制通道同名
172.16.16.10 #表示该复制源的IP
3306 #表示该复制源的端口
null #表示network_namespace,未来的特性,现在先放空即可
60 #表示该复制源的权重,上面我们介绍了不同权重的作用,值越大越有机会抢到成为复制源

确认复制源生效:

[root@GreatSQL mgrB-1][(none)]> SELECT * FROM performance_schema.replication_asynchronous_connection_failover;
+--------------+--------------+------+-------------------+--------+--------------+
| CHANNEL_NAME | HOST         | PORT | NETWORK_NAMESPACE | WEIGHT | MANAGED_NAME |
+--------------+--------------+------+-------------------+--------+--------------+
| mgr-a        | 172.16.16.10 | 3306 |                   |     60 |              |
| mgr-a        | 172.16.16.11 | 3306 |                   |     60 |              |
| mgr-a        | 172.16.16.12 | 3306 |                   |     60 |              |
+--------------+--------------+------+-------------------+--------+--------------+

启动该复制通道

[root@GreatSQL mgrB-1][(none)]> START REPLICA FOR CHANNEL 'MGR-A';

确认复制通道和 MGR状态:

  SELECT * FROM performance_schema.replication_connection_status\G

[root@GreatSQL mgrB-1][(none)]> SELECT * FROM performance_schema.replication_connection_status\G
*************************** 1. row ***************************
                                      CHANNEL_NAME: mgr-a
                                        GROUP_NAME:
                                       SOURCE_UUID: b084f8a1-96a8-11eb-9a70-525400fb993a
                                         THREAD_ID: 3084
                                     SERVICE_STATE: ON
                         COUNT_RECEIVED_HEARTBEATS: 5974
                          LAST_HEARTBEAT_TIMESTAMP: 2021-05-29 18:53:13.879720
                          RECEIVED_TRANSACTION_SET: 476c0276-be03-11eb-bd34-525400e802e2:21-31:1000016-1000017
                                 LAST_ERROR_NUMBER: 0
                                LAST_ERROR_MESSAGE:
                              LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                           LAST_QUEUED_TRANSACTION: 476c0276-be03-11eb-bd34-525400e802e2:31
 LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_QUEUED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2021-05-27 17:19:43.201000
     LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP: 2021-05-27 17:19:43.203315
       LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP: 2021-05-27 17:19:43.203349
                              QUEUEING_TRANSACTION:
    QUEUEING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
   QUEUEING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        QUEUEING_TRANSACTION_START_QUEUE_TIMESTAMP: 0000-00-00 00:00:00.000000
*************************** 2. row ***************************
                                      CHANNEL_NAME: group_replication_applier
                                        GROUP_NAME: f195537d-19ac-11eb-b29f-5254002eb6d6
                                       SOURCE_UUID: f195537d-19ac-11eb-b29f-5254002eb6d6
                                         THREAD_ID: NULL
                                     SERVICE_STATE: ON
                         COUNT_RECEIVED_HEARTBEATS: 0
                          LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00.000000
                          RECEIVED_TRANSACTION_SET: 476c0276-be03-11eb-bd34-525400e802e2:1-31:1000015-1000017,
f195537d-19ac-11eb-b29f-5254002eb6d6:1-18
                                 LAST_ERROR_NUMBER: 0
                                LAST_ERROR_MESSAGE:
                              LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                           LAST_QUEUED_TRANSACTION: f195537d-19ac-11eb-b29f-5254002eb6d6:18
 LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_QUEUED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
     LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP: 2021-05-27 17:04:03.407281
       LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP: 2021-05-27 17:04:03.407317
                              QUEUEING_TRANSACTION:
    QUEUEING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
   QUEUEING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        QUEUEING_TRANSACTION_START_QUEUE_TIMESTAMP: 0000-00-00 00:00:00.000000
View Code

执行 SHOW REPLICA STATUS 查看状态:

*************************** 1. row ***************************
             Replica_IO_State: Waiting for master to send event
                  Source_Host: 172.16.16.10
                  Source_User: repl
                  Source_Port: 3306
                Connect_Retry: 10
...
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
...
                  Source_UUID: 5499a6cb-91cb-11eb-966f-525400e802e2
...
           Source_Retry_Count: 3
...
           Retrieved_Gtid_Set: 476c0276-be03-11eb-bd34-525400e802e2:21-31:1000016-1000017
            Executed_Gtid_Set: 476c0276-be03-11eb-bd34-525400e802e2:1-31:1000015-1000017,
f195537d-19ac-11eb-b29f-5254002eb6d6:1-18
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: mgr-a
...               

先记住上面输出结果中的 Source_Host 和 Source_UUID 等信息,下面模拟一次复制源服务器宕机后,自动切换复制源的场景。

(2.3)故障转移模拟,确认可自动切换

在当前复制源服务器上,执行 kill -9 杀掉 mysqld 进程,然后就能看到从服务器上有类似如下日志:

# 先尝试3次(每次间隔10秒)重连旧的复制源服务器
[ERROR] [MY-010584] [Repl] Slave I/O for channel 'mgr-a': error connecting to master 'repl@172.16.16.10:3306' 
- retry-time: 10 retries: 3 message: Can't connect to MySQL server on '172.16.16.10:3306' (111), Error_code: MY-002003 #重试失败,停止复制I/O线程 [Note] [MY-010563] [Repl] Slave I/O thread for channel 'mgr-a' killed while connecting to master [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended.
Please consider using the USER and PASSWORD connection options for START SLAVE; see the
'START SLAVE Syntax' in the MySQL Manual for more information. # 再次启动复制I/O线程,连接到新的复制源服务器 [System] [MY-010562] [Repl] Slave I/O thread for channel 'mgr-a': connected to master 'repl@172.16.16.11:3306',replication started in log 'FIRST' at position 8598
# 告知UUID发生切换了 [Warning] [MY-010549] [Repl] The master
's UUID has changed, although this should not happen unless you have changed it manually. The old UUID was ec2fcbeb-976c-11eb-a652-525400e2078a.

再次执行 SHOW REPLICA STATUS 确认复制源切换了:

*************************** 1. row ***************************
             Replica_IO_State: Waiting for master to send event
                  Source_Host: 172.16.16.11
                  Source_User: repl
                  Source_Port: 3306
                Connect_Retry: 10
...
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
...
                  Source_UUID: ec2fcbeb-976c-11eb-a652-525400e2078a
...
           Source_Retry_Count: 3
...
           Retrieved_Gtid_Set: 476c0276-be03-11eb-bd34-525400e802e2:21-32:1000016-1000017
            Executed_Gtid_Set: 476c0276-be03-11eb-bd34-525400e802e2:1-32:1000015-1000017,
f195537d-19ac-11eb-b29f-5254002eb6d6:1-18
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: mgr-a

因为3个复制源的权重设置为一样,所以当原来的复制源服务器宕机恢复后,不会再切换回去。

而如果旧的复制源服务器权重设置较高的话,当他恢复后,会再次发生切换,切回原来的源:

#没有任何尝试重连的行为,直接发起切换
[Note] [MY-011026] [Repl] Slave I/O thread killed while reading event for channel 'mgr-a'.
[Note] [MY-010570] [Repl] Slave I/O thread exiting for channel 'mgr-a', read up to log 'FIRST', position 8871
[Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
[System] [MY-010562] [Repl] Slave I/O thread for channel 'mgr-a': connected to master 'repl@172.16.16.10:3306',replication started in log 'FIRST' at position 8871
[Warning] [MY-010549] [Repl] The master's UUID has changed, although this should not happen unless you have changed it manually. The old UUID was 5499a6cb-91cb-11eb-966f-525400e802e2. -- 再次切回原来的主

这就很方便的可以实现自动故障转移了。

 

 

【参考文档】

MySQL 8.0.25从节点自动故障转移测试

金融级别的MGR:https://mp.weixin.qq.com/s?__biz=MzkzMTIzMDgwMg==&mid=2247484811&idx=1&sn=358720bef3f9150c2c0a6060fa4b3e0b&chksm=c26f60b6f518e9a0dc344f445d4da69e3eb0a8b42c2713dddd0963fc0d5f13a508d77d669de4&scene=178&cur_album_id=1835785426880512003#rd

posted @ 2022-05-16 15:54  郭大侠1  阅读(1153)  评论(0编辑  收藏  举报