
Effective Strategies for Recovering MySQL Group Replication from Failures

2024-08-19 09:30  abce

I do not have an MGR environment of my own; this is just to learn from someone else's experience. Original article: https://www.percona.com/blog/effective-strategies-for-recovering-mysql-group-replication-from-failures/

 

Group Replication is a fault-tolerant/highly available replication topology that ensures that if the primary node goes down, one of the other candidate or secondary members takes over, so that write and read operations can continue without interruption.

 

However, in some scenarios, due to failures, network partitions, or database crashes, the group membership can be broken, or we may end up with some isolated members. In such cases, we have to perform a partial or full recovery to bring the whole topology back online.

Consider the following situation: all group nodes go down while the primary node [Node1] is serving read/write traffic.

 

1) Below is the group replication topology, set up with the dbdeployer tool.

node1 [localhost:23637] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023637-1111-1111-1111-111111111111 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | 00023638-2222-2222-2222-222222222222 | 127.0.0.1   |       23638 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
| group_replication_applier | 00023639-3333-3333-3333-333333333333 | 127.0.0.1   |       23639 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.01 sec)

2) Run a workload against the primary node (127.0.0.1:23637).

sysbench \
--db-driver=mysql \
--mysql-user=sbtest_user \
--mysql-password=Sbtest@2022 \
--mysql-db=sbtest \
--mysql-host=127.0.0.1 \
--mysql-port=23637 \
--tables=10 \
--table-size=10000 \
--create_secondary=off \
--threads=50 \
--time=0 \
--events=0 \
--report-interval=1 /opt/homebrew/Cellar/sysbench/1.0.20_6/share/sysbench/oltp_read_write.lua run

Sample output:

[ 1s ] thds: 50 tps: 1726.75 qps: 35178.95 (r/w/o: 24710.96/6964.73/3503.26) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 50 tps: 2528.19 qps: 50605.04 (r/w/o: 35405.76/10142.90/5056.39) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 50 tps: 2316.05 qps: 46257.26 (r/w/o: 32348.96/9290.15/4618.15) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
...

3) Here, we deliberately stop all the group nodes.

./stop_all 
# executing 'stop' on /Users/aniljoshi/sandboxes/group_msb_8_0_36
executing 'stop' on node3
stop /Users/aniljoshi/sandboxes/group_msb_8_0_36/node3
executing 'stop' on node2
stop /Users/aniljoshi/sandboxes/group_msb_8_0_36/node2
executing 'stop' on node1
stop /Users/aniljoshi/sandboxes/group_msb_8_0_36/node1

At this point, the cluster is completely down and all group members have stopped. Now we will try to bring the cluster back up.

Bootstrapping/recovering the cluster nodes

4) First, let's start each node. Since I am using dbdeployer, I can manage the nodes (start/stop, etc.) with some of its built-in commands/scripts, shown below.

shell> ./start_all 
# executing 'start' on /Users/aniljoshi/sandboxes/group_msb_8_0_36
executing "start" on node 1
... sandbox server started
executing "start" on node 2
.. sandbox server started
executing "start" on node 3
.. sandbox server started

5) Connect to each node and verify its current state.

node1 [localhost:23637] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |             |                |                            |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)

node2 [localhost:23638] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |             |                |                            |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.01 sec)

node3 [localhost:23639] {root} ((none)) >  select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |             |                |                            |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.01 sec)

At this point, all the nodes are in OFFLINE state.

6) We can recover the cluster by bootstrapping one of the group nodes; once the other nodes are started, they will join via the distributed recovery process. Before doing that, however, it is important to understand the state of certified/committed transactions on each node so that we do not bootstrap the wrong one. Bootstrapping the wrong node can lead to significant data loss!

 

Note: in our case, "received_transaction_set" is empty because all the nodes were restarted. However, if the nodes had not been restarted and group replication had only been broken by a network outage or some other event, we would also see values in this field.

node1 [localhost:23637] {root} ((none)) > SELECT @@GLOBAL.GTID_EXECUTED;
+-----------------------------------------------+
| @@GLOBAL.GTID_EXECUTED                        |
+-----------------------------------------------+
| 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148034 |
+-----------------------------------------------+
1 row in set (0.00 sec)

node1 [localhost:23637] {root} ((none)) > SELECT received_transaction_set FROM 
    ->       performance_schema.replication_connection_status WHERE 
    ->       channel_name="group_replication_applier";
+--------------------------+
| received_transaction_set |
+--------------------------+
|                           |
+--------------------------+
1 row in set (0.01 sec)

node2 [localhost:23638] {msandbox} ((none)) > SELECT @@GLOBAL.GTID_EXECUTED;
+----------------------------------------------+
| @@GLOBAL.GTID_EXECUTED                       |
+----------------------------------------------+
| 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-96208 |
+----------------------------------------------+
1 row in set (0.00 sec)


node2 [localhost:23638] {msandbox} ((none)) > SELECT received_transaction_set FROM 
    ->       performance_schema.replication_connection_status WHERE 
    ->       channel_name="group_replication_applier";
+--------------------------+
| received_transaction_set |
+--------------------------+
|                          |
+--------------------------+
1 row in set (0.01 sec)

node3 [localhost:23639] {msandbox} ((none)) > SELECT @@GLOBAL.GTID_EXECUTED;
+----------------------------------------------+
| @@GLOBAL.GTID_EXECUTED                       |
+----------------------------------------------+
| 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-76208 |
+----------------------------------------------+
1 row in set (0.00 sec)


node3 [localhost:23639] {msandbox} ((none)) > SELECT received_transaction_set FROM 
    ->       performance_schema.replication_connection_status WHERE 
    ->       channel_name="group_replication_applier";
+--------------------------+
| received_transaction_set |
+--------------------------+
|                          |
+--------------------------+
1 row in set (0.00 sec)
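
For convenience, both pieces of information can also be pulled with a single statement on each node (a small combined query, not part of the original output):

mysql> SELECT @@GLOBAL.gtid_executed AS executed_gtid_set,
    ->        received_transaction_set
    ->   FROM performance_schema.replication_connection_status
    ->  WHERE channel_name = 'group_replication_applier';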

From the gtid_executed information above, we can see that Node1 ("00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148034") has the most recent GTID set, so this is the node we should bootstrap.

 

Please note: after collecting the transaction sets from all group members, compare them to find which member has the largest transaction set, including both executed transactions (gtid_executed) and certified transactions (on the group_replication_applier channel). You can compare the GTID sets manually, or use the built-in functions GTID_SUBSET() and GTID_SUBTRACT() to compare them.

mysql> SELECT GTID_SUBSET(set1, set2);
mysql> SELECT GTID_SUBTRACT(set1, set2);
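
For example, using the two gtid_executed values observed above, GTID_SUBSET() confirms that Node2's set is fully contained in Node1's, and GTID_SUBTRACT() shows exactly which transactions Node2 is missing:

mysql> SELECT GTID_SUBSET('00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-96208',
    ->                    '00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148034') AS node2_is_subset_of_node1;
mysql> SELECT GTID_SUBTRACT('00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148034',
    ->                      '00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-96208') AS missing_on_node2;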

7) Let's bootstrap Node1.

mysql> SET GLOBAL group_replication_bootstrap_group=ON;
mysql> START GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;

Now, if we check the status on Node1, we can see that the member is ONLINE.

node1 [localhost:23637] {msandbox} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023637-1111-1111-1111-111111111111 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)

8) Now, we just need to start group replication on the remaining nodes. They will join via the distributed recovery mechanism, either through state transfer from the binary logs available on the source or, if the required binary logs no longer exist on any available group node, through a full clone of the data using the clone plugin.

node2 [localhost:23638] {msandbox} ((none)) > start group_replication;

 node2 [localhost:23638] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023637-1111-1111-1111-111111111111 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | 00023638-2222-2222-2222-222222222222 | 127.0.0.1   |       23638 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
2 rows in set (0.00 sec)

node3 [localhost:23639] {root} ((none)) > start group_replication;
Query OK, 0 rows affected (1.36 sec)

node3 [localhost:23639] {root} ((none)) >  select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023637-1111-1111-1111-111111111111 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | 00023638-2222-2222-2222-222222222222 | 127.0.0.1   |       23638 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
| group_replication_applier | 00023639-3333-3333-3333-333333333333 | 127.0.0.1   |       23639 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.00 sec)

Note: if the binary logs have somehow been rotated away or purged, we would need to restore from a backup, or we can set up the clone plugin, which automates the full snapshot process.

 

In addition, we can verify whether a node is still performing distributed recovery by running the command below. Once recovery finishes and the node reports the SECONDARY state, the replication threads [Slave_IO_Running/Slave_SQL_Running] for this channel stop.

node2 [localhost:23638] {root} ((none)) > SHOW SLAVE STATUS FOR CHANNEL 'group_replication_recovery'\G
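
On MySQL 8.0.22 and later, the same information is available through the non-deprecated form of the command:

node2 [localhost:23638] {root} ((none)) > SHOW REPLICA STATUS FOR CHANNEL 'group_replication_recovery'\G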

At this stage, we again have a fully functioning cluster.

Recovering from a backup

In rare cases, when the data is completely corrupted, when we want to perform a partial recovery, or when we are adding a new node, we can perform a backup/restore directly.

 

Let's walk through the following scenario. Here, we will try to recover the cluster using a physical backup [Percona XtraBackup].

 

1) For demonstration purposes, we take a backup from one of the group nodes.

shell> xtrabackup --host=127.0.0.1 -usbtest_user -pSbtest@2022 -P23637 --backup --target-dir=/Users/aniljoshi/backup/

Backup output:

...
2024-08-03T19:54:50.102188+05:30 0 [Note] [MY-011825] [Xtrabackup] Transaction log of lsn (266657170) to (266657878) was copied.
2024-08-03T19:54:50.329784+05:30 0 [Note] [MY-011825] [Xtrabackup] completed OK!

2) To make the backup usable, we also prepare it.

shell> xtrabackup --prepare --use-memory=2G --target-dir=/Users/aniljoshi/backup/

Output:

...
2024-08-03T19:58:00.432140+05:30 0 [Note] [MY-012980] [InnoDB] Shutdown completed; log sequence number 266658326
2024-08-03T19:58:00.434701+05:30 0 [Note] [MY-011825] [Xtrabackup] completed OK!

Now, suppose we want to recover the cluster using this data.

3) First, we clean up the data directory of the target member [Node1].

shell> rm -rf  /Users/aniljoshi/sandboxes/group_msb_8_0_36/node1/data/      

4) Then, we copy the prepared data into the Node1 data directory.

shell> cp -aR /Users/aniljoshi/backup/  /Users/aniljoshi/sandboxes/group_msb_8_0_36/node1/data/

5) Bootstrap the node [Node1] (after starting the MySQL service on it again).

mysql> SET GLOBAL group_replication_bootstrap_group=ON;
mysql> START GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
 
node1 [localhost:23637] {root} (performance_schema) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 11400bec-51a6-11ef-b950-29b40b1b64a7 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)

The other nodes could join via the distributed recovery/clone process (if configured); here, however, we will demonstrate how to perform the restore/recovery using a backup.

 

6) Let's take a fresh backup from Node1 and note the GTID execution information.

shell> xtrabackup --host=127.0.0.1 -usbtest_user -pSbtest@2022 -P23637 --backup --target-dir=/Users/aniljoshi/backup/Primary/

Backup output:

...
2024-08-03T20:16:36.182978+05:30 0 [Note] [MY-011825] [Xtrabackup] Transaction log of lsn (266723771) to (266724479) was copied.
2024-08-03T20:16:36.408616+05:30 0 [Note] [MY-011825] [Xtrabackup] completed OK!

GTID details:

shell> cat /Users/aniljoshi/backup/Primary/xtrabackup_binlog_info
mysql-bin.000006    193    00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148040

7) Prepare the backup data.

shell> xtrabackup --prepare --use-memory=2G --target-dir=/Users/aniljoshi/backup/Primary/

Output:

...
2024-08-03T20:17:47.817955+05:30 0 [Note] [MY-013072] [InnoDB] Starting shutdown...
2024-08-03T20:17:47.822229+05:30 0 [Note] [MY-013084] [InnoDB] Log background threads are being closed...
2024-08-03T20:17:47.865162+05:30 0 [Note] [MY-012980] [InnoDB] Shutdown completed; log sequence number 266724886
2024-08-03T20:17:47.867836+05:30 0 [Note] [MY-011825] [Xtrabackup] completed OK!

8) Clean up the Node2 data directory and copy the prepared backup to Node2.

shell> cp -aR /Users/aniljoshi/backup/Primary/  /Users/aniljoshi/sandboxes/group_msb_8_0_36/node2/data/

9) After restarting the service, we can verify that gtid_purged has been applied.

node2 [localhost:23638] {root} ((none)) > show global variables like '%gtid_purged%';
+---------------+-----------------------------------------------+
| Variable_name | Value                                         |
+---------------+-----------------------------------------------+
| gtid_purged   | 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148040 |
+---------------+-----------------------------------------------+
1 row in set (0.01 sec)

Otherwise, we can set gtid_purged with the following command.

mysql> set global gtid_purged="00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148040";
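
If the server rejects this because gtid_executed is already non-empty (for example, populated from leftover binary logs in the restored data directory), a common approach is to clear the GTID state first and then set gtid_purged from the value recorded in xtrabackup_binlog_info, roughly:

mysql> RESET MASTER;   -- clears gtid_executed/gtid_purged and removes existing binary logs
mysql> set global gtid_purged="00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148040";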

10) Finally, we configure the recovery channel credentials and start the group replication process.

node2 [localhost:23638] {root} ((none)) > change master to master_user='rsandbox', master_password='rsandbox' FOR CHANNEL 'group_replication_recovery';

node2 [localhost:23638] {root} ((none)) > start group_replication;
Query OK, 0 rows affected (1.97 sec)

node2 [localhost:23638] {root} ((none)) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 11400bec-51a6-11ef-b950-29b40b1b64a7 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | c2b74588-51a7-11ef-81da-5f8da38bd06f | 127.0.0.1   |       23638 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
2 rows in set (0.00 sec)

Similar steps can later be performed for node [Node3]. In addition to Percona XtraBackup, we can also use logical backup/restore tools such as mydumper, mysqldump, or MySQL Shell to perform the restore and set up the group replication process. We will not cover those here; the procedure is generally the same as for any asynchronous replication setup. Once the restore is complete with the expected GTID coordinates, we can run the commands below.

mysql> change master to master_user='rsandbox', master_password='rsandbox' FOR CHANNEL 'group_replication_recovery';
mysql> start group_replication;

Additionally, as of MySQL 8.0.17, the clone plugin can be used for remote cloning operations during distributed recovery in group replication. This requires some additional configuration and steps.

Recovering via the clone plugin

Execute on the donor node [Node1]:

1) Install the plugin dynamically.

node1 [localhost:23637] {root} ((none)) > install plugin clone soname 'mysql_clone.so';
Query OK, 0 rows affected (0.16 sec)

2) Persist the change via the configuration file ([mysqld] section).

[mysqld]
plugin-load-add=mysql_clone.so
clone=FORCE_PLUS_PERMANENT

3) Grant the BACKUP_ADMIN privilege to the replication user "rsandbox", which is responsible for the distributed recovery process.
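
In case the replication user does not already hold this privilege, it can be granted with a statement along these lines (the account name here comes from this sandbox setup):

node1 [localhost:23637] {root} ((none)) > GRANT BACKUP_ADMIN ON *.* TO 'rsandbox'@'%';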

node1 [localhost:23637] {root} ((none)) > show grants for rsandbox@'%';
+---------------------------------------------+
| Grants for rsandbox@%                       |
+---------------------------------------------+
| GRANT USAGE ON *.* TO `rsandbox`@`%`        |
| GRANT BACKUP_ADMIN ON *.* TO `rsandbox`@`%` |
| GRANT `R_REPLICATION`@`%` TO `rsandbox`@`%` |
+---------------------------------------------+
3 rows in set (0.00 sec)

4) Verify the plugin status.

node1 [localhost:23637] {root} ((none)) > select plugin_name,plugin_status from information_schema.plugins where plugin_name='clone';
+-------------+---------------+
| plugin_name | plugin_status |
+-------------+---------------+
| clone       | ACTIVE        |
+-------------+---------------+
1 row in set (0.00 sec)

Execute on the recipient node [Node3]:

1) Install the plugin dynamically.

node3 [localhost:23639] {root} ((none)) > install plugin clone soname 'mysql_clone.so';
Query OK, 0 rows affected (0.16 sec)

2) Persist the change via the configuration file ([mysqld] section).

[mysqld]
plugin-load-add=mysql_clone.so
clone=FORCE_PLUS_PERMANENT

3) Verify the plugin status.

node3 [localhost:23639] {root} ((none)) > select plugin_name,plugin_status from information_schema.plugins where plugin_name='clone';
+-------------+---------------+
| plugin_name | plugin_status |
+-------------+---------------+
| clone       | ACTIVE        |
+-------------+---------------+
1 row in set (0.00 sec)

4) Define the threshold for clone-based distributed recovery on all relevant group nodes. A joining node uses incremental synchronization (from binary logs) unless its transaction gap exceeds this threshold, in which case a full data clone is performed instead. For this test we use a very small value, although the default is very large.

node3 [localhost:23639] {root} ((none)) >  set global group_replication_clone_threshold = 10;
Query OK, 0 rows affected (0.00 sec)
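
If you want the threshold to survive a restart, SET PERSIST can be used instead of SET GLOBAL (a minor variation, not part of the original steps):

node3 [localhost:23639] {root} ((none)) > set persist group_replication_clone_threshold = 10;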

5) Finally, start Node3. In the MySQL log, we can see that distributed recovery via the clone plugin has started. This process replaces/clones the entire data directory from the donor/source [Node1].

2024-08-03T18:21:04.039835Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Cloning from a remote group donor.'
2024-08-03T18:21:03.033878Z 0 [Warning] [MY-013469] [Repl] Plugin group_replication reported: 'This member will start distributed recovery using clone. It is due to the number of missing transactions being higher than the configured threshold of 10.'
2024-08-03T18:21:04.150730Z 132 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Started
...
/Users/aniljoshi/opt/mysql/8.0.36/bin/mysqld: Shutdown complete (mysqld 8.0.36)  MySQL Community Server - GPL.
2024-08-03T18:21:09.6NZ mysqld_safe mysqld restarted

6) Next, we can track the executed GTIDs and coordinates using the file and commands below.

shell> cat #view_status 
2 1
1722709264045254 1722709271116851
127.0.0.1:23637
0

./mysql-bin.000006
4836
00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148056

Or:

node3 [localhost:23639] {root} ((none)) > select * from performance_schema.clone_status\G
*************************** 1. row ***************************
             ID: 1
            PID: 0
          STATE: Completed
     BEGIN_TIME: 2024-08-03 23:51:04.045
       END_TIME: 2024-08-03 23:51:11.117
         SOURCE: 127.0.0.1:23637
    DESTINATION: LOCAL INSTANCE
       ERROR_NO: 0
  ERROR_MESSAGE: 
    BINLOG_FILE: mysql-bin.000006
BINLOG_POSITION: 4836
  GTID_EXECUTED: 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148056
1 row in set (0.00 sec)



node3 [localhost:23639] {root} ((none)) > show variables like 'gtid_purged';
+---------------+-----------------------------------------------+
| Variable_name | Value                                         |
+---------------+-----------------------------------------------+
| gtid_purged   | 00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148056 |
+---------------+-----------------------------------------------+
1 row in set (0.01 sec)

Otherwise, we can set gtid_purged with the command below, passing the GTID value from the clone status above.

node3 [localhost:23639] {root} ((none)) > set global gtid_purged="00023636-bbbb-cccc-dddd-eeeeeeeeeeee:1-148056";

7) Run the CHANGE REPLICATION SOURCE command and start group replication.

node3 [localhost:23639] {root} ((none)) > CHANGE REPLICATION SOURCE TO SOURCE_USER='rsandbox', SOURCE_PASSWORD='rsandbox' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.01 sec)

node3 [localhost:23639] {root} ((none)) > start group_replication;
Query OK, 0 rows affected (10.69 sec)

8) Finally, we can see that Node3 appears in the group replication topology.

node3 [localhost:23639] {root} ((none)) >  select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023639-3333-3333-3333-333333333333 | 127.0.0.1   |       23639 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
| group_replication_applier | 11400bec-51a6-11ef-b950-29b40b1b64a7 | 127.0.0.1   |       23637 | ONLINE       | PRIMARY     | 8.0.36         | XCom                       |
| group_replication_applier | c2b74588-51a7-11ef-81da-5f8da38bd06f | 127.0.0.1   |       23638 | ONLINE       | SECONDARY   | 8.0.36         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.01 sec)