徒手搭建MGR单主集群并模拟主节点切换

2016年MGR出来的时候,我还在读书,这么多年了,洲际哥忙于干农活,还一直没系统盘玩她,没错,我就是个假DBA

纯实操,不带原理

O(∩_∩)O

环境

服务器ip及host
192.168.0.65	master
192.168.0.185	slave1
192.168.0.32	slave2

系统:
CentOS 7

MySQL版本:
(root@localhost) [(none)]> select version();
+-----------+
| version() |
+-----------+
| 8.0.19    |
+-----------+
1 row in set (0.00 sec)

Ⅰ、实例初始化及参数准备

完成三台机器初始化,保持干净如下:

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+-------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000002 |      151 |              |                  |                   |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

参数准备

除group_replication_local_address参数以外,完全三台机器保持完全一致即可

时间紧张,各个参数就不做注释了...

log_bin = binlog
binlog_format = row
log_slave_updates = 1
binlog_checksum = NONE
binlog_transaction_dependency_tracking = writeset
gtid_mode = 1
enforce_gtid_consistency = 1
master_info_repository= TABLE
relay_log_info_repository= TABLE
transaction_write_set_extraction = XXHASH64
plugin_load_add = 'group_replication.so'
loose-group_replication_group_name = "745c2bc6-9fe4-11ea-8c89-fa163e98606f"    #随机生成一串UUID即可
loose-group_replication_start_on_boot = 0
loose-group_replication_local_address = "192.168.0.65:33061"    #配本机IP
loose-group_replication_group_seeds = "192.168.0.32:33061,192.168.0.65:33061,192.168.0.185:33061"	
loose-group_replication_bootstrap_group = 0

Ⅱ、创建组复制账号并在线加载MGR插件

三台机器操作一致如下:

(root@localhost) [(none)]> set sql_log_bin = 0;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [(none)]> create user rpl_user identified with 'mysql_native_password' by 'rpl_123';
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [(none)]> grant replication slave, backup_admin on *.* to rpl_user;
Query OK, 0 rows affected (0.00 sec)


(root@localhost) [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [(none)]> set sql_log_bin = 1;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [(none)]> change master to master_user='rpl_user', master_password='rpl_123' for channel 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.04 sec)

(root@localhost) [(none)]> install plugin group_replication soname 'group_replication.so';
Query OK, 0 rows affected (0.01 sec)

Ⅲ、搭建集群

3.1 初始化集群

选择任意一台机器即可,这里选择master

换言之这把操作后这台机器是集群的主节点

anyway,操作前后我们观察一下gtid情况

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+-------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000002 |      151 |              |                  |                   |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

(root@localhost) [(none)]> set global group_replication_bootstrap_group = 1;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [(none)]> start group_replication;
Query OK, 0 rows affected (3.17 sec)

(root@localhost) [(none)]> set global group_replication_bootstrap_group = 0;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [(none)]> show master status;                                                                                
+---------------+----------+--------------+------------------+----------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                      |
+---------------+----------+--------------+------------------+----------------------------------------+
| binlog.000002 |      462 |              |                  | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1 |
+---------------+----------+--------------+------------------+----------------------------------------+
1 row in set (0.00 sec)

我们发现产生了一个gtid,先放一放,看下集群情况,gtid的问题稍后看

(root@localhost) [(none)]> select * from performance_schema.replication_group_members;                                                                                                           
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 0d703f6e-a498-11ea-8059-fa163ed6ef91 | master      |        3306 | ONLINE       | PRIMARY     | 8.0.19         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
1 row in set (0.00 sec)

看到此时master的成员角色primary,状态时online

3.2 添加节点

slave1上操作如下:

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+-------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000002 |      151 |              |                  |                   |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

(root@localhost) [(none)]> start group_replication;
Query OK, 0 rows affected (3.74 sec)

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+---------------+----------+--------------+------------------+------------------------------------------+
| binlog.000002 |      813 |              |                  | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-2 |
+---------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

看到gtid又+1了

查看节点情况:

(root@localhost) [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 6299f18f-a49e-11ea-b9e2-fa163ed6ef91 | master      |        3306 | ONLINE       | PRIMARY     | 8.0.19         |
| group_replication_applier | 648c91f7-a49e-11ea-bd2b-fa163e6f3efc | slave1      |        3306 | ONLINE       | SECONDARY   | 8.0.19         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.01 sec)

此时两个节点,master时primary,slave1是secondary,两个节点都是online状态

再加一个节点,slave2操作同上,此处略过

完成后,会发现gtid又+1了,如下:

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+---------------+----------+--------------+------------------+------------------------------------------+
| binlog.000002 |     1164 |              |                  | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-3 |
+---------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

3.3 分析binlog

...

SET @@SESSION.GTID_NEXT= '745c2bc6-9fe4-11ea-8c89-fa163e98606f:1'/*!*/;
# at 233
#200602 15:00:25 server id 5  end_log_pos 295   Query   thread_id=14    exec_time=4     error_code=0
SET TIMESTAMP=1591081225/*!*/;
SET @@session.pseudo_thread_id=14/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1168113696/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8mb4 *//*!*/;
SET @@session.character_set_client=255,@@session.collation_connection=255,@@session.collation_server=255/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
/*!80011 SET @@session.default_collation_for_utf8mb4=255*//*!*/;
BEGIN
/*!*/;
# at 295
#200602 15:00:25 server id 5  end_log_pos 394 View_change_log_event: view_id=15910812280466629:1
# at 394
#200602 15:00:25 server id 5  end_log_pos 462   Query   thread_id=14    exec_time=4     error_code=0
SET TIMESTAMP=1591081225/*!*/;
COMMIT
/*!*/;

...

SET @@SESSION.GTID_NEXT= '745c2bc6-9fe4-11ea-8c89-fa163e98606f:2'/*!*/;
# at 544
#200602 15:00:25 server id 5  end_log_pos 606   Query   thread_id=14    exec_time=50    error_code=0
SET TIMESTAMP=1591081225/*!*/;
BEGIN
/*!*/;
# at 606
#200602 15:00:25 server id 5  end_log_pos 745 View_change_log_event: view_id=15910812280466629:2
# at 745
#200602 15:00:25 server id 5  end_log_pos 813   Query   thread_id=14    exec_time=50    error_code=0
SET TIMESTAMP=1591081225/*!*/;
COMMIT
/*!*/;

...

SET @@SESSION.GTID_NEXT= '745c2bc6-9fe4-11ea-8c89-fa163e98606f:3'/*!*/;
# at 895
#200602 15:00:25 server id 5  end_log_pos 957   Query   thread_id=14    exec_time=265   error_code=0
SET TIMESTAMP=1591081225/*!*/;
BEGIN
/*!*/;
# at 957
#200602 15:00:25 server id 5  end_log_pos 1096 View_change_log_event: view_id=15910812280466629:3
# at 1096
#200602 15:00:25 server id 5  end_log_pos 1164  Query   thread_id=14    exec_time=265   error_code=0
SET TIMESTAMP=1591081225/*!*/;
COMMIT
...

很直观地看到,添加节点会产生一个空事务,初始化节点在执行空事务之前做了一些参数设置

但是,这个空事务中是有一些门道的,这里先不展开,后面讲原理会提到

Ⅳ、模拟切换并写入数据观察gtid

鄙人总结如下:

  • 所有节点online被看作初始状态,一旦出现这个状态,写入数据GTID都记录在第一个GTID区间
  • 达到初始状态之后,当前主节点写入过数据,关闭主节点GR,在新主上写数据会记录到一个新的GTID区间

简单测试搂一眼:

由于三个节点,所有case都模拟全整理出来太冗余,故只做必要测试

case1:
主节点未写入数据,停掉主节点GR,在新主写数据

master:

(root@localhost) [(none)]> stop group_replication;                                                                                                                                               
Query OK, 0 rows affected (4.06 sec)

新主节点slave1:

(root@localhost) [(none)]> select * from performance_schema.replication_group_members;                                                                                                           
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | a6a83b07-a4b0-11ea-ace2-fa163e6f3efc | slave1      |        3306 | ONLINE       | PRIMARY     | 8.0.19         |
| group_replication_applier | a7dbfa5f-a4b0-11ea-b21b-fa163e98606f | slave2      |        3306 | ONLINE       | SECONDARY   | 8.0.19         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.01 sec)

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+---------------+----------+--------------+------------------+------------------------------------------+
| binlog.000002 |     1164 |              |                  | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-3 |
+---------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

(root@localhost) [(none)]> create database t;
Query OK, 1 row affected (0.01 sec)

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+---------------+----------+--------------+------------------+------------------------------------------+
| binlog.000002 |     1339 |              |                  | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-4 |
+---------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

发现GTID记录在第一个区间

case2:
主节点写入数据后停掉GR,在新主节点写入数据

此时新主节点slave1已写入数据,停掉GR,在slave2写入数据

slave1:

(root@localhost) [(none)]> stop group_replication;                                                                                                                                               
Query OK, 0 rows affected (4.37 sec)

slave2:

(root@localhost) [(none)]>  select * from performance_schema.replication_group_members;                                                                                                          
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | a7dbfa5f-a4b0-11ea-b21b-fa163e98606f | slave2      |        3306 | ONLINE       | PRIMARY     | 8.0.19         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
1 row in set (0.00 sec)

(root@localhost) [(none)]> create table t.t(id int primary key);
Query OK, 0 rows affected (0.03 sec)

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+--------------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                |
+---------------+----------+--------------+------------------+--------------------------------------------------+
| binlog.000002 |     1532 |              |                  | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-4:1000003 |
+---------------+----------+--------------+------------------+--------------------------------------------------+
1 row in set (0.00 sec)

(root@localhost) [(none)]> insert into t.t values(1);
Query OK, 1 row affected (0.01 sec)

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+----------------------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                        |
+---------------+----------+--------------+------------------+----------------------------------------------------------+
| binlog.000002 |     1782 |              |                  | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-4:1000003-1000004 |
+---------------+----------+--------------+------------------+----------------------------------------------------------+
1 row in set (0.00 sec)

可以看到gtid记录到了新区间

case3:

保持三个节点都online

此时主节点写入数据GTID是记录在第二段区间

恢复至三个节点

master:

(root@localhost) [(none)]> start group_replication;                                                                                                                                              
Query OK, 0 rows affected (3.87 sec)

slave1:

(root@localhost) [(none)]> start group_replication;                                                                                                                                              
Query OK, 0 rows affected (4.28 sec)

slave2:

(root@localhost) [(none)]> insert into t.t values(2);
Query OK, 1 row affected (0.01 sec)

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+----------------------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                        |
+---------------+----------+--------------+------------------+----------------------------------------------------------+
| binlog.000002 |     2848 |              |                  | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-7:1000003-1000004 |
+---------------+----------+--------------+------------------+----------------------------------------------------------+
1 row in set (0.00 sec)

看到gtid写回了第一个区间,但是为什么是7?前面解释过,加入节点会产生空事务,刚加了俩节点

case4:

连续停两次写入过数据的主节点

此时集群的状态已经到达初始状态,slave2是主节点

预测到最新主上写数据会产生第三个GTID区间

(root@localhost) [(none)]>  select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | a50df330-a4b0-11ea-bfd3-fa163ed6ef91 | master      |        3306 | ONLINE       | SECONDARY   | 8.0.19         |
| group_replication_applier | a6a83b07-a4b0-11ea-ace2-fa163e6f3efc | slave1      |        3306 | ONLINE       | SECONDARY   | 8.0.19         |
| group_replication_applier | a7dbfa5f-a4b0-11ea-b21b-fa163e98606f | slave2      |        3306 | ONLINE       | PRIMARY     | 8.0.19         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)

slave2:

(root@localhost) [(none)]> insert into t.t values(3);
Query OK, 1 row affected (0.00 sec)

(root@localhost) [(none)]> stop group_replication;                                                                                                                                               
Query OK, 0 rows affected (4.14 sec)

master:

(root@localhost) [(none)]> select * from performance_schema.replication_group_members;                                                                                                           
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | a50df330-a4b0-11ea-bfd3-fa163ed6ef91 | master      |        3306 | ONLINE       | PRIMARY     | 8.0.19         |
| group_replication_applier | a6a83b07-a4b0-11ea-ace2-fa163e6f3efc | slave1      |        3306 | ONLINE       | SECONDARY   | 8.0.19         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)
                                                                                                                                                   
(root@localhost) [(none)]> insert into t.t values(4);
Query OK, 1 row affected (0.00 sec)

(root@localhost) [(none)]> stop group_replication;                                                                                                                                               
Query OK, 0 rows affected (4.44 sec)

slave1:

(root@localhost) [(none)]> insert into t.t values(5);
Query OK, 1 row affected (0.02 sec)

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+------------------------------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                |
+---------------+----------+--------------+------------------+------------------------------------------------------------------+
| binlog.000002 |     3582 |              |                  | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-8:1000003-1000005:2000005 |
+---------------+----------+--------------+------------------+------------------------------------------------------------------+
1 row in set (0.00 sec)

(root@localhost) [(none)]> insert into t.t values(6);
Query OK, 1 row affected (0.01 sec)

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+--------------------------------------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                        |
+---------------+----------+--------------+------------------+--------------------------------------------------------------------------+
| binlog.000002 |     3832 |              |                  | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-8:1000003-1000005:2000005-2000006 |
+---------------+----------+--------------+------------------+--------------------------------------------------------------------------+
1 row in set (0.00 sec)

预测正确,写入到了第三个GTID区间

最后,我们再恢复到初始状态看一把

master:

(root@localhost) [(none)]> start group_replication;                                                                                                                                               
Query OK, 0 rows affected (3.66 sec)

slave2:

(root@localhost) [(none)]> start group_replication;                                                                                                                                              
Query OK, 0 rows affected (3.59 sec)

slave1:

(root@localhost) [(none)]> insert into t.t values(7);
Query OK, 1 row affected (0.01 sec)

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                         |
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------+
| binlog.000002 |     4946 |              |                  | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-11:1000003-1000005:2000005-2000006 |
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------+
1 row in set (0.00 sec)

(root@localhost) [(none)]> insert into t.t values(8);
Query OK, 1 row affected (0.02 sec)

(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                         |
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------+
| binlog.000002 |     5196 |              |                  | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-12:1000003-1000005:2000005-2000006 |
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------+
1 row in set (0.00 sec)

又写回到了第一个GTID区间,英吹斯汀!

划重点

  • 添加节点会产生一个空事务
  • 主节点写入过数据并关闭gr后,新主节点写入数据会记录到新GTID区间

over~~~

posted @ 2020-06-03 02:54  91洲际哥  阅读(808)  评论(0编辑  收藏  举报