Building an MGR Single-Primary Cluster by Hand and Simulating a Primary Switchover
When MGR came out in 2016 I was still in school. All these years 洲际哥 has been busy with other work and never got around to systematically playing with it. Yep, I'm a fake DBA.
Pure hands-on this time, no theory.
O(∩_∩)O
Environment
Server IPs and hostnames:
192.168.0.65 master
192.168.0.185 slave1
192.168.0.32 slave2
OS:
CentOS 7
MySQL version:
(root@localhost) [(none)]> select version();
+-----------+
| version() |
+-----------+
| 8.0.19 |
+-----------+
1 row in set (0.00 sec)
Ⅰ. Instance Initialization and Parameter Preparation
Initialize all three instances so they start clean, like this:
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000002 | 151 | | | |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
Parameter preparation
Except for group_replication_local_address, all three machines can use exactly the same configuration.
Short on time, so only brief inline notes on the parameters:
log_bin = binlog
binlog_format = row
log_slave_updates = 1
binlog_checksum = NONE                                  # MGR requires checksums off before 8.0.21
binlog_transaction_dependency_tracking = writeset
gtid_mode = 1                                           # GTIDs are mandatory for group replication
enforce_gtid_consistency = 1
master_info_repository = TABLE
relay_log_info_repository = TABLE
transaction_write_set_extraction = XXHASH64             # write-set hashing, needed for certification
plugin_load_add = 'group_replication.so'
loose-group_replication_group_name = "745c2bc6-9fe4-11ea-8c89-fa163e98606f"  # any generated UUID works
loose-group_replication_start_on_boot = 0               # do not auto-start GR when mysqld restarts
loose-group_replication_local_address = "192.168.0.65:33061"  # this machine's own IP plus the GR communication port
loose-group_replication_group_seeds = "192.168.0.32:33061,192.168.0.65:33061,192.168.0.185:33061"
loose-group_replication_bootstrap_group = 0             # set to 1 only on the bootstrapping node, then back to 0
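After writing the config and restarting mysqld, a quick sanity check on each node (an optional extra, not part of the original steps) confirms the key settings took effect:

```sql
-- expect ON / ON / ROW / NONE / XXHASH64 on all three nodes
SELECT @@gtid_mode, @@enforce_gtid_consistency,
       @@binlog_format, @@binlog_checksum,
       @@transaction_write_set_extraction;
```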
Ⅱ. Create the Group Replication Account and Load the MGR Plugin Online
Run the same steps on all three machines:
(root@localhost) [(none)]> set sql_log_bin = 0;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [(none)]> create user rpl_user identified with 'mysql_native_password' by 'rpl_123';
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [(none)]> grant replication slave, backup_admin on *.* to rpl_user;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [(none)]> set sql_log_bin = 1;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [(none)]> change master to master_user='rpl_user', master_password='rpl_123' for channel 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.04 sec)
(root@localhost) [(none)]> install plugin group_replication soname 'group_replication.so';
Query OK, 0 rows affected (0.01 sec)
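Before bootstrapping, it is worth confirming the plugin is actually active (an optional check I'm adding here, not from the original transcript):

```sql
-- should return one row with PLUGIN_STATUS = 'ACTIVE'
SELECT plugin_name, plugin_status
FROM information_schema.plugins
WHERE plugin_name = 'group_replication';
```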
Ⅲ. Building the Cluster
3.1 Bootstrap the cluster
Pick any one machine; here I use master.
In other words, after this step that machine is the cluster's primary.
Anyway, let's look at the GTID state before and after:
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000002 | 151 | | | |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
(root@localhost) [(none)]> set global group_replication_bootstrap_group = 1;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [(none)]> start group_replication;
Query OK, 0 rows affected (3.17 sec)
(root@localhost) [(none)]> set global group_replication_bootstrap_group = 0;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+----------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+----------------------------------------+
| binlog.000002 | 462 | | | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1 |
+---------------+----------+--------------+------------------+----------------------------------------+
1 row in set (0.00 sec)
We can see one GTID was generated. Park that for now; let's check the cluster state first and come back to GTIDs shortly.
(root@localhost) [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 0d703f6e-a498-11ea-8059-fa163ed6ef91 | master | 3306 | ONLINE | PRIMARY | 8.0.19 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
1 row in set (0.00 sec)
At this point master's member role is PRIMARY and its state is ONLINE.
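In single-primary mode you can also ask the cluster directly who the primary is (a small extra check, assuming a version where MEMBER_ROLE is populated, as it is in 8.0.19):

```sql
-- returns the hostname of the current primary
SELECT member_host
FROM performance_schema.replication_group_members
WHERE member_role = 'PRIMARY';
```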
3.2 Add nodes
On slave1:
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000002 | 151 | | | |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
(root@localhost) [(none)]> start group_replication;
Query OK, 0 rows affected (3.74 sec)
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+------------------------------------------+
| binlog.000002 | 813 | | | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-2 |
+---------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)
The GTID set grew by 1 again.
Check the members:
(root@localhost) [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 6299f18f-a49e-11ea-b9e2-fa163ed6ef91 | master | 3306 | ONLINE | PRIMARY | 8.0.19 |
| group_replication_applier | 648c91f7-a49e-11ea-bd2b-fa163e6f3efc | slave1 | 3306 | ONLINE | SECONDARY | 8.0.19 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.01 sec)
Now there are two nodes: master is PRIMARY, slave1 is SECONDARY, and both are ONLINE.
Add one more node; the steps on slave2 are identical, so I'll skip them.
Once that's done, the GTID set has grown by 1 yet again:
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+------------------------------------------+
| binlog.000002 | 1164 | | | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-3 |
+---------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)
3.3 Analyze the binlog
...
SET @@SESSION.GTID_NEXT= '745c2bc6-9fe4-11ea-8c89-fa163e98606f:1'/*!*/;
# at 233
#200602 15:00:25 server id 5 end_log_pos 295 Query thread_id=14 exec_time=4 error_code=0
SET TIMESTAMP=1591081225/*!*/;
SET @@session.pseudo_thread_id=14/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1168113696/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8mb4 *//*!*/;
SET @@session.character_set_client=255,@@session.collation_connection=255,@@session.collation_server=255/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
/*!80011 SET @@session.default_collation_for_utf8mb4=255*//*!*/;
BEGIN
/*!*/;
# at 295
#200602 15:00:25 server id 5 end_log_pos 394 View_change_log_event: view_id=15910812280466629:1
# at 394
#200602 15:00:25 server id 5 end_log_pos 462 Query thread_id=14 exec_time=4 error_code=0
SET TIMESTAMP=1591081225/*!*/;
COMMIT
/*!*/;
...
SET @@SESSION.GTID_NEXT= '745c2bc6-9fe4-11ea-8c89-fa163e98606f:2'/*!*/;
# at 544
#200602 15:00:25 server id 5 end_log_pos 606 Query thread_id=14 exec_time=50 error_code=0
SET TIMESTAMP=1591081225/*!*/;
BEGIN
/*!*/;
# at 606
#200602 15:00:25 server id 5 end_log_pos 745 View_change_log_event: view_id=15910812280466629:2
# at 745
#200602 15:00:25 server id 5 end_log_pos 813 Query thread_id=14 exec_time=50 error_code=0
SET TIMESTAMP=1591081225/*!*/;
COMMIT
/*!*/;
...
SET @@SESSION.GTID_NEXT= '745c2bc6-9fe4-11ea-8c89-fa163e98606f:3'/*!*/;
# at 895
#200602 15:00:25 server id 5 end_log_pos 957 Query thread_id=14 exec_time=265 error_code=0
SET TIMESTAMP=1591081225/*!*/;
BEGIN
/*!*/;
# at 957
#200602 15:00:25 server id 5 end_log_pos 1096 View_change_log_event: view_id=15910812280466629:3
# at 1096
#200602 15:00:25 server id 5 end_log_pos 1164 Query thread_id=14 exec_time=265 error_code=0
SET TIMESTAMP=1591081225/*!*/;
COMMIT
...
It's easy to see that adding a node produces one empty transaction (a View_change_log_event), and that some session settings are applied before the empty transaction is executed.
There is more going on inside this empty transaction, though; I'll leave that for a later post on the internals.
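Besides dumping the file with mysqlbinlog, the same view-change events can be inspected from the client (positions here are the ones seen in the dump above):

```sql
-- list the events around the first view change in this binlog
SHOW BINLOG EVENTS IN 'binlog.000002' FROM 151 LIMIT 10;
```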
Ⅳ. Simulate Switchovers, Write Data, and Observe GTIDs
My summary:
- All nodes being ONLINE counts as the initial state; whenever the cluster is in this state, writes have their GTIDs recorded in the first GTID interval.
- After reaching the initial state, if the current primary has written data and its group replication is then stopped, writes on the new primary are recorded in a new GTID interval.
A few quick tests to see this.
With three nodes, writing up every possible case would be too verbose, so only the necessary tests are shown.
case 1:
The primary has written no data; stop group replication on it, then write on the new primary.
master:
(root@localhost) [(none)]> stop group_replication;
Query OK, 0 rows affected (4.06 sec)
New primary, slave1:
(root@localhost) [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | a6a83b07-a4b0-11ea-ace2-fa163e6f3efc | slave1 | 3306 | ONLINE | PRIMARY | 8.0.19 |
| group_replication_applier | a7dbfa5f-a4b0-11ea-b21b-fa163e98606f | slave2 | 3306 | ONLINE | SECONDARY | 8.0.19 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.01 sec)
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+------------------------------------------+
| binlog.000002 | 1164 | | | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-3 |
+---------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)
(root@localhost) [(none)]> create database t;
Query OK, 1 row affected (0.01 sec)
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+------------------------------------------+
| binlog.000002 | 1339 | | | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-4 |
+---------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)
The GTID was recorded in the first interval.
case 2:
The primary has written data; stop its group replication, then write on the new primary.
Here the current primary slave1 has already written data; stop its GR and write on slave2.
slave1:
(root@localhost) [(none)]> stop group_replication;
Query OK, 0 rows affected (4.37 sec)
slave2:
(root@localhost) [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | a7dbfa5f-a4b0-11ea-b21b-fa163e98606f | slave2 | 3306 | ONLINE | PRIMARY | 8.0.19 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
1 row in set (0.00 sec)
(root@localhost) [(none)]> create table t.t(id int primary key);
Query OK, 0 rows affected (0.03 sec)
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+--------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+--------------------------------------------------+
| binlog.000002 | 1532 | | | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-4:1000003 |
+---------------+----------+--------------+------------------+--------------------------------------------------+
1 row in set (0.00 sec)
(root@localhost) [(none)]> insert into t.t values(1);
Query OK, 1 row affected (0.01 sec)
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+----------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+----------------------------------------------------------+
| binlog.000002 | 1782 | | | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-4:1000003-1000004 |
+---------------+----------+--------------+------------------+----------------------------------------------------------+
1 row in set (0.00 sec)
The GTID landed in a new interval.
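Where does 1000003 come from? My understanding (worth verifying against the docs) is that group replication pre-assigns each member GTIDs in blocks of group_replication_gtid_assignment_block_size, which defaults to 1,000,000 — so a member that starts consuming a fresh block begins numbering near a multiple of one million:

```sql
-- defaults to 1000000, which lines up with the intervals
-- starting near 1000003 and, later, 2000005
SELECT @@group_replication_gtid_assignment_block_size;
```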
case 3:
Bring all three nodes back ONLINE.
Right now the primary's writes are being recorded in the second interval.
Restore to three nodes:
master:
(root@localhost) [(none)]> start group_replication;
Query OK, 0 rows affected (3.87 sec)
slave1:
(root@localhost) [(none)]> start group_replication;
Query OK, 0 rows affected (4.28 sec)
slave2:
(root@localhost) [(none)]> insert into t.t values(2);
Query OK, 1 row affected (0.01 sec)
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+----------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+----------------------------------------------------------+
| binlog.000002 | 2848 | | | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-7:1000003-1000004 |
+---------------+----------+--------------+------------------+----------------------------------------------------------+
1 row in set (0.00 sec)
The GTID went back to the first interval. But why 7? As explained earlier, each node addition produces an empty transaction, and we just rejoined two nodes: 1-4 existed before, the two rejoins account for 5 and 6, and this insert is 7.
case 4:
Stop a primary that has written data, twice in a row.
The cluster has reached the initial state again, with slave2 as primary.
Prediction: writing on the newest primary will produce a third GTID interval.
(root@localhost) [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | a50df330-a4b0-11ea-bfd3-fa163ed6ef91 | master | 3306 | ONLINE | SECONDARY | 8.0.19 |
| group_replication_applier | a6a83b07-a4b0-11ea-ace2-fa163e6f3efc | slave1 | 3306 | ONLINE | SECONDARY | 8.0.19 |
| group_replication_applier | a7dbfa5f-a4b0-11ea-b21b-fa163e98606f | slave2 | 3306 | ONLINE | PRIMARY | 8.0.19 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
slave2:
(root@localhost) [(none)]> insert into t.t values(3);
Query OK, 1 row affected (0.00 sec)
(root@localhost) [(none)]> stop group_replication;
Query OK, 0 rows affected (4.14 sec)
master:
(root@localhost) [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | a50df330-a4b0-11ea-bfd3-fa163ed6ef91 | master | 3306 | ONLINE | PRIMARY | 8.0.19 |
| group_replication_applier | a6a83b07-a4b0-11ea-ace2-fa163e6f3efc | slave1 | 3306 | ONLINE | SECONDARY | 8.0.19 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)
(root@localhost) [(none)]> insert into t.t values(4);
Query OK, 1 row affected (0.00 sec)
(root@localhost) [(none)]> stop group_replication;
Query OK, 0 rows affected (4.44 sec)
slave1:
(root@localhost) [(none)]> insert into t.t values(5);
Query OK, 1 row affected (0.02 sec)
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+------------------------------------------------------------------+
| binlog.000002 | 3582 | | | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-8:1000003-1000005:2000005 |
+---------------+----------+--------------+------------------+------------------------------------------------------------------+
1 row in set (0.00 sec)
(root@localhost) [(none)]> insert into t.t values(6);
Query OK, 1 row affected (0.01 sec)
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+--------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+--------------------------------------------------------------------------+
| binlog.000002 | 3832 | | | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-8:1000003-1000005:2000005-2000006 |
+---------------+----------+--------------+------------------+--------------------------------------------------------------------------+
1 row in set (0.00 sec)
Prediction confirmed: the writes landed in a third GTID interval.
Finally, let's restore to the initial state one more time:
master:
(root@localhost) [(none)]> start group_replication;
Query OK, 0 rows affected (3.66 sec)
slave2:
(root@localhost) [(none)]> start group_replication;
Query OK, 0 rows affected (3.59 sec)
slave1:
(root@localhost) [(none)]> insert into t.t values(7);
Query OK, 1 row affected (0.01 sec)
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------+
| binlog.000002 | 4946 | | | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-11:1000003-1000005:2000005-2000006 |
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------+
1 row in set (0.00 sec)
(root@localhost) [(none)]> insert into t.t values(8);
Query OK, 1 row affected (0.02 sec)
(root@localhost) [(none)]> show master status;
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------+
| binlog.000002 | 5196 | | | 745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-12:1000003-1000005:2000005-2000006 |
+---------------+----------+--------------+------------------+---------------------------------------------------------------------------+
1 row in set (0.00 sec)
Back in the first GTID interval again. Interesting!
Key takeaways
- Adding a node generates one empty transaction.
- After a primary that has written data stops group replication, writes on the new primary are recorded in a new GTID interval.
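When comparing GTID sets like these across nodes, MySQL's built-in GTID set functions are handy (a small aside, not part of the original tests):

```sql
-- which transactions are in the full set but outside the first interval?
SELECT GTID_SUBTRACT(
  '745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-12:1000003-1000005:2000005-2000006',
  '745c2bc6-9fe4-11ea-8c89-fa163e98606f:1-12');
-- GTID_SUBSET(a, b) similarly checks whether every GTID in a is also in b
```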
over~~~