Relay log read failure: Could not parse relay log event entry. mysql中继日志损坏.mysql relay log损坏

【1】故障情况

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.112.5.87
                  Master_User: rwuser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.012850
          Read_Master_Log_Pos: 308020745
               Relay_Log_File: relay-bin.038538
                Relay_Log_Pos: 126258173
        Relay_Master_Log_File: mysql-bin.012850
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: 
                   the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), 
                   the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), 
                   a network problem, or a bug in the master's or slave's MySQL code. 
                   If you want to check the master's binary log or slave's relay log, 
                   you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 126257963
              Relay_Log_Space: 308022882
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1594
               Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. 
               The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), 
               the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), 
               a network problem, or a bug in the master's or slave's MySQL code. 
               If you want to check the master's binary log or slave's relay log, 
               you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1535
                  Master_UUID: e311cc65-7528-11e8-9980-00163e008cff
             Master_Info_File: /data/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 210827 05:39:05
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: e311cc65-7528-11e8-9980-00163e008cff:1-5005310115
            Executed_Gtid_Set: e311cc65-7528-11e8-9980-00163e008cff:1-5005283608,ed864d86-7528-11e8-9981-00163e028878:1-140
                Auto_Position: 1
1 row in set (0.00 sec)

(1.1)非 gtid auto_postion=1

如果是非 gtid,或gtid下没有使用 auto_position=1;

则 关注:

Relay_Master_Log_File: mysql-bin.012850
Exec_Master_Log_Pos: 126257963

最好是查看错误日志

Version: '5.6.40-log'  socket: '/data/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)
2021-08-27 05:39:05 2502 [ERROR] Error in Log_event::read_log_event(): 'Event too small', data_len: 0, event_type: 0
2021-08-27 05:39:05 2502 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
2021-08-27 05:39:05 2502 [ERROR] Slave SQL: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594
2021-08-27 05:39:05 2502 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.012850' position 126257963
2021-08-27 05:41:36 7f7659e81700 InnoDB: Buffer pool(s) load completed at 210827  5:41:36

如上述代码,

 We stopped at log 'mysql-bin.012850' position 126257963

和上面的show slave 中的两个字段提示的文件与执行位置是一样的;

解决代码:

mysql> stop slave;
Query OK, 0 rows affected (0.02 sec)

mysql> change master to master_log_file='mysql-bin.012850' , master_log_pos=126257963; 
Query OK,
0 rows affected (0.28 sec)

mysql
> start slave;

(1.2)GTID auto_position=1

那么gtid_next依然在执行;

解决代码:

stop slave;
reset slave;
start slave;

如果还有错误,可能会有一条错误;则跳过

解决代码:

stop slave;
set @@session.gtid_next='e311cc65-7528-11e8-9980-00163e008cff:1-5005283609,ed864d86-7528-11e8-9981-00163e028878:1-140';
begin ; commit ; set @@session.gtid_next='automatic'; start slave;

 

关于MySQL的GTID和crash safe

在测试MySQL 5.6基于GTID的复制集群时遇到一些问题,于是看了下GTID的相关资料,整理了一个memo。

1. GTID的作用

GTID(Global Transaction ID)是MySQL5.6引入的功能,可以在集群全局范围标识事务,取代过去基于文件+POS的复制定位可大大简化复制集群的管理,并且能避免重复执行事务,增强了数据的一致性。

2. GTID的形式

官方的定义如下:

GTID = source_id:transaction_id

上面的source_id是MySQL实例的唯一标识,在MySQL第一次启动时自动生成并持久化到auto.cnf文件里,transaction_id是MySQL实例上执行的事务的唯一标识,通常从1开始递增。 例如:

3E11FA47-71CA-11E1-9E33-C80AA9429562:23

GTID 的集合(GTIDs)可以用source_id+transaction_id范围表示,例如

3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5

复杂一点的是:如果这组 GTIDs 来自不同的 source_id,各组 source_id 之间用逗号分隔;如果事务序号有多个范围区间,各组范围之间用冒号分隔,例如:

3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5:11-18, 2C256447-3F0D-431B-9A12-575BB20C1507:1-27

3. GTID相关的几个变量

MySQL通过几个局变量维持全局的GTID状态

  • GTID_EXECUTED
    表示已经在该实例上执行过的事务; 执行RESET MASTER 会将该变量置空; 我们还可以通过设置GTID_NEXT在执行一个空事务,来影响GTID_EXECUTED
  • GTID_PURGED 
    已经被删除了binlog的事务,它是GTID_EXECUTED的子集,只有在GTID_EXECUTED为空时才能设置该变量,修改GTID_PURGED会同时更新GTID_EXECUTED为和GTID_PURGED相同的值。
  • GTID_OWNED
    表示正在执行的事务的gtid以及对应的线程ID。
  • GTID_NEXT
    SESSION级别变量,表示下一个将被使用的GTID。

可以通过show命令查看gtid相关的变量

mysql> show global variables like 'gtid%'; 
+----------------------+----------------------------------------------+ | Variable_name | Value | +----------------------+----------------------------------------------+ | gtid_deployment_step | OFF | | gtid_executed | e10c75be-5c1b-11e6-ab7c-000c29603333:1-29358 | | gtid_mode | ON | | gtid_owned | | | gtid_purged | e10c75be-5c1b-11e6-ab7c-000c29603333:1-29288 | +----------------------+----------------------------------------------+ 5 rows in set (0.05 sec) mysql> show variables like 'gtid_next'; +---------------+-----------+ | Variable_name | Value | +---------------+-----------+ | gtid_next | AUTOMATIC | +---------------+-----------+ 1 row in set (0.00 sec)

4. GTID如何产生?

GTID的生成受gtid_next控制。 在Master上,gtid_next是默认的AUTOMATIC,即GTID在每次事务提交时自动生成。它从当前已执行的GTID集合(即gtid_executed)中,找一个大于0的未使用的最小值作为下个事务GTID。同时将GTID写入到binlog(set gtid_next记录),在实际的更新事务记录之前。

在Slave上,从binlog先读取到主库的GTID(即set gtid_next记录),而后执行的事务采用该GTID。

这里的潜台词是GTID范围可能不是连续的,会有空洞,可能由于人工干预也可能由于多线程复制(MTS)的原因。

下面通过设置gtid_next,我们人为产生了gtid空洞。

mysql> show master status; +---------------+----------+--------------+------------------+----------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +---------------+----------+--------------+------------------+----------------------------------------------+ | binlog.000008 | 2522 | | | e10c75be-5c1b-11e6-ab7c-000c29603333:1-29370 | +---------------+----------+--------------+------------------+----------------------------------------------+ 1 row in set (0.00 sec) mysql> set gtid_next='e10c75be-5c1b-11e6-ab7c-000c29603333:29374'; Query OK, 0 rows affected (0.00 sec) mysql> begin;commit; Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.03 sec) mysql> set gtid_next='AUTOMATIC'; Query OK, 0 rows affected (0.00 sec) mysql> show master status; +---------------+----------+--------------+------------------+----------------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +---------------+----------+--------------+------------------+----------------------------------------------------+ | binlog.000008 | 2715 | | | e10c75be-5c1b-11e6-ab7c-000c29603333:1-29370:29374 | +---------------+----------+--------------+------------------+----------------------------------------------------+ 1 row in set (0.00 sec)

继续执行事务,MySQL会分配一个最小的未使用GTID,最终会把空洞填上。

mysql> insert into tba1 values(1); Query OK, 1 row affected (0.01 sec) mysql> show master status; +---------------+----------+--------------+------------------+----------------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +---------------+----------+--------------+------------------+----------------------------------------------------+ | binlog.000008 | 2953 | | | e10c75be-5c1b-11e6-ab7c-000c29603333:1-29371:29374 | +---------------+----------+--------------+------------------+----------------------------------------------------+ 1 row in set (0.00 sec)

这意味着不能通过GTID中的事务号精确的确定事务顺序,事务的顺序应由事务记录处于binlog中的位置决定。所以关于GTIDs最常见的运算是集合运算而不是事务号比较。

参考
http://dev.mysql.com/doc/refman/5.6/en/replication-gtids-concepts.html

The generation and lifecycle of a GTID consists of the following steps: A transaction is executed and committed on the master. This transaction is assigned a GTID using the master's UUID and the smallest nonzero transaction sequence number not yet used on this server; the GTID is written to the master's binary log (immediately preceding the transaction itself in the log). After the binary log data is transmitted to the slave and stored in the slave's relay log (using established mechanisms for this process—see Section 17.2, “Replication Implementation”, for details), the slave reads the GTID and sets the value of its gtid_next system variable as this GTID. This tells the slave that the next transaction must be logged using this GTID. The slave sets gtid_next in a session context. The slave checks to make sure that this GTID has not already been used to log a transaction in its own binary log. If and only if this GTID has not been used, the slave then writes the GTID and applies the transaction (and writes the transaction to its binary log). By reading and checking the transaction's GTID first, before processing the transaction itself, the slave guarantees not only that no previous transaction having this GTID has been applied on the slave, but also that no other session has already read this GTID but has not yet committed the associated transaction. In other words, multiple clients are not permitted to apply the same transaction concurrently. Because gtid_next is not empty, the slave does not attempt to generate a GTID for this transaction but instead writes the GTID stored in this variable—that is, the GTID obtained from the master—immediately preceding the transaction in its binary log.

5. GTID如何持久化?

5.6为支持GTID新增了2个事件

  • Previous_gtids_log_event
    在每个binlog文件的开头部分,记录在该binlog文件之前已执行的GTID集合。
  • Gtid_log_event
    在每个事务的前面,表明下一个事务的gtid。

MySQL服务器启动时,通过读binlog文件,初始化gtid_executed和gtid_purged。

  • gtid_executed为最新中的binlog文件中Previous_gtids_log_event和所有Gtid_log_event的并集。
  • gtid_purged为最老的binlog文件中Previous_gtids_log_event。

这样,通过读取binlog文件,重启mysql后,gtid_executed和gtid_purged的值能保持和上次一致。不过,在MySQL crash的场景,这一点只有在设置了sync_binlog = 1才能保证。

参考

  • http://dev.mysql.com/doc/refman/5.6/en/replication-options-gtids.html#sysvar_gtid_executed

    When the server starts, @@global.gtid_executed is initialized to the union of the following two sets: The GTIDs listed in the Previous_gtids_log_event of the newest binary log The GTIDs found in every Gtid_log_event in the newest binary log.
  • http://dev.mysql.com/doc/refman/5.6/en/replication-options-gtids.html#sysvar_gtid_purged

    When the server starts, the global value of gtid_purged is initialized to the set of GTIDs contained by the Previous_gtid_log_event of the oldest binary log. When a binary log is purged, gtid_purged is re-read from the binary log that has now become the oldest one.

6. 基于GTID的复制如何确定复制位点?

GTID的主要作用就是解决binlog复制的定位问题。在CHANGE MASTER TO时指定MASTER_AUTO_POSITION = 1,MySQL使用新增的 COM_BINLOG_DUMP_GTID协议进行复制。过程如下:

备库发起复制连接时,将自己的已接受和已执行的gtids的并集(后面称为slave_gtid_executed)发送给主库。即下面的集合:

UNION(@@global.gtid_executed, Retrieved_gtid_set - last_received_GTID)

主库将自己的gtid_executed和slave_gtid_executed的差集的binlog发送给Slave。主库的binlog dump过程如下:

  1. 检查slave_gtid_executed中source_id为主库的GTID集合是否是主库gtid_executed的子集,如否那么主备数据可能不一致,报错。
  2. 检查主库的purged_executed是否是slave_gtid_executed的子集,如否代表缺失备库需要的binlog,报错。
  3. 从最后一个Binlog开始扫描,获取文件头部的PREVIOUS_GTIDS_LOG_EVENT,如果它是slave_gtid_executed的子集,则这是需要发送给Slave的第一个binlog文件,否则继续向前扫描。
  4. 从第3步找到的binlog文件的开头读取binlog记录,判断binlog记录是否已被包含在slave_gtid_executed中,如果已包含跳过不发送。

从上面的过程可知,在指定MASTER_AUTO_POSITION = 1时,Master发送哪些binlog记录给Slave,取决于Slave的gtid_executed和Retrieved_Gtid_Set以及Master的gtid_executed,和relay_log_info以及master_log_info中保存的复制位点没有关系。

参考

  • http://dev.mysql.com/doc/refman/5.6/en/change-master-to.html
  • 代码:com_binlog_dump_gtid()->mysql_binlog_send()->find_first_log_not_in_gtid_set()

7. 如何确保基于GTID复制的crash safe slave?

有了前面的基础,可以考虑下如何配置才能保证crash safe slave。

要保证crash safe slave,就是要在slave crash后,下次可以接在crash前的地方继续应用日志。在非GTID的复制下,按照下面配置就可以了,这也是我们在各相关文章中见的最多的配置。

relay_log_info_repository = TABLE relay_log_recovery = ON

这样可行的原因是,relay_log_info_repository = TABLE时,apply event和更新relay_log_info表在同一个事务里,innodb要么让它们同时生效,要么同时不生效,保证位点信息和已经应用的事务精确匹配。

relay_log_recovery = ON时,会抛弃master_log_info中记录的复制位点,根据relay_log_info的执行位置重新从Master获取binlog,这就避免了relay 未写全导致的问题。

以上两点组合,这就可以确保slave crash后,还可以接在上次apply的位点继续接受binlog并apply。

但是,在基于GTID的复制下,规则变了。crash的Slave重启后,从binlog中解析的gtid_executed决定了要apply哪些binlog记录。所以binlog必须和innodb存储引擎的数据保持一致。另外mysql启动时,会从relay log文件中获取已接受的GTIDs并更新到读取Retrieved_Gtid_Set。由于relay log文件可能不完整,所以需要抛弃已接受的relay log文件。

这样,对于基于GTID的复制,保证crash safe slave的设置就是下面这样。

sync_binlog = 1 innodb_flush_log_at_trx_commit = 1 relay_log_recovery = ON

否则可能会出现重复应用binlog event或者遗漏应用binlog event的问题。我们在进行宕机模拟测试时发现有这个问题, 后来查找官方bug系统,发现这是一个已知的BUG 。

但是,这个设置带来了另一个问题,sync_binlog和innodb_flush_log_at_trx_commit都设置为1对性能的影响还是挺大的,有没有其它方法回避?

方法当然是有的,还是按下面这样设置:

relay_log_info_repository = TABLE relay_log_recovery = ON

但是,Slave crash后,按下面的步骤启动复制。

  1. 启动mysql,但不开启复制

    mysqld --skip-slave-start
  2. 在Slave上查看执行位点(Relay_Master_Log_File,Exec_Master_Log_Pos)

    show slave status\G
  3. 在Master上解析binlog文件检查该位点对应的GTID

    mysqlbinlog -v -v $binlogfile
  4. 在Slave设置gitd_purged为已执行位点的之前的GTID集合

    reset master; set global gitd_purged='xxxxxxxxx';
  5. 启动复制

    start slave

还可以采用另一个类似的方法,获取已执行的GTID集合 。

  1. 启动mysql,但不开启复制

    mysqld --skip-slave-start
  2. 在Slave上修改为基于文件POS的复制

    change master to MASTER_AUTO_POSITION = 0
  3. 启动slave IO线程(这里不能启动SQL线程,如果接受到的GTID已经在Slave的gtid_executed里了,会被Slave skip掉)

    start slave io_thread
  4. 检查binlog传输的开始位置(即Retrieved_Gtid_Set的值)

    show slave status\G
  5. 将gitd_purged设置为binlog传输位置的前面的GTID的集合。

    reset master; set global gitd_purged='xxxxxxxxx';
  6. 修改回auto position的复制

    change master to MASTER_AUTO_POSITION = 1
  7. 启动slave SQL线程

    start slave sql_thread

注意1 在发生过主备切换的场景下,mysql已执行的GTID集合可能包含多个source_id,这时设置备机的 gitd_purged时就要注意和主库保持一致,也包含多个source_id。

注意2 这种变通的方法不适合多线程复制。因为多线程复制可能产生gtid gap和Gap-free low-watermark position,这会导致Salve上重复apply已经apply过的event。后果就是数据不一致或者复制中断,除非设置binlog格式为row模式并且slave_exec_mode=IDEMPOTENT,slave_exec_mode=IDEMPOTENT允许Slave回放binlog时忽略重复键和找不到键的错误,使得binlog回放具有幂等性,但这也意味着真的出现了主备数据不一致也会被它忽略。

8. 多线程复制下如何保证crash safe slave?

前面提到了多线程复制,多线程复制下会产生gtid gap和Gap-free low-watermark position。 然而在启用GTID的情况下,这都不是事。Slave在apply event时会跳过gitd_executed中已经执行过的事件,所以,只要保证gitd_executed和innodb引擎的数据一致,并且从Master拉过来的binlog记录没有遗漏即可,出现重复没有关系。

即,在多线程复制+GTID的配置下,保证crash safe slave的设置仍然是:

sync_binlog = 1 innodb_flush_log_at_trx_commit = 1 relay_log_recovery = ON

但是,测试发现在开启多线程复制后,通过kill 虚机的方式模拟Slave宕机,当Slave再启动后start slave失败。 mysql日志中有如下错误消息:

--------------------------------- 2016-10-26 21:00:23 2699 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql-relay-bin' to avoid this problem. 2016-10-26 21:00:24 2699 [Note] Slave: MTS group recovery relay log info based on Worker-Id 1, group_relay_log_name ./mysql-relay-bin.000011, group_relay_log_pos 2017523 group_master_log_name binlog.000007, group_master_log_pos 2017363 2016-10-26 21:00:24 2699 [ERROR] Error looking for file after ./mysql-relay-bin.000012. 2016-10-26 21:00:24 2699 [ERROR] Failed to initialize the master info structure 2016-10-26 21:00:24 2699 [Note] Check error log for additional messages. You will not be able to start replication until the issue is resolved and the server restarted. 2016-10-26 21:00:24 2699 [Note] Event Scheduler: Loaded 0 events 2016-10-26 21:00:24 2699 [Note] mysqld: ready for connections. Version: '5.6.31-77.0-log' socket: '/data/mysql/mysql.sock' port: 3306 Percona Server (GPL), Release 77.0, Revision 5c1061c ---------------------------------

启动slave时同样报错

mysql> start slave; ERROR 1872 (HY000): Slave failed to initialize relay log info structure from the repository

出现这种现象的原因在于,relay_log_recovery=1 且 slave_parallel_workers>1的情况下,mysql启动时会进入MTS Group恢复流程,即读取relay log,尝试填补由于多线程复制导致的gap。然后relay log文件由于不是实时刷新的,在relay log文件中找不到gap对应的relay log记录(低水位和高水位之间的relay log,低水位点即slave_relay_log_info.Relay_log_pos的值)就会报这个错。

实际上,在GTID模式下,slave在apply event的时候可以跳过重复事件,所以可以安全的从低水位点应用日志,没必要解析relay log文件。 这看上去是一个bug,于是提交了一个bug报告,目前还没有收到回复。

作为回避方法,可以通过reset slave清除relay log文件,跳过这个问题。执行步骤如下

reset slave; change master to MASTER_AUTO_POSITION = 1 start slave;

在这里,单纯的调reset slave不能把状态清理干净。reset slave可以把slave_master_info,slave_relay_log_info和slave_worker_info 3张表清空,但内部的Relay_log_info.inited标志位仍然处于未被初始化状态(由于前面的MTS group recovery出错导致Relay_log_info.inited未被初始化),所以直接调用start slave后面仍然会出现1594的错误。调用change master to时,会再调一次Relay_log_info,此时slave_relay_log_info为空,所以跳过了MTS group recovery,所以Relay_log_info初始化成功。

Last_SQL_Errno: 1594 Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

9. 如果Master不是crash safe会有什么后果?

Master要想保持crash safe需要按下面的参数进行设置,否则gtid_executed会和实际的innodb存储引擎中的数据不一致。

sync_binlog = 1 innodb_flush_log_at_trx_commit = 1

在复制的场景下,如果发生了failover,crash的旧Master不能直接新的Master建立复制关系,必须从备份恢复做起,否则很可能发生数据不一致。在这两个参数被设置为双1的情况下,到是可以尝试以MASTER_AUTO_POSITION = 1和新Master建立复制关系,如果旧Master比新Master多几个crash时的事务,可以被新Master检测出来,这时再从备份恢复也不迟。

如果没有发生failover,crash的Master很快恢复了,那么Slave继续连接到这个Master上可能会发生什么下面的事情

  1. 复制中断

    Master由于是弱持久化,丢失了一部分binlog,导致Slave上的数据(slave_gtid_executed)比Master(gtid_executed)多,复制中断。

  2. 主从数据不一致

    Master的innodb引擎中丢失了一部分数据,而Slave中有这些数据,主从不一致。甚至于,Master基于被截断的binlog上解析出的gtid_executed继续生成GTID,结果新事务的GTID可能是在crash前已经存在的,即同一个GTID代表了不同的事务。

非GTID的复制其实存在类似的问题。所以,在Master配置为弱持久性的情况下,Master发生OS或物理机宕机,安全的做法是切换到备机,或者重新同步主从数据。

10. 其它

  • SRL和RBL下--replicate-do-db和--replicate-ignore-db的语义不同

    • Statement-based logging评估是基于缺省DB的,而不管实际修改的DB,这个行为有些奇葩,也就是说DB匹配与否,在use的时候已经定了,而和执行的SQL无关。
    • Row-based logging 评估是基于实际修改的DB,这个比较合理。

    需要注意的是,实际的binlog的格式是SBL还是RBL,除了依赖binlog_format设置还看具体的SQL,比如DDL都是SBL的。 另外,被忽略的只是实际的更新语句,更新语句前面的SET @@SESSION.GTID_NEXT='xxx'还是被执行了的,所以Slave即使没有应用更新,gtid_executed还是和Master保持一致的。

  • 查看RBL的binlog记录中SQL

    mysqlbinlog -v --base64-output=DECODE-ROWS

    mysqlbinlog -v -v

11. 参考

12. 附录:相关代码

基于GTID复制的位置定位

Slave线程生成的gtid_executed和Retrieved_Gtid_Set的并集,然后往Master发送dump请求。

Rpl_slave.cc static int request_dump(THD *thd, MYSQL* mysql, Master_info* mi, bool *suppress_warnings) { ... enum_server_command command= mi->is_auto_position() ? COM_BINLOG_DUMP_GTID : COM_BINLOG_DUMP; ... if (command == COM_BINLOG_DUMP_GTID) { ... if (gtid_executed.add_gtid_set(mi->rli->get_gtid_set()) != RETURN_STATUS_OK || gtid_executed.add_gtid_set(gtid_state->get_logged_gtids()) != RETURN_STATUS_OK) //生成的gtid_executed和Retrieved_Gtid_Set的并集 { global_sid_lock->unlock(); goto err; } ... /* Note: binlog_flags is always 0. However, in versions up to 5.6 RC, the master would check the lowest bit and do something unexpected if it was set; in early versions of 5.6 it would also use the two next bits. Therefore, for backward compatibility, if we ever start to use the flags, we should leave the three lowest bits unused. */ int2store(ptr_buffer, binlog_flags); ptr_buffer+= ::BINLOG_FLAGS_INFO_SIZE; int4store(ptr_buffer, server_id); ptr_buffer+= ::BINLOG_SERVER_ID_INFO_SIZE; int4store(ptr_buffer, BINLOG_NAME_INFO_SIZE); ptr_buffer+= ::BINLOG_NAME_SIZE_INFO_SIZE; memset(ptr_buffer, 0, BINLOG_NAME_INFO_SIZE); //BINLOG文件名固定为NULL ptr_buffer+= BINLOG_NAME_INFO_SIZE; int8store(ptr_buffer, 4LL); //POS固定为4 ptr_buffer+= ::BINLOG_POS_INFO_SIZE; int4store(ptr_buffer, encoded_data_size); ptr_buffer+= ::BINLOG_DATA_SIZE_INFO_SIZE; gtid_executed.encode(ptr_buffer); //填充Slave的gtid_executed ptr_buffer+= encoded_data_size; command_size= ptr_buffer - command_buffer; DBUG_ASSERT(command_size == (allocation_size - 1)); } else { ... }

Master在slavegtidexecuted不为空的情况下,将未包含在slavegtidexecuted中的binlog记录发给Slave。

Rpl_master.cc void mysql_binlog_send(THD* thd, char* log_ident, my_off_t pos, const Gtid_set* slave_gtid_executed, int flags) 
{ ... bool using_gtid_protocol= slave_gtid_executed != NULL; ... name= search_file_name;
if (log_ident[0]) mysql_bin_log.make_log_name(search_file_name, log_ident);
else { if (using_gtid_protocol) { ... Sid_map* slave_sid_map= slave_gtid_executed->get_sid_map();
DBUG_ASSERT(slave_sid_map); global_sid_lock->wrlock();
const rpl_sid &server_sid= gtid_state->get_server_sid();
rpl_sidno subset_sidno= slave_sid_map->sid_to_sidno(server_sid);
if (!slave_gtid_executed->is_subset_for_sid(gtid_state->get_logged_gtids(),
gtid_state->get_server_sidno(), subset_sidno)) { errmsg= ER(ER_SLAVE_HAS_MORE_GTIDS_THAN_MASTER);
my_errno= ER_MASTER_FATAL_ERROR_READING_BINLOG; global_sid_lock->unlock(); GOTO_ERR; } ...
if (!gtid_state->get_lost_gtids()->is_subset(slave_gtid_executed)) { errmsg= ER(ER_MASTER_HAS_PURGED_REQUIRED_GTIDS);
my_errno= ER_MASTER_FATAL_ERROR_READING_BINLOG; global_sid_lock->unlock(); GOTO_ERR; } global_sid_lock->unlock();
first_gtid.clear();
if (mysql_bin_log.find_first_log_not_in_gtid_set(name, slave_gtid_executed, &first_gtid, &errmsg)) { my_errno= ER_MASTER_FATAL_ERROR_READING_BINLOG; GOTO_ERR; } }
else name= 0; // Find first log } ...

//后面根据binlog文件名和Pos发送binlog,但在GTID模式下,从头(POS=4)读取binlog但跳过Slave已经执行过的GTID。 ... switch (event_type)
{ ... case GTID_LOG_EVENT: if (using_gtid_protocol) { ... skip_group= slave_gtid_executed->contains_gtid(gtid_ev.get_sidno(sid_map), gtid_ev.get_gno()); ... }

 

MTS Group恢复报错

开启多线程复制(MTS)的情况下,Slave crash后再启动时在MTS Group恢复读取relay log文件时报错。相关代码如下:

调用栈

Relay_log_info::rli_init_info() ->init_recovery() //relay_log_recovery=1的时候执行 ->mts_recovery_groups() //slave_parallel_workers>1的时候执行

mts_recovery_groups()的主要处理就是查找gap范围内哪些event已经被执行过了,结果保存在rli->recovery_groups中。在读取relay log文件的过程中,由于crash时relay log不全,找不到完整的事务记录,报错! 如下:

int mts_recovery_groups(Relay_log_info *rli) { MY_BITMAP *groups= &rli->recovery_groups; ... for (uint it_job= 0; it_job < above_lwm_jobs.elements; it_job++) //遍历所以执行位点在lwm之后的worker(lwm也称为checkpoint,保存在slave_relay_log_info.Relay_log_pos中,lwm之前的event不存在gap) { Slave_worker *w= ((Slave_job_group *) dynamic_array_ptr(&above_lwm_jobs, it_job))->worker; LOG_POS_COORD w_last= { const_cast(w->get_group_master_log_name()), w->get_group_master_log_pos() }; ... while (not_reached_commit && (ev= Log_event::read_log_event(&log, 0, p_fdle, opt_slave_sql_verify_checksum))) { ... LOG_POS_COORD ev_coord= { (char *) rli->get_group_master_log_name(), ev->log_pos }; ... if ((ret= mts_event_coord_cmp(&ev_coord, &w_last)) == 0) //找到worker执行位点对应的relay记录,分配给这个worker的从checkpoint到这个位点的event都已经应用了,设置rli->recovery_groups。 { ... DBUG_PRINT("mts", ("Doing a shift ini(%lu) end(%lu).", (w->checkpoint_seqno + 1) - recovery_group_cnt, w->checkpoint_seqno)); for (uint i= (w->checkpoint_seqno + 1) - recovery_group_cnt, j= 0; i <= w->checkpoint_seqno; i++, j++) { if (bitmap_is_set(&w->group_executed, i)) { DBUG_PRINT("mts", ("Setting bit %u.", j)); bitmap_fast_test_and_set(groups, j); } } not_reached_commit= false; } ... if (not_reached_commit && rli->relay_log.find_next_log(&linfo, 1)) //由于crash时relay log不全,找不到完整的事务记录,报错! { error= TRUE; sql_print_error("Error looking for file after %s.", linfo.log_file_name); goto err; } ... }

然而在GTID模式下,先填补gap是没有必要的,Slave在apply event时可以自动跳过已执行的事务(即gtid_executed)。 而且init_recovery()代码里也有下面这段逻辑,把刚才mts_recovery_groups()费尽心思找出来的clear_mts_recovery_groups又给清空了,这更说明了mts_recovery_groups()是在做无用功。在启动mysql时通过debug跳过临时mtsrecoverygroups()的调用,发现确实可以解决问题,启动后数据也是一致的。

int init_recovery(Master_info* mi, const char** errmsg) { ... if (rli->recovery_parallel_workers) { /* This is not idempotent and a crash after this function and before the recovery is actually done may lead the system to an inconsistent state. This may happen because the gap is not persitent stored anywhere and eventually old relay log files will be removed and further calculations on the gaps will be impossible. We need to improve this. /Alfranio. */ error= mts_recovery_groups(rli); if (rli->mts_recovery_group_cnt) { if (gtid_mode == GTID_MODE_ON) { rli->recovery_parallel_workers= 0; rli->clear_mts_recovery_groups(); //清空rli->recovery_groups } else { error= 1; sql_print_error("--relay-log-recovery cannot be executed when the slave " "was stopped with an error or killed in MTS mode; " "consider using RESET SLAVE or restart the server " "with --relay-log-recovery = 0 followed by " "START SLAVE UNTIL SQL_AFTER_MTS_GAPS"); } } } ... }
posted @ 2021-08-27 16:30  郭大侠1  阅读(519)  评论(0编辑  收藏  举报