MySQL并行复制的一个坑
早上巡检数据库,发现一个延迟从库的sql_thread中断了。
Last_SQL_Errno: 1755
Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./oracle-relay-bin.000093, position 152912092 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..
检查performance_schema下的replication_applier_status_by_worker表,除了GTID之外也没有更具体的信息:
1 2 3 4 5 6 7 8 9 10 11 12 13 | "root@localhost:mysql3308.sock [(none)]> select * from performance_schema.replication_applier_status_by_worker; + --------------+-----------+-----------+---------------+------------------------------------------------+-------------------+--------------------+----------------------+ | CHANNEL_NAME | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP | + --------------+-----------+-----------+---------------+------------------------------------------------+-------------------+--------------------+----------------------+ | | 1 | NULL | OFF | 0b961fcc-41c2-11e7-84fd-286ed488c7da:156369774 | 0 | | 0000-00-00 00:00:00 | | | 2 | NULL | OFF | | 0 | | 0000-00-00 00:00:00 | | | 3 | NULL | OFF | | 0 | | 0000-00-00 00:00:00 | | | 4 | NULL | OFF | | 0 | | 0000-00-00 00:00:00 | | | 5 | NULL | OFF | | 0 | | 0000-00-00 00:00:00 | | | 6 | NULL | OFF | | 0 | | 0000-00-00 00:00:00 | | | 7 | NULL | OFF | | 0 | | 0000-00-00 00:00:00 | | | 8 | NULL | OFF | | 0 | | 0000-00-00 00:00:00 | + --------------+-----------+-----------+---------------+------------------------------------------------+-------------------+--------------------+----------------------+ |
既然relay_log的位置信息都有了,那就去日志里看看吧:
解析Binlog文件:
1 | mysqlbinlog - v -- base64 -output=decode-rows oracle-relay-bin.000093 >1.sql |
找到152912092位置点附近的日志:
检查了一下数据库中这个表ID为14816035的数据确实是不存在的。
另外除了这条日志,其它日志的last_committed和sequence_number都为0,last_committed表示事务提交的时候,上次事务提交的编号。last_committed和sequence_number代表的就是所谓的LOGICAL_CLOCK。
猜测如果手动把这条数据插入延迟从库,并且使用注入一个空事务跳过这个GTID的方法重启sql_thread,相信这个错误也能被解决。
但既然带了LOGICAL_CLOCK的事务就会出错,跳过事务的方法很难保证以后不会出错。
注意到这条日志的last_committed是一个异常大的值,且错误信息中有提到The master event is logically timestamped incorrectly。我怀疑是不是并行配置的问题。
从库配置:
1 2 3 4 5 6 7 | "root@localhost:mysql3308.sock [(none)]>show variables like '%para%' ; +------------------------+---------------+ | Variable_name | Value | +------------------------+---------------+ | slave_parallel_type | LOGICAL_CLOCK | | slave_parallel_workers | 8 | +------------------------+---------------+ |
再检查主库配置:
1 2 3 4 5 6 | (root@localhost:mysql.sock) [(none)]>show variables like '%para%' ; +------------------------+-------+ | Variable_name | Value | +------------------------+-------+ | slave_parallel_workers | 0 | +------------------------+-------+ |
发现主库根本就没有slave_parallel_type这项配置。想起来主库是mysql5.6了。
1 2 3 4 5 6 | (root@localhost:mysql.sock) [(none)]> select version(); +------------+ | version() | +------------+ | 5.6.35-log | +------------+ |
那么问题基本上就知道了,主库5.6只支持基于DATABASE的并行复制,而5.7的从库配置成LOGICAL_CLOCK导致了异常。
明白了问题所在,那就好解决了,把从库的slave_parallel_type改为DATABASE,再起sql_thread问题应该就解决了:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 | "root@localhost:mysql3308.sock [none]> set global slave_parallel_type= 'DATABASE' ; Query OK, 0 rows affected (0.00 sec) "root@localhost:mysql3308.sock [none]>show global variables like '%slave_parallel_type%' ; +---------------------+----------+ | Variable_name | Value | +---------------------+----------+ | slave_parallel_type | DATABASE | +---------------------+----------+ 1 row in set (0.00 sec) "root@localhost:mysql3308.sock [none]>show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: master Master_User: rep Master_Port: 3306 Connect_Retry: 60 Master_Log_File: binlog.000104 Read_Master_Log_Pos: 160115307 Relay_Log_File: oracle-relay-bin.000093 Relay_Log_Pos: 152912092 Relay_Master_Log_File: binlog.000100 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1755 Last_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name . /oracle-relay-bin .000093, position 152912092 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly.. Skip_Counter: 0 Exec_Master_Log_Pos: 152911925 Relay_Log_Space: 4455094667 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1755 Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name . /oracle-relay-bin .000093, position 152912092 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly.. Replicate_Ignore_Server_Ids: Master_Server_Id: 50 Master_UUID: 0b961fcc-41c2-11e7-84fd-286ed488c7da Master_Info_File: mysql.slave_master_info SQL_Delay: 3600 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 180716 18:02:56 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:111060115-163843604 Executed_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:1-156369774 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) "root@localhost:mysql3308.sock [none]>stop slave sql_thread; Query OK, 0 rows affected, 1 warning (0.00 sec) "root@localhost:mysql3308.sock [none]>start slave sql_thread; Query OK, 0 rows affected (0.01 sec) "root@localhost:mysql3308.sock [none]>show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: master Master_User: rep Master_Port: 3306 Connect_Retry: 60 Master_Log_File: binlog.000104 Read_Master_Log_Pos: 160161836 Relay_Log_File: oracle-relay-bin.000093 Relay_Log_Pos: 169205552 Relay_Master_Log_File: binlog.000100 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 169205385 Relay_Log_Space: 4455141196 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 5351 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 50 Master_UUID: 0b961fcc-41c2-11e7-84fd-286ed488c7da Master_Info_File: mysql.slave_master_info SQL_Delay: 3600 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Waiting for Slave Worker to release partition Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:111060115-163843692 Executed_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:1-156400100 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) |
打完收工。
转载请注明出处。
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· SQL Server 2025 AI相关能力初探
· Linux系列:如何用 C#调用 C方法造成内存泄露
· AI与.NET技术实操系列(二):开始使用ML.NET
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
· 阿里最新开源QwQ-32B,效果媲美deepseek-r1满血版,部署成本又又又降低了!
· SQL Server 2025 AI相关能力初探
· AI编程工具终极对决:字节Trae VS Cursor,谁才是开发者新宠?
· 开源Multi-agent AI智能体框架aevatar.ai,欢迎大家贡献代码
· Manus重磅发布:全球首款通用AI代理技术深度解析与实战指南