随笔 - 74  文章 - 0  评论 - 94  阅读 - 12万

MySQL并行复制的一个坑

早上巡检数据库,发现一个延迟从库的sql_thread中断了。

Last_SQL_Errno: 1755
Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./oracle-relay-bin.000093, position 152912092 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..

检查performance_schema下的replication_applier_status_by_worker表,除了GTID之外也没有更具体的信息:

1
2
3
4
5
6
7
8
9
10
11
12
13
"root@localhost:mysql3308.sock  [(none)]>select * from performance_schema.replication_applier_status_by_worker;
+--------------+-----------+-----------+---------------+------------------------------------------------+-------------------+--------------------+----------------------+
| CHANNEL_NAME | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION                          | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP |
+--------------+-----------+-----------+---------------+------------------------------------------------+-------------------+--------------------+----------------------+
|              |         1 |      NULL | OFF           | 0b961fcc-41c2-11e7-84fd-286ed488c7da:156369774 |                 0 |                    | 0000-00-00 00:00:00  |
|              |         2 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
|              |         3 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
|              |         4 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
|              |         5 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
|              |         6 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
|              |         7 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
|              |         8 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
+--------------+-----------+-----------+---------------+------------------------------------------------+-------------------+--------------------+----------------------+

既然relay_log的位置信息都有了,那就去日志里看看吧:

解析Binlog文件:

1
mysqlbinlog -v --base64-output=decode-rows oracle-relay-bin.000093 >1.sql

找到152912092位置点附近的日志:

检查了一下数据库中这个表ID为14816035的数据确实是不存在的。

另外除了这条日志,其它日志的last_committed和sequence_number都为0,last_committed表示事务提交的时候,上次事务提交的编号。last_committed和sequence_number代表的就是所谓的LOGICAL_CLOCK。

猜测如果手动把这条数据插入延迟从库,并且使用注入一个空事务跳过这个GTID的方法重启sql_thread,相信这个错误也能被解决。

但既然带了LOGICAL_CLOCK的事务就会出错,跳过事务的方法很难保证以后不会出错。

注意到这条日志的last_committed是一个异常大的值,且错误信息中有提到The master event is logically timestamped incorrectly。我怀疑是不是并行配置的问题。

从库配置:

1
2
3
4
5
6
7
"root@localhost:mysql3308.sock  [(none)]>show variables like '%para%';
+------------------------+---------------+
| Variable_name          | Value         |
+------------------------+---------------+
| slave_parallel_type    | LOGICAL_CLOCK |
| slave_parallel_workers | 8             |
+------------------------+---------------+

 再检查主库配置:

1
2
3
4
5
6
(root@localhost:mysql.sock) [(none)]>show variables like '%para%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| slave_parallel_workers | 0     |
+------------------------+-------+

 发现主库根本就没有slave_parallel_type这项配置。想起来主库是mysql5.6了。

1
2
3
4
5
6
(root@localhost:mysql.sock) [(none)]>select version();
+------------+
| version()  |
+------------+
| 5.6.35-log |
+------------+

 那么问题基本上就知道了,主库5.6只支持基于DATABASE的并行复制,而5.7的从库配置成LOGICAL_CLOCK导致了异常。

明白了问题所在,那就好解决了,把从库的slave_parallel_type改为DATABASE,再起sql_thread问题应该就解决了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
"root@localhost:mysql3308.sock  [none]>set global slave_parallel_type='DATABASE';
Query OK, 0 rows affected (0.00 sec)
 
"root@localhost:mysql3308.sock  [none]>show global variables like '%slave_parallel_type%';
+---------------------+----------+
| Variable_name       | Value    |
+---------------------+----------+
| slave_parallel_type | DATABASE |
+---------------------+----------+
1 row in set (0.00 sec)
 
"root@localhost:mysql3308.sock  [none]>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000104
          Read_Master_Log_Pos: 160115307
               Relay_Log_File: oracle-relay-bin.000093
                Relay_Log_Pos: 152912092
        Relay_Master_Log_File: binlog.000100
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1755
                   Last_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./oracle-relay-bin.000093, position 152912092 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 152911925
              Relay_Log_Space: 4455094667
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1755
               Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./oracle-relay-bin.000093, position 152912092 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 50
                  Master_UUID: 0b961fcc-41c2-11e7-84fd-286ed488c7da
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 3600
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 180716 18:02:56
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:111060115-163843604
            Executed_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:1-156369774
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)
 
"root@localhost:mysql3308.sock  [none]>stop slave sql_thread;
Query OK, 0 rows affected, 1 warning (0.00 sec)
 
"root@localhost:mysql3308.sock  [none]>start slave sql_thread;
Query OK, 0 rows affected (0.01 sec)
 
"root@localhost:mysql3308.sock  [none]>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000104
          Read_Master_Log_Pos: 160161836
               Relay_Log_File: oracle-relay-bin.000093
                Relay_Log_Pos: 169205552
        Relay_Master_Log_File: binlog.000100
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 169205385
              Relay_Log_Space: 4455141196
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 5351
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 50
                  Master_UUID: 0b961fcc-41c2-11e7-84fd-286ed488c7da
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 3600
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Waiting for Slave Worker to release partition
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:111060115-163843692
            Executed_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:1-156400100
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

打完收工。

转载请注明出处。

本文地址:https://www.cnblogs.com/ajiangg/p/9399892.html

posted on   万剑齐发  阅读(2769)  评论(1编辑  收藏  举报
编辑推荐:
· SQL Server 2025 AI相关能力初探
· Linux系列:如何用 C#调用 C方法造成内存泄露
· AI与.NET技术实操系列(二):开始使用ML.NET
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
阅读排行:
· 阿里最新开源QwQ-32B,效果媲美deepseek-r1满血版,部署成本又又又降低了!
· SQL Server 2025 AI相关能力初探
· AI编程工具终极对决:字节Trae VS Cursor,谁才是开发者新宠?
· 开源Multi-agent AI智能体框架aevatar.ai,欢迎大家贡献代码
· Manus重磅发布:全球首款通用AI代理技术深度解析与实战指南
< 2025年3月 >
23 24 25 26 27 28 1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31 1 2 3 4 5

点击右上角即可分享
微信分享提示