介绍
pt-slave-restart是一款mysql主从异常处理工具,监控replica端SQL_THREAD是否正常,如果异常pt-slave-restart尝试跳过导致错误的事务,然后重新启动SQL_THREAD已达到修复SQL_THREAD的目的。
测试
MySQL主从复制source端
mysql> select * from t1;
+----+-------+
| id | name |
+----+-------+
| 1 | name1 |
| 2 | name2 |
| 3 | name3 |
+----+-------+
MySQL主从复制replica端
mysql> select * from t1;
+----+-------+
| id | name |
+----+-------+
| 1 | name1 |
| 2 | name2 |
| 3 | name3 |
+----+-------+
执行pt-slave-restart监控replica端
[root]# pt-slave-restart h=192.168.1.73,P=3312,u=root,p=m密码
MySQL主从复制source端
mysql> set sql_log_bin=0;
mysql> insert into t1 values(4,'name4');
mysql> set sql_log_bin=1;
mysql> delete from t1 where id=4;
MySQL主从复制replica端,此时SQL_THREAD线程报错1062报错:
mysql> show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: 192.168.1.73
Source_User: repuser
Source_Port: 3311
Connect_Retry: 60
Source_Log_File: binlog.000045
Read_Source_Log_Pos: 1021064795
Relay_Log_File: 64562e05689a-relay-bin.000056
Relay_Log_Pos: 1095227
Relay_Source_Log_File: binlog.000045
Replica_IO_Running: Yes
Replica_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at source log binlog.000045, end_log_pos 1021061873. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
Exec_Source_Log_Pos: 1021061586
Relay_Log_Space: 1021065568
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Source_SSL_Allowed: No
Source_SSL_CA_File:
Source_SSL_CA_Path:
Source_SSL_Cert:
Source_SSL_Cipher:
Source_SSL_Key:
Seconds_Behind_Source: NULL
Source_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at source log binlog.000045, end_log_pos 1021061873. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Replicate_Ignore_Server_Ids:
Source_Server_Id: 88
Source_UUID: e28ee878-f92e-11ed-b68e-0242ac110003
Source_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Replica_SQL_Running_State:
Source_Retry_Count: 86400
Source_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 230607 09:04:33
Source_SSL_Crl:
Source_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Source_TLS_Version:
Source_public_key_path:
Get_Source_public_key: 0
Network_Namespace:
稍等一会儿,pt-slave-restart会自动修复replica端主从异常
[root]# pt-slave-restart h=192.168.1.73,P=3312,u=root,p=Gaoyu@029
2023-06-07T17:01:10 P=3312,h=192.168.1.73,p=...,u=root 64562e05689a-relay-bin.000056 1004491 1062
注释:
时间戳:2023-06-07T17:01:10
replica端信息:P=3312,h=192.168.1.73,p=...,u=root
relay log:64562e05689a-relay-bin.000056
relay log位置:1004491
主从复制最后报错码:1062
查看replica端主从复制信息,此时主从复制恢复正常
mysql> show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: 192.168.1.73
Source_User: repuser
Source_Port: 3311
Connect_Retry: 60
Source_Log_File: binlog.000045
Read_Source_Log_Pos: 1021515791
Relay_Log_File: 64562e05689a-relay-bin.000056
Relay_Log_Pos: 1549019
Relay_Source_Log_File: binlog.000045
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Source_Log_Pos: 1021515378
Relay_Log_Space: 1021516564
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Source_SSL_Allowed: No
Source_SSL_CA_File:
Source_SSL_CA_Path:
Source_SSL_Cert:
Source_SSL_Cipher:
Source_SSL_Key:
Seconds_Behind_Source: 0
Source_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Source_Server_Id: 88
Source_UUID: e28ee878-f92e-11ed-b68e-0242ac110003
Source_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
Source_Retry_Count: 86400
Source_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Source_SSL_Crl:
Source_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Source_TLS_Version:
Source_public_key_path:
Get_Source_public_key: 0
Network_Namespace:
此时source端与replica端t1表数据是一致的,因为source端先insert后delete。
参考:https://docs.percona.com/percona-toolkit/pt-slave-restart.html