基于GTID Replication主从数据不一致操作
基本的M-S结构
现在master与slave主机数据一致:
mysql> select * from t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 4 |
+------+
3 rows in set (0.00 sec)
+------+
| id |
+------+
| 1 |
| 2 |
| 4 |
+------+
3 rows in set (0.00 sec)
我们来模拟故障现象:
在master上,通过设置sql_log_bin来控制命令是否写入二进制日志中,运行命令:
set sql_log_bin=OFF; insert into t1 values(5); set sql_log_bin=ON; insert into t1 values(6);
这样数据在master上是这样的:
mysql> select * from t1; +------+ | id | +------+ | 1 | | 2 | | 4 | | 5 | | 6 | +------+ 5 rows in set (0.00 sec)
而在slave上的数据是这样的:
mysql> select * from testdb.t1; +------+ | id | +------+ | 1 | | 2 | | 4 | | 6 | +------+ 4 rows in set (0.00 sec)
此时我们在master上执行操作:
update t1 set id=7 where id=5; insert into t1 values(8); mysql> select * from t1; +------+ | id | +------+ | 1 | | 2 | | 4 | | 7 | | 6 | | 8 | +------+ 6 rows in set (0.00 sec)
我们查询一下slave上的数据:
mysql> select * from testdb.t1; +------+ | id | +------+ | 1 | | 2 | | 4 | | 6 | +------+ 4 rows in set (0.00 sec)
发现数据并没有被同步过来,我们来show一下slave
mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.203.135 Master_User: replmonitor Master_Port: 3306 Connect_Retry: 60 Master_Log_File: host2-bin.000002 Read_Master_Log_Pos: 1146 Relay_Log_File: host3-relay-bin.000003 Relay_Log_Pos: 648 Relay_Master_Log_File: host2-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1032 Last_Error: Could not execute Update_rows event on table testdb.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log host2-bin.000002, end_log_pos 875 Skip_Counter: 0 Exec_Master_Log_Pos: 660 Relay_Log_Space: 1595 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1032 Last_SQL_Error: Could not execute Update_rows event on table testdb.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log host2-bin.000002, end_log_pos 875 Replicate_Ignore_Server_Ids: Master_Server_Id: 2 Master_UUID: cd32a980-2a97-11e5-a344-000c2954ccde Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 150717 23:51:51 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: cd32a980-2a97-11e5-a344-000c2954ccde:1-3 Executed_Gtid_Set: cd32a980-2a97-11e5-a344-000c2954ccde:1 Auto_Position: 1 1 row in set (0.00 sec)
此时我们查看一下错误日志:
2015-07-17 23:51:51 30750 [ERROR] Slave SQL: Could not execute Update_rows event on table testdb.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log host2-bin.000002, end_log_pos 875, Error_code: 1032
2015-07-17 23:51:51 30750 [Warning] Slave: Can't find record in 't1' Error_code: 1032
2015-07-17 23:51:51 30750 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'host2-bin.000002' position 660
2015-07-17 23:51:51 30750 [Warning] Slave: Can't find record in 't1' Error_code: 1032
2015-07-17 23:51:51 30750 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'host2-bin.000002' position 660
提示我们在修改错误并重新启动slave,但关键在于如何修正错误,一般我们采用如下方法:
在从机slave上执行:
mysql> stop slave; Query OK, 0 rows affected (0.02 sec) mysql> set gtid_next='cd32a980-2a97-11e5-a344-000c2954ccde:2'; Query OK, 0 rows affected (0.00 sec) mysql> begin; Query OK, 0 rows affected (0.00 sec) mysql> commit; Query OK, 0 rows affected (0.00 sec) mysql> set gtid_next='AUTOMATIC'; Query OK, 0 rows affected (0.00 sec) mysql> start slave; Query OK, 0 rows affected (0.01 sec) 即跳过有错误的GTID编号
mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.203.135 Master_User: replmonitor Master_Port: 3306 Connect_Retry: 60 Master_Log_File: host2-bin.000002 Read_Master_Log_Pos: 1146 Relay_Log_File: host3-relay-bin.000005 Relay_Log_Pos: 688 Relay_Master_Log_File: host2-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 1146 Relay_Log_Space: 1429 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 2 Master_UUID: cd32a980-2a97-11e5-a344-000c2954ccde Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: cd32a980-2a97-11e5-a344-000c2954ccde:1-3 Executed_Gtid_Set: cd32a980-2a97-11e5-a344-000c2954ccde:1-3 Auto_Position: 1 1 row in set (0.00 sec) mysql> select * from testdb.t1; +------+ | id | +------+ | 1 | | 2 | | 4 | | 6 | | 8 | +------+ 5 rows in set (0.00 sec)
可以发现事务已经全部同步了,但现在数据却是不一致的。
这种适合在数据要求不高的情况下使用,如果后续还有其它数据操作,造成主从数据差异过大,就得用从主库的一个完整备份过来,再从某点开始做replication。