诡异的复制错误
最近在线上遇到一个非常诡异的复制错误,困惑了好几天,今天终于知道原因了。打算分享给大家。当时复制报错信息类似如下:
151125 13:42:46 [ERROR] Slave SQL: Query caused different errors on master and slave. Error on master: message (format)='%s %s does not exist' error code=1305 ; Error on slave: actual message='Duplicate entry '2-test' for key 'PRIMARY'', error code=1062. Default database: 'test'. Query: 'insert into t1 ( id,name,monery) values (1,'test',100)', Error_code: 0
上面的错误是我彻底明白以后模拟出来的。下面再次进行重现。
在test库有2张表,t1,t2,还有触发器,存储过程。
t1,t2表结构如下:
mysql> show create table t1\G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `id` int(11) DEFAULT NULL, `name` char(20) NOT NULL DEFAULT '', `uid` int(11) NOT NULL AUTO_INCREMENT, `monery` int(11) NOT NULL DEFAULT '0', PRIMARY KEY (`uid`,`name`) ) ENGINE=MyISAM AUTO_INCREMENT=5 DEFAULT CHARSET=utf8 1 row in set (0.00 sec) mysql> show create table t2\G *************************** 1. row *************************** Table: t2 Create Table: CREATE TABLE `t2` ( `id` int(11) DEFAULT NULL, `name` char(20) NOT NULL DEFAULT '', `uid` int(11) NOT NULL AUTO_INCREMENT, `monery` int(11) NOT NULL DEFAULT '0', `age` int(11) NOT NULL DEFAULT '0', PRIMARY KEY (`uid`,`name`) ) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 1 row in set (0.00 sec)
存储过程如下:
\d // CREATE PROCEDURE `t1_t2`(OPERATION VARCHAR(20),id_NEW INT,name_NEW char(20),uid_NEW int,monery_NEW int,age_NEW int) BEGIN IF OPERATION = 'insert' THEN INSERT INTO t2(id,name,uid,monery,age) VALUES (id_NEW,name_NEW,uid_NEW,monery_NEW,age_NEW); END IF; END //
触发器如下:
\d // CREATE TRIGGER `insert_t1` AFTER INSERT ON `t1` FOR EACH ROW begin CALL t1_t2('insert', NEW.id, NEW.name,NEW.uid, NEW.monery,0); end; //
其实就是在对t1表进行插入以后,调用存储过程操作t2。
在主库对t1表插入1条记录:
mysql> insert into t1 ( id,name,monery) values (1,'abc',100); Query OK, 1 row affected (0.01 sec) mysql> select * from t1; +------+------+-----+--------+ | id | name | uid | monery | +------+------+-----+--------+ | 1 | abc | 1 | 100 | +------+------+-----+--------+ 1 row in set (0.00 sec) mysql> select * from t2; +------+------+-----+--------+-----+ | id | name | uid | monery | age | +------+------+-----+--------+-----+ | 1 | abc | 1 | 100 | 0 | +------+------+-----+--------+-----+ 1 row in set (0.00 sec) mysql>
现在查看从库,肯定数据是一致的。
mysql> select * from t1; +------+------+-----+--------+ | id | name | uid | monery | +------+------+-----+--------+ | 1 | abc | 1 | 100 | +------+------+-----+--------+ 1 row in set (0.00 sec) mysql> select * from t2; +------+------+-----+--------+-----+ | id | name | uid | monery | age | +------+------+-----+--------+-----+ | 1 | abc | 1 | 100 | 0 | +------+------+-----+--------+-----+ 1 row in set (0.00 sec)
关键时刻到了。首先在从库的t2表手动插入一条记录:
mysql> insert into t2 ( id,name,monery,age) values (1,'test',100,0); Query OK, 1 row affected (0.00 sec) mysql> select * from t2; +------+------+-----+--------+-----+ | id | name | uid | monery | age | +------+------+-----+--------+-----+ | 1 | abc | 1 | 100 | 0 | | 1 | test | 2 | 100 | 0 | +------+------+-----+--------+-----+
可以看到现在t2表多了1条记录,且uid等于2,从表结构可以看到name,uid是联合主键。
下面在主库删掉存储过程,再插入1条记录。
mysql> set sql_log_bin=0; Query OK, 0 rows affected (0.00 sec) mysql> drop PROCEDURE t1_t2; Query OK, 0 rows affected (0.00 sec)
mysql> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into t1 ( id,name,monery) values (1,'test',100); ERROR 1305 (42000): PROCEDURE test.t1_t2 does not exist mysql>
可以看到插入提示存储过程不存在,这个也是复制报错的关键。这个错误会在binlog里面记录错误编号,错误编号是1305。那我们看看t1表是否更新成功。
mysql> select * from t1; +------+------+-----+--------+ | id | name | uid | monery | +------+------+-----+--------+ | 1 | abc | 1 | 100 | | 1 | test | 2 | 100 | +------+------+-----+--------+ 2 rows in set (0.00 sec)
可以看见t1表已经更新成功了。
我们再来解析主库的binlog看看。
#151125 15:38:53 server id 25173 end_log_pos 4560 Intvar SET INSERT_ID=2/*!*/; # at 4560 #151125 15:38:53 server id 25173 end_log_pos 4677 Query thread_id=1310 exec_time=0 error_code=1305 SET TIMESTAMP=1448437133/*!*/; insert into t1 ( id,name,monery) values (1,'test',100) /*!*/;
可以看见提示错误,error_code=1305,
我们现在看看复制状态:
ysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.69.25.173 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000340 Read_Master_Log_Pos: 4746 Relay_Log_File: relaylog.000048 Relay_Log_Pos: 579 Relay_Master_Log_File: mysql-bin.000340 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Query caused different errors on master and slave. Error on master: message (format)='%s %s does not exist' error code=1305 ; Error on slave: actual message='no error', error code=0. Default database: 'test'. Query: 'insert into t1 ( id,name,monery) values (1,'test',100)' Skip_Counter: 0 Exec_Master_Log_Pos: 4464 Relay_Log_Space: 1010 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Query caused different errors on master and slave. Error on master: message (format)='%s %s does not exist' error code=1305 ; Error on slave: actual message='no error', error code=0. Default database: 'test'. Query: 'insert into t1 ( id,name,monery) values (1,'test',100)' Replicate_Ignore_Server_Ids: Master_Server_Id: 25173 1 row in set (0.00 sec)
可以看见已经提示错误了,我们再看看错误日志:
151125 13:42:46 [ERROR] Slave SQL: Query caused different errors on master and slave. Error on master: message (format)='%s %s does not exist' error code=1305 ; Error on slave: actual message='Duplicate entry '2-test' for key 'PRIMARY'', error code=1062. Default database: 'test'. Query: 'insert into t1 ( id,name,monery) values (1,'test',100)', Error_code: 0
再看看主库的t1表是否更新成功:
mysql> select * from t1; +------+------+-----+--------+ | id | name | uid | monery | +------+------+-----+--------+ | 1 | abc | 1 | 100 | | 1 | test | 2 | 100 | +------+------+-----+--------+ 2 rows in set (0.00 sec)
可以看见t1是更新成功了,那么t2表是否更新成功呢?当然无法更新,别忘记有主键约束,这也就是为什么从库会有1062错误。
mysql> select * from t2; +------+------+-----+--------+-----+ | id | name | uid | monery | age | +------+------+-----+--------+-----+ | 1 | abc | 1 | 100 | 0 | | 1 | test | 2 | 100 | 0 | +------+------+-----+--------+-----+
那么最终的原因就是如下:
在主库发生了一个错误,在从库发生了一个错误。主库发生的错误是由于存储过程不存在,但是更新主表成功了,错误编号便记录在了binlog,从库发生错误是由于主键冲突。当遇到2个错误时,日志展现出来的错误最终就是如下:
151125 13:42:46 [ERROR] Slave SQL: Query caused different errors on master and slave. Error on master: message (format)='%s %s does not exist' error code=1305 ; Error on slave: actual message='Duplicate entry '2-test' for key 'PRIMARY'', error code=1062. Default database: 'test'. Query: 'insert into t1 ( id,name,monery) values (1,'test',100)', Error_code: 0
总结:
触发器和存储过程是双刃剑,需要谨慎使用。