诡异的复制错误

最近在线上遇到一个非常诡异的复制错误,困惑了好几天,今天终于知道原因了。打算分享给大家。当时复制报错信息类似如下:

151125 13:42:46 [ERROR] Slave SQL: Query caused different errors on master and slave.     Error on master: message (format)='%s %s does not exist' error code=1305 ; Error on slave: actual message='Duplicate entry '2-test' for key 'PRIMARY'', error code=1062. Default database: 'test'. Query: 'insert into t1 ( id,name,monery) values (1,'test',100)', Error_code: 0

上面的错误是我彻底明白以后模拟出来的。下面再次进行重现。

在test库有2张表,t1,t2,还有触发器,存储过程。

t1,t2表结构如下:

mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) DEFAULT NULL,
  `name` char(20) NOT NULL DEFAULT '',
  `uid` int(11) NOT NULL AUTO_INCREMENT,
  `monery` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`uid`,`name`)
) ENGINE=MyISAM AUTO_INCREMENT=5 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `id` int(11) DEFAULT NULL,
  `name` char(20) NOT NULL DEFAULT '',
  `uid` int(11) NOT NULL AUTO_INCREMENT,
  `monery` int(11) NOT NULL DEFAULT '0',
  `age` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`uid`,`name`)
) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

存储过程如下:

\d //
CREATE PROCEDURE `t1_t2`(OPERATION VARCHAR(20),id_NEW INT,name_NEW char(20),uid_NEW int,monery_NEW int,age_NEW int)
BEGIN

     
        IF OPERATION = 'insert' THEN
                INSERT INTO t2(id,name,uid,monery,age) VALUES (id_NEW,name_NEW,uid_NEW,monery_NEW,age_NEW);
    END IF;
    END
//

触发器如下:

\d //
CREATE TRIGGER `insert_t1` AFTER INSERT ON `t1` FOR EACH ROW begin
        CALL t1_t2('insert', NEW.id, NEW.name,NEW.uid, NEW.monery,0);
    end;
//

其实就是在对t1表进行插入以后,调用存储过程操作t2。
在主库对t1表插入1条记录:

mysql> insert into t1 ( id,name,monery) values (1,'abc',100);
Query OK, 1 row affected (0.01 sec)

mysql> select * from t1;
+------+------+-----+--------+
| id   | name | uid | monery |
+------+------+-----+--------+
|    1 | abc  |   1 |    100 |
+------+------+-----+--------+
1 row in set (0.00 sec)

mysql> select * from t2;
+------+------+-----+--------+-----+
| id   | name | uid | monery | age |
+------+------+-----+--------+-----+
|    1 | abc  |   1 |    100 |   0 |
+------+------+-----+--------+-----+
1 row in set (0.00 sec)

mysql> 

现在查看从库,肯定数据是一致的。

mysql> select * from t1;
+------+------+-----+--------+
| id   | name | uid | monery |
+------+------+-----+--------+
|    1 | abc  |   1 |    100 |
+------+------+-----+--------+
1 row in set (0.00 sec)

mysql> select * from t2;
+------+------+-----+--------+-----+
| id   | name | uid | monery | age |
+------+------+-----+--------+-----+
|    1 | abc  |   1 |    100 |   0 |
+------+------+-----+--------+-----+
1 row in set (0.00 sec)

关键时刻到了。首先在从库的t2表手动插入一条记录:

mysql> insert into t2 ( id,name,monery,age) values (1,'test',100,0);
Query OK, 1 row affected (0.00 sec)

mysql> select * from t2;
+------+------+-----+--------+-----+
| id   | name | uid | monery | age |
+------+------+-----+--------+-----+
|    1 | abc  |   1 |    100 |   0 |
|    1 | test |   2 |    100 |   0 |
+------+------+-----+--------+-----+

可以看到现在t2表多了1条记录,且uid等于2,从表结构可以看到name,uid是联合主键。
下面在主库删掉存储过程,再插入1条记录。

mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

mysql> drop PROCEDURE t1_t2;
Query OK, 0 rows affected (0.00 sec)

mysql> set sql_log_bin=1;

Query OK, 0 rows affected (0.00 sec)

mysql
> insert into t1 ( id,name,monery) values (1,'test',100); ERROR 1305 (42000): PROCEDURE test.t1_t2 does not exist mysql>

可以看到插入提示存储过程不存在,这个也是复制报错的关键。这个错误会在binlog里面记录错误编号,错误编号是1305。那我们看看t1表是否更新成功。

mysql> select * from t1;
+------+------+-----+--------+
| id   | name | uid | monery |
+------+------+-----+--------+
|    1 | abc  |   1 |    100 |
|    1 | test |   2 |    100 |
+------+------+-----+--------+
2 rows in set (0.00 sec)

可以看见t1表已经更新成功了。

我们再来解析主库的binlog看看。

#151125 15:38:53 server id 25173  end_log_pos 4560      Intvar
SET INSERT_ID=2/*!*/;
# at 4560
#151125 15:38:53 server id 25173  end_log_pos 4677      Query   thread_id=1310  exec_time=0     error_code=1305
SET TIMESTAMP=1448437133/*!*/;
insert into t1 ( id,name,monery) values (1,'test',100)
/*!*/;

可以看见提示错误,error_code=1305,
我们现在看看复制状态:

ysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.69.25.173
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000340
          Read_Master_Log_Pos: 4746
               Relay_Log_File: relaylog.000048
                Relay_Log_Pos: 579
        Relay_Master_Log_File: mysql-bin.000340
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: Query caused different errors on master and slave.     Error on master: message (format)='%s %s does not exist' error code=1305 ; Error on slave: actual message='no error', error code=0. Default database: 'test'. Query: 'insert into t1 ( id,name,monery) values (1,'test',100)'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 4464
              Relay_Log_Space: 1010
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: Query caused different errors on master and slave.     Error on master: message (format)='%s %s does not exist' error code=1305 ; Error on slave: actual message='no error', error code=0. Default database: 'test'. Query: 'insert into t1 ( id,name,monery) values (1,'test',100)'
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 25173
1 row in set (0.00 sec)

可以看见已经提示错误了,我们再看看错误日志:

151125 13:42:46 [ERROR] Slave SQL: Query caused different errors on master and slave.     Error on master: message (format)='%s %s does not exist' error code=1305 ; Error on slave: actual message='Duplicate entry '2-test' for key 'PRIMARY'', error code=1062. Default database: 'test'. Query: 'insert into t1 ( id,name,monery) values (1,'test',100)', Error_code: 0

再看看主库的t1表是否更新成功:

mysql> select * from t1;
+------+------+-----+--------+
| id   | name | uid | monery |
+------+------+-----+--------+
|    1 | abc  |   1 |    100 |
|    1 | test |   2 |    100 |
+------+------+-----+--------+
2 rows in set (0.00 sec)

可以看见t1是更新成功了,那么t2表是否更新成功呢?当然无法更新,别忘记有主键约束,这也就是为什么从库会有1062错误。

mysql> select * from t2;
+------+------+-----+--------+-----+
| id   | name | uid | monery | age |
+------+------+-----+--------+-----+
|    1 | abc  |   1 |    100 |   0 |
|    1 | test |   2 |    100 |   0 |
+------+------+-----+--------+-----+

 

那么最终的原因就是如下:

在主库发生了一个错误,在从库发生了一个错误。主库发生的错误是由于存储过程不存在,但是更新主表成功了,错误编号便记录在了binlog,从库发生错误是由于主键冲突。当遇到2个错误时,日志展现出来的错误最终就是如下:

151125 13:42:46 [ERROR] Slave SQL: Query caused different errors on master and slave.     Error on master: message (format)='%s %s does not exist' error code=1305 ; Error on slave: actual message='Duplicate entry '2-test' for key 'PRIMARY'', error code=1062. Default database: 'test'. Query: 'insert into t1 ( id,name,monery) values (1,'test',100)', Error_code: 0

 

总结:

触发器和存储过程是双刃剑,需要谨慎使用。

posted @ 2015-11-25 15:48  yayun  阅读(1636)  评论(3编辑  收藏  举报