MySQL 5.7 基于复制线程SQL_Thread加快恢复的尝试 - Rayment

公告

1. MySQL 数据恢复常用办法

MySQL恢复的方法一般有三种：

1. 官方推荐的基于全备+binlog ，通常做法是先恢复最近一次的全备，然后通过mysqlbiinlog --start-position --stop-position binlog.000xxx | mysql -uroot -p xxx -S database 恢复到目标数据库做恢复

2. 基于主从同步恢复数据，通常做法是先恢复最近一次的全备，然后恢复后的实例做slave 挂载到现有的master 上面，通过 start slave sql_thread until master_log_pos 恢复到故障前的一个pos。

现在尝试第三种恢复方式，通过原来主库上面的binlog 把数据都恢复到slave 上。

处理思路：

因为relaylog和binlog本质实际上是一样的，所以是否可以利用MySQL自身的sql_thread来增量binlog

1）重新初始化一个实例，恢复全量备份文件。
2）找到第一个binlog文件的position，和剩下所有的binlog。
3）将binlog伪装成relaylog，通过sql thread增量恢复。

应用场景：

1. 最近的一次全备离故障位置比较远，通过上面两种方式的恢复时间太慢

2. 双主keepalived的集群，由于keepalived没有像MHA 那样有日志补全机制，出故障是有可能会有数据丢失的，万一同步有严重的复制延时出现故障切换到slave，这样数据就不一致，需要做日志补全

2. 实验步骤

1. 建立基于主从同步（这里实验基于传统的pos，其实GTID 也一样可行）

M1 ：

root@localhost:mysql3307.sock [(none)]>select * from restore.t1;
+----+------+
| id | c1   |
+----+------+
|  1 | 1    |
|  2 | 3    |
|  3 | 2    |
|  4 | 3    |
|  5 | 6    |
|  6 | 7    |
|  7 | 9    |
| 10 | NULL |
| 11 | 10   |
+----+------+
9 rows in set (0.00 sec)

　M2：（slave）

root@localhost:mysql3307.sock [(none)]>select * from restore.t1;
+----+------+
| id | c1   |
+----+------+
|  1 | 1    |
|  2 | 3    |
|  3 | 2    |
|  4 | 3    |
|  5 | 6    |
|  6 | 7    |
|  7 | 9    |
| 10 | NULL |
| 11 | 10   |
+----+------+
9 rows in set (0.00 sec)

root@localhost:mysql3307.sock [restore]>show slave status\G	
*************************** 1. row ***************************	
               Slave_IO_State: Waiting for master to send event	
                  Master_Host: m1	
                  Master_User: repl	
                  Master_Port: 3307	
                Connect_Retry: 60	
              Master_Log_File: 3307-binlog.000002	
          Read_Master_Log_Pos: 154	
               Relay_Log_File: M2-relay-bin.000004	
                Relay_Log_Pos: 371	
        Relay_Master_Log_File: 3307-binlog.000002	
             Slave_IO_Running: Yes	
            Slave_SQL_Running: Yes	
              Replicate_Do_DB: 	
          Replicate_Ignore_DB: 	
           Replicate_Do_Table: 	
       Replicate_Ignore_Table: 	
      Replicate_Wild_Do_Table: 	
  Replicate_Wild_Ignore_Table: 	
                   Last_Errno: 0	
                   Last_Error: 	
                 Skip_Counter: 0	
          Exec_Master_Log_Pos: 154	
              Relay_Log_Space: 624	
              Until_Condition: None	
               Until_Log_File: 	
                Until_Log_Pos: 0	
           Master_SSL_Allowed: No	
           Master_SSL_CA_File: 	
           Master_SSL_CA_Path: 	
              Master_SSL_Cert: 	
            Master_SSL_Cipher: 	
               Master_SSL_Key: 	
        Seconds_Behind_Master: 0	
Master_SSL_Verify_Server_Cert: No	
                Last_IO_Errno: 0	
                Last_IO_Error: 	
               Last_SQL_Errno: 0	
               Last_SQL_Error: 	
  Replicate_Ignore_Server_Ids: 	
             Master_Server_Id: 13307	
                  Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3	
             Master_Info_File: /data/mysql/3307/data/master.info	
                    SQL_Delay: 0	
          SQL_Remaining_Delay: NULL	
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates	
           Master_Retry_Count: 86400	
                  Master_Bind: 	
      Last_IO_Error_Timestamp: 	
     Last_SQL_Error_Timestamp: 	
               Master_SSL_Crl: 	
           Master_SSL_Crlpath: 	
           Retrieved_Gtid_Set: 	
            Executed_Gtid_Set: 	
                Auto_Position: 0	
         Replicate_Rewrite_DB: 	
                 Channel_Name: 	
           Master_TLS_Version: 	
1 row in set (0.00 sec)

　记录此时slave 的 relay-log 信息

[root@M2 data]# more M2-relay-bin.index 
./M2-relay-bin.000003
./M2-relay-bin.000004

[root@M2 data]# more relay-log.info 
7
./M2-relay-bin.000004
371
3307-binlog.000002
154
0
0
1

　2. 使用sysbench 模拟数据不同步

[root@M1 logs]# mysqladmin create sbtest

[root@M1 sysbench]# sysbench --db-driver=mysql --mysql-host=m1 --mysql-port=3307 --mysql-user=sbtest --mysql-password='sbtest' /usr/share/sysbench/oltp_common.lua --tables=4 --table-size=100000 --threads=2 --time=60 --report-interval=10 prepare

　　在主库导入数据的时候在slave端停止同步，制造数据不一致

root@localhost:mysql3307.sock [mysql]>stop slave

　3. 等sysbench执行完，查看主库的数据和slave 的数据

主库：

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest1;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest2;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

　　slave 端：

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4;
+----------+
| count(1) |
+----------+
|    67550 |
+----------+
1 row in set (0.06 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3;
+----------+
| count(1) |
+----------+
|    70252 |
+----------+
1 row in set (0.04 sec)

　　可以看到主从不同步。

4. 此时查看slave 的status：

root@localhost:mysql3307.sock [(none)]>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: m1
                  Master_User: repl
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: 3307-binlog.000002
          Read_Master_Log_Pos: 76364214
               Relay_Log_File: M2-relay-bin.000004
                Relay_Log_Pos: 64490301
        Relay_Master_Log_File: 3307-binlog.000002
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 64490084
              Relay_Log_Space: 76364861
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 0
                  Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3
             Master_Info_File: /data/mysql/3307/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

　由于本地的relay log 没有执行完毕，为了保证实验准确性，我们先让本地的relaylog 执行完 , start slave sql_thread

再次检查：

*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: m1
                  Master_User: repl
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: 3307-binlog.000002
          Read_Master_Log_Pos: 76364214
               Relay_Log_File: M2-relay-bin.000005
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 3307-binlog.000002
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 76364214
              Relay_Log_Space: 154
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 0
                  Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3
             Master_Info_File: /data/mysql/3307/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

　　本地relaylog 已经全部执行完毕，此时记录最新的relay log 信息：

[root@M2 data]# more relay-log.info 
7
./M2-relay-bin.000005
4
3307-binlog.000002
76364214
0
0
1

0
0
1

　　上面这个信息很重要，说明了从库执行到主库的000002 的binlog的76364214 这个位置，我们下面将主库的binlog 拷贝过来模拟relaylog，并从这个位置开始恢复

5. 拷贝binlog 到目标端，并模拟成relay log

拷贝前先关闭从库，并修改cnf (skip-slave-start)让slave 不会重启后自动开始复制

[root@M2 data]# ll
total 185248
-rw-r----- 1 root  root       461 Oct 24 17:14 3307-binlog.000001
-rw-r----- 1 root  root  76364609 Oct 24 17:14 3307-binlog.000002
-rw-r----- 1 root  root       203 Oct 24 17:14 3307-binlog.000003
-rw-r----- 1 root  root       419 Oct 24 17:14 3307-binlog.000004
-rw-r----- 1 root  root       164 Oct 24 17:14 3307-binlog.index
-rw-r----- 1 mysql mysql       56 Oct 24 15:08 auto.cnf
-rw-r----- 1 mysql mysql     4720 Oct 24 17:14 ib_buffer_pool
-rw-r----- 1 mysql mysql 12582912 Oct 24 17:14 ibdata1
-rw-r----- 1 mysql mysql 50331648 Oct 24 17:14 ib_logfile0
-rw-r----- 1 mysql mysql 50331648 Oct 24 17:11 ib_logfile1
-rw-r----- 1 mysql mysql      177 Oct 24 17:14 M2-relay-bin.000005
-rw-r----- 1 mysql mysql       22 Oct 24 17:11 M2-relay-bin.index
-rw-r----- 1 mysql mysql      122 Oct 24 17:14 master.info
drwxr-x--- 2 mysql mysql     4096 Oct 24 15:07 mysql
-rw------- 1 root  root         0 Oct 24 15:08 nohup.out
drwxr-x--- 2 mysql mysql     4096 Oct 24 15:07 performance_schema
-rw-r----- 1 mysql mysql       68 Oct 24 17:14 relay-log.info
drwxr-x--- 2 mysql mysql     4096 Oct 24 15:07 restore
drwxr-x--- 2 mysql mysql     4096 Oct 24 16:47 sbtest
drwxr-x--- 2 mysql mysql    12288 Oct 24 15:07 sys
-rw-r----- 1 mysql mysql       24 Oct 24 15:07 xtrabackup_binlog_pos_innodb
-rw-r----- 1 mysql mysql      577 Oct 24 15:07 xtrabackup_info

　改名为relay log

[root@M2 data]# cp 3307-binlog.000001 relay.000001
[root@M2 data]# cp 3307-binlog.000002 relay.000002
[root@M2 data]# cp 3307-binlog.000003 relay.000003
[root@M2 data]# cp 3307-binlog.000004 relay.000004

改权限属性

[root@M2 data]# chown mysql.mysql -R *

　修改relay log index 文件，让系统能识别

[root@M2 data]# cat M2-relay-bin.index	
	./relay.000001
	./relay.000002
	./relay.000003
	./relay.000004

　修改relay log info 文件，告诉系统从哪个位置开始复制

[root@M2 data]# cat relay-log.info	
	7
	./relay.000002
	76364214
	3307-binlog.000002
	76364214
	0
	0
	1
	
	0
	0
	1

　最后开起sql_thread 进程开始快速恢复

start slave sql_thread

　6. 检查数据是否一致

slave:

oot@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

　可以看到slave 已经把缺失的数据都全部恢复了。

posted on 2017-10-24 18:06 Rayment 阅读(2998) 评论(1) 收藏举报

刷新页面返回顶部