1. MySQL 数据恢复常用办法
MySQL恢复的方法一般有三种:
1. 官方推荐的基于全备+binlog , 通常做法是先恢复最近一次的全备,然后通过mysqlbiinlog --start-position --stop-position binlog.000xxx | mysql -uroot -p xxx -S database 恢复到目标数据库做恢复
2. 基于主从同步恢复数据,通常做法是先恢复最近一次的全备,然后恢复后的实例做slave 挂载到现有的master 上面,通过 start slave sql_thread until master_log_pos 恢复到故障前的一个pos。
现在尝试第三种恢复方式, 通过原来主库上面的binlog 把数据都恢复到slave 上。
处理思路:
因为relaylog和binlog本质实际上是一样的,所以是否可以利用MySQL自身的sql_thread来增量binlog
1)重新初始化一个实例,恢复全量备份文件。
2)找到第一个binlog文件的position,和剩下所有的binlog。
3)将binlog伪装成relaylog,通过sql thread增量恢复。
应用场景:
1. 最近的一次全备离故障位置比较远,通过上面两种方式的恢复时间太慢
2. 双主keepalived的集群,由于keepalived没有像MHA 那样有日志补全机制,出故障是有可能会有数据丢失的,万一同步有严重的复制延时出现故障切换到slave,这样数据就不一致,需要做日志补全
2. 实验步骤
1. 建立基于主从同步(这里实验基于传统的pos, 其实GTID 也一样可行)
M1 :
root@localhost:mysql3307.sock [(none)]>select * from restore.t1; +----+------+ | id | c1 | +----+------+ | 1 | 1 | | 2 | 3 | | 3 | 2 | | 4 | 3 | | 5 | 6 | | 6 | 7 | | 7 | 9 | | 10 | NULL | | 11 | 10 | +----+------+ 9 rows in set (0.00 sec)
M2:(slave)
root@localhost:mysql3307.sock [(none)]>select * from restore.t1; +----+------+ | id | c1 | +----+------+ | 1 | 1 | | 2 | 3 | | 3 | 2 | | 4 | 3 | | 5 | 6 | | 6 | 7 | | 7 | 9 | | 10 | NULL | | 11 | 10 | +----+------+ 9 rows in set (0.00 sec)
root@localhost:mysql3307.sock [restore]>show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: m1 Master_User: repl Master_Port: 3307 Connect_Retry: 60 Master_Log_File: 3307-binlog.000002 Read_Master_Log_Pos: 154 Relay_Log_File: M2-relay-bin.000004 Relay_Log_Pos: 371 Relay_Master_Log_File: 3307-binlog.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 154 Relay_Log_Space: 624 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 13307 Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3 Master_Info_File: /data/mysql/3307/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec)
记录此时slave 的 relay-log 信息
[root@M2 data]# more M2-relay-bin.index ./M2-relay-bin.000003 ./M2-relay-bin.000004 [root@M2 data]# more relay-log.info 7 ./M2-relay-bin.000004 371 3307-binlog.000002 154 0 0 1
2. 使用sysbench 模拟数据不同步
[root@M1 logs]# mysqladmin create sbtest
[root@M1 sysbench]# sysbench --db-driver=mysql --mysql-host=m1 --mysql-port=3307 --mysql-user=sbtest --mysql-password='sbtest' /usr/share/sysbench/oltp_common.lua --tables=4 --table-size=100000 --threads=2 --time=60 --report-interval=10 prepare
在主库导入数据的时候在slave端停止同步,制造数据不一致
root@localhost:mysql3307.sock [mysql]>stop slave
3. 等sysbench执行完,查看主库的数据和slave 的数据
主库:
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest1; +----------+ | count(1) | +----------+ | 100000 | +----------+ 1 row in set (0.05 sec) root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest2; +----------+ | count(1) | +----------+ | 100000 | +----------+ 1 row in set (0.05 sec) root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3; +----------+ | count(1) | +----------+ | 100000 | +----------+ 1 row in set (0.05 sec) root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4; +----------+ | count(1) | +----------+ | 100000 | +----------+ 1 row in set (0.05 sec)
slave 端:
root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4; +----------+ | count(1) | +----------+ | 67550 | +----------+ 1 row in set (0.06 sec) root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3; +----------+ | count(1) | +----------+ | 70252 | +----------+ 1 row in set (0.04 sec)
可以看到主从不同步。
4. 此时查看slave 的status:
root@localhost:mysql3307.sock [(none)]>show slave status\G *************************** 1. row *************************** Slave_IO_State: Master_Host: m1 Master_User: repl Master_Port: 3307 Connect_Retry: 60 Master_Log_File: 3307-binlog.000002 Read_Master_Log_Pos: 76364214 Relay_Log_File: M2-relay-bin.000004 Relay_Log_Pos: 64490301 Relay_Master_Log_File: 3307-binlog.000002 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 64490084 Relay_Log_Space: 76364861 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 0 Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3 Master_Info_File: /data/mysql/3307/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec)
由于本地的relay log 没有执行完毕,为了保证实验准确性,我们先让本地的relaylog 执行完 , start slave sql_thread
再次检查:
*************************** 1. row *************************** Slave_IO_State: Master_Host: m1 Master_User: repl Master_Port: 3307 Connect_Retry: 60 Master_Log_File: 3307-binlog.000002 Read_Master_Log_Pos: 76364214 Relay_Log_File: M2-relay-bin.000005 Relay_Log_Pos: 4 Relay_Master_Log_File: 3307-binlog.000002 Slave_IO_Running: No Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 76364214 Relay_Log_Space: 154 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 0 Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3 Master_Info_File: /data/mysql/3307/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec)
本地relaylog 已经全部执行完毕,此时记录最新的relay log 信息:
[root@M2 data]# more relay-log.info
7
./M2-relay-bin.000005
4
3307-binlog.000002
76364214
0
0
1
0
0
1
上面这个信息很重要,说明了从库执行到主库的000002 的binlog的76364214 这个位置,我们下面将主库的binlog 拷贝过来模拟relaylog, 并从这个位置开始恢复
5. 拷贝binlog 到目标端,并模拟成relay log
拷贝前先关闭从库,并修改cnf (skip-slave-start)让slave 不会重启后自动开始复制
[root@M2 data]# ll
total 185248
-rw-r----- 1 root root 461 Oct 24 17:14 3307-binlog.000001
-rw-r----- 1 root root 76364609 Oct 24 17:14 3307-binlog.000002
-rw-r----- 1 root root 203 Oct 24 17:14 3307-binlog.000003
-rw-r----- 1 root root 419 Oct 24 17:14 3307-binlog.000004
-rw-r----- 1 root root 164 Oct 24 17:14 3307-binlog.index
-rw-r----- 1 mysql mysql 56 Oct 24 15:08 auto.cnf
-rw-r----- 1 mysql mysql 4720 Oct 24 17:14 ib_buffer_pool
-rw-r----- 1 mysql mysql 12582912 Oct 24 17:14 ibdata1
-rw-r----- 1 mysql mysql 50331648 Oct 24 17:14 ib_logfile0
-rw-r----- 1 mysql mysql 50331648 Oct 24 17:11 ib_logfile1
-rw-r----- 1 mysql mysql 177 Oct 24 17:14 M2-relay-bin.000005
-rw-r----- 1 mysql mysql 22 Oct 24 17:11 M2-relay-bin.index
-rw-r----- 1 mysql mysql 122 Oct 24 17:14 master.info
drwxr-x--- 2 mysql mysql 4096 Oct 24 15:07 mysql
-rw------- 1 root root 0 Oct 24 15:08 nohup.out
drwxr-x--- 2 mysql mysql 4096 Oct 24 15:07 performance_schema
-rw-r----- 1 mysql mysql 68 Oct 24 17:14 relay-log.info
drwxr-x--- 2 mysql mysql 4096 Oct 24 15:07 restore
drwxr-x--- 2 mysql mysql 4096 Oct 24 16:47 sbtest
drwxr-x--- 2 mysql mysql 12288 Oct 24 15:07 sys
-rw-r----- 1 mysql mysql 24 Oct 24 15:07 xtrabackup_binlog_pos_innodb
-rw-r----- 1 mysql mysql 577 Oct 24 15:07 xtrabackup_info
改名为relay log
[root@M2 data]# cp 3307-binlog.000001 relay.000001 [root@M2 data]# cp 3307-binlog.000002 relay.000002 [root@M2 data]# cp 3307-binlog.000003 relay.000003 [root@M2 data]# cp 3307-binlog.000004 relay.000004
改权限属性
[root@M2 data]# chown mysql.mysql -R *
修改relay log index 文件,让系统能识别
[root@M2 data]# cat M2-relay-bin.index ./relay.000001 ./relay.000002 ./relay.000003 ./relay.000004
修改relay log info 文件,告诉系统从哪个位置开始复制
[root@M2 data]# cat relay-log.info 7 ./relay.000002 76364214 3307-binlog.000002 76364214 0 0 1 0 0 1
最后开起sql_thread 进程开始快速恢复
start slave sql_thread
6. 检查数据是否一致
slave:
oot@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4; +----------+ | count(1) | +----------+ | 100000 | +----------+ 1 row in set (0.05 sec) root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3; +----------+ | count(1) | +----------+ | 100000 | +----------+ 1 row in set (0.05 sec)
可以看到slave 已经把缺失的数据都全部恢复了。