主从复制中定位回放慢涉及的表
主从复制中定位回放慢涉及的表
一、前提
世界千奇百怪,每个人都有自己独立的思想,有些事情即使你附耳告知,也可能如风般吹过,进而消逝,为了性能为了不延迟,表要加索引嘛,然而在某业务场景,业务表数千张,无索引的表几百张,这些表都是上百万的数据。
二、现象
在 GreatSQL 主从架构中,某天在系统资源充足的情况下,主从突然延迟,而且持续增长,我们通过SHOW PROCESSLIST
和 SHOW SLAVE STATUS
观察是由于回放 DELETE
事务造成的,但是 GTID``
在不断地增长,不过增长的非常缓慢,但是平时的时候是没有如此缓慢的,我们该如何快速的定位这些回放缓慢的 GTID
的涉及表呢,接下来就在测试环境演示下如何定位。
三、模拟测试
1、主节点创建库
greatsql> CREATE DATABASE fcmark; Query OK, 1 row affected (0.03 sec)
2、准备数据
$ sysbench --db-driver=mysql --mysql-host=192.168.139.230 --mysql-port=3307 --mysql-user=greatsql --mysql-password=greatsql@2024 --mysql-db=fcmark --table_size=10000000 --tables=5 oltp_write_only prepare sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3) Creating table 'sbtest1'... Inserting 10000000 records into 'sbtest1' Creating a secondary index on 'sbtest1'... Creating table 'sbtest2'... Inserting 10000000 records into 'sbtest2' Creating a secondary index on 'sbtest2'... Creating table 'sbtest3'... Inserting 10000000 records into 'sbtest3' Creating a secondary index on 'sbtest3'... Creating table 'sbtest4'... Inserting 10000000 records into 'sbtest4' Creating a secondary index on 'sbtest4'... Creating table 'sbtest5'... Inserting 10000000 records into 'sbtest5' Creating a secondary index on 'sbtest5'...
3、主节点修改表结构,将表修改为无索引表
greatsql> ALTER TABLE sbtest1 MODIFY id int not null; Query OK, 10000000 rows affected (3 min 2.62 sec) Records: 10000000 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest1 DROP key `k_1`; Query OK, 0 rows affected (0.15 sec) Records: 0 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest1 DROP PRIMARY KEY; Query OK, 10000000 rows affected (1 min 50.19 sec) Records: 10000000 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest2 MODIFY id int not null; Query OK, 10000000 rows affected (3 min 2.62 sec) Records: 10000000 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest2 DROP key `k_2`; Query OK, 0 rows affected (0.15 sec) Records: 0 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest2 DROP PRIMARY KEY; Query OK, 10000000 rows affected (1 min 50.19 sec) Records: 10000000 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest3 MODIFY id int not null; Query OK, 10000000 rows affected (3 min 2.72 sec) Records: 10000000 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest3 DROP key `k_3`; Query OK, 0 rows affected (0.15 sec) Records: 0 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest3 DROP PRIMARY KEY; Query OK, 10000000 rows affected (1 min 52.19 sec) Records: 10000000 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest4 MODIFY id int not null; Query OK, 10000000 rows affected (3 min 3.64 sec) Records: 10000000 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest4 DROP key `k_4`; Query OK, 0 rows affected (0.15 sec) Records: 0 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest4 DROP PRIMARY KEY; Query OK, 10000000 rows affected (1 min 53.19 sec) Records: 10000000 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest5 MODIFY id int not null; Query OK, 10000000 rows affected (3 min 5.62 sec) Records: 10000000 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest5 DROP key `k_5`; Query OK, 0 rows affected (0.15 sec) Records: 0 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest5 DROP PRIMARY KEY; Query OK, 10000000 rows affected (1 min 51 sec) Records: 10000000 Duplicates: 0 Warnings: 0
4、主库操作数据模拟延迟
greatsql> DELETE FROM sbtest1 WHERE k<200; Query OK, 214 rows affected (41.14 sec) greatsql> DELETE FROM sbtest2 WHERE k<200; Query OK, 193 rows affected (41.58 sec) greatsql> DELETE FROM sbtest3 WHERE k<200; Query OK, 177 rows affected (40.00 sec) greatsql> DELETE FROM sbtest4 WHERE k<200; Query OK, 184 rows affected (39.38 sec) greatsql> DELETE FROM sbtest5 WHERE k<200; Query OK, 212 rows affected (37.56 sec) greatsql> DELETE FROM sbtest1 WHERE k<400; Query OK, 182 rows affected (39.09 sec) greatsql> DELETE FROM sbtest2 WHERE k<400; Query OK, 193 rows affected (38.21 sec) greatsql> DELETE FROM sbtest3 WHERE k<400; Query OK, 215 rows affected (34.45 sec) greatsql> DELETE FROM sbtest4 WHERE k<400; Query OK, 219 rows affected (37.45 sec) greatsql> DELETE FROM sbtest5 WHERE k<400; Query OK, 224 rows affected (34.63 sec) greatsql> DELETE FROM sbtest1 WHERE k<600; Query OK, 185 rows affected (34.11 sec) greatsql> DELETE FROM sbtest2 WHERE k<600; Query OK, 245 rows affected (35.99 sec) greatsql> DELETE FROM sbtest3 WHERE k<600; Query OK, 173 rows affected (36.08 sec) greatsql> DELETE FROM sbtest4 WHERE k<600; Query OK, 230 rows affected (36.18 sec) greatsql> DELETE FROM sbtest5 WHERE k<600; Query OK, 186 rows affected (38.00 sec)
等待执行完
5、观察从库:查看延迟及GTID
通过 SHOW PROCESSLIST
观察到回放数据的时候是Applying batch of row changes (delete)
通过SHOW SLAVE STATUS
观察到:数据库一直在应用133171 GTID
,主库的binlog位置对应的位点是binlog.000064
和714314051
注意:
- Relay_Master_Log_File:SQL线程执行的最新事件对应在主库上的binlog日志文件名称
- Exec_Master_Log_Pos:SQL县城已经读取和执行过程的中继日志,对应主库binlog日志文件的位点
- Retrieved_Gtid_Set:从库已经接受到的GTID集合(I/O线程)
- Executed_Gtid_Set:已经被写进binlog的GTID集合(SQL线程)
greatsql> SHOW PROCESSLIST; +--------+-----------------+----------------------+--------+---------+---------+---------------------------------------------+------------------+------------+-----------+---------------+ | Id | User | Host | db | Command | Time | State | Info | Time_ms | Rows_sent | Rows_examined | +--------+-----------------+----------------------+--------+---------+---------+---------------------------------------------+------------------+------------+-----------+---------------+ | 5 | event_scheduler | localhost | NULL | Daemon | 2356467 | Waiting on empty queue | NULL | 2356467469 | 0 | 0 | | 17 | system user | connecting host | NULL | Connect | 2356460 | Waiting for source to send event | NULL | 2356459985 | 0 | 0 | | 18 | system user | | NULL | Query | 99 | Waiting for dependent transaction to commit | NULL | 99081 | 0 | 0 | | 19 | system user | | fcmark | Query | 697 | Applying batch of row changes (delete) | NULL | 32977 | 0 | 0 | | 20 | system user | | NULL | Query | 3244 | Waiting for an event from Coordinator | NULL | 3242579 | 0 | 0 | | 21 | system user | | NULL | Connect | 2356460 | Waiting for an event from Coordinator | NULL | 2356459981 | 0 | 0 | | 22 | system user | | NULL | Connect | 2356460 | Waiting for an event from Coordinator | NULL | 2356459980 | 0 | 0 | | 540493 | greatsql | 172.17.136.93:34298 | NULL | Sleep | 83 | | NULL | 83365 | 0 | 0 | | 540593 | greatsql | 172.17.139.230:51514 | NULL | Query | 0 | init | show processlist | 0 | 0 | 0 | | 540660 | greatsql | 172.17.136.93:46212 | NULL | Sleep | 383 | | NULL | 383304 | 0 | 0 | | 540730 | greatsql | 172.17.136.93:51128 | NULL | Sleep | 143 | | NULL | 143319 | 0 | 0 | | 540758 | greatsql | 172.17.136.93:53112 | NULL | Sleep | 23 | | NULL | 23205 | 0 | 0 | | 540763 | greatsql | 172.17.139.230:52204 | NULL | Sleep | 1 | | NULL | 1246 | 0 | 0 | +--------+-----------------+----------------------+--------+---------+---------+---------------------------------------------+------------------+------------+-----------+---------------+ 13 rows in set (0.00 sec) greatsql> SHOW SLAVE STATUS \G *************************** 1. row *************************** Slave_IO_State: Waiting for source to send event Master_Host: 172.17.139.230 Master_User: greatsql Master_Port: 3307 Connect_Retry: 60 Master_Log_File: binlog.000064 Read_Master_Log_Pos: 714784459 Relay_Log_File: relaylog.000189 Relay_Log_Pos: 714314261 Relay_Master_Log_File: binlog.000064 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 714314051 Relay_Log_Space: 714784956 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 700 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1000403307 Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Waiting for dependent transaction to commit Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-133182 Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-133170 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: Master_public_key_path: Get_master_public_key: 0 Network_Namespace: 1 row in set, 1 warning (0.00 sec) greatsql> SHOW SLAVE STATUS \G *************************** 1. row *************************** Slave_IO_State: Waiting for source to send event Master_Host: 172.17.139.230 Master_User: greatsql Master_Port: 3307 Connect_Retry: 60 Master_Log_File: binlog.000064 Read_Master_Log_Pos: 714784459 Relay_Log_File: relaylog.000189 Relay_Log_Pos: 714314261 Relay_Master_Log_File: binlog.000064 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 714314051 Relay_Log_Space: 714784956 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 704 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1000403307 Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Waiting for dependent transaction to commit Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-133182 Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-133170 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: Master_public_key_path: Get_master_public_key: 0 Network_Namespace: 1 row in set, 1 warning (0.00 sec)
6、分析数据
主节点操作,根据从节点观察的信息,在主节点上观察 binlog 信息,回放内容是关于 fcmark.sbtest4``
表的多个DELETE
事件,观察该表结构是无索引,并且数据量接近 1千万
greatsql> SHOW BINLOG EVENTS IN 'binlog.000064' FROM 714314051 LIMIT 50; +---------------+-----------+-------------+------------+-------------+------------------------------------------------------------------------+ | Log_name | Pos | Event_type | Server_id | End_log_pos | Info | +---------------+-----------+-------------+------------+-------------+------------------------------------------------------------------------+ | binlog.000064 | 714314051 | Gtid | 1000403307 | 714314130 | SET @@SESSION.GTID_NEXT= '1d0963e9-85d9-11ef-80e1-00163e28e06a:133171' | | binlog.000064 | 714314130 | Query | 1000403307 | 714314207 | BEGIN | | binlog.000064 | 714314207 | Table_map | 1000403307 | 714314274 | table_id: 207 (fcmark.sbtest4) | | binlog.000064 | 714314274 | Delete_rows | 1000403307 | 714322479 | table_id: 207 | | binlog.000064 | 714322479 | Delete_rows | 1000403307 | 714330684 | table_id: 207 | | binlog.000064 | 714330684 | Delete_rows | 1000403307 | 714338889 | table_id: 207 | | binlog.000064 | 714338889 | Delete_rows | 1000403307 | 714347094 | table_id: 207 | | binlog.000064 | 714347094 | Delete_rows | 1000403307 | 714349409 | table_id: 207 flags: STMT_END_F | | binlog.000064 | 714349409 | Xid | 1000403307 | 714349440 | COMMIT /* xid=6241839 */ | | binlog.000064 | 714349440 | Gtid | 1000403307 | 714349519 | SET @@SESSION.GTID_NEXT= '1d0963e9-85d9-11ef-80e1-00163e28e06a:133172' | | binlog.000064 | 714349519 | Query | 1000403307 | 714349596 | BEGIN | | binlog.000064 | 714349596 | Table_map | 1000403307 | 714349663 | table_id: 210 (fcmark.sbtest5) | | binlog.000064 | 714349663 | Delete_rows | 1000403307 | 714357868 | table_id: 210 | | binlog.000064 | 714357868 | Delete_rows | 1000403307 | 714366073 | table_id: 210 | | binlog.000064 | 714366073 | Delete_rows | 1000403307 | 714374278 | table_id: 210 | | binlog.000064 | 714374278 | Delete_rows | 1000403307 | 714382483 | table_id: 210 | | binlog.000064 | 714382483 | Delete_rows | 1000403307 | 714390118 | table_id: 210 flags: STMT_END_F | | binlog.000064 | 714390118 | Xid | 1000403307 | 714390149 | COMMIT /* xid=6241935 */ | | binlog.000064 | 714390149 | Gtid | 1000403307 | 714390228 | SET @@SESSION.GTID_NEXT= '1d0963e9-85d9-11ef-80e1-00163e28e06a:133173' | | binlog.000064 | 714390228 | Query | 1000403307 | 714390305 | BEGIN | | binlog.000064 | 714390305 | Table_map | 1000403307 | 714390372 | table_id: 198 (fcmark.sbtest1) | | binlog.000064 | 714390372 | Delete_rows | 1000403307 | 714398577 | table_id: 198 | | binlog.000064 | 714398577 | Delete_rows | 1000403307 | 714406782 | table_id: 198 | | binlog.000064 | 714406782 | Delete_rows | 1000403307 | 714414987 | table_id: 198 | | binlog.000064 | 714414987 | Delete_rows | 1000403307 | 714423192 | table_id: 198 | | binlog.000064 | 714423192 | Delete_rows | 1000403307 | 714425127 | table_id: 198 flags: STMT_END_F | | binlog.000064 | 714425127 | Xid | 1000403307 | 714425158 | COMMIT /* xid=6242038 */ | | binlog.000064 | 714425158 | Gtid | 1000403307 | 714425237 | SET @@SESSION.GTID_NEXT= '1d0963e9-85d9-11ef-80e1-00163e28e06a:133174' | | binlog.000064 | 714425237 | Query | 1000403307 | 714425314 | BEGIN | | binlog.000064 | 714425314 | Table_map | 1000403307 | 714425381 | table_id: 201 (fcmark.sbtest2) | | binlog.000064 | 714425381 | Delete_rows | 1000403307 | 714433586 | table_id: 201 | | binlog.000064 | 714433586 | Delete_rows | 1000403307 | 714441791 | table_id: 201 | | binlog.000064 | 714441791 | Delete_rows | 1000403307 | 714449996 | table_id: 201 | | binlog.000064 | 714449996 | Delete_rows | 1000403307 | 714458201 | table_id: 201 | | binlog.000064 | 714458201 | Delete_rows | 1000403307 | 714462226 | table_id: 201 flags: STMT_END_F | | binlog.000064 | 714462226 | Xid | 1000403307 | 714462257 | COMMIT /* xid=6242139 */ | | binlog.000064 | 714462257 | Gtid | 1000403307 | 714462336 | SET @@SESSION.GTID_NEXT= '1d0963e9-85d9-11ef-80e1-00163e28e06a:133175' | | binlog.000064 | 714462336 | Query | 1000403307 | 714462413 | BEGIN | | binlog.000064 | 714462413 | Table_map | 1000403307 | 714462480 | table_id: 204 (fcmark.sbtest3) | | binlog.000064 | 714462480 | Delete_rows | 1000403307 | 714470685 | table_id: 204 | | binlog.000064 | 714470685 | Delete_rows | 1000403307 | 714478890 | table_id: 204 | | binlog.000064 | 714478890 | Delete_rows | 1000403307 | 714487095 | table_id: 204 | | binlog.000064 | 714487095 | Delete_rows | 1000403307 | 714495300 | table_id: 204 | | binlog.000064 | 714495300 | Delete_rows | 1000403307 | 714503505 | table_id: 204 flags: STMT_END_F | | binlog.000064 | 714503505 | Xid | 1000403307 | 714503536 | COMMIT /* xid=6242232 */ | | binlog.000064 | 714503536 | Gtid | 1000403307 | 714503615 | SET @@SESSION.GTID_NEXT= '1d0963e9-85d9-11ef-80e1-00163e28e06a:133176' | | binlog.000064 | 714503615 | Query | 1000403307 | 714503692 | BEGIN | | binlog.000064 | 714503692 | Table_map | 1000403307 | 714503759 | table_id: 207 (fcmark.sbtest4) | | binlog.000064 | 714503759 | Delete_rows | 1000403307 | 714511964 | table_id: 207 | | binlog.000064 | 714511964 | Delete_rows | 1000403307 | 714520169 | table_id: 207 | +---------------+-----------+-------------+------------+-------------+------------------------------------------------------------------------+ 50 rows in set (0.00 sec) greatsql> SHOW CREATE TABLE fcmark.sbtest4\G *************************** 1. row *************************** Table: sbtest4 Create Table: CREATE TABLE `sbtest4` ( `id` int NOT NULL, `k` int NOT NULL DEFAULT '0', `c` char(120) NOT NULL DEFAULT '', `pad` char(60) NOT NULL DEFAULT '' ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 1 row in set (0.00 sec) greatsql>SELECT count(1) FROM fcmark.sbtest4; +----------+ | count(1) | +----------+ | 9999367 | +----------+ 1 row in set (11.35 sec)
四、总结
使用SHOW PROCESSLIST
和SHOW SLAVE STATUS
观察是数据库正在回放数据,找到正在回放的主库的binlog位点以及正在回放的GTID,在主库上使用 SHOW BINLOG EVENTS IN binlog文件 FROM 位点 LIMIT 步长
,来找到对应的表结构和事件类型,进而观察表结构,该表无索引且数据量大,所以导致从库回放的时候比较慢,至于处理办法,可以临时KILL 掉回放线程,然后加上索引,在重启回放线程,具体细节就不详细展开了。
Enjoy GreatSQL 😃
关于 GreatSQL
GreatSQL是适用于金融级应用的国内自主开源数据库,具备高性能、高可靠、高易用性、高安全等多个核心特性,可以作为MySQL或Percona Server的可选替换,用于线上生产环境,且完全免费并兼容MySQL或Percona Server。
相关链接: GreatSQL社区 Gitee GitHub Bilibili
GreatSQL社区:
社区博客有奖征稿详情:https://greatsql.cn/thread-100-1-1.html
技术交流群:
微信:扫码添加
GreatSQL社区助手
微信好友,发送验证信息加群
。
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】凌霞软件回馈社区,博客园 & 1Panel & Halo 联合会员上线
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】博客园社区专享云产品让利特惠,阿里云新客6.5折上折
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· DeepSeek “源神”启动!「GitHub 热点速览」
· 我与微信审核的“相爱相杀”看个人小程序副业
· 微软正式发布.NET 10 Preview 1:开启下一代开发框架新篇章
· C# 集成 DeepSeek 模型实现 AI 私有化(本地部署与 API 调用教程)
· spring官宣接入deepseek,真的太香了~
2023-02-17 故障案例:MySQL唯一索引有重复值,官方却说This is not a bug