记一次 RR 与 RC 死锁问题排查
记录一下开发过程中遇到的死锁问题和解决方案
首先是隔离级别,mysql有四个隔离级别,从宽松到严格依次是
- 读未提交 Read Uncommitted
- 读提交 Read Committed (线上环境使用的隔离级别)
- 可重复读 Repeatable Read (Mysql 默认的隔离级别)
- 串行化 Serializable
第一次死锁
背景: 在db中无记录时,并发插入会产生死锁
首先是第一次遇到的死锁问题, 通过命令
show engine innodb status |
查看死锁日志如下:
===================================== 2021-11-15 14:45:15 0x70000908f000 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 53 seconds ----------------- BACKGROUND THREAD ----------------- srv_master_thread loops: 436 srv_active, 0 srv_shutdown, 136216 srv_idle srv_master_thread log flush and writes: 0 ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 14531 OS WAIT ARRAY INFO: signal count 12688 RW-shared spins 0, rounds 0, OS waits 0 RW-excl spins 0, rounds 0, OS waits 0 RW-sx spins 0, rounds 0, OS waits 0 Spin rounds per wait: 0.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx ------------------------ LATEST DETECTED DEADLOCK ------------------------ 2021-11-15 14:37:11 0x700004455000 *** (1) TRANSACTION: TRANSACTION 47513, ACTIVE 0 sec inserting mysql tables in use 1, locked 1 LOCK WAIT 3 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 1 MySQL thread id 26344, OS thread handle 123145497554944, query id 441640 localhost 127.0.0.1 root update INSERT IGNORE INTO `wms_product` (`app_id`,`wms_warehouse_id`,`sku_id`,`amount`,`period_amount`,`trans_amount`,`version`,`update_time`,`create_time`) VALUES (2,1,'C-调拨',1,0,0,1,'2021-11-15 14:37:11.698119','2021-11-15 14:37:11.698119') *** (1) HOLDS THE LOCK(S): RECORD LOCKS space id 179 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 47513 lock mode S waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 8; hex 432de8b083e68ba8; asc C- ;; 1: len 8; hex 8000000000000001; asc ;; 2: len 4; hex 80000002; asc ;; 3: len 8; hex 8000000000000056; asc V;; *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 179 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 47513 lock mode S waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 8; hex 432de8b083e68ba8; asc C- ;; 1: len 8; hex 8000000000000001; asc ;; 2: len 4; hex 80000002; asc ;; 3: len 8; hex 8000000000000056; asc V;; *** (2) TRANSACTION: TRANSACTION 47503, ACTIVE 0 sec inserting mysql tables in use 1, locked 1 LOCK WAIT 24 lock struct(s), heap size 3488, 21 row lock(s), undo log entries 4 MySQL thread id 26392, OS thread handle 123145430462464, query id 441871 localhost 127.0.0.1 root update INSERT IGNORE INTO `wms_product` (`app_id`,`wms_warehouse_id`,`sku_id`,`amount`,`period_amount`,`trans_amount`,`version`,`update_time`,`create_time`) VALUES (2,1,'B并发入库',1,0,0,1,'2021-11-15 14:37:11.682939','2021-11-15 14:37:11.682939') *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 179 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 47503 lock_mode X locks rec but not gap Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 8; hex 432de8b083e68ba8; asc C- ;; 1: len 8; hex 8000000000000001; asc ;; 2: len 4; hex 80000002; asc ;; 3: len 8; hex 8000000000000056; asc V;; *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 179 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 47503 lock_mode X locks gap before rec insert intention waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 8; hex 432de8b083e68ba8; asc C- ;; 1: len 8; hex 8000000000000001; asc ;; 2: len 4; hex 80000002; asc ;; 3: len 8; hex 8000000000000056; asc V;; *** WE ROLL BACK TRANSACTION (1) ------------ TRANSACTIONS ------------ Trx id counter 47828 Purge done for trx's n:o < 47828 undo n:o < 0 state: running but idle History list length 0 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 281479673057184, not started 0 lock struct(s), heap size 1128, 0 row lock(s) ---TRANSACTION 281479673056392, not started 0 lock struct(s), heap size 1128, 0 row lock(s) ---TRANSACTION 281479673055600, not started 0 lock struct(s), heap size 1128, 0 row lock(s) ---TRANSACTION 281479673054808, not started 0 lock struct(s), heap size 1128, 0 row lock(s) -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: waiting for i/o request (log thread) I/O thread 2 state: waiting for i/o request (read thread) I/O thread 3 state: waiting for i/o request (read thread) I/O thread 4 state: waiting for i/o request (read thread) I/O thread 5 state: waiting for i/o request (read thread) I/O thread 6 state: waiting for i/o request (write thread) I/O thread 7 state: waiting for i/o request (write thread) I/O thread 8 state: waiting for i/o request (write thread) I/O thread 9 state: waiting for i/o request (write thread) Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] , ibuf aio reads:, log i/o's:, sync i/o's: Pending flushes (fsync) log: 0; buffer pool: 2 1644 OS file reads, 161796 OS file writes, 113308 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 5.41 writes/s, 2.28 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf: size 1, free list len 0, seg size 2, 0 merges merged operations: insert 0, delete mark 0, delete 0 discarded operations: insert 0, delete mark 0, delete 0 Hash table size 34679, node heap has 1 buffer(s) Hash table size 34679, node heap has 1 buffer(s) Hash table size 34679, node heap has 1 buffer(s) Hash table size 34679, node heap has 2 buffer(s) Hash table size 34679, node heap has 2 buffer(s) Hash table size 34679, node heap has 2 buffer(s) Hash table size 34679, node heap has 2 buffer(s) Hash table size 34679, node heap has 5 buffer(s) 0.00 hash searches/s, 0.00 non-hash searches/s --- LOG --- Log sequence number 56679186 Log buffer assigned up to 56679186 Log buffer completed up to 56679186 Log written up to 56679186 Log flushed up to 56679186 Added dirty pages up to 56679186 Pages flushed up to 56679186 Last checkpoint at 56679186 86480 log i/o's done, 0.00 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ---------------------- Total large memory allocated 0 Dictionary memory allocated 504140 Buffer pool size 8191 Free buffers 5147 Database pages 3028 Old database pages 1097 Modified db pages 0 Pending reads 0 Pending writes: LRU 0, flush list 0, single page 0 Pages made young 668, not young 149 0.00 youngs/s, 0.00 non-youngs/s Pages read 1086, created 1943, written 58798 0.00 reads/s, 0.00 creates/s, 0.00 writes/s No buffer pool page gets since the last printout Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s LRU len: 3028, unzip_LRU len: 0 I/O sum[229]:cur[0], unzip sum[0]:cur[0] -------------- ROW OPERATIONS -------------- 0 queries inside InnoDB, 0 queries in queue 0 read views open inside InnoDB Process ID=555, Main thread ID=0x7000048f0000 , state=sleeping Number of rows inserted 21319, updated 39868, deleted 3791, read 180746 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s Number of system rows inserted 837, updated 3436, deleted 524, read 39358 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s ---------------------------- END OF INNODB MONITOR OUTPUT ============================
可以看到是针对仅一张表`wms`.`wms_product`进行插入操作时产生的死锁。
代码中关于 表`wms`.`wms_product`的操作示意是
- Select * from wms_product for update –- 尝试加锁
- Insert xxx (在未查询到的情况下插入一条记录)
原因: 此时是在本地环境,隔离级别是默认的RR (特别提醒,直接在mysql通过命令修改隔离级别不会生效/仅对当前会话生效, 需要修改配置文件后重启)
在RR级别下 mysql为解决 幻读 的问题,引入了间隙锁 gap lock. 间隙锁锁住的不是一行记录,而是行与行之间的间隙 (所谓幻读就是指在一次事务中, 前后两次查询同一个范围的时候,后一次查询看到了前一次查询没有看到的记录. 成因是S或X锁只能锁住已存在的行,不能阻止别人插入)
间隙锁与间隙锁之间不互斥,间隙锁仅与插入操作互斥
理解了上面的间隙锁就很容易理解死锁产生的原因了,简单的画了下时序图:
时刻
|
Session1
|
Session2
|
备注
|
---|---|---|---|
T1 | 获取Gap lock | 获取Gap lock | 同时获取间隙锁,不互斥 |
T2 | Insert xxx (等待Session2的 间隙锁) | Insert xxx (等待Session1的 间隙锁) | 互相等待 |
T3 | 死锁 |
备注: 这里经过和雷神大佬的讨论,单纯针对这种情况而不考虑业务下第一次的for update应该可以不加,因为后面的 Insert 或 Update 都会有X锁
第二次死锁
背景: 在db中无记录时,并发插入会产生死锁
通过修改配置文件并重启成功更新了本地的隔离级别为RC, 不再有间隙锁的问题。 但业务方面也有一些变化,引入了 product_batchno 批次表,操作时需要对两张表同时进行更新.
死锁日志如下:
===================================== 2021-11-30 10:24:59 0x70000e150000 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 42 seconds ----------------- BACKGROUND THREAD ----------------- srv_master_thread loops: 45 srv_active, 0 srv_shutdown, 129307 srv_idle srv_master_thread log flush and writes: 0 ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 592 OS WAIT ARRAY INFO: signal count 478 RW-shared spins 0, rounds 0, OS waits 0 RW-excl spins 0, rounds 0, OS waits 0 RW-sx spins 0, rounds 0, OS waits 0 Spin rounds per wait: 0.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx ------------------------ LATEST DETECTED DEADLOCK ------------------------ 2021-11-30 10:24:32 0x70000cdf6000 *** (1) TRANSACTION: TRANSACTION 59401, ACTIVE 0 sec inserting mysql tables in use 1, locked 1 LOCK WAIT 5 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 2 MySQL thread id 328, OS thread handle 123145563062272, query id 8008 localhost 127.0.0.1 root update INSERT INTO `wms_product_batchno` (`app_id`,`wms_warehouse_id`,`sku_id`,`batch_no`,`amount`,`period_amount`,`expire_time`,`update_time`,`create_time`) VALUES (2,1,'C-调拨','C11234124124129999999',1,0,'2021-12-10 10:24:32','2021-11-30 10:24:32.471672','2021-11-30 10:24:32.471672') *** (1) HOLDS THE LOCK(S): RECORD LOCKS space id 307 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 59401 lock_mode X locks rec but not gap Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 8; hex 432de8b083e68ba8; asc C- ;; 1: len 8; hex 8000000000000001; asc ;; 2: len 4; hex 80000002; asc ;; 3: len 8; hex 8000000000000001; asc ;; *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 305 page no 5 n bits 72 index uk_skuid_wmsid_app_id_batchno of table `wms`.`wms_product_batchno` trx id 59401 lock mode S waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0 0: len 8; hex 432de8b083e68ba8; asc C- ;; 1: len 8; hex 8000000000000001; asc ;; 2: len 4; hex 80000002; asc ;; 3: len 21; hex 433131323334313234313234313239393939393939; asc C11234124124129999999;; 4: len 8; hex 8000000000000001; asc ;; *** (2) TRANSACTION: TRANSACTION 59467, ACTIVE 0 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 5 lock struct(s), heap size 1128, 3 row lock(s) MySQL thread id 390, OS thread handle 123145633349632, query id 8233 localhost 127.0.0.1 root statistics SELECT * FROM `wms_product` WHERE app_id = 2 and sku_id = 'C-调拨' and wms_warehouse_id = 1 FOR UPDATE *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 305 page no 5 n bits 72 index uk_skuid_wmsid_app_id_batchno of table `wms`.`wms_product_batchno` trx id 59467 lock_mode X locks rec but not gap Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0 0: len 8; hex 432de8b083e68ba8; asc C- ;; 1: len 8; hex 8000000000000001; asc ;; 2: len 4; hex 80000002; asc ;; 3: len 21; hex 433131323334313234313234313239393939393939; asc C11234124124129999999;; 4: len 8; hex 8000000000000001; asc ;; *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 307 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 59467 lock_mode X locks rec but not gap waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 8; hex 432de8b083e68ba8; asc C- ;; 1: len 8; hex 8000000000000001; asc ;; 2: len 4; hex 80000002; asc ;; 3: len 8; hex 8000000000000001; asc ;; *** WE ROLL BACK TRANSACTION (2) ------------ TRANSACTIONS ------------ Trx id counter 59502 Purge done for trx's n:o < 59502 undo n:o < 0 state: running but idle History list length 15 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 281479685464760, not started 0 lock struct(s), heap size 1128, 0 row lock(s) ---TRANSACTION 281479685463968, not started 0 lock struct(s), heap size 1128, 0 row lock(s) ---TRANSACTION 281479685463176, not started 0 lock struct(s), heap size 1128, 0 row lock(s) ---TRANSACTION 281479685462384, not started 0 lock struct(s), heap size 1128, 0 row lock(s) ---TRANSACTION 281479685461592, not started 0 lock struct(s), heap size 1128, 0 row lock(s) -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: waiting for i/o request (log thread) I/O thread 2 state: waiting for i/o request (read thread) I/O thread 3 state: waiting for i/o request (read thread) I/O thread 4 state: waiting for i/o request (read thread) I/O thread 5 state: waiting for i/o request (read thread) I/O thread 6 state: waiting for i/o request (write thread) I/O thread 7 state: waiting for i/o request (write thread) I/O thread 8 state: waiting for i/o request (write thread) I/O thread 9 state: waiting for i/o request (write thread) Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] , ibuf aio reads:, log i/o's:, sync i/o's: Pending flushes (fsync) log: 0; buffer pool: 0 2294 OS file reads, 6630 OS file writes, 4541 OS fsyncs 0.02 reads/s, 16384 avg bytes/read, 12.31 writes/s, 6.83 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf: size 1, free list len 0, seg size 2, 0 merges merged operations: insert 0, delete mark 0, delete 0 discarded operations: insert 0, delete mark 0, delete 0 Hash table size 34679, node heap has 1 buffer(s) Hash table size 34679, node heap has 1 buffer(s) Hash table size 34679, node heap has 1 buffer(s) Hash table size 34679, node heap has 1 buffer(s) Hash table size 34679, node heap has 1 buffer(s) Hash table size 34679, node heap has 1 buffer(s) Hash table size 34679, node heap has 2 buffer(s) Hash table size 34679, node heap has 4 buffer(s) 4.60 hash searches/s, 31.50 non-hash searches/s --- LOG --- Log sequence number 66935527 Log buffer assigned up to 66935527 Log buffer completed up to 66935527 Log written up to 66935527 Log flushed up to 66935527 Added dirty pages up to 66935527 Pages flushed up to 66935527 Last checkpoint at 66935527 2437 log i/o's done, 3.10 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ---------------------- Total large memory allocated 0 Dictionary memory allocated 451775 Buffer pool size 8192 Free buffers 6001 Database pages 2179 Old database pages 785 Modified db pages 0 Pending reads 0 Pending writes: LRU 0, flush list 0, single page 0 Pages made young 0, not young 0 0.00 youngs/s, 0.00 non-youngs/s Pages read 1728, created 451, written 3091 0.00 reads/s, 0.00 creates/s, 0.00 writes/s Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000 Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s LRU len: 2179, unzip_LRU len: 0 I/O sum[0]:cur[0], unzip sum[0]:cur[0] -------------- ROW OPERATIONS -------------- 0 queries inside InnoDB, 0 queries in queue 0 read views open inside InnoDB Process ID=581, Main thread ID=0x70000d291000 , state=sleeping Number of rows inserted 267, updated 445, deleted 0, read 1668 0.17 inserts/s, 0.69 updates/s, 0.00 deletes/s, 4.90 reads/s Number of system rows inserted 152, updated 1836, deleted 76, read 14284 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 7.69 reads/s ---------------------------- END OF INNODB MONITOR OUTPUT ============================
代码中针对两张表的操作顺序如下
- Select * from productBatchNo for update // 同样尝试加锁
- Select * from product for update //同样尝试加锁
- 更新 productBatchNo 和 product
- Commit
具体情况如下:
注意下表中的操作都发生在t1时刻
操作
|
Session1
|
Session2
|
Session3
|
备注
|
---|---|---|---|---|
查询 ProdcutBatchNo 并加锁 | 已完成, 但由于db中无记录,没有加上锁 | 已完成, 但由于db中无记录 没有加上锁 | 未开始 | 针对Session2, 由于Session1没有提交,Session2在RC级别下看不到数据,因为没有加锁 |
查询 Product 并加锁 | 已完成, 但由于db中无记录,没有加上锁 | 准备执行 | ||
更新逻辑(BatchNo 和 Product) | 已完成 | |||
Commit 事务提交 | 未提交 |
当进入t2时刻后:
操作
|
Session1
|
Session2
|
Session3
|
备注
|
---|---|---|---|---|
查询 ProdcutBatchNo 并加锁 | 已完成, 但由于db中无记录,没有加上锁 | 已完成, 但由于db中无记录 没有加上锁 | db中有记录,针对ProductBatchNo 加上 X锁 | |
查询 Product 并加锁 | 已完成, 但由于db中无记录,没有加上锁 | db 中有记录,针对product 加上X 锁 | 准备执行 | |
更新逻辑(BatchNo 和 Product) | 已完成 | 准备执行 | ||
Commit 事务提交 | 已完成 成功提交 |
当进入t3时刻后
操作
|
Session1
|
Session2
|
Session3
|
备注
|
---|---|---|---|---|
查询 ProdcutBatchNo 并加锁 | 已完成, 但由于db中无记录,没有加上锁 | 已完成, 但由于db中无记录 没有加上锁 | db中有记录,针对Product 加上 X锁 | |
查询 Product 并加锁 | 已完成, 但由于db中无记录,没有加上锁 | db 中有记录,针对product 加上X 锁 | 由于Prodct 被Session2 加了X锁 等待释放 | |
更新逻辑(BatchNo 和 Product) | 已完成 | 准备执行, 由于BatchNo上被Session3 加了X锁,等待释放 | ||
Commit 事务提交 | 已完成 成功提交 | Session2 和 Session3 互相等待,死锁 |
根源还是在不存在的记录无法锁上.
解决方案:
在查询 ProductBatchNo 或 Product 之后判断DB中是否有记录,如果DB中没有记录就直接结束这次的事务。并向DB中插入对应的空记录。(关键信息如Sku_ID, BatchNo等有值,商品数量为0)
通过这种方式消除了不存在的行的影响,这样后续更新操作中,一旦某个Session拿到了锁,就会直接阻塞别的 Session.
参考资料: