记一次 RR 与 RC 死锁问题排查

记录一下开发过程中遇到的死锁问题和解决方案

首先是隔离级别,mysql有四个隔离级别,从宽松到严格依次是

  • 读未提交       Read Uncommitted
  • 读提交           Read Committed   (线上环境使用的隔离级别)
  • 可重复读       Repeatable Read  (Mysql 默认的隔离级别)
  • 串行化           Serializable

 

第一次死锁

背景: 在db中无记录时,并发插入会产生死锁

首先是第一次遇到的死锁问题,  通过命令  

show engine innodb status

查看死锁日志如下:

 

=====================================
2021-11-15 14:45:15 0x70000908f000 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 53 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 436 srv_active, 0 srv_shutdown, 136216 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 14531
OS WAIT ARRAY INFO: signal count 12688
RW-shared spins 0, rounds 0, OS waits 0
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 0.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2021-11-15 14:37:11 0x700004455000
*** (1) TRANSACTION:
TRANSACTION 47513, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 1
MySQL thread id 26344, OS thread handle 123145497554944, query id 441640 localhost 127.0.0.1 root update
INSERT IGNORE INTO `wms_product` (`app_id`,`wms_warehouse_id`,`sku_id`,`amount`,`period_amount`,`trans_amount`,`version`,`update_time`,`create_time`) VALUES (2,1,'C-调拨',1,0,0,1,'2021-11-15 14:37:11.698119','2021-11-15 14:37:11.698119')

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 179 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 47513 lock mode S waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 8; hex 8000000000000056; asc        V;;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 179 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 47513 lock mode S waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 8; hex 8000000000000056; asc        V;;


*** (2) TRANSACTION:
TRANSACTION 47503, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 24 lock struct(s), heap size 3488, 21 row lock(s), undo log entries 4
MySQL thread id 26392, OS thread handle 123145430462464, query id 441871 localhost 127.0.0.1 root update
INSERT IGNORE INTO `wms_product` (`app_id`,`wms_warehouse_id`,`sku_id`,`amount`,`period_amount`,`trans_amount`,`version`,`update_time`,`create_time`) VALUES (2,1,'B并发入库',1,0,0,1,'2021-11-15 14:37:11.682939','2021-11-15 14:37:11.682939')

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 179 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 47503 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 8; hex 8000000000000056; asc        V;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 179 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 47503 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 8; hex 8000000000000056; asc        V;;

*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 47828
Purge done for trx's n:o < 47828 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479673057184, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479673056392, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479673055600, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479673054808, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (read thread)
I/O thread 4 state: waiting for i/o request (read thread)
I/O thread 5 state: waiting for i/o request (read thread)
I/O thread 6 state: waiting for i/o request (write thread)
I/O thread 7 state: waiting for i/o request (write thread)
I/O thread 8 state: waiting for i/o request (write thread)
I/O thread 9 state: waiting for i/o request (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 2
1644 OS file reads, 161796 OS file writes, 113308 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 5.41 writes/s, 2.28 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 5 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number          56679186
Log buffer assigned up to    56679186
Log buffer completed up to   56679186
Log written up to            56679186
Log flushed up to            56679186
Added dirty pages up to      56679186
Pages flushed up to          56679186
Last checkpoint at           56679186
86480 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 0
Dictionary memory allocated 504140
Buffer pool size   8191
Free buffers       5147
Database pages     3028
Old database pages 1097
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 668, not young 149
0.00 youngs/s, 0.00 non-youngs/s
Pages read 1086, created 1943, written 58798
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 3028, unzip_LRU len: 0
I/O sum[229]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=555, Main thread ID=0x7000048f0000 , state=sleeping
Number of rows inserted 21319, updated 39868, deleted 3791, read 180746
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 837, updated 3436, deleted 524, read 39358
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

 

 

可以看到是针对仅一张表`wms`.`wms_product`进行插入操作时产生的死锁。

 

代码中关于 表`wms`.`wms_product`的操作示意是

  1. Select * from wms_product for update    –- 尝试加锁
  2. Insert xxx  (在未查询到的情况下插入一条记录)

 

原因:  此时是在本地环境,隔离级别是默认的RR  (特别提醒,直接在mysql通过命令修改隔离级别不会生效/仅对当前会话生效, 需要修改配置文件后重启)

在RR级别下 mysql为解决 幻读 的问题,引入了间隙锁 gap lock.  间隙锁锁住的不是一行记录,而是行与行之间的间隙 (所谓幻读就是指在一次事务中, 前后两次查询同一个范围的时候,后一次查询看到了前一次查询没有看到的记录.  成因是S或X锁只能锁住已存在的行,不能阻止别人插入)

间隙锁与间隙锁之间不互斥,间隙锁仅与插入操作互斥

理解了上面的间隙锁就很容易理解死锁产生的原因了,简单的画了下时序图:

时刻
Session1
Session2
备注
T1 获取Gap lock 获取Gap lock 同时获取间隙锁,不互斥
T2 Insert xxx (等待Session2的 间隙锁) Insert xxx (等待Session1的 间隙锁) 互相等待
T3     死锁

 

备注:  这里经过和雷神大佬的讨论,单纯针对这种情况而不考虑业务下第一次的for update应该可以不加,因为后面的 Insert 或 Update 都会有X锁

 

 

第二次死锁

背景: 在db中无记录时,并发插入会产生死锁

通过修改配置文件并重启成功更新了本地的隔离级别为RC, 不再有间隙锁的问题。 但业务方面也有一些变化,引入了 product_batchno 批次表,操作时需要对两张表同时进行更新.

死锁日志如下:

 

=====================================
2021-11-30 10:24:59 0x70000e150000 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 42 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 45 srv_active, 0 srv_shutdown, 129307 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 592
OS WAIT ARRAY INFO: signal count 478
RW-shared spins 0, rounds 0, OS waits 0
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 0.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2021-11-30 10:24:32 0x70000cdf6000
*** (1) TRANSACTION:
TRANSACTION 59401, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 2
MySQL thread id 328, OS thread handle 123145563062272, query id 8008 localhost 127.0.0.1 root update
INSERT INTO `wms_product_batchno` (`app_id`,`wms_warehouse_id`,`sku_id`,`batch_no`,`amount`,`period_amount`,`expire_time`,`update_time`,`create_time`) VALUES (2,1,'C-调拨','C11234124124129999999',1,0,'2021-12-10 10:24:32','2021-11-30 10:24:32.471672','2021-11-30 10:24:32.471672')

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 307 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 59401 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 8; hex 8000000000000001; asc         ;;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 305 page no 5 n bits 72 index uk_skuid_wmsid_app_id_batchno of table `wms`.`wms_product_batchno` trx id 59401 lock mode S waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 21; hex 433131323334313234313234313239393939393939; asc C11234124124129999999;;
 4: len 8; hex 8000000000000001; asc         ;;


*** (2) TRANSACTION:
TRANSACTION 59467, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 3 row lock(s)
MySQL thread id 390, OS thread handle 123145633349632, query id 8233 localhost 127.0.0.1 root statistics
SELECT * FROM `wms_product` WHERE app_id = 2 and sku_id = 'C-调拨' and wms_warehouse_id = 1 FOR UPDATE

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 305 page no 5 n bits 72 index uk_skuid_wmsid_app_id_batchno of table `wms`.`wms_product_batchno` trx id 59467 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 21; hex 433131323334313234313234313239393939393939; asc C11234124124129999999;;
 4: len 8; hex 8000000000000001; asc         ;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 307 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 59467 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 8; hex 8000000000000001; asc         ;;

*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 59502
Purge done for trx's n:o < 59502 undo n:o < 0 state: running but idle
History list length 15
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479685464760, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479685463968, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479685463176, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479685462384, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479685461592, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (read thread)
I/O thread 4 state: waiting for i/o request (read thread)
I/O thread 5 state: waiting for i/o request (read thread)
I/O thread 6 state: waiting for i/o request (write thread)
I/O thread 7 state: waiting for i/o request (write thread)
I/O thread 8 state: waiting for i/o request (write thread)
I/O thread 9 state: waiting for i/o request (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
2294 OS file reads, 6630 OS file writes, 4541 OS fsyncs
0.02 reads/s, 16384 avg bytes/read, 12.31 writes/s, 6.83 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 4 buffer(s)
4.60 hash searches/s, 31.50 non-hash searches/s
---
LOG
---
Log sequence number          66935527
Log buffer assigned up to    66935527
Log buffer completed up to   66935527
Log written up to            66935527
Log flushed up to            66935527
Added dirty pages up to      66935527
Pages flushed up to          66935527
Last checkpoint at           66935527
2437 log i/o's done, 3.10 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 0
Dictionary memory allocated 451775
Buffer pool size   8192
Free buffers       6001
Database pages     2179
Old database pages 785
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 1728, created 451, written 3091
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2179, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=581, Main thread ID=0x70000d291000 , state=sleeping
Number of rows inserted 267, updated 445, deleted 0, read 1668
0.17 inserts/s, 0.69 updates/s, 0.00 deletes/s, 4.90 reads/s
Number of system rows inserted 152, updated 1836, deleted 76, read 14284
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 7.69 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

 

 

 

代码中针对两张表的操作顺序如下

  1. Select * from productBatchNo  for update            // 同样尝试加锁
  2. Select * from product for update                           //同样尝试加锁
  3. 更新 productBatchNo 和 product
  4. Commit

 

具体情况如下:

注意下表中的操作都发生在t1时刻

操作
Session1
Session2
Session3
备注
查询 ProdcutBatchNo 并加锁  已完成, 但由于db中无记录,没有加上锁 已完成, 但由于db中无记录 没有加上锁 未开始 针对Session2, 由于Session1没有提交,Session2在RC级别下看不到数据,因为没有加锁
查询 Product 并加锁  已完成, 但由于db中无记录,没有加上锁 准备执行    
更新逻辑(BatchNo 和 Product) 已完成      
Commit 事务提交 未提交      

 

当进入t2时刻后:

操作
Session1
Session2
Session3
备注
查询 ProdcutBatchNo 并加锁  已完成, 但由于db中无记录,没有加上锁 已完成, 但由于db中无记录 没有加上锁 db中有记录,针对ProductBatchNo 加上 X锁  
查询 Product 并加锁  已完成, 但由于db中无记录,没有加上锁 db 中有记录,针对product 加上X 锁 准备执行  
更新逻辑(BatchNo 和 Product) 已完成 准备执行    
Commit 事务提交 已完成 成功提交      

 

当进入t3时刻后

操作
Session1
Session2
Session3
备注
查询 ProdcutBatchNo 并加锁  已完成, 但由于db中无记录,没有加上锁 已完成, 但由于db中无记录 没有加上锁 db中有记录,针对Product 加上 X锁  
查询 Product 并加锁  已完成, 但由于db中无记录,没有加上锁 db 中有记录,针对product 加上X 锁 由于Prodct 被Session2 加了X锁 等待释放  
更新逻辑(BatchNo 和 Product) 已完成 准备执行, 由于BatchNo上被Session3 加了X锁,等待释放    
Commit 事务提交 已完成 成功提交     Session2 和 Session3 互相等待,死锁

根源还是在不存在的记录无法锁上.

解决方案:

在查询 ProductBatchNo 或 Product 之后判断DB中是否有记录,如果DB中没有记录就直接结束这次的事务。并向DB中插入对应的空记录。(关键信息如Sku_ID, BatchNo等有值,商品数量为0)

通过这种方式消除了不存在的行的影响,这样后续更新操作中,一旦某个Session拿到了锁,就会直接阻塞别的 Session.

 

参考资料:

https://www.cnblogs.com/paul8339/p/6877729.html

posted @ 2021-12-04 15:26  DogTwo  阅读(1072)  评论(0编辑  收藏  举报