死锁及常见死锁模型

Ⅰ、死锁的概念

两个或两个以上的事务在执行过程中,因争夺资源而造成的一种互相等待的现象

最常见:AB-BA

稍微复杂:A-B B-C C-A形成如下图的循环

tips:
lock wait time out 和 dead lock不是一回事

1.1 死锁的处理机制

  • 锁超时
    --innodb_lock_wait_timeout 让一个超时另一个执行,但这个机制在数据库中是不用的
  • 自动死锁检测
    通过锁的信息链表和事务等待链表构造出一个等待图(wait-for graph),如下:

    t1 t2 t3 t4 是事务列表,四个事务之间的边表示等待关系——edge,每个节点(事务)和每个edge加进去的时候会判断下有没有回路,如果有,就那啥,懂的吧?

检测到了后会很聪明地选择其中一个事务回滚,那选择哪个呢?

根据undo的量判断,回滚量少的,不记得是5.5还是5.6开始才有这个机制,之前的数据库版本是直接回滚后面一个事务,比较挫

5.6版本对图的死锁检测部分的内核算法进行了优化,原来是递归的方式做的,现在通过重写,非递归,提升了性能,所以大并发时5.6性能比5.5好很多,这块也有功劳

Ⅱ、死锁演示

2.1 先模拟下场景

begin:
session1:
select a for update;

session2:
begin:
select b for update;
select a for update;
此时等待。

session1:
select b for update;

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

(root@localhost) [test]> show engine innodb status\G
...
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-06-15 01:27:47 0x7f2cb6acc700
*** (1) TRANSACTION:
TRANSACTION 31220816, ACTIVE 25 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 1448, OS thread handle 139830020597504, query id 8810 localhost root statistics
select * from l where a = 4 for update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1358 page no 3 n bits 80 index PRIMARY of table `test`.`l` trx id 31220816 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 4; hex 80000004; asc     ;;
 1: len 6; hex 000001c1b93a; asc      :;;
 2: len 7; hex e1000001a90110; asc        ;;
 3: len 4; hex 80000006; asc     ;;
 4: len 4; hex 80000008; asc     ;;
 5: len 4; hex 8000000a; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 31220817, ACTIVE 11 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 1449, OS thread handle 139830020065024, query id 8811 localhost root statistics
select * from l where a = 10 for update
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1358 page no 3 n bits 80 index PRIMARY of table `test`.`l` trx id 31220817 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 4; hex 80000004; asc     ;;
 1: len 6; hex 000001c1b93a; asc      :;;
 2: len 7; hex e1000001a90110; asc        ;;
 3: len 4; hex 80000006; asc     ;;
 4: len 4; hex 80000008; asc     ;;
 5: len 4; hex 8000000a; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1358 page no 3 n bits 80 index PRIMARY of table `test`.`l` trx id 31220817 lock_mode X locks rec but not gap waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 4; hex 8000000a; asc     ;;
 1: len 6; hex 000001dc637f; asc     c ;;
 2: len 7; hex b30000019d0110; asc        ;;
 3: len 4; hex 8000000c; asc     ;;
 4: len 4; hex 8000000e; asc     ;;
 5: len 4; hex 80000010; asc     ;;

*** WE ROLL BACK TRANSACTION (2)
...

上面这个只能记录最近一次死锁,记录所有死锁信息到错误日志中可打开下面这个参数

innodb_print_all_deadlocks     建议用起来

2.2 5.7版本一个新参数

set global innodb_deadlock_detect=0    不检测死锁

不检测之后,这数据库怎么解决死锁呢,那就是等待锁超时咯,默认50s

两边锁超时,两边事务依然无法继续进行,不会回滚,处于未知状态,需要人为操作,要么commit,要么rollback,否则继续执行还是被锁

而dead lock时,其中一个事务是回滚的

通常来说事务中执行一个操作失败是不会回滚的,由用户决定是回滚还是提交,只有死锁的情况会回滚,不回滚的话,事务占用的锁的资源是不释放的。

什么时候需要把死锁检测调为0?

这个需求最早是淘宝提给官方的,官方也接受了

秒杀场景下设为0是有意义的,反正都在等,不要死锁检测,设了性能有一点点提升,意义不大

2.3 8.0中锁的两个新语法

session1:

(root@localhost) [test]> begin;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [test]> select * from l where a = 2 for update;
+---+------+------+------+
| a | b    | c    | d    |
+---+------+------+------+
| 2 |    4 |    6 |    8 |
+---+------+------+------+
1 row in set (0.00 sec)

session2:

(root@localhost) [test]> begin;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [test]> select * from l where a = 2 for update nowait;
ERROR 3572 (HY000): Statement aborted because lock(s) could not be acquired immediately and NOWAIT is set.

如果session2用另一个语法则不会报错,

(root@localhost) [test]> begin;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [test]> select * from l where a = 2 for update skip locked;
Empty set (0.00 sec)

发现记录上有锁,则跳过,返回空,不报错

这两个语法比较有用,sqlserver,pg也有,业务里可以用这个可以代替上面淘宝说出的那个,秒杀的时候如果返回空,那前端就等待

Ⅲ、死锁经典案例

3.1 购物车死锁——扣库存操作

sku_id(每件商品的真正id,最小分类下的)

session1:
begin;
update stock set count=count-1 where sku_id=1;
-------------------------------------------=2;
------------------------------------------=30;

session2:
update stock set count=count-1 where sku_id=30;
--------------------------------------------=1;
--------------------------------------------=2;

并发的时候就死锁了,这个问题数据库层解决不了

死锁并不是问题,是数据库的正常现象,只有当死锁影响到业务时,这时候才需要dba介入处理

解决:
前端或者接口层把订单中的商品id排序再发送到数据库层,排序后不会死锁,但会发生等待,锁等待调成3s,不要用默认的50s,但是如果并发很大,性能就会比较差

很多业务突然就变成热点,想不到的,这时候消息队列就不行了,这就要用线程池限流来解决,业务非常大,线程池是绕不过去的,强烈建议,应用层用消息队列,数据库层开线程池

这一步是电商中最关键的,也是并发最大的,不可以写缓存,全是insert和update操作,还有少部分select for update锁定库存

3.2 唯一索引死锁

先搞清楚唯一索引的插入

(root@localhost) [test]> show create table l\G
*************************** 1. row ***************************
       Table: l
Create Table: CREATE TABLE `l` (
  `a` int(11) NOT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  PRIMARY KEY (`a`),
  UNIQUE KEY `c` (`c`),
  KEY `b` (`b`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

(root@localhost) [test]> select * from l;
+---+------+------+------+
| a | b    | c    | d    |
+---+------+------+------+
| 2 |    4 |    6 |    8 |
| 4 |    6 |    8 |   10 |
| 6 |    8 |   10 |   12 |
| 8 |   10 |   12 |   14 |
+---+------+------+------+
4 rows in set (0.00 sec)

(root@localhost) [test]> show variables like 'tx_isolation';  
+---------------+----------------+
| Variable_name | Value          |
+---------------+----------------+
| tx_isolation  | READ-COMMITTED |
+---------------+----------------+
1 row in set (0.00 sec)

session1:

(root@localhost) [test]> begin;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [test]> delete from l where c = 10;
Query OK, 1 row affected (0.00 sec)

session2:

(root@localhost) [test]> insert into l values (10,12,10,16);
hang~~~

session3:

(root@localhost) [(none)]> show engine innodb status\G
...
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421305875783280, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 31220861, ACTIVE 11 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 1561, OS thread handle 139830452774656, query id 8980 localhost root update
insert into l values(10,12,10,16)
------- TRX HAS BEEN WAITING 11 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1358 page no 4 n bits 80 index c of table `test`.`l` trx id 31220861 lock mode S waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 8000000a; asc     ;;
 1: len 4; hex 80000006; asc     ;;

------------------
TABLE LOCK table `test`.`l` trx id 31220861 lock mode IX
RECORD LOCKS space id 1358 page no 4 n bits 80 index c of table `test`.`l` trx id 31220861 lock mode S waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 8000000a; asc     ;;
 1: len 4; hex 80000006; asc     ;;

---TRANSACTION 31220860, ACTIVE 18 sec
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 1560, OS thread handle 139830453040896, query id 8978 localhost root
TABLE LOCK table `test`.`l` trx id 31220860 lock mode IX
RECORD LOCKS space id 1358 page no 4 n bits 80 index c of table `test`.`l` trx id 31220860 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 8000000a; asc     ;;
 1: len 4; hex 80000006; asc     ;;

RECORD LOCKS space id 1358 page no 3 n bits 80 index PRIMARY of table `test`.`l` trx id 31220860 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
 0: len 4; hex 80000006; asc     ;;
 1: len 6; hex 000001dc647c; asc     d|;;
 2: len 7; hex 3800000ff21de8; asc 8      ;;
 3: len 4; hex 80000008; asc     ;;
 4: len 4; hex 8000000a; asc     ;;
 5: len 4; hex 8000000c; asc     ;;
...

等待10上面的这把锁,锁的类型是S锁,好奇怪

10上面有个S lock,现在插入9也会被阻塞,所以,所以说,虽然是rc事务隔离级别,但只要有唯一索引,那依然存在gap锁

lock share mode不怎么在业务层(锁库存)使用,但是在数据库层保证唯一性

如果这张表上没有唯一索引,如果有,那也是个主键,并且是自增的,也就是说插入不会冲突,这种情况下所有的插入在rc的事务隔离级别下都是并行的,不会被阻塞

原因:rc情况下,没有gap锁,插入的时候不锁范围,那就可以并行插入,不会被阻塞,主键又唯一,又是自增的,不会冲突,所以不会有问题。
但是如果表中除了主键还有其他唯一索引,插入就会发生等待

对于insert,它是怎么插入的呢?
举例:
1 3 5 7 insert into 3

①先找大于3的第一条记录(next_rec)

②看此记录是否有gap锁或者next-locking,有不能插,没有可以插,insert-intention lock和record lock都可以插

如果有唯一索引,那就还有下面步骤

③previous_rec=current_rec,就是冲突了,这样就检测了唯一性

这时候如果previous_rec上没锁,那就马上告诉你冲突了,duplicate key

如果previous_rec上有锁,那这时候插入3,需要对这条记录加一个S。不能直接报duplicate key,在这个例子中,可能另一个事务是delete 3,它成功了,我就可以插了,所以5之前这条记录产生S lock,如果3上面的锁被释放了,就会唤醒这个S lock

这样这个S lock产生的原因就讲清楚了

总结:
唯一索引的插入需要额外一步检查(唯一约束的检查),即使rc,依然有gap锁,不能掉以轻心

接下来我们最小化模拟一下这个死锁

(root@localhost) [test]> show create table dl\G
*************************** 1. row ***************************
       Table: dl
Create Table: CREATE TABLE `dl` (
  `a` int(11) NOT NULL,
  UNIQUE KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

(root@localhost) [test]> select * from dl;
+---+
| a |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

(root@localhost) [test]> show variables like 'tx_isolation';  
+---------------+----------------+
| Variable_name | Value          |
+---------------+----------------+
| tx_isolation  | READ-COMMITTED |
+---------------+----------------+
1 row in set (0.00 sec)

session1:

(root@localhost) [test]> begin;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [test]> delete from l where a = 1;
Query OK, 0 rows affected (0.00 sec)

session2:

(root@localhost) [test]> begin;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [test]> insert into dl values(1);
hang~~~

session3:

(root@localhost) [test]> begin;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [test]> insert into dl values(1);
hang~~~

session1:

(root@localhost) [test]> commit;
Query OK, 0 rows affected (0.01 sec)

session2:

Query OK, 1 row affected (29.35 sec)

session3:

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

看下死锁信息

(root@localhost) [test]> show engine innodb status\G
...
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-06-15 02:39:12 0x7f2cd008e700
*** (1) TRANSACTION:
TRANSACTION 31220875, ACTIVE 30 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 1561, OS thread handle 139830452774656, query id 9000 localhost root update
insert into dl values(1)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1630 page no 3 n bits 72 index a of table `test`.`dl` trx id 31220875 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 4; hex 80000001; asc     ;;
 1: len 6; hex 000001dc6486; asc     d ;;
 2: len 7; hex 4100000fce1353; asc A     S;;

*** (2) TRANSACTION:
TRANSACTION 31220876, ACTIVE 4 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 1562, OS thread handle 139830445532928, query id 9010 localhost root update
insert into dl values (1)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1630 page no 3 n bits 72 index a of table `test`.`dl` trx id 31220876 lock mode S locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 4; hex 80000001; asc     ;;
 1: len 6; hex 000001dc6486; asc     d ;;
 2: len 7; hex 4100000fce1353; asc A     S;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1630 page no 3 n bits 72 index a of table `test`.`dl` trx id 31220876 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 4; hex 80000001; asc     ;;
 1: len 6; hex 000001dc6486; asc     d ;;
 2: len 7; hex 4100000fce1353; asc A     S;;

*** WE ROLL BACK TRANSACTION (2)
...

delete 的时候 加了一个X锁,insert 的时候 加了一个S锁,insert 的时候又加了一个S锁,commit后第一个insert成功,第二个死锁

梳理一下这个流程

   thd1                         thd2                                        thd3       
 del--->X
                           ins--->S--->wait
                                                                        ins--->S--->wait
 commit--->释放

						ins--->需要X--->等待thd3的S      <===>       ins--->需要X--->等待thd2的S
                                                        死锁

遇到死锁大部分都是唯一索引引起的,看下show engine innodb status\G 死锁里面锁的索引是不是唯一索引
记住S锁就是用来做唯一性检测的,其他用的比较少

tips:
unique key必须是not null,否则这个死锁搞不出来,不知道为什么,没研究

posted @ 2018-06-14 03:08  91洲际哥  阅读(962)  评论(0编辑  收藏  举报