Fork me on GitHub

MySQL 死锁问题排查

1.监控日志

通过监控发现如下异常,尾随其后的还有报错相应的堆栈信息,指出了具体是哪个SQL语句发生了死锁

com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
at com.***.***.im.service.platform.dao.impl.ImMessageDaoImpl.insert(ImMessageDaoImpl.java:50)
at com.***.***.im.service.platform.service.impl.ImMessageServiceImpl.saveNewSessionMessage(ImMessageServiceImpl.java:543)

通过日志查看代码,觉得不大可能是同一个事务并发执行导致的死锁

2.查看隔离级别

select @@tx_isolation;  //当前session隔离级别
select @@global.tx_isolation;  //全局回话隔离级别

业务代码有可能使用默认的隔离级别,默认的级别就是全局的隔离级别;业务也可能设置了当前事物的隔离级别,我们使用的默认级别,是RR(可重复读)

3.查看最近一次innoDB监测的死锁

联系DBA,查看发生死锁的业务对应的数据库,和innodb记录的死锁日志

show engine innodb status;

查询得到最近的一次死锁日志为:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-04-01 23:32:49 0x7f6306adb700
*** (1) TRANSACTION:
TRANSACTION 23734694036, ACTIVE 1 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 7 lock struct(s), heap size 1136, 25 row lock(s)
MySQL thread id 7109502, OS thread handle 140046693021440, query id 5270358204 172.31.21.66 im_w1 updating

update im_servicer_session
		set unread_count=0
		where session_id=142298 and servicer_id=8708

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5351 page no 18 n bits 224 index PRIMARY of table `im`.`im_servicer_session` trx id 23734694036 
lock_mode X locks rec but not gap waiting
Record lock, heap no 148 PHYSICAL RECORD: n_fields 11; compact format; info bits 0
 0: len 8; hex 00000000000006a4; asc         ;;
 1: len 6; hex 000586b2b07f; asc       ;;
 2: len 7; hex 27000002141d37; asc '     7;;
 3: len 8; hex 0000000000022bda; asc       + ;;
 4: len 8; hex 0000000000002204; asc       " ;;
 5: len 1; hex 00; asc  ;;
 6: len 5; hex 9943c20000; asc  C   ;;
 7: len 1; hex 00; asc  ;;
 8: len 4; hex 00000003; asc     ;;
 9: len 5; hex 99a2c37642; asc    vB;;
 10: len 5; hex 99a2c37830; asc    x0;;

*** (2) TRANSACTION:
TRANSACTION 23734694015, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2
MySQL thread id 7108183, OS thread handle 140063290537728, query id 5270358482 172.31.35.143 im_w1 update

insert into im_message_0_34
         ( chat_id,
            message_type,
            message,
            house_id,
            send_time,
            send_status,
            receive_status,
            show_type ) 
         values ( '4NzP0DZO7wngS5YiGFcJTKu0L2Xrhan7zpbBBO/1KdQ=',
            0,
            '嗯嗯',
            106874,
            '2019-04-01 23:32:48.113',
            0,
            1,
            0 )

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 5351 page no 18 n bits 224 index PRIMARY of table `im`.`im_servicer_session` trx id 23734694015 
lock_mode X locks rec but not gap
Record lock, heap no 148 PHYSICAL RECORD: n_fields 11; compact format; info bits 0
 0: len 8; hex 00000000000006a4; asc         ;;
 1: len 6; hex 000586b2b07f; asc       ;;
 2: len 7; hex 27000002141d37; asc '     7;;
 3: len 8; hex 0000000000022bda; asc       + ;;
 4: len 8; hex 0000000000002204; asc       " ;;
 5: len 1; hex 00; asc  ;;
 6: len 5; hex 9943c20000; asc  C   ;;
 7: len 1; hex 00; asc  ;;
 8: len 4; hex 00000003; asc     ;;
 9: len 5; hex 99a2c37642; asc    vB;;
 10: len 5; hex 99a2c37830; asc    x0;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5388 page no 1531 n bits 264 index idx_chat_id of table `im`.`im_message_0_34` trx id 23734694015 
lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 110 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 344f69384254415559786c496a483947657577705071365a3764794f546e; asc 4Oi8BTAUYxlIjH9GeuwpPq6Z7dyOTn; (total 44 bytes);
 1: len 8; hex 00000000000069a0; asc       i ;;

*** WE ROLL BACK TRANSACTION (2)

从日志中可以看到只是简单的记录排它锁(X lock),并非间隙锁(gap lock)。还能发现第一个事务阻塞在了更新会话的SQL语句中,经查询得到是更新消息为已读的SQL,第二个事务阻塞在了保存消息的SQL语句中,死锁发生的两个事务的代码分别如下:
TRANSACTION 23734694036

 //更新会话时间
imServicerSessionService.updateSessionTime(sessionVo.getSessionId(), EnumServicerSessionState.IN_SESSION);
//...时间较长的请求
if (md.getMessageId() != null && md.getMessageId() > 0) {
    logger.info("修改消息");
    imMessageDao.update(md);
}else{
    imMessageDao.insert(md);
}

TRANSACTION 23734694015

if (LoginUserUtil.isServicer()) {
    imMessageDao.markServicerMessageRead(chatId,baseSubTable.getTableName(),houseId, loginInfo.getAccountId());
    imServicerSessionService.resetUnreadCount(imSessionVoList.get(0).getSessionId(), loginInfo.getAccountId());
}

4.会话过程

5.解决办法

  1. 解决死锁可以从死锁发生的条件入手,最容易解决的就是更改获取资源的顺序,在这个案例中可以更改的是事务TRANSACTION 23734694015里面两个SQL执行的顺序,因为他们没有依赖关系
  2. 其次是避免长事务,让事务执行的时间尽可能少,让事务的覆盖范围尽可能小,长事务会导致并发度降低,且会有更多的SQL查询延迟
  3. 给整个方法加事务是否是必须的?可以不加事务的尽量不加
posted @ 2019-04-15 15:30  gitmoji  阅读(3678)  评论(0编辑  收藏  举报