MySQL 死锁问题排查
1.监控日志
通过监控发现如下异常,尾随其后的还有报错相应的堆栈信息,指出了具体是哪个SQL语句发生了死锁
com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
at com.***.***.im.service.platform.dao.impl.ImMessageDaoImpl.insert(ImMessageDaoImpl.java:50)
at com.***.***.im.service.platform.service.impl.ImMessageServiceImpl.saveNewSessionMessage(ImMessageServiceImpl.java:543)
通过日志查看代码,觉得不大可能是同一个事务并发执行导致的死锁
2.查看隔离级别
select @@tx_isolation; //当前session隔离级别
select @@global.tx_isolation; //全局回话隔离级别
业务代码有可能使用默认的隔离级别,默认的级别就是全局的隔离级别;业务也可能设置了当前事物的隔离级别,我们使用的默认级别,是RR(可重复读)
3.查看最近一次innoDB监测的死锁
联系DBA,查看发生死锁的业务对应的数据库,和innodb记录的死锁日志
show engine innodb status;
查询得到最近的一次死锁日志为:
------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-04-01 23:32:49 0x7f6306adb700
*** (1) TRANSACTION:
TRANSACTION 23734694036, ACTIVE 1 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 7 lock struct(s), heap size 1136, 25 row lock(s)
MySQL thread id 7109502, OS thread handle 140046693021440, query id 5270358204 172.31.21.66 im_w1 updating
update im_servicer_session
set unread_count=0
where session_id=142298 and servicer_id=8708
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5351 page no 18 n bits 224 index PRIMARY of table `im`.`im_servicer_session` trx id 23734694036
lock_mode X locks rec but not gap waiting
Record lock, heap no 148 PHYSICAL RECORD: n_fields 11; compact format; info bits 0
0: len 8; hex 00000000000006a4; asc ;;
1: len 6; hex 000586b2b07f; asc ;;
2: len 7; hex 27000002141d37; asc ' 7;;
3: len 8; hex 0000000000022bda; asc + ;;
4: len 8; hex 0000000000002204; asc " ;;
5: len 1; hex 00; asc ;;
6: len 5; hex 9943c20000; asc C ;;
7: len 1; hex 00; asc ;;
8: len 4; hex 00000003; asc ;;
9: len 5; hex 99a2c37642; asc vB;;
10: len 5; hex 99a2c37830; asc x0;;
*** (2) TRANSACTION:
TRANSACTION 23734694015, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2
MySQL thread id 7108183, OS thread handle 140063290537728, query id 5270358482 172.31.35.143 im_w1 update
insert into im_message_0_34
( chat_id,
message_type,
message,
house_id,
send_time,
send_status,
receive_status,
show_type )
values ( '4NzP0DZO7wngS5YiGFcJTKu0L2Xrhan7zpbBBO/1KdQ=',
0,
'嗯嗯',
106874,
'2019-04-01 23:32:48.113',
0,
1,
0 )
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 5351 page no 18 n bits 224 index PRIMARY of table `im`.`im_servicer_session` trx id 23734694015
lock_mode X locks rec but not gap
Record lock, heap no 148 PHYSICAL RECORD: n_fields 11; compact format; info bits 0
0: len 8; hex 00000000000006a4; asc ;;
1: len 6; hex 000586b2b07f; asc ;;
2: len 7; hex 27000002141d37; asc ' 7;;
3: len 8; hex 0000000000022bda; asc + ;;
4: len 8; hex 0000000000002204; asc " ;;
5: len 1; hex 00; asc ;;
6: len 5; hex 9943c20000; asc C ;;
7: len 1; hex 00; asc ;;
8: len 4; hex 00000003; asc ;;
9: len 5; hex 99a2c37642; asc vB;;
10: len 5; hex 99a2c37830; asc x0;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5388 page no 1531 n bits 264 index idx_chat_id of table `im`.`im_message_0_34` trx id 23734694015
lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 110 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 30; hex 344f69384254415559786c496a483947657577705071365a3764794f546e; asc 4Oi8BTAUYxlIjH9GeuwpPq6Z7dyOTn; (total 44 bytes);
1: len 8; hex 00000000000069a0; asc i ;;
*** WE ROLL BACK TRANSACTION (2)
从日志中可以看到只是简单的记录排它锁(X lock),并非间隙锁(gap lock)。还能发现第一个事务阻塞在了更新会话的SQL语句中,经查询得到是更新消息为已读的SQL,第二个事务阻塞在了保存消息的SQL语句中,死锁发生的两个事务的代码分别如下:
TRANSACTION 23734694036
//更新会话时间
imServicerSessionService.updateSessionTime(sessionVo.getSessionId(), EnumServicerSessionState.IN_SESSION);
//...时间较长的请求
if (md.getMessageId() != null && md.getMessageId() > 0) {
logger.info("修改消息");
imMessageDao.update(md);
}else{
imMessageDao.insert(md);
}
TRANSACTION 23734694015
if (LoginUserUtil.isServicer()) {
imMessageDao.markServicerMessageRead(chatId,baseSubTable.getTableName(),houseId, loginInfo.getAccountId());
imServicerSessionService.resetUnreadCount(imSessionVoList.get(0).getSessionId(), loginInfo.getAccountId());
}
4.会话过程
5.解决办法
- 解决死锁可以从死锁发生的条件入手,最容易解决的就是更改获取资源的顺序,在这个案例中可以更改的是事务TRANSACTION 23734694015里面两个SQL执行的顺序,因为他们没有依赖关系
- 其次是避免长事务,让事务执行的时间尽可能少,让事务的覆盖范围尽可能小,长事务会导致并发度降低,且会有更多的SQL查询延迟
- 给整个方法加事务是否是必须的?可以不加事务的尽量不加