KingbaseES 中select for update语句引起的锁问题
背景
客户现场执行压测时候,发生周期性的TPS大幅下降,通过查看kwr报告发现DBcpu时间占DBtime时间很少,百分之90的DBtime花费在tuple锁等待上,等待事件类型是lock。
等待时间最多的语句是select fd_id,ctid,xmin from ... for update
含义
select for update语句目的在查询时,避免其他用户对该表进行dml等操作,造成表的不一致性。
select for update语句需要等待表锁,以及表对应行锁释放之后才能返回查询结果。
表级锁模式
常见锁模式以及应用场景:
ACCESS SHARE:select操作获取该模式锁资源,通常情况下所有只读取不修改表的查询都会获取该模式锁资源
ROW SHARE: select for update 和 select for share 命令获取该模式锁资源
ROW EXCLUSIVE: DML操作通过会获取该模式锁资源,通常情况下任何需要修改表数据的操作都会持有该模式锁资源
SHARE UPDATE EXCLUSIVE:对于 vacuum(非full)、create index concurrnetly、reindex concurrently、create statistics、alter index相关、alter table相关操作会持有该模式锁资源,该模式锁资源主要是为了范围并发对同一张表DDL变更以及vacuum操作
SHARE: create index (非concurrently)操作会持有该模式锁资源,该模式锁资源可阻止并发对表的创建索引操作
SHARE ROW EXCLUSIVE: cerate trigger、一些alter table操作会持有该模式锁资源
EXCLUSIVE: refresh materialized view concurrently操作会持有该模式锁资源
ACCESS EXCLUSIVE:lock table模式锁定模式,drop table、truncate、reindex、 cluster、vacuum full、refresh materialized view (without concurrently) 还有一些alter table、alter index操作需要获取该模式锁资源,该模式锁资源保证了持有者唯一访问表
测试
测试环境隔离级别是 read committed,数据库版本是KingbaseESV8R6C7
1.update语句阻塞select for update语句
开启一个会话,不提交
begin;
update t3 set name='qite' where id=1;
开启另一个会话执行,发现被阻塞
select id,name from t3 for update;
在另一个会话查看锁情况。
查看表t3有关的锁,这里表t3持有了tuple上的AccessExclusiveLock锁,这是最高级别的锁,阻塞一切sql语句。
pid4203持有的锁row share表示正在执行select for update语句,AccessExclusiveLock最高级别锁也是pid4203需要获取的。
pid4205持有的RowExclusiveLock锁表示执行dml语句需要获取的锁,这里可以看出pid4205会话很可能执行了update语句。
SELECT
pg_locks.pid,
a.datname,
locktype,
virtualtransaction,
transactionid,
nspname,
relname,
mode,
granted,
cast(date_trunc('second',query_start) AS timestamp) AS query_start
FROM
pg_locks
LEFT OUTER JOIN pg_class ON (pg_locks.relation = pg_class.oid)
LEFT OUTER JOIN pg_namespace ON (pg_namespace.oid = pg_class.relnamespace),
pg_stat_activity a
WHERE NOT pg_locks.pid = pg_backend_pid()
AND pg_locks.pid=a.pid
and pg_class.relname='t3';
pid | datname | locktype | virtualtransaction | transactionid | nspname | relname | mode | granted | query_start
------+---------+----------+--------------------+---------------+---------+---------+---------------------+---------+---------------------
4203 | test | tuple | 4/43 | | public | t3 | AccessExclusiveLock | t | 2023-05-16 15:39:11
4203 | test | relation | 4/43 | | public | t3 | RowShareLock | t | 2023-05-16 15:39:11
4205 | test | relation | 5/10 | | public | t3 | RowExclusiveLock | t | 2023-05-16 15:39:08
4205 | test | relation | 5/10 | | public | t3 | AccessShareLock | t | 2023-05-16 15:39:08
(4 rows)
执行另一个更直观的sql语句查看锁情况,开启第三个会话执行如下sql。
SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS current_statement_in_blocking_process
FROM sys_catalog.sys_locks blocked_locks
JOIN sys_catalog.sys_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN sys_catalog.sys_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN sys_catalog.sys_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.GRANTED;
blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement | current_statement_in_blocking_process
-------------+--------------+--------------+---------------+------------------------------------+---------------------------------------
4203 | system | 4205 | system | select id,name from t3 for update; | update t3 set name='qite' where id=1;
(1 row)
在操作系统查看有关pid,4203pid中的select语句被阻塞,阻塞源头是第一个会话执行的update语句,pid是4205。
kingbas+ 4203 4154 0 15:27 ? 00:00:00 kingbase: system test [local] SELECT waiting
kingbas+ 4205 4154 0 15:27 ? 00:00:00 kingbase: system test [local] idle in transaction
第一个会话中的update语句commit或rollback后,事务结束后,第二个会话中的select for update语句可以执行成功,原因是blocking_pid被释放了。
2.select for update语句阻塞update语句
开启一个会话执行如下sql
begin;
select id,name from t3 for update;
开启另一个会话执行,发现被阻塞
update t3 set name='qitu' where id=1;
在另一个会话查看锁情况
pid4203对应RowExclusiveLock锁表示执行dml语句需要获取的锁,pid4205持有的RowShareLock锁说明此会话在执行select for update语句。
SELECT
pg_locks.pid,
a.datname,
locktype,
virtualtransaction,
transactionid,
nspname,
relname,
mode,
granted,
cast(date_trunc('second',query_start) AS timestamp) AS query_start
FROM
pg_locks
LEFT OUTER JOIN pg_class ON (pg_locks.relation = pg_class.oid)
LEFT OUTER JOIN pg_namespace ON (pg_namespace.oid = pg_class.relnamespace),
pg_stat_activity a
WHERE NOT pg_locks.pid = pg_backend_pid()
AND pg_locks.pid=a.pid
and pg_class.relname='t3';
pid | datname | locktype | virtualtransaction | transactionid | nspname | relname | mode | granted | query_start
------+---------+----------+--------------------+---------------+---------+---------+------------------+---------+---------------------
4203 | test | tuple | 4/44 | | public | t3 | ExclusiveLock | t | 2023-05-16 15:54:16
4203 | test | relation | 4/44 | | public | t3 | RowExclusiveLock | t | 2023-05-16 15:54:16
4203 | test | relation | 4/44 | | public | t3 | AccessShareLock | t | 2023-05-16 15:54:16
4205 | test | relation | 5/11 | | public | t3 | RowShareLock | t | 2023-05-16 15:54:00
(4 rows)
执行如下sql,果然阻塞源是pig4205执行的语句select for update。
SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS current_statement_in_blocking_process
FROM sys_catalog.sys_locks blocked_locks
JOIN sys_catalog.sys_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN sys_catalog.sys_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN sys_catalog.sys_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.GRANTED;
blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement | current_statement_in_blocking_process
-------------+--------------+--------------+---------------+---------------------------------------+---------------------------------------
4203 | system | 4205 | system | update t3 set name='qitu' where id=1; | select id,name from t3 for update;
(1 row)
操作系统查看对应pid情况,pid4203执行update语句被阻塞,阻塞源是pid4205。
kingbas+ 4203 4154 0 15:27 ? 00:00:00 kingbase: system test [local] UPDATE waiting
kingbas+ 4205 4154 0 15:27 ? 00:00:00 kingbase: system test [local] idle in transaction
解决方法是持有锁会话结束事务(提交或回滚),随后被阻塞的update语句执行成功,或杀掉持有锁的进程。
总结
select for update几种其他用法:
select * from t for update nowait 不等待行锁释放,提示锁冲突,不返回结果。
select * from t for update wait 5 等待5秒,若行锁仍未释放,则提示锁冲突,不返回结果。
select * from t for update skip locked 查询返回查询结果,但忽略有行锁的记录。
使用FOR UPDATE WAIT”子句有几个优点:
1.防止无限期地等待被锁定的行。
2.允许应用程序中对锁的等待时间进行控制。
3.在不确定sql执行前表是否被锁,wait关键字起到关键作用。
4.若使用了skip locked,则可以越过锁定的行,而不会报错。
此案例提供了数据库中锁阻塞的排查方法,如生产环境中遇到其他类型lock重型锁阻塞,也可用以上方法分析。
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 被坑几百块钱后,我竟然真的恢复了删除的微信聊天记录!
· 没有Manus邀请码?试试免邀请码的MGX或者开源的OpenManus吧
· 【自荐】一款简洁、开源的在线白板工具 Drawnix
· 园子的第一款AI主题卫衣上架——"HELLO! HOW CAN I ASSIST YOU TODAY
· Docker 太简单,K8s 太复杂?w7panel 让容器管理更轻松!