The currtid Function and Its Performance Problem

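In Oracle, a tuple's rowid normally does not change (except for operations that trigger row movement, such as an UPDATE that moves a row across partitions, or a table shrink), so applications can conveniently use rowid to speed up access. KingbaseES has a counterpart, ctid, in the format "(blockid,slotid)", and a row can likewise be fetched quickly through its ctid. The catch is that, because of KingbaseES's MVCC mechanism, the ctid changes on UPDATE, so a lookup by a previously read ctid may fail to find the row.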

1. Differences between rowid and ctid

Unlike Oracle's rowid, a KingbaseES ctid changes when the row is updated, which is why ctid is rarely used in practice. For example:

User A                                               User B
select ctid from t1 where id=1;  -- returns (0,1)
                                                     select ctid from t1 where id=1;  -- returns (0,1)
update t1 set name='aa' where ctid='(0,1)';
select ctid from t1 where id=1;  -- returns (0,2)
                                                     select * from t1 where ctid='(0,1)';  -- returns no rows

As this shows, access by ctid is unreliable under concurrency: when user B looks the row up by the ctid read earlier, the row is no longer found.
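One way to make the failure visible instead of silent is to pair the remembered ctid with the row's key. The sketch below is not from the original post: it assumes that id uniquely identifies a row in t1, and that the application can inspect the reported row count. The key guard also matters because, after a vacuum, a freed slot such as (0,1) may be reused by an unrelated row.

```sql
-- Assumption: t1(id, name) with id unique, as in the example above.
-- User B remembered both id = 1 and ctid = '(0,1)' from an earlier read.
update t1
   set name = 'bb'
 where ctid = '(0,1)'   -- fast path: physical location seen earlier
   and id = 1;          -- guard: still the same logical row?

-- Run the fallback only if the statement above reported UPDATE 0,
-- meaning the row has moved or is gone:
update t1
   set name = 'bb'
 where id = 1;
```

The first statement keeps the cheap ctid access when the remembered location is still valid; the second gives up the shortcut and relies on the key alone.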

2. Using currtid

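As we know, an UPDATE in PG is effectively a DELETE combined with an INSERT. After an UPDATE completes, and until the table is vacuumed, the original tuple still holds a ctid that points to the new tuple. The currtid function uses this link to return the latest ctid of an updated tuple. For example: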

test=# insert into t1 values(1,'a');
INSERT 0 1
test=# select ctid from t1 where id=1;
 ctid  
-------
 (0,1)
(1 row)

test=# update t1 set name='aa' where id=1;
UPDATE 1
test=# select ctid from t1 where id=1;
 ctid  
-------
 (0,2)
(1 row)

test=# select * from t1 where ctid='(0,1)';
 id | name 
----+------
(0 rows)

test=# select currtid('t1'::regclass,'(0,1)');
 currtid 
---------
 (0,2)
(1 row)

test=# select * from t1 where ctid=currtid('t1'::regclass,'(0,1)');
 id |   name    
----+-----------
  1 | aa       
(1 row)

As shown, passing the original ctid to currtid returns the latest ctid.
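The link from the old tuple to the new one can be inspected directly. The following is a sketch that assumes the PostgreSQL pageinspect extension is available in the KingbaseES installation and that the session has superuser rights; neither is confirmed by the original post.

```sql
-- Assumption: pageinspect is installed or installable on this server.
create extension if not exists pageinspect;

-- Show the line pointers of block 0 of t1 together with each tuple's
-- inserting/deleting transaction ids and its t_ctid forward link.
select lp, t_xmin, t_xmax, t_ctid
  from heap_page_items(get_raw_page('t1', 0));
```

Run right after the UPDATE in the example above and before any vacuum, one would expect the row at lp 1 to carry a t_ctid of (0,2) and a non-zero t_xmax, while the row at lp 2 points to itself. This is the chain that currtid follows.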

3. The problem

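The example above shows that currtid can work around a row being modified in the meantime. In practice, however, it comes with a performance problem. Consider: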

test=# explain select * from t1 where ctid=currtid('t1'::regclass,'(0,1)');
                       QUERY PLAN                       
--------------------------------------------------------
 Seq Scan on t1  (cost=0.00..26.95 rows=1 width=44)
   Filter: (ctid = currtid('16387'::oid, '(0,1)'::tid))
(2 rows)

test=# explain select * from t1 where ctid='(0,2)';
                    QUERY PLAN                     
---------------------------------------------------
 Tid Scan on t1  (cost=0.00..4.01 rows=1 width=44)
   TID Cond: (ctid = '(0,2)'::tid)
(2 rows)

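As shown, the query with ctid=currtid('t1'::regclass,'(0,1)') is actually executed as a Seq Scan. The currtid call sits on the right-hand side of the equality and involves no conversion of the ctid column, so why can't a Tid Scan be used? Look at the function's attributes: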

test=# select proname,provolatile from pg_proc where proname='currtid';
 proname | provolatile 
---------+-------------
 currtid | v

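The function is volatile. Take the statement select * from t1 where ctid=currtid('t1'::regclass,'(0,1)'). If currtid('t1'::regclass,'(0,1)') were evaluated first and its result then used as the ctid to fetch, the tuple could be modified in the interval between evaluating currtid and actually reaching the tuple (an interval of unpredictable length that depends on the execution plan), and the query could return a wrong result (no rows). With a full table scan, currtid('t1'::regclass,'(0,1)') is evaluated once for every tuple (a volatile function may return different values at different times even for identical arguments), so the evaluation is deferred until each tuple is visited, which avoids the wrong result. The price is a sequential scan plus one function call per row.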
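The per-row evaluation of volatile functions can be observed with a built-in function that has nothing to do with ctid. The sketch below assumes that the PostgreSQL built-ins now(), random() and generate_series() behave in KingbaseES as they do in PostgreSQL.

```sql
-- Decode provolatile for a few functions:
-- i = immutable, s = stable, v = volatile.
select proname,
       case provolatile
            when 'i' then 'immutable'
            when 's' then 'stable'
            when 'v' then 'volatile'
       end as volatility
  from pg_proc
 where proname in ('currtid', 'now', 'random')
 order by proname;

-- random() is volatile, so the filter is re-evaluated for every row.
-- The count therefore lands somewhere near 500 rather than at 0 or 1000,
-- which is what a single up-front evaluation would produce.
select count(*)
  from generate_series(1, 1000) as g
 where random() < 0.5;
```

The same rule is what forces currtid into a per-row Filter: the planner may not evaluate a volatile expression once and reuse the result as a scan condition.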

4. Changing the function to immutable

Because currtid('t1'::regclass,'(0,1)') can return different results when executed at different times, changing the function's volatility attribute is in fact very risky.

With the function changed to immutable, the execution plan becomes:

test=# update pg_proc set provolatile='i' where proname='currtid';
UPDATE 1
test=# explain select * from t1 where ctid=currtid('t1'::regclass,'(0,1)');
                    QUERY PLAN                     
---------------------------------------------------
 Tid Scan on t1  (cost=0.00..4.01 rows=1 width=44)
   TID Cond: (ctid = '(0,2)'::tid)
(2 rows)
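Given the risk, it may be safer to try the catalog change inside a transaction and roll it back afterwards. This is only a sketch: it assumes a superuser session and that direct updates to pg_proc are transactional in KingbaseES, as they are in PostgreSQL.

```sql
-- Try the change without making it permanent.
begin;
update pg_proc set provolatile = 'i' where proname = 'currtid';
explain select * from t1 where ctid = currtid('t1'::regclass, '(0,1)');
rollback;  -- currtid is volatile again

-- If the change was already committed, restore it explicitly:
update pg_proc set provolatile = 'v' where proname = 'currtid';
```

Note that an immutable call is folded into a constant when the statement is planned, as the TID Cond line shows, so any plan that is kept and reused may carry a stale ctid.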

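Once the function is marked immutable, the query can use a Tid Scan. If the risk of data inconsistency is considered negligible, the result of currtid can also be assigned to a variable first, so that currtid does not have to be called for every row. The two blocks below compare the approaches over 1000 iterations: the first assigns the currtid result to v_ctid and then filters on the variable (about 16 ms), while the second calls currtid directly in the WHERE clause (about 1007 ms). The t1 used in this timing test has a relname column, so it is apparently not the two-column t1 of the earlier examples.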

test=# declare
test-#   v_ctid tid;
test-#   v_cnt integer;
test-# begin
test-#   select ctid into v_ctid from t1 where relname='t2';
test-#   for i in 1..1000 loop
test-#     select currtid('t1'::regclass,v_ctid) into v_ctid;
test-#     select count(*) into v_cnt from t1 where ctid=v_ctid;
test-#   end loop;
test-# end;
test-# /
ANONYMOUS BLOCK
Time: 16.463 ms

test=# declare
test-#   v_ctid tid;
test-#   v_cnt integer;
test-# begin
test-#   select ctid into v_ctid from t1 where relname='t2';
test-#   for i in 1..1000 loop
test-#     select count(*) into v_cnt from t1 where ctid=currtid('t1'::regclass,v_ctid);
test-#   end loop;
test-# end;
test-# /
ANONYMOUS BLOCK
Time: 1007.308 ms (00:01.007)

Posted 2021-06-22 19:10 by KINGBASE研究院
