postgresql/lightdb vacuum对性能的影响及彻底理解表膨胀
先来看测试结果
zjh@postgres=# create table big_table(id int,v text); CREATE TABLE zjh@postgres=# ALTER TABLE big_table SET (autovacuum_enabled = off); # 关闭自动vacuum ALTER TABLE zjh@postgres=# insert into big_table select id, rpad('x',64,'x') from generate_series(1,1000000) id; INSERT 0 1000000 zjh@postgres=# zjh@postgres=# \timing on Timing is on. zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 76.603 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 60.682 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 60.963 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 60.632 ms ### 稳定的执行时间 zjh@postgres=# update big_table set v = v, id = id; #生成50%死元祖 UPDATE 1000000 Time: 1006.034 ms (00:01.006) zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 307.994 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 75.222 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 115.800 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 109.309 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 76.994 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 76.219 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 75.804 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 75.834 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 76.684 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 76.299 ms ## 稳定后带死元祖的执行时间,25%以上的额外时延 zjh@postgres=# update big_table set v = v, id = id; UPDATE 1000000 Time: 1923.425 ms (00:01.923) zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 352.238 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 103.585 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 211.861 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 144.573 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 99.129 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 100.284 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 99.148 ms ## 两次update后,额外负载65% zjh@postgres=# vacuum big_table ; VACUUM Time: 214.800 ms zjh@postgres=# zjh@postgres=# vacuum big_table ; VACUUM Time: 11.348 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 88.478 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 85.893 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 87.403 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 85.340 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 85.990 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 86.514 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 85.684 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 84.336 ms # vacuum之后,仍然还有40%的额外负载 zjh@postgres=# vacuum full big_table ; # full vacuum VACUUM Time: 416.220 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 66.535 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 63.514 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 63.039 ms ## 重组织后,负载基本消除,虽然也有一点点,但基本可以忽略不计了。 zjh@postgres=# update big_table set v = v, id = id; UPDATE 1000000 Time: 1116.814 ms (00:01.117) zjh@postgres=# update big_table set v = v, id = id; UPDATE 1000000 Time: 2250.193 ms (00:02.250) zjh@postgres=# update big_table set v = v, id = id; UPDATE 1000000 Time: 1264.835 ms (00:01.265) zjh@postgres=# update big_table set v = v, id = id; UPDATE 1000000 Time: 1266.069 ms (00:01.266) zjh@postgres=# update big_table set v = v, id = id; UPDATE 1000000 Time: 2000.205 ms (00:02.000) zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 384.475 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 174.367 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 364.749 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 178.216 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 176.623 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 170.568 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 170.408 ms zjh@postgres=# zjh@postgres=# zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 172.460 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 171.467 ms ## 五次update之后 zjh@postgres=# update big_table set v = v, id = id; UPDATE 1000000 Time: 1114.980 ms (00:01.115) zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 455.640 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 197.581 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 335.761 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 212.413 ms zjh@postgres=# select count(1) from big_table ; count --------- 1000000 (1 row) Time: 195.236 ms ## 六次update之后
zjh@postgres=# select pg_size_pretty(pg_relation_size('big_table'));
pg_size_pretty
----------------
675 MB -- 可见膨胀是非常厉害的。
(1 row)
zjh@postgres=# vacuum big_table ; VACUUM Time: 662.148 ms zjh@postgres=# select pg_size_pretty(pg_relation_size('big_table')); pg_size_pretty ---------------- 675 MB # 不回收,但是新的DML可以复用 (1 row) Time: 0.406 ms zjh@postgres=# vacuum full big_table ; VACUUM Time: 568.773 ms zjh@postgres=# select pg_size_pretty(pg_relation_size('big_table')); pg_size_pretty ---------------- 96 MB # 回收,其实就是重组织 (1 row)
zjh@postgres=# vacuum big_table ; VACUUM Time: 40.442 ms zjh@postgres=# zjh@postgres=# select pg_size_pretty(pg_relation_size('big_table')); pg_size_pretty ---------------- 204 MB (1 row) Time: 0.331 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 98.444 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 105.852 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 104.016 ms zjh@postgres=# select pg_size_pretty(pg_relation_size('big_table')); pg_size_pretty ---------------- 204 MB (1 row) Time: 0.273 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 106.985 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 106.612 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 106.837 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 107.296 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 105.588 ms zjh@postgres=# select pg_size_pretty(pg_relation_size('big_table')); pg_size_pretty ---------------- 204 MB (1 row) Time: 0.347 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 103.695 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 104.665 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 106.016 ms zjh@postgres=# select pg_size_pretty(pg_relation_size('big_table')); pg_size_pretty ---------------- 204 MB (1 row) Time: 0.310 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 104.338 ms # 此时所有死元祖回收的空间已经用完。 zjh@postgres=# select pg_size_pretty(pg_relation_size('big_table')); pg_size_pretty ---------------- 212 MB # 即使大量的死元祖,也会导致空间膨胀 (1 row) Time: 0.446 ms zjh@postgres=# update big_table set v = v, id = id limit 100000; UPDATE 100000 Time: 109.384 ms zjh@postgres=# select pg_size_pretty(pg_relation_size('big_table')); pg_size_pretty ---------------- 222 MB # 即使大量的死元祖,也会导致空间膨胀
(1 row) Time: 0.266 ms
从上可知,因为未清理的死元祖会占用额外空间、也会导致vm未被置为可见,所以不仅I/O更高、占用的buffer更多、也需要额外判断mvcc是否满足,进而对性能的影响还是比较大。只读特性确保避免意外操作导致表被更新,进而产生额外不必要的死元祖、vaccum活动以及vm被误干扰。
所以,对于大表做大量的update/delete(能truncate或insert select,就不要update/delete)之后,最好执行一次vacuum full xxx,释放空间。
postgresql中更新操作的物理实现
8.2版本引入了HOT特性,极大地提高了索引更新的性能。
8.2及之后,支持HOT,索引条目未更新不会插入新的,如下:
同时也会不定期的做碎片清理,如下:
所以在10+新版中,CPU最多多两次指针访问+空间浪费,一段时间后,只会多一次指针访问+0空间浪费。相比这一次指针访问,索引叶子节点中定位索引键就好几次比较了,所以几乎可以忽略不计。
最后,vacuum除了能够解决清理死元组消除表膨胀外,冻结事务ID也是其职责,这影响到事务回卷,可参见 postgresql中的事务回卷原理及预防措施。