postgresql xid回卷预防及排查
监控
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | WITH max_age AS ( SELECT 2000000000 as max_old_xid , setting AS autovacuum_freeze_max_age FROM pg_catalog.pg_settings WHERE name = 'autovacuum_freeze_max_age' ) , per_database_stats AS ( SELECT datname , m.max_old_xid:: int , m.autovacuum_freeze_max_age:: int , age(d.datfrozenxid) AS oldest_current_xid FROM pg_catalog.pg_database d JOIN max_age m ON (true) WHERE d.datallowconn ) SELECT max (oldest_current_xid) AS oldest_current_xid , max ( ROUND ( 100 * (oldest_current_xid / max_old_xid:: float ))) AS percent_towards_wraparound , max ( ROUND ( 100 * (oldest_current_xid / autovacuum_freeze_max_age:: float ))) AS percent_towards_emergency_autovac |
percent_towards_wraparound指标对于警报的设置非常重要。查询使用 age()
函数来确定 TXID 值,因此需要考虑它们是否真的处于耗尽状态,以确定回绕是否是一个真正存在的问题。如果耗尽,数据库将被迫关闭,并可能为了修复而导致停机时间的不确定。这个查询中存在一点缓冲,因为它检查的上限(确切地说是20亿)小于导致耗尽的实际最大整数值。但这已经足够接近了,达到100%的警报应该立即采取行动。
percent_towards_emergency_autovac指标是我们建议监测的附加值,特别是对于以前从未监测过此指标的系统(有关何时可以降低或删除此警报优先级,请参阅下面有关近期冻结成效的说明)。这将监视数据库的最高 TXID 值是否达到autovacuum_freeze_max_age。这是一个用户可调值,默认值为2亿,当任何表的最高 TXID 值达到该值时,该表上会出现更高优先级的autovacuum
。您将认识到这个特殊的vacuum
会话,因为在pg_stat_activity
中它将被标记(以防止回绕)。它的优先级更高,即使禁用autovacuum
,它也会运行,如果手动取消vacuum
,它几乎会立即重新启动。它还需要一些不同的内部低级锁,因此它可能会导致这些表上的争用稍微更高,这取决于它们在紧急vacuum
期间的使用方式。如果您确实遇到争用/锁的问题,并且是紧急vacuum
造成的,则完全可以安全地取消争用/锁,以允许其他事务完成。请注意,它将继续重新启动,直到vacuum
能够成功完成或手动运行vacuum
排查
1 2 3 | # 查看每个库的年龄 SELECT datname, age(datfrozenxid) FROM pg_database;<br><br>SELECT c.oid::regclass# 1个库每个表的年龄排序 SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ( 'r' , 'm' ) order by age desc;<br><br># 查看1个表的年龄<br><br>#通过以下语句可以查找出age年龄大于vacuum_freeze_table_age的表:<br> select datname,age(datfrozenxid) from pg_database where datname not in ( 'postgres' , 'template0' , 'template1' ) and age(datfrozenxid)>( select setting:: int from pg_settings where name= 'vacuum_freeze_table_age' )order by age(datfrozenxid) desc;<br> |
select relname, age(relfrozenxid) as xid_age, pg_size_pretty(pg_table_size(oid)) as table_sizefrom pg_class where relkind = 'r' and pg_table_size(oid) > 1073741824order by xid_age desc limit 20;--vacuum前事务年龄为 61436 relname | xid_age | table_size----------------+---------+------------ test_tab | 31260 | 4327 MB
运维脚本
1 2 3 4 5 | # 对指定数据库中年龄最大的前 50 张表进行 vacuum freeze for cmd in `psql -U用户名 -p端口号 -h连接串 -d数据库名 -c "SELECT 'vacuum freeze '||c.oid::regclass||';' as vacuum_cmd FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ('r', 'm') order by greatest(age(c.relfrozenxid),age(t.relfrozenxid)) desc offset 50 limit 50;" | grep -v vacuum_cmd | grep -v row | grep vacuum`; do |
python脚本
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 | from multiprocessing import Pool import psycopg2 args = dict (host = 'pgm-bp10xxxx.pg.rds.aliyuncs.com' , port = 5432 , dbname = '数据库名' , user = '用户名' , password = '密码' ) def vacuum_handler(sql): sql_str = "SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ('r', 'm') order by age desc limit 10; " try : conn = psycopg2.connect( * * args) cur = conn.cursor() cur.execute(sql) conn.commit() cur = conn.cursor() cur.execute(sql_str) print cur.fetchall() conn.close() except Exception as e: print str (e) # 对指定数据库中年龄最大的前 1000 张表进行 vacuum freeze,32 个进程并发执行 def multi_vacuum(): pool = Pool(processes = 32 ) sql_str = "SELECT 'vacuum freeze '||c.oid::regclass||';' as vacuum_cmd FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ('r', 'm') order by greatest(age(c.relfrozenxid),age(t.relfrozenxid)) desc limit 1000;" ; try : conn = psycopg2.connect( * * args) cur = conn.cursor() cur.execute(sql_str) rows = cur.fetchall() for row in rows: cmd = row[ 'vacuum_cmd' ] pool.apply_async(vacuum_handler, (cmd, )) conn.close() pool.close() pool.join() except Exception as e: print str (e) multi_vacuum() |
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· DeepSeek 开源周回顾「GitHub 热点速览」
· 物流快递公司核心技术能力-地址解析分单基础技术分享
· .NET 10首个预览版发布:重大改进与新特性概览!
· AI与.NET技术实操系列(二):开始使用ML.NET
· 单线程的Redis速度为什么快?