PG优化实战系列-利用hint固定执行计划,避免数据全量排序
项目现场开发给了个慢SQL寻求优化。T3表结构又全是varchar…已经无力吐槽了,之前的博客已经讲过了在磁盘上varchar几乎等同于实际长度,但是如果放到内存,分配的空间是定义长度,很容易导致内存不足,如果嫌命长,可以多试试。此外不要用UUID做主键,UUID做主键会导致索引结构大,同时UUID是随机的,入库时很容易导致叶子节点分裂,破坏B树结构,影响查询,入库以及更新效率。
--原始SQL select t3.apptype, t3.uname, t3.uid, t3.nickname, t3.action_time, t3.content from t1 inner join t2 on (t1.task_id = t2.task_id or t1.child_task_id = t2.task_id) inner join t3 on t2.md_id = t3.md_id where t1.id = 1 order by t3.action_time desc limit 10 offset 0; --对应执行计划 QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=2226.68..2226.70 rows=10 width=206) (actual time=1770.610..1770.641 rows=10 loops=1) Buffers: shared hit=776920 -> Sort (cost=2226.68..2227.82 rows=458 width=206) (actual time=1770.607..1770.614 rows=10 loops=1) Sort Key: t3.action_time DESC Sort Method: top-N heapsort Memory: 30kB Buffers: shared hit=776920 -> Nested Loop (cost=29.33..2216.78 rows=458 width=206) (actual time=26.288..1674.090 rows=106066 loops=1) Buffers: shared hit=776920 -> Nested Loop (cost=28.77..1666.80 rows=458 width=33) (actual time=26.244..336.274 rows=162491 loops=1) Buffers: shared hit=20890 -> Seq Scan on t1 (cost=0.00..1.18 rows=1 width=292) (actual time=0.008..0.015 rows=1 loops=1) Filter: (id = 1) Rows Removed by Filter: 13 Buffers: shared hit=1 -> Bitmap Heap Scan on t2 (cost=28.77..1661.05 rows=458 width=47) (actual time=26.230..157.987 rows=162491 loops=1) Recheck Cond: (((t1.task_id)::text = (task_id)::text) OR ((t1.child_task_id)::text = (task_id)::text)) Heap Blocks: exact=19013 Buffers: shared hit=20889 -> BitmapOr (cost=28.77..28.77 rows=458 width=0) (actual time=23.810..23.811 rows=0 loops=1) Buffers: shared hit=1876 -> Bitmap Index Scan on "TTRR_CONFLICT_TASK_ID_IDX" (cost=0.00..14.27 rows=229 width=0) (actual time=23.790..23.791 rows=162491 loops=1) Index Cond: ((task_id)::text = (t1.task_id)::text) Buffers: shared hit=1872 -> Bitmap Index Scan on "TTRR_CONFLICT_TASK_ID_IDX" (cost=0.00..14.27 rows=229 width=0) (actual time=0.012..0.013 rows=0 loops=1) Index Cond: ((task_id)::text = (t1.child_task_id)::text) Buffers: shared hit=4 -> Index Scan using idx_mdid on t3 (cost=0.56..1.19 rows=1 width=239) (actual time=0.006..0.006 rows=1 loops=162491) Index Cond: ((md_id)::text = (t2.md_id)::text) Buffers: shared hit=756030 Planning Time: 0.336 ms Execution Time: 1770.725 ms
--命中10w+ -> Nested Loop (cost=29.33..2216.78 rows=458 width=206) (actual time=26.288..1674.090 rows=106066 loops=1) --循环16w+次,耗时947ms -> Index Scan using idx_mdid on t3 (cost=0.56..1.19 rows=1 width=239) (actual time=0.006..0.006 rows=1 loops=162491)
从截取的执行计划段可以看到总体结果集捞出来了10w+的数据,而实际输出的分页只需10条,显然执行计划里面将全部命中结果获取后再进行排序取10条的方式很不明智。
验证:不全部命中直接返回10条是否能提速?
--验证SQL,直接去掉order by explain (analyze,buffers) select t3.apptype, t3.uname, t3.uid, t3.nickname, t3.action_time, t3.content from t1 inner join t2 on (t1.task_id = t2.task_id or t1.child_task_id = t2.task_id) inner join t3 on t2.md_id = t3.md_id where t1.id = 1 limit 10 offset 0; --对应执行计划 QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=29.33..77.09 rows=10 width=206) (actual time=29.312..29.474 rows=10 loops=1) Buffers: shared hit=1932 -> Nested Loop (cost=29.33..2216.78 rows=458 width=206) (actual time=29.310..29.444 rows=10 loops=1) Buffers: shared hit=1932 -> Nested Loop (cost=28.77..1666.80 rows=458 width=33) (actual time=29.271..29.291 rows=11 loops=1) Buffers: shared hit=1878 -> Seq Scan on t1 (cost=0.00..1.18 rows=1 width=292) (actual time=0.008..0.009 rows=1 loops=1) Filter: (id = 1) Rows Removed by Filter: 10 Buffers: shared hit=1 -> Bitmap Heap Scan on t2 (cost=28.77..1661.05 rows=458 width=47) (actual time=29.257..29.265 rows=11 loops=1) Recheck Cond: (((t1.task_id)::text = (task_id)::text) OR ((t1.child_task_id)::text = (task_id)::text)) Heap Blocks: exact=1 Buffers: shared hit=1877 -> BitmapOr (cost=28.77..28.77 rows=458 width=0) (actual time=26.835..26.836 rows=0 loops=1) Buffers: shared hit=1876 -> Bitmap Index Scan on "TTRR_CONFLICT_TASK_ID_IDX" (cost=0.00..14.27 rows=229 width=0) (actual time=26.810..26.811 rows=162491 loops=1) Index Cond: ((task_id)::text = (t1.task_id)::text) Buffers: shared hit=1872 -> Bitmap Index Scan on "TTRR_CONFLICT_TASK_ID_IDX" (cost=0.00..14.27 rows=229 width=0) (actual time=0.014..0.015 rows=0 loops=1) Index Cond: ((task_id)::text = (t1.child_task_id)::text) Buffers: shared hit=4 -> Index Scan using idx_mdid on t3 (cost=0.56..1.19 rows=1 width=239) (actual time=0.011..0.011 rows=1 loops=11) Index Cond: ((md_id)::text = (t2.md_id)::text) Buffers: shared hit=54 Planning Time: 1.044 ms Execution Time: 29.538 ms
从以上执行计划可以看到,在不全部命中直接输出的情况下,速度提升巨大。与业务沟通后,输出必须带时间排序。
思路:能否只命中部分数据后直接返回?
“order by t3.action_time desc limit 10 offset 0;”实际是取最近数据的10条。
与业务沟通后可以增加action_time过滤条件,如近3天、5天。
--对应SQL explain (analyze,buffers) select t3.apptype, t3.uname, t3.uid, t3.nickname, t3.action_time, t3.content from t1 inner join t2 on (t1.task_id = t2.task_id or t1.child_task_id = t2.task_id) inner join t3 on t2.md_id = t3.md_id where t1.id = 1 and action_time>'1610035195' order by t3.action_time desc limit 10 offset 0; --对应执行计划 QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=3461.45..3461.46 rows=4 width=205) (actual time=1789.902..1789.931 rows=10 loops=1) Buffers: shared hit=781825 -> Sort (cost=3461.45..3461.46 rows=4 width=205) (actual time=1789.899..1789.906 rows=10 loops=1) Sort Key: t3.action_time DESC Sort Method: top-N heapsort Memory: 31kB Buffers: shared hit=781825 -> Nested Loop (cost=45.35..3461.41 rows=4 width=205) (actual time=46.231..1760.202 rows=106066 loops=1) Buffers: shared hit=781825 -> Nested Loop (cost=44.79..1766.05 rows=460 width=33) (actual time=35.603..387.490 rows=162491 loops=1) Buffers: shared hit=25795 -> Seq Scan on t1 (cost=0.00..1.18 rows=1 width=292) (actual time=0.013..0.019 rows=1 loops=1) Filter: (id = 1) Rows Removed by Filter: 13 Buffers: shared hit=1 -> Bitmap Heap Scan on t2 (cost=44.79..1760.27 rows=460 width=47) (actual time=35.582..189.271 rows=162491 loops=1) Recheck Cond: (((t1.task_id)::text = (task_id)::text) OR ((t1.child_task_id)::text = (task_id)::text)) Heap Blocks: exact=22050 Buffers: shared hit=25794 -> BitmapOr (cost=44.79..44.79 rows=460 width=0) (actual time=32.343..32.344 rows=0 loops=1) Buffers: shared hit=3744 -> Bitmap Index Scan on "TTRR_CONFLICT_TASK_ID_IDX" (cost=0.00..22.28 rows=230 width=0) (actual time=32.316..32.317 rows=162491 loops=1) Index Cond: ((task_id)::text = (t1.task_id)::text) Buffers: shared hit=3740 -> Bitmap Index Scan on "TTRR_CONFLICT_TASK_ID_IDX" (cost=0.00..22.28 rows=230 width=0) (actual time=0.015..0.016 rows=0 loops=1) Index Cond: ((task_id)::text = (t1.child_task_id)::text) Buffers: shared hit=4 -> Index Scan using idx_mdid on t3 (cost=0.56..3.68 rows=1 width=238) (actual time=0.007..0.007 rows=0 loops=162491) Index Cond: ((md_id)::text = (t2.md_id)::text) Filter: ((action_time)::text > '1610035195'::text) Rows Removed by Filter: 0 Buffers: shared hit=756030 Planning Time: 0.411 ms Execution Time: 1790.004 ms
从执行计划可以看到没有按照想象中的来执行;
试试hint
--SQL /*+ indexscan(t3 idx_index_webpost_action_time2) */ explain (analyze,buffers) select t3.apptype, t3.uname, t3.uid, t3.nickname, t3.action_time, t3.content from t1 inner join t2 on (t1.task_id = t2.task_id or t1.child_task_id = t2.task_id) inner join t3 on t2.md_id = t3.md_id where t1.id = 1 and action_time>'1610035195' order by t3.action_time desc limit 10 offset 0; --执行计划 QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---- Limit (cost=174120.21..174120.22 rows=4 width=205) (actual time=709.149..709.188 rows=10 loops=1) Buffers: shared hit=66616, temp read=1725 written=1725 -> Sort (cost=174120.21..174120.22 rows=4 width=205) (actual time=709.147..709.161 rows=10 loops=1) Sort Key: t3.action_time DESC Sort Method: top-N heapsort Memory: 28kB Buffers: shared hit=66616, temp read=1725 written=1725 -> Hash Join (cost=1772.23..174120.17 rows=4 width=205) (actual time=523.189..682.476 rows=28597 loops=1) Hash Cond: ((t3.md_id)::text = (t2.md_id)::text) Buffers: shared hit=66616, temp read=1725 written=1725 -> Index Scan using idx_index_webpost_action_time2 on t3 (cost=0.43..172161.43 rows=49838 width=238) (actual time=0.019..54.132 rows=46447 loops=1) Index Cond: ((action_time)::text > '1610035195'::text) Buffers: shared hit=40821 -> Hash (cost=1766.05..1766.05 rows=460 width=33) (actual time=497.318..497.320 rows=162491 loops=1) Buckets: 131072 (originally 1024) Batches: 2 (originally 1) Memory Usage: 7169kB Buffers: shared hit=25795, temp written=526 -> Nested Loop (cost=44.79..1766.05 rows=460 width=33) (actual time=34.844..369.624 rows=162491 loops=1) Buffers: shared hit=25795 -> Seq Scan on t1 (cost=0.00..1.18 rows=1 width=292) (actual time=0.015..0.021 rows=1 loops=1) Filter: (id = 1) Rows Removed by Filter: 13 Buffers: shared hit=1 -> Bitmap Heap Scan on t2 (cost=44.79..1760.27 rows=460 width=47) (actual time=34.822..184.299 rows=162491 loops=1) Recheck Cond: (((t1.task_id)::text = (task_id)::text) OR ((t1.child_task_id)::text = (task_id)::text)) Heap Blocks: exact=22050 Buffers: shared hit=25794 -> BitmapOr (cost=44.79..44.79 rows=460 width=0) (actual time=31.881..31.882 rows=0 loops=1) Buffers: shared hit=3744 -> Bitmap Index Scan on "TTRR_CONFLICT_TASK_ID_IDX" (cost=0.00..22.28 rows=230 width=0) (actual time=31.856..31.857 rows=162491 loops=1) Index Cond: ((task_id)::text = (t1.task_id)::text) Buffers: shared hit=3740 -> Bitmap Index Scan on "TTRR_CONFLICT_TASK_ID_IDX" (cost=0.00..22.28 rows=230 width=0) (actual time=0.014..0.015 rows=0 loops=1) Index Cond: ((task_id)::text = (t1.child_task_id)::text) Buffers: shared hit=4 Planning Time: 0.429 ms Execution Time: 709.298 ms
从执行计划可以看到在构建hash表的时候有很多临时read、written,临时读写一方面可能是work_mem不够,另外一方面最根本原因是表结构字段类型全被设置成varchar(500)导致;
-> Index Scan using idx_index_webpost_action_time2 on t3 (cost=0.43..172161.43 rows=49838 width=238) (actual time=0.019..54.132 rows=46447 loops=1)
此外可以看到捞取T3表的行数从10w降低到4.6w(此值跟选定的时间范围有关,业务侧可以根据实际情况选择1,3,5天);
同时可以看到通过T3的action_time索引扫描cost代价很大,导致优化器不选择此执行计划,鉴于实际情况使用此执行计划更满足业务场景,需要用hint对pg进行执行计划选取。
问题总结:
1、 表结构要设计合理,字段类型不要为了省事全部设置成varchar类型
2、 UUID尽量少用,可以使用序列代替,如果是为了保证行唯一,可以考虑用hashtext(uuid)::bigint代替。
3、 根据业务场景合理变更SQL,如果执行计划不符合要求可以调整配置参数或者用hint。
4、 优化的原则是尽量减少磁盘IO,走索引或者减少捞取最终或中间数据量等都能达到目的。