通过自关联替代开窗函数实现SQL优化
数据库环境:SQL SERVER 2005
我们平时都是通过开窗析函数来取代自关联,减少表扫描,从而优化SQL。今天,反其道而行,
用自关联改写开窗函数。我们先来看一下SQL。
原SQL语句
SELECT * FROM ( SELECT MIN(a.line_no) OVER ( PARTITION BY a.id_clerk ) AS line_no , a.dslyipt_no , a.int_year , a.int_period , a.id_scheme , d.name_scheme , a.id_schhelp , e.name_schhelp , a.id_dept , b.name_dept , a.id_clerk , c.name_clerk , a.id_slyitem , a.name_slyitem , a.dec_value FROM dslyipt_03 a , ctlm1003 b , ctlm1007 c , ctlm7201 d , ctlm7219 e WHERE a.id_dept = b.id_dept AND a.id_clerk = c.id_clerk AND a.id_scheme = d.id_scheme AND a.id_schhelp = e.id_schhelp AND a.id_scheme = e.id_scheme ) dslyipt_03 WHERE dslyipt_no IN ( '201509000169' ) ------------------------相关表的数据量---------------------------------------- SELECT COUNT(*) FROM dslyipt_03 --2321920 SELECT COUNT(*) FROM dslyipt_03 WHERE dslyipt_no IN ( '201509000169' )--16 SELECT COUNT(*) FROM ctlm1003 --125 SELECT COUNT(*) FROM ctlm1007 --11986 SELECT COUNT(*) FROM ctlm7201 --16 SELECT COUNT(*) FROM ctlm7219 --32
dslyipt_03表是大表,有2321920条数据,过滤之后只有16条数据,原SQL总共执行了30多秒才出结果,
因此可以判定,SQL还有优化的余地。
对2KW的表进行开窗,慢是有原因的。那么,我们是否可以先过滤再开窗呢?我们来改一下SQL,改写之后
的SQL如下:
SELECT MIN(a.line_no) OVER ( PARTITION BY a.id_clerk ) AS line_no , a.dslyipt_no , a.int_year , a.int_period , a.id_scheme , d.name_scheme , a.id_schhelp , e.name_schhelp , a.id_dept , b.name_dept , a.id_clerk , c.name_clerk , a.id_slyitem , a.name_slyitem , a.dec_value FROM dslyipt_03 a , ctlm1003 b , ctlm1007 c , ctlm7201 d , ctlm7219 e WHERE a.id_dept = b.id_dept AND a.id_clerk = c.id_clerk AND a.id_scheme = d.id_scheme AND a.id_schhelp = e.id_schhelp AND a.id_scheme = e.id_scheme AND a.dslyipt_no IN ( '201509000169' )
嗖的一下,数据就出来了!但是,再核对一下数据,发现字段line_no的数据对不上,说明改错了。改写
后对line_no开窗的范围变小了,所以数据有问题。
现在通过自关联来实现开窗函数的功能,自关联改写的SQL如下:
SELECT f.line_no , a.dslyipt_no , a.int_year , a.int_period , a.id_scheme , d.name_scheme , a.id_schhelp , e.name_schhelp , a.id_dept , b.name_dept , a.id_clerk , c.name_clerk , a.id_slyitem , a.name_slyitem , a.dec_value FROM dslyipt_03 a , ctlm1003 b , ctlm1007 c , ctlm7201 d , ctlm7219 e , ( SELECT a.id_clerk , MIN(a.line_no) AS line_no FROM dslyipt_03 a GROUP BY a.id_clerk ) f WHERE a.id_dept = b.id_dept AND a.id_clerk = c.id_clerk AND a.id_scheme = d.id_scheme AND a.id_schhelp = e.id_schhelp AND a.id_scheme = e.id_scheme AND a.id_clerk = f.id_clerk AND a.dslyipt_no IN ( '201509000169' ) OPTION(HASH JOIN)
在语句后面,我们通过查询提示,建议SQL走哈希连接,1S之内就出结果了,经核对,数据无误!
我们来对比一下改写前后TIME、IO
改写前
改写后