利用分析函数优化自连接
春节回来上班第一天也够蛋疼的,盖儿又给我发了一个SQL,他说这个SQL db file sequential read 很高,要跑120秒,逻辑读有1626677 叫我帮忙优化一下,SQL语句如下:
select distinct decode(length(a.category_id), 5, decode(a.origin_type, 801, 888888, 999999), a.category_id) category_id, a.notice_code, a.treat_status, lr.real_name as receiver_name, f.send_code, f.policy_code, g.real_name agent_name, f.organ_id, f.dept_id, a.policy_id, a.change_id, a.case_id, a.group_policy_id, a.fee_id, a.auth_id, a.pay_id, cancel_appoint.appoint_time cancel_appoint_time, a.insert_time, a.send_time, a.end_time, f.agency_code, a.REPLY_TIME, a.REPLY_EMP_ID, a.FIRST_DUTY, a.NEED_SEND_PRINT, 11 source from t_policy_problem a, t_policy f, t_agent g, t_letter_receiver lr, t_problem_category pc, t_policy_cancel_appoint cancel_appoint where f.agent_id = g.agent_id(+) and a.policy_id = f.policy_id(+) and lr.main_receiver = 'Y' and a.category_id = pc.category_id and a.item_id = lr.item_id and a.policy_id = cancel_appoint.policy_id(+) And a.Item_Id = (Select Max(item_id) From t_Policy_Problem Where notice_code = a.notice_code) and a.policy_id is not null and a.notice_code is not null and a.change_id is null and a.case_id is null and a.group_policy_id is null and a.origin_type not in (801, 802) and a.pay_id is null and a.category_id not in (130103, 130104, 130102, 140102, 140101) and f.policy_type = 1 and (a.fee_id is null or (a.fee_id is not null and a.origin_type = 701)) and f.organ_id in (select distinct organ_id from T_COMPANY_ORGAN start with organ_id = '107' connect by parent_id = prior organ_id) and pc.NEED_PRITN = 'Y' SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 242 | 731 (1)| | 1 | SORT UNIQUE | | 1 | 242 | 729 (0)| |* 2 | FILTER | | | | | |* 3 | HASH JOIN | | 1 | 242 | 714 (1)| | 4 | NESTED LOOPS | | 1 | 236 | 712 (1)| | 5 | NESTED LOOPS OUTER | | 1 | 219 | 711 (1)| | 6 | NESTED LOOPS | | 1 | 203 | 710 (1)| | 7 | NESTED LOOPS | | 1 | 196 | 709 (1)| | 8 | NESTED LOOPS OUTER | | 1 | 121 | 708 (1)| |* 9 | TABLE ACCESS FULL | T_POLICY_PROBLEM | 1 | 107 | 706 (0)| | 10 | TABLE ACCESS BY INDEX ROWID| T_POLICY_CANCEL_APPOINT | 1 | 14 | 2 (50)| |* 11 | INDEX UNIQUE SCAN | UK1_POLICY_CANCEL_APPOINT | 1 | | | |* 12 | TABLE ACCESS BY INDEX ROWID | T_POLICY | 1 | 75 | 2 (50)| |* 13 | INDEX UNIQUE SCAN | PK_T_POLICY | 1 | | 1 (0)| |* 14 | TABLE ACCESS BY INDEX ROWID | T_PROBLEM_CATEGORY | 1 | 7 | 2 (50)| |* 15 | INDEX UNIQUE SCAN | PK_T_PROBLEM_CATEGORY | 1 | | | | 16 | TABLE ACCESS BY INDEX ROWID | T_AGENT | 1 | 16 | 2 (50)| |* 17 | INDEX UNIQUE SCAN | PK_T_AGENT | 1 | | | |* 18 | INDEX RANGE SCAN | T_LETTER_RECEIVER_IDX_001 | 1 | 17 | 2 (0)| | 19 | VIEW | VW_NSO_1 | 7 | 42 | | |* 20 | CONNECT BY WITH FILTERING | | | | | | 21 | NESTED LOOPS | | | | | |* 22 | INDEX UNIQUE SCAN | PK_T_COMPANY_ORGAN | 1 | 6 | | | 23 | TABLE ACCESS BY USER ROWID | T_COMPANY_ORGAN | | | | | 24 | NESTED LOOPS | | | | | | 25 | BUFFER SORT | | 7 | 70 | | | 26 | CONNECT BY PUMP | | | | | |* 27 | INDEX RANGE SCAN | T_COMPANY_ORGAN_IDX_002 | 7 | 70 | 1 (0)| | 28 | SORT AGGREGATE | | 1 | 21 | | | 29 | TABLE ACCESS BY INDEX ROWID | T_POLICY_PROBLEM | 1 | 21 | 2 (50)| |* 30 | INDEX RANGE SCAN | IDX_POLICY_PROBLEM__N_CODE | 1 | | 3 (0)| -------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("SYS_ALIAS_1"."ITEM_ID"= (SELECT /*+ */ MAX("T_POLICY_PROBLEM"."ITEM_ID") FROM "T_POLICY_PROBLEM" "T_POLICY_PROBLEM" WHERE "T_POLICY_PROBLEM"."NOTICE_CODE"=:B1)) 3 - access("F"."ORGAN_ID"="VW_NSO_1"."$nso_col_1") 9 - filter("SYS_ALIAS_1"."POLICY_ID" IS NOT NULL AND "SYS_ALIAS_1"."NOTICE_CODE" IS NOT NULL AND "SYS_ALIAS_1"."CHANGE_ID" IS NULL AND "SYS_ALIAS_1"."CASE_ID" IS NULL AND "SYS_ALIAS_1"."GROUP_POLICY_ID" IS NULL AND TO_NUMBER("SYS_ALIAS_1"."ORIGIN_TYPE")<>801 AND TO_NUMBER("SYS_ALIAS_1"."ORIGIN_TYPE")<>802 AND "SYS_ALIAS_1"."PAY_ID" IS NULL AND "SYS_ALIAS_1"."CATEGORY_ID"<>130103 AND "SYS_ALIAS_1"."CATEGORY_ID"<>130104 AND "SYS_ALIAS_1"."CATEGORY_ID"<>130102 AND "SYS_ALIAS_1"."CATEGORY_ID"<>140102 AND "SYS_ALIAS_1"."CATEGORY_ID"<>140101 AND ("SYS_ALIAS_1"."FEE_ID" IS NULL OR "SYS_ALIAS_1"."FEE_ID" IS NOT NULL AND TO_NUMBER("SYS_ALIAS_1"."ORIGIN_TYPE")=701)) 11 - access("SYS_ALIAS_1"."POLICY_ID"="CANCEL_APPOINT"."POLICY_ID"(+)) 12 - filter(TO_NUMBER("F"."POLICY_TYPE")=1) 13 - access("SYS_ALIAS_1"."POLICY_ID"="F"."POLICY_ID") 14 - filter("PC"."NEED_PRITN"='Y') 15 - access("SYS_ALIAS_1"."CATEGORY_ID"="PC"."CATEGORY_ID") filter("PC"."CATEGORY_ID"<>130103 AND "PC"."CATEGORY_ID"<>130104 AND "PC"."CATEGORY_ID"<>130102 AND "PC"."CATEGORY_ID"<>140102 AND "PC"."CATEGORY_ID"<>140101) 17 - access("F"."AGENT_ID"="G"."AGENT_ID"(+)) 18 - access("LR"."MAIN_RECEIVER"='Y' AND "SYS_ALIAS_1"."ITEM_ID"="LR"."ITEM_ID") 20 - filter("T_COMPANY_ORGAN"."ORGAN_ID"='107') 22 - access("T_COMPANY_ORGAN"."ORGAN_ID"='107') 27 - access("T_COMPANY_ORGAN"."PARENT_ID"=NULL) 30 - access("T_POLICY_PROBLEM"."NOTICE_CODE"=:B1) 64 rows selected. Statistics ---------------------------------------------------------- 7 recursive calls 0 db block gets 1626677 consistent gets 30677 physical reads 128 redo size 2291351 bytes sent via SQL*Net to client 13277 bytes received via SQL*Net from client 1060 SQL*Net roundtrips to/from client 6 sorts (memory) 0 sorts (disk) 15878 rows processed
实际会返回15878行,但是执行计划上说返回1条记录,初学者可能会觉得是统计信息的问题(如果你觉得这个SQL跑得慢是统计信息不准,那么你就是初学者),其实不是这样的
因为有INDEX UNIQUE SCAN ,返回1条记录是正常的(不返回1条才不正常),另外第九步这里,它过滤条件太复杂,CBO在计算基数的时候也很容易把它算少,这里等于1
好了言归正传,这个SQL最坑爹的地方其实 是这个条件
And a.Item_Id = (Select Max(item_id)
From t_Policy_Problem
Where notice_code = a.notice_code)
这里其实相当于 t_Policy_Problem 这个表做自连接,但是这个自连接很坑爹,会导致 t_Policy_Problem 表扫描2次,从执行计划上可以看出,第九步这里它做了一个全表扫描,然后在最后28,29.30 这3步走了索引,然后回表,最后它还要被FILTER过滤 ,恩 这个地方就是 这个SQL的性能瓶颈 那么SQL 改写如下:
WITH t_Policy_Problem_w AS (SELECT tp.*, max(item_id) OVER (PARTITION BY notice_code) max_item_id FROM t_Policy_Problem tp) select distinct decode(length(a.category_id), 5, decode(a.origin_type, 801, 888888, 999999), a.category_id) category_id, a.notice_code, a.treat_status, lr.real_name as receiver_name, f.send_code, f.policy_code, g.real_name agent_name, f.organ_id, f.dept_id, a.policy_id, a.change_id, a.case_id, a.group_policy_id, a.fee_id, a.auth_id, a.pay_id, cancel_appoint.appoint_time cancel_appoint_time, a.insert_time, a.send_time, a.end_time, f.agency_code, a.REPLY_TIME, a.REPLY_EMP_ID, a.FIRST_DUTY, a.NEED_SEND_PRINT, 11 source from t_Policy_Problem_w a, t_policy f, t_agent g, t_letter_receiver lr, t_problem_category pc, t_policy_cancel_appoint cancel_appoint where a.item_id=a.max_item_id and f.agent_id = g.agent_id(+) and a.policy_id = f.policy_id(+) and lr.main_receiver = 'Y' and a.category_id = pc.category_id and a.item_id = lr.item_id and a.policy_id = cancel_appoint.policy_id(+) and a.policy_id is not null and a.notice_code is not null and a.change_id is null and a.case_id is null and a.group_policy_id is null and a.origin_type not in (801, 802) and a.pay_id is null and a.category_id not in (130103, 130104, 130102, 140102, 140101) and f.policy_type = 1 and (a.fee_id is null or (a.fee_id is not null and a.origin_type = 701)) and f.organ_id in (select distinct organ_id from T_COMPANY_ORGAN start with organ_id = '107' connect by parent_id = prior organ_id) and pc.NEED_PRITN = 'Y' ---------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| ---------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 21241 | 5289K| | 17992 (0)| | 1 | SORT UNIQUE | | 21241 | 5289K| 11M| 17992 (0)| |* 2 | HASH JOIN | | 21241 | 5289K| 5192K| 17192 (1)| |* 3 | HASH JOIN OUTER | | 21248 | 4938K| 4856K| 16727 (1)| |* 4 | HASH JOIN OUTER | | 21248 | 4606K| 4568K| 15994 (1)| |* 5 | HASH JOIN | | 21248 | 4316K| | 15920 (1)| |* 6 | TABLE ACCESS FULL | T_PROBLEM_CATEGORY | 371 | 2597 | | 4 (0)| |* 7 | HASH JOIN | | 29477 | 5786K| 5712K| 15915 (1)| |* 8 | HASH JOIN | | 62888 | 4974K| | 9575 (1)| | 9 | VIEW | VW_NSO_1 | 7 | 42 | | | |* 10 | CONNECT BY WITH FILTERING | | | | | | | 11 | NESTED LOOPS | | | | | | |* 12 | INDEX UNIQUE SCAN | PK_T_COMPANY_ORGAN | 1 | 6 | | | | 13 | TABLE ACCESS BY USER ROWID| T_COMPANY_ORGAN | | | | | | 14 | NESTED LOOPS | | | | | | | 15 | BUFFER SORT | | 7 | 70 | | | | 16 | CONNECT BY PUMP | | | | | | |* 17 | INDEX RANGE SCAN | T_COMPANY_ORGAN_IDX_002 | 7 | 70 | | 1 (0)| |* 18 | TABLE ACCESS FULL | T_POLICY | 637K| 45M| | 9569 (0)| |* 19 | VIEW | | 300K| 34M| | | | 20 | WINDOW SORT | | 300K| 30M| 88M| 5648 (0)| | 21 | TABLE ACCESS FULL | T_POLICY_PROBLEM | 300K| 30M| | 706 (0)| | 22 | TABLE ACCESS FULL | T_POLICY_CANCEL_APPOINT | 86 | 1204 | | 2 (0)| | 23 | TABLE ACCESS FULL | T_AGENT | 88982 | 1390K| | 619 (0)| |* 24 | INDEX FAST FULL SCAN | T_LETTER_RECEIVER_IDX_001 | 300K| 4987K| | 251 (0)| ---------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("A"."ITEM_ID"="LR"."ITEM_ID") 3 - access("F"."AGENT_ID"="G"."AGENT_ID"(+)) 4 - access("A"."POLICY_ID"="CANCEL_APPOINT"."POLICY_ID"(+)) 5 - access("A"."CATEGORY_ID"="PC"."CATEGORY_ID") 6 - filter("PC"."NEED_PRITN"='Y' AND "PC"."CATEGORY_ID"<>130103 AND "PC"."CATEGORY_ID"<>130104 AND "PC"."CATEGORY_ID"<>130102 AND "PC"."CATEGORY_ID"<>140102 AND "PC"."CATEGORY_ID"<>140101) 7 - access("A"."POLICY_ID"="F"."POLICY_ID") 8 - access("F"."ORGAN_ID"="VW_NSO_1"."$nso_col_1") 10 - filter("T_COMPANY_ORGAN"."ORGAN_ID"='107') 12 - access("T_COMPANY_ORGAN"."ORGAN_ID"='107') 17 - access("T_COMPANY_ORGAN"."PARENT_ID"=NULL) 18 - filter(TO_NUMBER("F"."POLICY_TYPE")=1) 19 - filter("A"."ITEM_ID"="A"."MAX_ITEM_ID" AND "A"."POLICY_ID" IS NOT NULL AND "A"."NOTICE_CODE" IS NOT NULL AND "A"."CHANGE_ID" IS NULL AND "A"."CASE_ID" IS NULL AND "A"."GROUP_POLICY_ID" IS NULL AND TO_NUMBER("A"."ORIGIN_TYPE")<>801 AND TO_NUMBER("A"."ORIGIN_TYPE")<>802 AND "A"."PAY_ID" IS NULL AND "A"."CATEGORY_ID"<>130103 AND "A"."CATEGORY_ID"<>130104 AND "A"."CATEGORY_ID"<>130102 AND "A"."CATEGORY_ID"<>140102 AND "A"."CATEGORY_ID"<>140101 AND ("A"."FEE_ID" IS NULL OR "A"."FEE_ID" IS NOT NULL AND TO_NUMBER("A"."ORIGIN_TYPE")=701)) 24 - filter("LR"."MAIN_RECEIVER"='Y') 53 rows selected. Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 115995 consistent gets 42204 physical reads 0 redo size 2182416 bytes sent via SQL*Net to client 13289 bytes received via SQL*Net from client 1060 SQL*Net roundtrips to/from client 7 sorts (memory) 0 sorts (disk) 15879 rows processed
利用分析函数MAX OVER (PARTITION BY) 干掉 那个自连接,减少表访问次数,干掉那个FILTER ,现在逻辑读降低到了115995,性能提升了近15倍
这个SQL还有进一步优化的空间,另外这个SQL也还有需要改写的地方,不过那些太简单了 就不说了,其实以前盖尔发的SQL也有
And a.Item_Id = (Select Max(item_id)
From t_Policy_Problem
Where notice_code = a.notice_code)
不过当时没给他改SQL 呵呵,当时太懒了,今天第一条上班心情不错 加上时间充裕,就搞搞吧
通过这个案例,你要学到的就是自连接的优化方法,利用分析函数干掉自连接,减少表访问次数。