[Oracle] Group By 语句的扩展 - Rollup、Cube和Grouping Sets
常常写SQL语句的人应该知道Group by语句的主要使用方法是进行分类汇总,以下是一种它最常见的使用方法(依据部门、职位分别统计业绩):
SELECT a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno = b.deptno GROUP BY a.dname,b.job; DNAME JOB SUM_SAL -------------- --------- ---------- SALES MANAGER 2850 SALES CLERK 950 SALES SALESMAN 5600 ACCOUNTING MANAGER 2450 ACCOUNTING PRESIDENT 5000 ACCOUNTING CLERK 1300 RESEARCH MANAGER 2975 RESEARCH ANALYST 6000 RESEARCH CLERK 1900这时候,假设有人跑过来跟你说:我除了以上数据之外,还要每一个部门总的业绩以及全部部门加起来的业绩,这时候你非常可能会想到例如以下的笨方法(union all):
select * from ( SELECT a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno = b.deptno GROUP BY a.dname,b.job UNION ALL --实现了部门的小计 SELECT a.dname,NULL, SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno = b.deptno GROUP BY a.dname UNION ALL --实现了全部部门总的合计 SELECT NULL,NULL, SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno = b.deptno) order by dname; DNAME JOB SUM_SAL -------------- --------- ---------- ACCOUNTING CLERK 1300 ACCOUNTING MANAGER 2450 ACCOUNTING PRESIDENT 5000 ACCOUNTING 8750 RESEARCH CLERK 1900 RESEARCH MANAGER 2975 RESEARCH ANALYST 6000 RESEARCH 10875 SALES CLERK 950 SALES MANAGER 2850 SALES SALESMAN 5600 SALES 9400 29025 union all 合并笨办法产生的运行计划 ------------------------------------------------------------------------------- Plan hash value: 2979078843 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 29 | 812 | 23 (22)| 00:00:01 | | 1 | SORT ORDER BY | | 29 | 812 | 23 (22)| 00:00:01 | | 2 | VIEW | | 29 | 812 | 22 (19)| 00:00:01 | | 3 | UNION-ALL | | | | | | | 4 | HASH GROUP BY | | 14 | 756 | 8 (25)| 00:00:01 | |* 5 | HASH JOIN | | 14 | 756 | 7 (15)| 00:00:01 | | 6 | TABLE ACCESS FULL| DEPT | 4 | 88 | 3 (0)| 00:00:01 | | 7 | TABLE ACCESS FULL| EMP | 14 | 448 | 3 (0)| 00:00:01 | | 8 | HASH GROUP BY | | 14 | 672 | 8 (25)| 00:00:01 | |* 9 | HASH JOIN | | 14 | 672 | 7 (15)| 00:00:01 | | 10 | TABLE ACCESS FULL| DEPT | 4 | 88 | 3 (0)| 00:00:01 | | 11 | TABLE ACCESS FULL| EMP | 14 | 364 | 3 (0)| 00:00:01 | | 12 | SORT AGGREGATE | | 1 | 39 | | | |* 13 | HASH JOIN | | 14 | 546 | 7 (15)| 00:00:01 | | 14 | TABLE ACCESS FULL| DEPT | 4 | 52 | 3 (0)| 00:00:01 | | 15 | TABLE ACCESS FULL| EMP | 14 | 364 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------事实上,假设你知道Group By的Rollup扩展的话,这样的需求仅仅是小case:
SELECT a.dname,b.job, SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno = b.deptno GROUP BY ROLLUP(a.dname,b.job); DNAME JOB SUM_SAL -------------- --------- ---------- SALES CLERK 950 SALES MANAGER 2850 SALES SALESMAN 5600 SALES 9400 RESEARCH CLERK 1900 RESEARCH ANALYST 6000 RESEARCH MANAGER 2975 RESEARCH 10875 ACCOUNTING CLERK 1300 ACCOUNTING MANAGER 2450 ACCOUNTING PRESIDENT 5000 ACCOUNTING 8750 29025 rollup写法产生的运行计划 ----------------------------------------------------------------------------- Plan hash value: 1037965942 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 14 | 756 | 8 (25)| 00:00:01 | | 1 | SORT GROUP BY ROLLUP| | 14 | 756 | 8 (25)| 00:00:01 | |* 2 | HASH JOIN | | 14 | 756 | 7 (15)| 00:00:01 | | 3 | TABLE ACCESS FULL | DEPT | 4 | 88 | 3 (0)| 00:00:01 | | 4 | TABLE ACCESS FULL | EMP | 14 | 448 | 3 (0)| 00:00:01 | -----------------------------------------------------------------------------能够发现,这样的方法不但SQL书写方便,性能也能得到提高。
这时候,假设又有人跑过来说:除了以上数据,他还须要每一个职位总的业绩,你仅仅要把rollup换成cube就能够了,例如以下所看到的:
-- CUBE分组 SELECT a.dname,b.job, SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno = b.deptno GROUP BY CUBE(a.dname,b.job); DNAME JOB SUM_SAL -------------- --------- ---------- 29025 CLERK 4150 ANALYST 6000 MANAGER 8275 SALESMAN 5600 PRESIDENT 5000 SALES 9400 SALES CLERK 950 SALES MANAGER 2850 SALES SALESMAN 5600 RESEARCH 10875 RESEARCH CLERK 1900 RESEARCH ANALYST 6000 RESEARCH MANAGER 2975 ACCOUNTING 8750 ACCOUNTING CLERK 1300 ACCOUNTING MANAGER 2450 ACCOUNTING PRESIDENT 5000从上面能够看出:cube比rollup的展现的粒度更细一些。
这时候,假设又有人跑过来说:他不须要那么细的数据,仅仅须要汇总的数据,能够使用Grouping Sets:
---GROUPING SETS分组 SELECT to_char(b.hiredate,'yyyy') hire_year,a.dname,b.job, SUM(sal) sum_sal FROM dept a,emp b WHERE a.deptno = b.deptno GROUP BY GROUPING SETS(to_char(b.hiredate,'yyyy'),a.dname,b.job); HIRE DNAME JOB SUM_SAL ---- -------------- --------- ---------- 1987 4100 1980 800 1982 1300 1981 22825 ACCOUNTING 8750 RESEARCH 10875 SALES 9400 CLERK 4150 SALESMAN 5600 PRESIDENT 5000 MANAGER 8275 ANALYST 6000