利用Oracle分析函数row_number和sys_connect_by_path实现多行数据合并为一行
emo场景,以oracle自带库中的表emp为例:
select ename,deptno from emp order by deptno;
ENAME | DEPTNO |
CLARK | 10 |
KING | 10 |
MILLER | 10 |
SMITH | 20 |
ADAMS | 20 |
FORD | 20 |
SCOTT | 20 |
JONES | 20 |
ALLEN | 30 |
BLAKE | 30 |
MARTIN | 30 |
JAMES | 30 |
TURNER | 30 |
WARD | 30 |
现在想要将同一部门的人给合并成一行记录,如何做呢?如下:
ENAME | DEPTNO |
CLARK,KING,MILLER | 10 |
ADAMS,FORD,JONES,SCOTT,SMITH | 20 |
ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD | 30 |
通常我们都是自己写函数或在程序中处理,这里我们利用oracle自带的分析函数row_number()和sys_connect_by_path来进行sql语句层面的多行到单行的合并,并且效率会非常高。
基本思路:
1、对deptno进行row_number()按ename排位并打上排位号
select deptno,ename,row_number() over(partition by deptno order by deptno,ename) rank
from emp order by deptno,ename;
DEPTNO | ENAME | RANK |
10 | CLARK | 1 |
10 | KING | 2 |
10 | MILLER | 3 |
20 | ADAMS | 1 |
20 | FORD | 2 |
20 | JONES | 3 |
20 | SCOTT | 4 |
20 | SMITH | 5 |
30 | ALLEN | 1 |
30 | BLAKE | 2 |
30 | JAMES | 3 |
30 | MARTIN | 4 |
30 | TURNER | 5 |
30 | WARD | 6 |
可看出,经过row_number()后,部门人已经按部门和人名进行了排序,并打上了一个位置字段rank
2、利用oracle的递归查询connect by进行表内递归,并通过sys_connect_by_path进行父子数据追溯串的构造,这里要针对ename字段进行构造,使之合并在一个字段内(数据很多,只截取部分)
select deptno,ename,rank,level as curr_level,
ltrim(sys_connect_by_path(ename,','),',') ename_path from (
select deptno,ename,row_number() over(partition by deptno order by deptno,ename) rank
from emp order by deptno,ename) connect by deptno = prior deptno and rank-1 = prior rank;
各部门递归后的数据量都是:(1+n)/2 * n 即:deptno=10 数据量:(1+3)/2 * 3 = 6;
deptno=20 数据量:(1+5)/2 * 5 = 15; deptno=30 数据量:(1+6)/2 * 6 = 21;
DEPTNO | ENAME | RANK | CURR_LEVEL | ENAME_PATH |
10 | CLARK | 1 | 1 | CLARK |
10 | KING | 2 | 2 | CLARK,KING |
10 | MILLER | 3 | 3 | CLARK,KING,MILLER |
10 | KING | 2 | 1 | KING |
10 | MILLER | 3 | 2 | KING,MILLER |
10 | MILLER | 3 | 1 | MILLER |
DEPTNO | ENAME | RANK | CURR_LEVEL | ENAME_PATH |
20 | ADAMS | 1 | 1 | ADAMS |
20 | FORD | 2 | 2 | ADAMS,FORD |
20 | JONES | 3 | 3 | ADAMS,FORD,JONES |
20 | SCOTT | 4 | 4 | ADAMS,FORD,JONES,SCOTT |
20 | SMITH | 5 | 5 | ADAMS,FORD,JONES,SCOTT,SMITH |
20 | FORD | 2 | 1 | FORD |
20 | JONES | 3 | 2 | FORD,JONES |
20 | SCOTT | 4 | 3 | FORD,JONES,SCOTT |
20 | SMITH | 5 | 4 | FORD,JONES,SCOTT,SMITH |
20 | JONES | 3 | 1 | JONES |
20 | SCOTT | 4 | 2 | JONES,SCOTT |
20 | SMITH | 5 | 3 | JONES,SCOTT,SMITH |
20 | SCOTT | 4 | 1 | SCOTT |
20 | SMITH | 5 | 2 | SCOTT,SMITH |
20 | SMITH | 5 | 1 | SMITH |
这里我们仅列出deptno=10、20的,至此我们应该能否发现一些线索了,即每个部门中,curr_level最高的那行,有我们所需要的数据。那后面该怎么办,取出那个数据? 对了,继续用row_number()进行排位标记,然后再按排位标记取出即可。
3、 对deptno继续进行row_number()按curr_level排位
select deptno,ename_path,row_number() over(partition by deptno order by deptno,curr_level desc) ename_path_rank from (select deptno,ename,rank,level as curr_level,
ltrim(sys_connect_by_path(ename,','),',') ename_path from (
select deptno,ename,row_number() over(partition by deptno order by deptno,ename) rank
from emp order by deptno,ename) connect by deptno = prior deptno and rank-1 = prior rank);
DEPTNO | ENAME_PATH | ENAME_PATH_RANK |
10 | CLARK,KING,MILLER | 1 |
10 | CLARK,KING | 2 |
10 | KING,MILLER | 3 |
10 | CLARK | 4 |
10 | KING | 5 |
10 | MILLER | 6 |
DEPTNO | ENAME_PATH | ENAME_PATH_RANK |
20 | ADAMS,FORD,JONES,SCOTT,SMITH | 1 |
20 | ADAMS,FORD,JONES,SCOTT | 2 |
20 | FORD,JONES,SCOTT,SMITH | 3 |
20 | ADAMS,FORD,JONES | 4 |
20 | FORD,JONES,SCOTT | 5 |
20 | JONES,SCOTT,SMITH | 6 |
20 | ADAMS,FORD | 7 |
20 | FORD,JONES | 8 |
20 | SCOTT,SMITH | 9 |
20 | JONES,SCOTT | 10 |
20 | ADAMS | 11 |
20 | JONES | 12 |
20 | SMITH | 13 |
20 | SCOTT | 14 |
20 | FORD | 15 |
这里还是仅列出deptno为10、20的,至此应该很明了了,在进行一次查询,取ename_path_rank为1的即可获得我们想要的结果。
4、获取想要排位的数据,即得部门下所有人多行到单行的合并
select deptno,ename_path from (select deptno,ename_path,
row_number() over(partition by deptno order by deptno,curr_level desc) ename_path_rank
from (select deptno,ename,rank,level as curr_level,
ltrim(sys_connect_by_path(ename,','),',') ename_path from (
select deptno,ename,row_number() over(partition by deptno order by deptno,ename) rank
from emp order by deptno,ename) connect by deptno = prior deptno and rank-1 = prior rank))
where ename_path_rank=1;