sys_connect_by_path函数配合group by 进行分组拼接
最近,碰到一个需求将 approval_code值对应的多个FIRST_NAME值通过line_no的asc排序 合并为一个最长的字段 ,对应的表 如下:
对应表的sql 语句如下:
SELECT DISTINCT t1.FIRST_NAME, t2.approval_code, t2.line_no FROM K2_ACCESS_USER@k2global t1 INNER JOIN k2_approval_path t2 ON t1.DOMAIN_NAME=t2.USER_ID right join k2_credit_limit_hist t4 on t2.approval_code=t4.approval_code and t4.expired_date>=to_date('2012-01-01','yyyy-mm-dd') ORDER BY t2. APPROVAL_CODE,t2.line_no
起初,我是打算这样获取approval_code对应的FIRST_NAME合并值(当时还不知道 可以直接配合group by 获取到分组的最大的FIRST_NAME的合并值)
------------------------start to combine the approver's name------- SELECT max(substr(sys_connect_by_path(FIRST_NAME,','),2))FIRST_NAME FROM ( SELECT ltrim(APPROVAL_CODE,'RQP')+row_number() over(ORDER BY APPROVAL_CODE ) ROW_NUM, FIRST_NAME, APPROVAL_CODE FROM (select distinct FIRST_NAME, approval_code from ( SELECT DISTINCT t1.FIRST_NAME, t2.approval_code, t2.line_no FROM K2_ACCESS_USER@k2global t1 INNER JOIN k2_approval_path t2 ON t1.DOMAIN_NAME=t2.USER_ID right join k2_credit_limit_hist t4 on t2.approval_code=t4.approval_code and t4.expired_date>=to_date('2012-01-01','yyyy-mm-dd') ORDER BY t2. APPROVAL_CODE,t2.line_no )) ) t3 START WITH t3.APPROVAL_CODE='RQP0001105' --RQP0001105 用来作为一个测试的值 CONNECT BY t3.ROW_NUM -1 = prior t3.ROW_NUM ------------------------end to combine the approver's name--------
但是很快我发现我获取到的不是我想要的:
我去掉包含sys_connect_by_path函数的max()之后,并在select 列表中增加ROW_NUM
SELECT t3.ROW_NUM, substr(sys_connect_by_path(FIRST_NAME,','),2)FIRST_NAME FROM ( SELECT ltrim(APPROVAL_CODE,'RQP')+row_number() over(ORDER BY APPROVAL_CODE ) ROW_NUM, -- row_number() over(partition by APPROVAL_CODE ORDER BY APPROVAL_CODE ) RANK_NUM, FIRST_NAME, APPROVAL_CODE FROM (select distinct FIRST_NAME, approval_code from ( SELECT DISTINCT t1.FIRST_NAME, t2.approval_code, t2.line_no FROM K2_ACCESS_USER@k2global t1 INNER JOIN k2_approval_path t2 ON t1.DOMAIN_NAME=t2.USER_ID right join k2_credit_limit_hist t4 on t2.approval_code=t4.approval_code and t4.expired_date>=to_date('2012-01-01','yyyy-mm-dd') ORDER BY t2. APPROVAL_CODE,t2.line_no )) ) t3 START WITH t3.APPROVAL_CODE='RQP0001105' --t3.APPROVAL_CODE=hist.approval_code --and t3.RANK_NUM=1 CONNECT BY t3.ROW_NUM -1 = prior t3.ROW_NUM
获得的结果如下:
可以看到 其实在调用sys_connect_by_path函数的过程中 已经生成了我们想要的值'Kenneth,Lawrence' 但是由于一些原因这个值最后被重写为Lawrence.
我观察了下早先的代码 sys_connect_by_path最后的条件部分:
START WITH
t3.APPROVAL_CODE='RQP0001105'
CONNECT BY t3.ROW_NUM -1 = prior t3.ROW_NUM
我start with 用的条件是t3.APPROVAL_CODE='RQP0001105' ('RQP0001105'是代入的测试值), 而实际上在表中APPROVAL_CODE值为'RQP0001105'
有两个为别为ROW_NUM1106和1107的两条记录.于是我在执行函数sys_connect_by_path的时候其实是分为两步来执行的 ,它会分别从ROW_NUM=1106和1107两条记录开始执行一次,也就是说它是这样的
start with t3.APPROVAL_CODE='RQP0001105' and t3.ROW_NUM='1106'
CONNECT BY t3.ROW_NUM -1 = prior t3.ROW_NUM
执行结果:
和
start with t3.APPROVAL_CODE='RQP0001105' and t3.ROW_NUM='1107'
CONNECT BY t3.ROW_NUM -1 = prior t3.ROW_NUM
执行结果:
我们可以判断出来是由于从start with t3.APPROVAL_CODE='RQP0001105' and t3.ROW_NUM='1107'的时候将上一步 调用函数生成的值'Kenneth,Lawrence' 重写为Lawrence.
于是问题就清楚了, 解决方法是在start with 的时候再加上一个条件使他只从最上面的那条记录开始执行. 我的方法是新增一个rank列,rank列的值只会和多条记录中的第一个记录的ROW_NUM相同
ltrim(APPROVAL_CODE,'RQP')+row_number() over(ORDER BY APPROVAL_CODE ) ROW_NUM,
ltrim(APPROVAL_CODE,'RQP')+RANK() over(ORDER BY APPROVAL_CODE ) RANK_NUM,
同时 下面的条件改为:
START WITH t3.APPROVAL_CODE=hist.approval_code and t3.ROW_NUM=t3.RANK_NUM
CONNECT BY t3.ROW_NUM -1 = prior t3.ROW_NUM
小组的leader建议我的方法是在原先的代码中新增了一列
rank_num,它是由表中分块排序而来 见如下:
SELECT ltrim(APPROVAL_CODE,'RQP')+row_number() over(ORDER BY APPROVAL_CODE ) ROW_NUM, row_number() over(partition by APPROVAL_CODE ORDER BY APPROVAL_CODE ) RANK_NUM, FIRST_NAME, APPROVAL_CODE from (select distinct FIRST_NAME, approval_code FROM ( SELECT DISTINCT t1.FIRST_NAME, t2.approval_code, t2.line_no FROM K2_ACCESS_USER@k2global t1 INNER JOIN k2_approval_path t2 ON t1.DOMAIN_NAME=t2.USER_ID right join k2_credit_limit_hist t4 on t2.approval_code=t4.approval_code and t4.expired_date>=to_date('2012-01-01','yyyy-mm-dd') ORDER BY t2. APPROVAL_CODE,t2.line_no ))
执行之后可以看到获取到的数据如下:
我们将原先的
START WITH t3.APPROVAL_CODE='RQP0001105'
CONNECT BY t3.ROW_NUM -1 = prior t3.ROW_NUM
修改为
START WITH t3.APPROVAL_CODE='RQP0001105'and t3.RANK_NUM=1
CONNECT BY t3.ROW_NUM -1 = prior t3.ROW_NUM
即可.
SELECT max(substr(sys_connect_by_path(FIRST_NAME,','),2)) FIRST_NAME -- , length(FIRST_NAME),t3.ROW_NUM,t3.APPROVAL_CODE FROM ( SELECT ltrim(APPROVAL_CODE,'RQP')+row_number() over(ORDER BY APPROVAL_CODE ) ROW_NUM, -- ltrim(APPROVAL_CODE,'RQP')+RANK() over(ORDER BY APPROVAL_CODE ) RANK_NUM, row_number() over(partition by APPROVAL_CODE ORDER BY APPROVAL_CODE ) RANK_NUM, -- row_number(), FIRST_NAME, APPROVAL_CODE from (select distinct FIRST_NAME, approval_code FROM ( SELECT DISTINCT t1.FIRST_NAME, t2.approval_code, t2.line_no FROM K2_ACCESS_USER@k2global t1 INNER JOIN k2_approval_path t2 ON t1.DOMAIN_NAME=t2.USER_ID right join k2_credit_limit_hist t4 on t2.approval_code=t4.approval_code and t4.expired_date>=to_date('2012-01-01','yyyy-mm-dd') -- where APPROVAL_CODE='RQP0001199' ORDER BY t2. APPROVAL_CODE,t2.line_no )) ) t3 START WITH t3.APPROVAL_CODE='RQP0001105' and t3.RANK_NUM=1 --and t3.app CONNECT BY t3.ROW_NUM -1 = prior t3.ROW_NUM
执行后结果如下:
好吧,上面写的是我之前走比较绕的路子.实际上要实现我们要的值只需要配合group by 进行分组拼接即可 代码如下:
SELECT max(SUBSTR(SYS_CONNECT_BY_PATH(create_by, ','), 2)) create_by FROM (SELECT ltrim(APPROVAL_CODE,'RQP')+row_number() over(ORDER BY APPROVAL_CODE ) ROW_NUM, -- row_number() over(partition by APPROVAL_CODE ORDER BY APPROVAL_CODE ) RANK_NUM, create_by, approval_code from (select distinct create_by, approval_code FROM (SELECT DISTINCT t.create_by, t.approval_code, t.line_no FROM k2_approval_path t RIGHT JOIN K2_CREDIT_LIMIT_HIST T4 on t.approval_code=t4.approval_code and t4.expired_date>=to_date('2010-01-01','yyyy-mm-dd') ORDER BY t.approval_code,t.line_no ) ) )T1 START WITH t1.approval_code= hist.approval_code --t1.approval_code= hist.approval_code and t1.RANK_NUM=1 CONNECT BY T1.ROW_NUM -1 = PRIOR T1.ROW_NUM group by t1.approval_code