【oracle SQL】为避免oracle函数listagg隐患而进行的一次sql改进
这篇SQL优化文较长,性急者请直接拉到页尾看答案,若不着急请慢慢看。
后续:https://www.cnblogs.com/heyang78/p/15862497.html
【基本表】
-- tag表结构 create table tag( id number(12), name nvarchar2(20), primary key(id) ) -- tag充值 insert into tag select rownum, dbms_random.String('*',dbms_random.value(6,20)) from dual connect by level<10+1 --用户表结构 create table customer( id number(12), name nvarchar2(20), primary key(id) ) --用户表充值 insert into customer select rownum, dbms_random.String('*',dbms_random.value(6,20)) from dual connect by level<20+1 --利用笛卡尔积创建连接表 create table customer_tag as select rownum as id,tag.id as tid,customer.id as cid from tag,customer where mod(tag.id,2)=1 --连接表加主键 alter table customer_tag add constraint pk_customer_tag primary key (id); --连接表设索引 create index idx_tid_cid on customer_tag(tid,cid);
【表数据】
--tag表
SQL> select * from tag; ID NAME ---------- ---------------------------------------- 1 HPYAWGEZMLHLZZY 2 HLBTQPMEFBK 3 OWCRYDVDNGCMCACCF 4 YJQTMTREUMQ 5 VLCGRLPCPN 6 AWKMEVVYK 7 VUWOGAWDPLROQ 8 TXSDSPLJUTCXI 9 PPHHLEKRDDVABT 10 DTUUCPWL 已选择10行。
--customer表
SQL> select * from customer; ID NAME ---------- ---------------------------------------- 1 RRWCSHGGXWIYBKGXVCEU 2 KJBKAP 3 KOWUIGSCZ 4 YQQERMQIFJFHBRMYB 5 SLHDVDA 6 DABAFFOLGV 7 QSOLQIQVWN 8 WBSAFUYZXTFSNWNUVLD 9 HENMSABDZIB 10 HVBGVEEMFLOUMGIHBLZ 11 BAKPBTDF 12 ZWFOGZIA 13 GBWSDUKVUULBHBIM 14 JPFMYDBRTROCM 15 OIYDCYLWFJNVLGMLJOD 16 LGDTXVNGGHIK 17 SPEFMER 18 WKQSKPKFXCAJQRMDLK 19 FUGHCCZRTRD 20 KOBERTDCS 已选择20行。
--连接表Customer_tag
SQL> select * from customer_tag; ID TID CID ---------- ---------- ---------- 1 1 1 2 1 2 3 1 3 4 1 4 5 1 5 6 1 6 7 1 7 8 1 8 9 1 9 10 1 10 11 1 11 ID TID CID ---------- ---------- ---------- 12 1 12 13 1 13 14 1 14 15 1 15 16 1 16 17 1 17 18 1 18 19 1 19 20 1 20 21 3 1 22 3 2 ID TID CID ---------- ---------- ---------- 23 3 3 24 3 4 25 3 5 26 3 6 27 3 7 28 3 8 29 3 9 30 3 10 31 3 11 32 3 12 33 3 13 ID TID CID ---------- ---------- ---------- 34 3 14 35 3 15 36 3 16 37 3 17 38 3 18 39 3 19 40 3 20 41 5 1 42 5 2 43 5 3 44 5 4 ID TID CID ---------- ---------- ---------- 45 5 5 46 5 6 47 5 7 48 5 8 49 5 9 50 5 10 51 5 11 52 5 12 53 5 13 54 5 14 55 5 15 ID TID CID ---------- ---------- ---------- 56 5 16 57 5 17 58 5 18 59 5 19 60 5 20 61 7 1 62 7 2 63 7 3 64 7 4 65 7 5 66 7 6 ID TID CID ---------- ---------- ---------- 67 7 7 68 7 8 69 7 9 70 7 10 71 7 11 72 7 12 73 7 13 74 7 14 75 7 15 76 7 16 77 7 17 ID TID CID ---------- ---------- ---------- 78 7 18 79 7 19 80 7 20 81 9 1 82 9 2 83 9 3 84 9 4 85 9 5 86 9 6 87 9 7 88 9 8 ID TID CID ---------- ---------- ---------- 89 9 9 90 9 10 91 9 11 92 9 12 93 9 13 94 9 14 95 9 15 96 9 16 97 9 17 98 9 18 99 9 19 ID TID CID ---------- ---------- ---------- 100 9 20 已选择100行。
【需求】
先需要把每个customer拥有的tag列出,比如用户A拥有tagid=1,3,5,7,9的tag,那么该输出这样的数据:
A,1,0,1,0,1,0,1,0,1,0
【解决方案】
首先,用户拥有的tag数量是一定的,就是10个,我们可以先做成一个定值序列。
SQL> select level as sn,'0' as val from dual connect by level<=10; SN V ---------- - 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 已选择10行。
这里还没有cid的出现,让上面与客户表做一个笛卡儿积就有了:
SQL> select a.sn,a.val,b.id as cid from 2 (select level as sn,'0' as val from dual connect by level<=10) a, 3 (select id from customer where id<20) b 4 order by b.id,a.val,a.sn; SN V CID ---------- - ---------- 1 0 1 2 0 1 3 0 1 4 0 1 5 0 1 6 0 1 7 0 1 8 0 1 9 0 1 10 0 1 1 0 2 SN V CID ---------- - ---------- 2 0 2 3 0 2 4 0 2 5 0 2 6 0 2 7 0 2 8 0 2 9 0 2 10 0 2 1 0 3 2 0 3 SN V CID ---------- - ---------- 3 0 3 4 0 3 5 0 3 6 0 3 7 0 3 8 0 3 9 0 3 10 0 3 1 0 4 2 0 4 3 0 4 SN V CID ---------- - ---------- 4 0 4 5 0 4 6 0 4 7 0 4 8 0 4 9 0 4 10 0 4 1 0 5 2 0 5 3 0 5 4 0 5 SN V CID ---------- - ---------- 5 0 5 6 0 5 7 0 5 8 0 5 9 0 5 10 0 5 1 0 6 2 0 6 3 0 6 4 0 6 5 0 6 SN V CID ---------- - ---------- 6 0 6 7 0 6 8 0 6 9 0 6 10 0 6 1 0 7 2 0 7 3 0 7 4 0 7 5 0 7 6 0 7 SN V CID ---------- - ---------- 7 0 7 8 0 7 9 0 7 10 0 7 1 0 8 2 0 8 3 0 8 4 0 8 5 0 8 6 0 8 7 0 8 SN V CID ---------- - ---------- 8 0 8 9 0 8 10 0 8 1 0 9 2 0 9 3 0 9 4 0 9 5 0 9 6 0 9 7 0 9 8 0 9 SN V CID ---------- - ---------- 9 0 9 10 0 9 1 0 10 2 0 10 3 0 10 4 0 10 5 0 10 6 0 10 7 0 10 8 0 10 9 0 10 SN V CID ---------- - ---------- 10 0 10 1 0 11 2 0 11 3 0 11 4 0 11 5 0 11 6 0 11 7 0 11 8 0 11 9 0 11 10 0 11 SN V CID ---------- - ---------- 1 0 12 2 0 12 3 0 12 4 0 12 5 0 12 6 0 12 7 0 12 8 0 12 9 0 12 10 0 12 1 0 13 SN V CID ---------- - ---------- 2 0 13 3 0 13 4 0 13 5 0 13 6 0 13 7 0 13 8 0 13 9 0 13 10 0 13 1 0 14 2 0 14 SN V CID ---------- - ---------- 3 0 14 4 0 14 5 0 14 6 0 14 7 0 14 8 0 14 9 0 14 10 0 14 1 0 15 2 0 15 3 0 15 SN V CID ---------- - ---------- 4 0 15 5 0 15 6 0 15 7 0 15 8 0 15 9 0 15 10 0 15 1 0 16 2 0 16 3 0 16 4 0 16 SN V CID ---------- - ---------- 5 0 16 6 0 16 7 0 16 8 0 16 9 0 16 10 0 16 1 0 17 2 0 17 3 0 17 4 0 17 5 0 17 SN V CID ---------- - ---------- 6 0 17 7 0 17 8 0 17 9 0 17 10 0 17 1 0 18 2 0 18 3 0 18 4 0 18 5 0 18 6 0 18 SN V CID ---------- - ---------- 7 0 18 8 0 18 9 0 18 10 0 18 1 0 19 2 0 19 3 0 19 4 0 19 5 0 19 6 0 19 7 0 19 SN V CID ---------- - ---------- 8 0 19 9 0 19 10 0 19 已选择190行。
从上面输出的数据可以看到,每个客户id对应了10个0,这个就是客户没有赋值状态的tags了。
再接下来,让上面的结果集与customer_tag表连结起来。
select c.*,d.*, decode(nvl(d.tid,0),0,'0','1') from (select a.sn,a.val,b.id as cid from (select level as sn,'0' as val from dual connect by level<=10) a, (select id from customer where id<=20) b ) c left join (select * from customer_tag where cid<=20) d on c.cid= d.cid and c.sn=d.tid order by c.cid,c.sn
上面这段SQL的关键在
decode(nvl(d.tid,0),0,'0','1')
一句,这句的用意在如果连接表customer_tag表在cid和tid交叉处有值,则把tid代表的位置置“1”,否则就是“0”.
上面的语句包括的列还有点多,可以简化一下:
select c.cid,c.sn, decode(nvl(d.tid,0),0,'0','1') as tg from (select a.sn,a.val,b.id as cid from (select level as sn,'0' as val from dual connect by level<=10) a, (select id from customer where id<=20) b ) c left join (select * from customer_tag where cid<=20) d on c.cid= d.cid and c.sn=d.tid order by c.cid,c.sn
我截取两个片段给大家看看:
CID SN T ---------- ---------- - 19 8 0 19 9 1 19 10 0 20 1 1 20 2 0 20 3 1 20 4 0 20 5 1 20 6 0 20 7 1 20 8 0 CID SN T ---------- ---------- - 20 9 1 20 10 0
这个已经很接近需求了,只要我们把cid分组,然后把tg给listagg起来,排序方式按sn就可以。
select e.cid,listagg(e.tg,',') within group (order by e.sn) as tags from ( select c.cid,c.sn, decode(nvl(d.tid,0),0,'0','1') as tg from (select a.sn,a.val,b.id as cid from (select level as sn,'0' as val from dual connect by level<=10) a, (select id from customer where id<=20) b ) c left join (select * from customer_tag where cid<=20) d on c.cid= d.cid and c.sn=d.tid --order by c.cid,c.sn ) e group by e.cid
我再截取一段给大家看看:
CID ---------- TAGS -------------------------------------------------------------------------------- 16 1,0,1,0,1,0,1,0,1,0 17 1,0,1,0,1,0,1,0,1,0 18 1,0,1,0,1,0,1,0,1,0 CID ---------- TAGS -------------------------------------------------------------------------------- 19 1,0,1,0,1,0,1,0,1,0 20 1,0,1,0,1,0,1,0,1,0
这个已经非常接近需求了,只要将上面的结果集与customer表再连结一下。
【最终SQL】
select ct.name||','||f.tags as line from ( select e.cid,listagg(e.tg,',') within group (order by e.sn) as tags from ( select c.cid,c.sn, decode(nvl(d.tid,0),0,'0','1') as tg from (select a.sn,a.val,b.id as cid from (select level as sn,'0' as val from dual connect by level<=10) a, (select id from customer where id<=20) b ) c left join (select * from customer_tag where cid<=20) d on c.cid= d.cid and c.sn=d.tid --order by c.cid,c.sn ) e group by e.cid ) f left join customer ct on f.cid=ct.id order by f.cid
最终结果:
LINE -------------------------------------------------------------------------------- RRWCSHGGXWIYBKGXVCEU,1,0,1,0,1,0,1,0,1,0 KJBKAP,1,0,1,0,1,0,1,0,1,0 KOWUIGSCZ,1,0,1,0,1,0,1,0,1,0 YQQERMQIFJFHBRMYB,1,0,1,0,1,0,1,0,1,0 SLHDVDA,1,0,1,0,1,0,1,0,1,0 DABAFFOLGV,1,0,1,0,1,0,1,0,1,0 QSOLQIQVWN,1,0,1,0,1,0,1,0,1,0 WBSAFUYZXTFSNWNUVLD,1,0,1,0,1,0,1,0,1,0 HENMSABDZIB,1,0,1,0,1,0,1,0,1,0 HVBGVEEMFLOUMGIHBLZ,1,0,1,0,1,0,1,0,1,0 BAKPBTDF,1,0,1,0,1,0,1,0,1,0 LINE -------------------------------------------------------------------------------- ZWFOGZIA,1,0,1,0,1,0,1,0,1,0 GBWSDUKVUULBHBIM,1,0,1,0,1,0,1,0,1,0 JPFMYDBRTROCM,1,0,1,0,1,0,1,0,1,0 OIYDCYLWFJNVLGMLJOD,1,0,1,0,1,0,1,0,1,0 LGDTXVNGGHIK,1,0,1,0,1,0,1,0,1,0 SPEFMER,1,0,1,0,1,0,1,0,1,0 WKQSKPKFXCAJQRMDLK,1,0,1,0,1,0,1,0,1,0 FUGHCCZRTRD,1,0,1,0,1,0,1,0,1,0 KOBERTDCS,1,0,1,0,1,0,1,0,1,0 已选择20行。
可见,每个客户的tag情况都被列了出来,正如需求所说的。
上面的方案避免了listagg上限为4000的隐患,因为在实际处理中,tag只有1600个,加上1599个逗号,总数不超过3200,不会去触碰4000的底线。
上面的方案还把绝大部分计算保持在数据库服务端,使得java服务器端拿到的就是最终数据,为整体运行节约了时间。
其实,程序语言中大部分循环和分支的功能,靠SQL也能办到,用得正确必然能提高效率。
PS:
做出这个方案,是因为原来的实现 https://www.cnblogs.com/pyhy/p/15855992.html 存在listagg上限的隐患,同时我也觉得不用Java帮忙,SQL本身就能完成数据行的搭建,于是思索两天,便有了此新方案,当然,对此偏于复杂的SQL,不同的人有不同的理解,且不论新方案的结果如何,但这个思索过程我是挺享受的,获得了莫大的研究本身产生的快乐,这就足够了。
END
2022年2月4日04点48分补充
在实际运用中发现,SQL中第一句
select ct.name||','||f.tags as line from
会导致 错误:ORA-01489: 字符串连接的结果过长 ,换成concat去连接字串也是一样,看来oracle不想让本身就长的字串(f.tags)再去和别的字串连接。
这个问题也很好解决,那就是不在程序里把两个字段连起来,而是保持两个字段,让外接程序去连就好了,于是最终SQL变成这样:
select ct.name,f.tags as line from ( select e.cid,listagg(e.tg,',') within group (order by e.sn) as tags from ( select c.cid,c.sn, decode(nvl(d.tid,0),0,'0','1') as tg from (select a.sn,a.val,b.id as cid from (select level as sn,'0' as val from dual connect by level<=1000) a, (select id from customer where 10000<id and id<=20000) b ) c left join (select * from customer_tag where 10000<cid and cid<=20000) d on c.cid= d.cid and c.sn=d.tid --order by c.cid,c.sn ) e group by e.cid ) f left join customer ct on f.cid=ct.id order by f.cid
这样就没有ORA-01489错误了,SQL在SQL Plus和Python程序里都能正常执行。
END
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 无需6万激活码!GitHub神秘组织3小时极速复刻Manus,手把手教你使用OpenManus搭建本
· C#/.NET/.NET Core优秀项目和框架2025年2月简报
· Manus爆火,是硬核还是营销?
· 终于写完轮子一部分:tcp代理 了,记录一下
· 【杭电多校比赛记录】2025“钉耙编程”中国大学生算法设计春季联赛(1)
2015-02-02 转载:JAVA中获取项目文件路径