【oracle SQL】为避免oracle函数listagg隐患而进行的一次sql改进

这篇SQL优化文较长，性急者请直接拉到页尾看答案，若不着急请慢慢看。

后续：https://www.cnblogs.com/heyang78/p/15862497.html

【基本表】

-- tag表结构
create table tag(
    id number(12),
    name nvarchar2(20),
    primary key(id)
)

-- tag充值
insert into tag
select rownum,
       dbms_random.String('*',dbms_random.value(6,20))
from dual
connect by level<10+1

--用户表结构
create table customer(
    id number(12),
    name nvarchar2(20),
    primary key(id)
)

--用户表充值
insert into customer
select rownum,
       dbms_random.String('*',dbms_random.value(6,20))
from dual
connect by level<20+1

--利用笛卡尔积创建连接表
create table customer_tag
as
select rownum as id,tag.id as tid,customer.id as cid from tag,customer where mod(tag.id,2)=1

--连接表加主键
alter table customer_tag add constraint pk_customer_tag primary key (id);

--连接表设索引
create index idx_tid_cid on customer_tag(tid,cid);

【表数据】

--tag表

SQL> select * from tag;

        ID NAME
---------- ----------------------------------------
         1 HPYAWGEZMLHLZZY
         2 HLBTQPMEFBK
         3 OWCRYDVDNGCMCACCF
         4 YJQTMTREUMQ
         5 VLCGRLPCPN
         6 AWKMEVVYK
         7 VUWOGAWDPLROQ
         8 TXSDSPLJUTCXI
         9 PPHHLEKRDDVABT
        10 DTUUCPWL

已选择10行。

--customer表

SQL> select * from customer;

        ID NAME
---------- ----------------------------------------
         1 RRWCSHGGXWIYBKGXVCEU
         2 KJBKAP
         3 KOWUIGSCZ
         4 YQQERMQIFJFHBRMYB
         5 SLHDVDA
         6 DABAFFOLGV
         7 QSOLQIQVWN
         8 WBSAFUYZXTFSNWNUVLD
         9 HENMSABDZIB
        10 HVBGVEEMFLOUMGIHBLZ
        11 BAKPBTDF
        12 ZWFOGZIA
        13 GBWSDUKVUULBHBIM
        14 JPFMYDBRTROCM
        15 OIYDCYLWFJNVLGMLJOD
        16 LGDTXVNGGHIK
        17 SPEFMER
        18 WKQSKPKFXCAJQRMDLK
        19 FUGHCCZRTRD
        20 KOBERTDCS

已选择20行。

--连接表Customer_tag

SQL> select * from customer_tag;

        ID        TID        CID
---------- ---------- ----------
         1          1          1
         2          1          2
         3          1          3
         4          1          4
         5          1          5
         6          1          6
         7          1          7
         8          1          8
         9          1          9
        10          1         10
        11          1         11

        ID        TID        CID
---------- ---------- ----------
        12          1         12
        13          1         13
        14          1         14
        15          1         15
        16          1         16
        17          1         17
        18          1         18
        19          1         19
        20          1         20
        21          3          1
        22          3          2

        ID        TID        CID
---------- ---------- ----------
        23          3          3
        24          3          4
        25          3          5
        26          3          6
        27          3          7
        28          3          8
        29          3          9
        30          3         10
        31          3         11
        32          3         12
        33          3         13

        ID        TID        CID
---------- ---------- ----------
        34          3         14
        35          3         15
        36          3         16
        37          3         17
        38          3         18
        39          3         19
        40          3         20
        41          5          1
        42          5          2
        43          5          3
        44          5          4

        ID        TID        CID
---------- ---------- ----------
        45          5          5
        46          5          6
        47          5          7
        48          5          8
        49          5          9
        50          5         10
        51          5         11
        52          5         12
        53          5         13
        54          5         14
        55          5         15

        ID        TID        CID
---------- ---------- ----------
        56          5         16
        57          5         17
        58          5         18
        59          5         19
        60          5         20
        61          7          1
        62          7          2
        63          7          3
        64          7          4
        65          7          5
        66          7          6

        ID        TID        CID
---------- ---------- ----------
        67          7          7
        68          7          8
        69          7          9
        70          7         10
        71          7         11
        72          7         12
        73          7         13
        74          7         14
        75          7         15
        76          7         16
        77          7         17

        ID        TID        CID
---------- ---------- ----------
        78          7         18
        79          7         19
        80          7         20
        81          9          1
        82          9          2
        83          9          3
        84          9          4
        85          9          5
        86          9          6
        87          9          7
        88          9          8

        ID        TID        CID
---------- ---------- ----------
        89          9          9
        90          9         10
        91          9         11
        92          9         12
        93          9         13
        94          9         14
        95          9         15
        96          9         16
        97          9         17
        98          9         18
        99          9         19

        ID        TID        CID
---------- ---------- ----------
       100          9         20

已选择100行。

【需求】

先需要把每个customer拥有的tag列出，比如用户A拥有tagid=1，3，5，7，9的tag，那么该输出这样的数据：

A，1，0，1，0，1，0，1，0，1，0

【解决方案】

首先，用户拥有的tag数量是一定的，就是10个，我们可以先做成一个定值序列。

SQL> select level as sn,'0' as val from dual connect by level<=10;

        SN V
---------- -
         1 0
         2 0
         3 0
         4 0
         5 0
         6 0
         7 0
         8 0
         9 0
        10 0

已选择10行。

这里还没有cid的出现，让上面与客户表做一个笛卡儿积就有了：

SQL> select a.sn,a.val,b.id as cid from
  2  (select level as sn,'0' as val from dual connect by level<=10) a,
  3  (select id from customer where id<20) b
  4  order by b.id,a.val,a.sn;

        SN V        CID
---------- - ----------
         1 0          1
         2 0          1
         3 0          1
         4 0          1
         5 0          1
         6 0          1
         7 0          1
         8 0          1
         9 0          1
        10 0          1
         1 0          2

        SN V        CID
---------- - ----------
         2 0          2
         3 0          2
         4 0          2
         5 0          2
         6 0          2
         7 0          2
         8 0          2
         9 0          2
        10 0          2
         1 0          3
         2 0          3

        SN V        CID
---------- - ----------
         3 0          3
         4 0          3
         5 0          3
         6 0          3
         7 0          3
         8 0          3
         9 0          3
        10 0          3
         1 0          4
         2 0          4
         3 0          4

        SN V        CID
---------- - ----------
         4 0          4
         5 0          4
         6 0          4
         7 0          4
         8 0          4
         9 0          4
        10 0          4
         1 0          5
         2 0          5
         3 0          5
         4 0          5

        SN V        CID
---------- - ----------
         5 0          5
         6 0          5
         7 0          5
         8 0          5
         9 0          5
        10 0          5
         1 0          6
         2 0          6
         3 0          6
         4 0          6
         5 0          6

        SN V        CID
---------- - ----------
         6 0          6
         7 0          6
         8 0          6
         9 0          6
        10 0          6
         1 0          7
         2 0          7
         3 0          7
         4 0          7
         5 0          7
         6 0          7

        SN V        CID
---------- - ----------
         7 0          7
         8 0          7
         9 0          7
        10 0          7
         1 0          8
         2 0          8
         3 0          8
         4 0          8
         5 0          8
         6 0          8
         7 0          8

        SN V        CID
---------- - ----------
         8 0          8
         9 0          8
        10 0          8
         1 0          9
         2 0          9
         3 0          9
         4 0          9
         5 0          9
         6 0          9
         7 0          9
         8 0          9

        SN V        CID
---------- - ----------
         9 0          9
        10 0          9
         1 0         10
         2 0         10
         3 0         10
         4 0         10
         5 0         10
         6 0         10
         7 0         10
         8 0         10
         9 0         10

        SN V        CID
---------- - ----------
        10 0         10
         1 0         11
         2 0         11
         3 0         11
         4 0         11
         5 0         11
         6 0         11
         7 0         11
         8 0         11
         9 0         11
        10 0         11

        SN V        CID
---------- - ----------
         1 0         12
         2 0         12
         3 0         12
         4 0         12
         5 0         12
         6 0         12
         7 0         12
         8 0         12
         9 0         12
        10 0         12
         1 0         13

        SN V        CID
---------- - ----------
         2 0         13
         3 0         13
         4 0         13
         5 0         13
         6 0         13
         7 0         13
         8 0         13
         9 0         13
        10 0         13
         1 0         14
         2 0         14

        SN V        CID
---------- - ----------
         3 0         14
         4 0         14
         5 0         14
         6 0         14
         7 0         14
         8 0         14
         9 0         14
        10 0         14
         1 0         15
         2 0         15
         3 0         15

        SN V        CID
---------- - ----------
         4 0         15
         5 0         15
         6 0         15
         7 0         15
         8 0         15
         9 0         15
        10 0         15
         1 0         16
         2 0         16
         3 0         16
         4 0         16

        SN V        CID
---------- - ----------
         5 0         16
         6 0         16
         7 0         16
         8 0         16
         9 0         16
        10 0         16
         1 0         17
         2 0         17
         3 0         17
         4 0         17
         5 0         17

        SN V        CID
---------- - ----------
         6 0         17
         7 0         17
         8 0         17
         9 0         17
        10 0         17
         1 0         18
         2 0         18
         3 0         18
         4 0         18
         5 0         18
         6 0         18

        SN V        CID
---------- - ----------
         7 0         18
         8 0         18
         9 0         18
        10 0         18
         1 0         19
         2 0         19
         3 0         19
         4 0         19
         5 0         19
         6 0         19
         7 0         19

        SN V        CID
---------- - ----------
         8 0         19
         9 0         19
        10 0         19

已选择190行。

从上面输出的数据可以看到，每个客户id对应了10个0，这个就是客户没有赋值状态的tags了。

再接下来，让上面的结果集与customer_tag表连结起来。

select c.*,d.*, decode(nvl(d.tid,0),0,'0','1') from
(select a.sn,a.val,b.id as cid from
(select level as sn,'0' as val from dual connect by level<=10) a,
(select id from customer where id<=20) b ) c
left join 
(select * from customer_tag where cid<=20) d
on c.cid= d.cid and c.sn=d.tid
order by c.cid,c.sn

上面这段SQL的关键在

decode(nvl(d.tid,0),0,'0','1')

一句，这句的用意在如果连接表customer_tag表在cid和tid交叉处有值，则把tid代表的位置置“1”，否则就是“0”.

上面的语句包括的列还有点多，可以简化一下：

    select c.cid,c.sn, decode(nvl(d.tid,0),0,'0','1') as tg from
    (select a.sn,a.val,b.id as cid from
    (select level as sn,'0' as val from dual connect by level<=10) a,
    (select id from customer where id<=20) b ) c
    left join 
    (select * from customer_tag where cid<=20) d
    on c.cid= d.cid and c.sn=d.tid
    order by c.cid,c.sn

我截取两个片段给大家看看：

       CID         SN T
---------- ---------- -
        19          8 0
        19          9 1
        19         10 0
        20          1 1
        20          2 0
        20          3 1
        20          4 0
        20          5 1
        20          6 0
        20          7 1
        20          8 0

       CID         SN T
---------- ---------- -
        20          9 1
        20         10 0

这个已经很接近需求了，只要我们把cid分组，然后把tg给listagg起来,排序方式按sn就可以。

select e.cid,listagg(e.tg,',') within group (order by e.sn) as tags  from
(
    select c.cid,c.sn, decode(nvl(d.tid,0),0,'0','1') as tg from
    (select a.sn,a.val,b.id as cid from
    (select level as sn,'0' as val from dual connect by level<=10) a,
    (select id from customer where id<=20) b ) c
    left join 
    (select * from customer_tag where cid<=20) d
    on c.cid= d.cid and c.sn=d.tid
    --order by c.cid,c.sn
) e
group by e.cid

我再截取一段给大家看看：

       CID
----------
TAGS
--------------------------------------------------------------------------------
        16
1,0,1,0,1,0,1,0,1,0

        17
1,0,1,0,1,0,1,0,1,0

        18
1,0,1,0,1,0,1,0,1,0


       CID
----------
TAGS
--------------------------------------------------------------------------------
        19
1,0,1,0,1,0,1,0,1,0

        20
1,0,1,0,1,0,1,0,1,0

这个已经非常接近需求了，只要将上面的结果集与customer表再连结一下。

【最终SQL】

select ct.name||','||f.tags as line from
(
    select e.cid,listagg(e.tg,',') within group (order by e.sn) as tags  from
    (
        select c.cid,c.sn, decode(nvl(d.tid,0),0,'0','1') as tg from
        (select a.sn,a.val,b.id as cid from
        (select level as sn,'0' as val from dual connect by level<=10) a,
        (select id from customer where id<=20) b ) c
        left join 
        (select * from customer_tag where cid<=20) d
        on c.cid= d.cid and c.sn=d.tid
        --order by c.cid,c.sn
    ) e
    group by e.cid
) f
left join customer ct
on f.cid=ct.id
order by f.cid

最终结果：

LINE
--------------------------------------------------------------------------------
RRWCSHGGXWIYBKGXVCEU,1,0,1,0,1,0,1,0,1,0
KJBKAP,1,0,1,0,1,0,1,0,1,0
KOWUIGSCZ,1,0,1,0,1,0,1,0,1,0
YQQERMQIFJFHBRMYB,1,0,1,0,1,0,1,0,1,0
SLHDVDA,1,0,1,0,1,0,1,0,1,0
DABAFFOLGV,1,0,1,0,1,0,1,0,1,0
QSOLQIQVWN,1,0,1,0,1,0,1,0,1,0
WBSAFUYZXTFSNWNUVLD,1,0,1,0,1,0,1,0,1,0
HENMSABDZIB,1,0,1,0,1,0,1,0,1,0
HVBGVEEMFLOUMGIHBLZ,1,0,1,0,1,0,1,0,1,0
BAKPBTDF,1,0,1,0,1,0,1,0,1,0

LINE
--------------------------------------------------------------------------------
ZWFOGZIA,1,0,1,0,1,0,1,0,1,0
GBWSDUKVUULBHBIM,1,0,1,0,1,0,1,0,1,0
JPFMYDBRTROCM,1,0,1,0,1,0,1,0,1,0
OIYDCYLWFJNVLGMLJOD,1,0,1,0,1,0,1,0,1,0
LGDTXVNGGHIK,1,0,1,0,1,0,1,0,1,0
SPEFMER,1,0,1,0,1,0,1,0,1,0
WKQSKPKFXCAJQRMDLK,1,0,1,0,1,0,1,0,1,0
FUGHCCZRTRD,1,0,1,0,1,0,1,0,1,0
KOBERTDCS,1,0,1,0,1,0,1,0,1,0

已选择20行。

可见，每个客户的tag情况都被列了出来，正如需求所说的。

上面的方案避免了listagg上限为4000的隐患，因为在实际处理中，tag只有1600个，加上1599个逗号，总数不超过3200，不会去触碰4000的底线。

上面的方案还把绝大部分计算保持在数据库服务端，使得java服务器端拿到的就是最终数据，为整体运行节约了时间。

其实，程序语言中大部分循环和分支的功能，靠SQL也能办到，用得正确必然能提高效率。

PS：

做出这个方案，是因为原来的实现 https://www.cnblogs.com/pyhy/p/15855992.html 存在listagg上限的隐患，同时我也觉得不用Java帮忙，SQL本身就能完成数据行的搭建，于是思索两天，便有了此新方案，当然，对此偏于复杂的SQL，不同的人有不同的理解，且不论新方案的结果如何，但这个思索过程我是挺享受的，获得了莫大的研究本身产生的快乐，这就足够了。

END

2022年2月4日04点48分补充

在实际运用中发现，SQL中第一句

select ct.name||','||f.tags as line from

会导致错误:ORA-01489: 字符串连接的结果过长，换成concat去连接字串也是一样，看来oracle不想让本身就长的字串（f.tags）再去和别的字串连接。

这个问题也很好解决，那就是不在程序里把两个字段连起来，而是保持两个字段，让外接程序去连就好了，于是最终SQL变成这样：

select ct.name,f.tags as line from
(
    select e.cid,listagg(e.tg,',') within group (order by e.sn) as tags  from
    (
        select c.cid,c.sn, decode(nvl(d.tid,0),0,'0','1') as tg from
        (select a.sn,a.val,b.id as cid from
        (select level as sn,'0' as val from dual connect by level<=1000) a,
        (select id from customer where 10000<id and id<=20000) b ) c
        left join 
        (select * from customer_tag where 10000<cid and cid<=20000) d
        on c.cid= d.cid and c.sn=d.tid
        --order by c.cid,c.sn
    ) e
    group by e.cid
) f
left join customer ct
on f.cid=ct.id
order by f.cid

这样就没有ORA-01489错误了，SQL在SQL Plus和Python程序里都能正常执行。

END

posted @ 2022-02-02 09:35 逆火狂飙阅读(425) 评论(2) 收藏举报

刷新页面返回顶部