再度思索:从配送表中选出订单号和配送者相同时的最新记录

在前文 https://www.cnblogs.com/heyang78/p/12079017.html 中,我主要描述了现象,给出了结论,这次试图对过程进行一点思索。

 

需求:从配送表里,当订单号和配送者一样时,取时间最靠近现在即时间值最大的一条记录。

 

配送表表结构:

create table peisong(
id int,
order_no int, --订单号
shipper_cd int,--配送者号
create_time timestamp,
primary key(id))

填充实验值:

复制代码

insert into peisong(id,order_no,shipper_cd,create_time) values('0',1,1,to_date('2004-05-07','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('1',1,1,to_date('2004-05-08','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('2',1,2,to_date('2004-05-09','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('3',1,2,to_date('2004-05-10','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('4',1,3,to_date('2004-05-11','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('5',2,1,to_date('2004-05-12','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('6',2,1,to_date('2004-05-13','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('7',2,2,to_date('2004-05-14','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('8',2,2,to_date('2004-05-15','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('9',2,3,to_date('2004-05-16','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('10',2,3,to_date('2004-05-17','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('11',3,1,to_date('2004-05-18','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('12',3,2,to_date('2004-05-19','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('13',3,2,to_date('2004-05-20','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('14',3,3,to_date('2004-05-21','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('15',3,4,to_date('2004-05-22','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('16',3,4,to_date('2004-05-23','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('17',3,4,to_date('2004-05-24','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('18',3,4,to_date('2004-05-25','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('19',3,4,to_date('2004-05-26','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('20',3,4,to_date('2004-05-27','yyyy-mm-dd'));

复制代码

有多种方式可以达成需求,以下列举四种:

方案一:先分组再内联方案

复制代码
select
    a.id,
    a.order_no,
    a.shipper_cd,
    to_char(a.create_time,'yyyy-mm-dd') as create_time
from
    peisong a,
    (
    select
        order_no,
        shipper_cd,
        max(create_time) as create_time
    from
        peisong
    group by
        order_no,
        shipper_cd) b
where
    a.order_no = b.order_no and 
    a.shipper_cd = b.shipper_cd and 
    a.create_time = b.create_time
order by
    a.id,
    a.order_no,
    a.shipper_cd
复制代码

或:

select a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a, ( select order_no, shipper_cd, max(create_time) as create_time from peisong group by order_no, shipper_cd) b where a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time = b.create_time order by a.id, a.order_no, a.shipper_cd

筛选结果:

复制代码
        ID   ORDER_NO SHIPPER_CD CREATE_TIM
---------- ---------- ---------- ----------
         1          1          1 2004-05-08
         3          1          2 2004-05-10
         4          1          3 2004-05-11
         6          2          1 2004-05-13
         8          2          2 2004-05-15
        10          2          3 2004-05-17
        11          3          1 2004-05-18
        13          3          2 2004-05-20
        14          3          3 2004-05-21
        20          3          4 2004-05-27

已选择10行。
复制代码

 如果我们把order_no,shipper_cd相同的记录划分成一组,那么由b表的形成条件可知,无论组内元素有多少条,最终只会选出create_time最大(即最近)的那条,两表内联查询便会在21*10=210的结果集选,因此这种查询方式适合组内数量稍多的情况,比较均衡。

 

方案二:左连接方案

复制代码
SELECT
    a.id,
    a.order_no,
    a.shipper_cd,
    to_char(a.create_time,'yyyy-mm-dd') as create_time
from
    peisong a left JOIN 
    peisong b
on
    a.order_no = b.order_no and 
    a.shipper_cd = b.shipper_cd and 
    a.create_time < b.create_time
where
    b.create_time  IS  NULL
order by
    a.id,
    a.order_no,
    a.shipper_cd
复制代码

或:

SELECT a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a left JOIN peisong b on a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time < b.create_time where b.create_time IS NULL order by a.id, a.order_no, a.shipper_cd

结果:

复制代码
        ID   ORDER_NO SHIPPER_CD CREATE_TIM
---------- ---------- ---------- ----------
         1          1          1 2004-05-08
         3          1          2 2004-05-10
         4          1          3 2004-05-11
         6          2          1 2004-05-13
         8          2          2 2004-05-15
        10          2          3 2004-05-17
        11          3          1 2004-05-18
        13          3          2 2004-05-20
        14          3          3 2004-05-21
        20          3          4 2004-05-27

已选择10行。
复制代码

 

这个左联SQL的代码稍微有点费解,我在这里赘述两句。

首先可以看看peisong表左联自己,条件是左表时间小于右表的情况。

SQL:

复制代码
SELECT
    a.id,
    to_char(a.create_time,'yyyy-mm-dd') as actime,
    to_char(b.create_time,'yyyy-mm-dd') as bctime
from
    peisong a left JOIN 
    peisong b
on
    a.order_no = b.order_no and 
    a.shipper_cd = b.shipper_cd and 
    a.create_time < b.create_time
order by
    a.id,
    a.order_no,
    a.shipper_cd
复制代码

查询结果:

复制代码
SQL> SELECT
  2      a.id,
  3      to_char(a.create_time,'yyyy-mm-dd') as actime,
  4      to_char(b.create_time,'yyyy-mm-dd') as bctime
  5  from
  6      peisong a left JOIN
  7      peisong b
  8  on
  9      a.order_no = b.order_no and
 10      a.shipper_cd = b.shipper_cd and
 11      a.create_time < b.create_time
 12  order by
 13      a.id,
 14      a.order_no,
 15      a.shipper_cd;

        ID ACTIME     BCTIME
---------- ---------- ----------
         0 2004-05-07 2004-05-08
         1 2004-05-08
         2 2004-05-09 2004-05-10
         3 2004-05-10
         4 2004-05-11
         5 2004-05-12 2004-05-13
         6 2004-05-13
         7 2004-05-14 2004-05-15
         8 2004-05-15
         9 2004-05-16 2004-05-17
        10 2004-05-17
        11 2004-05-18
        12 2004-05-19 2004-05-20
        13 2004-05-20
        14 2004-05-21
        15 2004-05-22 2004-05-23
        15 2004-05-22 2004-05-26
        15 2004-05-22 2004-05-25
        15 2004-05-22 2004-05-27
        15 2004-05-22 2004-05-24
        16 2004-05-23 2004-05-24
        16 2004-05-23 2004-05-27
        16 2004-05-23 2004-05-26
        16 2004-05-23 2004-05-25
        17 2004-05-24 2004-05-27
        17 2004-05-24 2004-05-26
        17 2004-05-24 2004-05-25
        18 2004-05-25 2004-05-27
        18 2004-05-25 2004-05-26
        19 2004-05-26 2004-05-27
        20 2004-05-27

已选择31行。
复制代码

由上面的查询结果可以看出来,当左表的create_time字段已经是最大时,右表里已经找不出order_no,shipper_cd相同,而create_time字段更大的记录,于是按连接规则补NULL,这个NULL在上面的查询中就是以空档显示的。

通过观察我们就能知道,左表acreate_time字段有数据,右表bcreate_time字段是空的情况,这就是我们想要的记录,于是筛选条件where b.create_time  IS  NULL就出现了。

让我们会看ab两表连接条件“on a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time < b.create_time”,如果我们把order_no,shipper_cd相同的记录划到一组,会发现组越大,产生的连接结果就越多,最典型是15到20这一组(共5条记录),通过左联接产生了16条记录(5+4+3+2+1+1),如果有n条,那么就是n*(n+1)/2+1条,也就说说同组数量越大,产生记录会和数量平方成正比,这就是在原先实验中组小基本是一两条时左联接最快,组数量越大越慢的根本原因。综合分析可以得出结论,左联方案只是完成需求,对优化没有多考虑。

 

方案三:not exists方案

复制代码
select
    a.id,
    a.order_no,
    a.shipper_cd,
    to_char(a.create_time,'yyyy-mm-dd') as create_time
from
    peisong a
where
      not exists  (
    select
        1
    from
        peisong b
    where
        b.shipper_cd = a.shipper_cd and 
        b.order_no = a.order_no and 
        b.create_time > a.create_time)
order by
    a.id,
    a.order_no,
    a.shipper_cd
复制代码

或:

select a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a where not exists ( select 1 from peisong b where b.shipper_cd = a.shipper_cd and b.order_no = a.order_no and b.create_time > a.create_time) order by a.id, a.order_no, a.shipper_cd

结果:

复制代码
        ID   ORDER_NO SHIPPER_CD CREATE_TIM
---------- ---------- ---------- ----------
         1          1          1 2004-05-08
         3          1          2 2004-05-10
         4          1          3 2004-05-11
         6          2          1 2004-05-13
         8          2          2 2004-05-15
        10          2          3 2004-05-17
        11          3          1 2004-05-18
        13          3          2 2004-05-20
        14          3          3 2004-05-21
        20          3          4 2004-05-27
复制代码

Not exists方式会形成n*n的结构,但右边的n只做条件不筛选,然后对左边的n过滤, 最适合按 order_no,shipper_cd分组,而组内元素较多>10的情况,和左联正好相反。

 

方案四:最好理解又高效的rank分析函数方案

Oracle提供了分析函数rank,它可以按order_no,shipper_cd分组,按create_time逆序排序,然后将组内顺序赋上顺序值。

像这样

select id,rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong

之后我们只要取出seq=1的记录就达成需求了。

最终SQL:

select
id,
order_no,
shipper_cd,
to_char(create_time,'yyyy-mm-dd') as create_time
from peisong where id in (select a.id from (select id,rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong) a where a.seq=1)
order by id

执行结果:

复制代码
SQL> select
  2      id,
  3      order_no,
  4      shipper_cd,
  5      to_char(create_time,'yyyy-mm-dd') as create_time
  6  from peisong where id in (select a.id from (select id,rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong) a where a.seq=1)
  7  order by id;

        ID   ORDER_NO SHIPPER_CD CREATE_TIM
---------- ---------- ---------- ----------
         1          1          1 2004-05-08
         3          1          2 2004-05-10
         4          1          3 2004-05-11
         6          2          1 2004-05-13
         8          2          2 2004-05-15
        10          2          3 2004-05-17
        11          3          1 2004-05-18
        13          3          2 2004-05-20
        14          3          3 2004-05-21
        20          3          4 2004-05-27

已选择10行。
复制代码

这种方案由于rank的使用,第二个子查询就抛下了大多数无用数据行,因此效率也不错,还好理解,值得推荐。

 

可以看出,以上四种SQL文都能达成需求,具体选择哪种方案,需要根据实际数据分布进行实验,再选最合适的一种。

这次的分析应该比上次:https://www.cnblogs.com/heyang78/p/12079017.html 要更深入一些。

END

posted @   逆火狂飙  阅读(132)  评论(0编辑  收藏  举报
编辑推荐:
· Linux系列:如何用heaptrack跟踪.NET程序的非托管内存泄露
· 开发者必知的日志记录最佳实践
· SQL Server 2025 AI相关能力初探
· Linux系列:如何用 C#调用 C方法造成内存泄露
· AI与.NET技术实操系列(二):开始使用ML.NET
阅读排行:
· 无需6万激活码!GitHub神秘组织3小时极速复刻Manus,手把手教你使用OpenManus搭建本
· C#/.NET/.NET Core优秀项目和框架2025年2月简报
· Manus爆火,是硬核还是营销?
· 终于写完轮子一部分:tcp代理 了,记录一下
· 【杭电多校比赛记录】2025“钉耙编程”中国大学生算法设计春季联赛(1)
历史上的今天:
2019-08-31 转贴:MySQL的explain分析sql语句
2017-08-31 【Canvas电脑桌面】蓝底金圈纹桌面(1920*1080)
2017-08-31 有趣的HTML5/CSS3艺术网站
2013-08-31 Foreda8上安装CMake2.8.1.2
生当作人杰 死亦为鬼雄 至今思项羽 不肯过江东
点击右上角即可分享
微信分享提示