oracle:实现滑动平均效果
问题背景
1321. 餐馆营业额变化增长 - 力扣(LeetCode)
实现思路:
(1) 首先找到对应的7天之末的终点,分别是2019-01-07,2019-01-08,2019-01-09,2019-01-10
.
在Oracle
中可以通过lead
函数实现上述效果.但是需要考虑的两种情况:
- 时间序列连续与不连续
- 时间序列的重复与不重复
注:这里的重复是完全相同
(2) 假定在上述(1)已经完全解决,那么只需要从表customer
筛选出customer_id
在7天终结结点序列
的样例,组成新表tmp
,然后将其与customer
表全连接即可,达到初步想要的结果.但事实上,这样的样例并非所有的都满足要求,所以需要进一步筛选.而其对应的筛选条件就是:customer
的visited_on
与tmp
的visited_on
的差值在[0,6]
之间即可.
(3) 然后按照tmp
的visited_on
分组统计即可.
注:当7天终结结点序列
中的时间戳
在customer
中存在重复,在分组统计之前,需要对关联表进行去重.然后该节点会重复num
次计算(num:表示时间戳重复的次数
).
这里的7天终结结点序列
就是滑动平均
的时间各时间节点.
期望的结果是:sum=each_sum avg=each_avg
而事实上是:sum2=num*each_sum,avg2=sum2/7
在本例中,2019-01-10
重复,
- 不去重,会出现重复计算
select cu.*,tmp.visited_on t_visited_on
from customer cu,
(select * from customer
where visited_on in (
select lead(visited_on,6 ) over(order by visited_on )
from (select distinct visited_on from customer))
and visited_on is not null ) tmp
where tmp.visited_on-cu.visited_on between 0 and 6
order by tmp.visited_on;
- 去重后(下面的结果才是期望得到的):
select distinct tmp2.*
from (
select cu.*,tmp.visited_on t_visited_on
from customer cu,
(select * from customer
where visited_on in (
select lead(visited_on,6 ) over(order by visited_on )
from (select distinct visited_on from customer))
and visited_on is not null ) tmp
where tmp.visited_on-cu.visited_on between 0 and 6
) tmp2
order by t_visited_on;
时间序列连续
数据准备
表customer
,特点:时间序列基本连续
create table customer(
customer_id number,
name varchar2(10),
visited_on date,
amount number
);
insert into customer values(1,'John',to_date('2019-01-01','yyyy-mm-dd'),100);
insert into customer values(2,'Daniel',to_date('2019-01-02','yyyy-mm-dd'),110);
insert into customer values(3,'Jade',to_date('2019-01-03','yyyy-mm-dd'),120);
insert into customer values(4,'Khaled',to_date('2019-01-04','yyyy-mm-dd'),130);
insert into customer values(5,'Winston',to_date('2019-01-05','yyyy-mm-dd'),110);
insert into customer values(6,'Elvis',to_date('2019-01-06','yyyy-mm-dd'),140);
insert into customer values(7,'Anna',to_date('2019-01-07','yyyy-mm-dd'),150);
insert into customer values(8,'Maria',to_date('2019-01-08','yyyy-mm-dd'),80);
insert into customer values(9,'Jaze',to_date('2019-01-09','yyyy-mm-dd'),110);
insert into customer values(1,'Jhon',to_date('2019-01-10','yyyy-mm-dd'),130);
insert into customer values(3,'Jade',to_date('2019-01-10','yyyy-mm-dd'),150);
完整的查询语句
select to_char(t_visited_on) visited_on,
sum(amount) amount,
round(sum(amount)/7,2) average_amount
from (
select distinct tmp2.*
from (
select cu.*,tmp.visited_on t_visited_on
from customer cu,
(select * from customer
where visited_on in (
select lead(visited_on,6 ) over(order by visited_on )
from (select distinct visited_on from customer))
and visited_on is not null ) tmp
where tmp.visited_on-cu.visited_on between 0 and 6
) tmp2
)
group by t_visited_on
order by t_visited_on;
结果与问题背景中的结果一致
时间序列不完全连续
数据准备
create table customer2(
customer_id number,
name varchar2(10),
visited_on date,
amount number
);
insert into customer2 values(44,'Ashley',to_date('2019-01-04','yyyy-mm-dd'),160);
insert into customer2 values(23,'Sabo',to_date('2019-01-04','yyyy-mm-dd'),70);
insert into customer2 values(38,'Moustafa',to_date('2019-01-05','yyyy-mm-dd'),90);
insert into customer2 values(30,'Halley',to_date('2019-01-06','yyyy-mm-dd'),140);
insert into customer2 values(5,'Elvis',to_date('2019-01-07','yyyy-mm-dd'),160);
insert into customer2 values(12,'Leslie',to_date('2019-01-08','yyyy-mm-dd'),100);
insert into customer2 values(23,'Sabo',to_date('2019-01-08','yyyy-mm-dd'),90);
insert into customer2 values(13,'Will',to_date('2019-01-09','yyyy-mm-dd'),170);
insert into customer2 values(20,'Brock',to_date('2019-01-10','yyyy-mm-dd'),160);
insert into customer2 values(29,'Leo',to_date('2019-01-10','yyyy-mm-dd'),90);
insert into customer2 values(33,'Isaac',to_date('2019-01-11','yyyy-mm-dd'),60);
insert into customer2 values(46,'Selena',to_date('2019-01-12','yyyy-mm-dd'),100);
insert into customer2 values(4,'Winston',to_date('2019-01-13','yyyy-mm-dd'),150);
insert into customer2 values(15,'Marti',to_date('2019-01-13','yyyy-mm-dd'),160);
表2:customer2
,问题要求与问题背景描述里面一致
对应的正确的查询结果:
[["2019-01-10",1230,175.71],
["2019-01-11",1060,151.43],
["2019-01-12",1070,152.86],
["2019-01-13",1240,177.14]]
具体实现
将上述完整程序
中的表替换成customer2
,其结果是:
/* Write your PL/SQL query statement below */
select to_char(t_visited_on) visited_on,
sum(amount) amount,
round(sum(amount)/7,2) average_amount
from (
select distinct tmp2.*
from (
select cu.*,tmp.visited_on t_visited_on
from customer2 cu,
(select * from customer2
where visited_on in (
select lead(visited_on,6 ) over(order by visited_on )
from (select distinct visited_on from customer2))
and visited_on is not null ) tmp
where tmp.visited_on-cu.visited_on between 0 and 6
) tmp2
)
group by t_visited_on
order by t_visited_on;
时间序列的连续与不连续会影响,会影响滑动平均节点范围
的查找时.查找之前去重,是解决该问题的关键
.以下是两者的区别对比:
两个语句的结果:
--存在异常
select lead(visited_on,6 ) over(order by visited_on ) from customer2;
LEAD(VISITED_O
--------------
08-1月 -19
09-1月 -19
10-1月 -19
10-1月 -19
11-1月 -19
12-1月 -19
13-1月 -19
13-1月 -19
--正确
select lead(visited_on,6 ) over(order by visited_on )
from (select distinct visited_on from customer2);
LEAD(VISITED_O
--------------
10-1月 -19
11-1月 -19
12-1月 -19
13-1月 -19
时间序列存在重复样例
为了方便,对表customer
中插入一列,使表中存在重复列.
insert into customer values(9,'Jaze',to_date('2019-01-09','yyyy-mm-dd'),110);
使用上述程序的查询结果为:
理应2019-01-10
中amount
变成1110
,平均值也应该变化,但是没有,是因为distinct
出现了误伤
所以需要对该约束
进行修改,具体程序如下:
/* Write your PL/SQL query statement below */
select to_char(t_visited_on) visited_on,
sum(amount)/(count(*)/(select sum(case when t_visited_on-visited_on between 0 and 6 then 1 else 0 end) from customer) ) amount,
round(sum(amount)/(count(*)/(select sum(case when t_visited_on-visited_on between 0 and 6 then 1 else 0 end) from customer) )/7,2) average_amount
from (
select tmp2.*
from (
select cu.*,tmp.visited_on t_visited_on
from customer cu,
(select * from customer
where visited_on in (
select lead(visited_on,6 ) over(order by visited_on )
from (select distinct visited_on from customer))
and visited_on is not null ) tmp
where tmp.visited_on-cu.visited_on between 0 and 6
) tmp2
)
group by t_visited_on
order by t_visited_on;
结论
结论:在时间序列连续或不连续情况下,找到准确的时间区间间隔是关键.这个案例为Oracle
中滑动平均
提供思路,具体如下:
(1)确定滑动窗口的时间节点,假设以2019-03月为例
,时间间隔为3(num)
,则对应的时间节点为:2019-03-01,2019-03-04,2019-03-07......
;并依据该时间点筛选原表,组成tmp1
(2)将表tmp1
与原表全连接,条件:时间差在[0,(3-1)num-1]
,
关联表的重复需要慎重处理,然后再分组统计.
使用开窗函数的方法
1321. 餐馆营业额变化增长 题解 - 力扣(LeetCode)
实现滑动平均
,也可以直接应用Oracle
中窗口函数,其在查询过程中为当前记录提供一个相关记录集,相应的记录集也会随之改变.分析函数+over()
是常见的搭配形式.(oracle学习笔记(六):oracle中排序函数及其应用_数据库_八千里路云和月-CSDN博客,中总结了常见的窗口函数在排序中的应用.
实际应用中,一旦使用了窗口函数,都会为其产生一个可操作的记录集合.而对该记录,可以进一步使用窗口子句,来进一步限定窗口范围.常用的窗口子句包括两类:
利用rows子句的行方式进行限定;
其原理是,在已确定的窗口中,各记录已经按照某种标准进行了排序;而当前记录处于窗口中的特定位置;rows子句以当前记录为参照物,可以向前向后推移,并形成新的结果集,作为最终的操作窗口.
案例:在Emp
表中,获得员工信息的同时,获得相邻三个员工的工资综合.
SQL> select e.*,
2 sum(sal) over(order by empno rows between 1 preceding and 1 following) e_total
3 from emp e;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO E_TOTAL
---------- ---------- --------- ---------- -------------- ---------- ---------- ---------- ----------
7369 SMITH CLERK 7902 17-12月-80 1856.92 20 4885.12
7499 ALLEN SALESMAN 7698 20-2月 -81 3028.2 300 30 7300.89
7521 WARD SALESMAN 7698 22-2月 -81 2415.77 500 30 10468.5
7566 JONES MANAGER 7839 02-4月 -81 5024.53 20 9856.07
7654 MARTIN SALESMAN 7698 28-9月 -81 2415.77 1400 30 12298.45
7698 BLAKE MANAGER 7839 01-5月 -81 4858.15 30 11599.67
7782 CLARK MANAGER 7839 09-6月 -81 4325.75 10 13709.3
7788 SCOTT ANALYST 7566 19-4月 -87 4525.4 20 16837.15
7839 KING PRESIDENT 17-11月-81 7986 10 15293.19
7844 TURNER SALESMAN 7698 08-9月 -81 2781.79 0 30 12963.94
7876 ADAMS CLERK 7788 23-5月 -87 2196.15 20 6954.48
7900 JAMES CLERK 7698 03-12月-81 1976.54 30 8698.09
7902 FORD ANALYST 7566 03-12月-81 4525.4 20 8990.91
7934 MILLER CLERK 7782 23-1月 -82 2488.97 10 7014.37
已选择14行。
其中rows between 1 preceding and 1 following
不一定返回三条记录.若排序之后,该记录只有位于首\或者末,则是两条记录的平均值.rows子句与位置相关.因此,在窗口函数中,必须含有排序子句order by
.如果布用将会报错.
利用range子句的值方式进行限定.
按照列值进行窗口的进一步限制.
案例:
SQL> select e.*,
2 count(1) over(partition by deptno order by sal range between 500 preceding and 500 following) new
3 from emp e;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO NEW
---------- ---------- --------- ---------- -------------- ---------- ---------- ---------- ----------
7934 MILLER CLERK 7782 23-1月 -82 2488.97 10 1
7782 CLARK MANAGER 7839 09-6月 -81 4325.75 10 1
7839 KING PRESIDENT 17-11月-81 7986 10 1
7369 SMITH CLERK 7902 17-12月-80 1856.92 20 2
7876 ADAMS CLERK 7788 23-5月 -87 2196.15 20 2
7788 SCOTT ANALYST 7566 19-4月 -87 4525.4 20 3
7902 FORD ANALYST 7566 03-12月-81 4525.4 20 3
7566 JONES MANAGER 7839 02-4月 -81 5024.53 20 3
7900 JAMES CLERK 7698 03-12月-81 1976.54 30 3
7521 WARD SALESMAN 7698 22-2月 -81 2415.77 500 30 4
7654 MARTIN SALESMAN 7698 28-9月 -81 2415.77 1400 30 4
7844 TURNER SALESMAN 7698 08-9月 -81 2781.79 0 30 4
7499 ALLEN SALESMAN 7698 20-2月 -81 3028.2 300 30 2
7698 BLAKE MANAGER 7839 01-5月 -81 4858.15 30 1
已选择14行。
unbounded和current now
unbounded:可以直接代替数值,表示内有任何限制;
current now:直接代表当前行.
对于order by 子句,如果没有显式指定rows
和range
子句,那么就相当于使用了rows between unbounded preceding and current row 或者 range between unbounded preceding and current row
.但是,当排序后存在相同排名时,则情况会存在些许差异.
基于上述新的知识点,解决上述问题.
SQL> select distinct visited_on,amount,round(amount/7,2)
2 from (
3 select to_char(visited_on) visited_on,
4 sum(amount) over(order by visited_on range between 6 preceding and 0 following ) amount
5 from customer cu
6 )
7 where visited_on in (
8 select lead(visited_on,6 ) over(order by visited_on )
9 from (select distinct visited_on from customer))
10 and visited_on is not null;
VISITED_ON AMOUNT ROUND(AMOUNT/7,2)
-------------- ---------- -----------------
07-1月 -19 860 122.86
08-1月 -19 840 120
09-1月 -19 950 135.71
10-1月 -19 1110 158.57
SQL> select distinct visited_on,amount,round(amount/7,2)
2 from (
3 select to_char(visited_on) visited_on,
4 sum(amount) over(order by visited_on range between 6 preceding and 0 following ) amount
5 from customer2 cu
6 )
7 where visited_on in (
8 select lead(visited_on,6 ) over(order by visited_on )
9 from (select distinct visited_on from customer2))
10 and visited_on is not null;
VISITED_ON AMOUNT ROUND(AMOUNT/7,2)
-------------- ---------- -----------------
10-1月 -19 1230 175.71
11-1月 -19 1060 151.43
13-1月 -19 1240 177.14
12-1月 -19 1070 152.86
参考:
<21天学通Oracle,张朝明,陈丹>