窗口函数【分析函数】
窗口函数【分析函数】【聚合函数 over(paratition by 字段 / orderr by 字段) as 别名】
Mysql8.0 支持窗口函数【Window Function】,也称分析函数
与分组聚合函数类似,但是 每一行数据都生成一个结果
聚合窗口函数:SUM、AVG、COUNT、MAX、MIN等等
1. 窗口函数 VS 传统聚合函数
传统聚合函数 group by
窗口函数【分析函数】===> 可以显示出原始数据
select
year,
country,
product,
profit,
sum(profit) OVER (partition by country) as country_profit
from
sales
order by
country,
year,
product,
profit
2. 专用窗口函数
2.1 排名函数
row_number():唯一
rank():跳过
dense_rank():不跳过
percent_rank()
2.2 分组内的第一或最后一名
first_value()
last_value()
lead():相对当前数据,向后查第几个
lag():相对当前数据,向前查几个
2.3 第一行到当前行求和
# Write your MySQL query statement below
# 有一队乘客在等着上巴士。然而,巴士有1000 千克 的重量限制,所以其中一部分乘客可能无法上巴士。
# 找出 最后一个 上巴士且不超过重量限制的乘客,并报告 person_name 。题目测试用例确保顺位第一的人可以上巴士且不会超重。
select person_name from
(select
turn Turn,
person_id ID,
person_name,
weight Weight,
sum(weight) over(order by turn ROWS between UNBOUNDED PRECEDING AND CURRENT ROW) Total_Weight
from Queue) temp
where Total_Weight <= 1000
order by Total_Weight desc
limit 1
- 主动请求确认消息的用户
# 24 小时窗口内两次请求确认消息的用户的 ID
select distinct(user_id) user_id from
(select
user_id,
time_stamp,
lag(time_stamp) over(partition by user_id order by time_stamp) time_lag
from Confirmations) temp
where time_lag is not null and
timestampdiff(second, time_lag, time_stamp) <= 86400
2.3 分析函数
cume_dist():累积到现在,这个数据占了多少
nth_value()
ntile()
3. 应用
1532. 最近的三笔订单
- 先窗口函数
SELECT
order_id,
order_date,
customer_id,
row_number() over(partition by customer_id order by order_date DESC) as rank_tag
FROM orders
order_id | order_date | customer_id | rank_tag |
---|---|---|---|
8 | 2020-08-03 | 1 | 1 |
1 | 2020-07-31 | 1 | 2 |
10 | 2020-07-15 | 1 | 3 |
5 | 2020-06-10 | 1 | 4 |
9 | 2020-08-07 | 2 | 1 |
6 | 2020-08-01 | 2 | 2 |
2 | 2020-07-30 | 2 | 3 |
7 | 2020-08-01 | 3 | 1 |
3 | 2020-07-31 | 3 | 2 |
4 | 2020-07-29 | 4 | 1 |
- 再左连接
select name customer_name, temp.customer_id customer_id, order_id, order_date from
(select
order_id,
order_date,
customer_id,
row_number() over(partition by customer_id order by order_date DESC) as rank_tag
from orders) temp left join Customers on
temp.customer_id = Customers.customer_id
- 再根据 rank_tag 筛选,并排序
select name customer_name, temp.customer_id customer_id, order_id, order_date from
(select
order_id,
order_date,
customer_id,
row_number() over(partition by customer_id order by order_date DESC) as rank_tag
from orders) temp left join Customers on
temp.customer_id = Customers.customer_id
where rank_tag <= 3
order by customer_name, customer_id, order_date desc
1077. 项目员工Ⅲ
select project_id, employee_id from
(select project_id,
employee_id,
rank() over(partition by project_id order by experience_years desc) as `rank` from
(select project_id, p.employee_id, experience_years from Project p left join Employee e on
p.employee_id = e.employee_id) temp1) temp2
where `rank` = 1;
1285. 找到连续区间的开始和结束数字【技巧】
等差数列 - 等差数列,值才会相等
所以才会是一组
group by(log_id - rk)
log_id | rk |
---|---|
1 | 1 |
2 | 2 |
3 | 3 |
7 | 4 |
8 | 5 |
10 | 6 |
# Write your MySQL query statement below
select min(log_id) start_id, max(log_id) end_id from
(select
log_id,
rank() over(order by log_id) rk from Logs) temp
group by (log_id - rk)
1709. 访问日期之间最大的空档期
# Write your MySQL query statement below
# now:2021-1-1
# 求:最大间隔
# 1. 特殊情况:最后一个,now - max
select user_id, max(date_diff) biggest_window from
(select user_id, datediff('2021-1-1', max(visit_date)) date_diff from UserVisits
group by user_id
union
select user_id, max(datediff(next, visit_date)) date_diff from
(select
user_id,
visit_date,
lead(visit_date) over(partition by user_id order by visit_date) next
from UserVisits) temp
where temp.next is not null
group by user_id) temp2
group by user_id
1412. 查找成绩处于中游的学生【中游学生】
select temp1.student_id, student_name from
(select distinct student_id from Exam
where student_id not in
(select student_id from
(select exam_id,
student_id,
score,
rank() over(partition by exam_id order by score) rk1, # min:rk1 = 1
rank() over(partition by exam_id order by score desc) rk2 # max:rk2 = 1
from Exam) temp
where rk1 = 1 or rk2 = 1)) temp1 left join Student on
temp1.student_id = Student.student_id
order by temp1.student_id
550. 游戏玩法分析 Ⅳ
select round(
(select count(distinct player_id) from
(select
player_id,
event_date,
count(1) over(partition by player_id) size,
lead(event_date) over(partition by player_id order by event_date) lead1,
rank() over (partition by player_id order by event_date) rk
from Activity) temp
where rk = 1 and datediff(lead1, event_date) = 1)
/
(select count(distinct player_id) from Activity), 2) fraction
585. 2016年的投资
# Write your MySQL query statement below
# 2016 年
# 1. 他在 2015 年的投保额 (tiv_2015) 至少跟一个其他投保人在 2015 年的投保额相同。
# 2. 他所在的城市必须与其他投保人都不同(也就是说 (lat, lon) 不能跟其他任何一个投保人完全相同)。
# select round(sum(tiv_2016), 2) tiv_2016 from Insurance
# where pid not in
select round(sum(tiv_2016), 2) tiv_2016 from Insurance where
pid in
(select pid from
(select
pid,
count(1) over (partition by lat, lon) ct1, # 唯一坐标 ct1 = 1
count(1) over (partition by tiv_2015) ct2 # 2015相同 ct2 > 1
from Insurance) temp
where ct1 = 1 and ct2 > 1)
1164. 指定日期的产品价格
# Write your MySQL query statement below
# 1. 在 2019-08-16 时全部产品的价格
# 2. 假设所有产品在修改前的价格都是 10
## 一、有符合条件的
## 二、没有符合条件
select product_id, price from
(select
product_id,
new_price price,
change_date,
rank() over(partition by product_id order by change_date desc) rk
from Products
where change_date <= '2019-08-16') temp
where rk = 1
union
select product_id, 10 from Products
where product_id not in
(select product_id from Products
where change_date <= '2019-08-16')