Hive Window Functions

Example: row_number() over(partition by clue_id order by state_updated desc)

Business example:

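select distinct a.clue_id,
       a.car_price,
       a.service_amount,
       a.buy_car_service_price,
       a.created_at,
       substr(a.state_updated, 1, 10) as state_updated
from
(
    select distinct order_id,
           clue_id,
           car_price,              -- vehicle deal price (deal_price)
           service_amount,         -- car-selling service fee
           buy_car_service_price,  -- receivable service fee for car acquisition
           state_updated,          -- time of the state change
           created_at,
           -- number each clue's rows from the newest state change to the oldest
           row_number() over (partition by clue_id order by state_updated desc) as rn
    from guazi_dw_dwd.dwd_ctob_trade_online_status_transfer_ymd -- confirmed time -- quick-sale online log table -- ods.ods_ctob_trade_order
    -- yesterday's partition; date_add('day', -1, ...) is Presto-style syntax, in Hive use date_sub(current_date, 1)
    where dt = cast(date_add('day', -1, current_date) as varchar)
      and state = '10320000' -- 10920000 means sold
      and substr(state_updated, 1, 10) >= '2019-06-10'
) a
where a.rn = 1 -- keep only the latest row for each clue_id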
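
The "latest row per key" pattern used above can be tried without access to the business table. The following is a minimal sketch on made-up data: the status_log CTE, its rows, and the rn alias are hypothetical and are not part of the original query.

```sql
-- Hypothetical sample data: two state changes for clue 1, one for clue 2.
with status_log as (
    select 1 as clue_id, '2019-06-11 09:00:00' as state_updated, 52000 as car_price
    union all
    select 1, '2019-06-12 18:30:00', 50000
    union all
    select 2, '2019-06-10 12:00:00', 78000
)
select clue_id,
       car_price,
       substr(state_updated, 1, 10) as state_updated
from (
    select clue_id,
           car_price,
           state_updated,
           -- rn = 1 marks the most recent state change within each clue_id
           row_number() over (partition by clue_id order by state_updated desc) as rn
    from status_log
) t
where rn = 1;
-- clue 1 keeps its 2019-06-12 row (car_price 50000); clue 2 keeps its only row.
```

Because row_number() never produces ties, the filter rn = 1 returns exactly one row per clue_id, which is why it is used here instead of rank().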

 

Ranking rules:

a    row_number  rank  dense_rank
---------------------------------
A    1           1     1
D    2           2     2
B    3           2     2
C    4           4     3
G    5           5     4
E    6           6     5
F    7           7     6
 
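Notes:
row_number: numbers the rows consecutively 1, 2, 3, ..., n whether or not there are ties.
rank: tied rows share the same rank, and the ranks that follow skip ahead by the number of tied rows (two rows tied at 2 are followed by rank 4).
dense_rank: tied rows share the same rank, and the ranks that follow continue without gaps.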
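
The table above can be reproduced with a small query. This is a sketch under assumptions: the scores CTE and its score values are invented so that D and B tie, and the aliases rn, rk, and drk are illustrative names.

```sql
-- Hypothetical scores chosen so that D and B tie for second place.
with scores as (
    select 'A' as a, 100 as score union all
    select 'D', 90 union all
    select 'B', 90 union all
    select 'C', 80 union all
    select 'G', 70 union all
    select 'E', 60 union all
    select 'F', 50
)
select a,
       score,
       row_number() over (order by score desc) as rn,   -- always 1..7, no repeats
       rank()       over (order by score desc) as rk,   -- 1, 2, 2, 4, 5, 6, 7
       dense_rank() over (order by score desc) as drk   -- 1, 2, 2, 3, 4, 5, 6
from scores;
```

Note that row_number() assigns 2 and 3 to the tied rows D and B in an arbitrary order; adding a tie-breaker such as "order by score desc, a" makes the numbering deterministic.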
 

1. Aggregate functions such as sum(), min(), max(), avg()
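
As an illustration of aggregates used as window functions, the sketch below computes a running total and per-partition totals without collapsing rows. The sales CTE and its columns are hypothetical sample data.

```sql
-- Hypothetical daily sales per region.
with sales as (
    select 'east' as region, '2019-06-01' as dt, 10 as amount union all
    select 'east', '2019-06-02', 20 union all
    select 'east', '2019-06-03', 30 union all
    select 'west', '2019-06-01', 5 union all
    select 'west', '2019-06-02', 15
)
select region,
       dt,
       amount,
       -- with order by: accumulates from the first row of the partition to the current row
       sum(amount) over (partition by region order by dt) as running_total,
       -- without order by: the same partition-wide value on every row
       sum(amount) over (partition by region) as region_total,
       max(amount) over (partition by region) as region_max
from sales;
-- east: running_total 10, 30, 60; region_total 60; region_max 30
-- west: running_total 5, 20;      region_total 20; region_max 15
```

Unlike group by, every input row is kept, so detail columns and aggregates can appear side by side.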

2. Functions that add a sequence-number column, such as row_number(), rank(), dense_rank(), ntile()
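
ntile() is the only function in this group not covered by the ranking table above. A minimal sketch, assuming a hypothetical scores CTE with six distinct scores, splits the ordered rows into three equal buckets:

```sql
-- Hypothetical scores with no ties, so the bucket assignment is deterministic.
with scores as (
    select 'A' as a, 100 as score union all
    select 'B', 90 union all
    select 'C', 80 union all
    select 'D', 70 union all
    select 'E', 60 union all
    select 'F', 50
)
select a,
       score,
       -- ntile(3) divides the rows, ordered by score, into 3 buckets numbered 1..3
       ntile(3) over (order by score desc) as bucket
from scores;
-- A, B -> bucket 1; C, D -> bucket 2; E, F -> bucket 3
```

A typical use is filtering on bucket = 1 to keep roughly the top third of the rows.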

3. Offset and value functions such as lag(), lead(), first_value(), last_value()
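
The sketch below shows these four functions on a hypothetical prices CTE. The explicit frame on last_value() is deliberate: with an order by and no frame clause, the default frame ends at the current row, so last_value() would simply return the current row's own value.

```sql
-- Hypothetical daily prices.
with prices as (
    select '2019-06-01' as dt, 100 as price union all
    select '2019-06-02', 110 union all
    select '2019-06-03', 105
)
select dt,
       price,
       lag(price, 1)  over (order by dt) as prev_price,   -- value from the previous row, NULL on the first row
       lead(price, 1) over (order by dt) as next_price,   -- value from the next row, NULL on the last row
       first_value(price) over (order by dt) as first_price,
       last_value(price)  over (order by dt
                                rows between unbounded preceding and unbounded following) as last_price
from prices;
-- 2019-06-01: prev NULL, next 110,  first 100, last 105
-- 2019-06-02: prev 100,  next 105,  first 100, last 105
-- 2019-06-03: prev 110,  next NULL, first 100, last 105
```

Subtracting prev_price from price gives the day-over-day change, a common use of lag().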

4. GROUP BY extensions such as grouping sets, cube, rollup
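
These are extensions of group by and not window functions, but they are often used alongside them for subtotals. The following sketch uses Hive's "with rollup" form on a hypothetical orders CTE; other engines such as Presto write this as "group by rollup(region, channel)".

```sql
-- Hypothetical orders by region and sales channel.
with orders as (
    select 'east' as region, 'online' as channel, 10 as amount union all
    select 'east', 'store', 20 union all
    select 'west', 'online', 5
)
select region,
       channel,
       sum(amount) as total
from orders
group by region, channel with rollup;
-- Detail rows:     (east, online, 10), (east, store, 20), (west, online, 5)
-- Region subtotal: (east, NULL, 30), (west, NULL, 5)
-- Grand total:     (NULL, NULL, 35)

-- The same result in Hive, spelled out as grouping sets:
--   group by region, channel grouping sets ((region, channel), (region), ())
-- "with cube" would additionally add the per-channel subtotals, i.e. the (channel) grouping set.
```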

Link: https://www.jianshu.com/p/9fda829b1ef1?from=timeline

 

Commonly used analytic functions are listed below:

row_number() over(partition by … order by …)
rank() over(partition by … order by …)
dense_rank() over(partition by … order by …)
count() over(partition by … order by …)
max() over(partition by … order by …)
min() over(partition by … order by …)
sum() over(partition by … order by …)
avg() over(partition by … order by …)
first_value() over(partition by … order by …)
last_value() over(partition by … order by …)
lag() over(partition by … order by …)
lead() over(partition by … order by …)
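
Each of the templates above can also take an explicit frame clause after order by, which limits the rows the function sees. As a hedged sketch on a hypothetical daily CTE, a three-row moving average and a cumulative count look like this:

```sql
-- Hypothetical daily amounts.
with daily as (
    select '2019-06-01' as dt, 10 as amount union all
    select '2019-06-02', 20 union all
    select '2019-06-03', 30 union all
    select '2019-06-04', 40
)
select dt,
       amount,
       -- frame: the current row and up to two rows before it
       avg(amount) over (order by dt rows between 2 preceding and current row) as moving_avg_3,
       -- frame: from the first row through the current row
       count(*) over (order by dt rows between unbounded preceding and current row) as rows_so_far
from daily;
-- moving_avg_3: 10, 15, 20, 30
-- rows_so_far:  1, 2, 3, 4
```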

 

posted @ 2019-08-27 15:56  数据分析笔记(自用)