Hive之累计报表生成

Hive之累计报表生成

1. 原始数据

u01 2019/1/21 5
u02 2019/1/23 6
u03 2019/1/22 8
u04 2019/1/20 3
u01 2019/1/23 6
u01 2019/2/21 8
u02 2019/1/23 6
u01 2019/2/22 4

2. 建表映射上述数据

create table action (userId string, visitDate string, visitCount int) row format delimited fields terminated by "\t";

 

 

3. 按照用户和月份分组生成某用户的当月总访问次数

create table action_amount
as
select tmp.userid,tmp.month,sum(tmp.visitcount) amount from (select userid,from_unixtime(unix_timestamp(visitdate,'yyyy/mm/dd'),'yyyy-mm') month,visitcount from action) tmp group by tmp.userid,tmp.month;

4. 通过两个表的自连接,建立临时表

create table action_tmp
as
select a.amount as a_amount,b.*
from action_amount a join action_amount b on a.userid=b.userid
where a.month <= b.month;

 

5. 将上述表按照userid和month分组

select userid,month,max(amount) as amount,sum(a_amount) as accumulate
from action_tmp
group by userid,month;

6. 使用加窗函数完成累计报表生成

select userid, month,amount,
sum(amount) over(partition by userid order by month rows between unbounded preceding and current row) as accumulate
from action_amount;

 

 

posted @ 2019-12-02 22:36  Striving_For_Dream  阅读(587)  评论(0编辑  收藏  举报
/*快速评论*/ #div_digg { position: fixed; bottom: 10px; right: 15px; border: 2px solid #ECD7B1; padding: 10px; width: 140px; background-color: #fff; border-radius: 5px 5px 5px 5px !important; box-shadow: 0 0 0 1px #5F5A4B, 1px 1px 6px 1px rgba(10, 10, 0, 0.5); }