Hive 组内排序,计算TopN

需求场景,组内排序,例如统计某个用户前10次消费金额,如求某个相同id或组内的top值

步骤一,创建测试表

create table tmp_partition_test 
(
name string,
subject string,
score int
)

步骤二,插入测试数据

insert into tmp_partition_test (name,subject,score)
values  ('张多','语文',87    );
insert into tmp_partition_test (name,subject,score)
values  ('张多','数学',95)    ;
insert into tmp_partition_test (name,subject,score)
values  ('张多','英语',68)    ;
insert into tmp_partition_test (name,subject,score)
values  ('王就','语文',94)    ;
insert into tmp_partition_test (name,subject,score)
values  ('王就','数学',56)    ;
insert into tmp_partition_test (name,subject,score)
values  ('王就','英语',84)    ;
insert into tmp_partition_test (name,subject,score)
values  ('蔡结','语文',64)    ;
insert into tmp_partition_test (name,subject,score)
values  ('蔡结','数学',86)    ;
insert into tmp_partition_test (name,subject,score)
values  ('蔡结','英语',84)    ;
insert into tmp_partition_test (name,subject,score)
values  ('金胡','语文',65)    ;
insert into tmp_partition_test (name,subject,score)
values  ('金胡','数学',85)    ;
insert into tmp_partition_test (name,subject,score)
values  ('金胡','英语',78)    ;

步骤二,进行数据查询,对比结果

 

select name,subject,score,
rank() over(partition by subject order by score desc) rp,    --RANK() 排序相同时会重复,总数不会变
dense_rank() over(partition by subject order by score desc) drp,  --DENSE_RANK() 排序相同时会重复,总数会减少
row_number() over(partition by subject order by score desc) rmp   --ROW_NUMBER() 会根据顺序计算
from tmp_partition_test;

 

 

 

 

 

 

参考连接:hive分组排序(rank函数+partiton实现) - 王怀宇 - 博客园 (cnblogs.com)

 

posted @ 2021-08-06 15:17  欣欣姐  Views(424)  Comments(0Edit  收藏  举报