Hive 组内排序,计算TopN
需求场景,组内排序,例如统计某个用户前10次消费金额,如求某个相同id或组内的top值
步骤一,创建测试表
create table tmp_partition_test ( name string, subject string, score int )
步骤二,插入测试数据
insert into tmp_partition_test (name,subject,score) values ('张多','语文',87 ); insert into tmp_partition_test (name,subject,score) values ('张多','数学',95) ; insert into tmp_partition_test (name,subject,score) values ('张多','英语',68) ; insert into tmp_partition_test (name,subject,score) values ('王就','语文',94) ; insert into tmp_partition_test (name,subject,score) values ('王就','数学',56) ; insert into tmp_partition_test (name,subject,score) values ('王就','英语',84) ; insert into tmp_partition_test (name,subject,score) values ('蔡结','语文',64) ; insert into tmp_partition_test (name,subject,score) values ('蔡结','数学',86) ; insert into tmp_partition_test (name,subject,score) values ('蔡结','英语',84) ; insert into tmp_partition_test (name,subject,score) values ('金胡','语文',65) ; insert into tmp_partition_test (name,subject,score) values ('金胡','数学',85) ; insert into tmp_partition_test (name,subject,score) values ('金胡','英语',78) ;
步骤二,进行数据查询,对比结果
select name,subject,score, rank() over(partition by subject order by score desc) rp, --RANK() 排序相同时会重复,总数不会变 dense_rank() over(partition by subject order by score desc) drp, --DENSE_RANK() 排序相同时会重复,总数会减少 row_number() over(partition by subject order by score desc) rmp --ROW_NUMBER() 会根据顺序计算 from tmp_partition_test;
参考连接:hive分组排序(rank函数+partiton实现) - 王怀宇 - 博客园 (cnblogs.com)