Hive函数:rank()、dense_rank()
数据准备:
G1,KING,5000 G1,BING,5000 G2,FING,5000 G1,FORD,3000 G2,SCOTT,3000 G1,JONES,2975 G2,BLAKE,2850 G1,CLARK,2450 G1,ALLEN,1600 G1,CELL1,NULL G2,CELL2,NULL CREATE EXTERNAL TABLE test_data ( gid string, ename STRING, sal INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile location '/user/jc_rc_ftp/test_data'; select * from test_data l; +--------+----------+--------+--+ | l.gid | l.ename | l.sal | +--------+----------+--------+--+ | G1 | CELL1 | NULL | | G2 | CELL2 | NULL | | G1 | KING | 5000 | | G1 | BING | 5000 | | G2 | FING | 5000 | | G1 | FORD | 3000 | | G2 | SCOTT | 3000 | | G1 | JONES | 2975 | | G2 | BLAKE | 2850 | | G1 | CLARK | 2450 | | G1 | ALLEN | 1600 | +--------+----------+--------+--+
rank
---rank()over(order by 列名排序)的结果是不连续的,如果有4个人,其中有3个是并列第1名,那么最后的排序结果结果如:1 1 1 4
SQL> select gid,ename,sal,rank() over(order by sal desc) from test_data;
---rank() over (partition by 分组字段 order by 排序字段 顺序)
注意:使用rank()over(order by 排序字段 顺序)排序的时候,空值是最大的
select gid,ename,sal,rank()over(partition by gid order by coalesce(sal,0) desc) from test_data; +------+--------+-------+----------------+--+ | gid | ename | sal | rank_window_0 | +------+--------+-------+----------------+--+ | G1 | BING | 5000 | 1 | | G1 | KING | 5000 | 1 | | G1 | FORD | 3000 | 3 | | G1 | JONES | 2975 | 4 | | G1 | CLARK | 2450 | 5 | | G1 | ALLEN | 1600 | 6 | | G1 | CELL1 | NULL | 7 | | G2 | FING | 5000 | 1 | | G2 | SCOTT | 3000 | 2 | | G2 | BLAKE | 2850 | 3 | | G2 | CELL2 | NULL | 4 | +------+--------+-------+----------------+--+
dense_rank
---dense_rank()over(order by 列名排序)的结果是连续的,如果有4个人,其中有3个是并列第1名,那么最后的排序结果如:1 1 1 2
如果排序字段为null,可能造成在排序时将null字段排在最前面,影响排序的正确性。
所以建议将 dense_rank()over(order by 列名 排序)改为dense_rank()over(order by 列名排序 nulls last)
---rank()over(order by 列名排序)的结果是不连续的,如果有4个人,其中有3个是并列第1名,那么最后的排序结果结果如:1 1 1 4 SQL> select gid,ename,sal,rank() over(order by sal desc) from test_data; +------+--------+-------+----------------+--+ | gid | ename | sal | rank_window_0 | +------+--------+-------+----------------+--+ | G2 | FING | 5000 | 1 | | G1 | BING | 5000 | 1 | | G1 | KING | 5000 | 1 | | G2 | SCOTT | 3000 | 4 | | G1 | FORD | 3000 | 4 | | G1 | JONES | 2975 | 6 | | G2 | BLAKE | 2850 | 7 | | G1 | CLARK | 2450 | 8 | | G1 | ALLEN | 1600 | 9 | | G1 | CELL1 | NULL | 10 | | G2 | CELL2 | NULL | 10 | +------+--------+-------+----------------+--+ ---rank() over (partition by 分组字段 order by 排序字段 顺序) 注意:使用rank()over(order by 排序字段 顺序)排序的时候,空值是最大的 select gid,ename,sal,rank()over(partition by gid order by coalesce(sal,0) desc) from test_data; +------+--------+-------+----------------+--+ | gid | ename | sal | rank_window_0 | +------+--------+-------+----------------+--+ | G1 | BING | 5000 | 1 | | G1 | KING | 5000 | 1 | | G1 | FORD | 3000 | 3 | | G1 | JONES | 2975 | 4 | | G1 | CLARK | 2450 | 5 | | G1 | ALLEN | 1600 | 6 | | G1 | CELL1 | NULL | 7 | | G2 | FING | 5000 | 1 | | G2 | SCOTT | 3000 | 2 | | G2 | BLAKE | 2850 | 3 | | G2 | CELL2 | NULL | 4 | +------+--------+-------+----------------+--+ ---dense_rank()over(order by 列名排序)的结果是连续的,如果有4个人,其中有3个是并列第1名,那么最后的排序结果如:1 1 1 2 如果排序字段为null,可能造成在排序时将null字段排在最前面,影响排序的正确性。 所以建议将 dense_rank()over(order by 列名 排序)改为dense_rank()over(order by 列名排序 nulls last) select gid,ename,sal,dense_rank()over(order by sal desc) from test_data; +------+--------+-------+----------------------+--+ | gid | ename | sal | dense_rank_window_0 | +------+--------+-------+----------------------+--+ | G2 | FING | 5000 | 1 | | G1 | BING | 5000 | 1 | | G1 | KING | 5000 | 1 | | G2 | SCOTT | 3000 | 2 | | G1 | FORD | 3000 | 2 | | G1 | JONES | 2975 | 3 | | G2 | BLAKE | 2850 | 4 | | G1 | CLARK | 2450 | 5 | | G1 | ALLEN | 1600 | 6 | | G2 | CELL2 | NULL | 7 | | G1 | CELL1 | NULL | 7 | +------+--------+-------+----------------------+--+ select gid,ename,sal,dense_rank()over(partition by gid order by sal desc) from test_data; +------+--------+-------+----------------------+--+ | gid | ename | sal | dense_rank_window_0 | +------+--------+-------+----------------------+--+ | G1 | BING | 5000 | 1 | | G1 | KING | 5000 | 1 | | G1 | FORD | 3000 | 2 | | G1 | JONES | 2975 | 3 | | G1 | CLARK | 2450 | 4 | | G1 | ALLEN | 1600 | 5 | | G1 | CELL1 | NULL | 6 | | G2 | FING | 5000 | 1 | | G2 | SCOTT | 3000 | 2 | | G2 | BLAKE | 2850 | 3 | | G2 | CELL2 | NULL | 4 | +------+--------+-------+----------------------+--+
基础才是编程人员应该深入研究的问题,比如:
1)List/Set/Map内部组成原理|区别
2)mysql索引存储结构&如何调优/b-tree特点、计算复杂度及影响复杂度的因素。。。
3)JVM运行组成与原理及调优
4)Java类加载器运行原理
5)Java中GC过程原理|使用的回收算法原理
6)Redis中hash一致性实现及与hash其他区别
7)Java多线程、线程池开发、管理Lock与Synchroined区别
8)Spring IOC/AOP 原理;加载过程的。。。
【+加关注】。