Hive SQL query: top-N per group
Database query: group, sort, and take the top N
Requirement: group the rows by course and find the two highest scores in each course.
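The data is as follows.
The first column, no, is the student ID; the second column, course, is the course name; the third column, score, is the score.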
mysql> select * from lesson;
+-------+---------+-------+
| no | course | score |
+-------+---------+-------+
| N0101 | Marth | 100 |
| N0102 | English | 12 |
| N0102 | Chinese | 55 |
| N0102 | History | 58 |
| N0102 | Marth | 25 |
| N0103 | English | 100 |
| N0103 | Chinese | 87 |
| N0103 | History | 88 |
| N0103 | Marth | 72 |
| N0104 | English | 20 |
| N0104 | Chinese | 60 |
| N0104 | History | 88 |
| N0104 | Marth | 56 |
| N0105 | English | 56 |
| N0105 | Chinese | 88 |
| N0105 | History | 88 |
| N0201 | English | 66 |
| N0201 | Chinese | 77 |
| N0201 | History | 80 |
| N0201 | Marth | 100 |
| N0202 | English | 35 |
| N0202 | Chinese | 56 |
| N0202 | History | 86 |
| N0202 | Marth | 99 |
| N0203 | English | 100 |
| N0203 | Chinese | 87 |
| N0203 | History | 88 |
| N0203 | Marth | 57 |
| N0204 | English | 98 |
| N0204 | Chinese | 100 |
| N0204 | History | 66 |
| N0204 | Marth | 71 |
| N0205 | English | 98 |
| N0205 | Chinese | 100 |
| N0205 | History | 66 |
| N0205 | Marth | 71 |
| N0301 | English | 66 |
| N0301 | Chinese | 89 |
| N0301 | History | 68 |
| N0301 | Marth | 83 |
| N0302 | English | 76 |
| N0302 | Chinese | 99 |
| N0302 | History | 80 |
| N0302 | Marth | 74 |
| N0303 | English | 100 |
| N0303 | Chinese | 100 |
| N0303 | History | 88 |
| N0303 | Marth | 57 |
| N0304 | English | 76 |
| N0304 | Chinese | 100 |
| N0304 | History | 66 |
| N0304 | Marth | 86 |
| N0305 | English | 98 |
| N0305 | Chinese | 100 |
| N0305 | History | 40 |
| N0305 | Marth | 59 |
| N0306 | English | 52 |
| N0306 | Chinese | 87 |
| N0306 | History | 72 |
| N0306 | Marth | 71 |
| N0101 | Chinese | 55 |
| N0101 | History | 84 |
| N0101 | English | 82 |
| N0101 | English | 82 |
+-------+---------+-------+
64 rows in set
Query in Hive:
select a.course, a.score
from (
  select course, score,
         row_number() over (partition by course order by score desc) as n
  from lesson
) a
where a.n <= 2;
In this query, the expression
row_number() over(partition by course order by score desc)
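partitions the rows by course, sorts each partition by score in descending order, and numbers the rows in each partition consecutively, starting from 1.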
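Because row_number() numbers every row in a partition consecutively, tied scores still get distinct numbers, and the order among tied rows is not guaranteed. This matters for this data: five students scored 100 in Chinese. Hive also provides the rank() and dense_rank() window functions, which can be used in place of row_number() in the same query and treat ties differently. The following is a minimal Python sketch, not Hive code, that mimics the three numbering schemes on the Chinese scores from the table above; the names CHINESE and number_partition are hypothetical.

```python
# Chinese scores from the lesson table, in table order
# (one partition of "partition by course").
CHINESE = [55, 87, 60, 88, 77, 56, 87, 100, 100, 89, 99, 100, 100, 100, 87, 55]


def number_partition(scores):
    """Return (score, row_number, rank, dense_rank) tuples for one partition,
    ordered by score descending, mimicking the three window functions."""
    numbered = []
    rank = dense_rank = 0
    prev = None
    for row_number, score in enumerate(sorted(scores, reverse=True), start=1):
        if score != prev:
            rank = row_number  # rank(): ties share a number, then it skips ahead (1, 1, 3, ...)
            dense_rank += 1    # dense_rank(): ties share a number, no gaps (1, 1, 2, ...)
            prev = score
        numbered.append((score, row_number, rank, dense_rank))
    return numbered


table = number_partition(CHINESE)
print([s for s, rn, _, _ in table if rn <= 2])  # row_number() <= 2 -> [100, 100]
print([s for s, _, rk, _ in table if rk <= 2])  # rank() <= 2 -> [100, 100, 100, 100, 100]
print([s for s, _, _, dr in table if dr <= 2])  # dense_rank() <= 2 -> five 100s plus 99
```

So for Chinese, filtering row_number() <= 2 keeps only two of the five 100s, rank() <= 2 keeps all five, and dense_rank() <= 2 also keeps the 99. Which one fits depends on whether "the two highest scores" means two rows or two distinct score values.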
The outer query then only needs to keep the rows whose row number is less than or equal to 2. :-D
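As a rough cross-check of the whole query, here is a minimal Python sketch that emulates it: sort by course and by score descending, number the rows within each course, and keep the rows numbered up to n. To keep it short it uses only the Marth and History rows from the table; the names LESSON and top_n_per_group are hypothetical. Python's sort is stable, so tied rows keep their input order here, whereas Hive makes no such guarantee.

```python
from itertools import groupby
from operator import itemgetter

# Subset of the lesson table: (no, course, score) for the Marth and History rows only.
LESSON = [
    ("N0101", "Marth", 100), ("N0102", "Marth", 25), ("N0103", "Marth", 72),
    ("N0104", "Marth", 56), ("N0201", "Marth", 100), ("N0202", "Marth", 99),
    ("N0203", "Marth", 57), ("N0204", "Marth", 71), ("N0205", "Marth", 71),
    ("N0301", "Marth", 83), ("N0302", "Marth", 74), ("N0303", "Marth", 57),
    ("N0304", "Marth", 86), ("N0305", "Marth", 59), ("N0306", "Marth", 71),
    ("N0102", "History", 58), ("N0103", "History", 88), ("N0104", "History", 88),
    ("N0105", "History", 88), ("N0201", "History", 80), ("N0202", "History", 86),
    ("N0203", "History", 88), ("N0204", "History", 66), ("N0205", "History", 66),
    ("N0301", "History", 68), ("N0302", "History", 80), ("N0303", "History", 88),
    ("N0304", "History", 66), ("N0305", "History", 40), ("N0306", "History", 72),
    ("N0101", "History", 84),
]


def top_n_per_group(rows, n):
    """Emulate: select course, score from (select course, score,
    row_number() over (partition by course order by score desc) as n
    from lesson) a where a.n <= n."""
    # "partition by course order by score desc": sort by course, then score descending.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    result = []
    for course, group in groupby(ordered, key=itemgetter(1)):
        # enumerate(start=1) plays the role of row_number(): 1, 2, 3, ... per course.
        for row_num, (_no, _course, score) in enumerate(group, start=1):
            if row_num > n:  # outer filter: a.n <= n
                break
            result.append((course, score))
    return result


print(top_n_per_group(LESSON, 2))
# [('History', 88), ('History', 88), ('Marth', 100), ('Marth', 100)]
```

Note that History has five rows scoring 88, so, as in Hive, the two returned History rows are just two of those tied rows.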
Original article: https://blog.csdn.net/wguangliang/article/details/50167283