hive求不及格课程数大于等于2的学生的平均成绩及其排名
需求
给定一个表,包含三个字段;sid 学生ID,score 课程分数,cid 课程ID ,求不及格课程数大于2的学生的平均成绩及其成绩平均值后所在的排名
数据准备
create table scores_info_0208 ( sid int, score int, cid int ) row format delimited fields terminated by '\t' stored as orc tblproperties ("orc.compress" = "snappy"); insert into scores_info_0208 values (1, 90, 1), (1, 59, 2), (1, 48, 3), (2, 61, 1), (2, 39, 2), (2, 70, 3), (3, 87, 1), (3, 13, 2), (3, 63, 3), (4, 36, 1), (4, 9, 2), (4, 59, 3); select * from scores_info_0208;
解决思路
第一步:首先找出不及格成绩课程数大于等于2的学生,确定学生范围
select sid, sum(1) cnt from scores_info_0208 where score < 60 group by sid having cnt >= 2;
sid
|
cnt
|
4
|
3
|
1
|
2
|
第二步:计算所有学生课程的平均成绩
select *, cast(avg(score) over (partition by sid) as decimal(10, 2)) avg_score from scores_info_0208;
sid
|
score
|
cid
|
avg_score
|
1
|
90
|
1
|
65.67
|
1
|
59
|
2
|
65.67
|
1
|
48
|
3
|
65.67
|
2
|
61
|
1
|
56.67
|
2
|
39
|
2
|
56.67
|
2
|
70
|
3
|
56.67
|
3
|
87
|
1
|
54.33
|
3
|
13
|
2
|
54.33
|
3
|
63
|
3
|
54.33
|
4
|
36
|
1
|
34.67
|
4
|
9
|
2
|
34.67
|
4
|
59
|
3
|
34.67
|
第三步:在第二步的基础上按 avg_score 排名
select sid, avg_score, pm from (select sid, avg_score, dense_rank() over (order by avg_score) pm from (select *, cast(avg(score) over (partition by sid) as decimal(10, 2)) avg_score from scores_info_0208) t) y group by sid, avg_score, pm;
sid
|
avg_score
|
pm
|
1
|
65.67
|
4
|
3
|
54.33
|
2
|
2
|
56.67
|
3
|
4
|
34.67
|
1
|
第四步:将第三步结果和第一步结果进行关联即可
两种关联方式
1、第一步结果集作为主表
select t.sid, t1.avg_score, pm from (select sid, sum(1) cnt from scores_info_0208 where score < 60 group by sid having cnt >= 2) t left join (select sid, avg_score, pm from (select sid, avg_score, dense_rank() over (order by avg_score) pm from (select *, cast(avg(score) over (partition by sid) as decimal(10, 2)) avg_score from scores_info_0208) t) y group by sid, avg_score, pm) t1 on t.sid = t1.sid
sid
|
avg_score
|
pm
|
1
|
65.67
|
4
|
4
|
34.67
|
1
|
2、第三步结果集作主表 使用 left semi join
select t.sid, t.avg_score, pm from (select sid, avg_score, pm from (select sid, avg_score, dense_rank() over (order by avg_score) pm from (select *, cast(avg(score) over (partition by sid) as decimal(10, 2)) avg_score from scores_info_0208) t) y group by sid, avg_score, pm) t left semi join (select sid, sum(1) cnt from scores_info_0208 where score < 60 group by sid having cnt >= 2) t1 on t.sid = t1.sid
推荐使用 left semi join 进行计算,left semi join 比 in 效率更高,推荐使用
分类:
HIve
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 无需6万激活码!GitHub神秘组织3小时极速复刻Manus,手把手教你使用OpenManus搭建本
· C#/.NET/.NET Core优秀项目和框架2025年2月简报
· 一文读懂知识蒸馏
· Manus爆火,是硬核还是营销?
· 终于写完轮子一部分:tcp代理 了,记录一下