hive求不及格课程数大于等于2的学生的平均成绩及其排名

需求

给定一个表,包含三个字段;sid 学生ID,score 课程分数,cid 课程ID ,求不及格课程数大于2的学生的平均成绩及其成绩平均值后所在的排名

数据准备

复制代码
create table scores_info_0208
(
    sid   int,
    score int,
    cid   int
) row format delimited
    fields terminated by '\t'
    stored as orc
    tblproperties ("orc.compress" = "snappy");

insert into scores_info_0208
values (1, 90, 1),
       (1, 59, 2),
       (1, 48, 3),
       (2, 61, 1),
       (2, 39, 2),
       (2, 70, 3),
       (3, 87, 1),
       (3, 13, 2),
       (3, 63, 3),
       (4, 36, 1),
       (4, 9, 2),
       (4, 59, 3);
select * from scores_info_0208;
复制代码

 

 

 解决思路

第一步:首先找出不及格成绩课程数大于等于2的学生,确定学生范围

select sid, sum(1) cnt
from scores_info_0208
where score < 60
group by sid
having cnt >= 2;
sid
cnt
4
3
1
2

 

第二步:计算所有学生课程的平均成绩

select *, cast(avg(score) over (partition by sid) as decimal(10, 2)) avg_score
from scores_info_0208;
sid
score
cid
avg_score
1
90
1
65.67
1
59
2
65.67
1
48
3
65.67
2
61
1
56.67
2
39
2
56.67
2
70
3
56.67
3
87
1
54.33
3
13
2
54.33
3
63
3
54.33
4
36
1
34.67
4
9
2
34.67
4
59
3
34.67

 

第三步:在第二步的基础上按 avg_score 排名

select sid, avg_score, pm
from (select sid,
             avg_score,
             dense_rank() over (order by avg_score) pm
      from (select *, cast(avg(score) over (partition by sid) as decimal(10, 2)) avg_score
            from scores_info_0208) t) y
group by sid, avg_score, pm;
sid
avg_score
pm
1
65.67
4
3
54.33
2
2
56.67
3
4
34.67
1

 

 

第四步:将第三步结果和第一步结果进行关联即可
两种关联方式
1、第一步结果集作为主表

复制代码
select t.sid, t1.avg_score, pm
from (select sid, sum(1) cnt
      from scores_info_0208
      where score < 60
      group by sid
      having cnt >= 2) t
         left join
     (select sid, avg_score, pm
      from (select sid,
                   avg_score,
                   dense_rank() over (order by avg_score) pm
            from (select *, cast(avg(score) over (partition by sid) as decimal(10, 2)) avg_score
                  from scores_info_0208) t) y
      group by sid, avg_score, pm) t1
on t.sid = t1.sid
复制代码
sid
avg_score
pm
1
65.67
4
4
34.67
1

 

2、第三步结果集作主表 使用 left semi join

复制代码
select t.sid, t.avg_score, pm
from (select sid, avg_score, pm
      from (select sid,
                   avg_score,
                   dense_rank() over (order by avg_score) pm
            from (select *, cast(avg(score) over (partition by sid) as decimal(10, 2)) avg_score
                  from scores_info_0208) t) y
      group by sid, avg_score, pm) t
         left semi join
     (select sid, sum(1) cnt
      from scores_info_0208
      where score < 60
      group by sid
      having cnt >= 2) t1
     on t.sid = t1.sid
复制代码
推荐使用 left semi join 进行计算,left semi join 比 in 效率更高,推荐使用

posted @   晓枫的春天  阅读(245)  评论(0编辑  收藏  举报
相关博文:
阅读排行:
· 无需6万激活码!GitHub神秘组织3小时极速复刻Manus,手把手教你使用OpenManus搭建本
· C#/.NET/.NET Core优秀项目和框架2025年2月简报
· 一文读懂知识蒸馏
· Manus爆火,是硬核还是营销?
· 终于写完轮子一部分:tcp代理 了,记录一下
点击右上角即可分享
微信分享提示