hive求不及格课程数大于等于2的学生的平均成绩及其排名

需求

给定一个表,包含三个字段;sid 学生ID,score 课程分数,cid 课程ID ,求不及格课程数大于2的学生的平均成绩及其成绩平均值后所在的排名

数据准备

create table scores_info_0208
(
    sid   int,
    score int,
    cid   int
) row format delimited
    fields terminated by '\t'
    stored as orc
    tblproperties ("orc.compress" = "snappy");

insert into scores_info_0208
values (1, 90, 1),
       (1, 59, 2),
       (1, 48, 3),
       (2, 61, 1),
       (2, 39, 2),
       (2, 70, 3),
       (3, 87, 1),
       (3, 13, 2),
       (3, 63, 3),
       (4, 36, 1),
       (4, 9, 2),
       (4, 59, 3);
select * from scores_info_0208;

 

 

 解决思路

第一步:首先找出不及格成绩课程数大于等于2的学生,确定学生范围

select sid, sum(1) cnt
from scores_info_0208
where score < 60
group by sid
having cnt >= 2;
sid
cnt
4
3
1
2

 

第二步:计算所有学生课程的平均成绩

select *, cast(avg(score) over (partition by sid) as decimal(10, 2)) avg_score
from scores_info_0208;
sid
score
cid
avg_score
1
90
1
65.67
1
59
2
65.67
1
48
3
65.67
2
61
1
56.67
2
39
2
56.67
2
70
3
56.67
3
87
1
54.33
3
13
2
54.33
3
63
3
54.33
4
36
1
34.67
4
9
2
34.67
4
59
3
34.67

 

第三步:在第二步的基础上按 avg_score 排名

select sid, avg_score, pm
from (select sid,
             avg_score,
             dense_rank() over (order by avg_score) pm
      from (select *, cast(avg(score) over (partition by sid) as decimal(10, 2)) avg_score
            from scores_info_0208) t) y
group by sid, avg_score, pm;
sid
avg_score
pm
1
65.67
4
3
54.33
2
2
56.67
3
4
34.67
1

 

 

第四步:将第三步结果和第一步结果进行关联即可
两种关联方式
1、第一步结果集作为主表

select t.sid, t1.avg_score, pm
from (select sid, sum(1) cnt
      from scores_info_0208
      where score < 60
      group by sid
      having cnt >= 2) t
         left join
     (select sid, avg_score, pm
      from (select sid,
                   avg_score,
                   dense_rank() over (order by avg_score) pm
            from (select *, cast(avg(score) over (partition by sid) as decimal(10, 2)) avg_score
                  from scores_info_0208) t) y
      group by sid, avg_score, pm) t1
on t.sid = t1.sid
sid
avg_score
pm
1
65.67
4
4
34.67
1

 

2、第三步结果集作主表 使用 left semi join

select t.sid, t.avg_score, pm
from (select sid, avg_score, pm
      from (select sid,
                   avg_score,
                   dense_rank() over (order by avg_score) pm
            from (select *, cast(avg(score) over (partition by sid) as decimal(10, 2)) avg_score
                  from scores_info_0208) t) y
      group by sid, avg_score, pm) t
         left semi join
     (select sid, sum(1) cnt
      from scores_info_0208
      where score < 60
      group by sid
      having cnt >= 2) t1
     on t.sid = t1.sid
推荐使用 left semi join 进行计算,left semi join 比 in 效率更高,推荐使用

posted @ 2022-02-09 15:18  晓枫的春天  阅读(235)  评论(0编辑  收藏  举报