Hive 刷题——出勤率问题
需求描述
现有用户出勤表(user_login)如下。
user_id (用户id) | course_id (课程id) | login_in (登录时间) | login_out (登出时间) |
---|---|---|---|
1 | 1 | 2022-06-02 09:08:24 | 2022-06-02 10:09:36 |
1 | 1 | 2022-06-02 11:07:24 | 2022-06-02 11:44:21 |
1 | 2 | 2022-06-02 13:50:24 | 2022-06-02 14:21:50 |
2 | 2 | 2022-06-02 13:50:10 | 2022-06-02 15:30:20 |
课程报名表(course_apply)如下。
course_id (课程id) | course_name (课程名称) | user_id (用户id) |
---|---|---|
1 | java | [1,2,3,4,5,6] |
2 | 大数据 | [1,2,3,6] |
3 | 前端 | [2,3,4,5] |
注:出勤率指用户看直播时间超过40分钟,求出每个课程的出勤率(结果保留两位小数)。
期望结果如下:
course_id <int> (课程id) | adr <decimal(16,2)> (出勤率) |
---|---|
1 | 0.33 |
2 | 0.50 |
3 | 0.25 |
参考实现
select coalesce(t.course_id,t1.course_id) course_id, cast(nvl(t.cnt,0) / t1.user_cnt as decimal(16, 2)) as adr from (select course_id, count(*) cnt from (select user_id, course_id, SUM(mins) mins_sum from (select user_id, course_id, (unix_timestamp(login_out) - unix_timestamp(login_in)) / 60 mins from user_login) t group by user_id, course_id) i where mins_sum > 40 group by course_id) t full join (select course_id, count(uid) user_cnt from (select course_id, uid from course_apply t lateral view explode(user_id) t as uid group by uid, course_id) t1 group by course_id) t1 on t.course_id = t1.course_id ;、