Hive 刷题——视频热度问题
需求描述
现有用户视频播放记录表(user_video_log)如下。
uid (球队名称) | video_id (视频id) | start_time (开始时间) | end_time (结束时间) | if_like (是否点赞) | if_retweet (是否喜欢) | comment_id (评论id) |
---|---|---|---|---|---|---|
101 | 2001 | 2021-09-24 10:00:00 | 2021-09-24 10:00:20 | 1 | 0 | null |
105 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 0 | 1 | null |
102 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 1 | 1 | null |
101 | 2002 | 2021-09-26 11:00:00 | 2021-09-26 11:00:30 | 0 | 1 | null |
视频信息表(video_info) 如下:
video_id (视频id) | author (作者id) | tag (标签) | duration (视频时长) |
---|---|---|---|
2001 | 901 | 旅游 | 30 |
2002 | 901 | 旅游 | 60 |
2003 | 902 | 影视 | 90 |
2004 | 902 | 美女 | 90 |
找出近一个月发布的视频中热度最高的top3视频。
注:热度=(a*视频完播率+b*点赞数+c*评论数+d*转发数)*新鲜度;
新鲜度=1/(最近无播放天数+1);
当前配置的参数a,b,c,d分别为100、5、3、2。
最近播放日期以 end_time-结束观看时间 为准,假设为T,则最近一个月按 [T-29, T] 闭区间统计。
当天日期使用视频中最大的end_time
结果中热度保留为整数,并按热度降序排序。
期望结果如下:
video_id <int> (视频id) | heat <decimal(16,2)> (热度) |
---|---|
2002 | 80.36 |
2001 | 20.33 |
解题思路
1、先计算热度因子
select video_id, sum((cast(if_like as int))) if_like, sum((cast(if_retweet as int))) if_retweet, sum(if(comment_id is null, 0, 1)) if_comment from user_video_log group by video_id
2、每个视频看完用户
select vi.video_id, sum(if(unix_timestamp(end_time) - unix_timestamp(start_time) = duration, 1, 0)) / sum(1) uv_rate from video_info vi join user_video_log uv on vi.video_id = uv.video_id group by vi.video_id
参考SQL
select video_id,heat from (select video_id, hot heat, rank() over (partition by video_id order by hot desc ) rn from (select t.video_id, (ceil(100 * uv_rate) + 5 * if_like + 3 * if_comment + 2 * if_retweet) / (new_date + 1) hot from (select u1.video_id, datediff(this_date, date_format(max(end_time), 'yyyy-MM-dd')) new_date from user_video_log uv join (select video_id, date_format(max(end_time), 'yyyy-MM-dd') this_date from user_video_log group by video_id) u1 on uv.video_id = u1.video_id group by u1.video_id, this_date) t join (select video_id, sum((cast(if_like as int))) if_like, sum((cast(if_retweet as int))) if_retweet, sum(if(comment_id is null, 0, 1)) if_comment from user_video_log group by video_id) t1 on t.video_id = t1.video_id join (select vi.video_id, sum(if(unix_timestamp(end_time) - unix_timestamp(start_time) = duration, 1, 0)) / sum(1) uv_rate from video_info vi join user_video_log uv on vi.video_id = uv.video_id group by vi.video_id) t2 on t.video_id = t2.video_id) t) i where rn <= 3;