Hive 刷题 查询首次登录系统后第二天登录系统的用户数占总用户数的百分比
需求描述
从用户登录信息表(temp_user_login)中查询首次登录后第二天仍然登录的用户占所有用户的比例,结果保留2位小数,使用百分数显示,
原始数据
desc temp_user_login; day_id string 登录日期 user_id int 用户ID if_login int 是否登录 1 是 0 否 select * from temp_user_login; 2022-03-01 1 1 2022-03-01 2 0 2022-03-01 3 0 2022-03-02 1 1 2022-03-02 2 1 2022-03-02 3 0 2022-03-03 1 1 2022-03-03 2 0 2022-03-03 3 1 2022-03-04 1 1 2022-03-04 2 0 2022-03-04 3 0 2022-03-05 1 1 2022-03-05 2 1 2022-03-05 3 1
解题思路
1、先计算总用户数,这个很 esay;
2、取出连续两天登录的用户,进行去重,进行计算即可
参考sql
select concat(cast(round(login_again / user_cnt * 100, 2) as string), '%') login_again_percent from (select sum(1) user_cnt from (select user_id from temp_user_login group by user_id) t) total_user join (select sum(1) login_again from (select t.user_id from (select user_id, day_id from (select user_id, day_id, rank() over (partition by user_id order by day_id) rk from (select user_id, day_id from temp_user_login where if_login = '1' group by user_id, day_id) t) t1 where rk = 1) t join (select user_id, day_id from (select user_id, day_id, rank() over (partition by user_id order by day_id) rk from (select user_id, day_id from temp_user_login where if_login = '1' group by user_id, day_id) t) t1 where rk = 2) t1 on t.user_id = t1.user_id and datediff(t1.day_id, t.day_id) = 1 group by t.user_id) again) again;