Hive 连续登录问题
需求说明
有以下一个表,记录了用户登录数据
desc temp_user_login; day_id string user_id int if_login int select* from temp_user_login order by user_id,day_id; 2022-03-01 1 1 2022-03-01 2 0 2022-03-01 3 0 2022-03-02 1 1 2022-03-02 2 1 2022-03-02 3 0 2022-03-03 1 1 2022-03-03 2 0 2022-03-03 3 1 2022-03-04 1 1 2022-03-04 2 0 2022-03-04 3 0 2022-03-05 1 1 2022-03-05 2 1 2022-03-05 3 1
现在要求取出 连续登录3天以上的用户明细
实现逻辑
这个问题可以使用等差数列思想来解决
1、先取出登录系统的用户信息:登录用户,登录日期,按照用户分组,排序
select user_id, day_id, row_number() over (partition by user_id order by day_id desc ) rn from temp_user_login where if_login = 1 order by 1, 2; 1 2022-03-01 5 1 2022-03-02 4 1 2022-03-03 3 1 2022-03-04 2 1 2022-03-05 1 2 2022-03-02 2 2 2022-03-05 1 3 2022-03-03 2 3 2022-03-05 1
2、使用rn构建等差数列
select user_id, day_id, rn, date_sub(day_id, rn) interval_day from (select user_id, day_id, row_number() over (partition by user_id order by day_id ) rn from temp_user_login where if_login = 1) t order by 1, 2; 1 2022-03-01 1 2022-02-28 1 2022-03-02 2 2022-02-28 1 2022-03-03 3 2022-02-28 1 2022-03-04 4 2022-02-28 1 2022-03-05 5 2022-02-28 2 2022-03-02 1 2022-03-01 2 2022-03-05 2 2022-03-03 3 2022-03-03 1 2022-03-02 3 2022-03-05 2 2022-03-03
3、聚合分组
select user_id from (select user_id, date_sub(day_id, rn) interval_day from (select user_id, day_id, row_number() over (partition by user_id order by day_id ) rn from temp_user_login where if_login = 1) t) t group by user_id, interval_day having count(1) >= 3 order by 1 1
这样既可以实现连续登录问题了