Hive 连续登录问题

需求说明

有以下一个表,记录了用户登录数据

desc temp_user_login;
day_id    string
user_id    int
if_login    int
select* from temp_user_login order by user_id,day_id;
2022-03-01    1    1
2022-03-01    2    0
2022-03-01    3    0
2022-03-02    1    1
2022-03-02    2    1
2022-03-02    3    0
2022-03-03    1    1
2022-03-03    2    0
2022-03-03    3    1
2022-03-04    1    1
2022-03-04    2    0
2022-03-04    3    0
2022-03-05    1    1
2022-03-05    2    1
2022-03-05    3    1

现在要求取出 连续登录3天以上的用户明细

实现逻辑

这个问题可以使用等差数列思想来解决

1、先取出登录系统的用户信息:登录用户,登录日期,按照用户分组,排序

select user_id, day_id, row_number() over (partition by user_id order by day_id desc ) rn
from temp_user_login
where if_login = 1
order by 1, 2;
1    2022-03-01    5
1    2022-03-02    4
1    2022-03-03    3
1    2022-03-04    2
1    2022-03-05    1
2    2022-03-02    2
2    2022-03-05    1
3    2022-03-03    2
3    2022-03-05    1

2、使用rn构建等差数列

select user_id, day_id, rn, date_sub(day_id,  rn) interval_day
from (select user_id, day_id, row_number() over (partition by user_id order by day_id   ) rn
      from temp_user_login
      where if_login = 1) t
order by 1, 2;
1    2022-03-01    1    2022-02-28
1    2022-03-02    2    2022-02-28
1    2022-03-03    3    2022-02-28
1    2022-03-04    4    2022-02-28
1    2022-03-05    5    2022-02-28
2    2022-03-02    1    2022-03-01
2    2022-03-05    2    2022-03-03
3    2022-03-03    1    2022-03-02
3    2022-03-05    2    2022-03-03

3、聚合分组

select user_id
from (select user_id, date_sub(day_id,  rn) interval_day
      from (select user_id, day_id, row_number() over (partition by user_id order by day_id   ) rn
            from temp_user_login
            where if_login = 1) t) t
group by user_id, interval_day
having count(1) >= 3
order by 1
1

这样既可以实现连续登录问题了

posted @ 2022-12-27 12:33  晓枫的春天  阅读(60)  评论(0编辑  收藏  举报