Hive 刷题——查询有新注册用户的当天的新用户数量、新用户的第一天留存率
需求描述
从用户登录明细表(user_login_detail)中首次登录算作当天新增,第二天也登录了算作一日留存
结果如下:
first_login
(注册时间)
|
register
(新增用户数)
|
retention
<decimal(16,2)>
(留存率)
|
2021-09-21
|
1
|
0.00
|
2021-09-22
|
1
|
0.00
|
2021-09-23
|
1
|
0.00
|
2021-09-24
|
1
|
0.00
|
2021-09-25
|
1
|
0.00
|
2021-09-26
|
1
|
0.00
|
2021-09-27
|
1
|
0.00
|
2021-10-04
|
2
|
0.50
|
2021-10-06
|
1
|
0.00
|
需要用到的表:
用户登录明细表:user_login_detail
user_id(用户id)
|
ip_address(ip地址)
|
login_ts(登录时间)
|
logout_ts(登出时间)
|
101
|
180.149.130.161
|
2021-09-21 08:00:00
|
2021-09-27 08:30:00
|
102
|
120.245.11.2
|
2021-09-22 09:00:00
|
2021-09-27 09:30:00
|
103
|
27.184.97.3
|
2021-09-23 10:00:00
|
2021-09-27 10:30:00
|
参考SQL
select login_date first_login, cast(register as string) register, cast(relonin / register as decimal(16, 2)) retention from (select t.login_date, count(t.user_id) register, sum(if(datediff(t1.login_date, t.login_date) = 1, 1, 0)) relonin from (select user_id, login_date from (select user_id, login_date, rank() over (partition by user_id order by login_date) rk from (select user_id, date(login_ts) login_date from user_login_detail group by user_id, date(login_ts)) i) i where rk = 1) t left join (select user_id, login_date from (select user_id, login_date, rank() over (partition by user_id order by login_date) rk from (select user_id, date(login_ts) login_date from user_login_detail group by user_id, date(login_ts)) i) i where rk = 2) t1 on t.user_id = t1.user_id group by t.login_date) i;