数仓sql场景:连续跟踪问题

1.需求

查询用户连续追踪客户的开始时间和结束时间,以下两种场景都需支持
第一种场景:
连续:中间没有断,如a追踪客户小明,第1次开始为2023-01-01,结束时间为:2023-01-03;第2次开始为2023-01-04,结束时间为:2023-01-07,则追踪时间为开始时间为2023-01-01,结束时间为:2023-01-07
不连续:中间有段,如a追踪客户小明,第1次开始为2023-01-01,结束时间为:2023-01-03;第2次开始为2023-01-06,结束时间为:2023-01-07,则追踪时间也为开始时间为2023-01-01,结束时间为:2023-01-07

2.应用场景

销售跟踪场景,电商关注收藏场景

3.示例数据

with a as
(select 'a' as id,'小明' as name,'2023-01-01' as start_date,'2023-01-03' as end_date 
union all
select 'a' as id,'小明' as name,'2023-01-04' as start_date,'2023-01-07' as end_date 
union all
select 'a' as id,'小明' as name,'2023-01-07' as start_date,'2023-01-11' as end_date 
union all
select 'b' as id,'小明' as name,'2023-01-12' as start_date,'2023-01-17' as end_date 
union all
select 'b' as id,'小明' as name,'2023-01-18' as start_date,'2023-01-21' as end_date 
union all
select 'b' as id,'小明' as name,'2023-01-22' as start_date,'2023-01-25' as end_date 
union all
select 'a' as id,'小明' as name,'2023-01-26' as start_date,'2023-01-30' as end_date
union all
select 'a' as id,'小明' as name,'2023-01-31' as start_date,'2023-02-03' as end_date)

4.sql实现

hivesql实现,在仓内实现结果直接应用

with a as
(select 'a' as id,'小明' as name,'2023-01-01' as start_date,'2023-01-03' as end_date 
union all
select 'a' as id,'小明' as name,'2023-01-04' as start_date,'2023-01-07' as end_date 
union all
select 'a' as id,'小明' as name,'2023-01-07' as start_date,'2023-01-11' as end_date 
union all
select 'b' as id,'小明' as name,'2023-01-12' as start_date,'2023-01-17' as end_date 
union all
select 'b' as id,'小明' as name,'2023-01-18' as start_date,'2023-01-21' as end_date 
union all
select 'b' as id,'小明' as name,'2023-01-22' as start_date,'2023-01-25' as end_date 
union all
select 'a' as id,'小明' as name,'2023-01-26' as start_date,'2023-01-30' as end_date
union all
select 'a' as id,'小明' as name,'2023-01-31' as start_date,'2023-02-03' as end_date)

select b.id,b.name,min(b.start_date),max(b.end_date) from 
(select *,row_number() over(order by start_date) -row_number() over(partition by id order by start_date) as rn from a) b group by b.id,b.name,b.rn;

连续查询结果:

不连续查询结果:

5.总结

这个场景题大部分数据开发者会因为连续和不连续的两个场景想复杂,需要多思考
posted @ 2024-06-20 17:49  技术即艺术  阅读(4)  评论(0编辑  收藏  举报