HiveSQL——打折日期交叉问题
需求描述
如下为平台商品促销数据:字段为品牌,打折开始日期,打折结束日期
create table temp_date_link ( brand_id varchar(10), start_date string, end_date string ) row format delimited fields terminated by '\t' stored as orc tblproperties ("orc.compress" = "snappy"); insert into temp_date_link(brand_id, start_date, end_date) values ('oppo', '2021-06-05', '2021-06-09'), ('oppo', '2021-06-11', '2021-06-21'), ('vivo', '2021-06-05', '2021-06-15'), ('vivo', '2021-06-09', '2021-06-21'), ('redmi', '2021-06-05', '2021-06-21'), ('redmi', '2021-06-09', '2021-06-15'), ('redmi', '2021-06-17', '2021-06-26'), ('huawei', '2021-06-05', '2021-06-26'), ('huawei', '2021-06-09', '2021-06-15'), ('huawei', '2021-06-17', '2021-06-21');
计算每个品牌总的打折销售天数,注意其中的交叉日期,比如vivo品牌,第一次活动时间为2021-06-05到2021-06-15,第二次活动时间为2021-06-09到2021-06-21其中9号到15号为重复天数,只统计一次,即vivo总打折天数为2021-06-05到2021-06-21共计17天。
实现方案
1、将当前行之前的最大的结束时间放到当前行
select brand_id, start_date, end_date, max(end_date) over (partition by brand_id order by start_date rows between unbounded preceding and 1 preceding) maxEndDate from temp_date_link; oppo 2021-06-05 2021-06-09 oppo 2021-06-11 2021-06-21 2021-06-09 redmi 2021-06-05 2021-06-21 redmi 2021-06-09 2021-06-15 2021-06-21 redmi 2021-06-17 2021-06-26 2021-06-21 vivo 2021-06-05 2021-06-15 vivo 2021-06-09 2021-06-21 2021-06-15 huawei 2021-06-05 2021-06-26 huawei 2021-06-09 2021-06-15 2021-06-26 huawei 2021-06-17 2021-06-21 2021-06-26
2、比较开始时间和移动下来的数据,如果开始时间大,则不需要操作,否则需要将移动下来的数据 +1 ,替换当前的开始时间,如果是第一行数据的 maxEndDate 为 null ,不需要操作
select brand_id, start_date, if(maxEndDate is null, start_date, if(start_date > maxEndDate, start_date, date_add(maxEndDate, 1))) start_date_new, end_date from (select brand_id, start_date, end_date, max(end_date) over (partition by brand_id order by start_date rows between unbounded preceding and 1 preceding) maxEndDate from temp_date_link) t; oppo 2021-06-05 2021-06-05 2021-06-09 oppo 2021-06-11 2021-06-11 2021-06-21 redmi 2021-06-05 2021-06-05 2021-06-21 redmi 2021-06-09 2021-06-22 2021-06-15 redmi 2021-06-17 2021-06-22 2021-06-26 vivo 2021-06-05 2021-06-05 2021-06-15 vivo 2021-06-09 2021-06-16 2021-06-21 huawei 2021-06-05 2021-06-05 2021-06-26 huawei 2021-06-09 2021-06-27 2021-06-15 huawei 2021-06-17 2021-06-27 2021-06-21
3、将每行的结束时间减去开始时间
select brand_id, datediff(end_date, start_date_new) day_nums from (select brand_id, start_date, if(maxEndDate is null, start_date, if(start_date > maxEndDate, start_date, date_add(maxEndDate, 1))) start_date_new, end_date from (select brand_id, start_date, end_date, max(end_date) over (partition by brand_id order by start_date rows between unbounded preceding and 1 preceding) maxEndDate from temp_date_link) t) t; oppo 4 oppo 10 redmi 16 redmi -7 redmi 4 vivo 10 vivo 5 huawei 21 huawei -12 huawei -6
4、分组,求和
select brand_id, sum(if(day_nums >= 0, day_nums + 1, 0)) day_nums from (select brand_id, datediff(end_date, start_date_new) day_nums from (select brand_id, start_date, if(maxEndDate is null, start_date, if(start_date > maxEndDate, start_date, date_add(maxEndDate, 1))) start_date_new, end_date from (select brand_id, start_date, end_date, max(end_date) over (partition by brand_id order by start_date rows between unbounded preceding and 1 preceding) maxEndDate from temp_date_link) t) t) t group by brand_id; vivo 17 oppo 16 redmi 22 huawei 22