数仓sql场景:迭代求结果问题

1.需求

2.应用场景

简单的线性预测销售额,比如三月份累计销售额应该达到2a+b,实际a+b+c看看差距

3.sql实现

这道题先需要去分析结果集,本质上是一个迭代累加的过程,先要得到如下结果

如果在面试数仓中实现了以上结果,基本上面试官会很通过,也在短时间内可以实现,实现sql如下
with tb as (
select 1 as s,'a' as pv
union all
select 2 as s,'b' as pv
union all
select 3 as s,'c' as pv
union all
select 4 as s,'d' as pv
union all
select 5 as s,'e' as pv
union all
select 6 as s,'f' as pv
)

select s,pv,concat_ws('+',collect_list(re1) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) from 
(select *,concat_ws('+',collect_list(pv) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) as re1 from tb) tb1;
首先开窗将pv分组连接起来作为一个结果集,然后再将结果集分组连接起来。核心知识点:collect_list,concat_ws,开窗函数及开窗中的ROWS BETWEEN。其中`ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`为开始行到当前行

另一种实现方式,通过不断解析实现,分组实现,这种实现主要考察对sql的深度理解和组装,实现起来还是比较复杂的,sql如下
with tb as (
select '1' as s,'a' as pv
union all
select '2' as s,'b' as pv
union all
select '3' as s,'c' as pv
union all
select '4' as s,'d' as pv
union all
select '5' as s,'e' as pv
union all
select '6' as s,'f' as pv
),
tb1 as 
(select *,
concat_ws(',',collect_list(pv) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) as pv_n,
reverse(concat_ws(',',collect_list(s) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))) as s_n 
from tb),
tb2 as 
(select s,pv,pv_n,s_n from tb1 
lateral view explode(SPLIT(pv_n,',')) tb1 as pv_n),
tb3 as 
(select *,row_number() over(partition by s order by pv_n) as rn from tb2)
select s,pv,concat_ws('+',collect_list(concat(pv_n,'*',split(s_n,',')[rn-1]))) as res from tb3 group by s,pv order by s;
该种实现方式包含的知识点比较多,包括collect_list,concat_ws,lateral view explode,开窗函数。其中将开窗出来的值作为切割出来的数组下标来使用这种思路在实际中不容易想到。但这个sql本质上有隐患,其中reverse函数如果s字段超过个位数后存在问题,当做练sql吧 实际中要多想想

上图中,要取pv_n*s_n[rn-1],然后再将结果连接起来

然后再分组连接起来

posted @ 2024-08-02 17:10  技术即艺术  阅读(8)  评论(0编辑  收藏  举报