hive—尽量少用表连接JOIN，多用UNION ALL+group by +计算函数

hive表连接没有SQL强，所以hive多构造大宽表，而不是，多个小表之间的表连接。
hive表连接 join可以用，但是，效率低。
下面，举一个可以用UNION ALL+group by +计算函数，代替表连接的例子。

- 需求：2019年每个用户的支付和退款金额汇总

--union all
select a.user_name,
sum(a.total_amount),
sum(a.refund_amount)
from
(select user_name,
sum(pay_amount) total_amount,
0 as refund_amount
from user_trade
where year(dt)=2019
group by user_name
union all
select user_name,
0 as total_amount,
sum(refund_amount) refund_amount
from user_refund
where year(dt)=2019
group by user_name)a
group by a.user_name;

-- full join（表连接也可以，但是效率低）

select coalesce(a.user_name,b.user_name),
if(a.total_amount is null, 0,a.total_amount),
if(b.refund_amount is null,0,b.refund_amount)
from
(select user_name,
sum(pay_amount) total_amount,
0 as refund_amount
from user_trade
where year(dt)=2019
group by user_name)a
full join
(select user_name,
0 as total_amount,
sum(refund_amount) refund_amount
from user_refund
where year(dt)=2019
group by user_name)b
on a.user_name=b.user_name;

PS：解释一下coalesce（）函数

posted @ 2022-01-14 16:16 马踏飞燕Beautiful 阅读(1420) 评论(0) 编辑收藏举报

刷新页面返回顶部

马踏飞燕Beautiful

优雅、明确、简单——喜欢的就争取、得到的就珍惜、失去了就忘记

hive—尽量少用表连接JOIN，多用UNION ALL+group by +计算函数

公告