Hive: use JOIN sparingly, prefer UNION ALL + GROUP BY + aggregate functions
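Joins in Hive are not as strong as joins in a conventional SQL database, so the usual practice in Hive is to build one large wide table rather than join many small tables against each other.
JOIN is available in Hive, but it is slow.
The example below shows a case where UNION ALL + GROUP BY + aggregate functions can replace a join.
- Requirement: total payment amount and total refund amount per user for 2019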
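-- Approach 1: UNION ALL + GROUP BY
select a.user_name,
       sum(a.total_amount)  as total_amount,
       sum(a.refund_amount) as refund_amount
from
(
    -- payments per user; the refund column is padded with 0
    select user_name,
           sum(pay_amount) as total_amount,
           0 as refund_amount
    from user_trade
    where year(dt) = 2019
    group by user_name
    union all
    -- refunds per user; the payment column is padded with 0
    select user_name,
           0 as total_amount,
           sum(refund_amount) as refund_amount
    from user_refund
    where year(dt) = 2019
    group by user_name
) a
group by a.user_name;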
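To see what the UNION ALL approach produces, here is a small self-contained sketch. The sample rows (alice, bob, carol and their amounts) are made up for illustration; only the table and column names come from the query above. It assumes a Hive version that supports CTEs and SELECT without FROM, and it has not been run against a real cluster.

```sql
-- Hypothetical sample data, defined inline so no real tables are needed.
with user_trade as (
    select 'alice' as user_name, 100 as pay_amount, '2019-01-05' as dt
    union all
    select 'alice', 50, '2019-03-10'
    union all
    select 'bob', 30, '2019-02-01'
),
user_refund as (
    select 'alice' as user_name, 20 as refund_amount, '2019-03-12' as dt
    union all
    select 'carol', 5, '2019-04-01'
)
select a.user_name,
       sum(a.total_amount)  as total_amount,
       sum(a.refund_amount) as refund_amount
from
(
    -- one row per paying user, refund padded with 0
    select user_name,
           sum(pay_amount) as total_amount,
           0 as refund_amount
    from user_trade
    where year(dt) = 2019
    group by user_name
    union all
    -- one row per refunded user, payment padded with 0
    select user_name,
           0 as total_amount,
           sum(refund_amount) as refund_amount
    from user_refund
    where year(dt) = 2019
    group by user_name
) a
group by a.user_name;
```

With these rows the result should contain alice (150, 20), bob (30, 0) and carol (0, 5), in no guaranteed order. carol has a refund but no payment and bob has a payment but no refund, which is why the join version needs a full join rather than an inner or left join.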
-- Approach 2: FULL JOIN (also works, but slower)
select coalesce(a.user_name, b.user_name) as user_name,
       if(a.total_amount is null, 0, a.total_amount) as total_amount,
       if(b.refund_amount is null, 0, b.refund_amount) as refund_amount
from
(
    -- payments per user
    select user_name,
           sum(pay_amount) as total_amount
    from user_trade
    where year(dt) = 2019
    group by user_name
) a
full join
(
    -- refunds per user
    select user_name,
           sum(refund_amount) as refund_amount
    from user_refund
    where year(dt) = 2019
    group by user_name
) b
on a.user_name = b.user_name;
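PS: a note on the coalesce() function. It returns the first of its arguments that is not NULL, or NULL if all of them are NULL. In the full join query it picks the user_name from whichever side actually has the user.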
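A minimal sketch of coalesce() on literal values, which needs no tables. The column aliases are made up for illustration, and it assumes a Hive version that allows SELECT without FROM.

```sql
select coalesce(null, 'b')            as right_side_only,  -- 'b': the first argument is NULL
       coalesce('a', 'b')             as left_side_wins,   -- 'a': the first non-NULL argument
       coalesce(cast(null as int), 0) as null_to_zero;     -- 0: a common way to default NULL to 0
```

The last column shows that if(x is null, 0, x), as used in the full join query, can also be written as coalesce(x, 0).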