Common SQL Problems
This post records SQL problems I have run into at work. It is updated continually.
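1. Chained FULL JOINs inflate the row count
Cause: NULL keys produce duplicate rows. After `a full join b`, a user_id that exists only in b has a NULL a.user_id. Joining the next table on a.user_id can never match that row (NULL = x is never true), so the same user_id from the next table becomes a separate row, and the final coalesce returns it twice. Join each table on the coalesce of all the keys joined before it instead.
Problematic SQL: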
select coalesce(a.user_id,b.user_id,c.user_id,d.user_id,e.user_id,f.user_id) as user_id
from
(select user_id from table_06)a full join
(select user_id from table_05)b on a.user_id=b.user_id full join
(select user_id from table_04)c on a.user_id=c.user_id full join
(select user_id from table_03)d on a.user_id=d.user_id full join
(select user_id from table_02)e on a.user_id=e.user_id full join
(select user_id from table_01)f on a.user_id=f.user_id
Fixed SQL:
select coalesce(a.user_id,b.user_id,c.user_id,d.user_id,e.user_id,f.user_id) as user_id
from
(select user_id from table_06)a full join
(select user_id from table_05)b on a.user_id=b.user_id full join
(select user_id from table_04)c on coalesce(a.user_id,b.user_id)=c.user_id full join
(select user_id from table_03)d on coalesce(a.user_id,b.user_id,c.user_id)=d.user_id full join
(select user_id from table_02)e on coalesce(a.user_id,b.user_id,c.user_id,d.user_id)=e.user_id full join
(select user_id from table_01)f on coalesce(a.user_id,b.user_id,c.user_id,d.user_id,e.user_id)=f.user_id
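To see why joining on a.user_id alone inflates the result, here is a small pure-Python simulation of the two queries. It is a sketch of FULL OUTER JOIN semantics, not how Spark executes joins. The helpers `full_join` and `chained_full_join` and the sample data standing in for table_06 to table_01 are hypothetical; NULL is modelled as `None`, and a NULL key never matches, as in SQL.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Row = Tuple[Optional[int], ...]


def coalesce(*values: Optional[int]) -> Optional[int]:
    """First non-NULL argument, like SQL COALESCE."""
    for v in values:
        if v is not None:
            return v
    return None


def full_join(left: List[Row], width: int, right: Sequence[Optional[int]],
              key: Callable[[Row], Optional[int]]) -> List[Row]:
    """FULL OUTER JOIN of `left` (rows with `width` columns) with a one-column table.

    A NULL key never matches, mirroring SQL where NULL = x is not true.
    """
    result: List[Row] = []
    matched = set()
    for row in left:
        k = key(row)
        hits = [i for i, r in enumerate(right) if k is not None and k == r]
        for i in hits:
            result.append(row + (right[i],))
            matched.add(i)
        if not hits:
            result.append(row + (None,))  # left row without a partner
    for i, r in enumerate(right):
        if i not in matched:
            result.append((None,) * width + (r,))  # right row without a partner
    return result


def chained_full_join(tables: Sequence[Sequence[Optional[int]]],
                      on_coalesce: bool) -> List[Optional[int]]:
    """Full-join tables[0], tables[1], ... in order; return coalesce(...) per row.

    on_coalesce=False joins every table on a.user_id (the problematic SQL);
    on_coalesce=True joins on the coalesce of all earlier keys (the fixed SQL).
    """
    if on_coalesce:
        key: Callable[[Row], Optional[int]] = lambda row: coalesce(*row)
    else:
        key = lambda row: row[0]  # always a.user_id
    rows: List[Row] = [(v,) for v in tables[0]]
    for width, table in enumerate(tables[1:], start=1):
        rows = full_join(rows, width, table, key)
    return [coalesce(*row) for row in rows]


# Hypothetical data standing in for table_06, table_05, ..., table_01.
tables = [[1], [2], [2], [3], [3], [3]]
print(chained_full_join(tables, on_coalesce=False))  # [1, 2, 2, 3, 3, 3]
print(chained_full_join(tables, on_coalesce=True))   # [1, 2, 3]
```

With this data, user 2 exists only in table_05 and table_04, so its a.user_id is NULL after the first join and the join to table_04 cannot match it. Joining on the coalesce of all earlier keys removes the duplicates.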
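2. LEFT JOIN disables broadcast join / map join
Cause: a broadcast join (called a map join in Hive) has no shuffle/reduce stage. The small table is copied to every task, and each task joins it with its own split of the large table and emits results right after reading the data.
In a LEFT JOIN the small table a is the preserved side, so every pkg in a must appear in the output, padded with NULL if it has no match in b. A single task cannot know whether a pkg it failed to match was matched by another task, so it cannot decide whether to emit the NULL row. The left side of a LEFT JOIN therefore cannot be broadcast, and Spark falls back to a sort-merge join. Rewriting the query as an INNER JOIN lets the small table be broadcast again; note that this is only equivalent when rows of a with no match in b are not needed.
Problematic SQL: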
select count(1) from (
select a.pkg, b.gazj from
(select pkg from trandw.dim_pub_app)a left join
(select gazj,pkg from trandw.dws_log_app_open_ds where dt='20220615' )b on a.pkg = b.pkg
)t ;
Fixed SQL:
select count(1) from (
select a.pkg, b.gazj from
(select pkg from trandw.dim_pub_app)a inner join
(select gazj,pkg from trandw.dws_log_app_open_ds where dt='20220615' )b on a.pkg = b.pkg
)t ;
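The cause described above can be illustrated with a minimal pure-Python model of a map-side (broadcast) join. It assumes the large table b arrives as independent partitions and that each loop iteration plays the role of one task. The data and function names are hypothetical, and this is not Spark's actual implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical data: the small dimension table a (pkg) and the large table b
# (gazj, pkg), already split into the partitions that separate tasks would read.
small_pkgs: List[str] = ["p1", "p2"]
big_partitions: List[List[Tuple[str, str]]] = [[("u1", "p1")], [("u2", "p2")]]

Out = List[Tuple[str, Optional[str]]]


def broadcast_inner_join(small: List[str], partitions: List[List[Tuple[str, str]]]) -> Out:
    """Map-side inner join: each partition is joined against the broadcast set alone."""
    lookup = set(small)  # the broadcast copy of the small table
    out: Out = []
    for part in partitions:  # one iteration = one independent task
        for gazj, pkg in part:
            if pkg in lookup:
                out.append((pkg, gazj))
    return out


def naive_broadcast_left_join(small: List[str], partitions: List[List[Tuple[str, str]]]) -> Out:
    """WRONG on purpose: broadcast the preserved (left) side and pad NULLs per task."""
    out: Out = []
    for part in partitions:
        matched = set()
        for gazj, pkg in part:
            if pkg in small:
                out.append((pkg, gazj))
                matched.add(pkg)
        # A task only sees its own partition, so it cannot know whether an
        # unmatched pkg found a partner in another partition.
        out.extend((pkg, None) for pkg in small if pkg not in matched)
    return out


def left_join(small: List[str], partitions: List[List[Tuple[str, str]]]) -> Out:
    """Correct LEFT JOIN: decide NULL padding only after seeing every row of b."""
    all_rows = [row for part in partitions for row in part]
    out: Out = []
    for pkg in small:
        gazjs = [gazj for gazj, p in all_rows if p == pkg]
        if gazjs:
            out.extend((pkg, g) for g in gazjs)
        else:
            out.append((pkg, None))
    return out


print(broadcast_inner_join(small_pkgs, big_partitions))
# [('p1', 'u1'), ('p2', 'u2')]
print(naive_broadcast_left_join(small_pkgs, big_partitions))
# [('p1', 'u1'), ('p2', None), ('p2', 'u2'), ('p1', None)]
print(left_join(small_pkgs, big_partitions))
# [('p1', 'u1'), ('p2', 'u2')]
```

The inner join is correct partition by partition, while padding NULLs per partition emits spurious unmatched rows. This is why an outer join can only broadcast its non-preserved side (the right side of a LEFT JOIN).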
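3. Common Spark SQL functions
conv(a,10,2) -- convert a from base 10 to base 2 (returns a string)
datediff(b,a) -- number of days from date a to date b (b minus a)
get_json_object(json_str,'$.key') -- extract a value from a JSON string by a JSON path
from_unixtime(1657493999,'yyyyMMdd') -- format a Unix timestamp (seconds) with the given pattern; the result is a string, not a DATE, and depends on the session time zone
date_add(from_unixtime(1657493999,'yyyy-MM-dd'),1) -- timestamp to 'yyyy-MM-dd' string, then add 1 day (returns a DATE)
to_unix_timestamp('2022-01-01','yyyy-MM-dd') -- parse a date string into a Unix timestamp in seconds (BIGINT), not a date
weekofyear(date_add(from_unixtime(to_unix_timestamp('2022-01-01','yyyy-MM-dd'),'yyyy-MM-dd'),4)); -- the three functions above combined, followed by weekofyear; 2022-01-05 is in ISO week 1, so this returns 1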
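For quick reference, the following Python sketch mimics what these functions return. The helper names copy the Spark SQL ones but are hypothetical Python equivalents, not Spark's API. Formats use Python `strftime` codes instead of Spark's datetime patterns, and the session time zone is assumed to be UTC+8 (the offset of Asia/Shanghai). The two `from_unixtime` calls show how the same timestamp lands on different dates in different time zones.

```python
import json
from datetime import date, datetime, timedelta, timezone
from typing import Optional

# Assumed session time zone: UTC+8 (same offset as Asia/Shanghai).
SESSION_TZ = timezone(timedelta(hours=8))


def conv_10_to_2(n: int) -> str:
    """Like conv(n, 10, 2) for a non-negative n: base-2 digits as a string."""
    return format(n, "b")


def datediff(end: date, start: date) -> int:
    """Like datediff(end, start): number of days from start to end."""
    return (end - start).days


def get_json_object(json_str: str, path: str) -> Optional[str]:
    """Simplified get_json_object: supports only '$.key1.key2' object paths."""
    if not path.startswith("$."):
        raise ValueError("only '$.key' paths are supported in this sketch")
    node = json.loads(json_str)
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # missing path -> NULL
        node = node[key]
    return node if isinstance(node, str) else json.dumps(node)


def from_unixtime(ts: int, fmt: str, tz: timezone = SESSION_TZ) -> str:
    """Like from_unixtime(ts, fmt): returns a formatted STRING, not a date."""
    return datetime.fromtimestamp(ts, tz).strftime(fmt)


def to_unix_timestamp(s: str, fmt: str, tz: timezone = SESSION_TZ) -> int:
    """Like to_unix_timestamp(s, fmt): returns seconds since the epoch."""
    return int(datetime.strptime(s, fmt).replace(tzinfo=tz).timestamp())


def date_add(d: date, days: int) -> date:
    """Like date_add(d, days)."""
    return d + timedelta(days=days)


def weekofyear(d: date) -> int:
    """Like weekofyear(d): ISO 8601 week number (weeks start on Monday)."""
    return d.isocalendar()[1]


print(conv_10_to_2(10))                                   # 1010
print(datediff(date(2022, 7, 11), date(2022, 7, 1)))      # 10
print(get_json_object('{"a": {"b": 1}}', "$.a.b"))        # 1
print(from_unixtime(1657493999, "%Y%m%d"))                # 20220711 (UTC+8)
print(from_unixtime(1657493999, "%Y%m%d", timezone.utc))  # 20220710 (UTC)

# The combined example: string -> timestamp -> 'yyyy-MM-dd' string -> +4 days -> week.
ts = to_unix_timestamp("2022-01-01", "%Y-%m-%d")
day = date.fromisoformat(from_unixtime(ts, "%Y-%m-%d"))
print(weekofyear(date_add(day, 4)))                       # 1
```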