hive优化1-列转行
数据需求说明:
table1的数据是 客户id,基金差额,保险差额,理财差额,。。。
create table youhua1( -> custid int -> ,jijin_change float -> ,baoxian_change float -> ,cunkuan_change float) -> ; insert into youhua1 values(1,1.1,1.2,1.3),(2,2.67,2.34,2.1),(3,3.1,3.4,3.78); select * from youhua1; +--------+--------------+----------------+----------------+ | custid | jijin_change | baoxian_change | cunkuan_change | +--------+--------------+----------------+----------------+ | 1 | 1.1 | 1.2 | 1.3 | | 2 | 2.67 | 2.34 | 2.1 | | 3 | 3.1 | 3.4 | 3.78 | +--------+--------------+----------------+----------------+
原查询思路:
select custid ,"基金" as prod_name ,jijin_change as change from youhua1 union all select custid ,"保险" as prod_name ,baoxian_change change from youhua1 union all select custid ,"存款" as prod_name ,cunkuan_change as change from youhua1;
目标数据:客户1,基金,差额(基金)
客户1,保险,差额(保险)
...
+--------+-----------+------+ | custid | prod_name |change| +--------+-----------+------+ | 1 | 基金 | 1.1 | | 2 | 基金 | 2.67 | | 3 | 基金 | 3.1 | | 1 | 保险 | 1.2 | | 2 | 保险 | 2.34 | | 3 | 保险 | 3.4 | | 1 | 存款 | 1.3 | | 2 | 存款 | 2.1 | | 3 | 存款 | 3.78 | +--------+-----------+------+
优化方案:
当数据量很大时,这种方式会造成多次读表。
可以采用from前置的方式;
也可以先通过concat函数拼接jijin_change、baoxian_change、cunkuan_change三个字段成一列,然后再用列转行函数lateral view explode函数处理。
行转列:一个字段的多行数据合进一个列,通常可用collect_set+concat_ws;列转行:一个字段的一列数据拆到多个行,参考:https://blog.csdn.net/weixin_42003671/article/details/88132666
select custid,concat_ws(',',jijin_change,baoxian_change,cunkuan_change) from youhua1; #这是mysql处理,hive中concat_ws里类型要转string +--------+-----------------------------------------------------------+ | custid | concat_ws(',',jijin_change,baoxian_change,cunkuan_change) | +--------+-----------------------------------------------------------+ | 1 | 1.1,1.2,1.3 | | 2 | 2.67,2.34,2.1 | | 3 | 3.1,3.4,3.78 | +--------+-----------------------------------------------------------+
以下为hive处理:
select a.custid ,change from ( select custid ,concat_ws(',',cast(jijin_change as string),cast(baoxian_change as string),cast(cunkuan_change as string)) as all_change from youhua1 ) as a lateral view explode(split(a.all_change,',')) table_tmp as change;
1 1.1 1 1.2 1 1.3 2 2.67 2 2.34 2 2.1 3 3.1 3 3.4 3 3.78
发现没有加上基金、保险、存款的行标识:
可以在concat_ws之前先将基金、保险、存款的行标识和值打在一起,concat_ws("+","基金"+"jijin_change"),...
在explode时指定split(a.all_change, ','),最后select custid,split(change,"+")[0] as prod_name,split(change,"+")[1] as change
报bug,用“+”没有办法拆开,会报错,把+替换成'\+'、'/+'都不行,改换为‘-’
select a.custid ,split(change,'-')[0] as prod_name ,split(change,'-')[1] as change from ( select custid ,concat_ws(',',concat_ws('-','基金',cast(jijin_change as string)),concat_ws('-','保险',cast(baoxian_change as string)),concat_ws('-','存款',cast(cunkuan_change as string))) as all_change from youhua1 ) as a lateral view explode(split(a.all_change,',')) table_tmp as change;
1 基金 1.1 1 保险 1.2 1 存款 1.3 2 基金 2.67 2 保险 2.34 2 存款 2.1 3 基金 3.1 3 保险 3.4 3 存款 3.78
“+”号的方式:
concat_ws用“\+”,split用\\+“”
select a.custid ,split(change,'\\+')[0] as prod_name ,split(change,'\\+')[1] as change from ( select custid ,concat_ws(',',concat_ws('\+','基金',cast(jijin_change as string)),concat_ws('\+','保险',cast(baoxian_change as string)),concat_ws('\+','存款',cast(cunkuan_change as string))) as all_change from youhua1 ) as a lateral view explode(split(a.all_change,',')) table_tmp as change;