Hive学习小记-（12）横表与纵表的互相转换***

需求说明：这是一个横表转纵表与纵表转横表的故事，有点类似行列转换行转列：一个字段的多行数据合进一个列,通常可用collect_set+concat_ws；列转行：一个字段的一列数据拆到多个行，通常用explode

横表转纵表：

1.原横表数据：

cust_id1,jijin_bal,baoxian_bal,cunkuan_bal

转成纵表目标数据：

cust_id1,基金,bal

cust_id1,保险,bal

cust_id1,存款,bal

方法：concat_ws+lateral view explode +split --算是列转行？？其实是相当于把横表变成纵表参考：https://www.cnblogs.com/foolangirl/p/14145147.html

纵表转横表

2.原纵表数据：

cust_id1,基金,bal

cust_id1,保险,bal

cust_id1,存款,bal

转成目标横表数据：

cust_id1,jijin_bal,baoxian_bal,cunkuan_bal

方法一：case when

方法二：先转map:cust1,基金：bal,保险：bal,存款：bal ；再inline

转map参考：https://www.jianshu.com/p/02c2b8906893

explode inline参考：https://blog.csdn.net/huobumingbai1234/article/details/80559944) ！！！注意explode和inline的map类型

import pyspark
from pyspark.sql import SparkSession
​
sc=SparkSession.builder.master("local")\
    .appName('hive_col_row')\
    .config('spark.executor.memory','2g')\
    .config('spark.driver.memory','2g')\
    .enableHiveSupport()\
    .getOrCreate()
sc.sql(''' create table test_youhua.zongbiao(id int,prod_nm string,bal float) ''')
sc.sql(''' insert overwrite table test_youhua.zongbiao values(1,'jijin',1.1),(1,'baoxian',1.2),(1,'cunkuan',1.3),(2,'jijin',2.67),(2,'baoxian',2.34),(2,'cunkuan',2.1) ''')
sc.sql(''' select * from test_youhua.zongbiao ''').show()
+---+-------+----+
| id|prod_nm| bal|
+---+-------+----+
|  1|  jijin| 1.1|
|  1|baoxian| 1.2|
|  1|cunkuan| 1.3|
|  2|  jijin|2.67|
|  2|baoxian|2.34|
|  2|cunkuan| 2.1|
+---+-------+----+

方法一：case when纵表转横表

sc.sql(''' select id
           ,max(case when prod_nm='jijin' then bal else 0 end) as jijin_bal
           ,max(case when prod_nm='baoxian' then bal else 0 end) as baoxian_bal
           ,max(case when prod_nm='cunkuan' then bal else 0 end) as cunkuan_bal 
           from test_youhua.zongbiao group by id ''').show()

+---+---------+-----------+-----------+
| id|jijin_bal|baoxian_bal|cunkuan_bal|
+---+---------+-----------+-----------+
|  1|      1.1|        1.2|        1.3|
|  2|     2.67|       2.34|        2.1|
+---+---------+-----------+-----------+

方法二：先转map:cust1,基金：bal,保险：bal,存款：bal ；再inline转map参考：https://blog.csdn.net/huobumingbai1234/article/details/80559944inline参考：https://blog.csdn.net/weixin_42003671/article/details/88132666inline不支持map：https://www.jianshu.com/p/02c2b8906893   
lateral view inline与 lateral view explode功能类似！！！注意explode和inline的map类型sc.sql(''' select id
            ,str_to_map(concat_ws(',',collect_set(concat_ws(':',prod_nm,cast(bal as string)))))
        from test_youhua.zongbiao 
        group by id ''').show()
运行结果，已经转换成了map格式：
1   {"jijin":"1.1","baoxian":"1.2","cunkuan":"1.3"}
2   {"jijin":"2.67","baoxian":"2.34","cunkuan":"2.1"}

# 这个不行啊，首先inline作用于struct这里map操作不了，explode和inline都是列转行函数，都是将map字段打散开的，相当于把map又做成纵表了sc.sql(''' select map_tmp_tbl.id,c1,c2 from  (
    select id
          ,str_to_map(concat_ws(',',collect_set(concat_ws(':',prod_nm,cast(bal as string))))) as map_col
    from test_youhua.zongbiao 
        group by id
) as map_tmp_tbl  lateral view explode(map_tmp_tbl.map_col) t1 as c1,c2  ''').show()
+---+-------+----+
| id|     c1|  c2|
+---+-------+----+
|  1|  jijin| 1.1|
|  1|cunkuan| 1.3|
|  1|baoxian| 1.2|
|  2|cunkuan| 2.1|
|  2|baoxian|2.34|
|  2|  jijin|2.67|
+---+-------+----+

直接select map_col不就OK了？？转成横表成功！！

sc.sql(''' select map_tmp_tbl.id
                 ,map_col['jijin'] as jijin_bal
                 ,map_col['baoxian'] as baoxian_bal
                 ,map_col['cunkuan'] as cunkuan_bal from  (
    select id
          ,str_to_map(concat_ws(',',collect_set(concat_ws(':',prod_nm,cast(bal as string))))) as map_col
    from test_youhua.zongbiao 
        group by id
) as map_tmp_tbl ''').show()

+---+---------+-----------+-----------+
| id|jijin_bal|baoxian_bal|cunkuan_bal|
+---+---------+-----------+-----------+
|  1|      1.1|        1.2|        1.3|
|  2|     2.67|       2.34|        2.1|
+---+---------+-----------+-----------+

原来最好存map类型是这个意思，后面想转横表区直接取map的key对应的value可以转横表，想转纵表可以用explode转纵表 eg: 这里的map_tmp_tbl就是存成了map类型，可以看出由explode可以转纵表，直接取map对应key的value值可以转横表

tips

前面转map用了： str_to_map(concat_ws(',',collect_set(concat_ws(':',prod_nm,cast(bal as string))))) 这里需要注意：collect_ws 这个函数会对元素数组去重，如果不去重用 collect_list 这个函数 str_to_map 函数也会去重，如果传入的键值对有重复，只保留一个。如果还是要传入重复的键值对，可以用下面的函数： regexp_replace(concat('{"',cast( concat_ws(',',collect_list(concat_ws(':',prod_nm,cast(bal as string)))) as string),'"}'),',','","') as map_col

但这时候是一个json串，而不是map!!!

sc.sql(''' select id
          ,regexp_replace(regexp_replace(concat('{"',concat_ws(',',collect_list(concat_ws(':',prod_nm,cast(bal as string)))),'"}'),',','","'),':','":"') as map_col
    from test_youhua.zongbiao group by id ''').show()

-- 跑出来的结果，但是这时候不是map格式，要用json串的形式来读：
1   {"jijin":"1.1","baoxian":"1.2","cunkuan":"1.3"}
2   {"jijin":"2.67","baoxian":"2.34","cunkuan":"2.1"}

sc.sql(''' select map_tmp_tbl.id
                 ,get_json_object(map_col,'$.jijin')
                 ,get_json_object(map_col,'$.baoxian')
                 ,get_json_object(map_col,'$.cunkuan') from  (
    select id
          ,regexp_replace(regexp_replace(concat('{"',concat_ws(',',collect_list(concat_ws(':',prod_nm,cast(bal as string)))),'"}'),',','","'),':','":"') as map_col
    from test_youhua.zongbiao 
            group by id
) as map_tmp_tbl ''').show()

+---+---------------------------------+-----------------------------------+-----------------------------------+
| id|get_json_object(map_col, $.jijin)|get_json_object(map_col, $.baoxian)|get_json_object(map_col, $.cunkuan)|
+---+---------------------------------+-----------------------------------+-----------------------------------+
|  1|                              1.1|                                1.2|                                1.3|
|  2|                             2.67|                               2.34|                                2.1|
+---+---------------------------------+-----------------------------------+-----------------------------------+

通过解析json串的形式也可以！

参考：Hive学习小记-（5）表字段变动频繁时用json格式

那其实提前存成map格式的好处是比较多的，一是加字段方便，而是横表纵表转换方便。

posted @ 2021-01-11 22:16 foolangirl 阅读(1984) 评论(0) 编辑收藏举报

刷新页面返回顶部

x_lulu