hive posexplode 函数的使用举例

问题1、在不借助其它表的情况下输出结果如下所示:

1
2
3
4
5
...

49
50

解题思路:首先借助  space(49)  函数生成49个空格,然后用split去切分为49个空白数组,这样依赖,可以使用侧写函数对其进行处理陈带有序号的空白,这里的序号是我们要用到的

答案

select id + start_num is_id, *
from (select 1  start_num,
             50 end_num) t lateral view posexplode(split(space(50 - 1), '')) t as id, val;

另一种形式是:

select row_number() over () as id
from (select split(replace(space(49), '', 'A'), ' ') as x) t lateral view explode(x) t as val

问题2:获取JSON 数组的key

[{"title":"笑傲江湖","author":"金庸"},{"title":"流星蝴蝶剑","author":"古龙"}]

有这样一个JSON,需要获取key

实现:

1、首先将 json 数组转换为 字符串 ;这里使用 字符串替换函数  translate 

select translate('[{"title":"笑傲江湖","author":"金庸"},{"title":"流星蝴蝶剑","author":"古龙"}]', '[]{}""', '') json_str;

 输出

title:笑傲江湖,author:金庸,title:流星蝴蝶剑,author:古龙

2、使用 regexp_replace 将中间的 , 替换成 ;

select regexp_replace(translate('[{"title":"笑傲江湖","author":"金庸"},{"title":"流星蝴蝶剑","author":"古龙"}]', '[]{}""', ''), '\,',
                      '\:') json_str;

输出

title:笑傲江湖:author:金庸:title:流星蝴蝶剑:author:古龙

这个时候就可以使用  posexplode 函数,侧写出 带有序号的数据了

select *
from (select regexp_replace(translate('[{"title":"笑傲江湖","author":"金庸"},{"title":"流星蝴蝶剑","author":"古龙"}]', '{}[]', ''),
                            '\,',
                            '\:') json) t lateral view posexplode(split(json, ':')) t1 as id, val;

结果 

"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    0    """title"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    1    """笑傲江湖"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    2    """author"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    3    """金庸"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    4    """title"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    5    """流星蝴蝶剑"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    6    """author"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    7    """古龙"""

此时,我们只要对数据过滤一下即可

select val
from (select regexp_replace(translate('[{"title":"笑傲江湖","author":"金庸"},{"title":"流星蝴蝶剑","author":"古龙"}]', '{}[]', ''),
                            '\,',
                            '\:') json) t lateral view posexplode(split(json, ':')) t1 as id, val
where id % 2 = 0
group by val;

最终结果:

"""author"""
"""title"""

问题3:

有如下一组数据

令狐冲,依琳,田伯光,东方    华山,衡山,黄沙,黑木崖
方正,依琳,田伯光,东方    少林,衡山,黄沙,黑木崖

希望得到一下结果:

令狐冲    华山
依琳    衡山
田伯光    黄沙
东方    黑木崖
方正    少林
依琳    衡山
田伯光    黄沙
东方    黑木崖

思路:使用两次 posexplode 函数进行侧写,两次结果需要限制序号相等

select name_1,part_1
from (select "令狐冲,依琳,田伯光,东方" name, '华山,衡山,黄沙,黑木崖' party
      union all
      select "方正,依琳,田伯光,东方", '少林,衡山,黄沙,黑木崖') t
         lateral view posexplode(split(name, ',')) t1 as id_1, name_1
         lateral view posexplode(split(party, ',')) t2 as id_2, part_1
where id_1 = id_2;
posted @ 2022-07-29 14:25  晓枫的春天  阅读(633)  评论(0编辑  收藏  举报