hive posexplode 函数的使用举例

问题1、在不借助其它表的情况下输出结果如下所示：

1
2
3
4
5
...

49
50

解题思路：首先借助 space(49) 函数生成49个空格，然后用split去切分为49个空白数组，这样依赖，可以使用侧写函数对其进行处理陈带有序号的空白，这里的序号是我们要用到的

答案

select id + start_num is_id, *
from (select 1  start_num,
             50 end_num) t lateral view posexplode(split(space(50 - 1), '')) t as id, val;

另一种形式是：

select row_number() over () as id
from (select split(replace(space(49), '', 'A'), ' ') as x) t lateral view explode(x) t as val

问题2：获取JSON 数组的key

[{"title":"笑傲江湖","author":"金庸"},{"title":"流星蝴蝶剑","author":"古龙"}]

有这样一个JSON，需要获取key

实现：

1、首先将 json 数组转换为字符串；这里使用字符串替换函数 translate

select translate('[{"title":"笑傲江湖","author":"金庸"},{"title":"流星蝴蝶剑","author":"古龙"}]', '[]{}""', '') json_str;

输出

title:笑傲江湖,author:金庸,title:流星蝴蝶剑,author:古龙

2、使用 regexp_replace 将中间的，替换成；

select regexp_replace(translate('[{"title":"笑傲江湖","author":"金庸"},{"title":"流星蝴蝶剑","author":"古龙"}]', '[]{}""', ''), '\,',
                      '\:') json_str;

输出

title:笑傲江湖:author:金庸:title:流星蝴蝶剑:author:古龙

这个时候就可以使用 posexplode 函数，侧写出带有序号的数据了

select *
from (select regexp_replace(translate('[{"title":"笑傲江湖","author":"金庸"},{"title":"流星蝴蝶剑","author":"古龙"}]', '{}[]', ''),
                            '\,',
                            '\:') json) t lateral view posexplode(split(json, ':')) t1 as id, val;

结果

"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    0    """title"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    1    """笑傲江湖"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    2    """author"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    3    """金庸"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    4    """title"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    5    """流星蝴蝶剑"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    6    """author"""
"""title"":""笑傲江湖"":""author"":""金庸"":""title"":""流星蝴蝶剑"":""author"":""古龙"""    7    """古龙"""

此时，我们只要对数据过滤一下即可

select val
from (select regexp_replace(translate('[{"title":"笑傲江湖","author":"金庸"},{"title":"流星蝴蝶剑","author":"古龙"}]', '{}[]', ''),
                            '\,',
                            '\:') json) t lateral view posexplode(split(json, ':')) t1 as id, val
where id % 2 = 0
group by val;

最终结果：

"""author"""
"""title"""

问题3：

有如下一组数据

令狐冲,依琳,田伯光,东方    华山,衡山,黄沙,黑木崖
方正,依琳,田伯光,东方    少林,衡山,黄沙,黑木崖

希望得到一下结果：

令狐冲    华山
依琳    衡山
田伯光    黄沙
东方    黑木崖
方正    少林
依琳    衡山
田伯光    黄沙
东方    黑木崖

思路：使用两次 posexplode 函数进行侧写，两次结果需要限制序号相等

select name_1,part_1
from (select "令狐冲,依琳,田伯光,东方" name, '华山,衡山,黄沙,黑木崖' party
      union all
      select "方正,依琳,田伯光,东方", '少林,衡山,黄沙,黑木崖') t
         lateral view posexplode(split(name, ',')) t1 as id_1, name_1
         lateral view posexplode(split(party, ',')) t2 as id_2, part_1
where id_1 = id_2;

posted @ 2022-07-29 14:25 晓枫的春天阅读(842) 评论(0) 收藏举报

刷新页面返回顶部

滴水穿石不是靠力，而是因为不舍昼夜。

hive posexplode 函数的使用举例

问题1、在不借助其它表的情况下输出结果如下所示：

问题2：获取JSON 数组的key

问题3：

公告