Impala & Hive 使用复杂数据类型
1. 环境
CDH 5.16.1
2. Hive 使用复杂数据类型
2.1 数据格式
1 zhangsan:man football,basketball
2 lisi:female sing,dance
2.2 Hive 建表
create table studentInfo(
id int,
info map<string,string> comment 'map<姓名,性别>',
favorite array<string> comment 'array[football,basketball]'
)
row format delimited fields terminated by '\t' --列分隔符
collection items terminated by ',' --array中各个item之间的分隔符
map keys terminated
by ':' --map中key和value之间的分隔符
lines terminated by '\n'; --行分隔符
2.3 导入数据
load data local inpath '/opt/module/jobs/student.txt' into table studentInfo;
2.3 执行查询
select * from studentInfo;
+-----------------+---------------------+----------------------------+--+
| studentinfo.id | studentinfo.info | studentinfo.favorite |
+-----------------+---------------------+----------------------------+--+
| 1 | {"zhangsan":"man"} | ["football","basketball"] |
| 2 | {"lisi":"female"} | ["sing","dance"] |
+-----------------+---------------------+----------------------------+--+
-- 对于map查询,map[key]
--对于array查询,array[index]
select id, info['zhangsan'],favorite[1] from studentInfo;
+-----+-------+-------------+--+
| id | sex | favorite |
+-----+-------+-------------+--+
| 1 | man | basketball |
| 2 | NULL | dance |
+-----+-------+-------------+--+
3. Impala 使用复杂类型
注意:Impala 只用parquet格式存储时,才能使用复杂数据类型
3.1 Hive中建表(parquet格式,导入数据
create table student_parquet(
id int,
info map<string,string> comment 'map<姓名,性别>',
favorite array<string> comment 'array[football,basketball]'
)
stored as parquet
insert overwrite table student_parquet select id,info,favorite from studentInfo;
3.2 刷新impala元数据
refresh default.student_parquet;
3.3 执行查询
select
id ,favorite_array.item,info_map.key,info_map.value
from student_parquet,
student_parquet.info as info_map,
student_parquet.favorite as favorite_array;
+----+------------+----------+--------+
| id | item | key | value |
+----+------------+----------+--------+
| 1 | football | zhangsan | man |
| 1 | basketball | zhangsan | man |
| 2 | sing | lisi | female |
| 2 | dance | lisi | female |
+----+------------+----------+--------+
select
id ,favorite_array.item
from student_parquet,
student_parquet.info as info_map,
student_parquet.favorite as favorite_array
where favorite_array.POS = 0;
+----+----------+
| id | item |
+----+----------+
| 1 | football |
| 2 | sing |
+----+----------+
select
id ,favorite_array.item,info_map.value
from student_parquet,
student_parquet.info as info_map,
student_parquet.favorite as favorite_array
where favorite_array.item = 'sing'
and info_map.key = 'lisi';
+----+------+--------+
| id | item | value |
+----+------+--------+
| 2 | sing | female |
+----+------+--------+
总结:
-
array 类型视为 一张表, 其列名为 item
-
map类型有两个列, 一个是key, 一个是value
参考:
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】凌霞软件回馈社区,博客园 & 1Panel & Halo 联合会员上线
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】博客园社区专享云产品让利特惠,阿里云新客6.5折上折
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 为什么说在企业级应用开发中,后端往往是效率杀手?
· 用 C# 插值字符串处理器写一个 sscanf
· Java 中堆内存和栈内存上的数据分布和特点
· 开发中对象命名的一点思考
· .NET Core内存结构体系(Windows环境)底层原理浅谈
· 为什么说在企业级应用开发中,后端往往是效率杀手?
· DeepSeek 解答了困扰我五年的技术问题。时代确实变了!
· 本地部署DeepSeek后,没有好看的交互界面怎么行!
· 趁着过年的时候手搓了一个低代码框架
· 推荐一个DeepSeek 大模型的免费 API 项目!兼容OpenAI接口!