Impala & Hive 使用复杂数据类型
1. 环境
CDH 5.16.1
2. Hive 使用复杂数据类型
2.1 数据格式
1 zhangsan:man football,basketball
2 lisi:female sing,dance
2.2 Hive 建表
create table studentInfo(
id int,
info map<string,string> comment 'map<姓名,性别>',
favorite array<string> comment 'array[football,basketball]'
)
row format delimited fields terminated by '\t' --列分隔符
collection items terminated by ',' --array中各个item之间的分隔符
map keys terminated
by ':' --map中key和value之间的分隔符
lines terminated by '\n'; --行分隔符
2.3 导入数据
load data local inpath '/opt/module/jobs/student.txt' into table studentInfo;
2.3 执行查询
select * from studentInfo;
+-----------------+---------------------+----------------------------+--+
| studentinfo.id | studentinfo.info | studentinfo.favorite |
+-----------------+---------------------+----------------------------+--+
| 1 | {"zhangsan":"man"} | ["football","basketball"] |
| 2 | {"lisi":"female"} | ["sing","dance"] |
+-----------------+---------------------+----------------------------+--+
-- 对于map查询,map[key]
--对于array查询,array[index]
select id, info['zhangsan'],favorite[1] from studentInfo;
+-----+-------+-------------+--+
| id | sex | favorite |
+-----+-------+-------------+--+
| 1 | man | basketball |
| 2 | NULL | dance |
+-----+-------+-------------+--+
3. Impala 使用复杂类型
注意:Impala 只用parquet格式存储时,才能使用复杂数据类型
3.1 Hive中建表(parquet格式,导入数据
create table student_parquet(
id int,
info map<string,string> comment 'map<姓名,性别>',
favorite array<string> comment 'array[football,basketball]'
)
stored as parquet
insert overwrite table student_parquet select id,info,favorite from studentInfo;
3.2 刷新impala元数据
refresh default.student_parquet;
3.3 执行查询
select
id ,favorite_array.item,info_map.key,info_map.value
from student_parquet,
student_parquet.info as info_map,
student_parquet.favorite as favorite_array;
+----+------------+----------+--------+
| id | item | key | value |
+----+------------+----------+--------+
| 1 | football | zhangsan | man |
| 1 | basketball | zhangsan | man |
| 2 | sing | lisi | female |
| 2 | dance | lisi | female |
+----+------------+----------+--------+
select
id ,favorite_array.item
from student_parquet,
student_parquet.info as info_map,
student_parquet.favorite as favorite_array
where favorite_array.POS = 0;
+----+----------+
| id | item |
+----+----------+
| 1 | football |
| 2 | sing |
+----+----------+
select
id ,favorite_array.item,info_map.value
from student_parquet,
student_parquet.info as info_map,
student_parquet.favorite as favorite_array
where favorite_array.item = 'sing'
and info_map.key = 'lisi';
+----+------+--------+
| id | item | value |
+----+------+--------+
| 2 | sing | female |
+----+------+--------+
总结:
-
array 类型视为 一张表, 其列名为 item
-
map类型有两个列, 一个是key, 一个是value
参考: