
Using Complex Data Types in Impala & Hive

1. Environment

CDH 5.16.1

2. Using complex data types in Hive

2.1 Data format

1       zhangsan:man    football,basketball
2       lisi:female     sing,dance

2.2 Creating the Hive table

create table studentInfo(
    id int,
    info map<string,string>  comment 'map<name,gender>',
    favorite array<string> comment 'array[football,basketball]'
)
row format delimited fields terminated by '\t'    -- column delimiter
collection items terminated by ','   -- delimiter between array items (and between map entries)
map keys terminated by ':'        -- delimiter between a map key and its value
lines terminated by '\n';       -- row delimiter

2.3 Loading the data

load data local inpath '/opt/module/jobs/student.txt' into table studentInfo;

2.4 Running queries

select *  from studentInfo;

+-----------------+---------------------+----------------------------+--+
| studentinfo.id  |  studentinfo.info   |    studentinfo.favorite    |
+-----------------+---------------------+----------------------------+--+
| 1               | {"zhangsan":"man"}  | ["football","basketball"]  |
| 2               | {"lisi":"female"}   | ["sing","dance"]           |
+-----------------+---------------------+----------------------------+--+




-- Map lookup: map[key]; a key missing from the map returns NULL
-- Array lookup: array[index]; the index is zero-based
select id, info['zhangsan'] as sex, favorite[1] as favorite from studentInfo;

+-----+-------+-------------+--+
| id  |  sex  |  favorite   |
+-----+-------+-------------+--+
| 1   | man   | basketball  |
| 2   | NULL  | dance       |
+-----+-------+-------------+--+
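Beyond subscript access, Hive also ships built-in functions for collections. The sketch below runs against the same studentInfo table: it uses size, map_keys, map_values and array_contains, then flattens the array with LATERAL VIEW explode. The aliases (favorite_cnt, info_keys, info_values, likes_sing, fav, hobby) are illustrative names and do not come from the original post.

```sql
-- Collection functions on the studentInfo table defined above.
select id,
       size(favorite)                   as favorite_cnt,  -- number of array elements
       map_keys(info)                   as info_keys,     -- array of the map's keys
       map_values(info)                 as info_values,   -- array of the map's values
       array_contains(favorite, 'sing') as likes_sing     -- true if 'sing' is in the array
from studentInfo;

-- One output row per array element (fav and hobby are assumed alias names).
select id, hobby
from studentInfo
lateral view explode(favorite) fav as hobby;
```

The LATERAL VIEW form is the closest Hive counterpart to the join-style collection queries that Impala uses in section 3.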

3. Using complex types in Impala

Note: in this environment, Impala can query complex data types only in tables stored as Parquet.

3.1 Creating the table in Hive (Parquet format) and loading the data

create table student_parquet(
    id int,
    info map<string,string>  comment 'map<name,gender>',
    favorite array<string> comment 'array[football,basketball]'
)
stored as parquet;

insert overwrite table student_parquet select id,info,favorite from studentInfo;

3.2 Refreshing the Impala metadata

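-- If Impala does not yet know a table that was just created through Hive,
-- run this first: invalidate metadata default.student_parquet;
refresh default.student_parquet;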

3.3 Running queries

select 
    id ,favorite_array.item,info_map.key,info_map.value
from student_parquet,
    student_parquet.info as info_map,
    student_parquet.favorite as favorite_array;

+----+------------+----------+--------+
| id | item       | key      | value  |
+----+------------+----------+--------+
| 1  | football   | zhangsan | man    |
| 1  | basketball | zhangsan | man    |
| 2  | sing       | lisi     | female |
| 2  | dance      | lisi     | female |
+----+------------+----------+--------+
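Each collection reference in the FROM clause is joined only against the collections of the current row, so the result above is the per-row product of map entries and array elements (1 × 2 rows for each student here). A minimal sketch, assuming the same student_parquet table, that references only the array and uses table aliases (t and f are illustrative names):

```sql
-- Join each row only with its own favorite array; t and f are assumed aliases.
select t.id, f.pos, f.item
from student_parquet t, t.favorite f
order by t.id, f.pos;

-- Count array elements per row by aggregating over the same join.
select t.id, count(*) as favorite_cnt
from student_parquet t, t.favorite f
group by t.id;
```

Leaving the map out of the FROM clause avoids multiplying the rows when only the array values are needed.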




select 
    id ,favorite_array.item
from student_parquet,
    student_parquet.info as info_map,
    student_parquet.favorite as favorite_array
where favorite_array.POS = 0;

+----+----------+
| id | item     |
+----+----------+
| 1  | football |
| 2  | sing     |
+----+----------+




select 
    id ,favorite_array.item,info_map.value
from student_parquet,
    student_parquet.info as info_map,
    student_parquet.favorite as favorite_array
where favorite_array.item = 'sing'
and info_map.key = 'lisi';

+----+------+--------+
| id | item | value  |
+----+------+--------+
| 2  | sing | female |
+----+------+--------+

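Summary:

In Impala, an array column is treated as a table whose value column is named item (the element position is available as pos).
A map column is treated as a table with two columns: key and value.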
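These field names can be checked on the table itself, because Impala's DESCRIBE accepts a path to a complex column. A short sketch, assuming the student_parquet table from section 3.1:

```sql
-- Top-level schema: info is map<string,string>, favorite is array<string>.
describe student_parquet;

-- Describe the collections themselves to see their queryable fields.
describe student_parquet.favorite;  -- array: item and pos
describe student_parquet.info;      -- map: key and value
```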

Posted 2020-02-12 01:01 by 大数据小码农