hive行转列

一、问题

hive如何将

a       1,2,3
b       4,7
c       5

转化成为:

a       1
a       2
a       3
b       4
b       7
c       5

二、原始数据

cat row_column.txt
a       1,2,3
b       4,7
c       5

三、解决方案

3.1 遍历每一列

3.1.1 创建表

-- 创建表
create table tmp.row_column
(
col1 string,
col3 string
)
row format delimited fields terminated by '\t'
stored as textfile;
-- 载入数据
load data local inpath '/tmp/row_column.txt' into table row_column;

3.1.2 查看数据:

hive> select * from row_column;                                          
OK
a       1,2,3
b       4,7
c       5

3.1.3 遍历每一列

select col1,name 
from tmp.row_column
lateral view explode(split(col3,',')) col3 as name;
---------------------------------------------------------------
Total MapReduce CPU Time Spent: 2 seconds 20 msec
OK
a       1
a       2
a       3
b       4
b       7
c       5

3.2 数组遍历

3.2.1 创建表

create table tmp.row_column_array
(
  col1 string,
  col3 array<int>
)
row format delimited 
fields terminated by '\t'
collection items terminated by ','
stored as textfile;

3.2.2 加载数据

load data local inpath '/tmp/row_column.txt' into table tmp.row_column_array;

3.2.3 查看数据

hive> select * from tmp.row_column_array;
OK
a       [1,2,3]
b       [4,7]
c       [5]

3.2.4 查看每一列

select col1,name
from tmp.row_column_array
lateral view explode(col3) col3 as name;

3.2.5 结果

a       1
a       2
a       3
b       4
b       7
c       5

四、补充

查看使用逗号分割的列

select t.list[0],t.list[1],t.list[2] from (
select (split(col3,',')) list from tmp.row_column)t;
Total MapReduce CPU Time Spent: 1 seconds 740 msec
OK
1       2       3
4       7       NULL
5       NULL    NULL
Time taken: 15.264 seconds, Fetched: 3 row(s)

查看长度

select col1, size(split(col3,',')) list from tmp.row_column;
Total MapReduce CPU Time Spent: 1 seconds 690 msec
OK
a       3
b       2
c       1

 

posted @ 2015-03-28 15:29  李秋  阅读(2901)  评论(0编辑  收藏  举报