Hive:建表、插入数据、插入复合类型
新建hive表:
CREATE TABLE `test`(
`a` timestamp,
`b` struct<t:timestamp>)
--下面可选
[row format delimited fields terminated by '\t']
[STORED AS Parquet]
查看建好的表的结构:
hive> show create table test;
OK
CREATE TABLE `test`(
`a` timestamp,
`b` struct<t:timestamp>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ambari.master.com:8020/apps/hive/warehouse/test'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='2',
'numRows'='2',
'rawDataSize'='78',
'totalSize'='80',
'transient_lastDdlTime'='1532416983')
Time taken: 0.189 seconds, Fetched: 18 row(s)
插入数据:
hive 不支持直接用insert插入复合类型(如test表中struct类型列),可以用以下方式间接插入
insert into test
select unix_timestamp('1970-01-01 08:00:00'),
named_struct('t', cast(unix_timestamp('1970-01-01 08:00:00') as timestamp))
-- named_struct是生成struct类型的函数
from tmp_table limit 1;
补充:
Hive表数据文件如果是parquet类型,struct复合类型里的类型为timestamp的列在存取时候,时区换算会出错,所以可以用varchar类型代替timestamp。