Hive Daily Operations Notes

1. Difference between insert into and insert overwrite

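Both statements insert data into a Hive table. insert into appends rows to the data already in the table, whereas insert overwrite rewrites the data: the existing data is deleted first and the new data is then written. Note that for a partitioned table, insert overwrite only rewrites the partition being written to; the data in other partitions is left unchanged.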
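As a minimal sketch of the difference, assuming a hypothetical partitioned table demo_log and a hypothetical source table staging_log with columns id and msg (none of these names come from the notes themselves):

```sql
-- Hypothetical tables, for illustration only.
create table if not exists demo_log (id string, msg string)
partitioned by (dt string);

-- insert into: appends to partition dt='2020-08-29';
-- rows already in that partition are kept.
insert into table demo_log partition (dt='2020-08-29')
select id, msg from staging_log;

-- insert overwrite: replaces the contents of partition dt='2020-08-29' only;
-- other partitions of demo_log are not touched.
insert overwrite table demo_log partition (dt='2020-08-29')
select id, msg from staging_log;
```

Running the insert into statement twice would leave two copies of the rows in the partition, while running the insert overwrite statement twice leaves one.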

2. Adding a column to a table and re-running the data

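The first approach that comes to mind is to add the column to the table (replace columns can be used as well).

1) alter table table_name add columns (new_col string);

Then re-run the data:

2) insert overwrite table table_name partition(dt='2020-08-29') select ...

The result is that the newly added column new_col comes back as NULL when the re-run partition is queried.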
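One way to check whether an existing partition is affected is to compare the table-level schema with the partition-level schema. This is only a sketch; table_name and the dt value are the placeholders used in these notes:

```sql
-- Table-level schema: new_col should be listed here after the add columns step.
describe table_name;

-- Partition-level schema of the pre-existing partition: if new_col is missing
-- here, queries on this partition will not return the re-run values for it.
describe table_name partition (dt='2020-08-29');
```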

Solutions:

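1) As of Hive 1.1.0, table metadata and partition metadata are handled separately. Adding cascade to an add columns or replace columns statement updates the table and all of its partitions at the same time. If cascade was not specified when the column was added, the existing partitions keep their old metadata: when we re-run the data, the files on HDFS are updated, but queries still read the partition through the old metadata.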

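2) Newly created partitions are not affected; Hive automatically maintains the metadata of new partitions.

3) Drop the old partition first, then run insert overwrite.

4) Add the column to the partition itself with the following command: alter table table_name partition(dt='2020-08-29') add columns (col_name col_type);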
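The solutions above can be sketched as follows. They are alternatives, so pick one rather than running them in sequence. source_table and col_a are hypothetical names used only for illustration, and note that dropping a partition of a managed table also deletes its data:

```sql
-- Solution 1: add the column with cascade so that the table metadata and the
-- metadata of all existing partitions are updated together (Hive 1.1.0+).
alter table table_name add columns (new_col string) cascade;

-- Solution 3: if the column was already added without cascade, drop the old
-- partition and rebuild it, so it is recreated as a new partition.
alter table table_name drop if exists partition (dt='2020-08-29');
insert overwrite table table_name partition (dt='2020-08-29')
select col_a, new_col from source_table;  -- hypothetical source and columns

-- Solution 4: patch only the affected partition's metadata.
alter table table_name partition (dt='2020-08-29') add columns (new_col string);
```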

3. Common operations

select * from table_name where dt='0000-01-09' and type='one';

show create table table_name;

insert into table table_name partition(dt='0000-01-09', type='one') values('',null,null,'');

alter table table_name drop partition(dt='0000-01-09', type='one');

create table if not exists table_name
(
    ut        string    comment '',
    ype       string    comment '',
    sordid    string    comment '',
    sid       string    comment ''
)
partitioned by (
    dt      string,
    type    string
);

posted @ 2020-09-10 20:43 xuzhujack
