Creating tables in Hive and operating on table data

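I. Hive has two kinds of tables

　　1. Internal tables (managed tables):

　　　　　　Dropping the table also deletes its data on HDFS.

　　2. External tables

　　　　　　Dropping the table does not delete its data on HDFS.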

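　　　　　　External tables are normally populated with files supplied from outside Hive rather than through insert (insert statements are still accepted on an external table). Because the data comes from elsewhere, Hive does not consider itself the sole owner of it, so dropping the table removes only the metadata and leaves the data files in place.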

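II. Operations on Hive tables and their data

　　1. insert into: inserting data this way is strongly discouraged in general, because it produces small files on HDFS, which burdens HDFS metadata management.

　　2. If no delimiter is specified when the table is created, Hive defaults to \001. This is an ASCII code value, a non-printing character.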
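To make the default delimiter concrete, here is a minimal sketch. The table names stu_default and stu_explicit are hypothetical, and it assumes the default file format has not been changed from textfile; under that assumption the two statements should produce tables that read and write files the same way.

```sql
-- Hypothetical table names, for illustration only.
-- No row format clause: Hive falls back to \001 as the field delimiter.
create table if not exists stu_default(id int, name string);

-- The same layout with the default delimiter written out explicitly.
create table if not exists stu_explicit(id int, name string)
row format delimited fields terminated by '\001'
stored as textfile;
```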

　　3. Specifying the delimiter when creating a table

　　　　Create an internal table

　　　　create table if not exists stu2(id int,name string) row format delimited fields terminated by '\t' stored as textfile location '/user/hive/warehouse/myhive/stu2';

　　　　Create an external table

　　　　create external table if not exists student(s_id string,s_name string) row format delimited fields terminated by '\t' stored as textfile location '/user/hive/warehouse/myhive/student';

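　　4. Create a table from a query result, storing the result rows in the new table (this approach is used quite often):

　　　　　　create table stu3 as select * from stu2;

　　　　　　Create a table from the structure of an existing table; this copies only the table structure, not the data:

　　　　　　create table stu4 like stu2;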

　　5. Check the type of a table:

　　　　desc formatted stu2;
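As a rough sketch of how the table type shows up and how it can be switched: the row to look for in the output is "Table Type:", which reads MANAGED_TABLE for an internal table and EXTERNAL_TABLE for an external one. The alter statements below reuse the stu2 table from above; writing the property key and value in uppercase is the safe choice, since some Hive versions treat them as case-sensitive.

```sql
-- Inspect the table; check the "Table Type:" row in the output.
desc formatted stu2;

-- Turn the internal table into an external one.
alter table stu2 set tblproperties('EXTERNAL'='TRUE');

-- Turn it back into an internal (managed) table.
alter table stu2 set tblproperties('EXTERNAL'='FALSE');
```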

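　　6. How do we load data into an external table?

　　　1. Load data into a table from the local file system

　　　　　load data local inpath '/export/servers/hivedatas/student.csv' into table student;

　　　　　Load data and overwrite the existing data

　　　　　load data local inpath '/export/servers/hivedatas/student.csv' overwrite into table student;

　　　2. Load data into a table from HDFS (the data must be uploaded to HDFS beforehand; the load is really just a file move)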

　　　　load data inpath '/hivedatas/techer.csv' into table techer;
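One way to see that loading from HDFS is a move rather than a copy is to list the source directory before and after the load. This is only a sketch: it assumes the techer table already exists, that the file was uploaded to /hivedatas beforehand, and that the statements are run from the Hive CLI, which accepts dfs commands.

```sql
-- The file is present in the source directory before the load.
dfs -ls /hivedatas/;

load data inpath '/hivedatas/techer.csv' into table techer;

-- After the load the file should no longer be listed here,
-- because it has been moved into the table's directory.
dfs -ls /hivedatas/;
```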

　　7. Partitioned tables

　　　　Partitioning is generally combined with internal or external tables, giving, for example, internal partitioned tables and external partitioned tables.

　　　　Syntax for creating a partitioned table:

　　　　create table score(s_id string,c_id string, s_score int) partitioned by (month string) row format delimited fields terminated by '\t' stored as textfile;

　　　　create table score2(s_id string,c_id string, s_score int) partitioned by (year string,month string,day string) row format delimited fields terminated by '\t' stored as textfile;

　　8. Loading data into a partitioned table:

　　　Load data into a table with a single partition column

　　　load data local inpath '/export/servers/hivedatas/score.csv' into table score partition(month='201806');

　　　Load data into a table with multiple partition columns:

　　　load data local inpath '/export/servers/hivedatas/score.csv' into table score2 partition(year='2018',month='06',day='18');
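Once data sits in a partition, the partition columns can be used in a where clause like ordinary columns, and Hive then only needs to read the matching partition directories. A small sketch against the two tables loaded above:

```sql
-- Reads only the month=201806 partition of score.
select * from score where month = '201806';

-- Every partition column can be filtered on; here all three levels are fixed.
select * from score2 where year = '2018' and month = '06' and day = '18';
```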

　　9. View partitions

　　show partitions score2;

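　　10. Adding and dropping partitions

　　Add a partition

　　alter table score add partition(month='201805');

　　Add a partition to a table with multiple partition columns (every partition column must be given a value)

　　alter table score2 add partition(year='2018',month='09',day='10');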
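Several partitions can also be added in a single statement by repeating the partition clause. The sketch below uses the score table; the month values are arbitrary examples, and if not exists is added so the statement can be rerun safely.

```sql
-- Add two partitions to score in one statement.
alter table score add if not exists
  partition(month='201804')
  partition(month='201803');

-- Confirm that both partitions are now registered.
show partitions score;
```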

　　Drop a partition

　　alter table score drop partition(month='201809');

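　　11. Repairing a table (this must be done manually)

　　　　Repairing a table simply means rebuilding the mapping between the table's metadata and the data files on HDFS.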

　　　　msck repair table score4;

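　　12. Bucketed tables

　　　　Bucketing splits the data into multiple buckets by a specified column; in other words, rows are divided by that column's value and written to multiple files.

　　　　Enable Hive's bucketing feature, which is off (false) by default

　　　　set hive.enforce.bucketing=true;

　　　　Set the number of reducers; the default is -1

　　　　set mapreduce.job.reduces=3;

　　　　How do we create a bucketed table?

　　　　create table course(c_id string,c_name string,t_id string) clustered by (c_id) into 3 buckets row format delimited fields terminated by '\t' stored as textfile;

　　　　Loading data into a bucketed table: neither hdfs dfs -put nor load data works here, because neither splits the rows into buckets; the data has to be written with insert overwrite, for example: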

　　　　insert overwrite table course select * from course_common cluster by(c_id);
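After the insert, one way to check the result is to sample a single bucket with tablesample. This is a sketch that assumes the course table was populated as above with 3 buckets on c_id.

```sql
-- Read only the first of the 3 buckets, bucketing on c_id.
select * from course tablesample(bucket 1 out of 3 on c_id);
```

Running the same query with bucket 2 and bucket 3 should return the remaining rows, with no row appearing in more than one bucket.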

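　　13. Altering tables in Hive

　　　　　　1. Rename　　alter table old_table_name rename to new_table_name;

　　　　　　2. Add or modify column information

　　　　　　　　(1) Show the table structure

　　　　　　　　　　desc score5;

　　　　　　　　(2) Add columns

　　　　　　　　　　alter table score5 add columns (mycol string, mysco string);

　　　　　　　　(3) Show the table structure again

　　　　　　　　　　desc score5;

　　　　　　　　(4) Change a column (rename mysco to mysconew and change its type to int)

　　　　　　　　　　alter table score5 change column mysco mysconew int;

　　　　　　　　(5) Show the table structure again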

　　　　　　　　　　desc score5;

　　14. Multi-insert mode in Hive, commonly used in production (example)

　　　　It is commonly used in real production environments to split one table into two or more parts.

　　　　First load data into the score table

　　　　load data local inpath '/export/servers/hivedatas/score.csv' overwrite into table score partition(month='201806');

　　　　Create the first part table:

　　　　create table score_first( s_id string,c_id string) partitioned by (month string) row format delimited fields terminated by '\t' ;

　　　　Create the second part table:

　　　　create table score_second(c_id string,s_score int) partitioned by (month string) row format delimited fields terminated by '\t';

　　　　Load data into the first and second part tables in a single statement

　　　　from score

　　　　 insert overwrite table score_first partition(month='201806') select s_id,c_id

　　　　 insert overwrite table score_second partition(month = '201806') select c_id,s_score;

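　　15. A small difference between external tables and external partitioned tables:

　　　When an external table is created, location can point it at a chosen directory; once the data files are placed there, the table can read them without any further step.

　　　For an external partitioned table, the data must be placed under the corresponding partition path, and the repair command must also be run: msck repair table xxxtb_name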
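The workflow for an external partitioned table might look like the sketch below. The table name score_ext and the directory /scoredatas are hypothetical, chosen only for illustration, and the dfs commands assume the Hive CLI, where -put reads from the local machine.

```sql
-- Hypothetical external partitioned table rooted at /scoredatas.
create external table if not exists score_ext(s_id string, c_id string, s_score int)
partitioned by (month string)
row format delimited fields terminated by '\t'
location '/scoredatas';

-- Put the file under a directory named after the partition (column=value).
dfs -mkdir -p /scoredatas/month=201806;
dfs -put /export/servers/hivedatas/score.csv /scoredatas/month=201806/;

-- Register the partition directories that Hive does not know about yet.
msck repair table score_ext;
show partitions score_ext;
```

Instead of msck repair, a single partition can also be registered explicitly with alter table score_ext add if not exists partition(month='201806') location '/scoredatas/month=201806';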

posted on 2020-03-31 18:02 by $王大少
