Hive Basics

1. Force-dropping a database:

Normally a database that still contains tables cannot be dropped. To force the drop, add cascade. For example, to force-drop the database test:
drop database test cascade;
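The rule above (a non-empty database can only be dropped with cascade) can be modeled in a few lines of Python. This is a hypothetical in-memory sketch, not a Hive or metastore API: `catalog`, `drop_database` and `DatabaseNotEmptyError` are made-up names, and the error text is not Hive's. Without either keyword, Hive behaves as RESTRICT.

```python
class DatabaseNotEmptyError(Exception):
    """Hypothetical error type for this sketch (not Hive's own message)."""


def drop_database(catalog: dict, db: str, cascade: bool = False) -> None:
    """Model DROP DATABASE: RESTRICT (the default) refuses a database that
    still has tables; CASCADE drops the tables first, then the database."""
    tables = catalog[db]
    if tables and not cascade:
        raise DatabaseNotEmptyError(f"database {db} still has tables: {sorted(tables)}")
    tables.clear()   # CASCADE: drop every table in the database
    del catalog[db]  # then drop the database itself


# Hypothetical catalog: database name -> {table name -> rows}
catalog = {"test": {"t1": [], "t2": []}}
try:
    drop_database(catalog, "test")             # like: drop database test;
except DatabaseNotEmptyError as err:
    print("refused:", err)
drop_database(catalog, "test", cascade=True)   # like: drop database test cascade;
print(catalog)                                 # {}
```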

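2. Specifying a field delimiter when creating a table. Without a delimiter, the table is stored as TextFile with Hive's default field delimiter \001 (Ctrl-A), a non-printing character, so the fields appear run together when the file is printed: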

cmissh@hn0-stg02:~$ hdfs dfs -cat wasb://system@cmidapsystem01.blob.core.chinacloudapi.cn/hive/warehouse/rx801.db/test/000000_0
zhangsan10

To use a specific delimiter, such as a comma, add row format delimited fields terminated by ',' when creating the table.

Example: create the table test:
create table test (name string,age int,address string) row format delimited fields terminated by ',';
insert into test values ('zhangsan',25,'beijing');
The output of show create table test; includes the table's location, wasb://system@cmidapsystem01.blob.core.chinacloudapi.cn/hive/warehouse/rx801.db/test
Inserted rows are stored as files in Blob storage (the wasb:// scheme).

List the files:
hdfs dfs -ls wasb://system@cmidapsystem01.blob.core.chinacloudapi.cn/hive/warehouse/rx801.db/test
The output is: wasb://system@cmidapsystem01.blob.core.chinacloudapi.cn/hive/warehouse/rx801.db/test/000000_0
Printing that file shows that the fields are now separated by commas:
cmissh@hn0-stg02:~$ hdfs dfs -cat wasb://system@cmidapsystem01.blob.core.chinacloudapi.cn/hive/warehouse/rx801.db/test/000000_0
zhangsan,25,beijing
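The difference between the two files comes down to the character written between fields. The Python sketch below mimics how one row is laid out in a delimited text file; `serialize_row` and `deserialize_row` are hypothetical helpers, not Hive functions. It assumes Hive's TextFile defaults: fields separated by `\001` (Ctrl-A), which most terminals do not display, and NULL written as `\N`.

```python
DEFAULT_FIELD_DELIM = "\x01"  # Hive's default field delimiter (Ctrl-A, octal \001)
NULL_TEXT = "\\N"             # default text representation of NULL


def serialize_row(values, delim=DEFAULT_FIELD_DELIM):
    """Hypothetical helper: join one row's fields into a single text line."""
    return delim.join(NULL_TEXT if v is None else str(v) for v in values)


def deserialize_row(line, delim=DEFAULT_FIELD_DELIM):
    """Hypothetical helper: split a line back into its raw string fields."""
    return line.split(delim)


row = ("zhangsan", 25, "beijing")
print(repr(serialize_row(row)))  # 'zhangsan\x0125\x01beijing'
print(serialize_row(row, ","))   # zhangsan,25,beijing
```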

3. Adding columns to a table

To add a column: alter table table_name add columns(new_column column_type);

Example: add a string column tel to test.
hive> alter table test add columns(tel string);
OK
Time taken: 0.312 seconds
hive> desc test;
OK
name                    string
age                     int
address                 string
tel                     string
Time taken: 0.185 seconds, Fetched: 4 row(s)
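add columns changes only the table metadata; the existing data files are not rewritten. When Hive reads an old row against the new schema, the missing trailing field comes back as NULL. A minimal Python sketch of this schema-on-read behavior, assuming comma-delimited lines; `read_row` is a hypothetical helper and type conversion is left out:

```python
def read_row(line, columns, delim=","):
    """Hypothetical helper: match fields to columns by position;
    missing trailing fields become None (NULL)."""
    fields = line.split(delim)
    return {col: fields[i] if i < len(fields) else None
            for i, col in enumerate(columns)}


columns = ["name", "age", "address"]
old_line = "zhangsan,25,beijing"   # written before the column was added

columns.append("tel")              # alter table test add columns(tel string);
print(read_row(old_line, columns))
# {'name': 'zhangsan', 'age': '25', 'address': 'beijing', 'tel': None}
```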

4. Renaming a column or changing its type

To rename a column or change its type: alter table table_name change column old_name new_name data_type;

Example: change the type of tel in test to int
hive> alter table test change column tel tel int;
OK
Time taken: 0.467 seconds
hive> desc test;
OK
name                    string
age                     int
address                 string
tel                     int
Time taken: 0.23 seconds, Fetched: 4 row(s)
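change column also touches only metadata, so text already stored in the column, such as a phone number, is reinterpreted as INT at read time. As far as we know, the default text SerDe returns NULL for a value that is not a valid 32-bit integer instead of failing the query. The sketch below models that rule in Python; `read_int_field` is a hypothetical helper and it assumes plain decimal digits with an optional sign are the only valid input.

```python
import re

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # Hive INT is a 32-bit signed integer


def read_int_field(text):
    """Hypothetical helper: interpret a raw text field as INT;
    invalid or out-of-range values become None (NULL)."""
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return None
    value = int(text)
    return value if INT_MIN <= value <= INT_MAX else None


for raw in ["42", "010-1234", "13800000000"]:
    print(raw, "->", read_int_field(raw))
# 42 -> 42
# 010-1234 -> None
# 13800000000 -> None
```

An 11-digit mobile number does not fit in INT, so a phone-number column is usually better declared as string (or bigint).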

5. Dropping columns

Hive does not support the syntax alter table table_name drop columns; use replace columns instead.

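Syntax: alter table table_name replace columns(col1 type, col2 type, col3 type, ...);
replace columns keeps exactly the columns listed in the parentheses; any column not listed is removed from the table definition.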

Example: drop the tel column from test (list the columns to keep in the parentheses).
hive> alter table test replace columns(name string,age int,address string);
OK
Time taken: 0.303 seconds
hive> desc test;
OK
name                    string
age                     int
address                 string
Time taken: 0.179 seconds, Fetched: 3 row(s)
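replace columns, like the other column commands of alter table, rewrites only the metadata: the dropped column's values remain in the data files, and fields are still matched to columns by position. Dropping the last column is therefore harmless, but dropping a column in the middle shifts every later column onto the wrong values. The Python sketch below illustrates this with a hypothetical `read_row` helper and an assumed comma-delimited line:

```python
def read_row(line, columns, delim=","):
    """Hypothetical helper: match fields to columns by position; extra fields
    are ignored, missing ones become None (NULL)."""
    fields = line.split(delim)
    return {col: fields[i] if i < len(fields) else None
            for i, col in enumerate(columns)}


line = "zhangsan,25,beijing,13800000000"  # written while tel still existed

# replace columns(name string, age int, address string): drops the last column
print(read_row(line, ["name", "age", "address"]))
# {'name': 'zhangsan', 'age': '25', 'address': 'beijing'}

# replace columns(name string, address string): drops a middle column
print(read_row(line, ["name", "address"]))
# {'name': 'zhangsan', 'address': '25'}
```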

6. Renaming a table

Syntax: alter table table_name rename to new_table_name;

Example: rename test to new_test
hive> alter table test rename to new_test;
OK
Time taken: 0.43 seconds
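Our understanding is that for a managed table created under its database directory (as test is in section 2), rename to also moves the table's directory, while an external table keeps its location. The path arithmetic is sketched below in Python; `renamed_location` is a hypothetical helper and the `managed` flag stands in for the table type.

```python
import posixpath


def renamed_location(location, new_name, managed=True):
    """Hypothetical helper: where the table's data lives after RENAME TO."""
    if not managed:
        return location  # external table: data stays where it is
    parent = posixpath.dirname(location.rstrip("/"))  # the database directory
    return posixpath.join(parent, new_name)


loc = "wasb://system@cmidapsystem01.blob.core.chinacloudapi.cn/hive/warehouse/rx801.db/test"
print(renamed_location(loc, "new_test"))
# wasb://system@cmidapsystem01.blob.core.chinacloudapi.cn/hive/warehouse/rx801.db/new_test
```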

7. Moving a column after another column

To change a column's position: alter table table_name change column old_name new_name data_type after column_name;

Example: move the tel column in test after name.
hive> alter table test change column tel tel int after name;
OK
Time taken: 0.467 seconds
hive> desc test;
OK
name                    string
tel                     int
age                     int
address                 string
Time taken: 0.255 seconds, Fetched: 4 row(s)
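According to the Hive documentation, change column (including after) modifies only the metadata and does not reorder data inside the files; the user must make sure the file layout still matches the new column order. The Python sketch below, using a hypothetical `move_after` helper and an assumed comma-delimited row, shows what an old row looks like after tel is moved:

```python
def move_after(columns, col, after):
    """Hypothetical helper: reorder the column list as CHANGE COLUMN ... AFTER
    does (metadata only, the data is not touched)."""
    cols = [c for c in columns if c != col]
    cols.insert(cols.index(after) + 1, col)
    return cols


columns = ["name", "age", "address", "tel"]
line = "zhangsan,25,beijing,13800000000"  # written under the old column order

new_columns = move_after(columns, "tel", "name")
print(new_columns)  # ['name', 'tel', 'age', 'address']
print(dict(zip(new_columns, line.split(","))))
# {'name': 'zhangsan', 'tel': '25', 'age': 'beijing', 'address': '13800000000'}
```

The move is safe on an empty table, or when the data is reloaded in the new column order.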

8. Delete operations:

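truncate works only on managed tables; it cannot delete the data of an external table.
Drop a table:
  drop table table_name;
Delete all data in a table:
  truncate table table_name;
Drop a specific partition:
  alter table table_name drop partition(partition_column='value');
Delete part of the data in a partition:
  insert overwrite table table_name partition(partition_column='value') select col1, col2, ... from table_name where partition_column='value' and condition;
  The where condition selects the rows to keep. With a static partition spec, list only the non-partition columns in the select list: select * would also return the partition column, and the column count would not match the target.
Delete data from a non-partitioned table:
  insert overwrite table table_name select * from table_name where condition;
  The where condition selects the rows to keep.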

Example: keep only zhangsan's row in test.
hive> select * from test;
OK
zhangsan        10      beijing
lisi    20      shanghai
wangwu  30      guangzhou
Time taken: 0.257 seconds, Fetched: 3 row(s)
hive> insert overwrite table test select * from test where name='zhangsan';
hive> select * from test;
OK
zhangsan        10      beijing
Time taken: 0.252 seconds, Fetched: 1 row(s)
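The delete pattern above is a filter: the query selects the rows to keep, and insert overwrite replaces the table with that result. A minimal Python sketch of the idea, where `overwrite_keeping` is a hypothetical helper and rows are tuples:

```python
def overwrite_keeping(rows, keep):
    """Hypothetical helper emulating INSERT OVERWRITE t SELECT * FROM t WHERE <keep>:
    the query is evaluated first, then its result replaces the old contents."""
    result = [r for r in rows if keep(r)]  # the SELECT ... WHERE part
    return result                          # becomes the table's new contents


test = [("zhangsan", 10, "beijing"),
        ("lisi", 20, "shanghai"),
        ("wangwu", 30, "guangzhou")]
test = overwrite_keeping(test, lambda r: r[0] == "zhangsan")
print(test)  # [('zhangsan', 10, 'beijing')]
```

To delete specific rows instead, negate the condition, e.g. `lambda r: r[0] != "lisi"`.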

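9. Insert operations:

Both insert into and insert overwrite write data into a Hive table. The difference: insert into appends the new rows to the table's existing data, while insert overwrite replaces it, that is, the existing data is deleted and then the new data is written. If the table is partitioned, insert overwrite rewrites only the target partition and leaves the other partitions untouched.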

hive> select * from test;
OK
zhangsan        10      beijing
Time taken: 0.25 seconds, Fetched: 1 row(s)

Use insert into to add two rows to test:
hive> insert into table test values ('lisi',20,'shanghai'),('wangwu',30,'guangzhou');
hive> select * from test;
OK
zhangsan        10      beijing
lisi    20      shanghai
wangwu  30      guangzhou
Time taken: 0.24 seconds, Fetched: 3 row(s)

Use insert overwrite to rewrite test:
hive> insert overwrite table test values ('cummins',100,'chaoyangqu');
hive> select * from test;
OK
cummins 100     chaoyangqu
Time taken: 0.266 seconds, Fetched: 1 row(s)
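The difference between the two statements, including the partition rule, can be modeled with a dictionary that maps each partition to its rows. This is a hypothetical Python sketch (`insert_into`, `insert_overwrite` and `testp_model` are made-up names); a non-partitioned table is treated as a single partition keyed by `None`.

```python
from collections import defaultdict


def insert_into(table, rows, partition=None):
    """Hypothetical helper, INSERT INTO: append to the target partition."""
    table[partition].extend(rows)


def insert_overwrite(table, rows, partition=None):
    """Hypothetical helper, INSERT OVERWRITE: replace the target partition only."""
    table[partition] = list(rows)


# Non-partitioned table, as in the example above
test = defaultdict(list)
insert_into(test, [("zhangsan", 10, "beijing")])
insert_into(test, [("lisi", 20, "shanghai"), ("wangwu", 30, "guangzhou")])
print(len(test[None]))  # 3
insert_overwrite(test, [("cummins", 100, "chaoyangqu")])
print(test[None])       # [('cummins', 100, 'chaoyangqu')]

# Partitioned table: overwriting country=usa leaves country=china alone
testp_model = defaultdict(list)
insert_into(testp_model, [(1, "aaa")], partition="country=china")
insert_into(testp_model, [(2, "bbb")], partition="country=usa")
insert_overwrite(testp_model, [(3, "ccc")], partition="country=usa")
print(dict(testp_model))  # {'country=china': [(1, 'aaa')], 'country=usa': [(3, 'ccc')]}
```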

10. Creating a table with the same structure as an existing table

Syntax: create table new_table like table_name;

Example: create a table new_test with the same structure as test (only the definition is copied, not the data).
hive> desc test;
OK
name                    string
age                     int
address                 string
Time taken: 0.235 seconds, Fetched: 3 row(s)

hive> create table new_test like test;
OK
Time taken: 0.482 seconds

hive> desc new_test;
OK
name                    string
age                     int
address                 string
Time taken: 0.192 seconds, Fetched: 3 row(s)
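create table ... like copies the source table's definition (columns, partition columns, row format) but none of its rows. A Python sketch with a hypothetical in-memory catalog; `create_table_like` and the dictionary layout are assumptions for illustration:

```python
import copy


def create_table_like(catalog, new_name, src_name):
    """Hypothetical helper: copy the source table's definition; the new table
    starts with no rows. deepcopy keeps later changes from leaking between tables."""
    definition = copy.deepcopy(catalog[src_name]["definition"])
    catalog[new_name] = {"definition": definition, "rows": []}


catalog = {
    "test": {
        "definition": {"columns": [("name", "string"), ("age", "int"),
                                   ("address", "string")],
                       "field_delim": ","},
        "rows": [("cummins", 100, "chaoyangqu")],
    }
}
create_table_like(catalog, "new_test", "test")
print(catalog["new_test"]["definition"]["columns"])
print(catalog["new_test"]["rows"])  # []
```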

11. Creating a partitioned table

To create a partitioned table, add: partitioned by (partition_column type);

Example: create the table testp, partitioned by country.
hive> create table testp (uid int,uname string,usage int)
    > partitioned by (country string)
    > row format delimited
    > fields terminated by ','
    > ;
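The partition column country is a pseudo-column of testp: its value is not stored in the data files but in the name of the partition's directory.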


The local directory below holds the files to load into testp:
cmissh@hn0-stg02:~/qsw$ pwd
/home/cmissh/qsw
cmissh@hn0-stg02:~/qsw$ ls
test  test1  test2

Load the files into a new partition, country='usa':
hive> load data local inpath '/home/cmissh/qsw' into table testp partition(country='usa');
When a directory is given, every file in it is loaded into the partition.
hive> select * from testp;
OK
1       aaa     18      usa
2       aaa     18      usa
3       aaa     18      usa
4       aaa     18      usa
5       aaa     18      usa
10      bbb     20      usa
20      bbb     20      usa
30      bbb     20      usa
40      bbb     20      usa
50      bbb     20      usa
100     ccc     10      usa
200     ccc     10      usa
300     ccc     10      usa
400     ccc     10      usa
500     ccc     10      usa
Time taken: 0.358 seconds, Fetched: 15 row(s)

Then load the same files into a second partition, country='china':
hive> load data local inpath '/home/cmissh/qsw' into table testp partition(country='china');
hive> select * from testp;
OK
1       aaa     18      china
2       aaa     18      china
3       aaa     18      china
4       aaa     18      china
5       aaa     18      china
10      bbb     20      china
20      bbb     20      china
30      bbb     20      china
40      bbb     20      china
50      bbb     20      china
100     ccc     10      china
200     ccc     10      china
300     ccc     10      china
400     ccc     10      china
500     ccc     10      china
1       aaa     18      usa
2       aaa     18      usa
3       aaa     18      usa
4       aaa     18      usa
5       aaa     18      usa
10      bbb     20      usa
20      bbb     20      usa
30      bbb     20      usa
40      bbb     20      usa
50      bbb     20      usa
100     ccc     10      usa
200     ccc     10      usa
300     ccc     10      usa
400     ccc     10      usa
500     ccc     10      usa
Time taken: 0.346 seconds, Fetched: 30 row(s)

hive> show partitions testp;
OK
country=china
country=usa
Time taken: 0.259 seconds, Fetched: 2 row(s)
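Each partition is a subdirectory of the table directory named `column=value`, and load data copies the files into it; that directory name is where the pseudo-column's value comes from. The Python sketch below rebuilds the file paths produced by the two load data statements and derives the partition list from them. The table location is an assumption (the default warehouse layout, not shown in the output above), and `partition_dir` is a hypothetical helper that ignores the escaping Hive applies to special characters.

```python
import posixpath

# Assumed location of testp (default warehouse layout), for illustration only
TABLE_LOCATION = "/hive/warehouse/rx801.db/testp"


def partition_dir(table_location, spec):
    """Hypothetical helper: directory of one partition, e.g. <table>/country=usa."""
    sub = "/".join(f"{col}={val}" for col, val in spec.items())
    return posixpath.join(table_location, sub)


local_files = ["test", "test1", "test2"]  # files in /home/cmissh/qsw
loaded = [posixpath.join(partition_dir(TABLE_LOCATION, {"country": c}), f)
          for c in ("usa", "china") for f in local_files]

# show partitions, derived from the directory names
prefix = TABLE_LOCATION + "/"
partitions = sorted({posixpath.dirname(p)[len(prefix):] for p in loaded})
print(partitions)  # ['country=china', 'country=usa']
```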

Drop the china partition:
hive> alter table testp drop partition(country='china');
hive> show partitions testp;
OK
country=usa
Time taken: 0.243 seconds, Fetched: 1 row(s)

12. Clearing a partitioned table

alter table table_name drop partition (country='value');

Example: drop the usa partition of testp
hive> select * from testp;
OK
1       aaa     18      usa
2       aaa     18      usa
3       aaa     18      usa
4       aaa     18      usa
5       aaa     18      usa
10      bbb     20      usa
20      bbb     20      usa
30      bbb     20      usa
40      bbb     20      usa
50      bbb     20      usa
100     ccc     10      usa
200     ccc     10      usa
300     ccc     10      usa
400     ccc     10      usa
500     ccc     10      usa
Time taken: 0.458 seconds, Fetched: 15 row(s)

hive> alter table testp drop partition (country='usa');
Dropped the partition country=usa
OK
Time taken: 0.944 seconds

hive> show partitions testp;
OK
Time taken: 0.3 seconds
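For a managed table such as testp, drop partition removes both the partition's metadata and its data; for an external table only the metadata is removed and the files stay in place. Once the last partition is gone, show partitions returns nothing, as above, although the table itself still exists. A hypothetical Python model of that rule (`drop_partition` and the dictionary layout are made up for this sketch):

```python
def drop_partition(table, spec):
    """Hypothetical helper: remove a partition from the metadata and return the
    data files deleted as a result (none for an external table)."""
    files = table["partitions"].pop(spec)
    return files if table["managed"] else []


testp = {"managed": True,
         "partitions": {"country=usa": ["test", "test1", "test2"]}}
deleted = drop_partition(testp, "country=usa")
print(deleted)                      # ['test', 'test1', 'test2']
print(sorted(testp["partitions"]))  # [] (show partitions is empty)
```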

13. Querying partition data

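Although a partition column is a pseudo-column that does not exist in the data files, it can be filtered on in a where clause like any other column.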

Example: query the rows of the china partition of testp.
hive> show partitions testp;
OK
country=china
country=india
country=usa
Time taken: 0.199 seconds, Fetched: 3 row(s)

hive> select * from testp where country='china';
OK
1       aaa     18      china
2       aaa     18      china
3       aaa     18      china
4       aaa     18      china
5       aaa     18      china
1001    abc     10      china
1002    abc     10      china
1003    abc     10      china
1004    abc     10      china
10      bbb     20      china
20      bbb     20      china
30      bbb     20      china
40      bbb     20      china
50      bbb     20      china
100     ccc     10      china
200     ccc     10      china
300     ccc     10      china
400     ccc     10      china
500     ccc     10      china
Time taken: 0.525 seconds, Fetched: 19 row(s)
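Because the partition value is part of the directory name, a filter on the partition column lets Hive skip the other partitions' directories entirely (partition pruning): the query above only needs country=china, not india or usa. A Python sketch of that selection step for a single partition column; `prune` is a hypothetical helper and the directory list mirrors the show partitions output above.

```python
def prune(partition_dirs, column, value):
    """Hypothetical helper: keep only the partition directories that can
    satisfy column = value (single-level partitions only)."""
    wanted = f"{column}={value}"
    return [d for d in partition_dirs if d.rsplit("/", 1)[-1] == wanted]


dirs = ["testp/country=china", "testp/country=india", "testp/country=usa"]
print(prune(dirs, "country", "china"))  # ['testp/country=china']
```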

14. Hive bucketing

posted @ 2020-09-14 11:42 Lillard-Time
