HIVE分区

HIVE分区

简单分区

hive> CREATE TABLE psn_2(
    > id int,
    > name string,
    > likes array<string>,
    > address map<string,string>
    > )
    > PARTITIONED BY (age int)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > COLLECTION ITEMS TERMINATED BY '-'
    > MAP KEYS TERMINATED BY ':';
OK
Time taken: 20.107 seconds
hive> desc formtted psn_2;
FAILED: SemanticException [Error 10001]: Table not found formtted
hive> desc formatted psn_2;
OK
# col_name            	data_type           	comment             
	 	 
id                  	int                 	                    
name                	string              	                    
likes               	array<string>       	                    
address             	map<string,string>  	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
age                 	int                 	                    
	 	 
# Detailed Table Information	 	 
Database:           	default             	 
Owner:              	root                	 
CreateTime:         	Sat Apr 25 23:45:22 CST 2020	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://mycluster/user/hive/warehouse/psn_2	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	transient_lastDdlTime	1587829522          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	colelction.delim    	-                   
	field.delim         	,                   
	mapkey.delim        	:                   
	serialization.format	,                   
Time taken: 0.682 seconds, Fetched: 37 row(s)
hive> LOAD DATA LOCAL INPATH '/root/data/data' INTO TABLE psn_2 PARTITION(age=10);
Loading data to table default.psn_2 partition (age=10)
Partition default.psn_2{age=10} stats: [numFiles=1, numRows=0, totalSize=419, rawDataSize=0]
OK
Time taken: 16.219 seconds
hive> LOAD DATA LOCAL INPATH '/root/data/data' INTO TABLE psn_2 PARTITION(age=20);
Loading data to table default.psn_2 partition (age=20)
Partition default.psn_2{age=20} stats: [numFiles=1, numRows=0, totalSize=419, rawDataSize=0]
OK
Time taken: 1.096 seconds
hive> select * from psn_2;
OK
1	小明1	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
2	小明2	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
3	小明3	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
4	小明4	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
5	小明5	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
6	小明6	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
7	小明7	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
1	小明1	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
2	小明2	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
3	小明3	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
4	小明4	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
5	小明5	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
6	小明6	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
7	小明7	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
Time taken: 0.656 seconds, Fetched: 14 row(s)

注意分区的字段不需要在table中定义出来,

这样的分区是在表的文件下面创建分区文件夹,文件夹的名称为分区键的值,文件夹下面则是数据内容。

多分区

hive> CREATE TABLE psn_3(
    > id int,
    > name string,
    > likes array<string>,
    > address map<string,string>
    > )
    > PARTITIONED BY (age int,sex string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > COLLECTION ITEMS TERMINATED BY '-'
    > MAP KEYS TERMINATED BY ':';
OK
Time taken: 20.107 seconds
hive> LOAD DATA LOCAL INPATH '/root/data/data' INTO TABLE psn_2 PARTITION(age=10,sex='man');

相同在hdfs的文件夹 psn_3/age=?/sex='?'/这样的组合文件夹。

查看data的数据时候:

[root@hadoopNode02 mgs]# hdfs dfs -cat /user/hive/warehouse/psn_2/age=10/data
1,小明1,lol-book-movie,beijing:shangxuetang-shanghai:pudong
2,小明2,lol-book-movie,beijing:shangxuetang-shanghai:pudong
3,小明3,lol-book-movie,beijing:shangxuetang-shanghai:pudong
4,小明4,lol-book-movie,beijing:shangxuetang-shanghai:pudong
5,小明5,lol-movie,beijing:shangxuetang-shanghai:pudong
6,小明6,lol-movie,beijing:shangxuetang-shanghai:pudong
7,小明7,lol-movie,beijing:shangxuetang-shanghai:pudong

age =10 的字段是文件夹带人给读入的

添加分区

添加分区是不按照顺序的

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
 
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

例如:

ALTEL TABLE table_name add PARTITION([这里添加进去的分区名称,定义好的分区所有字段])
ALTER TABLE page_view ADD PARTITION (dt='2008-08-08', country='us') location '/path/to/us/part080808'
                          PARTITION (dt='2008-08-09', country='us') location '/path/to/us/part080809';

删除分区

·添加的时候必须一次添加必须是指定分区的所有字段,但是删除可以分开删除字段,

ALTER TABLE table_name drop [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
posted @ 2020-08-04 16:46  shaoyayu  阅读(371)  评论(0编辑  收藏  举报