Hive DDL之表操作

1、官网操作：LanguageManual DDL - Apache Hive - Apache Software Foundation

2、创建表CREATE TABLE table_name

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

3、数据集emp.txt

（1）数据集明细

第一列：员工编号；第二列：员工姓名；第三列：员工岗位；

第四列：员工上级领导的编号；第五列：员工的入职时间；第六列：员工薪资；

第七列：补贴；第八列：岗位编号。

[hadoop@hadoop000 ~]$ cd data/
[hadoop@hadoop000 data]$ ls
[hadoop@hadoop000 data]$ cat emp.txt

4、将emp.txt中的数据创建一个表（区分大小写）

hive (testzhang_db)> CREATE TABLE IF NOT EXISTS emp(
                   > empno int,
                   > ename string,
                   > job string,
                   > mgr int,
                   > hiredate string,
                   > sal double,
                   > comm double,
                   > deptno int
                   > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

5、查找emp表中的数据，目前为空

hive (testzhang_db)> select * from emp;
OK
Time taken: 0.192 seconds

6、查看emp表结构

hive (testzhang_db)> desc emp;
OK
empno               	int                 	                    
ename               	string              	                    
job                 	string              	                    
mgr                 	int                 	                    
hiredate            	string              	                    
sal                 	double              	                    
comm                	double              	                    
deptno              	int                 	                    
Time taken: 0.053 seconds, Fetched: 8 row(s)

7、查看具体表信息，比如在hdfs上的存储路径（不推荐的方式）

hive (testzhang_db)> desc extended emp;
OK
empno               	int                 	                    
ename               	string              	                    
job                 	string              	                    
mgr                 	int                 	                    
hiredate            	string              	                    
sal                 	double              	                    
comm                	double              	                    
deptno              	int                 	                    
	 	 
Detailed Table Information	Table(tableName:emp, dbName:testzhang_db, owner:hadoop, createTime:1629971674, lastAccessTime:0, 
retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:empno, type:int, comment:null), FieldSchema(name:ename, type:string, comment:null), 
FieldSchema(name:job, type:string, comment:null), FieldSchema(name:mgr, type:int, comment:null), 
FieldSchema(name:hiredate, type:string, comment:null), FieldSchema(name:sal, type:double, comment:null), 
FieldSchema(name:comm, type:double, comment:null), FieldSchema(name:deptno, type:int, comment:null)], 
location:hdfs://hadoop000:8020/user/hive/warehouse/testzhang_db.db/emp, inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, 
serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{field.delim=/t, serialization.format=/t}), bucketCols:[], sortCols:[], parameters:{}, 
skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], 
parameters:{transient_lastDdlTime=1629971674}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)	
Time taken: 0.049 seconds, Fetched: 10 row(s)

[hadoop@hadoop000 bin]$ hadoop fs -ls /user/hive/warehouse/testzhang_db.db
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2021-08-26 17:54 /user/hive/warehouse/testzhang_db.db/emp
drwxr-xr-x   - hadoop supergroup          0 2021-08-06 19:59 /user/hive/warehouse/testzhang_db.db/helloworld

8、工作中，查看具体表信息，比如在hdfs上的存储路径

hive (testzhang_db)> desc formatted emp;
OK
# col_name            	data_type           	comment             
	 	 
empno               	int                 	                    
ename               	string              	                    
job                 	string              	                    
mgr                 	int                 	                    
hiredate            	string              	                    
sal                 	double              	                    
comm                	double              	                    
deptno              	int                 	                    
	 	 
# Detailed Table Information	 	 
Database:           	testzhang_db        	 
Owner:              	hadoop              	 
CreateTime:         	Thu Aug 26 17:54:34 CST 2021	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://hadoop000:8020/user/hive/warehouse/testzhang_db.db/emp	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	transient_lastDdlTime	1629971674          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	/t                  
	serialization.format	/t                  
Time taken: 0.11 seconds, Fetched: 34 row(s)

9、表已有，现在开始加载数据

（1）数据路径

[hadoop@hadoop000 data]$ pwd
/home/hadoop/data

（2）确认表中无数据

hive> use testzhang_db;
OK
Time taken: 0.009 seconds
hive> select * from emp;
OK
Time taken: 0.266 seconds

（3）加载（区分大小写）

hive (testzhang_db)> LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE emp;

10、确认数据加载成功

hive> select * from emp;

11、更改表名

hive> ALTER TABLE emp rename to emp2;

hive> show tables;
OK
emp2
helloworld

12、删除表

hive (testzhang_db)> drop table if exists emp2;
OK
Time taken: 0.197 seconds

posted @ 2021-08-26 13:47 酱汁怪兽阅读(97) 评论(0) 编辑收藏举报

刷新页面返回顶部

酱汁怪兽

Hive DDL之表操作

公告