Hive DDL之表操作

1、官网操作:LanguageManual DDL - Apache Hive - Apache Software Foundation

 

2、创建表CREATE TABLE table_name

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
 

  

3、数据集emp.txt

(1)数据集明细

第一列:员工编号;第二列:员工姓名;第三列:员工岗位;

第四列:员工上级领导的编号;第五列:员工的入职时间;第六列:员工薪资;

第七列:补贴;第八列:岗位编号。

[hadoop@hadoop000 ~]$ cd data/
[hadoop@hadoop000 data]$ ls
[hadoop@hadoop000 data]$ cat emp.txt

  

4、将emp.txt中的数据创建一个表(区分大小写)

hive (testzhang_db)> CREATE TABLE IF NOT EXISTS emp(
                   > empno int,
                   > ename string,
                   > job string,
                   > mgr int,
                   > hiredate string,
                   > sal double,
                   > comm double,
                   > deptno int
                   > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

  

5、查找emp表中的数据,目前为空

hive (testzhang_db)> select * from emp;
OK
Time taken: 0.192 seconds

  

6、查看emp表结构

hive (testzhang_db)> desc emp;
OK
empno               	int                 	                    
ename               	string              	                    
job                 	string              	                    
mgr                 	int                 	                    
hiredate            	string              	                    
sal                 	double              	                    
comm                	double              	                    
deptno              	int                 	                    
Time taken: 0.053 seconds, Fetched: 8 row(s)

  

7、查看具体表信息,比如在hdfs上的存储路径(不推荐的方式)

hive (testzhang_db)> desc extended emp;
OK
empno               	int                 	                    
ename               	string              	                    
job                 	string              	                    
mgr                 	int                 	                    
hiredate            	string              	                    
sal                 	double              	                    
comm                	double              	                    
deptno              	int                 	                    
	 	 
Detailed Table Information	Table(tableName:emp, dbName:testzhang_db, owner:hadoop, createTime:1629971674, lastAccessTime:0, 
retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:empno, type:int, comment:null), FieldSchema(name:ename, type:string, comment:null),
FieldSchema(name:job, type:string, comment:null), FieldSchema(name:mgr, type:int, comment:null),
FieldSchema(name:hiredate, type:string, comment:null), FieldSchema(name:sal, type:double, comment:null),
FieldSchema(name:comm, type:double, comment:null), FieldSchema(name:deptno, type:int, comment:null)],
location:hdfs://hadoop000:8020/user/hive/warehouse/testzhang_db.db/emp, inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1,
serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
parameters:{field.delim=/t, serialization.format=/t}), bucketCols:[], sortCols:[], parameters:{},
skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[],
parameters:{transient_lastDdlTime=1629971674}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) Time taken: 0.049 seconds, Fetched: 10 row(s)
[hadoop@hadoop000 bin]$ hadoop fs -ls /user/hive/warehouse/testzhang_db.db
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2021-08-26 17:54 /user/hive/warehouse/testzhang_db.db/emp
drwxr-xr-x   - hadoop supergroup          0 2021-08-06 19:59 /user/hive/warehouse/testzhang_db.db/helloworld

  

8、工作中,查看具体表信息,比如在hdfs上的存储路径

hive (testzhang_db)> desc formatted emp;
OK
# col_name            	data_type           	comment             
	 	 
empno               	int                 	                    
ename               	string              	                    
job                 	string              	                    
mgr                 	int                 	                    
hiredate            	string              	                    
sal                 	double              	                    
comm                	double              	                    
deptno              	int                 	                    
	 	 
# Detailed Table Information	 	 
Database:           	testzhang_db        	 
Owner:              	hadoop              	 
CreateTime:         	Thu Aug 26 17:54:34 CST 2021	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://hadoop000:8020/user/hive/warehouse/testzhang_db.db/emp	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	transient_lastDdlTime	1629971674          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	/t                  
	serialization.format	/t                  
Time taken: 0.11 seconds, Fetched: 34 row(s)

  

9、表已有,现在开始加载数据

(1)数据路径

[hadoop@hadoop000 data]$ pwd
/home/hadoop/data

(2)确认表中无数据

hive> use testzhang_db;
OK
Time taken: 0.009 seconds
hive> select * from emp;
OK
Time taken: 0.266 seconds

(3)加载(区分大小写)

hive (testzhang_db)> LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE emp;

  

10、确认数据加载成功

hive> select * from emp;

  

11、更改表名

hive> ALTER TABLE emp rename to emp2;
hive> show tables;
OK
emp2
helloworld

 

12、删除表

hive (testzhang_db)> drop table if exists emp2;
OK
Time taken: 0.197 seconds

  

 

posted @ 2021-08-26 13:47  酱汁怪兽  阅读(97)  评论(0编辑  收藏  举报