Hive DDL之表操作
1、官网操作:LanguageManual DDL - Apache Hive - Apache Software Foundation
2、创建表CREATE TABLE table_name
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later) [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)] ON ((col_value, col_value, ...), (col_value, col_value, ...), ...) [STORED AS DIRECTORIES] [ [ROW FORMAT row_format] [STORED AS file_format] | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later) ] [LOCATION hdfs_path] [TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later) [AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
3、数据集emp.txt
(1)数据集明细
第一列:员工编号;第二列:员工姓名;第三列:员工岗位;
第四列:员工上级领导的编号;第五列:员工的入职时间;第六列:员工薪资;
第七列:补贴;第八列:岗位编号。
[hadoop@hadoop000 ~]$ cd data/ [hadoop@hadoop000 data]$ ls [hadoop@hadoop000 data]$ cat emp.txt
4、将emp.txt中的数据创建一个表(区分大小写)
hive (testzhang_db)> CREATE TABLE IF NOT EXISTS emp( > empno int, > ename string, > job string, > mgr int, > hiredate string, > sal double, > comm double, > deptno int > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
5、查找emp表中的数据,目前为空
hive (testzhang_db)> select * from emp; OK Time taken: 0.192 seconds
6、查看emp表结构
hive (testzhang_db)> desc emp; OK empno int ename string job string mgr int hiredate string sal double comm double deptno int Time taken: 0.053 seconds, Fetched: 8 row(s)
7、查看具体表信息,比如在hdfs上的存储路径(不推荐的方式)
hive (testzhang_db)> desc extended emp; OK empno int ename string job string mgr int hiredate string sal double comm double deptno int Detailed Table Information Table(tableName:emp, dbName:testzhang_db, owner:hadoop, createTime:1629971674, lastAccessTime:0,
retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:empno, type:int, comment:null), FieldSchema(name:ename, type:string, comment:null),
FieldSchema(name:job, type:string, comment:null), FieldSchema(name:mgr, type:int, comment:null),
FieldSchema(name:hiredate, type:string, comment:null), FieldSchema(name:sal, type:double, comment:null),
FieldSchema(name:comm, type:double, comment:null), FieldSchema(name:deptno, type:int, comment:null)],
location:hdfs://hadoop000:8020/user/hive/warehouse/testzhang_db.db/emp, inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1,
serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
parameters:{field.delim=/t, serialization.format=/t}), bucketCols:[], sortCols:[], parameters:{},
skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[],
parameters:{transient_lastDdlTime=1629971674}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) Time taken: 0.049 seconds, Fetched: 10 row(s)
[hadoop@hadoop000 bin]$ hadoop fs -ls /user/hive/warehouse/testzhang_db.db Found 2 items drwxr-xr-x - hadoop supergroup 0 2021-08-26 17:54 /user/hive/warehouse/testzhang_db.db/emp drwxr-xr-x - hadoop supergroup 0 2021-08-06 19:59 /user/hive/warehouse/testzhang_db.db/helloworld
8、工作中,查看具体表信息,比如在hdfs上的存储路径
hive (testzhang_db)> desc formatted emp; OK # col_name data_type comment empno int ename string job string mgr int hiredate string sal double comm double deptno int # Detailed Table Information Database: testzhang_db Owner: hadoop CreateTime: Thu Aug 26 17:54:34 CST 2021 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location: hdfs://hadoop000:8020/user/hive/warehouse/testzhang_db.db/emp Table Type: MANAGED_TABLE Table Parameters: transient_lastDdlTime 1629971674 # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: field.delim /t serialization.format /t Time taken: 0.11 seconds, Fetched: 34 row(s)
9、表已有,现在开始加载数据
(1)数据路径
[hadoop@hadoop000 data]$ pwd /home/hadoop/data
(2)确认表中无数据
hive> use testzhang_db; OK Time taken: 0.009 seconds hive> select * from emp; OK Time taken: 0.266 seconds
(3)加载(区分大小写)
hive (testzhang_db)> LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE emp;
10、确认数据加载成功
hive> select * from emp;
11、更改表名
hive> ALTER TABLE emp rename to emp2;
hive> show tables; OK emp2 helloworld
12、删除表
hive (testzhang_db)> drop table if exists emp2; OK Time taken: 0.197 seconds