|NO.Z.00009|——————————|BigDataEnd|——|Hadoop&Hive.V09|——|Hive.v09|Hive_DDL Database Operations.v03|

I. Partitioned Tables

### --- Partitioned tables

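~~~     By default, a Hive query scans all of a table's data. When the table is large, a full table scan is slow and inefficient.
~~~     Often a query only needs part of the table, so Hive provides partitioned tables:
~~~     the table's data is stored in separate subdirectories, and each subdirectory corresponds to one partition.
~~~     A query that filters on the partition column reads only the matching partitions, which avoids a full table scan and speeds up the query.
~~~     In practice, tables are usually partitioned by attributes such as date or region.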
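
~~~     The directory-per-partition idea can be sketched in a few lines of Python. This is only an illustrative model, not Hive's implementation: the warehouse path is assumed to follow the default layout used elsewhere in this post, and prune_partitions is a hypothetical helper name.

```python
# Illustrative model of partition pruning (not Hive internals).
# Each partition is a subdirectory named <column>=<value> under the table directory.
TABLE_DIR = "/user/hive/warehouse/mydb.db/t3"  # assumed default warehouse layout


def partition_dir(dt: str) -> str:
    """Return the directory that holds the rows of partition dt."""
    return f"{TABLE_DIR}/dt={dt}"


def prune_partitions(all_dts, wanted_dt):
    """Keep only the directories a query with `where dt = wanted_dt` must read."""
    return [partition_dir(dt) for dt in all_dts if dt == wanted_dt]


partitions = ["2021-08-20", "2021-08-21"]
to_scan = prune_partitions(partitions, "2021-08-20")
print(to_scan)  # only one of the two directories is scanned
```

~~~     A query without a filter on dt would have to read every directory in the list, which is the full table scan that partitioning is meant to avoid.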

### --- Creating a partitioned table and loading data

~~~     # Create the table
hive (mydb)> create table if not exists t3(
id int
,name string
,hobby array<string>
,addr map<string,string>
)
partitioned by (dt string)
row format delimited
fields terminated by ';'
collection items terminated by ','
map keys terminated by ':';

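~~~     # Load data
~~~     Note: the partition column is not stored in the data files; its value comes from the partition directory, so it can be treated as a pseudo-column.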

hive (mydb)> load data local inpath "/home/hadoop/data/t1.dat" into table t3 partition(dt="2021-08-20");
hive (mydb)> load data local inpath "/home/hadoop/data/t1.dat" into table t3 partition(dt="2021-08-21");
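
~~~     To make the delimiters and the pseudo-column concrete, here is a small Python sketch of how one text line maps to the columns of t3. The sample line is invented for illustration (the contents of t1.dat are not shown in this post), and parse_row is a hypothetical helper, not a Hive API.

```python
# Illustrative parser for the row format declared on t3 (not Hive's SerDe).
def parse_row(line: str, dt: str) -> dict:
    """Split one line using ';' for fields, ',' for collection items, ':' for map keys."""
    id_str, name, hobby_str, addr_str = line.split(";")
    hobby = hobby_str.split(",")
    addr = dict(item.split(":", 1) for item in addr_str.split(","))
    # dt is not in the line: it comes from the partition the file was loaded into.
    return {"id": int(id_str), "name": name, "hobby": hobby, "addr": addr, "dt": dt}


sample = "2;zhangsan;book,TV,code;beijing:chaoyang,shanghai:pudong"  # hypothetical data
row = parse_row(sample, dt="2021-08-20")
print(row)
```

~~~     Loading the same file into two partitions, as above, yields rows that differ only in the dt value.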

### --- Viewing partitions

hive (mydb)> show partitions t3;
OK
partition
dt=2021-08-20
dt=2021-08-21

### --- Adding partitions and attaching data

~~~     # Add one partition without loading data
hive (mydb)> alter table t3 add partition(dt='2021-08-22');

~~~     # Add several partitions without loading data
hive (mydb)> alter table t3 add partition(dt='2021-08-23') partition(dt='2021-08-24');

~~~     # Add several partitions: prepare the data first
hive (mydb)> dfs -cp /user/hive/warehouse/mydb.db/t3/dt=2021-08-20 /user/hive/warehouse/mydb.db/t3/dt=2021-08-25;
hive (mydb)> dfs -cp /user/hive/warehouse/mydb.db/t3/dt=2021-08-20 /user/hive/warehouse/mydb.db/t3/dt=2021-08-26;

~~~     # Add several partitions: attach the prepared data
hive (mydb)> alter table t3 add
partition(dt='2021-08-25') location '/user/hive/warehouse/mydb.db/t3/dt=2021-08-25'
partition(dt='2021-08-26') location '/user/hive/warehouse/mydb.db/t3/dt=2021-08-26';

~~~     # Query the data
hive (mydb)> select * from t3;

### --- Changing a partition's HDFS path

hive (mydb)> alter table t3 partition(dt='2021-08-20') set location '/user/hive/warehouse/t3/dt2021-08-22';

### --- Dropping partitions

~~~     # Drop one or more partitions; separate multiple partition specs with commas
hive (mydb)> alter table t3 drop partition(dt='2021-08-22'), partition(dt='2021-08-23');

II. Bucketed Tables

### --- Bucketed tables

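~~~     When a single partition or table holds too much data and partitioning cannot split it any more finely,
~~~     bucketing can be used to divide the data at a finer granularity.
~~~     Rows are assigned to buckets according to the value of a specified column,
~~~     and each bucket is stored as a separate file. How bucketing works:
~~~     In MapReduce: key.hashCode % numReduceTasks
~~~     In Hive: bucketColumn.hashCode % numBuckets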
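
~~~     The MapReduce side of the formula can be sketched in Python as follows. The sketch emulates Java's String.hashCode and the sign-bit mask that Hadoop's default HashPartitioner applies before the modulo. It is an approximation for illustration only (Hadoop's Text keys compute their hash differently), and both function names are hypothetical.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() for ASCII text: h = 31*h + ch with 32-bit overflow."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep the low 32 bits
    return h - 0x100000000 if h >= 0x80000000 else h  # reinterpret as a signed int


def partition_for(key: str, num_reduce_tasks: int) -> int:
    """(hashCode & Integer.MAX_VALUE) % numReduceTasks; the mask keeps the result non-negative."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


for key in ["java", "c", "python", "hadoop"]:
    print(key, java_string_hash(key), partition_for(key, 3))
```

~~~     The mask matters because a Java hash code can be negative ("hadoop" is one such key), and in Java a negative value modulo a positive one would give a negative partition number.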

[root@linux123 ~]# vim /home/hadoop/data/course.dat
~~~     # Test data
1       java    90
1       c       78
1       python  91
1       hadoop  80
2       java    75
2       c       76
2       python  80
2       hadoop  93
3       java    98
3       c       74
3       python  89
3       hadoop  91
5       java    93
6       c       76
7       python  87
8       hadoop  88

~~~     # Create the bucketed table
hive (mydb)> create table course(
id int,
name string,
score int
)
clustered by (id) into 3 buckets
row format delimited fields terminated by "\t";

~~~     # Create a regular table
hive (mydb)> create table course_common(
id int,
name string,
score int
)
row format delimited fields terminated by "\t";

~~~     # Load data into the regular table
hive (mydb)> load data local inpath '/home/hadoop/data/course.dat' into table course_common;

~~~     # Load data into the bucketed table with insert ... select ...
hive (mydb)> insert into table course select * from course_common;

~~~     # Job output
2021-08-23 18:11:49,279 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 11.61 sec
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 11.61 sec   HDFS Read: 16041 HDFS Write: 286 SUCCESS
OK
~~~     # Inspect the bucketed data. Rows are distributed by (bucketColumn.hashCode) % (numBuckets)

~~~     # View the bucketed data

hive (mydb)> select * from course;
OK
course.id    course.name    course.score
3    hadoop    91
3    python    89
3    c    74
3    java    98
6    c    76
7    python    87
1    hadoop    80
1    python    91
1    c    78
1    java    90
8    hadoop    88
5    java    93
2    python    80
2    c    76
2    java    75
2    hadoop    93
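
~~~     The grouping in this output can be reproduced with a short Python sketch. It assumes the hash of an int column is the integer value itself, which matches the result shown above. Newer Hive releases may use a different hash function for bucketing, so treat this as a model of this run rather than of every Hive version. bucket_for is a hypothetical helper name.

```python
from collections import defaultdict

NUM_BUCKETS = 3  # from: clustered by (id) into 3 buckets


def bucket_for(id_value: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Assumed rule for an int bucket column: (value & Integer.MAX_VALUE) % num_buckets."""
    return (id_value & 0x7FFFFFFF) % num_buckets


ids = [1, 2, 3, 5, 6, 7, 8]  # distinct ids in course.dat
buckets = defaultdict(list)
for i in ids:
    buckets[bucket_for(i)].append(i)

for b in sorted(buckets):
    print(f"bucket {b}: ids {buckets[b]}")
# bucket 0: ids [3, 6]
# bucket 1: ids [1, 7]
# bucket 2: ids [2, 5, 8]
```

~~~     The select output lists ids 3 and 6 first, then 7 and 1, then 8, 5 and 2, which is consistent with the three bucket files being read one after another.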

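### --- Notes:

~~~     Bucketing rule: bucketColumn.hashCode % numBuckets
~~~     Load data into a bucketed table with insert ... select ...
~~~     Some online material says that bucketed tables require hive.enforce.bucketing=true.
~~~     That applies to Hive 1.x and earlier; Hive 2.x removed the parameter, and bucketing is always enforced.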

III. Altering and Dropping Tables

### --- Altering and dropping tables

~~~     # Rename a table: rename
hive (mydb)> alter table course_common rename to course_common1;

~~~     # Rename a column: change column
hive (mydb)> alter table course_common1 change column id cid int;

~~~     # Change a column's type: change column
hive (mydb)> alter table course_common1 change column cid cid string;

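~~~     # Error raised by an incompatible change: The following columns have types incompatible with the existing columns in their respective positions
~~~     # A column's type can only be changed to a compatible type: int can be converted to string, but string cannot be converted to int

~~~     # Add a column: add columns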
hive (mydb)> alter table course_common1 add columns (common string);

~~~     # Drop a column: replace columns
~~~     # This only removes the column from the metadata; the data files on HDFS are not changed
hive (mydb)> alter table course_common1 replace columns(id string, cname string, score int);
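
~~~     Why a metadata-only change is enough can be sketched in Python: for a delimited text table, the column list is applied to each line by position when the line is read. This is a simplified model, not Hive's SerDe; read_row is a hypothetical helper, and type conversion is ignored.

```python
def read_row(line: str, columns: list) -> dict:
    """Map tab-separated fields to column names by position; missing fields become None."""
    fields = line.split("\t")
    return {col: (fields[i] if i < len(fields) else None) for i, col in enumerate(columns)}


line = "1\tjava\t90"  # one line of course.dat; the file itself never changes

before = read_row(line, ["cid", "name", "score", "common"])  # schema after add columns
after = read_row(line, ["id", "cname", "score"])             # schema after replace columns
print(before)  # common is None because the file has no fourth field
print(after)
```

~~~     The same bytes are read both times; only the list of column names applied to them differs.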

~~~     # Drop the table
hive (mydb)> drop table course_common1;

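### --- HQL DDL command summary:

~~~     # Main objects: databases and tables
~~~     # Table types:
~~~     Managed (internal) tables. Dropping the table deletes both the metadata and the table data
~~~     External tables. Dropping the table deletes only the metadata and keeps the data; production environments mostly use external tables
~~~     Partitioned tables. Data is placed in separate directories by partition column, which improves SQL query performance
~~~     Bucketed tables. Data is split by bucket column: bucketColumn.hashCode % numBuckets
~~~     Main commands: create, alter, drop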
