Hive - Partitioning
Hive organizes tables into partitions. It is a way of dividing a table into related parts based on the values of partitioned columns such as date, city, and department. Using partition, it is easy to query a portion of the data.
Tables or partitions are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying. Bucketing works based on the value of hash function of some column of a table.
Hive将表组织为分区。 它是一种基于分区列(如日期,城市和部门)的值将表分为相关部分的方法。 使用分区,很容易查询一部分数据。
表或分区被细分为桶(buckets),以提供可用于更有效查询的数据的额外结构。 基于表的某些列的散列函数的值来工作。---> 译注:在本人看来分区就是类似于数据库的分区,存放在不同的位置,甚至可以分散到不同的服务器上;目的应该是相同的,那就是提高处理速度最终达到提升性能的终极目标。
For example, a table named Tab1 contains employee data such as id, name, dept, and yoj (i.e., year of joining). Suppose you need to retrieve the details of all employees who joined in 2012. A query searches the whole table for the required information. However, if you partition the employee data with the year and store it in a separate file, it reduces the query processing time. The following example shows how to partition a file and its data:
The following file contains employeedata table.
例如,名为Tab1的表包含雇员数据,例如id,name,dept和yoj(即,加入年份)。 假设您需要检索在2012年加入的所有员工的详细信息。查询在整个表中搜索所需的信息。 但是,如果您将雇员数据与年份分区并将其存储在单独的文件中,则会减少查询处理时间。 以下示例显示如何对文件及其数据进行分区:
以下文件包含employeedata表。
/tab1/employeedata/file1
id, name, dept, yoj 1, gopal, TP, 2012 2, kiran, HR, 2012 3, kaleel,SC, 2013 4, Prasanth, SC, 2013
The above data is partitioned into two files using year.上面数据如果按照year分区,那么会分配到两个分区文件中:
/tab1/employeedata/2012/file2
1, gopal, TP, 2012 2, kiran, HR, 2012
/tab1/employeedata/2013/file3
3, kaleel,SC, 2013 4, Prasanth, SC, 2013
Adding a Partition
We can add partitions to a table by altering the table. Let us assume we have a table called employee with fields such as Id, Name, Salary, Designation, Dept, and yoj.
Syntax:
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...; partition_spec: : (p_column = p_col_value, p_column = p_col_value, ...)
The following query is used to add a partition to the employee table. 给employee表增加以year列为准的分区;
hive> ALTER TABLE employee > ADD PARTITION (year=’2013’) > location '/2012/part2012';
Renaming a Partition
The syntax of this command is as follows.
ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;
The following query is used to rename a partition: 重命名分区,把employee表从原来的按照year列分区,重名为按照Yoj列分区;
hive> ALTER TABLE employee PARTITION (year=’1203’) > RENAME TO PARTITION (Yoj=’1203’);
Dropping a Partition
The following syntax is used to drop a partition:
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec, PARTITION partition_spec,...;
The following query is used to drop a partition:
hive> ALTER TABLE employee DROP [IF EXISTS] > PARTITION (year=’1203’);
译注:本人日常中删除分区的脚本贴一个:
# 删除2017-01-18号的分区数据 ALTER TABLE temp.snapshort_adv_pos DROP IF EXISTS PARTITION (dt='2017-01-18');
----------
英文地址:https://www.tutorialspoint.com/hive/hive_partitioning.htm