Hive静态分区和动态分区,对应Mysql中的元数据信息

静态分区:

手动指定分区加载数据,就是常说的静态分区的使用。但是在日常工作中用的比较多的是动态分区。

创建:

hive> create table order_mulit_partition(
    > order_number string,
    > event_time string
    > )
    > PARTITIONED BY(event_month string, step string)
    > row format delimited fields terminated by '\t';

加载数据:

hive> load data local inpath '/opt/data/order_created.txt' overwrite into table order_mulit_partition PARTITION(event_month='201405', step='1');

查看数据:

hive> select * from order_mulit_partition;
OK
10703007267488  2014-05-01 06:01:12.334+01  NULL    201405  1
10101043505096  2014-05-01 07:28:12.342+01  NULL    201405  1
10103043509747  2014-05-01 07:50:12.33+01   NULL    201405  1
10103043501575  2014-05-01 09:27:12.33+01   NULL    201405  1
10104043514061  2014-05-01 09:03:12.324+01  NULL    201405  1

动态分区:

需求:按照不同部门作为分区导数据到目标表

以上需求如果用静态分区的话,数据量大你是不是很懵逼??所以这个需求一般采用动态分区来实现。

1、emp表的创建及内容

创建:

hive> create table emp(
    > empno int,
    > ename string,
    > job string,
    > mgr int,
    > hiredate string,
    > sal double,
    > comm double,
    > deptno int
    > )row format delimited fields terminated by '\t';

加载数据:

hive> load data local inpath '/home/hadoop/datas/emp.txt' overwrite into table emp;

字段内容:

hive> select * from emp;
OK
7369    SMITH   CLERK   7902    1980-12-17  800.0   NULL    20
7499    ALLEN   SALESMAN    7698    1981-2-20   1600.0  300.0   30
7521    WARD    SALESMAN    7698    1981-2-22   1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2    2975.0  NULL    20
7654    MARTIN  SALESMAN    7698    1981-9-28   1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1    2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9    2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19   3000.0  NULL    20
7839    KING    PRESIDENT   NULL    1981-11-17  5000.0  NULL    10
7844    TURNER  SALESMAN    7698    1981-9-8    1500.0  0.0 30
7876    ADAMS   CLERK   7788    1987-5-23   1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3   950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3   3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23   1300.0  NULL    10
8888    HIVE    PROGRAM 7839    1988-1-23   10300.0 NULL    NULL

显示分区:

hive> show partitions order_mulit_partition;
OK
event_month=201405/step=1

2、创建目标表

hive> create table emp_dynamic_partition(
              > empno int, 
              > ename string, 
              > job string, 
              > mgr int, 
              > hiredate string, 
              > sal double, 
              > comm double)
              > PARTITIONED BY(deptno int)
              > row format delimited fields terminated by '\t';

3、 采用动态方式家在数据到目标表

加载之前先设置下面的参数(设置为非严格模式)

hive> set hive.exec.dynamic.partition.mode=nonstrict

开始加载数据

hive> insert into table emp_dynamic_partition partition(deptno)
select empno , ename , job , mgr , hiredate , sal , comm, deptno from emp;

注意: 上面加载数据方式并没有指定具体的分区,只是指出了分区字段。在select最后一个字段必须跟你的分区字段,这样就会自行根据deptno的value来分区。

4、验证

有值

hive> select * from emp_dynamic_partition;
OK
7782    CLARK   MANAGER 7839    1981-6-9    2450.0  NULL    10
7839    KING    PRESIDENT   NULL    1981-11-17  5000.0  NULL    10
7934    MILLER  CLERK   7782    1982-1-23   1300.0  NULL    10
7369    SMITH   CLERK   7902    1980-12-17  800.0   NULL    20
7566    JONES   MANAGER 7839    1981-4-2    2975.0  NULL    20
7788    SCOTT   ANALYST 7566    1987-4-19   3000.0  NULL    20
7876    ADAMS   CLERK   7788    1987-5-23   1100.0  NULL    20
7902    FORD    ANALYST 7566    1981-12-3   3000.0  NULL    20
7499    ALLEN   SALESMAN    7698    1981-2-20   1600.0  300.0   30
7521    WARD    SALESMAN    7698    1981-2-22   1250.0  500.0   30
7654    MARTIN  SALESMAN    7698    1981-9-28   1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1    2850.0  NULL    30
7844    TURNER  SALESMAN    7698    1981-9-8    1500.0  0.0 30
7900    JAMES   CLERK   7698    1981-12-3   950.0   NULL    30
8888    HIVE    PROGRAM 7839    1988-1-23   10300.0 NULL    NULL

有分区(自动分区)

hive> show partitions emp_dynamic_partition;
OK
deptno=10
deptno=20
deptno=30
deptno=__HIVE_DEFAULT_PARTITION__
Time taken: 0.375 seconds, Fetched: 4 row(s)

5、各个分区中的内容

deptno=10

782 CLARK   MANAGER 7839    1981-6-9    2450.0  \N
7839    KING    PRESIDENT   \N  1981-11-17  5000.0  \N
7934    MILLER  CLERK   7782    1982-1-23   1300.0  \N

deptno=20

7369    SMITH   CLERK   7902    1980-12-17  800.0   \N
7566    JONES   MANAGER 7839    1981-4-2    2975.0  \N
7788    SCOTT   ANALYST 7566    1987-4-19   3000.0  \N
7876    ADAMS   CLERK   7788    1987-5-23   1100.0  \N
7902    FORD    ANALYST 7566    1981-12-3   3000.0  \N

deptno=30

7499    ALLEN   SALESMAN    7698    1981-2-20   1600.0  300.0
7521    WARD    SALESMAN    7698    1981-2-22   1250.0  500.0
7654    MARTIN  SALESMAN    7698    1981-9-28   1250.0  1400.0
7698    BLAKE   MANAGER 7839    1981-5-1    2850.0  \N
7844    TURNER  SALESMAN    7698    1981-9-8    1500.0  0.0
7900    JAMES   CLERK   7698    1981-12-3   950.0   \N

deptno=__HIVE_DEFAULT_PARTITION__

8888    HIVE    PROGRAM 7839    1988-1-23   10300.0 \N

6、对应mysql中元数据的信息:

mysql> select * from TBLS \G;
*************************** 1. row ***************************
            TBL_ID: 1
       CREATE_TIME: 1561364642
             DB_ID: 6
  LAST_ACCESS_TIME: 0
             OWNER: hadoop
         RETENTION: 0
             SD_ID: 1
          TBL_NAME: order_mulit_partition
          TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
*************************** 2. row ***************************
            TBL_ID: 2
       CREATE_TIME: 1561365034
             DB_ID: 6
  LAST_ACCESS_TIME: 0
             OWNER: hadoop
         RETENTION: 0
             SD_ID: 3
          TBL_NAME: emp_dynamic_partition
          TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
*************************** 3. row ***************************
            TBL_ID: 6
       CREATE_TIME: 1561366290
             DB_ID: 6
  LAST_ACCESS_TIME: 0
             OWNER: hadoop
         RETENTION: 0
             SD_ID: 6
          TBL_NAME: emp
          TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
3 rows in set (0.00 sec)

ERROR:
No query specified

mysql> select * from PARTITIONS;
+---------+-------------+------------------+-----------------------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME                         | SD_ID | TBL_ID |
+---------+-------------+------------------+-----------------------------------+-------+--------+
|       1 |  1561364780 |                0 | event_month=201405/step=1         |     2 |      1 |
|       6 |  1561371156 |                0 | deptno=10                         |    11 |      2 |
|       7 |  1561371157 |                0 | deptno=__HIVE_DEFAULT_PARTITION__ |    12 |      2 |
|       8 |  1561371158 |                0 | deptno=30                         |    13 |      2 |
|       9 |  1561371158 |                0 | deptno=20                         |    14 |      2 |
+---------+-------------+------------------+-----------------------------------+-------+--------+
5 rows in set (0.00 sec)

mysql> select * from  PARTITION_KEYS;
+--------+--------------+-------------+-----------+-------------+
| TBL_ID | PKEY_COMMENT | PKEY_NAME   | PKEY_TYPE | INTEGER_IDX |
+--------+--------------+-------------+-----------+-------------+
|      1 | NULL         | event_month | string    |           0 |
|      1 | NULL         | step        | string    |           1 |
|      2 | NULL         | deptno      | int       |           0 |
+--------+--------------+-------------+-----------+-------------+
3 rows in set (0.00 sec)

mysql> select * from PARTITION_KEY_VALS;
+---------+----------------------------+-------------+
| PART_ID | PART_KEY_VAL               | INTEGER_IDX |
+---------+----------------------------+-------------+
|       1 | 201405                     |           0 |
|       1 | 1                          |           1 |
|       6 | 10                         |           0 |
|       7 | __HIVE_DEFAULT_PARTITION__ |           0 |
|       8 | 30                         |           0 |
|       9 | 20                         |           0 |
+---------+----------------------------+-------------+
6 rows in set (0.00 sec)

mysql> select * from DBS;
+-------+-----------------------+-------------------------------------------------+---------+------------+------------+
| DB_ID | DESC                  | DB_LOCATION_URI                                 | NAME    | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+-------------------------------------------------+---------+------------+------------+
|     1 | Default Hive database | file:/hive/warehouse                            | default | public     | ROLE       |
|     6 |                       | hdfs://hadoop000:8020/hive/warehouse/g6_hive.db | g6_hive | NULL       | USER       |
+-------+-----------------------+-------------------------------------------------+---------+------------+------------+
2 rows in set (0.00 sec)

mysql> select * from COLUMNS_V2;
+-------+---------+--------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME  | TYPE_NAME | INTEGER_IDX |
+-------+---------+--------------+-----------+-------------+
|     1 | NULL    | event_time   | string    |           1 |
|     1 | NULL    | order_number | string    |           0 |
|     2 | NULL    | comm         | double    |           6 |
|     2 | NULL    | empno        | int       |           0 |
|     2 | NULL    | ename        | string    |           1 |
|     2 | NULL    | hiredate     | string    |           4 |
|     2 | NULL    | job          | string    |           2 |
|     2 | NULL    | mgr          | int       |           3 |
|     2 | NULL    | sal          | double    |           5 |
|     6 | NULL    | comm         | double    |           6 |
|     6 | NULL    | deptno       | int       |           7 |
|     6 | NULL    | empno        | int       |           0 |
|     6 | NULL    | ename        | string    |           1 |
|     6 | NULL    | hiredate     | string    |           4 |
|     6 | NULL    | job          | string    |           2 |
|     6 | NULL    | mgr          | int       |           3 |
|     6 | NULL    | sal          | double    |           5 |
+-------+---------+--------------+-----------+-------------+
17 rows in set (0.00 sec)

mysql> select * from VERSION;
+--------+----------------+-------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT                     |
+--------+----------------+-------------------------------------+
|      1 | 1.1.0          | Set by MetaStore hadoop@192.168.0.3 |
+--------+----------------+-------------------------------------+
1 row in set (0.00 sec)
posted @ 2019-06-24 19:01  BBBone  阅读(501)  评论(0编辑  收藏  举报