(四)hive之分桶表

分桶表数据存储

分区针对的是数据的存储路径;分桶针对的是数据文件。
分区提供一个隔离数据和优化查询的便利方式。不过,并非所有的数据集都可形成合理的分区,特别是之前所提到过的要确定合适的划分大小这个疑虑。
分桶是将数据集分解成更容易管理的若干部分的另一个技术。

创建分桶表
create table stu_buck(id int, name string)
clustered by(id) 
into 4 buckets
row format delimited fields terminated by '\t';

查看表结构

hive (default)> desc formatted stu_buck;

导入数据到分桶表中

hive (default)> load data local inpath '/opt/module/datas/student.txt' into table stu_buck;

但是查看分桶状况:没有分成4个桶

此时需要设置强制分桶,以及设置reduce task的个数为最大
hive (default)> set hive.enforce.bucketing=true;
hive (default)> set mapreduce.job.reduces=-1;

然后,可以先不忙删除stu_buck,现将数据备份到一个普通的表:stu,然后清理stu_buck表,接着讲stu普通标的数据导入到stu_buck表;

# 创建普通表:
create table stu(id int, name string)
row format delimited fields terminated by '\t';
# 向普通的stu表中导入数据
load data local inpath '/opt/module/datas/student.txt' into table stu;
# 清空stu_buck表中数据
truncate table stu_buck;
select * from stu_buck;
不清空,导入可能会报东拉西扯的错:
0: jdbc:hive2://hadoop02:10000> insert into table stu_buck
0: jdbc:hive2://hadoop102:10000> select id, name from stu;
INFO  : Session is already open
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1610203409423_0004)

INFO  : Map 1: -/-      Reducer 2: 0/4
INFO  : Map 1: 0/1      Reducer 2: 0/4
INFO  : Map 1: 0(+0,-1)/1       Reducer 2: 0/4
INFO  : Map 1: 0(+0,-2)/1       Reducer 2: 0/4
INFO  : Map 1: 0(+0,-3)/1       Reducer 2: 0/4
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1610203409423_0004_2_00, diagnostics=[Task failed, taskId=task_1610203409423_0004_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Container launch failed for container_1610203409423_0004_01_000003 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1610272032441 found 1610270813048
Note: System times on machines may be out of sync. Check system time and time zones.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$Container.launch(TezContainerLauncherImpl.java:171)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$EventProcessor.run(TezContainerLauncherImpl.java:396)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
], TaskAttempt 1 failed, info=[Container launch failed for container_1610203409423_0004_01_000004 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1610272034447 found 1610270815087
Note: System times on machines may be out of sync. Check system time and time zones.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$Container.launch(TezContainerLauncherImpl.java:171)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$EventProcessor.run(TezContainerLauncherImpl.java:396)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
], TaskAttempt 2 failed, info=[Container launch failed for container_1610203409423_0004_01_000005 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1610272036485 found 1610270817125
Note: System times on machines may be out of sync. Check system time and time zones.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$Container.launch(TezContainerLauncherImpl.java:171)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$EventProcessor.run(TezContainerLauncherImpl.java:396)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
], TaskAttempt 3 failed, info=[Container launch failed for container_1610203409423_0004_01_000006 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1610272038514 found 1610270819144
Note: System times on machines may be out of sync. Check system time and time zones.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$Container.launch(TezContainerLauncherImpl.java:171)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$EventProcessor.run(TezContainerLauncherImpl.java:396)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1610203409423_0004_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1610203409423_0004_2_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:4, Vertex vertex_1610203409423_0004_2_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1610203409423_0004_2_00, diagnostics=[Task failed, taskId=task_1610203409423_0004_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Container launch failed for container_1610203409423_0004_01_000003 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1610272032441 found 1610270813048
Note: System times on machines may be out of sync. Check system time and time zones.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$Container.launch(TezContainerLauncherImpl.java:171)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$EventProcessor.run(TezContainerLauncherImpl.java:396)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
], TaskAttempt 1 failed, info=[Container launch failed for container_1610203409423_0004_01_000004 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1610272034447 found 1610270815087
Note: System times on machines may be out of sync. Check system time and time zones.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$Container.launch(TezContainerLauncherImpl.java:171)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$EventProcessor.run(TezContainerLauncherImpl.java:396)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
], TaskAttempt 2 failed, info=[Container launch failed for container_1610203409423_0004_01_000005 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1610272036485 found 1610270817125
Note: System times on machines may be out of sync. Check system time and time zones.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$Container.launch(TezContainerLauncherImpl.java:171)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$EventProcessor.run(TezContainerLauncherImpl.java:396)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
], TaskAttempt 3 failed, info=[Container launch failed for container_1610203409423_0004_01_000006 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1610272038514 found 1610270819144
Note: System times on machines may be out of sync. Check system time and time zones.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$Container.launch(TezContainerLauncherImpl.java:171)
        at org.apache.tez.dag.app.launcher.TezContainerLauncherImpl$EventProcessor.run(TezContainerLauncherImpl.java:396)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1610203409423_0004_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1610203409423_0004_2_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:4, Vertex vertex_1610203409423_0004_2_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 (state=08S01,code=2)

然后重新load数据到表里面,分桶成功

hive (default)> insert into table stu_buck select id, name from stu;

posted @ 2021-01-10 19:54  Leo-Wong  阅读(578)  评论(0编辑  收藏  举报