Bucket tables in Hive
Preface
A bucket table hashes each row on the chosen column(s) and stores the rows in different files according to the hash value.
Use cases
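When the data set is large and a job has to finish sooner, the usual answer is to run many map and reduce tasks in parallel.
However, if the input is a single file that cannot be split (or is smaller than one block), only one map task is launched.
A bucket table is a good fit here: the column named in CLUSTERED BY is hashed, and the data is scattered across multiple smaller files.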
create table test (id int, name string)
CLUSTERED BY (id) SORTED BY (name) INTO 32 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
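To make the hashing step concrete, here is a minimal Python sketch of how a row could be mapped to one of the 32 buckets. It assumes Hive's legacy bucketing scheme, in which the hash of an INT column is the value itself and the bucket index is (hash & Integer.MAX_VALUE) % number of buckets. Newer Hive versions may use a different hash function, and the name bucket_for is made up for this illustration.

```python
# Sketch of legacy Hive bucket assignment for an INT clustering column.
# Assumption: the hash of an INT is the value itself (like Java's
# Integer.hashCode()); newer bucketing versions may hash differently.

INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def bucket_for(int_value: int, num_buckets: int) -> int:
    """Return the bucket index for a row whose clustering key is int_value."""
    # View the value as a 32-bit two's-complement integer, as Java would.
    hash_code = int_value & 0xFFFFFFFF
    # Clearing the sign bit keeps the result of the modulo non-negative.
    return (hash_code & INT_MAX) % num_buckets


if __name__ == "__main__":
    for row_id in (1, 33, 40, -1):
        print(row_id, "->", bucket_for(row_id, 32))
```

Under this assumption, ids 1 and 33 land in the same bucket, because they differ by exactly 32.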
Before running the insert, do not forget to set
set hive.enforce.bucketing = true;
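This forces Hive to write the output with multiple reducers, one per bucket. (From Hive 2.0 on, this property was removed and bucketing is always enforced.)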
hive> INSERT OVERWRITE TABLE test select * from test09;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 32
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201103070826_0018, Tracking URL = http://hadoop00:50030/jobdetails.jsp?jobid=job_201103070826_0018
Kill Command = /home/hjl/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hadoop00:9001 -kill job_201103070826_0018
2011-03-08 11:34:23,055 Stage-1 map = 0%, reduce = 0%
2011-03-08 11:34:27,084 Stage-1 map = 6%, reduce = 0%
*************************************************
Ended Job = job_201103070826_0018
Loading data to table test
5 Rows loaded to test
OK
Time taken: 175.036 seconds
The table's directory in HDFS (/ticketdev/test in the listing below) now contains 32 files instead of one:
[hadoop@hadoop00 ~]$ hadoop fs -ls /ticketdev/test
Found 32 items
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000000_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000001_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000002_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000003_0
-rw-r--r-- 3 ticketdev hadoop 8 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000004_0
-rw-r--r-- 3 ticketdev hadoop 9 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000005_0
-rw-r--r-- 3 ticketdev hadoop 8 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000006_0
-rw-r--r-- 3 ticketdev hadoop 9 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000007_0
-rw-r--r-- 3 ticketdev hadoop 9 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000008_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000009_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000010_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000011_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000012_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000013_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000014_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000015_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000016_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000017_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000018_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000019_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000020_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000021_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000022_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000023_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000024_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000025_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000026_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000027_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000028_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000029_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000030_0
-rw-r--r-- 3 ticketdev hadoop 0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000031_0
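In the listing, only the files written by reducers 4 through 8 are non-empty, which matches the "5 Rows loaded" message: each row went to exactly one bucket, and the remaining 27 reducers produced empty files. The sketch below reproduces that layout in Python. The sample rows are hypothetical (the original post does not show the contents of test09), the hash rule is the legacy INT scheme assumed to apply here, and bucket_for and distribute are names made up for this illustration.

```python
from collections import defaultdict

NUM_BUCKETS = 32
INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE

# Hypothetical rows (id, name); the real contents of test09 are not shown.
SAMPLE_ROWS = [(4, "d"), (5, "ee"), (6, "f"), (7, "gg"), (8, "hh")]


def bucket_for(int_value, num_buckets):
    """Legacy-style bucket index for an INT key (an assumption, see lead-in)."""
    return ((int_value & 0xFFFFFFFF) & INT_MAX) % num_buckets


def distribute(rows, num_buckets):
    """Group rows by bucket; every bucket gets an entry, even if it is empty."""
    grouped = defaultdict(list)
    for row_id, name in rows:
        grouped[bucket_for(row_id, num_buckets)].append((row_id, name))
    # Sort each bucket by name, mirroring SORTED BY(name) in the DDL.
    return {
        bucket: sorted(grouped.get(bucket, []), key=lambda row: row[1])
        for bucket in range(num_buckets)
    }


if __name__ == "__main__":
    layout = distribute(SAMPLE_ROWS, NUM_BUCKETS)
    non_empty = [bucket for bucket, rows in layout.items() if rows]
    print("non-empty buckets:", non_empty)
    print("empty buckets:", NUM_BUCKETS - len(non_empty))
```

With so few rows most buckets stay empty, so 32 buckets only pay off when the table is large enough to fill them.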
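Once the data has been scattered across several files, multiple MapReduce tasks can be started.
When you run certain operations on the table, you will find that the system launches 32 map tasks.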