A Hive test: how empty fields behave
For example, with ',' as the field delimiter, the file to be imported into Hive looks like this:

1365,fei,bai
1377,bai,360
1300,fei,360
1388,fei,360
13899,,fei
13899,fei,368

Loading it into Hive:

create table test3 (a string, b string, c string)
  row format delimited fields terminated by ','
  stored as textfile;
LOAD DATA LOCAL INPATH '/home/mmm/fzm/1.txt' OVERWRITE INTO TABLE test3;

hive> select * from test3;
OK
1365    fei     bai
1377    bai     360
1300    fei     360
1388    fei     360
13899           fei
13899   fei     368
Time taken: 2.157 seconds

Because a missing field is loaded as the empty string rather than NULL, in Hive you have to filter with b = '' and b <> '' where in Oracle you would use IS NULL and IS NOT NULL:

hive> select * from test3 where b = '';
(MapReduce job job_201404230946_0174: 1 mapper, 0 reducers, cumulative CPU 1.85 sec)
OK
13899           fei
Time taken: 16.898 seconds

hive> select * from test3 where b <> '';
(MapReduce job job_201404230946_0175: 1 mapper, 0 reducers, cumulative CPU 2.08 sec)
OK
1365    fei     bai
1377    bai     360
1300    fei     360
1388    fei     360
13899   fei     368
Time taken: 14.569 seconds

Aggregates also treat the empty string as an ordinary value: select a, min(b) from test3 group by a returns the row whose field is empty. This differs from Oracle, where aggregating over a column skips the NULL rows:

hive> select a, max(b) from test3 group by a;
(MapReduce job job_201404230946_0176: 1 mapper, 1 reducer, cumulative CPU 4.8 sec)
OK
1300    fei
1365    fei
1377    bai
1388    fei
13899   fei
Time taken: 20.759 seconds

hive> select a, min(b) from test3 group by a;
(MapReduce job job_201404230946_0177: 1 mapper, 1 reducer, cumulative CPU 4.71 sec)
OK
1300    fei
1365    fei
1377    bai
1388    fei
13899
Time taken: 18.55 seconds

hive> select a, count(b) from test3 group by a;
(MapReduce job job_201404230946_0185: 1 mapper, 1 reducer, cumulative CPU 4.39 sec)
OK
1300    1
1365    1
1377    1
1388    1
13899   2

hive> select a, count(distinct b) from test3 group by a;
(MapReduce job job_201404230946_0186: 1 mapper, 1 reducer, cumulative CPU 5.35 sec)
OK
1300    1
1365    1
1377    1
1388    1
13899   2

This shows that, with or without distinct, Hive counts the empty field as a value. That also differs from Oracle, where COUNT skips NULLs.
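The session above can be reproduced in miniature with Python's built-in sqlite3 module, used here only as a stand-in engine (the table and column names mirror the Hive test; this is a sketch, not Hive itself). The point it demonstrates is the same: a field loaded as the empty string '' is not NULL, so it is matched by b = '', sorts before any letter under MIN, and is counted by COUNT and COUNT(DISTINCT), whereas a true NULL would be skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test3 (a TEXT, b TEXT, c TEXT)")
rows = [
    ("1365", "fei", "bai"),
    ("1377", "bai", "360"),
    ("1300", "fei", "360"),
    ("1388", "fei", "360"),
    ("13899", "", "fei"),    # empty string, as Hive loads a missing field
    ("13899", "fei", "368"),
]
cur.executemany("INSERT INTO test3 VALUES (?, ?, ?)", rows)

# b = '' matches the empty-string row; IS NULL matches nothing here.
print(cur.execute("SELECT * FROM test3 WHERE b = ''").fetchall())
# -> [('13899', '', 'fei')]
print(cur.execute("SELECT * FROM test3 WHERE b IS NULL").fetchall())
# -> []

# MIN treats '' as an ordinary value that sorts before any letter,
# so group 13899 yields '' -- matching the Hive result above.
print(cur.execute("SELECT a, MIN(b) FROM test3 GROUP BY a ORDER BY a").fetchall())

# COUNT(DISTINCT b) includes '', so group 13899 counts 2.
print(cur.execute(
    "SELECT a, COUNT(DISTINCT b) FROM test3 GROUP BY a ORDER BY a").fetchall())
```

If the empty field were inserted as None (a real NULL) instead of "", the b = '' filter would return nothing and COUNT(DISTINCT b) for 13899 would drop to 1, which is the Oracle-style behavior the notes above contrast against.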