hive的测试

比如两个文件导入的分割符号为,
导入hive的文件
1365,fei,bai
1377,bai,360
1300,fei,360
1388,fei,360
13899,,fei
13899,fei,368
那么导入到hive里的数据
create table test3(a string,b string,c string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH '/home/mmm/fzm/1.txt' OVERWRITE INTO TABLE test3;

hive> select  * from test3;
OK
1365    fei     bai
1377    bai     360
1300    fei     360
1388    fei     360
13899           fei
13899   fei     368
在hive 里要用=''和<>''来代替 oracle里的 is null  或者is not null  

select  * from test3 where b='';
和  select  * from test3 where b<>''

而且select  a,min(b) from test3 group by a 里取出来的是字段是空的那行,这一点和oracle不相同,oracle如果多个字段聚合,会取出不是null的那行

hive> select  * from test3;
OK
1365    fei     bai
1377    bai     360
1300    fei     360
1388    fei     360
13899           fei
13899   fei     368
Time taken: 2.157 seconds
hive> select  * from test3 where b='';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404230946_0174, Tracking URL = http://BJ-YZ-103R-63-33:50030/jobdetails.jsp?jobid=job_201404230946_0174
Kill Command = /apps/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job  -kill job_201404230946_0174
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-05-29 10:08:21,673 Stage-1 map = 0%,  reduce = 0%
2014-05-29 10:08:27,772 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.85 sec
2014-05-29 10:08:28,812 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.85 sec
2014-05-29 10:08:29,838 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.85 sec
2014-05-29 10:08:30,862 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.85 sec
2014-05-29 10:08:31,902 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.85 sec
MapReduce Total cumulative CPU time: 1 seconds 850 msec
Ended Job = job_201404230946_0174
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 1.85 sec   HDFS Read: 309 HDFS Write: 11 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 850 msec
OK
13899           fei
Time taken: 16.898 seconds
hive> select  * from test3 where b<>'';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404230946_0175, Tracking URL = http://BJ-YZ-103R-63-33:50030/jobdetails.jsp?jobid=job_201404230946_0175
Kill Command = /apps/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job  -kill job_201404230946_0175
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-05-29 10:09:49,533 Stage-1 map = 0%,  reduce = 0%
2014-05-29 10:09:55,600 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.08 sec
2014-05-29 10:09:56,625 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.08 sec
2014-05-29 10:09:57,649 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.08 sec
2014-05-29 10:09:58,674 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.08 sec
MapReduce Total cumulative CPU time: 2 seconds 80 msec
Ended Job = job_201404230946_0175
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 2.08 sec   HDFS Read: 309 HDFS Write: 66 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 80 msec
OK
1365    fei     bai
1377    bai     360
1300    fei     360
1388    fei     360
13899   fei     368
Time taken: 14.569 seconds
hive> select  a,max(b) from test3 group by a;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201404230946_0176, Tracking URL = http://BJ-YZ-103R-63-33:50030/jobdetails.jsp?jobid=job_201404230946_0176
Kill Command = /apps/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job  -kill job_201404230946_0176
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-05-29 10:13:15,159 Stage-1 map = 0%,  reduce = 0%
2014-05-29 10:13:21,222 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.12 sec
2014-05-29 10:13:22,245 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.12 sec
2014-05-29 10:13:23,269 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.12 sec
2014-05-29 10:13:24,293 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.12 sec
2014-05-29 10:13:25,317 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.12 sec
2014-05-29 10:13:26,341 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.12 sec
2014-05-29 10:13:27,381 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.12 sec
2014-05-29 10:13:28,404 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.8 sec
2014-05-29 10:13:29,430 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.8 sec
2014-05-29 10:13:30,455 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.8 sec
MapReduce Total cumulative CPU time: 4 seconds 800 msec
Ended Job = job_201404230946_0176
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 4.8 sec   HDFS Read: 309 HDFS Write: 46 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 800 msec
OK
1300    fei
1365    fei
1377    bai
1388    fei
13899   fei
Time taken: 20.759 seconds
hive> select  a,min(b) from test3 group by a;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201404230946_0177, Tracking URL = http://BJ-YZ-103R-63-33:50030/jobdetails.jsp?jobid=job_201404230946_0177
Kill Command = /apps/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job  -kill job_201404230946_0177
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-05-29 10:13:47,975 Stage-1 map = 0%,  reduce = 0%
2014-05-29 10:13:53,024 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.01 sec
2014-05-29 10:13:54,045 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.01 sec
2014-05-29 10:13:55,066 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.01 sec
2014-05-29 10:13:56,086 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.01 sec
2014-05-29 10:13:57,108 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.01 sec
2014-05-29 10:13:58,129 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.71 sec
2014-05-29 10:13:59,165 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.71 sec
2014-05-29 10:14:00,186 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.71 sec
2014-05-29 10:14:01,208 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.71 sec
MapReduce Total cumulative CPU time: 4 seconds 710 msec
Ended Job = job_201404230946_0177
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 4.71 sec   HDFS Read: 309 HDFS Write: 43 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 710 msec
OK
1300    fei
1365    fei
1377    bai
1388    fei
13899
Time taken: 18.55 seconds



hive> select  * from test3;
OK
1365    fei     bai
1377    bai     360
1300    fei     360
1388    fei     360
13899           fei
13899   fei     368

select a, count(b) from test3 group by a;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201404230946_0185, Tracking URL = http://BJ-YZ-103R-63-33:50030/jobdetails.jsp?jobid=job_201404230946_0185
Kill Command = /apps/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job  -kill job_201404230946_0185
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-05-29 10:39:27,598 Stage-1 map = 0%,  reduce = 0%
2014-05-29 10:39:32,650 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.91 sec
2014-05-29 10:39:33,690 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.91 sec
2014-05-29 10:39:34,713 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.91 sec
2014-05-29 10:39:35,737 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.91 sec
2014-05-29 10:39:36,760 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.91 sec
2014-05-29 10:39:37,783 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.39 sec
2014-05-29 10:39:38,823 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.39 sec
2014-05-29 10:39:39,849 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.39 sec
2014-05-29 10:39:40,874 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.39 sec
MapReduce Total cumulative CPU time: 4 seconds 390 msec
Ended Job = job_201404230946_0185
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 4.39 sec   HDFS Read: 309 HDFS Write: 36 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 390 msec
OK
1300    1
1365    1
1377    1
1388    1
13899   2

select a, count(distinct b) from test3 group by a;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201404230946_0186, Tracking URL = http://BJ-YZ-103R-63-33:50030/jobdetails.jsp?jobid=job_201404230946_0186
Kill Command = /apps/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job  -kill job_201404230946_0186
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-05-29 10:40:14,211 Stage-1 map = 0%,  reduce = 0%
2014-05-29 10:40:21,275 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.24 sec
2014-05-29 10:40:22,297 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.24 sec
2014-05-29 10:40:23,320 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.24 sec
2014-05-29 10:40:24,340 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.24 sec
2014-05-29 10:40:25,360 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.24 sec
2014-05-29 10:40:26,382 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.24 sec
2014-05-29 10:40:27,419 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.24 sec
2014-05-29 10:40:28,440 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.35 sec
2014-05-29 10:40:29,463 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.35 sec
2014-05-29 10:40:30,485 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.35 sec
2014-05-29 10:40:31,506 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.35 sec
MapReduce Total cumulative CPU time: 5 seconds 350 msec
Ended Job = job_201404230946_0186
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 5.35 sec   HDFS Read: 309 HDFS Write: 36 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 350 msec
OK
1300    1
1365    1
1377    1
1388    1
13899   2

说明你不管distinct 没有  在hive里 空都算一个的 这点和oracle不一样

 

posted @ 2017-10-19 08:49  沙漠里的小鱼  阅读(799)  评论(0编辑  收藏  举报