1. Create a test directory in HDFS
[root@localhost hadoop-1.1.1]# bin/hadoop dfs -mkdir /hadoop/input
2. Create a test file
[root@localhost test]# vi test.txt
with the contents:
hello hadoop hello World Hello Java Hey man i am a programmer
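Alternatively, the test file can be created non-interactively with a heredoc. The line breaks below are illustrative only (the job log later reports the file is read as 5 input records, so the original layout differed):

```shell
# Create the local test file without opening an editor.
# Line breaks are illustrative; any whitespace-separated layout works.
mkdir -p ./test
cat > ./test/test.txt <<'EOF'
hello hadoop hello World
Hello Java
Hey man i am a programmer
EOF
```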
3. Put the test file into the test directory
[root@localhost hadoop-1.1.1]# bin/hadoop dfs -put ./test/test.txt /hadoop/input
4. Run the wordcount program
[root@localhost hadoop-1.1.1]# bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /hadoop/input/* /hadoop/output
The /hadoop/output directory must not already exist; otherwise the job fails with:
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /hadoop/output already exists
Because a Hadoop job is an expensive computation, its results are not overwritten by default. To rerun the job, delete the old output directory first with bin/hadoop dfs -rmr /hadoop/output.
If the job succeeds, output like the following is printed:
[root@localhost hadoop-1.1.1]# bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /hadoop/input/* /hadoop/output
13/01/17 00:36:06 INFO input.FileInputFormat: Total input paths to process : 1
13/01/17 00:36:06 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/01/17 00:36:06 WARN snappy.LoadSnappy: Snappy native library not loaded
13/01/17 00:36:07 INFO mapred.JobClient: Running job: job_201301162205_0006
13/01/17 00:36:08 INFO mapred.JobClient:  map 0% reduce 0%
13/01/17 00:36:14 INFO mapred.JobClient:  map 100% reduce 0%
13/01/17 00:36:22 INFO mapred.JobClient:  map 100% reduce 33%
13/01/17 00:36:24 INFO mapred.JobClient:  map 100% reduce 100%
13/01/17 00:36:25 INFO mapred.JobClient: Job complete: job_201301162205_0006
13/01/17 00:36:25 INFO mapred.JobClient: Counters: 29
13/01/17 00:36:25 INFO mapred.JobClient:   Job Counters
13/01/17 00:36:25 INFO mapred.JobClient:     Launched reduce tasks=1
13/01/17 00:36:25 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6863
13/01/17 00:36:25 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/17 00:36:25 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/01/17 00:36:25 INFO mapred.JobClient:     Launched map tasks=1
13/01/17 00:36:25 INFO mapred.JobClient:     Data-local map tasks=1
13/01/17 00:36:25 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9207
13/01/17 00:36:25 INFO mapred.JobClient:   File Output Format Counters
13/01/17 00:36:25 INFO mapred.JobClient:     Bytes Written=78
13/01/17 00:36:25 INFO mapred.JobClient:   FileSystemCounters
13/01/17 00:36:25 INFO mapred.JobClient:     FILE_BYTES_READ=128
13/01/17 00:36:25 INFO mapred.JobClient:     HDFS_BYTES_READ=170
13/01/17 00:36:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=48059
13/01/17 00:36:25 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=78
13/01/17 00:36:25 INFO mapred.JobClient:   File Input Format Counters
13/01/17 00:36:25 INFO mapred.JobClient:     Bytes Read=62
13/01/17 00:36:25 INFO mapred.JobClient:   Map-Reduce Framework
13/01/17 00:36:25 INFO mapred.JobClient:     Map output materialized bytes=128
13/01/17 00:36:25 INFO mapred.JobClient:     Map input records=5
13/01/17 00:36:25 INFO mapred.JobClient:     Reduce shuffle bytes=128
13/01/17 00:36:25 INFO mapred.JobClient:     Spilled Records=22
13/01/17 00:36:25 INFO mapred.JobClient:     Map output bytes=110
13/01/17 00:36:25 INFO mapred.JobClient:     CPU time spent (ms)=1650
13/01/17 00:36:25 INFO mapred.JobClient:     Total committed heap usage (bytes)=176492544
13/01/17 00:36:25 INFO mapred.JobClient:     Combine input records=12
13/01/17 00:36:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=108
13/01/17 00:36:25 INFO mapred.JobClient:     Reduce input records=11
13/01/17 00:36:25 INFO mapred.JobClient:     Reduce input groups=11
13/01/17 00:36:25 INFO mapred.JobClient:     Combine output records=11
13/01/17 00:36:25 INFO mapred.JobClient:     Physical memory (bytes) snapshot=180088832
13/01/17 00:36:25 INFO mapred.JobClient:     Reduce output records=11
13/01/17 00:36:25 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=756244480
13/01/17 00:36:25 INFO mapred.JobClient:     Map output records=12
[root@localhost hadoop-1.1.1]#
5. View the results
The wordcount program counts the occurrences of each word in the input files and writes the result to the file /hadoop/output/part-r-00000.
[root@localhost hadoop-1.1.1]# bin/hadoop dfs -ls /hadoop/output
Found 3 items
-rw-r--r--   1 root supergroup          0 2013-01-17 00:36 /hadoop/output/_SUCCESS
drwxr-xr-x   - root supergroup          0 2013-01-17 00:36 /hadoop/output/_logs
-rw-r--r--   1 root supergroup         78 2013-01-17 00:36 /hadoop/output/part-r-00000
[root@localhost hadoop-1.1.1]#
[root@localhost hadoop-1.1.1]# bin/hadoop dfs -cat /hadoop/output/part-r-00000
Hello   1
Hey     1
Java    1
World   1
a       1
am      1
hadoop  1
hello   2
i       1
man     1
programmer      1
[root@localhost hadoop-1.1.1]#
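As a sanity check, the same per-word counts can be reproduced locally with standard Unix tools. This is only a rough approximation of wordcount's whitespace tokenization, not part of Hadoop:

```shell
# Split the text on whitespace, then count occurrences of each word.
# LC_ALL=C gives byte-order sorting, similar to Hadoop's Text key ordering
# (uppercase words sort before lowercase ones).
printf 'hello hadoop hello World Hello Java Hey man i am a programmer\n' \
  | tr -s ' ' '\n' | LC_ALL=C sort | uniq -c
```

This yields 11 distinct words, with "hello" counted twice and every other word once, matching the part-r-00000 output above.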