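The previous program only manipulated the file system. This time we run a real MapReduce program.
The program is wordcount, one of the official examples; it plays the same role as "hello world" does elsewhere.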
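1. First make sure Hadoop starts correctly: run start-all.sh
2. Run the jps command and check that NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager have all started. Here I ran into a problem: NodeManager did not start. The error message was: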
2014-01-07 13:46:21,442 FATAL org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Failed to initialize mapreduce.shuffle
java.lang.IllegalArgumentException: The ServiceName: mapreduce.shuffle set in yarn.nodemanager.aux-services is invalid.The valid service name should only contain a-zA-Z0-9_ and can not start with numbers
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:98)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:218)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:188)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:338)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:386)
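After checking, the problem turned out to be a mistake in the configuration file. Edit yarn-site.xml and change the property to the following. (I don't know why the name changed, but the log message states the rule: a service name may contain only a-zA-Z0-9_, so the dot in mapreduce.shuffle is rejected.)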
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
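To see why the old value fails and the new one passes, the rule quoted in the log can be written as a small check. This is only a sketch: the regular expression is transcribed from the wording of the error message, not taken from the Hadoop source, and the function name is made up for illustration.

```python
import re

# Assumed rule, transcribed from the NodeManager error message:
# only a-zA-Z0-9_ is allowed, and the name must not start with a digit.
VALID_SERVICE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_aux_service_name(name: str) -> bool:
    """Return True if `name` satisfies the rule quoted in the log (hypothetical helper)."""
    return VALID_SERVICE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("mapreduce.shuffle", "mapreduce_shuffle"):
        verdict = "valid" if is_valid_aux_service_name(candidate) else "invalid"
        print(f"{candidate}: {verdict}")
```

The dot in mapreduce.shuffle is the only character outside the allowed set, which is why replacing it with an underscore is enough.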
3. Prepare the data: add input/file1.txt and input/file2.txt to the Hadoop file system
[root@dbserver mapreduce]# hadoop fs -ls /input
Found 2 items
-rw-r--r--   1 root supergroup         12 2013-12-06 16:22 /input/file1.txt
-rw-r--r--   1 root supergroup         13 2013-12-06 16:22 /input/file2.txt
[root@dbserver mapreduce]# hadoop fs -cat /input/file1.txt
Hello World
[root@dbserver mapreduce]# hadoop fs -cat /input/file2.txt
Hello Hadoop
4. The example program is located at: /hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
Run the job from that directory:
hadoop jar ./hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
Console output:
14/01/07 14:00:37 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
14/01/07 14:00:38 INFO input.FileInputFormat: Total input paths to process : 2
14/01/07 14:00:38 INFO mapreduce.JobSubmitter: number of splits:2
14/01/07 14:00:38 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/01/07 14:00:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/01/07 14:00:38 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/01/07 14:00:38 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/01/07 14:00:38 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/01/07 14:00:38 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/01/07 14:00:38 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/01/07 14:00:38 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/01/07 14:00:38 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/01/07 14:00:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/01/07 14:00:38 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/01/07 14:00:38 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/01/07 14:00:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1389074273046_0001
14/01/07 14:00:38 INFO impl.YarnClientImpl: Submitted application application_1389074273046_0001 to ResourceManager at localhost/127.0.0.1:8032
14/01/07 14:00:38 INFO mapreduce.Job: The url to track the job: http://dbserver:8088/proxy/application_1389074273046_0001/
14/01/07 14:00:38 INFO mapreduce.Job: Running job: job_1389074273046_0001
14/01/07 14:00:48 INFO mapreduce.Job: Job job_1389074273046_0001 running in uber mode : false
14/01/07 14:00:48 INFO mapreduce.Job: map 0% reduce 0%
14/01/07 14:00:58 INFO mapreduce.Job: map 100% reduce 0%
14/01/07 14:01:04 INFO mapreduce.Job: map 100% reduce 100%
14/01/07 14:01:05 INFO mapreduce.Job: Job job_1389074273046_0001 completed successfully
14/01/07 14:01:05 INFO mapreduce.Job: Counters: 43
    File System Counters
        FILE: Number of bytes read=55
        FILE: Number of bytes written=236870
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=229
        HDFS: Number of bytes written=25
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=15178
        Total time spent by all reduces in occupied slots (ms)=4384
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=41
        Map output materialized bytes=61
        Input split bytes=204
        Combine input records=4
        Combine output records=4
        Reduce input groups=3
        Reduce shuffle bytes=61
        Reduce input records=4
        Reduce output records=3
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=108
        CPU time spent (ms)=2200
        Physical memory (bytes) snapshot=568229888
        Virtual memory (bytes) snapshot=2566582272
        Total committed heap usage (bytes)=392298496
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=25
    File Output Format Counters
        Bytes Written=25
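Several of the Map-Reduce Framework counters can be reproduced by hand from the two input files, which helps in reading the log. The sketch below is a local simulation in plain Python, not Hadoop code. It assumes each file is one split holding a single newline-terminated line (consistent with the 12 and 13 byte sizes listed earlier), that the combiner runs once per map task, and that each map output record is serialized as one length byte plus the UTF-8 word plus a 4-byte integer. The function name simulate_counters is made up for illustration.

```python
from collections import Counter

# Assumed file contents, inferred from the `hadoop fs -cat` output and the file sizes.
SPLITS = {
    "file1.txt": "Hello World\n",
    "file2.txt": "Hello Hadoop\n",
}


def simulate_counters(splits):
    """Recompute a few job counters locally (hypothetical helper, not Hadoop code)."""
    counters = Counter()
    combined = []  # (word, count) pairs emitted by each map-side combiner
    for text in splits.values():
        counters["Bytes Read"] += len(text.encode("utf-8"))
        local = Counter()
        for line in text.splitlines():
            counters["Map input records"] += 1  # one record per input line
            for word in line.split():
                counters["Map output records"] += 1  # mapper emits (word, 1)
                # Assumed serialized size: 1 length byte + UTF-8 key + 4-byte int value
                counters["Map output bytes"] += 1 + len(word.encode("utf-8")) + 4
                local[word] += 1
        # The combiner merges duplicate words within a single map task only.
        counters["Combine input records"] += sum(local.values())
        counters["Combine output records"] += len(local)
        combined.extend(local.items())
    counters["Reduce input records"] = len(combined)
    counters["Reduce input groups"] = len({word for word, _ in combined})
    counters["Reduce output records"] = counters["Reduce input groups"]
    return dict(counters)


if __name__ == "__main__":
    for name, value in simulate_counters(SPLITS).items():
        print(f"{name}={value}")
```

Under these assumptions the values agree with the log: "Hello" appears once in each file, so the per-task combiner cannot merge it (4 records in, 4 out), and the reducer sees 4 records in 3 groups.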
5. Check the result:
[root@dbserver mapreduce]# hadoop fs -ls /output
Found 2 items
-rw-r--r--   1 root supergroup          0 2014-01-07 14:01 /output/_SUCCESS
-rw-r--r--   1 root supergroup         25 2014-01-07 14:01 /output/part-r-00000
[root@dbserver mapreduce]# hadoop fs -cat /output/part-r-00000
Hadoop	1
Hello	2
World	1
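As a sanity check, the same result can be computed locally without Hadoop. This is a minimal sketch in plain Python, assuming the two input files contain exactly the lines shown by `hadoop fs -cat`, that words are split on whitespace, and that each output line is the word, a tab, and the count, sorted by word. The helper names are made up for illustration.

```python
from collections import Counter

# Assumed input contents, taken from the `hadoop fs -cat` output above.
INPUTS = ["Hello World\n", "Hello Hadoop\n"]


def word_count(texts):
    """Count whitespace-separated words across all input texts."""
    totals = Counter()
    for text in texts:
        totals.update(text.split())
    return totals


def format_output(totals):
    """Render one 'word<TAB>count' line per word, sorted by word."""
    return "".join(f"{word}\t{count}\n" for word, count in sorted(totals.items()))


if __name__ == "__main__":
    result = format_output(word_count(INPUTS))
    print(result, end="")
    print(f"bytes: {len(result.encode('utf-8'))}")
```

The formatted text is 25 bytes long, the same size as /output/part-r-00000 in the listing.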