Hadoop YARN troubleshooting notes
1. Hadoop YARN: wordcount finishes its work but the job is reported as FAILED
The error message is as follows:
15/09/05 03:48:02 INFO mapreduce.Job: Job job_1441395011668_0001 failed with state FAILED due to: Application application_1441395011668_0001 failed 2 times due to AM Container for appattempt_1441395011668_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page: http://macmaster.hadoop:8088/proxy/application_1441395011668_0001/A Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1441395011668_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
15/09/05 03:48:02 INFO mapreduce.Job: Counters: 0
A likely cause is that mapreduce.jobhistory.address has not been configured: the client needs to read job history information to determine whether the job succeeded. These mapreduce.* properties belong in mapred-site.xml (not yarn-site.xml), for example:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>macmaster.hadoop:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>macmaster.hadoop:19888</value>
  </property>
</configuration>
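Note that the config above only tells clients where the JobHistory server is; the daemon itself must also be running. A quick way to verify is to check that the two ports are actually reachable. Here is a minimal Python sketch (the hostname macmaster.hadoop is taken from the config above; substitute your own):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, DNS failures, timeouts
        return False

# Hostname from the config above; replace with your JobHistory host.
for port in (10020, 19888):
    status = "reachable" if port_open("macmaster.hadoop", port) else "unreachable"
    print(port, status)
```

If the ports are unreachable, the JobHistory daemon probably was not started on that host.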
2. 60000 millis timeout while waiting for channel to be ready for read
This is most likely a read/write timeout. In my case it happened while running randomtextwriter and randomwriter: with limited CPU and memory and a large volume of data, HDFS reads became slow enough to exceed the 60-second default. The timeouts can be raised in hdfs-site.xml (values are in milliseconds):
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>600000</value> <!-- default: 60000 -->
</property>
<property>
  <name>dfs.socket.timeout</name>
  <value>600000</value> <!-- default: 60000 -->
</property>
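To illustrate what this timeout means, here is a self-contained Python sketch (not Hadoop code) of a blocking socket read that gives up after a deadline, which is the same failure mode the error message describes: no data arrived on the channel within the configured number of milliseconds.

```python
import socket

def read_with_timeout(host: str, port: int, timeout_s: float) -> bytes:
    """Read from a peer, raising socket.timeout if no data arrives in time."""
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        conn.settimeout(timeout_s)
        return conn.recv(4096)  # blocks until data, EOF, or timeout

# Demo: a server that accepts connections but never writes anything,
# so the client read is guaranteed to time out.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
host, port = srv.getsockname()
try:
    read_with_timeout(host, port, timeout_s=0.2)
except socket.timeout:
    print("read timed out while waiting for channel to be ready for read")
srv.close()
```

Raising dfs.socket.timeout simply widens this window; it does not make the slow reads faster, so on weak hardware the underlying load should also be reduced.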