riverphoenix

These are notes on problems I ran into on my own Hadoop cluster and how I solved them:

 

1. File permission problem:

 

Error message:

 

[hadoop@sc706-26 bin]$ start-all.sh
starting namenode, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-sc706-26.out
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-sc706-26.log (Permission denied)
at java.io.FileOutputStream.openAppend(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:290)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:164)
at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:216)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:257)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:133)
192.168.153.91: starting datanode, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-sc706-28.out
192.168.153.92: starting datanode, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-sc706-29.out
192.168.153.90: starting datanode, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-sc706-27.out
192.168.153.89: starting secondarynamenode, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode-sc706-26.out
starting jobtracker, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-sc706-26.out
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-sc706-26.log (Permission denied)
at java.io.FileOutputStream.openAppend(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:290)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:164)
at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:216)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:257)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:133)
192.168.153.91: starting tasktracker, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-sc706-28.out
192.168.153.92: starting tasktracker, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-sc706-29.out
192.168.153.90: starting tasktracker, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-sc706-27.out
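Solution: despite the FileNotFoundException, the log files named in the errors are not missing; the hadoop user simply cannot write to them (note the "Permission denied"). Fix the permissions of those files under logs, then restart the Hadoop cluster. The commands I ran use chmod 777, which is the bluntest fix; handing ownership of the files back to the hadoop user with chown is a narrower alternative.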

 

[root@sc706-26 logs]# chmod 777 hadoop-hadoop-namenode-sc706-26.log
[root@sc706-26 logs]# chmod 777 hadoop-hadoop-jobtracker-sc706-26.out.5
[root@sc706-26 logs]# chmod 777 hadoop-hadoop-jobtracker-sc706-26.log
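
One plausible cause of this error (an assumption, the post does not say how the files got that way) is that some log files ended up owned by a user other than hadoop, for example after the daemons were once started as root. The sketch below lists such files so they can be fixed individually instead of being opened up to everyone. The function name `not_owned_by` is made up for this illustration, and `-maxdepth` assumes GNU or BSD find.

```shell
#!/bin/sh
# List regular files directly under a directory that are NOT owned by the
# given user (name or numeric uid). Hypothetical helper, not part of Hadoop.
not_owned_by() {
    dir="$1"
    user="$2"
    find "$dir" -maxdepth 1 -type f ! -user "$user"
}
```

For the cluster in this post it could be run as `not_owned_by /home/hadoop/hadoop-0.20.2/logs hadoop`; each file it prints can then be given back to the hadoop user as root with `chown hadoop <file>`.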

2. NameNode formatting problem:

Error message:

 

hadoop dfsadmin -report
11/07/15 10:35:15 INFO ipc.Client: Retrying connect to server: sc706-26/192.168.153.89:9000. Already tried 0 time(s)
Solution:

 

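Reformat the NameNode. Note: before formatting, you must delete the name, data, and tmp directories on the NameNode and on every DataNode. Otherwise the format gives the NameNode a new namespaceID while the DataNodes keep the old one, and the DataNodes refuse to start. This is a classic error; the DataNode log below shows a failure caused by exactly such a namespaceID mismatch. Be aware that reformatting destroys all data stored in HDFS.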

 

2010-06-25 10:04:57,812 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = sc706-29/192.168.153.92
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop ... ranch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2010-06-25 10:05:00,089 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/hadoop/data: namenode namespaceID = 1214734841; datanode namespaceID = 1600742075
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:233)
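
To confirm a mismatch like the one in this log before deleting anything, the IDs can be read directly: in Hadoop 0.20.x each storage directory keeps a `current/VERSION` properties file containing a `namespaceID=` line. The following is a minimal sketch; the helper names `namespace_id` and `same_namespace` are invented for this example, and since the two files normally live on different hosts, one of them would have to be copied over (or `namespace_id` run on each host) before comparing.

```shell
#!/bin/sh
# Print the namespaceID recorded in a Hadoop storage VERSION file.
namespace_id() {
    sed -n 's/^namespaceID=//p' "$1"
}

# Succeed only if both VERSION files carry the same, non-empty namespaceID.
same_namespace() {
    nn=$(namespace_id "$1")
    dn=$(namespace_id "$2")
    [ -n "$nn" ] && [ "$nn" = "$dn" ]
}
```

On the DataNode from the log, the file would be `/home/hadoop/data/current/VERSION`. The NameNode side depends on `dfs.name.dir`; a path such as `/home/hadoop/name/current/VERSION` is only a guess based on the directory names mentioned above.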

 

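3. SSH problem:
If connections time out when the Hadoop cluster starts, i.e. the DataNodes cannot be reached, passwordless SSH login may be the cause. Try logging in to a DataNode with ssh: if you get in without being asked for a password, SSH is configured correctly. If not, set SSH up again. After appending the id_dsa.pub file to authorized_keys on the DataNode, note that you must also fix the permissions of .ssh and authorized_keys on both the NameNode and the DataNodes with chmod; I used mode 755. With its default StrictModes setting, sshd ignores the key file when these paths are writable by group or others, and the stricter 700 for .ssh and 600 for authorized_keys are the more commonly recommended values. Test the login again when done.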
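
The permission fix and the login test can be sketched as two small shell functions. Note the assumptions: the post uses mode 755, while this sketch applies the stricter 700/600 pair (substitute 755 to match the post exactly), and the function names `fix_ssh_perms` and `passwordless_ok` are made up for illustration.

```shell
#!/bin/sh
# Tighten permissions on an SSH directory (defaults to ~/.ssh) and the
# authorized_keys file inside it.
fix_ssh_perms() {
    ssh_dir="${1:-$HOME/.ssh}"
    chmod 700 "$ssh_dir" &&
    chmod 600 "$ssh_dir/authorized_keys"
}

# Succeed only if key-based login to the host works. BatchMode=yes makes ssh
# fail instead of prompting for a password.
passwordless_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

Run `fix_ssh_perms` as the hadoop user on every node, then from the NameNode check each DataNode, for example `passwordless_ok 192.168.153.90 && echo ok`.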

 

 
posted on 2012-05-21 11:22 by riverphoenix