Summary of problems encountered with Hadoop
Problem 1
Running hadoop fs -ls fails with the following error:
# hadoop fs -ls
11/08/31 22:51:39 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
Bad connection to FS. command aborted.
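Solution:
1. Format the namenode (note that formatting erases the existing HDFS metadata, so only do this on a new or disposable cluster):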
# hadoop namenode -format
2. Restart Hadoop:
# sh stop-all.sh
# sh start-all.sh
3. Check the running daemons:
# jps
13508 NameNode
11008 SecondaryNameNode
14393 Jps
11096 JobTracker
The namenode is now running.
4. Run the command again:
# hadoop fs -ls
12/01/31 14:04:39 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/01/31 14:04:39 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Found 1 items
drwxr-xr-x - root supergroup 0 2012-01-31 13:57 /user/root/test
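The jps check in step 3 can be automated. The following is a minimal sketch, not part of Hadoop: the function name missing_daemons is hypothetical, and it assumes only that jps prints one "<pid> <ClassName>" pair per line, as in the listing above. It prints every expected daemon that is absent, which makes it easy to spot, for example, that the listing in step 3 contains no DataNode.

```shell
# Print the name of every expected daemon that is absent from jps output.
# Reads jps output on stdin; the expected class names are the arguments.
# (missing_daemons is a hypothetical helper, not a Hadoop command.)
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # Compare the second field exactly, so that "NameNode" does not
        # accidentally match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage (assumes jps is on PATH):
#   jps | missing_daemons NameNode DataNode SecondaryNameNode JobTracker
```

An empty result means every daemon named on the command line is running.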
Problem 2
Running hadoop fs -put ../conf input fails with the following error:
12/01/31 16:01:25 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/01/31 16:01:25 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
12/01/31 16:01:26 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: File /user/root/input/ssl-server.xml.example could only be replicated to 0 nodes, instead of 1
put: File /user/root/input/ssl-server.xml.example could only be replicated to 0 nodes, instead of 1
12/01/31 16:01:26 ERROR hdfs.DFSClient: Exception closing file /user/root/input/ssl-server.xml.example : java.io.IOException: File /user/root/input/ssl-server.xml.example could only be replicated to 0 nodes, instead of 1
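Solution:
This error occurs because no datanode has joined the cluster, so there is nowhere to place the block replicas. The daemons need to be started in order: first the namenode, then the datanode, and then the jobtracker and tasktracker. The workaround for now is to start the nodes individually with hadoop-daemon.sh start namenode and hadoop-daemon.sh start datanode, as in the following steps.
1. Restart the namenode: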
# hadoop-daemon.sh stop namenode
stopping namenode
# hadoop-daemon.sh start namenode
starting namenode, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-namenode-www.keli.com.out
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
2. Restart the datanode:
# hadoop-daemon.sh stop datanode
stopping datanode
# hadoop-daemon.sh start datanode
starting datanode, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-datanode-www.keli.com.out
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
3. Change to Hadoop's bin directory:
# cd /usr/hadoop-0.21.0/bin/
4. List the HDFS directory:
[root@www bin]# hadoop fs -ls
12/01/31 16:09:45 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/01/31 16:09:45 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Found 4 items
drwxr-xr-x - root supergroup 0 2012-01-31 16:01 /user/root/input
drwxr-xr-x - root supergroup 0 2012-01-31 15:24 /user/root/test
-rw-r--r-- 1 root supergroup 0 2012-01-31 14:37 /user/root/test-in
drwxr-xr-x - root supergroup 0 2012-01-31 14:32 /user/root/test1
5. Delete the input directory in HDFS:
[root@www bin]# hadoop fs -rmr input
12/01/31 16:10:09 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/01/31 16:10:09 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Deleted hdfs://m106:9000/user/root/input
6. Upload the data to the input directory in HDFS:
[root@www bin]# hadoop fs -put ../conf input
12/01/31 16:10:14 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/01/31 16:10:14 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
7. List the input directory to verify the uploaded data:
[root@www bin]# hadoop fs -ls input
12/01/31 16:10:21 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/01/31 16:10:21 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Found 16 items
-rw-r--r-- 1 root supergroup 3426 2012-01-31 16:10 /user/root/input/capacity-scheduler.xml
-rw-r--r-- 1 root supergroup 1335 2012-01-31 16:10 /user/root/input/configuration.xsl
-rw-r--r-- 1 root supergroup 757 2012-01-31 16:10 /user/root/input/core-site.xml
-rw-r--r-- 1 root supergroup 321 2012-01-31 16:10 /user/root/input/fair-scheduler.xml
-rw-r--r-- 1 root supergroup 2237 2012-01-31 16:10 /user/root/input/hadoop-env.sh
-rw-r--r-- 1 root supergroup 1650 2012-01-31 16:10 /user/root/input/hadoop-metrics.properties
-rw-r--r-- 1 root supergroup 4644 2012-01-31 16:10 /user/root/input/hadoop-policy.xml
-rw-r--r-- 1 root supergroup 252 2012-01-31 16:10 /user/root/input/hdfs-site.xml
-rw-r--r-- 1 root supergroup 4141 2012-01-31 16:10 /user/root/input/log4j.properties
-rw-r--r-- 1 root supergroup 2997 2012-01-31 16:10 /user/root/input/mapred-queues.xml
-rw-r--r-- 1 root supergroup 430 2012-01-31 16:10 /user/root/input/mapred-site.xml
-rw-r--r-- 1 root supergroup 25 2012-01-31 16:10 /user/root/input/masters
-rw-r--r-- 1 root supergroup 26 2012-01-31 16:10 /user/root/input/slaves
-rw-r--r-- 1 root supergroup 1243 2012-01-31 16:10 /user/root/input/ssl-client.xml.example
-rw-r--r-- 1 root supergroup 1195 2012-01-31 16:10 /user/root/input/ssl-server.xml.example
-rw-r--r-- 1 root supergroup 250 2012-01-31 16:10 /user/root/input/taskcontroller.cfg
[root@www bin]#
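The start order described in the solution (namenode, datanode, jobtracker, tasktracker) can be wrapped in a small function. This is only a sketch: start_in_order is a hypothetical name, and it assumes hadoop-daemon.sh is on PATH and accepts "start <daemon>" for all four daemons, as it does for namenode and datanode in the steps above. The launcher is a parameter so the function can be tried out with a stub first.

```shell
# Start the Hadoop daemons one at a time, in dependency order.
# $1 is the launcher command; it defaults to hadoop-daemon.sh.
# (start_in_order is a hypothetical helper, not a Hadoop command.)
start_in_order() {
    launcher=${1:-hadoop-daemon.sh}
    for daemon in namenode datanode jobtracker tasktracker; do
        # Stop at the first failure rather than starting later daemons
        # on top of a broken earlier one.
        if ! "$launcher" start "$daemon"; then
            echo "failed to start $daemon" >&2
            return 1
        fi
    done
}

# Usage:
#   start_in_order                  # uses hadoop-daemon.sh
#   start_in_order echo             # dry run: only prints the arguments
```

Note that a zero exit status from the launcher only means the daemon was launched; checking jps or the daemon log afterwards is still needed to confirm it stayed up.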
Problem 3
Starting the datanode fails with Unrecognized option: -jvm and Could not create the Java virtual machine.
[root@www bin]# hadoop-daemon.sh start datanode
starting datanode, logging to /usr/hadoop-0.20.203.0/bin/../logs/hadoop-root-datanode-www.keli.com.out
Unrecognized option: -jvm
Could not create the Java virtual machine.
Solution:
The script bin/hadoop in the Hadoop installation directory contains the following shell fragment:
CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
if [[ $EUID -eq 0 ]]; then
  HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
else
  HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
fi
The relevant part is:
if [[ $EUID -eq 0 ]]; then
HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
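What does it mean for $EUID to be 0?
The effective user ID (EUID) identifies the user on whose behalf a process runs. It determines the ownership of newly created processes, and it is used to check file access permissions and to check whether the process may send a signal to another process through the kill system call.
Running echo $EUID as root prints 0.
So when Hadoop is run as root, the -jvm option is added, and that is where the Unrecognized option: -jvm error comes from.
Two options came to mind. One is to edit the shell script and remove -jvm. The other, since the branch is taken only when $EUID is 0, is to run Hadoop as a user whose $EUID is not 0, that is, as a non-root user. I tried the second option first: switching to an ordinary user and following the documentation worked. Out of curiosity I then tried the first option. I would rather not touch the source code for now, but editing this script does no harm, so I removed the if/else structure altogether and replaced it with:
HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
This also ran successfully.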
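To see which branch a given user would take without starting a datanode, the decision can be reproduced in isolation. This is a sketch only: datanode_jvm_flag is a hypothetical helper that mirrors the if/else quoted above, and it takes the user ID as an optional argument (defaulting to the output of id -u) so that both branches can be exercised from any account.

```shell
# Print the JVM flag that the quoted bin/hadoop fragment would add for
# the datanode. $1 is the effective user ID; by default it is taken
# from `id -u`. (datanode_jvm_flag is a hypothetical helper.)
datanode_jvm_flag() {
    uid=${1:-$(id -u)}
    if [ "$uid" -eq 0 ]; then
        # root: this is the branch that triggers "Unrecognized option: -jvm"
        echo "-jvm server"
    else
        # any non-root user: a plain -server flag
        echo "-server"
    fi
}

# Usage:
#   datanode_jvm_flag        # flag for the current user
#   datanode_jvm_flag 0      # flag root would get
```

If the function prints "-jvm server" for the account that launches the datanode, either of the two fixes described above applies.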
Problem 4
[root@www bin]# jps
3283 NameNode
2791 SecondaryNameNode
2856 JobTracker
3348 Jps
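Hadoop did not start the datanode.
Solution:
After the namenode is formatted, the datanode still holds the ID it was given before the format. Because this old ID was never removed, it no longer matches the newly formatted namenode, and the datanode cannot register with it. The old HDFS data directory on the datanode therefore has to be deleted. Note that this destroys any blocks stored on that datanode.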
[root@freepp ~]# rm -rf /hadoopdata/
Restart Hadoop:
[root@www bin]# jps
4132 Jps
3907 NameNode
4056 DataNode
2791 SecondaryNameNode
2856 JobTracker
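Before deleting the data directory, it can help to confirm that an ID mismatch really is the cause. The sketch below assumes the layout used by Hadoop releases of this era, where the namenode and datanode storage directories each contain a current/VERSION file with a namespaceID=... line; the function names and the example paths are hypothetical, so substitute the directories configured as dfs.name.dir and dfs.data.dir.

```shell
# Print the namespaceID recorded in a VERSION file.
# (namespace_id and same_namespace are hypothetical helpers.)
namespace_id() {
    sed -n 's/^namespaceID=//p' "$1"
}

# Succeed only if two VERSION files record the same namespaceID.
same_namespace() {
    [ "$(namespace_id "$1")" = "$(namespace_id "$2")" ]
}

# Usage (paths are placeholders for dfs.name.dir and dfs.data.dir):
#   if same_namespace /path/to/name/current/VERSION /path/to/data/current/VERSION; then
#       echo "IDs match; look for another cause in the datanode log"
#   else
#       echo "IDs differ; the datanode still carries the pre-format ID"
#   fi
```

If the IDs differ, removing the datanode's data directory as shown above lets it register with the newly formatted namenode on the next start.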