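Hadoop: a tale of blood and tears (reposted)
Hadoop configuration problems
After starting Hadoop, the 50070 page loads normally but shows Live Nodes: 0 and DFS Used: 100%.
The log shows the following error (logged by the DataNode class):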
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-XXX/dfs/data: namenode namespaceID = 1009927204; datanode namespaceID = 785353449
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:385)
at org.apache.hadoop.hdfs.server.datanode.DataNode.
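Solution:
The error says that the namespaceID stored by the datanode does not match the one stored by the namenode. http://blog.csdn.net/wh62592855/article/details/5752199 gives two ways to fix this; see also http://yaoyinjie.blog.51cto.com/3189782/819156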
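Before applying either fix it can help to confirm the mismatch directly. The sketch below is only an illustration and assumes the usual Hadoop 1.x layout, in which each storage directory holds a current/VERSION file in Java properties format with a namespaceID entry. The class and method names (NamespaceIdCheck, namespaceIdOf, sameNamespace) are hypothetical helpers, not part of Hadoop.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: compares the namespaceID entries of two VERSION files.
class NamespaceIdCheck {

    // Parses VERSION-file text (properties format) and returns the
    // namespaceID value, or null if the entry is missing or unreadable.
    static String namespaceIdOf(String versionText) {
        Properties props = new Properties();
        try (Reader in = new StringReader(versionText)) {
            props.load(in);
        } catch (IOException e) {
            return null;
        }
        String id = props.getProperty("namespaceID");
        return id == null ? null : id.trim();
    }

    // True only when both texts carry the same, non-null namespaceID.
    static boolean sameNamespace(String nameNodeVersion, String dataNodeVersion) {
        String nameNodeId = namespaceIdOf(nameNodeVersion);
        return nameNodeId != null && nameNodeId.equals(namespaceIdOf(dataNodeVersion));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: NamespaceIdCheck <namenode VERSION> <datanode VERSION>");
            return;
        }
        String nn = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String dn = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        System.out.println(sameNamespace(nn, dn) ? "namespaceIDs match" : "namespaceIDs differ");
    }
}
```

Under that assumption, the datanode file for the error above would be the current/VERSION file inside the data directory named in the log message, and the namenode file would sit under the directory configured as dfs.name.dir.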
Cluster configuration: after starting Hadoop, the 50070 page loads normally but shows Live Nodes: 0 and DFS Used: 100%.
The namenode log shows the following error:
WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
Solution:
This is an /etc/hosts configuration problem; for details see http://www.liufofu.com/201209687.html
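One common form of this problem is a cluster hostname that resolves to a loopback address, so other nodes cannot reach it. That cause is an assumption here; the post does not say which /etc/hosts entry was wrong. A minimal sketch for checking how the local resolver sees each hostname, where HostsCheck and classify are hypothetical names:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: reports how a hostname resolves on this machine.
class HostsCheck {

    // Returns "unresolved" if the name cannot be resolved, "loopback" if it
    // resolves to a loopback address, otherwise the resolved IP as text.
    static String classify(String host) {
        try {
            InetAddress addr = InetAddress.getByName(host);
            return addr.isLoopbackAddress() ? "loopback" : addr.getHostAddress();
        } catch (UnknownHostException e) {
            return "unresolved";
        }
    }

    public static void main(String[] args) {
        // Pass the cluster hostnames as arguments, e.g. the namenode's name.
        for (String host : args) {
            System.out.println(host + " -> " + classify(host));
        }
    }
}
```

Running it on every node with the same list of hostnames shows whether they all agree on the addresses; a "loopback" result for a name that other machines must connect to would be worth looking at in /etc/hosts.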
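HBase configuration problems
After starting HBase, jps shows HMaster appearing and then disappearing, the 60010 page cannot be reached, and HRegionServer cannot be shut down with the stop command.
The HBase log shows the following error: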
ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: HMaster Aborted
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1944)
Solution:
A rookie mistake: in hbase-site.xml, the port in hbase.rootdir must match the port the namenode is actually running on.
<property>
<name>hbase.rootdir</name>
<value>hdfs://namenode:9000/hbase</value>
</property>
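A quick way to catch this mistake is to compare the host and port of hbase.rootdir with the namenode address that Hadoop itself uses, which in Hadoop 1.x is normally the fs.default.name value in core-site.xml. The sketch below only compares two strings copied from the config files; RootDirCheck and sameNameNode are hypothetical names, and the 8020 value in the comment is just an example of a mismatched port.

```java
import java.net.URI;

// Hypothetical helper: checks that hbase.rootdir points at the same
// namenode host and port as the Hadoop filesystem URI.
class RootDirCheck {

    // fsDefaultName: e.g. "hdfs://namenode:9000" (from core-site.xml)
    // hbaseRootDir:  e.g. "hdfs://namenode:9000/hbase" (from hbase-site.xml)
    static boolean sameNameNode(String fsDefaultName, String hbaseRootDir) {
        URI fs = URI.create(fsDefaultName);
        URI root = URI.create(hbaseRootDir);
        return "hdfs".equals(root.getScheme())
                && fs.getHost() != null
                && fs.getHost().equalsIgnoreCase(root.getHost())
                && fs.getPort() == root.getPort();
    }

    public static void main(String[] args) {
        // A port mismatch such as 9000 vs 8020 is reported as false.
        System.out.println(sameNameNode("hdfs://namenode:9000", "hdfs://namenode:9000/hbase"));
        System.out.println(sameNameNode("hdfs://namenode:9000", "hdfs://namenode:8020/hbase"));
    }
}
```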
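When the 60010 page finally came up normally I was so moved that I attached a screenshot to mark the occasion!
The page was in fact working normally by then, but the HBase log still contained an unknown error: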
INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server namenode/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
I did not know whether this unknown error would matter. The link below offers an explanation, but I still got the same message after following it.
http://www.cnblogs.com/wukenaihe/archive/2013/03/15/2961029.html
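The unknown error above did turn out to be a real problem.
Later, after setting up a distributed cluster, running a program from Eclipse produced the following errors: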
13/04/16 10:52:14 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
13/04/16 10:52:14 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
13/04/16 10:52:14 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
The likely cause is that the program defaults to 127.0.0.1 as the ZooKeeper address, so I set it explicitly in the program:
config.set("hbase.zookeeper.quorum", "zookeeper IP");
After that it finally ran successfully!
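The "Connection refused" in the log means nothing was listening at the address the client tried. A plain socket test from the development machine can confirm whether the intended ZooKeeper host accepts connections on port 2181, the port shown in the log. This is a minimal sketch using only the standard library; ZkPortCheck and canConnect are hypothetical names, and a successful connect only shows that the port is open, not that ZooKeeper is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: tests whether a TCP port accepts connections.
class ZkPortCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, timeouts and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the ZooKeeper host as the first argument; defaults to localhost.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + ":2181 reachable = " + canConnect(host, 2181, 2000));
    }
}
```

If localhost is not reachable but the cluster's ZooKeeper host is, that is consistent with the client falling back to 127.0.0.1 and with the hbase.zookeeper.quorum setting fixing it.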