Repairing ZooKeeper after a crash

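Situations in which this problem can occur: a forced power-off, too much data for the available disk, or an unexpected cluster shutdown.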

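I built a Hadoop cluster with Cloudera on Ubuntu. The root partition had been allocated too little space, so the cluster crashed partway through loading data. I later enlarged the root partition, but the ZooKeeper node on one of the machines still would not start.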
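Since the crash here started with a full root partition, it may help to check free space on the filesystem that holds the ZooKeeper data before restarting anything. The following is a minimal sketch; the function names `avail_kb` and `check_space` and the 1 GiB default threshold are illustrative assumptions, not part of any ZooKeeper or Cloudera tooling.

```shell
#!/bin/sh
# Print the available space, in kilobytes, on the filesystem that holds a path.
# Usage: avail_kb PATH
avail_kb() {
    # df -P gives the stable POSIX layout; -k forces 1024-byte units.
    # Column 4 of the second output line is the "Available" figure.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Report whether at least MIN_KB kilobytes are free under PATH.
# Usage: check_space PATH [MIN_KB]   (the 1 GiB default is an arbitrary choice)
check_space() {
    path=$1
    min_kb=${2:-1048576}
    free=$(avail_kb "$path")
    if [ "$free" -lt "$min_kb" ]; then
        echo "LOW: only ${free} KB free under $path"
        return 1
    fi
    echo "OK: ${free} KB free under $path"
}
```

For the setup described here, this would be called as `check_space /var/lib/zookeeper`.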


The error log is as follows:

2015-12-29 15:50:43,900 INFO org.apache.zookeeper.server.persistence.FileSnap: Reading snapshot /var/lib/zookeeper/version-2/snapshot.1300000000
2015-12-29 15:50:43,932 ERROR org.apache.zookeeper.server.persistence.Util: Last transaction was partial.
2015-12-29 15:50:43,932 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: Unable to load database on disk
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:167)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:156)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
2015-12-29 15:50:43,942 ERROR org.apache.zookeeper.server.quorum.QuorumPeerMain: Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:156)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:167)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
... 4 more
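The EOFException above is thrown while reading a transaction log's file header, which suggests that at least one log file was cut short, plausibly when the disk filled up. A sketch for spotting such files follows. The function name `find_short_txn_logs` is hypothetical, and the 16-byte threshold assumes the header consists of a 4-byte magic number, a 4-byte version and an 8-byte dbid.

```shell
#!/bin/sh
# List transaction log files that are too short to hold a complete file header.
# Usage: find_short_txn_logs DIR   (DIR is the version-2 directory)
find_short_txn_logs() {
    # Assumption: the header is 16 bytes (magic, version, dbid).
    # "-size -16c" matches files smaller than 16 bytes, including empty ones.
    find "$1" -maxdepth 1 -type f -name 'log.*' -size -16c
}
```

Running `find_short_txn_logs /var/lib/zookeeper/version-2` prints nothing if every log file has at least a full header. A file that is long enough can still be damaged further in, so an empty result does not prove the logs are intact.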


I found a fix by searching online:

cat /etc/zookeeper/conf/zoo.cfg

Find the line dataDir=/var/lib/zookeeper
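Instead of reading the whole file by eye, the value can be pulled out directly. This is a small sketch; the function name `zk_data_dir` is made up for illustration, and it assumes the file uses plain `key=value` lines as zoo.cfg does.

```shell
#!/bin/sh
# Print the value of dataDir from a zoo.cfg-style properties file.
# Usage: zk_data_dir CONFIG_FILE
zk_data_dir() {
    # Keep only "dataDir=VALUE" lines (spaces around "=" allowed, commented
    # lines ignored), strip trailing whitespace, and print the last match.
    sed -n 's/^[[:space:]]*dataDir[[:space:]]*=[[:space:]]*//p' "$1" |
        sed 's/[[:space:]]*$//' |
        tail -n 1
}
```

With the configuration above, `zk_data_dir /etc/zookeeper/conf/zoo.cfg` should print `/var/lib/zookeeper`.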

Change to the directory /var/lib/zookeeper

cd /var/lib/zookeeper

List the files in the directory:

ls

A version-2 directory is present.

Move version-2 out of the way (renaming it, rather than deleting it, keeps a backup):

mv ./version-2 ./version-2.bak
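One caveat with a fixed backup name: if a directory called version-2.bak already exists, `mv` moves version-2 inside it instead of renaming it. A sketch that avoids this by using a timestamped name is shown below; the function name `move_aside` and the `.bak.TIMESTAMP` suffix are illustrative choices.

```shell
#!/bin/sh
# Move a directory aside under a timestamped name and print the new name.
# Usage: move_aside DIR
move_aside() {
    src=${1%/}
    dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
    # Refuse to touch an earlier backup that happens to have the same name.
    if [ -e "$dest" ]; then
        echo "backup already exists: $dest" >&2
        return 1
    fi
    mv "$src" "$dest" && echo "$dest"
}
```

Here it would be called as `move_aside /var/lib/zookeeper/version-2`, presumably with the ZooKeeper server on that machine stopped first.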

Then, under the ZooKeeper service's instances, add the role to the host again; after that it started successfully.


Reference: ZooKeeper fails to start with "Unable to load database on disk"

On my own cluster of virtual machines, after a forced shutdown, I found that ZooKeeper on slave2 would not come up.

5:29:53.411 PM   INFO    org.apache.zookeeper.server.quorum.QuorumPeerConfig
Reading configuration from: /var/run/cloudera-scm-agent/process/517-zookeeper-server/zoo.cfg
5:29:53.420 PM   INFO    org.apache.zookeeper.server.quorum.QuorumPeerConfig
Defaulting to majority quorums
5:29:53.423 PM   INFO    org.apache.zookeeper.server.DatadirCleanupManager
autopurge.snapRetainCount set to 5
5:29:53.424 PM   INFO    org.apache.zookeeper.server.DatadirCleanupManager
autopurge.purgeInterval set to 24
5:29:53.430 PM   INFO    org.apache.zookeeper.server.DatadirCleanupManager
Purge task started.
5:29:53.434 PM   ERROR   org.apache.zookeeper.server.DatadirCleanupManager
Error occured while purging.
org.apache.zookeeper.server.persistence.FileTxnSnapLog$DatadirException: Missing data directory /var/lib/zookeeper/version-2, automatic data directory creation is disabled (zookeeper.datadir.autocreate is false). Please create this directory manually.
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:102)
at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
at org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
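The exception message asks for the missing directory to be created by hand, because automatic creation is disabled. A minimal sketch of that step follows; the function name `ensure_version_dir` is hypothetical.

```shell
#!/bin/sh
# Create the version-2 directory under a data directory if it is missing.
# Usage: ensure_version_dir DATA_DIR
ensure_version_dir() {
    dir="${1%/}/version-2"
    if [ -d "$dir" ]; then
        echo "exists: $dir"
        return 0
    fi
    mkdir "$dir" && echo "created: $dir"
}
```

In this case the call would be `ensure_version_dir /var/lib/zookeeper`. The new directory probably needs the same owner and permissions as the rest of the data directory; that step is left out because the account name depends on the installation.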

Removing data from /var/zookeeper/version-2 then restart seems to “fix” the problem (it gets a snapshot from one of the other nodes in the quorum).
This is Zookeeper 3.3.5+19.5-1~squeeze-cdh3, i.e. from Cloudera’s distribution.

After reading that English-language article, here is how I handled it:

more /etc/zookeeper/conf.dist/zoo.cfg
Find dataDir
[root@slave2 zookeeper]# pwd
/var/lib/zookeeper
[root@slave2 zookeeper]# ls
myid version-2 version-2.bak
Delete all files under the version-2 directory, keeping the directory itself.
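The clearing step can be sketched as a copy followed by a delete, so that the old snapshots and logs are kept somewhere while version-2 itself stays in place. The function name `backup_and_clear` and the separate backup directory are assumptions for illustration; the post itself only says to empty the directory.

```shell
#!/bin/sh
# Copy a directory's contents to a backup directory, then delete the regular
# files in the original. The original directory itself is kept.
# Usage: backup_and_clear DIR BACKUP_DIR
backup_and_clear() {
    dir=${1%/}
    backup=${2%/}
    mkdir -p "$backup" || return 1
    # Copy first; nothing is deleted unless the copy succeeded.
    cp -pR "$dir"/. "$backup"/ || return 1
    # Remove only regular files directly inside DIR.
    find "$dir" -maxdepth 1 -type f -exec rm -f {} +
}
```

For example, `backup_and_clear /var/lib/zookeeper/version-2 /var/lib/zookeeper/version-2.old`, where the second path is a made-up name. According to the quoted article, after a restart the node then fetches a snapshot from another member of the quorum.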

posted on 2016-03-19 13:44  1130136248