ZooKeeper Service Start Failure
Problem
ZooKeeper (ZK) is down or cannot be started. When this happens during installation, no ZK log directory is generated.
Diagnosis
The underlying issue is that ZK crashes on startup. The ZK log shows:
2022-04-26 07:29:31,101 [myid:1] - ERROR [main:ZooKeeperServerMain@90] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.base/java.io.DataInputStream.readInt(DataInputStream.java:397)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:67)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:725)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:743)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:711)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:792)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:352)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:258)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:303)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:494)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:665)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:758)
at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:130)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:159)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:112)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:67)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:140)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
2022-04-26 07:29:31,104 [myid:1] - ERROR [main:ServiceUtils@42] - Exiting JVM with code 1
Solution
The ZK file on disk was corrupted because the disk ran out of space. When ZK starts, it tries to read the file back and crashes because the file size is zero, which is the java.io.EOFException thrown from FileHeader.deserialize in the log above. To confirm this, list the ZK data folder on every ZK node and look for 0-byte files:
ls -lth ~/tigergraph/data/zk/version-2/
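On a folder with many files, the 0-byte entries can be easy to miss in a long listing. The following is a minimal sketch that prints only the empty files. The function name list_empty_zk_files is hypothetical, and the default path is the one used in this article; pass a different directory if your data root differs.

```shell
# Hypothetical helper: print only the zero-byte files in a ZK data directory.
# $1 = ZK data directory (defaults to the path used in this article).
list_empty_zk_files() {
    zk_dir="${1:-$HOME/tigergraph/data/zk/version-2}"
    # -size 0 matches files whose size is exactly zero; -maxdepth 1 keeps
    # the search inside the given directory.
    find "$zk_dir" -maxdepth 1 -type f -size 0
}
```

Running list_empty_zk_files with no arguments on each ZK node should print the files that need attention, or nothing if no file is empty.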
1. Remove the 0-byte log file and start ZK.
2. If the failure happened while installing TG 3, rerun the installer and then initialize the cluster:
sudo ./install.sh
gadmin init cluster --with-config ~/.tg.cfg
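The removal of the 0-byte file can be sketched as below. This is a cautious variant, not the exact procedure from this article: it moves empty files into a backup directory instead of deleting them, so they can still be inspected afterwards. The function name move_empty_zk_files and the default backup location are assumptions made for illustration.

```shell
# Hypothetical helper: move zero-byte files out of a ZK data directory
# instead of deleting them.
# $1 = ZK data directory (defaults to the path used in this article).
# $2 = backup directory (an assumption; any writable location works).
move_empty_zk_files() {
    zk_dir="${1:-$HOME/tigergraph/data/zk/version-2}"
    backup_dir="${2:-$HOME/zk-empty-files-backup}"
    mkdir -p "$backup_dir" || return 1
    # Only regular files of size zero directly inside zk_dir are moved.
    find "$zk_dir" -maxdepth 1 -type f -size 0 -exec mv -- {} "$backup_dir"/ \;
}
```

Run it on each affected ZK node, then start ZK again. Non-empty snapshot and log files are left in place.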