ZooKeeper Service Start Failure

Problem

ZooKeeper (ZK) is down or cannot be started. When this happens during installation, the ZK log directory may not be generated either.

Error message

Error Code:

ZooKeeper service start failure due to ZK#1:check_ready timeout

Diagnosis

The underlying issue is that ZK is crashing. The ZK log shows:

2022-04-26 07:29:31,101 [myid:1] - ERROR [main:ZooKeeperServerMain@90] - Unexpected exception, exiting abnormally
java.io.EOFException
    at java.base/java.io.DataInputStream.readInt(DataInputStream.java:397)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
    at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:67)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:725)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:743)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:711)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:792)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:352)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:258)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:303)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285)
    at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:494)
    at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:665)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:758)
    at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:130)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:159)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:112)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:67)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:140)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
2022-04-26 07:29:31,104 [myid:1] - ERROR [main:ServiceUtils@42] - Exiting JVM with code 1
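
To check a log for this crash signature without reading it by hand, a small helper along the following lines may be useful. It is only a sketch: the function name zk_log_has_eof_crash is hypothetical, and the log file path has to be supplied by the caller because this article does not state where the ZK log is written.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ZooKeeper or TigerGraph tooling).
# Succeeds when the given log file contains both lines of the crash
# signature shown above.
zk_log_has_eof_crash() {
    local log_file="$1"
    # -F treats the pattern as a fixed string, -q suppresses output.
    grep -qF "Unexpected exception, exiting abnormally" "$log_file" &&
        grep -qF "java.io.EOFException" "$log_file"
}
```

A possible call is zk_log_has_eof_crash /path/to/zk.log && echo "EOF crash found", where the path is a placeholder for the actual log file.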

Solution

This happens when a ZK on-disk file has been corrupted because the disk ran out of space. On startup, ZK tries to read the file back and crashes because the file size is zero. To confirm, check the ZK data folder on all ZK nodes:

ls -lth ~/tigergraph/data/zk/version-2/
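
Instead of scanning the listing by eye, the zero-byte files can be listed directly, and the free disk space checked at the same time. The sketch below assumes the data path used in this article as its default. The function names list_empty_zk_files and zk_disk_usage are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. The default directory is the one used in this
# article; pass a different path if your data root differs.

# Print every zero-byte regular file under the ZK data directory.
list_empty_zk_files() {
    local zk_dir="${1:-$HOME/tigergraph/data/zk/version-2}"
    # -size 0 matches only files that are empty.
    find "$zk_dir" -type f -size 0
}

# Show free space on the filesystem that holds the ZK data directory.
zk_disk_usage() {
    local zk_dir="${1:-$HOME/tigergraph/data/zk/version-2}"
    df -h "$zk_dir"
}
```

Run both on every ZK node. If list_empty_zk_files prints a file, that file is a likely cause of the EOFException.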

1. Remove the 0-byte log file and start ZK.
2. If the failure happened while installing TG 3, use sudo ./install.sh:

sudo ./install.sh
gadmin init cluster --with-config ~/.tg.cfg
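
For step 1, a more cautious variation is to move the zero-byte files to a backup location rather than delete them outright, so they can be restored if needed. The following is a minimal sketch under that assumption. The function name quarantine_empty_zk_files and the backup directory are hypothetical, and the backup directory should be outside the ZK data directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move zero-byte files out of the ZK data directory.
# Usage: quarantine_empty_zk_files <zk_data_dir> <backup_dir>
quarantine_empty_zk_files() {
    local zk_dir="$1"
    local backup_dir="$2"
    # Create the backup directory if it does not exist yet.
    mkdir -p "$backup_dir"
    # Move each empty file instead of deleting it; non-empty files are untouched.
    find "$zk_dir" -type f -size 0 -exec mv -- {} "$backup_dir"/ \;
}
```

For example, quarantine_empty_zk_files ~/tigergraph/data/zk/version-2 ~/zk-empty-backup, where the backup path is illustrative. Start ZK afterwards as described in step 1.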