Journal Log Files Consume Excessive Disk Space

Problem

Journal log files consumed a large amount of disk space after an improper shutdown or errors, and kept growing quickly.
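To check how much space the journal files occupy on a node, and whether they are still growing, a minimal Python sketch can total the size and file count of a directory. It is not a TigerGraph utility, and the directory you pass in is an assumption. In this case the warning log shows journal files under /<<<app directory>>>/tigergraph/data/gstore/0/1/ids//_topic_GSE1journal//, where the application directory is site-specific.

```python
import os
import sys


def dir_usage(path):
    """Return (file_count, total_bytes) for the regular files under path."""
    count = 0
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not follow or double-count symlinks
            try:
                total += os.path.getsize(full)
            except OSError:
                continue  # file was removed while walking
            count += 1
    return count, total


def human(num_bytes):
    """Format a byte count with binary units (B, KiB, ..., TiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python dir_usage.py <journal directory>
    # The journal directory path is site-specific; pass it as the first argument.
    files, used = dir_usage(sys.argv[1])
    print(f"{files} files, {human(used)}")
```

Running it a few minutes apart (the script name dir_usage.py is hypothetical) shows whether the file count and total size are still climbing.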

Error message

The customer improperly restarted the cluster twice, after which disk usage rose sharply: only 300 GB of data had been loaded, yet each node in the six-node cluster had used 3.5 TB of space (roughly 21 TB across the cluster). The journal log files were still growing quickly, so the customer stopped the service. The relevant log entries follow.

GSE ERROR logs:

E0504 13:00:14.575253 12453 journal_manager.cpp:104] cannot init system within 10skTimeout, sleep for 5 seconds and retry.
E0504 13:00:29.588791 12453 journal_manager.cpp:104] cannot init system within 10skTimeout, sleep for 5 seconds and retry.
E0504 13:00:31.998577 13594 zookeeper_context.cpp:1053] Recursive delete /tigergraph/dict/objects/__services/RLS-GSE/_runtime_nodes/RLS-GSE_1_1 failed. PathCount:1 Rc:no node
E0504 13:00:32.863590 13596 recordio.cpp:432] Error happens when reading from disk. Status: kEof
… (multiple lines omitted)
E0504 14:09:01.088362 13596 service_daemon.hpp:640] Vertex Id Manager
… (ID listing report)
E0504 14:09:01.127012 13596 service_server.cpp:264] Start 1 responser threads

GSE INFO logs:

I0504 14:13:27.107749 5107 service_combiner.cpp:205] Engine_GSE|iDELETE_VID,437.GPE_2_1.1651637570882.N,NNN,0,0,0|ERROR|cannot delete vertices: 1234567890 1224567890 … …
(Lines like the one above repeat many times, each listing many unique vertex IDs on a single line.)

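To estimate how many vertex deletions failed, a rough Python sketch can collect the distinct vertex IDs that follow "cannot delete vertices:" in the GSE INFO log. The marker string is taken from the log line above; the log file path is an assumption passed on the command line, and other releases may format the line differently.

```python
import sys

# Marker text copied from the GSE INFO log lines quoted above.
MARKER = "cannot delete vertices:"


def failed_vertex_ids(lines):
    """Return the set of distinct vertex ids listed after MARKER."""
    ids = set()
    for line in lines:
        pos = line.find(MARKER)
        if pos < 0:
            continue
        for token in line[pos + len(MARKER):].split():
            # Keep numeric ids only; skip elision marks such as "…".
            if token.isdigit():
                ids.add(token)
    return ids


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python failed_deletes.py <GSE INFO log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        found = failed_vertex_ids(log)
    print(f"{len(found)} distinct vertex ids could not be deleted")
```
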
GSE WARNING logs:
W0504 12:59:31.938236 13594 recordio.cpp:606] CRC error when finding record, file: /<<<app directory>>>/tigergraph/data/gstore/0/1/ids//_topic_GSE1journal//GSE1journal_1_0.log, block offset: 602435559, on disk crc: 0x2000000, data crc: edfe68f9, data size: 0
(Many similar lines follow for different files; this directory now contains many files.)
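To see which journal files are affected, a minimal Python sketch can group the CRC-error warnings by file. The regular expression is modeled on the single warning line shown above and is an assumption for other versions; the script only reports what GSE flagged and does not repair anything.

```python
import re
import sys
from collections import Counter

# Pattern modeled on the GSE warning line quoted above; other releases may word it differently.
CRC_RE = re.compile(r"CRC error when finding record, file: (?P<file>[^,]+), block offset:")


def crc_errors_by_file(lines):
    """Count CRC-error warnings per journal file."""
    counts = Counter()
    for line in lines:
        match = CRC_RE.search(line)
        if match:
            counts[match.group("file")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python crc_files.py <GSE WARNING log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        counts = crc_errors_by_file(log)
    for path, n in counts.most_common():
        print(f"{n:6d}  {path}")
```
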

Diagnosis

The GSE journal log files were corrupted by the unexpected shutdowns, and GBAR also stopped working.

Workaround

The customer agreed to clear all data in order to recover the cluster, then enabled GSE compaction, and finally restored the data.

Solution

Release the disk space by running "CLEAR GRAPH STORE -HARD", then enable GSE compaction.

Note

This issue only affects versions 3.0.x and 3.1.x.