Import Graph failing

Symptoms

As part of fixing the known_issue GSE reusing segment ID (CORE-1314) after upgrading to 3.5.2 (which has the fix for that bug) we need to run "CLEAR GRAPH STORE" and then import data to make sure that there are no left-overs of this bug While trying to import graph the import fails:

GSQL > IMPORT GRAPH ALL FROM "/app/tigergraph/tigergraph/backup/Graph_Export_for_PERF"
*.gsql files found at /app/tigergraph/tigergraph/backup/Graph_Export_for_PERF
Skipping unzip of "ExportedGraph.zip"
Dropping all, about 1 minute ...
Abort all active loading jobs
Resetting GPE...
Successfully reset GPE and GSE
Stopping GPE GSE
Successfully stopped GPE GSE in 0.006 seconds
Clearing graph store...
Successfully cleared graph store
Starting GPE GSE RESTPP
Successfully started GPE GSE RESTPP in 0.084 seconds
The catalog was reset and the graph store was cleared.
Importing ACL Privileges
For more info, please check log at node 'm1': /home/tigergraph/tigergraph/log/gsql/ERROR.20220526-084505.933
GSQL > exit

Error Messages

E@20220526 11:10:55.533  (BaseClient.java:369) io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 4.997041853s. [closed=[], open=[[buffered_nanos=494031, remote_addr=/10.139.140.154:9166]]]
io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 4.997041853s. [closed=[], open=[[buffered_nanos=494031, remote_addr=/10.139.140.154:9166]]]
        at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
        at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
        at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
        at com.tigergraph.schema.grpc.BaseClient.a(BaseClient.java:352)
        at com.tigergraph.schema.grpc.BaseClient.a(BaseClient.java:300)
        at com.tigergraph.schema.grpc.t.d(InformantClient.java:96)
        at com.tigergraph.schema.grpc.t.b(InformantClient.java:177)
        at com.tigergraph.schema.grpc.t.e(InformantClient.java:206)
        at com.tigergraph.schema.grpc.a.mX(AdminServiceClient.java:485)
        at com.tigergraph.schema.operation.DeleteExpiredTokenOperation.a(DeleteExpiredTokenOperation.java:169)
        at com.tigergraph.schema.operation.MetadataUpdateOperation.nK(MetadataUpdateOperation.java:190)
        at com.tigergraph.schema.operation.BaseOperation.af(BaseOperation.java:51)
        at com.tigergraph.schema.operation.BaseOperation.run(BaseOperation.java:39)
        at com.tigergraph.schema.k.run(TokenCleaner.java:94)
        at java.base/java.lang.Thread.run(Thread.java:834)

Diagnosis

Looking at the above error message and seeing reference for DEADLINE_EXCEEDED we thought we might need to increase the variable "GSQL.WaitServiceOnlineTimeoutSec" time-window, so increased it first to 120s and then to 600s but none of the help in this case. Checking GSQL logs this is the relevant part for the import graph:

I@20220526 09:16:29.666 tigergraph|127.0.0.1:53754|00000000009 (InformantClient.java:195) GPE: Warmup, GSE: Warmup, RESTPP: Online
I@20220526 09:16:29.666 tigergraph|127.0.0.1:53754|00000000009 (BaseOperation.java:104) com.tigergraph.schema.operation.DropAllOperation@5d7c6629[keepUser=false,catalog=<null>,globalCatalog=<null>,systemErrors=[],permissionErrors=[],semanticErrors=[],hasParent=false,completed=true,succeeded=true]
I@20220526 09:16:29.667 tigergraph|127.0.0.1:53754|00000000009 (ExternalUtil.java:957) Save backup catalog to /home/tigergraph/tigergraph/data/gsql/backup/catalog.20220526-091629
I@20220526 09:16:29.667 tigergraph|127.0.0.1:53754|00000000009 (AdminServiceClient.java:242) call tigergraph.admin.GsqlService/DownloadCatalog; timeout: 20 SECONDS
I@20220526 09:16:29.686 tigergraph|127.0.0.1:53754|00000000009 (AdminServiceClient.java:285) tigergraph.admin.GsqlService/DownloadCatalog took 19ms
I@20220526 09:16:29.692 tigergraph|127.0.0.1:53754|00000000009 (BaseHandler.java:358) Failed|POST|/gsql/command|text/plain; charset=UTF-8|19632ms
I@20220526 09:16:29.692 tigergraph|127.0.0.1:53754|00000000009 (SessionManager.java:396) Remove current thread 'pool-5-thread-2' from session '00000000009'.
I@20220526 09:16:39.128  (ReplicaProducer.java:47) Writing Replica 8119 to Kafka
I@20220526 09:16:39.233  (ReplicaProducer.java:51) Replica 8119 has been written in Kafka
I@20220526 09:16:39.233  (ReplicationManager.java:123) lock ReplicaProducer RUNNING (1)
I@20220526 09:16:39.234  (AdminServiceClient.java:242) call tigergraph.admin.GsqlService/UpdateCatalog; timeout: 20 SECONDS
I@20220526 09:16:39.237  (AdminServiceClient.java:285) tigergraph.admin.GsqlService/UpdateCatalog took 3ms
I@20220526 09:16:39.237  (ReplicationManager.java:128) unlock ReplicaProducer RUNNING (0)
I@20220526 09:16:48.269  (SessionManager.java:235) Add session with cookie: {"sessionId":"00000000009","serverId":"1_1653572705932","gShellTest":false,"terminalWidth":158,"compileThread":0,"clientPath":"/app/tigergraph/tigergraph/backup/Graph_Export_for_PERF","fromGraphStudio":false,"fromGsqlClient":true,"fromGsqlServer":false,"clientCommit":"7edb256d9750ab4451d27eef605e58e9adcedc7a","sessionParameters":{},"sessionAborted":false,"loadingProgressAborted":false,"auth":"Basic dGlnZXJncmFwaDp0aWdlcmdyYXBo","metadataUpdateSeqId":0}
I@20220526 09:16:48.269 tigergraph|127.0.0.1:56430|00000000009 (SessionManager.java:305) Add current thread 'pool-5-thread-2' into session '00000000009'.
I@20220526 09:16:48.269 tigergraph|127.0.0.1:56430|00000000009 (BaseHandler.java:228) Protocol: HTTP/1.1
I@20220526 09:16:48.269 tigergraph|127.0.0.1:56430|00000000009 (BaseHandler.java:229) BaseHandler: AbortClientSessionHandler
I@20220526 09:16:48.270 tigergraph|127.0.0.1:56430|00000000009 (BaseHandler.java:231) Received|POST|/gsql/abortclientsession|18
I@20220526 09:16:48.270 tigergraph|127.0.0.1:56430|00000000009 (AbortClientSessionHandler.java:43) AbortSession/LoadingProgress added for session 00000000009
I@20220526 09:16:48.271 tigergraph|127.0.0.1:56430|00000000009 (SessionManager.java:435) Skip interrupt thread 'pool-5-thread-2' in session: 00000000009
I@20220526 09:16:48.271 tigergraph|127.0.0.1:56430|00000000009 (BaseHandler.java:358) Successful|POST|/gsql/abortclientsession|text/plain; charset=UTF-8|2ms
I@20220526 09:16:48.271 tigergraph|127.0.0.1:56430|00000000009 (SessionManager.java:396) Remove current thread 'pool-5-thread-2' from session '00000000009'.

Given the time constraint in having this resolved ASAP that was affecting the application implementation and holding several teams we opted for a workaround

Workaround

The workaround was to restore from a backup via "gbar" instead of importing graph, which worked fine.

Solution

Since we did not have enough time from the customer to conduct a proper troubleshooting to understand the root cause and given the not enough provided information and logs we were not able to understand the issue and we agreed with the customer that if this issue re-occurs we will be enabling enhanced logging (DEBUG) and invest more time.