System Memory in Critical state. Request aborted.

Problem

Queries are aborting because the system's memory is in a critical state.

Error message

You will find the following messages accompanying each other in the GPE log:

Error message #1:

I0602 15:03:11.516336 4131475 gsystem.cpp:690] System_GSystem|GSystemWatcher|Critical|SysMinFreePct|9|FreePct|5

Error message #2:

W0602 10:40:52.128535 4135083 enginedriverservice.cpp:137] SLPE_TransferFunds::kstep expansion,19.RESTPP_1_1.1654141252117.N,NNN,60,0,0,S Aborting: error_: System Memory in Critical state. Request aborted.
Note: The request name and request ID in the above error will vary from query to query.
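To locate these messages, you can search the GPE logs directly. The commands below are only a sketch: the log path varies by installation, and they assume System.LogRoot is the configured log root, as in recent TigerGraph versions.

# Resolve the configured log root (path differs per installation)
LOG_ROOT=$(gadmin config get System.LogRoot)

# Search the GPE logs for both the memory watcher line and the abort message
grep -rE "SysMinFreePct|System Memory in Critical state" "$LOG_ROOT/gpe/"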

Diagnosis

1. The system is considered "critical" when available free memory drops to 10% or below (the default threshold) on any one node in the cluster. At that point, both new and running queries begin to abort as part of TigerGraph's memory protection.

2. The node whose GPE log contains this error is most likely the one running low on memory.

3. You can use the Linux free command to verify memory on all nodes. Example: grun all "free"
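For a quick view of how close each node is to the threshold, the output of free can be reduced to a percentage. The one-liner below is an illustrative sketch that assumes the column layout of a modern procps free, where the seventh column of the Mem: row is the available memory.

# Check memory on every node in the cluster
grun all "free -m"

# On a single node, print the percentage of memory still available;
# compare the result against SysMinFreePct (default 10)
free -m | awk '/^Mem:/ {printf "available: %.1f%%\n", $7/$2*100}'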

Workaround

As a workaround (though not recommended), you can adjust the memory state thresholds. Run the gadmin config entry GPE.BasicConfig.Env command to modify the GPE environment variables, and append SysMinFreePct with a new value to the end of the existing values (separated by a semicolon).

For example:

✔ New: LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH; CPUPROFILE=/tmp/tg_cpu_profiler; CPUPROFILESIGNAL=12; MALLOC_CONF=prof:true,prof_active:false;SysMinFreePct=5

In the above example, the threshold of free memory at which the system enters the "critical" state has been lowered to 5% (instead of the default 10%).
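A non-interactive way to make the same change could look like the example below. This is a sketch only: gadmin syntax and restart requirements can differ between TigerGraph versions, and the value passed to gadmin config set must contain the entire existing environment string with SysMinFreePct appended, not just the new entry.

# Set the full GPE environment string, with SysMinFreePct appended
gadmin config set GPE.BasicConfig.Env 'LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH; CPUPROFILE=/tmp/tg_cpu_profiler; CPUPROFILESIGNAL=12; MALLOC_CONF=prof:true,prof_active:false;SysMinFreePct=5'

# Apply the change and restart GPE so the new threshold takes effect
gadmin config apply -y
gadmin restart gpe -y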

Solution

The most effective solution is to increase the amount of memory per node. If you are hitting the "critical" memory state during your normal day-to-day workload, there is simply not enough memory available to accommodate it.

You can also consider spreading your workload across less busy times of the day, so that fewer queries run concurrently and compete for system resources.