TigerGraph architecture and components

TigerGraph is a distributed graph analytics platform that supports real-time graph data updates.

The TigerGraph platform is designed to be deployed on a general-purpose Linux server, either an on-premises server or a cloud VM.

Within TigerGraph, we use Kafka message passing to coordinate internal components. RESTPP, an enhanced RESTful API server, plays a core role in task management. Users can interact with the system in multiple ways: through a GUI, the REST API, or the CLI.

[Figure 1: Overview of TigerGraph platform components]

As shown in Figure 1: Overview of TigerGraph platform components, users can interact with the database through the GSQL client. They can also manage TigerGraph directly with the gadmin CLI or work with it through the GraphStudio graphical user interface.

Finally, TigerGraph provides a REST API. Through this API, you can communicate directly with the data platform via RESTPP, which supports enterprise-grade workloads with high request volumes.

Users can also use the Admin Portal graphical user interface to monitor system resources and manage the system and users.

1. TigerGraph Platform Overview

TigerGraph also provides the gadmin command-line management tool, which lets advanced users query and update system settings directly with commands.

Component Description

GraphStudio GUI

A web browser-based visual interface that lets users easily work with the graph database. It includes the following components: Schema Designer, Data Mapper, Data Loader, Graph Explorer, and Query Editor.

Admin Portal

Allows system administrators to monitor and manage TigerGraph through a browser-based GUI. It includes the following components: control panel, monitoring interface, and management interface.

GSQL Client

A remote client that provides GSQL CLI access to the TigerGraph system.

Nginx

A free and open-source HTTP server and reverse proxy used internally by TigerGraph to route requests. Nginx is installed automatically with TigerGraph.

REST++ or RESTPP

TigerGraph component that processes client and internal REST requests. RESTPP routes each request to the component that needs to process it. RESTPP also provides users with a number of ready-to-use API endpoints and generates an API endpoint for each TigerGraph query.

Apache Kafka

A free and open-source, scalable, distributed messaging system. Kafka is installed automatically with TigerGraph and is used to pass messages between many TigerGraph components.

Apache ZooKeeper

Free and open-source software that provides scalable, distributed, and fault-tolerant metadata storage. ZooKeeper is installed automatically with TigerGraph, which uses it to store graph and cluster metadata.

gadmin

The CLI application used to configure and manage TigerGraph, similar to mysqladmin for the MySQL database.

Controller (CTRL)

A TigerGraph core infrastructure component that manages configuration, services, and task scheduling. The Controller also receives and executes commands from gadmin.

Executor (EXE)

A TigerGraph core infrastructure component that runs on every node and executes commands from the Controller. It primarily handles process and file management, along with many other operations.

GPE

The TigerGraph graph processing engine and the heart of the database. The GPE stores graph data in memory and runs computations on it. Users and other components interact with the GPE through RESTPP and GSQL.

GSE (IDS)

Despite its name (Graph Storage Engine), the GSE does not store graph data. Instead, it stores a mapping between user-provided vertex IDs and the internal IDs used by the GPE.

GSQL

The language used to define queries for the GPE. It allows users to define the graph schema and to create, read, update, and delete graph data. GSQL also manages user authentication.

Dictionary (DICT)

A component that runs on top of ZooKeeper and stores graph metadata, including schema and user information.

ETCD

A distributed, highly available key-value storage system. Used by the TigerGraph infrastructure to store metadata.

GBAR

Graph Backup And Restore (GBAR) handles backup and restore operations for the system.

2. Installation and Upgrade FAQ

The TigerGraph platform installation process covers both single-node and cluster installations, in either an interactive or non-interactive manner. TigerGraph also supports upgrading to another version and uninstalling.

2.1 Installation FAQ

TODO: installation script, overview of non-interactive installation, replication VS partition factor. Include hardware requirements: https://docs.tigergraph.com/admin/admin-guide/hw-and-sw-requirements

  1. Incorrect directory permissions

    Possible issue

    Installation may fail because of incorrect permissions on the directories it uses. The error message will indicate either that there is no write permission on a directory or that a file does not exist.

    Suggested troubleshooting methods

    • After starting the installation, use the ps or top command to check that the correct account is running the installation.

    • If the installation log reports that a file does not exist, check the permissions of its parent directory.

    • Check that the installation was started with sudo.

    Recommended solution

    • Do one or both:

      • Use kill to stop any installation processes that may still be running.

      • Use chown to correct the user or group permissions under the installation directory.

    • Then, restart the installation.
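Before restarting the installation, you can confirm the fix by checking the owner and write permission of the installation directory. The following is a minimal bash sketch, not part of the TigerGraph tooling: check_install_dir is a hypothetical helper, and it assumes GNU stat (standard on the supported Linux distributions). Run it as the account that performs the installation.

```bash
#!/usr/bin/env bash
# check_install_dir DIR: print the owner of DIR and whether the current
# user can write to it. Returns 1 if DIR is missing or not writable.
check_install_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  # %U:%G prints the owning user and group names (GNU stat syntax)
  echo "owner: $(stat -c '%U:%G' "$dir")"
  if [ -w "$dir" ]; then
    echo "writable by $(id -un)"
  else
    echo "NOT writable by $(id -un)"
    return 1
  fi
}

# Usage (the path is a placeholder):
#   check_install_dir /PATH/TO/INSTALL/DIR
```

If the directory is reported as not writable, correct its ownership with chown as described above, then rerun the check.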

  2. License expired

    Possible issue

    After installation, the license has expired and the related functions are not available.

    Suggested troubleshooting methods

    Use gadmin license status to view information about the license.

    Recommended solution

    If the license is expired, please contact your TigerGraph account rep or the TigerGraph Support team.

2.2 Upgrade FAQ

  1. How to upgrade to TigerGraph 3.1.5 from version 3.0.0 or later

    Recommended solution

    Before upgrading, the installer asks you to confirm that the following precheck steps have been run.

    1. Download and unzip the following tar package:

      wget http://service.tigergraph.com/download/upgrade-3.1.5-check/upgrade-3.1.5-gsql-check.tar.gz

      tar -xzvf upgrade-3.1.5-gsql-check.tar.gz

    2. There are two files in the package:

      gsql-check-3.1.5.jar
      run_gsql_check.sh
    3. Run the check command as follows:

      bash run_gsql_check.sh
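    4. If the check passes, continue with the upgrade. Otherwise, open a ticket with TigerGraph Support and attach two files: the archive of the check output, created by the first command below, and the config.yaml file at the path printed by the second:

      tar -czvf /tmp/catalog_check.tar.gz /tmp/gsql_check_catalog/
      echo $(gadmin config get System.DataRoot)/gstore/0/part/config.yaml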
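This upgrade path applies only to installations at version 3.0.0 or later. If you script the precheck, a small helper can compare dotted version strings. This is a minimal bash sketch: version_ge is a hypothetical name, it assumes GNU sort with the -V (version sort) option, and you supply the installed version yourself.

```bash
#!/usr/bin/env bash
# version_ge A B: succeed (return 0) if version A is greater than or equal to B.
# GNU sort -V compares each dotted field numerically, so 3.10.0 > 3.9.9.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: decide whether the 3.1.5 upgrade precheck path applies.
current_version="3.1.1"   # hypothetical value: replace with your installed version
if version_ge "$current_version" "3.0.0"; then
  echo "upgrade path from $current_version to 3.1.5 applies"
else
  echo "version $current_version is older than 3.0.0"
fi
```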
  1. Operating system not supported

    Possible issue

    The installation failed because the operating system is not supported.

    Suggested troubleshooting methods

    Confirm whether the error is caused by an unsupported operating system or by a missing system package.

    Recommended solution

    • If a system package is missing, install it.

    • If the operating system is not supported, confirm that you are using one of our certified operating systems:

    | OS                                     | On-premises hosting | Java JDK version | GCC version (C/C++) |
    |----------------------------------------|---------------------|------------------|---------------------|
    | RedHat 6.5 to 6.9                      | Yes                 | 1.8.0_141        | 4.8.2               |
    | RedHat 7.0 to 7.8                      | Yes                 | 1.8.0_141        | 4.8.2               |
    | RedHat 8.0 to 8.2                      | Yes                 | 1.8.0_141        | 4.8.2               |
    | CentOS 6.5 to 6.9                      | Yes                 | 1.8.0_141        | 4.8.2               |
    | CentOS 7.0 to 7.4                      | Yes                 | 1.8.0_141        | 4.8.2               |
    | CentOS 8.0 to 8.2                      | Yes                 | 1.8.0_141        | 4.8.2               |
    | Ubuntu 14.04 LTS, 16.04 LTS, 18.04 LTS | Yes                 | 1.8.0_141        | 4.8.4               |
    | Debian 8                               | Yes                 | 1.8.0_141        | 4.8.4               |
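To compare a machine against the certified OS list above, you can read the distribution ID and version from /etc/os-release. This is a minimal bash sketch with a hypothetical helper name, os_release_info. RedHat and CentOS 6 predate /etc/os-release, so on those systems check /etc/redhat-release instead.

```bash
#!/usr/bin/env bash
# os_release_info [FILE]: print "<ID> <VERSION_ID>" from an os-release file
# (default /etc/os-release), with surrounding double quotes removed.
os_release_info() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || return 1
  awk -F= '
    { gsub(/"/, "", $2) }
    $1 == "ID"         { id = $2 }
    $1 == "VERSION_ID" { ver = $2 }
    END                { print id, ver }
  ' "$file"
}

# Usage:
#   os_release_info   # prints "ubuntu 18.04" on Ubuntu 18.04
```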

  2. Clock synchronization

    Possible issue

    Many issues in a TigerGraph cluster can be caused by clocks that are out of sync between nodes. When you encounter a problem, this is the first thing to check.

    Suggested troubleshooting methods

    Run grun_p all date to print the current time on every node. If the clocks differ by more than a few seconds, they need to be resynchronized.

    Recommended solution

    Use an NTP implementation that matches your OS to keep clocks synchronized: chronyd on CentOS or systemd-timesyncd on Ubuntu. You may also use the now-deprecated ntpd or ntpdate.

    To resync:

    • CentOS

      sudo chronyc -a makestep
    • Ubuntu

      sudo systemctl restart systemd-timesyncd

      After the clocks are resynchronized, there is no need to restart TigerGraph services.
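To reduce the date output from all nodes to a single number, you can have every node print Unix time with date +%s and compute the spread between the newest and oldest value. This is a minimal bash sketch: max_clock_skew and clocks_in_sync are hypothetical helpers, the 2-second default is an arbitrary choice, and because the exact grun_p output format (such as per-node labels) is not described here, the usage line keeps only purely numeric lines. Since the nodes are sampled in parallel, the result also includes a small sampling delay.

```bash
#!/usr/bin/env bash
# max_clock_skew: read one Unix timestamp (seconds) per line on stdin and
# print the difference between the latest and earliest value.
max_clock_skew() {
  awk '{
         v = $1 + 0
         if (NR == 1 || v < min) min = v
         if (NR == 1 || v > max) max = v
       }
       END { if (NR == 0) exit 1; print max - min }'
}

# clocks_in_sync [THRESHOLD]: succeed if the spread of the timestamps on
# stdin is at most THRESHOLD seconds (default 2).
clocks_in_sync() {
  local threshold="${1:-2}" skew
  skew="$(max_clock_skew)" || return 1
  [ "$skew" -le "$threshold" ]
}

# Usage on a cluster:
#   grun_p all 'date +%s' | grep -E '^[0-9]+$' | clocks_in_sync 2 || echo "clocks need resync"
```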

  1. How to use REST API on TGCloud

    Recommended solution

    RESTPP authentication is enabled on TGCloud. To use the REST API, you must first generate an authentication token.

    1. Generate a secret in the GraphStudio Admin Portal. You can also do this from the GSQL CLI: "create secret YOUR_ALIAS"

    2. Using the generated secret, run the following command to generate a RESTPP token:

      curl -X GET 'https://YOUR_SOLUTION_ID_HERE.i.tgcloud.us:9000/requesttoken?secret=YOUR_SECRET_HERE&lifetime=1000000'

      The optional lifetime parameter sets the number of seconds before the token expires. If omitted, it defaults to 2.6 million seconds, or about one month.

      The response will look something like:

      {"code":"REST-0000",
      "expiration":1570727825,
      "error":false,
      "message":"Generate new token successfully.",
      "token":"xyz789"
      }
    3. Using the generated RESTPP token, perform an API call using the following format:

      curl -X GET -H "Authorization: Bearer xyz789" 'https://YOUR_SOLUTION_ID_HERE.i.tgcloud.us:9000/echo'
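The token steps can be scripted by building the request URL and pulling the token field out of the JSON response. This is a minimal bash sketch with hypothetical helper names (request_token_url, extract_token). It assumes the solution ID and secret need no URL encoding, and it parses JSON with sed only for illustration; jq is more robust if it is installed.

```bash
#!/usr/bin/env bash
# request_token_url SOLUTION_ID SECRET [LIFETIME]: build the /requesttoken URL
# used in step 2. Assumes the solution ID and secret need no URL encoding.
request_token_url() {
  local solution_id="$1" secret="$2" lifetime="${3:-}"
  local url="https://${solution_id}.i.tgcloud.us:9000/requesttoken?secret=${secret}"
  if [ -n "$lifetime" ]; then
    url+="&lifetime=${lifetime}"
  fi
  printf '%s\n' "$url"
}

# extract_token: print the "token" value from a /requesttoken JSON response
# read on stdin (first match only).
extract_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (placeholders as in the steps above):
#   token="$(curl -s -X GET "$(request_token_url YOUR_SOLUTION_ID_HERE YOUR_SECRET_HERE)" | extract_token)"
#   curl -s -X GET -H "Authorization: Bearer ${token}" 'https://YOUR_SOLUTION_ID_HERE.i.tgcloud.us:9000/echo'
```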

5. FAQ for System settings

TODO: add simpler information around starting/stopping components stop vs stop all, configuration overview/basics, gadmin status

  1. How to change the temp, app, or data root directory

    Recommended solution

    1. Change the temp root directory

      gadmin config set System.TempRoot /PATH/TO/DIRECTORY
      gadmin config apply -y
      gadmin restart all -y
    2. Change the application root directory

      Note that this can also be used to switch between application binaries of the same minor version. For example, after installing 3.1.5, you can point AppRoot back to the 3.1.1 directory to use the 3.1.1 binaries again. This provides a rollback option when upgrading within the same minor version.

      gadmin config set System.AppRoot /PATH/TO/DIRECTORY
      gadmin config apply -y
      gadmin restart all -y
    3. Change the data root directory

      Changing the data root directory is a more involved process because data in the directory must also be moved. Please use caution when modifying anything in the data root.

      # Set the new data root. Create only its parent directory on all nodes:
      # if the new data root already existed, mv would nest the old data root inside it.
      NewDir=/PATH/TO/DIRECTORY
      grun all "mkdir -p $(dirname $NewDir)"

      # Configure variables for use
      OriginDataRoot=$(gadmin config get System.DataRoot)
      newDataRoot=$NewDir
      gstore_path=$OriginDataRoot/gstore

      # Convert symlinks under gstore from absolute to relative so they still resolve after the move
      grun all "symlinks -rc $gstore_path/0/part"

      # Stop all TigerGraph services
      gadmin stop all -y

      # Create a copy of the TigerGraph config (-L follows ~/.tg.cfg if it is a symlink)
      cp -L ~/.tg.cfg ~/.copy.tg.cfg

      # Make sure the new location does not exist yet on any node (every node should print 'not exists')
      grun all "if [[ ! -f $newDataRoot && ! -d $newDataRoot ]]; then echo 'not exists'; else echo 'exists'; fi"

      # Move data from the old location to the new location on all nodes
      grun all "mv $OriginDataRoot $newDataRoot"

      # Remove the old link and prepare the modified config
      rm ~/.tg.cfg
      cp -L ~/.copy.tg.cfg ~/.tg.cfg
      gadmin config set System.DataRoot $newDataRoot --file ~/.tg.cfg

      # Start the cluster with the new config and remove the temporary copy
      gadmin init cluster -y --skip-stop
      rm ~/.copy.tg.cfg

6. FAQ about System Security

TODO: add in the info contained in the security doc Jordan created

  1. How to use an ssh key generated by the user

    Recommended solution

    During installation, TigerGraph generates an SSH key for the TigerGraph user, which is used on every node in the cluster. If you would like to specify your own SSH key, you must use the non-interactive installation procedure.

    Please note that TigerGraph currently only supports SSH keys in PEM format, not the OpenSSH format.

    You can convert an OpenSSH-format key to PEM format as follows:

    ssh-keygen -p -m PEM -f ~/PATH/TO/YOUR/KEY

    You can also use the following command to generate a new PEM key:

    ssh-keygen -m PEM -t rsa -b 4096
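To check which format an existing private key uses before converting it, you can inspect its first line. This is a minimal bash sketch with a hypothetical helper, ssh_key_format. It classifies keys only by their header line; which PEM key types TigerGraph accepts beyond RSA is not stated here, and the generation command above produces an RSA key.

```bash
#!/usr/bin/env bash
# ssh_key_format KEYFILE: print "openssh" for the OpenSSH private key format,
# "pem" for a traditional PEM (RSA, EC, or DSA) private key, or "unknown"
# for anything else. Returns 1 if the file cannot be read.
ssh_key_format() {
  local header
  header="$(head -n 1 "$1" 2>/dev/null)" || return 1
  case "$header" in
    "-----BEGIN OPENSSH PRIVATE KEY-----") echo openssh ;;
    "-----BEGIN RSA PRIVATE KEY-----" | "-----BEGIN EC PRIVATE KEY-----" | "-----BEGIN DSA PRIVATE KEY-----") echo pem ;;
    *) echo unknown ;;
  esac
}

# Usage: convert only if the key is in OpenSSH format
#   [ "$(ssh_key_format ~/PATH/TO/YOUR/KEY)" = openssh ] && ssh-keygen -p -m PEM -f ~/PATH/TO/YOUR/KEY
```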
  1. How to use gcollect to collect cluster logs

    Recommended solution

    gcollect is a command that collects cluster logs. It automatically gathers data from multiple nodes in the TigerGraph cluster.

    Note that the output directory will be overwritten. Additionally, depending on the size and activity of your cluster, the output data can be quite large.

    See below for some common examples.

    # Show all requests during the last hour
    gcollect -t 3600 show
    # Collect debug info for a specific request
    gcollect -r RESTPP_2_1.1559075028795 -b 60 -d 120 -p "error" collect
    # Collect debug info for all components
    gcollect -i -p "error" -p "FAILED" -s "2019-05-22,18:00:00" -e "2019-05-22,19:00:00" collect
    # Search log files that have been collected before
    gcollect -i -p "unknown" -c admin,gpe -D -A 1 -B 2 grep
    # Collect all requests during the last day
    gcollect -t 86400 collect
    # Write output to a directory named after today's date. The -o option must come first
    gcollect -o $(date +"%F") -c gsql collect
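The -s and -e options in the examples above take timestamps of the form 2019-05-22,18:00:00. To build such a window relative to the current time, you can format Unix timestamps with GNU date. This is a minimal sketch: gcollect_time is a hypothetical helper, and it assumes gcollect interprets these timestamps in the server's local time zone, which this section does not state.

```bash
#!/usr/bin/env bash
# gcollect_time EPOCH: print a Unix timestamp as "YYYY-MM-DD,HH:MM:SS" in the
# current time zone (requires GNU date for the -d @EPOCH syntax).
gcollect_time() {
  date -d "@$1" +'%F,%T'
}

# Usage: collect errors from the last two hours
#   now="$(date +%s)"
#   gcollect -i -p "error" -s "$(gcollect_time $((now - 7200)))" -e "$(gcollect_time "$now")" collect
```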

8. FAQ about data operations

  1. The data cannot be updated or loaded

    Possible issues

    When a data loading job completes, queries do not return the newly added data. Alternatively, after data is updated, the new data is not returned by queries or the RESTPP API.

    Potential causes

    1. After the data volume reaches the maximum allowed by the license, the GSE rejects all subsequent ID conversion requests. This causes further updates or writes to fail, leaving the database in read-only mode.

    2. In some cases, this occurs because the GPE has not yet saved the Kafka message containing the update to disk.

    3. Further pulls from the Kafka queue are disabled because the maximum segment size has been reached.

    Suggested troubleshooting methods

    1. Check the GSE .out log for messages about license limit violations.

    2. Check the segmentmeta.yaml file to see if the value of PostQueuePos is less than the deltaQ start offset.

    Recommended solution

    1. Contact your TigerGraph account rep or TigerGraph Support to expand license capacity. Alternatively, reduce the amount of data stored in the database and then restart the GPE and GSE.

    2. Call the rebuildnow RESTPP endpoint. The system runs a rebuild automatically every 30 seconds; to trigger one immediately, use the following:

      curl -X GET "http://localhost:9000/rebuildnow"
    3. Clear out the Kafka queue data and send the update again: gadmin reset -y

    4. As a last resort, and only on a development cluster, dropping all graph data and metadata resolves most problems: gsql 'drop all'

  2. How to check data loading progress via RESTPP endpoint

    Recommended solution

    Use the following curl command:

    curl -X POST "http://localhost:9000/restpploader?checkstatus=true" -d '{"check_active": true}'

    Alternatively, you can provide a job ID list as the payload data:

    curl -X POST "http://localhost:9000/restpploader?checkstatus=true" -d '{"job_id_list": ["YOUR_JOB_ID_1", "YOUR_JOB_ID_2"]}'
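When checking several jobs, you can generate the job_id_list payload instead of typing the JSON by hand. This is a minimal bash sketch with a hypothetical helper, job_id_payload; it assumes job IDs contain no double quotes or backslashes, since it does not escape them. The same payload shape is used by the Kafka loader checkstatus request later in this document.

```bash
#!/usr/bin/env bash
# job_id_payload ID...: print a {"job_id_list": [...]} JSON body for the
# checkstatus requests. Assumes IDs contain no quotes or backslashes.
job_id_payload() {
  local out='{"job_id_list": [' sep='' id
  for id in "$@"; do
    out+="${sep}\"${id}\""
    sep=', '
  done
  printf '%s]}\n' "$out"
}

# Usage:
#   curl -X POST "http://localhost:9000/restpploader?checkstatus=true" -d "$(job_id_payload YOUR_JOB_ID_1 YOUR_JOB_ID_2)"
```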

9. GraphStudio FAQ

GraphStudio is a web browser GUI that allows users to easily interact with TigerGraph. It includes the following components: Schema Designer, Data Mapper, Data Loader, Graph Explorer, and Query Editor.

  1. Web login exception

    Possible issues

    Version compatibility error

    Suggested troubleshooting methods

    Look for the error message "Error: Failed to get configurations".

    Recommended solution

    Upgrade to a newer version.

  2. Switch the web background color

    Recommended solution

    GraphStudio provides two color themes, dark and light. By default, the dark theme is used. You can switch themes using the Admin button. Specific instructions can be found here: https://docs.tigergraph.com/ui/graphstudio/overview#dark-and-light-theme

  3. All services are online, but users cannot login to GraphStudio


  4. Browser will not load port 14240

    Possible issue

    gadmin status shows that all services are running, but the user cannot open SERVER_IP:14240 in a browser. The most likely cause is that the port is not open or is already occupied by another process.

    Suggested troubleshooting

    1. Run netstat -nlpt | grep 14240 to see which process is listening on the port, then run curl -v '127.0.0.1:14240' on the server and check whether the response is normal.

    2. From the client machine, run telnet SERVER_IP 14240 to check whether the port can be reached.

    Recommended solution

    1. If port 14240 is occupied, restart the GUI service. If it is still occupied afterward, run kill -9 on the tg_dbs_guid process ID.

    2. Confirm that the firewall allows connection on port 14240.

10. Admin Portal FAQ

  1. Missing monitoring information in Admin Portal

    Possible issue

    Insufficient system memory or disk space.

    Suggested troubleshooting

    The Admin Portal displays monitoring data retrieved from the TS3 component. When the system has insufficient resources, data sent to TS3 may be lost. You can check resource usage on each machine with:

    grun all "df -h"
    grun all "free -h"
    grun all "ps aux | grep $(gadmin config get System.AppRoot)/bin"
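To spot full disks quickly, you can filter df output by usage. This is a minimal bash sketch with a hypothetical helper, disks_over_threshold. It reads POSIX-format output (df -P) so each filesystem stays on one line, uses an arbitrary default threshold of 90 percent, and does not handle mount points that contain spaces. Run it on each machine, since the combined output of grun would not show which node a line came from.

```bash
#!/usr/bin/env bash
# disks_over_threshold [PERCENT]: read `df -P` output on stdin and print the
# mount point and usage of each filesystem at or above PERCENT (default 90).
disks_over_threshold() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '
    $5 ~ /^[0-9]+%$/ {
      pct = substr($5, 1, length($5) - 1) + 0
      if (pct >= t + 0) print $6, $5
    }'
}

# Usage on each machine:
#   df -P | disks_over_threshold 90
```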

    Recommended solution

    TS3 data is stored in the following path:

    $(gadmin config get System.DataRoot)/ts3/db

    This file can be removed manually, and the system will automatically generate a new one, which can help free disk space. Reducing memory or CPU usage requires other measures, such as adding a machine to the cluster or increasing the system's memory or CPU capacity.

11. Nginx FAQ

Nginx is a free and open source HTTP and reverse proxy server. Nginx is automatically installed along with TigerGraph and used to direct traffic to the correct component as well as provide configuration related to the GraphStudio GUI.

  1. Nginx cannot be started

    Possible issues

    After an Nginx process hangs, Nginx cannot be started with gadmin start because the port is still occupied.

    Suggested troubleshooting methods

    1. Enter gadmin status -v nginx to see which machine has the issue with Nginx

    2. Log in to that machine and enter ps aux | grep nginx to find the Nginx processes

    Recommended solution

    Use kill -9 on all remaining Nginx process IDs, then run gadmin start nginx to start the nginx services.

12. RESTPP FAQ

  1. Error message "Unauthorized User authentication failed" with RESTPP authentication enabled.

    TODO: RESTPP auth enable and use

    Possible issue

    The error message indicates invalid authentication credentials for the RESTPP API call. In general, the system authenticates users with one of two methods:

    1. User name and password (If the configuration value RESTPP.Factory.EnableAuth is false)

    2. Token (If RESTPP.Factory.EnableAuth is true)

    In this issue, the RESTPP authentication is enabled.

    Suggested troubleshooting methods

    Run the following commands to confirm that TG_TOKEN is consistent across system components:

    1. Find the AuthToken in the .tg.cfg file

      gadmin config dump | grep -i token
    2. Find the TG_TOKEN used in the Executor (EXE)

      ps aux | grep exe # get the Executor process ID
      tr '\0' '\n' < /proc/<EXE_PID>/environ | grep TG_TOKEN # environ entries are NUL-separated
    3. Find the TG_TOKEN used in RESTPP

      ps aux | grep rest # get the RESTPP process ID
      tr '\0' '\n' < /proc/<RESTPP_PID>/environ | grep TG_TOKEN
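Comparing the tokens by eye is error-prone, because /proc/<PID>/environ separates entries with NUL bytes rather than newlines. The following bash sketch extracts TG_TOKEN from each environment file and checks that all values match. environ_value and same_token are hypothetical helper names, and reading another process's environ file requires running as the same user (or root).

```bash
#!/usr/bin/env bash
# environ_value FILE NAME: print the value of variable NAME from a
# NUL-separated environment file such as /proc/<PID>/environ.
environ_value() {
  local file="$1" name="$2"
  tr '\0' '\n' < "$file" |
    awk -v n="$name" 'index($0, n "=") == 1 { print substr($0, length(n) + 2); exit }'
}

# same_token FILE...: succeed only if every file defines the same,
# non-empty TG_TOKEN value.
same_token() {
  local first="" value f
  for f in "$@"; do
    value="$(environ_value "$f" TG_TOKEN)"
    [ -n "$value" ] || return 1
    if [ -z "$first" ]; then
      first="$value"
    elif [ "$value" != "$first" ]; then
      return 1
    fi
  done
}

# Usage (PIDs found with the ps commands above):
#   same_token /proc/<EXE_PID>/environ /proc/<RESTPP_PID>/environ && echo consistent || echo "tokens differ"
```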

    Recommended solution

    If the token values above differ, proceed as follows:

    1. Stop all services

      gadmin stop all -y
    2. If needed, manually shut down the Executor process on each node

      ps aux | grep exe
      kill <EXE_PID> # if this doesn't work, use kill -9
    3. Start the Executor

      gadmin start exe
    4. Start other components

      gadmin start ctrl
      gadmin config apply --initial
      gadmin init kafka -y
      gadmin start ZK KAFKA IFM ETCD
      gadmin start
    5. Repeat the troubleshooting methods again to confirm TG_TOKEN is consistent

13. FAQ for Kafka and ZK

  1. How to enable debug logging in Kafka loader


    Recommended solution

    1. Get the Kafka loader process ID (tg_app_kafkaldr)

      ps aux | grep kafkaldr
    2. Run the following to send signal 10 (SIGUSR1 on x86 Linux), which enables debug logging

      kill -10 <KAFKALDR_PID>
  2. How to get the last update run by the Kafka loader

    Recommended solution

    Users may use the following command to get information about the last run of the Kafka loader

    curl -X POST "http://localhost:9000/restppkafkaloader/<GRAPH_NAME>?checkstatus=true" -d '{"job_id_list": ["<YOUR_JOB_ID>"]}'
  3. How to get the Kafka queue offset

    Recommended solution

    Kafka queue offset information can be queried with the following command:

    ls -ltr $(gadmin config get System.DataRoot)/kafka/deltaQ_GPE_* | awk -F ' ' '{print $9}' | awk -F '.' '{print $1}' |grep -v leader-epoch-checkpoint | sort | uniq
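The pipeline above relies on Kafka naming each log segment file after its base offset (a zero-padded number followed by .log, .index, or .timeindex). The following bash sketch performs the same extraction in a slightly more defensive way and also reports the highest base offset, which is where the newest segment starts. segment_base_offsets and latest_segment_offset are hypothetical helper names, and like the original pipeline they merge offsets across all deltaQ_GPE_* directories.

```bash
#!/usr/bin/env bash
# segment_base_offsets: read Kafka segment file names (one per line,
# optionally with a directory prefix) on stdin and print each distinct
# base offset as an integer, in ascending numeric order.
segment_base_offsets() {
  # Keep only the file name, then only <offset>.log/.index/.timeindex files,
  # and strip the zero padding before sorting numerically.
  awk -F/ '{ print $NF }' |
    grep -E '^[0-9]+\.(log|index|timeindex)$' |
    cut -d. -f1 |
    awk '{ print $1 + 0 }' |
    sort -n -u
}

# latest_segment_offset: print the highest base offset, i.e. the start of
# the newest segment.
latest_segment_offset() {
  segment_base_offsets | tail -n 1
}

# Usage:
#   ls "$(gadmin config get System.DataRoot)"/kafka/deltaQ_GPE_*/ | latest_segment_offset
```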

14. Executor FAQ

Executor (EXE) is a core infrastructure component responsible for operations on all nodes in the cluster, mainly process and file management. It receives messages from the Controller (CTRL) via Kafka. When a cluster starts, the Executor on each node is started via SSH.

  1. Error "Executor cannot be started, unable to ssh to <IP> with given credentials"

    Suggested troubleshooting methods

    As the error message indicates, the Executor is started via SSH. If the SSH connection fails, the Executor cannot start.

    To troubleshoot, first test the SSH connection using the key that TigerGraph uses, targeting the node referenced in the error message. Testing SSH to localhost as well helps narrow down potential networking issues.

    Note: if you installed TigerGraph via non-interactive installation and specified your own key, then that should be used instead. You can see which key is used via the gssh command: gssh | grep IdentityFile

    Useful commands:

    ssh -vi ~/.ssh/tigergraph_rsa tigergraph@<IP_ADDRESS>
    ssh -vi ~/.ssh/tigergraph_rsa tigergraph@localhost
    ping <IP_ADDRESS>
    telnet <IP_ADDRESS> 22 # telnet, nc, and nmap all check whether port 22 is reachable;
    nc -vz <IP_ADDRESS> 22 # use whichever tool is installed on your system
    nmap -p 22 <IP_ADDRESS>
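If telnet, nc, and nmap are all unavailable, bash itself can attempt a TCP connection through its /dev/tcp redirection (a bash feature, not a real file). This is a minimal sketch: port_open is a hypothetical helper, and it assumes bash was built with /dev/tcp support (the default on common distributions) and that the coreutils timeout command is available. It only shows whether a TCP connection can be opened, not whether SSH authentication succeeds.

```bash
#!/usr/bin/env bash
# port_open HOST PORT [TIMEOUT]: succeed if a TCP connection to HOST:PORT
# opens within TIMEOUT seconds (default 3). Returns 2 on bad usage.
port_open() {
  if [ "$#" -lt 2 ]; then
    echo "usage: port_open HOST PORT [TIMEOUT]" >&2
    return 2
  fi
  local host="$1" port="$2" secs="${3:-3}"
  # The inner bash receives HOST as $0 and PORT as $1 and opens fd 3 on the socket
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage (placeholder addresses):
#   for ip in <IP_ADDRESS_1> <IP_ADDRESS_2>; do
#     if port_open "$ip" 22; then echo "$ip: port 22 open"; else echo "$ip: port 22 unreachable"; fi
#   done
```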

    Recommended solution

    The appropriate fix depends on the output of the ssh command:

    • If the tigergraph user is subject to a password expiration policy, the expiration is reported during the SSH login and the password must be changed.

    • If the issue is caused by networking (a firewall or a communication problem), the SSH connection fails before the SSH handshake begins.

    • If SSH to localhost works but SSH to the IP address does not, confirm that the firewall allows traffic on port 22 between nodes.

    • If the key does not work, confirm that key-based authentication is enabled in the sshd configuration and that the corresponding public key is present in ~/.ssh/authorized_keys.

15. Controller FAQ

The Controller (CTRL) is a core infrastructure component of TigerGraph. It is mainly responsible for system setting management, service management, and distribution of scheduled tasks.

  1. Excessive memory consumption

    Possible issue

    Because other processes consume a large amount of memory, the system responds slowly or services go down.

    Suggested troubleshooting methods

    Use the top command to observe the resource usage of the OS.

    Recommended solution

    • Temporary: use gadmin restart all to restart the services to reduce system memory usage.

    • Recommended: This issue was common before version 3.1.5 and is resolved in version 3.1.5 and later. If you encounter this issue with the Controller, we suggest upgrading to 3.1.5 or later.

16. GPE FAQ

The GPE (Graph Processing Engine) is the main engine component of TigerGraph, responsible for receiving queries from RESTPP, updating graph data, and retrieving data.

  1. GPE crash

    Possible issue

    A large data update is sent to the GPE, which cannot process it quickly enough. Over time, if the Kafka queue from which the GPE receives updates exceeds 2 GB, the GPE crashes.

    Suggested troubleshooting methods

    Check the process status with gadmin status, and if the GPE has crashed, review its logs with gadmin log gpe. In this scenario, the GPE usually crashes again after it restarts, before it finishes warming up.

    Recommended solution

    • Temporary: reduce the frequency of data updates so that the GPE has time to process them as they arrive. Avoid extremely large data updates, especially updates concentrated on a single vertex, and spread updates out over time. Use gadmin reset -y to clear the Kafka queue and bring the system back up.

    • Recommended: Upgrade to version 3.2 which has added safe handling of this scenario.

17. GSE FAQ

TODO: GSE compaction and notes about high frequency of deletes

The GSE (Graph Storage Engine) stores a vertex ID map, which translates user-provided vertex primary IDs to the internal vertex IDs used by the GPE.

  1. GSE service cannot start

    Possible issue

    The GSE service does not start on one node

    Suggested troubleshooting method

    First, use df -h to check that the corresponding node has sufficient disk space. If disk space runs out, the GSE may encounter issues, and data will need to be recovered from a replica or the system restored from backup.

    Recommended solution

    Free up disk space by clearing non-essential files

18. GSQL FAQ

GSQL is the query language used by TigerGraph, as well as the server that handles query language requests, authentication, and user access management. Additionally, GSQL offers metadata definition and data loading capabilities via Kafka, files on disk, and files on AWS S3.

DDL (Data Definition Language) is a set of commands used to define the database schema.

DML (Data Manipulation Language) is a set of commands used to add, modify, delete, and update data in the database.

  1. Schema change failed

    Possible issues

    The following error can be seen in the logs: "Submit schema change failed return code 310"

    This may be caused by the GPE, GSE, or RESTPP service being offline during the schema change. It can also occur when a query or loading job that is running when the schema change starts does not finish before the schema change times out.

    Suggested troubleshooting methods

    1. Use gadmin status to see if any service is down

    2. Check the info log of GSQL for error code 310

    Recommended solution

    1. If a service is down, start the corresponding service.

    2. If a loading job is running, stop it. To abort running queries, you can use the following RESTPP endpoint:

      curl -X GET "http://localhost:9000/abortquery/<GRAPH_NAME>?requestid=all"

19. ETCD FAQ


ETCD is a distributed, fault-tolerant key-value store.

  1. TigerGraph configuration parameters cannot be retrieved

    Possible issue

    The ETCD service is down and causes configuration information to be unavailable. For example, gadmin config get System.AppRoot returns no value.

    Suggested troubleshooting method

    Use the following commands to view ETCD service information. Note that you may need to change the etcdctl path depending on your system configuration.

    ETCDCTL_API=3 ./etcdctl --endpoints=<PRIVATE_IP>:20000 member list
    gssh | grep etcd.servers

    If the member list contains fewer servers than the etcd.servers line, or the etcdctl command returns an error, re-add the missing server to the etcd cluster.

    Recommended solution

    # Stop etcd services on the node with issues, for example on m3
    gadmin stop etcd#3
    # Remove etcd data on that node
    grun m3 "rm -rf /home/tigergraph/tigergraph/data/etcd/*"
    # Remove the node from the etcd cluster. member remove takes the hex member ID
    # shown in the first column of "member list", not the node URL
    ETCDCTL_API=3 /home/tigergraph/tigergraph/app/3.1.5/etcd/etcdctl --endpoints=<PRIVATE_IP>:20000 member remove <ISSUE_MEMBER_ID>
    # Add the member for m3 back to the cluster before starting it
    ETCDCTL_API=3 /home/tigergraph/tigergraph/app/3.1.5/etcd/etcdctl --endpoints=<PRIVATE_IP>:20000 member add 'ETCD#3' --peer-urls=http://<ISSUE_NODE_IP>:20001
    # Start etcd on m3
    gadmin start etcd
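The member remove step needs the hex member ID rather than the node address, and the troubleshooting step compares the member count with etcd.servers. Assuming etcdctl's default simple output for member list (comma-separated ID, status, name, peer URLs, client URLs, and, in newer versions, an is-learner flag), the following hypothetical bash helpers count members and look up the ID for a node IP.

```bash
#!/usr/bin/env bash
# Both helpers read `etcdctl member list` output in the default simple format:
#   <ID>, <status>, <name>, <peer URLs>, <client URLs>[, <is learner>]

# member_count: print the number of members listed on stdin.
member_count() {
  grep -c .
}

# member_id_for_ip IP: print the ID of each member whose peer URLs contain IP.
# Matching "//IP:" avoids confusing 10.0.0.1 with 10.0.0.10.
member_id_for_ip() {
  awk -F', ' -v ip="$1" 'index($4, "//" ip ":") > 0 { print $1 }'
}

# Usage (paths and placeholders as in the commands above):
#   ETCDCTL_API=3 ./etcdctl --endpoints=<PRIVATE_IP>:20000 member list | member_count
#   ETCDCTL_API=3 ./etcdctl --endpoints=<PRIVATE_IP>:20000 member list | member_id_for_ip <ISSUE_NODE_IP>
```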
