Node Commands
In this section, we review the most important commands for monitoring
the cluster. We assume that the system administrator has installed the
cluster and that you have successfully run the grbcluster login
command with the appropriate flags to access your cluster.
Listing Cluster Nodes
The nodes command provides a list of nodes in the cluster, along
with status information. This command is a shortcut for the
node list command. For example:
> grbcluster nodes
ID       ADDRESS       STATUS TYPE    LICENSE PROCESSING #Q #R JL IDLE %MEM  %CPU
b7d037db server1:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  10 19m  15.30 5.64
735c595f server2:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  10 19m  10.45 8.01
Add the --describe flag to see an explanation of each field:
> grbcluster nodes --describe
ID - Unique node ID, use --long to display full ID
ADDRESS - Node address
STATUS - Node status (ALIVE, FAILED, JOINING, LEAVING, DEGRADED)
TYPE - Node type (COMPUTE: Compute Server, WORKER: Distributed Worker)
GRP - Group name for job affinity (not displayed if empty or restricted)
LICENSE - License status (N/A, VALID, INVALID, EXPIRED)
PROCESSING- Processing state (ACCEPTING, DRAINING, STOPPED)
#Q - Number of jobs in queue
#R - Number of jobs running
JL - Job Limit (maximum number of running jobs)
IDLE - Idle time since the last job execution (in minutes)
%MEM - Percentage of memory currently used on the machine
%CPU - Percentage of CPU currently used on the machine
STARTED - Node start time, use --long
RUNTIMES - Deployed runtime versions, use --long
VERSION - Remote Services Agent version, use --long
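For scripted monitoring, the table can be read into records keyed by column name. The following Python sketch is illustrative only: the function names parse_nodes and needs_attention are hypothetical, the sample text is copied from the example output above, and the parser assumes that every cell is non-empty and contains no spaces (which may not hold when additional columns such as GRP are displayed).

```python
# Sample text copied from the example output above; in practice it could be
# captured from the standard output of `grbcluster nodes`.
SAMPLE = """\
ID ADDRESS STATUS TYPE LICENSE PROCESSING #Q #R JL IDLE %MEM %CPU
b7d037db server1:61000 ALIVE COMPUTE VALID ACCEPTING 0 0 10 19m 15.30 5.64
735c595f server2:61000 ALIVE COMPUTE VALID ACCEPTING 0 0 10 19m 10.45 8.01
"""


def parse_nodes(text):
    """Turn the table into a list of dicts keyed by column header.

    Assumes every cell is non-empty and contains no spaces, which holds for
    the sample but is not guaranteed for every cluster configuration.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        cells = line.split()
        if len(cells) != len(header):
            raise ValueError(f"unexpected column count in: {line!r}")
        rows.append(dict(zip(header, cells)))
    return rows


def needs_attention(rows):
    """Return addresses of nodes that are not ALIVE, VALID and ACCEPTING."""
    return [
        row["ADDRESS"]
        for row in rows
        if (row["STATUS"], row["LICENSE"], row["PROCESSING"])
        != ("ALIVE", "VALID", "ACCEPTING")
    ]


nodes = parse_nodes(SAMPLE)
print(needs_attention(nodes))                # no node is flagged in the sample
print(sum(int(row["JL"]) for row in nodes))  # total job limit across nodes
```

Raising an error on an unexpected column count is deliberate: it is safer for a monitoring script to stop than to silently assign values to the wrong columns.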
Troubleshooting Connectivity Issues
You can test whether a Remote Services node is reachable with the
node ping command:
> grbcluster node ping --server=server1
Node is not reachable
The node latency command provides additional details:
> grbcluster node latency
ADDRESS LATENCY    NBERR
server1 1.12813ms  0
server2 1.218103ms 0
This will display the latency from the client machine to each node in the cluster.
Add the --describe flag to see an explanation of each field:
> grbcluster node latency --describe
ADDRESS - Node address
LATENCY - latency between the local client and a node
NBERR - Number of errors
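The latency report can also be checked automatically. The sketch below is a rough illustration, not part of grbcluster: the names to_ms and slow_or_failing and the 1.2 ms threshold are hypothetical, and only the "ms" suffix is confirmed by the sample output above. The other suffixes in the table are assumptions about how shorter or longer durations might be printed.

```python
# Sample text copied from the example output above.
SAMPLE = """\
ADDRESS LATENCY NBERR
server1 1.12813ms 0
server2 1.218103ms 0
"""

# Multipliers to milliseconds. Only "ms" appears in the sample; the other
# suffixes are assumptions. Longer suffixes come first so that "ms" is not
# mistaken for plain "s".
UNIT_TO_MS = (("ms", 1.0), ("µs", 1e-3), ("us", 1e-3), ("ns", 1e-6), ("s", 1e3))


def to_ms(value):
    """Convert a latency string such as '1.12813ms' to milliseconds."""
    for suffix, factor in UNIT_TO_MS:
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"unrecognized latency value: {value!r}")


def slow_or_failing(text, threshold_ms):
    """Return addresses whose latency exceeds the threshold or that report errors."""
    rows = [line.split() for line in text.splitlines()[1:] if line.strip()]
    return [
        address
        for address, latency, errors in rows
        if to_ms(latency) > threshold_ms or int(errors) > 0
    ]


# The threshold is an arbitrary value chosen for illustration.
print(slow_or_failing(SAMPLE, threshold_ms=1.2))  # ['server2']
```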
Listing Cluster Licenses
The node licenses command displays license status information for
each node in a cluster:
> grbcluster node licenses
ID ADDRESS STATUS TYPE KEY EXP ORG USER APP VER CS DL ERROR
eb07fe16 server1 VALID NODE gurobi 8 true 0
b7d037db server2 VALID NODE gurobi 8 true 0
Add the --describe flag to see an explanation of each field:
> grbcluster node licenses --describe
ID - Unique node ID, use --long to display full ID
ADDRESS - Node address
STATUS - License status (N/A, VALID, INVALID, EXPIRED)
TYPE - License type
KEY - License Cloud Key (not displayed if empty or restricted)
EXP - License expiration
VER - Maximum runtime version supported
CS - Indicate if Compute Server features are enabled
DL - Maximum number of workers for a distributed job (Distributed Limit)
ORG - Assigned organization
USER - Assigned username
APP - Assigned application name
ERROR - License error message
If a node has an INVALID license, run the node licenses command and
check the ERROR column to learn more:
> grbcluster node licenses
ID ADDRESS STATUS TYPE KEY EXP ORG USER APP VER CS DL ERROR
eb07fe16 server1 INVALID NODE false 0 No Gurobi license found...
Note that the node licenses command can be used at any time to check
the validity and attributes of licenses on all the nodes of the cluster
(expiration date, distributed limit, etc.).
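A periodic license check could be scripted along the following lines. This is a minimal sketch under stated assumptions: the function name license_status is hypothetical, and the sample combines rows copied from the two examples above. Because columns such as KEY, EXP, and ORG can be empty, splitting on whitespace cannot reliably align the later columns, so the sketch reads only the leading ID, ADDRESS, and STATUS values.

```python
# Rows copied from the two example outputs above, combined for illustration.
SAMPLE = """\
ID ADDRESS STATUS TYPE KEY EXP ORG USER APP VER CS DL ERROR
eb07fe16 server1 INVALID NODE false 0 No Gurobi license found...
b7d037db server2 VALID NODE gurobi 8 true 0
"""


def license_status(text):
    """Map each node address to its license status.

    Only the leading ID, ADDRESS and STATUS columns are read: later columns
    can be empty, so whitespace splitting cannot align them reliably.
    """
    status = {}
    for line in text.splitlines()[1:]:
        cells = line.split()
        if len(cells) >= 3:
            status[cells[1]] = cells[2]
    return status


statuses = license_status(SAMPLE)
problems = [address for address, state in statuses.items() if state != "VALID"]
print(problems)  # ['server1']
```

For any node reported this way, the ERROR column of the original output remains the place to look for the reason.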
Changing the Job Limit
Each node of a Remote Services cluster has a job limit, which is the
maximum number of jobs that can run simultaneously on that node. This
job limit can be changed using the grbcluster node config command,
together with the --job-limit= flag. For example, to change the job
limit to 5:
> grbcluster node config --job-limit=5
Changes to the job limit parameter only apply to the specified node; other nodes in the cluster are unaffected. Once changed, the new value will persist, even if you stop and restart the node.
Recall that you can run the nodes command to view the current job
limit for each node in a cluster:
> grbcluster nodes
ID       ADDRESS       STATUS TYPE    LICENSE PROCESSING #Q #R JL IDLE %MEM  %CPU
b7d037db server1:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  2  19m  15.30 5.64
735c595f server2:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  2  19m  10.45 8.01
The JL column shows the job limit, which is 2 for both nodes in the
cluster in this example.
We can change the limit for one node:
> grbcluster node config --server=server1:61000 --job-limit=5
By rerunning the nodes command, we can see that the limit for
server1 has been changed to 5:
> grbcluster nodes
ID       ADDRESS       STATUS TYPE    LICENSE PROCESSING #Q #R JL IDLE %MEM  %CPU
b7d037db server1:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  5  19m  15.30 5.64
735c595f server2:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  2  19m  10.45 8.01
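Because a job limit change applies to one node at a time, giving several nodes the same limit means issuing one command per node. The Python sketch below shows one possible way to script that, using only the --server and --job-limit flags shown above. The function names job_limit_command and set_job_limit are hypothetical, and the sketch assumes that grbcluster is on the PATH and that you have already logged in. By default it only prints the commands it would run.

```python
import shlex
import subprocess


def job_limit_command(server, limit):
    """Build the argument list for changing one node's job limit."""
    return [
        "grbcluster", "node", "config",
        f"--server={server}",
        f"--job-limit={int(limit)}",
    ]


def set_job_limit(servers, limit, dry_run=True):
    """Apply the same job limit to each server, one command per node.

    With dry_run=True the commands are printed instead of executed.
    """
    commands = [job_limit_command(server, limit) for server in servers]
    for command in commands:
        if dry_run:
            print(shlex.join(command))
        else:
            # Assumes grbcluster is installed and a login has been performed.
            subprocess.run(command, check=True)
    return commands


# Server addresses taken from the example output above.
set_job_limit(["server1:61000", "server2:61000"], 5)
```

Passing dry_run=False executes the commands; rerunning the nodes command afterwards confirms the new values in the JL column.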