Installing a Cluster Node

Once the Remote Services package is installed, you will need to set up a license, if necessary. Then, the Remote Services agent must be configured and started as a standard process or as a service. Finally, you should verify your installation.

Licensing

You will need to download and install a license file on all Compute Server nodes (no license file is required for a Distributed Worker node). You will find detailed instructions for downloading a license in the section "How do I retrieve and set up a Gurobi license" of the Getting Started Knowledge Base article.

Here is a quick summary of the process. Your first step is to locate and download your license file from the Gurobi License Center. When you download the license file, we strongly recommend that you place it in one of the default locations:

  • C:\gurobi\ on Windows

  • /opt/gurobi/ on Linux

  • /Library/gurobi/ on macOS

  • The user’s home directory

You can also set the environment variable GRB_LICENSE_FILE to point to this file.
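As a quick aid when checking a node, the sketch below lists where a license file might be looked for. It is an illustration only: the function names are hypothetical, the file name gurobi.lic is taken from the sample log output later in this section, and the search order (environment variable first, then the shared directory, then the home directory) is an assumption rather than documented Gurobi behavior.

```python
import os
import sys
from pathlib import Path

# Shared default directories named in the list above, keyed by sys.platform.
DEFAULT_DIRS = {
    "win32": r"C:\gurobi",
    "linux": "/opt/gurobi",
    "darwin": "/Library/gurobi",
}


def candidate_license_files(platform=sys.platform, env=os.environ, home=None):
    """Return the paths to check for a license file (assumed order)."""
    candidates = []
    override = env.get("GRB_LICENSE_FILE")
    if override:
        # The environment variable points directly at the file.
        candidates.append(Path(override))
    shared = DEFAULT_DIRS.get(platform)
    if shared:
        candidates.append(Path(shared) / "gurobi.lic")
    candidates.append(Path(home or Path.home()) / "gurobi.lic")
    return candidates


def find_license_file(**kwargs):
    """Return the first candidate that exists on disk, or None."""
    for path in candidate_license_files(**kwargs):
        if path.is_file():
            return path
    return None
```

Calling find_license_file() on a node and getting None back suggests that the file is neither in a default location nor referenced by GRB_LICENSE_FILE.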

In order to use the Cluster Manager, you will need to connect at least one Compute Server node to the cluster. When certain operations are requested such as submitting a job or a batch, the Cluster Manager will check the licenses available on the nodes. If none of the nodes have a valid Compute Server license, the operation will not be authorized.

Remote Services Agent (grb_rs)

To form a Remote Services cluster, you need to run the Remote Services agent (grb_rs) on all of the nodes that make up the cluster. These agents communicate amongst themselves, and also with the Cluster Manager or the clients.

The primary task of the Remote Services agents is to collectively manage the queueing and the execution of jobs. The agents work together to balance the load by assigning a new job to the node with the fewest running jobs whenever possible. If all nodes are at capacity, newly submitted jobs will be queued, and the first node with available capacity will later execute the job. If a new node is added to the cluster, it will immediately start processing queued jobs.
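The queueing behavior described above can be illustrated with a small model. This is a sketch of the stated policy only, not the agents' actual implementation: the class and method names are hypothetical, and the tie-breaking rule (node name order) is an assumption.

```python
from collections import deque


class ClusterModel:
    """Toy model: least-loaded placement with a shared FIFO queue."""

    def __init__(self):
        self.limits = {}      # node -> maximum number of concurrent jobs
        self.running = {}     # node -> set of job ids currently running
        self.queue = deque()  # jobs waiting for capacity, oldest first

    def add_node(self, node, job_limit):
        self.limits[node] = job_limit
        self.running[node] = set()
        self._dispatch()      # a new node picks up queued jobs right away

    def submit(self, job):
        self.queue.append(job)
        self._dispatch()

    def finish(self, node, job):
        self.running[node].discard(job)
        self._dispatch()      # freed capacity goes to the oldest queued job

    def _dispatch(self):
        while self.queue:
            free = [n for n in self.limits
                    if len(self.running[n]) < self.limits[n]]
            if not free:
                return        # every node is at capacity; jobs stay queued
            # Choose the node with the fewest running jobs (ties by name).
            target = min(free, key=lambda n: (len(self.running[n]), n))
            self.running[target].add(self.queue.popleft())
```

With a single node limited to one job, a second submission stays in the queue until either the first job finishes or another node joins the cluster.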

The grb_rs executable provides several commands and flags to help in the configuration and execution of the agent. We will review these commands step by step in the following sections. You can see the full list of commands in the reference section or by using the command-line help:

> grb_rs --help

Configuring a Cluster Node

The Remote Services agent has a number of configuration properties that affect its behavior. These can be controlled using a grb_rs.cnf configuration file. The installation package includes a predefined configuration file that can be used as a starting point (<installdir>/bin/grb_rs.cnf).

The simplest way to modify the parameters is to edit the default configuration file. Other options are available, though. The grb_rs process uses the following precedence rules:

  • First priority: command-line flag --config

  • Second priority: a configuration file in the current directory

  • Third priority: a configuration file in a shared directory (C:\gurobi, /opt/gurobi, /Library/gurobi for Windows, Linux and macOS platforms, respectively)

  • Fourth priority: a configuration file in the directory where grb_rs is located
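The precedence rules above can be sketched as a small lookup function. The function and parameter names are hypothetical, and what grb_rs does when the file named by --config does not exist is not covered here; the sketch returns the flag value unchecked.

```python
from pathlib import Path


def resolve_config(config_flag, cwd, shared_dir, exe_dir, exists=Path.is_file):
    """Pick the configuration file following the four priorities above.

    config_flag -- value of --config, or None if the flag was not given
    cwd         -- current directory
    shared_dir  -- shared directory (for example /opt/gurobi on Linux)
    exe_dir     -- directory where the grb_rs executable is located
    exists      -- file-existence check, replaceable for testing
    """
    if config_flag:
        return Path(config_flag)
    for directory in (cwd, shared_dir, exe_dir):
        candidate = Path(directory) / "grb_rs.cnf"
        if exists(candidate):
            return candidate
    return None
```

One practical consequence is that a stray grb_rs.cnf in the directory from which you launch grb_rs takes precedence over the file in the installation directory.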

Most of the properties that are configured through this file are related to communication options and job processing options. The configuration file is only read once, when grb_rs first starts. Subsequent changes to the file won’t affect parameter values on a running server.

Configuration file format

The configuration file contains a list of properties of the form PROPERTY=value. Lines that begin with the # symbol are treated as comments and are ignored. Here is an example:

# grb_rs.cnf configuration file
PORT=61000
MANAGER=http://mymanager:61080

While you could create this file from scratch, we recommend you start with the version of this file that is included with the product and modify it instead.
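If you want to inspect or validate a configuration file from a script, the format is simple enough to read with a few lines of code. The sketch below is not the parser used by grb_rs; the function name is hypothetical, and trimming whitespace around names and values is an assumption.

```python
def parse_cnf(text):
    """Parse PROPERTY=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected PROPERTY=value, got: {raw!r}")
        props[name.strip()] = value.strip()
    return props


sample = """\
# grb_rs.cnf configuration file
PORT=61000
MANAGER=http://mymanager:61080
"""
print(parse_cnf(sample))
```

The values are returned as strings; converting PORT to an integer, for example, is left to the caller.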

The command grb_rs properties lists all of the available properties with their default values and documentation. Some properties can be overridden on the grb_rs command line; in that case, the name of the corresponding command-line flag is listed as well. Some properties are important and must be changed for a production deployment. However, we need to distinguish between deployment with a Cluster Manager and without.

Important Properties with a Cluster Manager

When deploying a node with a Cluster Manager, the configuration is easier and you need to review the following properties:

MANAGER

This is the URL of the manager.

HOSTNAME

This must be the DNS name of the node that can be resolved from the other nodes or from the Cluster Manager. grb_rs tries to get a reasonable default value, but this value may still not be resolved by other nodes and could generate connection errors. In this case, you need to override this name in the configuration file with a fully qualified name of your node, for example:

HOSTNAME=server1

Note that you do not need to give addresses that can be resolved by clients because all communication is routed through the Cluster Manager. The nodes are never accessed directly by the clients.

CLUSTER_TOKEN

The token is a private key that enables different nodes to join the same cluster. All nodes of a cluster and the Cluster Manager must have the same token. We recommend that you generate a brand new token when you set up your cluster. The grb_rs token command will generate a random token, which you can copy into the configuration file.

JOBLIMIT

This property sets the maximum number of jobs that can run concurrently when using Compute Server on a specific node. The limit can be changed on a running cluster using the grbcluster node config --job-limit <new_limit> command, in which case the new value will persist and the value in the configuration file will be ignored from that point on (even if you stop and restart the cluster).

HARDJOBLIMIT

Certain jobs (those with priority 100) are allowed to ignore the JOBLIMIT, but they aren’t allowed to ignore this limit. Client requests beyond this limit are queued, irrespective of their priority. This limit is set to 0 by default, which means that it is disabled and jobs with priority 100 will never be queued.
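The interaction between JOBLIMIT and HARDJOBLIMIT can be summarized as a small decision function. This is our reading of the two descriptions above, written as a sketch; the function name and parameters are hypothetical and the code does not reproduce the agent's internal logic.

```python
def admit(running, priority, job_limit, hard_job_limit=0):
    """Return "run" or "queue" for a job submitted to one node.

    running        -- number of jobs currently running on the node
    priority       -- priority of the new job (100 bypasses JOBLIMIT)
    job_limit      -- value of JOBLIMIT
    hard_job_limit -- value of HARDJOBLIMIT (0 means disabled)
    """
    # The hard limit, when enabled, applies to every job.
    if hard_job_limit > 0 and running >= hard_job_limit:
        return "queue"
    # Priority-100 jobs ignore JOBLIMIT.
    if priority == 100:
        return "run"
    return "run" if running < job_limit else "queue"


print(admit(running=2, priority=0, job_limit=2))
print(admit(running=2, priority=100, job_limit=2))
print(admit(running=4, priority=100, job_limit=2, hard_job_limit=4))
```

Under this reading, the three calls give "queue", "run", and "queue" respectively: a regular job waits once JOBLIMIT is reached, a priority-100 job still starts, and even a priority-100 job waits once HARDJOBLIMIT is reached.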

Important Properties without a Cluster Manager

When installing a node that will not be connected to a Cluster Manager, authentication of clients uses predefined passwords that must be stored in the configuration file. The default configuration files must be reviewed and the following properties must be changed for a production deployment:

HOSTNAME

This must be the DNS name of the node that can be resolved from the other nodes or the clients in your network. grb_rs tries to get a reasonable default value, but this value may still not be resolved by clients and could generate connection errors. In this case, you need to override this name in the configuration file with a fully qualified name of your node, for example:

HOSTNAME=server1

If the names cannot be resolved by clients, another option is to use IP addresses directly. In this case, set this property to the IP address of the node.

CLUSTER_TOKEN

The token is a private key that enables different nodes to join the same cluster. All nodes of a cluster must have the same token. We recommend that you generate a brand new token when you set up your cluster. The grb_rs token command will generate a random token, which you can copy into the configuration file.

PASSWORD

This is the password that clients must supply in order to access the cluster. It can be stored in clear text or hashed. We recommend that you create your own password, and that you store it in hashed form. You can use the grb_rs hash command to compute the hashed value for your chosen password. Note that clients must provide the original password (not hashed), and it will be encrypted in transit if HTTPS is used.

grb_rs hash newpass
$$ppEieKZExlBR-pCSUMlmc4oWlG8nZsUOE2IM0hJbzsmV_Yjj

Then copy and paste the value in the configuration file:

PASSWORD=$$ppEieKZExlBR-pCSUMlmc4oWlG8nZsUOE2IM0hJbzsmV_Yjj

The default password is pass.

ADMINPASSWORD

This is the password that clients must supply in order to run restricted administrative job commands. It can be stored in clear text or hashed. We recommend that you create your own password, and that you store it in hashed form. You can use the grb_rs hash command to compute the hashed value for your chosen password. Note that clients must provide the original password (not hashed), and it will be encrypted in transit if HTTPS is used. The default password is admin.

CLUSTER_ADMINPASSWORD

This is the password that clients must supply in order to run restricted administrative cluster commands. It can be stored in clear text or hashed. We recommend that you create your own password, and that you store it in hashed form. You can use the grb_rs hash command to compute the hashed value for your chosen password. Note that clients must provide the original password (not hashed), and it will be encrypted in transit if HTTPS is used. The default password is cluster.

JOBLIMIT

This property sets the maximum number of jobs that can run concurrently when using Compute Server on a specific node. The limit can be changed on a running cluster using the grbcluster node config --job-limit <new_limit> command, in which case the new value will persist and the value in the configuration file will be ignored from that point on (even if you stop and restart the cluster).

HARDJOBLIMIT

Certain jobs (those with priority 100) are allowed to ignore the JOBLIMIT, but they aren’t allowed to ignore this limit. Client requests beyond this limit are queued, irrespective of their priority. This limit is set to 0 by default, which means that it is disabled and jobs with priority 100 will never be queued.

Starting a Cluster Node as a Process

Once you have installed the Remote Services package (including retrieving and installing your license file and, for Linux users, setting your PATH variable), you can start grb_rs as a standard process. From a terminal window with administrator privileges, issue the following command:

> grb_rs

If you are using a Cluster Manager and you did not set the MANAGER configuration property you can specify it on the command-line:

> grb_rs --manager=http://mymanager:61080

Both commands will start the Remote Services agent on the default port (port 80), and you should see output like the following:

info  : Reading configuration file: /home/jones/gurobi_server1200/linux64/bin/grb_rs.cnf
info  : Gurobi Remote Services starting...
info  : Platform is linux
info  : Version is 12.0.0 (build v12.0.0rc0)
info  : Variable GRB_LICENSE_FILE is not set
info  : License file found at /home/jones/gurobi.lic
info  : Node address is server1
info  : Node FQN is server1
info  : Node has 8 cores
info  : Using data directory /home/jones/gurobi_server1200/linux64/bin/data
info  : Data store created
info  : Available runtimes: [... 12.0.0]
info  : Public root is /home/jones/gurobi_server1200/linux64/resources/grb_rs/public
info  : Starting API server (HTTP) on port 80...

If you do not have administrator privileges or if the default port is already in use, you will see an error about opening the port. For example, on Linux you might see an error like this:

fatal : Gurobi Remote Services terminated, listen tcp :80: bind: permission denied

or

fatal : Gurobi Remote Services terminated, listen tcp :80: bind: address already in use

Note that grb_rs does not have to be run with elevated privileges, but it does need elevated privileges to use the default port 80.
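Before starting grb_rs, you can check whether a given port is likely to be usable. The sketch below tries to bind the port with a plain TCP socket and maps the two failures shown above to short messages. The function name is hypothetical, and the check is only indicative: another process could take the port between the check and the start of grb_rs.

```python
import errno
import socket


def check_port(port, host=""):
    """Try to bind a TCP port; return "ok" or the reason it failed."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except PermissionError:
        # Typically a port below 1024 without elevated privileges.
        return "permission denied"
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return "address already in use"
        raise
    else:
        return "ok"
    finally:
        sock.close()
```

For example, check_port(80) run as a regular user on Linux would usually report "permission denied", while check_port(61000) reports "ok" if nothing else is listening on that port.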

If you would like to run grb_rs on a non-default port, use the --port flag or set the PORT property in the configuration file. For example:

> grb_rs --manager=http://mymanager:61080 --port=61000

The Remote Services agent (grb_rs) needs a directory to store various files, including the runtimes, job metadata, job log files, etc. The default location is a directory named data, located in the same directory as the grb_rs executable (<installdir>/bin/data). If you have a directory named data in your current directory, it will use that location instead.

If starting grb_rs produces an error message that indicates that there was a problem creating the storage service (as shown below), a likely cause is that another grb_rs process is already running.

fatal : Error creating storage service: Error opening data store: timeout

Starting a Cluster Node as a Service

While you always have the option of running grb_rs from a terminal and leaving the process running in the background, we recommend that you start it as a service instead, especially in a production deployment. The advantage of a service is that it will automatically restart itself if the computer is restarted or if the process terminates unexpectedly.

grb_rs provides several commands that help you to set it up as a service. These must be executed with administrator privileges:

grb_rs install

Install the service. The details of exactly what this involves depend on the host operating system type and version: this uses systemd or upstart on Linux, launchd on macOS, and Windows services on Windows.

grb_rs start

Start the service (and install it if it hasn’t already been installed).

grb_rs stop

Stop the service.

grb_rs restart

Stop and then start the service.

grb_rs uninstall

Uninstall the service.

Note that the install command installs the service using default settings. If you don’t need to modify any of these, you can use the start command to both install and start the service. Otherwise, run install to register the service, then modify the configuration (the details are platform-dependent and are touched on below), and then run start to start the service.

Note that you only need to start the service once; grb_rs will keep running until you execute the grb_rs stop command. In particular, it will start again automatically if you restart the machine.

Note also that the start command does not take any flags or additional parameters. All of the configuration properties must be set in the grb_rs.cnf configuration file. If you need to make a change, edit the configuration file, then use the stop command followed by the start command to restart grb_rs with the updated configuration.

The one exception is the JOBLIMIT property, which can be changed on a live server using grbcluster. If you change this property and later restart the server, the new value will persist and the value in the configuration file will be ignored.

The exact behavior of these commands varies depending on the host operating system and version:

Linux

On Linux, grb_rs supports two major service managers: systemd and upstart. The install command will detect the service manager available on your system and will generate a service configuration file located in /etc/systemd/system/grb_rs.service or /etc/init/grb_rs.conf for systemd and upstart, respectively. Once the file is generated, you can edit it to set advanced properties. Please refer to the documentation of systemd or upstart to learn more about service configuration.

Use the start and stop commands to start and stop the service. When the service is running, log messages are sent to the Linux syslog and to a rotating log file, service.log, located in the same directory as grb_rs.

The uninstall command will delete the generated file.

macOS

On macOS, the system manager is called launchd, and the install command will generate a service file in /Library/LaunchDaemons/grb_rs.plist. Once the file is generated, you can edit it to set advanced properties. Please refer to the launchd documentation to learn more about service configuration.

Use the start and stop commands to start and stop the service. When the service is running, log messages are sent to the macOS syslog and to a rotating log file, service.log, located in the same directory as grb_rs.

The uninstall command will delete the generated file.

Windows

On Windows, the install command will declare the service to the operating system. If you wish to set advanced properties for the service configuration, you will need to start the Services configuration application. Please refer to the Windows Operating System documentation for more details.

Use the start and stop commands to start and stop the service. When the service is running, log messages are sent to the Windows event log and to a rotating log file, service.log, located in the same directory as grb_rs. Note that the service must run as a user that has write permissions to this directory; otherwise, no log file will be generated.

The uninstall command will delete the service from the registry.

Verification

Once you have grb_rs running, you can check to make sure that you will be able to submit jobs to it.

Log In with a Cluster Manager

As we have explained earlier, the Cluster Manager initially creates three default users with predefined passwords:

  • standard user: gurobi / pass

  • administrator: admin / admin

  • system administrator: sysadmin / cluster

These default accounts are provided to simplify installation; you should change the passwords or delete the accounts before actually using the cluster.

You can check that you can log in using the sysadmin account with the grbcluster command-line tool:

> grbcluster login --manager=http://mymanager:61080 --username=sysadmin
info  : Using client license file '/home/jones/gurobi.lic'
Password for sysadmin:
info  : User sysadmin connected to http://mymanager:61080, session will expire on...

Log In without a Cluster Manager

With a self-managed cluster, there are no user accounts, and the access level is determined by the password used. Here are the default passwords (which can be changed in the configuration file):

  • standard user: pass

  • administrator: admin

  • system administrator: cluster

In this case, you need to log in to one of the nodes and provide the system administrator password:

> grbcluster login --server=http://server1:61000
info  : Using client license file '/home/jones/gurobi.lic'
Enter password (return to use default):
info  : Connected to http://server1:61000

Note that the password you provide is stored in clear text in the license file (for future use by other commands). With this in mind, make sure that access to the license file is restricted.

Accessing the Cluster

Once you have verified that you can log in, you should also check the list of nodes with the command:

> grbcluster nodes
ID       ADDRESS       STATUS TYPE    LICENSE PROCESSING #Q #R JL IDLE %MEM  %CPU
b7d037db server1:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  10 <1s  10.89 4.99

You are ready to submit jobs if both of the following are true:

  • the STATUS column indicates that one or more servers are ALIVE

  • the LICENSE column indicates that the license is VALID.
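If you want to automate this check, the output of grbcluster nodes can be parsed with a short script. The sketch below works on captured text and uses the sample output shown above. The function names are hypothetical, and splitting on whitespace assumes that no column is empty, which holds for this listing but not necessarily for other grbcluster tables. We also interpret the two conditions as "at least one node is both ALIVE and VALID", which is our reading rather than a documented rule.

```python
def parse_table(text):
    """Turn whitespace-separated table output into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def ready_for_jobs(rows):
    """True if at least one node is ALIVE and has a VALID license."""
    return any(
        row.get("STATUS") == "ALIVE" and row.get("LICENSE") == "VALID"
        for row in rows
    )


sample = """\
ID       ADDRESS       STATUS TYPE    LICENSE PROCESSING #Q #R JL IDLE %MEM  %CPU
b7d037db server1:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  10 <1s  10.89 4.99
"""
rows = parse_table(sample)
print(rows[0]["ADDRESS"], ready_for_jobs(rows))
```

In practice you would capture the text by running grbcluster nodes from your script instead of using a hard-coded sample.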

If grbcluster is unable to connect or if it does not show any live nodes, then check your network and the log of the grb_rs nodes (the console output or <installdir>/bin/service.log if started as a service).

If a node has an INVALID license, the ERROR field will provide more information about the error. For example:

> grbcluster node licenses
ID       ADDRESS       STATUS  TYPE KEY EXP ORG USER APP VER CS    DL  ERROR
b7d037db server1:61000 INVALID NODE                          false 0   No Gurobi license found...

You may also want to verify that it is possible to submit a job to your cluster. To do so, identify a machine from which users will typically submit jobs and install the Gurobi client package on it. Then, you can submit a job with a command like the following:

> gurobi_cl misc07.mps

For more information on how to install the client and run gurobi_cl, please refer to the section about using Remote Services.