Batch Commands

In this section, we will review the most important commands available to manage batches. We assume that the system administrator has installed the cluster and that you have successfully executed the grbcluster login command with the appropriate flags to access your cluster. Batch management is only available with a Cluster Manager.

Creating Batches

Once you are logged in to a Cluster Manager, you can use grbcluster to create a batch. This will submit a non-interactive job. The typical process involves the following three steps:

  • Create the batch and submit the non-interactive job. In this step, we indicate which model file we want to solve and which result file we need. With this information, grbcluster will declare the batch, upload the input model file, and submit the solve request as a batch job. At this point, you can disconnect your client machine from the network (for example, by closing your laptop). The request will be processed automatically.

    > grbcluster batch solve glass4.mps ResultFile=solution.sol
    info  : Batch ada0a345-aa9e-4d6b-a7f0-05caf345d4e2 created
    info  : Uploading glass4.mps...
    info  : Batch ada0a345-aa9e-4d6b-a7f0-05caf345d4e2 submitted with job 66d4783b...
    
  • Monitoring. While the batch job is being executed, you can monitor the status if you wish. You can reference the batch you submitted using its batch ID.

    > grbcluster batch status 2e05810c-911f-47ee-b695-27e1244fefd0 --wait
    info  : Batch 2e05810c-911f-47ee-b695-27e1244fefd0 status is SUBMITTED
    info  : Batch 2e05810c-911f-47ee-b695-27e1244fefd0 status is SUBMITTED
    info  : Batch 2e05810c-911f-47ee-b695-27e1244fefd0 status is SUBMITTED
    info  : Batch 2e05810c-911f-47ee-b695-27e1244fefd0 status is SUBMITTED
    info  : Batch 2e05810c-911f-47ee-b695-27e1244fefd0 status is COMPLETED
    
  • Download the results. Once a batch is complete, you can download the log file and any optimization result. By default, results are stored in a directory named after the batch ID. You should also discard the batch so that the Cluster Manager can delete the associated data from the database; as the output below shows, the download command does this by default.

    > grbcluster batch download 2e05810c-911f-47ee-b695-27e1244fefd0
    info  : Results will be stored in directory 2e05810c-911f-47ee-b695-27e1244fefd0
    info  : Downloading solution.sol...
    info  : Downloading gurobi.log...
    info  : Discarding batch data
    

You can also perform all three steps with a single grbcluster command by adding the --download flag:

> grbcluster batch solve ResultFile=solution.sol misc07.mps --download
info  : Batch 5d0ea600-5068-4a0b-bee0-efa26c18f35b created
info  : Uploading misc07.mps...
info  : Batch 5d0ea600-5068-4a0b-bee0-efa26c18f35b submitted with job a9700b72...
info  : Batch 5d0ea600-5068-4a0b-bee0-efa26c18f35b status is COMPLETED
info  : Results will be stored in directory 5d0ea600-5068-4a0b-bee0-efa26c18f35b
info  : Downloading solution.sol...
info  : Downloading gurobi.log...
info  : Discarding batch data
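
When the three steps are scripted separately rather than combined with --download, the batch ID printed by the solve step has to be carried over to the status and download steps. The following Python sketch extracts it from output in the format shown above. The function name extract_batch_id is hypothetical, and the regular expression assumes that the "Batch <ID> created" line keeps the shape shown in these examples; it is not a documented output format.

```python
import re

# Matches the full batch ID (UUID form) in a line such as
# "info  : Batch ada0a345-... created". The line format is taken from
# the sample output in this section and is an assumption, not a spec.
_CREATED = re.compile(
    r"Batch\s+([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+created"
)


def extract_batch_id(solve_output: str) -> str:
    """Return the batch ID announced by `grbcluster batch solve`."""
    match = _CREATED.search(solve_output)
    if match is None:
        raise ValueError("no 'Batch <ID> created' line found")
    return match.group(1)


if __name__ == "__main__":
    sample = (
        "info  : Batch ada0a345-aa9e-4d6b-a7f0-05caf345d4e2 created\n"
        "info  : Uploading glass4.mps...\n"
        "info  : Batch ada0a345-aa9e-4d6b-a7f0-05caf345d4e2 submitted with job 66d4783b...\n"
    )
    batch_id = extract_batch_id(sample)
    # Argument lists for the remaining two steps of the workflow.
    print(["grbcluster", "batch", "status", batch_id, "--wait"])
    print(["grbcluster", "batch", "download", batch_id])
```

The argument lists can be handed to a process launcher of your choice; keeping the parsing separate from the launching makes the helper easy to check against saved output.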

Listing Batches

Batches submitted to a Cluster Manager can be listed by using the batches command, which is a shortcut for the batch list command. For example:

> grbcluster batches
ID       JOB      CREATED STATUS    STIME   USER  PRIO API        D SIZE   INPUT      OUTPUT
2e05810c ce7ab3a4 2019... COMPLETED 2019... jones 0    grbcluster X 0      glass4.mps solution.sol
ada0a345 66d4783b 2019... COMPLETED 2019... jones 0    grbcluster   288960 misc07.mps solution.sol

Note that you can get more information by using the --long flag. With this flag, the command will display the complete batch ID and the complete job ID, each of which is unique, instead of the short IDs. To get an explanation of the meanings of the different fields, add the --describe flag. For example:

> grbcluster batches --describe
ID        - Unique batch ID, use --long to display full ID
JOB       - Unique job ID, use --long to display full ID
CREATED   - Batch created  time
Status    - Batch Status
STIME     - Batch status updated time
USER      - Client username (not displayed if empty or restricted)
APP       - Application name (not displayed if empty or restricted)
PRIO      - Batch priority
API       - API type - Python, C++, Java, .NET, Matlab, R... (not displayed if empty or restricted)
D         - Indicate if batch data was discarded
SIZE      - Size of batch
INPUT     - List filenames of input files (not displayed if empty or restricted)
OUTPUT    - List filenames of output files (not displayed if empty or restricted)
RUNTIME   - Batch runtime version, use --long
PID       - Client process ID, use --long (not displayed if empty or restricted)
HOST      - Client hostname, use --long (not displayed if empty or restricted)
IP        - Client IP address, use --long (not displayed if empty or restricted)
APP       - Client application name, use --long (not displayed if empty or restricted)
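
If you want to process the listing in a script, note that a plain whitespace split is fragile: the D column is empty for batches that were not discarded, which would shift every following field. The Python sketch below slices each row at the character offsets of the column headers instead. It assumes that values are aligned with their headers, as in the sample output above; this alignment is an observation from the examples, not a documented guarantee, and the function name parse_batches_table is hypothetical.

```python
import re


def parse_batches_table(text: str) -> list[dict[str, str]]:
    """Parse `grbcluster batches` output into one dict per batch.

    Assumes each value starts at the same character offset as its
    column header, as in the sample output (not a documented format).
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    # Record each header name together with the offset where it starts.
    columns = [(m.group(0).upper(), m.start())
               for m in re.finditer(r"\S+", lines[0])]
    rows = []
    for line in lines[1:]:
        row = {}
        for index, (name, start) in enumerate(columns):
            # A column ends where the next begins; the last runs to the end.
            end = columns[index + 1][1] if index + 1 < len(columns) else None
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


# Sample listing copied from the example above.
SAMPLE = "\n".join([
    "ID       JOB      CREATED STATUS    STIME   USER  PRIO API        D SIZE   INPUT      OUTPUT",
    "2e05810c ce7ab3a4 2019... COMPLETED 2019... jones 0    grbcluster X 0      glass4.mps solution.sol",
    "ada0a345 66d4783b 2019... COMPLETED 2019... jones 0    grbcluster   288960 misc07.mps solution.sol",
])

if __name__ == "__main__":
    for batch in parse_batches_table(SAMPLE):
        print(batch["ID"], batch["STATUS"], repr(batch["D"]), batch["SIZE"])
```

Header names are upper-cased so that the STATUS column is found regardless of how the header is capitalized in a given listing.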

Aborting Batches

Batches submitted to a Cluster Manager can be aborted by using the batch abort command. For example:

> grbcluster batch abort 9bc34333

The following steps illustrate how you would start and subsequently abort a batch. First, use grbcluster batch solve to submit a long-running batch to your Cluster Manager:

> grbcluster batch solve glass4.mps ResultFile=solution.sol

Once the batch is submitted, you can use grbcluster batches to monitor your batches:

> grbcluster batches
ID       JOB      CREATED Status    STIME   USER  PRIO API        D SIZE   INPUT
4aba4ad3 910878b9 2019... SUBMITTED 2019... jones 0    grbcluster   86579  glass4.mps

The full or short ID can be used to abort the batch as follows:

> grbcluster batch abort 4aba4ad3

After the abort command is issued, the status of the batch can be retrieved using the batches command:

> grbcluster batches
ID       JOB      CREATED Status    STIME   USER  PRIO API        D SIZE   INPUT
4aba4ad3 910878b9 2019... ABORTED   2019... jones 0    grbcluster   86579  glass4.mps

As you can see, the status of the batch has changed to ABORTED. Note that the underlying job that was created to execute the batch is ABORTED as well:

JOBID    BATCHID  ADDRESS       STATUS  STIME               USER   OPT     API
910878b9 4aba4ad3 server1:61001 ABORTED 2019-09-22 15:55:24 jones          grbcluster

Retrying Batches

If a batch fails to execute, you can resubmit that batch. This might for example happen if the node where the batch job was running was shut down or ran out of memory. All of the input files and parameters of the batch specification are still stored by the Cluster Manager, so there is no need to upload them again. You can simply issue the batch retry command:

> grbcluster batch retry edfa28f6-7abc-4af1-80a3-0b7472dcdcf0
info  : Batch edfa28f6-7abc-4af1-80a3-0b7472dcdcf0 submitted for retry with job 9c6f1b59...

Note that a new batch job is created to execute the batch, but the batch specification does not change and you can still use the same batch ID to monitor progress. A batch cannot be retried while it is running or after its data has been discarded.
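
A script that resubmits batches automatically can check these two restrictions before issuing the command. The Python sketch below is only an illustration: the function names are hypothetical, and treating the SUBMITTED status as "currently running" is an assumption based on the listings in this section. The Cluster Manager remains the authority on whether a retry is accepted.

```python
def can_retry(status: str, discarded: bool) -> bool:
    """Apply the two restrictions stated above.

    A batch cannot be retried while it is running (assumed here to
    correspond to status SUBMITTED) or after its data was discarded.
    """
    return status.upper() != "SUBMITTED" and not discarded


def retry_command(batch_id: str) -> list[str]:
    """Argument list for the retry command shown above."""
    return ["grbcluster", "batch", "retry", batch_id]


if __name__ == "__main__":
    batch_id = "edfa28f6-7abc-4af1-80a3-0b7472dcdcf0"
    # Example: an aborted batch whose data is still stored.
    if can_retry("ABORTED", discarded=False):
        print(" ".join(retry_command(batch_id)))
```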

Discarding Batches

A batch has a set of input files and a set of result files that are stored in the Cluster Manager database. This lets the client submit a batch and disconnect while it is processed, then download the results later. As a consequence, batches can consume significant space in the database, so it is important to discard batch data when you are done with it. Note that batch metadata is small and remains in the batch history for monitoring purposes even after you discard the batch.

By default, downloading results with grbcluster discards the batch automatically. If you want to be able to download the results again later, you can change this behavior with the --discard flag:

> grbcluster batch solve misc07.mps ResultFile=solution.sol --download --discard=false
info  : Batch 076225d7-a1c9-462f-bfef-8e23c81d9f16 created
info  : Uploading misc07.mps...
info  : Batch 076225d7-a1c9-462f-bfef-8e23c81d9f16 submitted with job ef0861e9...
info  : Batch 076225d7-a1c9-462f-bfef-8e23c81d9f16 status is SUBMITTED
info  : Batch 076225d7-a1c9-462f-bfef-8e23c81d9f16 status is COMPLETED
info  : Results will be stored in directory 076225d7-a1c9-462f-bfef-8e23c81d9f16
info  : Downloading solution.sol...
info  : Downloading gurobi.log...

You can check the space used by this batch by looking in the SIZE column in the output of the batches command:

> grbcluster batches --batchId=076225d7
ID       JOB      CREATED  Status    STIME   USER  PRIO API        D SIZE   INPUT      OUTPUT
076225d7 ef0861e9 2019...  COMPLETED 2019... jones 0    grbcluster   288960 misc07.mps solution.sol

In order to discard a batch manually, you can use the batch discard command. You can verify that the size of the batch is 0 afterwards. You will also notice that the D column is flagged, indicating that the batch was discarded.

> grbcluster batch discard 076225d7
info  : Batch 076225d7-a1c9-462f-bfef-8e23c81d9f16 discarded

> grbcluster batches --batchId=076225d7
ID       JOB      CREATED  Status    STIME   USER  PRIO API        D SIZE INPUT      OUTPUT
076225d7 ef0861e9 2019...  COMPLETED 2019... jones 0    grbcluster X 0    misc07.mps solution.sol

Note that the Cluster Manager will automatically discard and delete batches when they are older than the maximum age, as specified in the cluster retention policy. Developers submitting a batch with a programming language API should call the appropriate discard function once results have been retrieved.
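
The D and SIZE columns make it possible to spot batches that have finished but still occupy space in the database. The Python sketch below selects such batches and builds the matching discard commands. It takes rows as dictionaries keyed by the column headers of the batches command, which is a hypothetical in-memory representation rather than something grbcluster produces, and it treats COMPLETED and ABORTED as finished, an assumption based on the statuses shown in this section. Only discard batches whose results you no longer need to download.

```python
def discard_candidates(rows, finished=("COMPLETED", "ABORTED")):
    """Return IDs of batches that are finished but not yet discarded.

    `rows` is a list of dicts keyed by the column headers of
    `grbcluster batches` (ID, STATUS, D, SIZE, ...). This row format
    is an illustrative assumption.
    """
    return [
        row["ID"]
        for row in rows
        if row["STATUS"].upper() in finished and not row["D"].strip()
    ]


def discard_commands(batch_ids):
    """Build one `grbcluster batch discard` argument list per batch."""
    return [["grbcluster", "batch", "discard", bid] for bid in batch_ids]


if __name__ == "__main__":
    # Rows modeled on the listings shown in this section.
    rows = [
        {"ID": "2e05810c", "STATUS": "COMPLETED", "D": "X", "SIZE": "0"},
        {"ID": "ada0a345", "STATUS": "COMPLETED", "D": "", "SIZE": "288960"},
        {"ID": "4aba4ad3", "STATUS": "SUBMITTED", "D": "", "SIZE": "86579"},
    ]
    for command in discard_commands(discard_candidates(rows)):
        print(" ".join(command))
```

With the sample rows, only ada0a345 is selected: 2e05810c is already discarded and 4aba4ad3 has not finished.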