Batch Commands¶
In this section, we will review the most important commands available to
manage batches. We assume that the system administrator has installed
the cluster and that you have successfully executed the grbcluster login
command with the appropriate flags to access your cluster. Batch
management is only available with a Cluster Manager.
Creating Batches¶
Once you are logged in to a Cluster Manager, you can use grbcluster
to create a batch. This will submit a non-interactive job. The typical
process involves the following three steps:
Create the batch and submit the non-interactive job. In this step, we indicate which model file we want to solve and which result file we need. With this information, grbcluster will declare the batch, upload the input model file, and submit the solve request as a batch job. At this point, you can disconnect your client machine from the network (for example, by closing your laptop). The request will be processed automatically.
> grbcluster batch solve glass4.mps ResultFile=solution.sol
info : Batch ada0a345-aa9e-4d6b-a7f0-05caf345d4e2 created
info : Uploading glass4.mps...
info : Batch ada0a345-aa9e-4d6b-a7f0-05caf345d4e2 submitted with job 66d4783b...
Monitoring. While the batch job is being executed, you can monitor the status if you wish. You can reference the batch you submitted using its batch ID.
> grbcluster batch status 2e05810c-911f-47ee-b695-27e1244fefd0 --wait
info : Batch 2e05810c-911f-47ee-b695-27e1244fefd0 status is SUBMITTED
info : Batch 2e05810c-911f-47ee-b695-27e1244fefd0 status is SUBMITTED
info : Batch 2e05810c-911f-47ee-b695-27e1244fefd0 status is SUBMITTED
info : Batch 2e05810c-911f-47ee-b695-27e1244fefd0 status is SUBMITTED
info : Batch 2e05810c-911f-47ee-b695-27e1244fefd0 status is COMPLETED
Download the results. Once a batch is complete, you can download the log file and any optimization result. By default, results are stored in a directory with the same name as the batch ID. You should also discard the batch so that the Cluster Manager can delete the associated data from the database.
> grbcluster batch download 2e05810c-911f-47ee-b695-27e1244fefd0
info : Results will be stored in directory 2e05810c-911f-47ee-b695-27e1244fefd0
info : Downloading solution.sol...
info : Downloading gurobi.log...
info : Discarding batch data
You can actually use grbcluster to perform all three steps in a
single command:
> grbcluster batch solve ResultFile=solution.sol misc07.mps --download
info : Batch 5d0ea600-5068-4a0b-bee0-efa26c18f35b created
info : Uploading misc07.mps...
info : Batch 5d0ea600-5068-4a0b-bee0-efa26c18f35b submitted with job a9700b72...
info : Batch 5d0ea600-5068-4a0b-bee0-efa26c18f35b status is COMPLETED
info : Results will be stored in directory 5d0ea600-5068-4a0b-bee0-efa26c18f35b
info : Downloading solution.sol...
info : Downloading gurobi.log...
info : Discarding batch data
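For scripted use, the three-step workflow described above can be driven from Python. The following is a minimal sketch, not an official client: the helper names (`parse_batch_id`, `final_status`, `run`, `solve_and_download`) are hypothetical, and the regular expressions assume that the log lines look exactly like the transcripts shown in this section. Whether grbcluster writes these lines to standard output or standard error is not stated here, so the sketch captures both.

```python
import re
import subprocess
from typing import List

# Patterns mirror the transcript lines shown above. The exact log format is an
# assumption and may differ between grbcluster versions.
_CREATED = re.compile(r"Batch ([0-9a-f-]{36}) created")
_STATUS = re.compile(r"Batch [0-9a-f-]{36} status is (\w+)")


def parse_batch_id(output: str) -> str:
    """Return the full batch ID announced by `grbcluster batch solve`."""
    match = _CREATED.search(output)
    if match is None:
        raise ValueError("no 'Batch <id> created' line found")
    return match.group(1)


def final_status(output: str) -> str:
    """Return the last status printed by `grbcluster batch status --wait`."""
    statuses = _STATUS.findall(output)
    if not statuses:
        raise ValueError("no 'status is ...' line found")
    return statuses[-1]


def run(args: List[str]) -> str:
    """Run grbcluster with the given arguments and return its combined output."""
    done = subprocess.run(
        ["grbcluster", *args],
        check=True,
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return done.stdout


def solve_and_download(model: str, result_file: str) -> str:
    """Submit a batch, wait for it to finish, download results; return the batch ID."""
    solve_output = run(["batch", "solve", model, f"ResultFile={result_file}"])
    batch_id = parse_batch_id(solve_output)
    status = final_status(run(["batch", "status", batch_id, "--wait"]))
    if status != "COMPLETED":
        raise RuntimeError(f"batch {batch_id} ended with status {status}")
    run(["batch", "download", batch_id])
    return batch_id
```

Calling `solve_and_download("glass4.mps", "solution.sol")` requires a prior `grbcluster login`. Keeping the three steps separate, rather than using `--download`, lets a script store the batch ID after the first step and resume monitoring later from another machine.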
Listing Batches¶
Batches submitted to a Cluster Manager can be listed by using the
batches command. The batches command is actually a shortcut for the
batch list command. For example:
> grbcluster batches
ID       JOB      CREATED STATUS    STIME   USER  PRIO API        D SIZE   INPUT      OUTPUT
2e05810c ce7ab3a4 2019... COMPLETED 2019... jones 0    grbcluster X 0      glass4.mps solution.sol
ada0a345 66d4783b 2019... COMPLETED 2019... jones 0    grbcluster   288960 misc07.mps solution.sol
Note that you can get more information by using the --long flag.
With this flag, the command will display the complete batch ID and the
complete job ID, which are unique, instead of the short IDs. To get an
explanation of the meanings of the different fields, add the
--describe flag. For example:
> grbcluster batches --describe
ID - Unique batch ID, use --long to display full ID
JOB - Unique job ID, use --long to display full ID
CREATED - Batch created time
Status - Batch Status
STIME - Batch status updated time
USER - Client username (not displayed if empty or restricted)
APP - Application name (not displayed if empty or restricted)
PRIO - Batch priority
API - API type - Python, C++, Java, .NET, Matlab, R... (not displayed if empty or restricted)
D - Indicate if batch data was discarded
SIZE - Size of batch
INPUT - List filenames of input files (not displayed if empty or restricted)
OUTPUT - List filenames of output files (not displayed if empty or restricted)
RUNTIME - Batch runtime version, use --long
PID - Client process ID, use --long (not displayed if empty or restricted)
HOST - Client hostname, use --long (not displayed if empty or restricted)
IP - Client IP address, use --long (not displayed if empty or restricted)
APP - Client application name, use --long (not displayed if empty or restricted)
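Because some cells in the listing can be blank (for example, the D column of a batch that has not been discarded), splitting each row on whitespace would shift the remaining values into the wrong columns. The sketch below shows one way to read the table by character position instead. It assumes that every value starts in the same character column as its header, which should be checked against the output of your grbcluster version; `parse_batches` is a hypothetical helper name.

```python
from typing import Dict, List


def parse_batches(text: str) -> List[Dict[str, str]]:
    """Parse `grbcluster batches` output into one dict per batch.

    Assumes each value starts in the same character column as its header,
    so that blank cells (such as an unset D flag) stay in their own column.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    names = header.split()

    # Locate the start offset of every column by finding each header in order.
    starts = []
    pos = 0
    for name in names:
        pos = header.index(name, pos)
        starts.append(pos)
        pos += len(name)

    # A column ends where the next one starts; the last column runs to end of line.
    ends = starts[1:] + [None]
    return [
        {name: row[start:end].strip() for name, start, end in zip(names, starts, ends)}
        for row in rows
    ]
```

Each returned dict is keyed by the header names as printed, so a row can be inspected with, for instance, `row["SIZE"]` or `row["D"]`.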
Aborting Batches¶
Batches submitted to a Cluster Manager can be aborted by using the
batch abort command. For example:
> grbcluster batch abort 9bc34333
The following steps illustrate how you would submit and subsequently
abort a batch. First, use the grbcluster batch solve command to
submit a long-running batch to your Cluster Manager:
> grbcluster batch solve glass4.mps ResultFile=solution.sol
Once the batch is submitted, you can use grbcluster batches to
monitor your batches:
> grbcluster batches
ID       JOB      CREATED Status    STIME   USER  PRIO API        D SIZE  INPUT
4aba4ad3 910878b9 2019... SUBMITTED 2019... jones 0    grbcluster   86579 glass4.mps
The full or short ID can be used to abort the batch as follows:
> grbcluster batch abort 4aba4ad3
After the abort command is issued, the status of the batch can be
retrieved using the batches command:
> grbcluster batches
ID       JOB      CREATED Status    STIME   USER  PRIO API        D SIZE  INPUT
4aba4ad3 910878b9 2019... ABORTED   2019... jones 0    grbcluster   86579 glass4.mps
As you can see, the status of the batch has changed to ABORTED. Note
that the underlying job that was created to execute the batch is
also ABORTED:
JOBID    BATCHID  ADDRESS       STATUS  STIME               USER  OPT API
910878b9 4aba4ad3 serevr1:61001 ABORTED 2019-09-22 15:55:24 jones     grbcluster
Retrying Batches¶
If a batch fails to execute, you can resubmit that batch. This might
happen, for example, if the node where the batch job was running was
shut down or ran out of memory. All of the input files and parameters of
the batch specification are still stored by the Cluster Manager, so
there is no need to upload them again. You can simply issue the
batch retry command:
> grbcluster batch retry edfa28f6-7abc-4af1-80a3-0b7472dcdcf0
info : Batch edfa28f6-7abc-4af1-80a3-0b7472dcdcf0 submitted for retry with job 9c6f1b59...
Note that a new batch job is created to execute the batch, but the batch specification does not change and you can still use the same batch ID to monitor progress. It is not possible to retry a batch that is currently running or whose data has been discarded.
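A script that retries batches automatically may want to check these two restrictions before calling grbcluster. The sketch below encodes them in a small predicate. It assumes that a batch which is still queued or running is reported with status SUBMITTED, as in the listings in this section; that mapping, and the names `can_retry` and `retry_command`, are assumptions made for illustration.

```python
from typing import List

# Assumption: a batch that is still queued or running is reported as SUBMITTED.
RUNNING_STATUSES = {"SUBMITTED"}


def can_retry(status: str, discarded: bool) -> bool:
    """Check the two restrictions stated above for `batch retry`.

    Only those two rules are checked; the Cluster Manager may apply others.
    """
    return not discarded and status.upper() not in RUNNING_STATUSES


def retry_command(batch_id: str) -> List[str]:
    """Build the argument list for retrying a batch, e.g. for subprocess.run."""
    return ["grbcluster", "batch", "retry", batch_id]
```

For example, `can_retry("ABORTED", discarded=False)` passes both checks, while a batch whose D column is flagged does not, whatever its status.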
Discarding Batches¶
A batch has a set of input files and a set of result files that are stored in the Cluster Manager database. This enables the client to submit a batch and disconnect while it is processed, and to download the results later when the client is ready. One consequence is that batches can consume significant space in the database, so it is important to discard batch data when you are done with it. Note that batch metadata is small and will remain in the batch history for monitoring purposes even after you discard the batch.
By default, when using the grbcluster command to download the
results, the batch will be discarded automatically. If you want to be
able to download the results again later, you can change the default
behavior by using the --discard flag:
> grbcluster batch solve misc07.mps ResultFile=solution.sol --download --discard=false
info : Batch 076225d7-a1c9-462f-bfef-8e23c81d9f16 created
info : Uploading misc07.mps...
info : Batch 076225d7-a1c9-462f-bfef-8e23c81d9f16 submitted with job ef0861e9...
info : Batch 076225d7-a1c9-462f-bfef-8e23c81d9f16 status is SUBMITTED
info : Batch 076225d7-a1c9-462f-bfef-8e23c81d9f16 status is COMPLETED
info : Results will be stored in directory 076225d7-a1c9-462f-bfef-8e23c81d9f16
info : Downloading solution.sol...
info : Downloading gurobi.log...
You can check the space used by this batch by looking in the SIZE
column in the output of the batches command:
> grbcluster batches --batchId=076225d7
ID       JOB      CREATED Status    STIME   USER  PRIO API        D SIZE   INPUT      OUTPUT
076225d7 ef0861e9 2019... COMPLETED 2019... jones 0    grbcluster   288960 misc07.mps solution.sol
In order to discard a batch manually, you can use the batch discard
command. You can verify that the size of the batch is 0 afterwards. You
will also notice that the D column is flagged, indicating that the
batch was discarded.
> grbcluster batch discard 076225d7
info : Batch 076225d7-a1c9-462f-bfef-8e23c81d9f16 discarded
> grbcluster batches --batchId=076225d7
ID       JOB      CREATED Status    STIME   USER  PRIO API        D SIZE   INPUT      OUTPUT
076225d7 ef0861e9 2019... COMPLETED 2019... jones 0    grbcluster X 0      misc07.mps solution.sol
Note that the Cluster Manager will automatically discard and delete batches when they are older than the maximum age, as specified in the cluster retention policy. Developers submitting a batch with a programming language API should call the appropriate discard function once results have been retrieved.
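To avoid relying only on the retention policy, a cleanup script can look for completed batches that still hold data and build the corresponding discard commands. The following is a sketch under stated assumptions: `rows` is a hypothetical list of dicts keyed by the column names of the batches table (how it is obtained is left to the caller), a discarded batch is marked with X in the D column as in the listings above, and `discard_commands` is not part of grbcluster.

```python
from typing import Dict, Iterable, List


def discard_commands(rows: Iterable[Dict[str, str]]) -> List[List[str]]:
    """Build one `grbcluster batch discard` argument list per batch that still holds data.

    Only COMPLETED batches are considered. Batches whose D column is flagged
    or whose SIZE is 0 are skipped because there is nothing left to free.
    """
    commands = []
    for row in rows:
        # The listings above print the status header as STATUS or Status,
        # so keys are normalized to upper case before lookup.
        cells = {key.upper(): value.strip() for key, value in row.items()}
        completed = cells.get("STATUS") == "COMPLETED"
        flagged = cells.get("D") == "X"  # X marks discarded data in the listings above
        size = int(cells.get("SIZE") or 0)
        if completed and not flagged and size > 0:
            commands.append(["grbcluster", "batch", "discard", cells["ID"]])
    return commands
```

Each returned list can be passed to `subprocess.run`. Since discarding removes the stored input and result files, such a script should only run after the results have been downloaded.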