PDHG on GPU (alpha release)#
PDHG and linear optimization#
The Primal-Dual Hybrid Gradient (PDHG) algorithm is a first-order method for solving large-scale linear optimization problems (LP). It can be a viable alternative to our Barrier or Simplex algorithms for very large or computationally challenging problems. The algorithm lends itself well to parallelization on modern GPUs.
Note
This feature is part of a preview release and intended for experimentation rather than production use.
Recommended hardware and platform#
Using Gurobi’s PDHG algorithm on the GPU requires NVIDIA hardware. We recommend using an NVIDIA H100 or newer (our internal testing was conducted on a GH200 Grace Hopper Superchip). You must have NVIDIA drivers installed, but a full CUDA installation is not required.
Only the armlinux64 and linux64 platforms are supported. We provide wheels for
Python 3.11 and Python 3.13 (just download your version of choice and run pip
install <wheel file name>
) as well as our standard installer. The standard
installer includes our command line tool gurobi_cl
, which is used in the
examples on this page.
Quickstart#
You can download the GPU alpha release from the download page. Once installed, you should run a simple test. A model that is particularly favorable for PDHG is the root relaxation of rwth-timetable, a MIPLIB model whose root relaxation is challenging to solve with Simplex or Barrier. You can download the LP relaxation model file using this command:
wget -O rwth-timetable-rel.mps.bz2 "https://share.gurobi.com/public.php/dav/files/pk3xYHcGqRqFZm9/?accept=zip"
To solve this model using PDHG on the GPU you should to set the Method parameter to 6, and the GURO_PAR_PDHGGPU parameter to 1:
./gurobi_cl Method=6 GURO_PAR_PDHGGPU=1 rwth-timetable-rel.mps.bz2
From this command, you should see output similar to this:
Gurobi Optimizer version 12.0.4 build v12.0.4a2 (armlinux64 - "Ubuntu 24.04.1 LTS")
Copyright (c) 2025, Gurobi Optimization, LLC
Read MPS format model from file rwth-timetable-rel.mps.bz2
Reading time = 7.12 seconds
rwth-timetable-rel: 440134 rows, 923564 columns, 4510786 nonzeros
Using Gurobi shared library libgurobi.so.12.0.4
CPU model: ARM64
Thread count: 72 physical cores, 72 logical processors, using up to 32 threads
Non-default parameters:
Method 6
GURO_PAR_PDHGGPU 1
Optimize a model with 440134 rows, 923564 columns and 4510786 nonzeros
Model fingerprint: 0xf4eab31e
Coefficient statistics:
Matrix range [1e+00, 2e+00]
Objective range [1e+00, 4e+05]
Bounds range [1e+00, 1e+01]
RHS range [1e+00, 3e+00]
Presolve removed 206440 rows and 549304 columns
Presolve time: 3.05s
Presolved: 233694 rows, 374260 columns, 3675892 nonzeros
Start PDHG on GPU
Objective Residual
Iter Primal Dual Primal Dual Compl Time
0 2.43298400e+07 -1.18242230e+07 6.32e-01 0.00e+00 6.45e+01 4s
9900 7.14911032e+05 7.15056557e+05 1.52e-03 1.92e-03 2.91e-04 5s
47100 7.15088403e+05 7.15136961e+05 7.56e-04 5.31e-03 4.29e-05 10s
84300 7.15261039e+05 7.15230778e+05 2.17e-08 9.48e-03 5.41e-05 15s
121450 7.15261038e+05 7.15241276e+05 7.12e-09 4.91e-03 3.53e-05 20s
158750 7.15261032e+05 7.15251680e+05 4.50e-08 2.75e-03 1.67e-05 25s
196100 7.15260987e+05 7.15259205e+05 8.42e-06 3.18e-04 5.31e-06 30s
197100 7.15259349e+05 7.15258791e+05 1.03e-05 5.69e-04 6.53e-06 30s
PDHG solved model in 197100 iterations and 30.14 seconds (1996.89 work units)
Optimal objective 7.15259349e+05
Crossover log...
58232 DPushes remaining with DInf 2.2131730e-02 30s
267 DPushes remaining with DInf 4.4412808e-03 35s
0 DPushes remaining with DInf 3.6698525e-04 36s
26221 PPushes remaining with PInf 1.5546474e+01 36s
0 PPushes remaining with PInf 6.4717492e-02 39s
Push phase complete: Pinf 6.4717492e-02, Dinf 6.7581825e+02 39s
Iteration Objective Primal Inf. Dual Inf. Time
74917 7.1526099e+05 0.000000e+00 6.758183e+02 39s
75117 7.1526098e+05 0.000000e+00 6.872417e+02 40s
Extra simplex iterations after uncrush: 2
75797 7.1526097e+05 0.000000e+00 0.000000e+00 45s
Solved in 75797 iterations and 44.55 seconds (2015.85 work units)
Optimal objective 7.152609726e+05
After presolve, the PDHG iteration log is displayed. If you don’t see the line
Start PDHG on GPU
, then your GPU was not correctly detected, and the
algorithm will run on the CPU instead. The PDHG iteration log is similar to what
you’re used to from Barrier: it displays current primal and dual objective values,
primal and dual residuals, and complementarity.
As the iterations progress, the gap between the primal and dual objectives reduces, and the residuals and complementarity decrease. Once the termination criteria are met (see Termination below for more details), crossover converts the PDHG solution to a basic solution. Note that only the PDHG algorithm itself runs on the GPU.
It can be interesting to experiment with different termination criteria. For example, you can loosen PDHG termination tolerances like the relative LP duality gap tolerance GURO_PAR_PDHGCONVTOL:
./gurobi_cl Method=6 GURO_PAR_PDHGGPU=1 GURO_PAR_PDHGCONVTOL=1e-4 rwth-timetable-rel.mps.gz
You should see log output similar to this:
Gurobi Optimizer version 12.0.4 build v12.0.4a2 (armlinux64 - "Ubuntu 24.04.1 LTS")
Copyright (c) 2025, Gurobi Optimization, LLC
Read MPS format model from file rwth-timetable-rel.mps.bz2
Reading time = 7.24 seconds
rwth-timetable-rel: 440134 rows, 923564 columns, 4510786 nonzeros
Using Gurobi shared library libgurobi.so.12.0.4
CPU model: ARM64
Thread count: 72 physical cores, 72 logical processors, using up to 32 threads
Non-default parameters:
Method 6
GURO_PAR_PDHGGPU 1
GURO_PAR_PDHGCONVTOL 0.0001
Optimize a model with 440134 rows, 923564 columns and 4510786 nonzeros
Model fingerprint: 0xf4eab31e
Coefficient statistics:
Matrix range [1e+00, 2e+00]
Objective range [1e+00, 4e+05]
Bounds range [1e+00, 1e+01]
RHS range [1e+00, 3e+00]
Presolve removed 206440 rows and 549304 columns
Presolve time: 3.04s
Presolved: 233694 rows, 374260 columns, 3675892 nonzeros
Start PDHG on GPU
Objective Residual
Iter Primal Dual Primal Dual Compl Time
0 2.43298400e+07 -1.18242230e+07 6.32e-01 0.00e+00 6.45e+01 4s
10000 7.14911940e+05 7.15057092e+05 1.50e-03 2.07e-03 2.87e-04 5s
47450 7.15089030e+05 7.15139180e+05 7.56e-04 4.90e-03 4.14e-05 10s
57800 7.15236468e+05 7.15167180e+05 1.63e-04 2.72e-02 1.66e-04 11s
PDHG solved model in 57800 iterations and 11.38 seconds (587.93 work units)
Optimal objective 7.15236468e+05
Crossover log...
62844 DPushes remaining with DInf 8.5181171e-03 11s
506 DPushes remaining with DInf 4.3306250e+02 15s
0 DPushes remaining with DInf 4.5860113e+02 16s
30135 PPushes remaining with PInf 3.8201461e-01 16s
0 PPushes remaining with PInf 0.0000000e+00 19s
Push phase complete: Pinf 0.0000000e+00, Dinf 8.2959619e+02 19s
Iteration Objective Primal Inf. Dual Inf. Time
83704 7.1526101e+05 0.000000e+00 8.295962e+02 19s
83974 7.1526100e+05 0.000000e+00 1.457673e+03 20s
85194 7.1526098e+05 0.000000e+00 6.639458e+01 25s
Extra simplex iterations after uncrush: 3
86171 7.1526097e+05 0.000000e+00 0.000000e+00 30s
Solved in 86171 iterations and 29.93 seconds (617.40 work units)
Optimal objective 7.152609726e+05
Increasing this tolerance results in significantly less work spent in the PDHG algorithm. However, due to the lower quality of the resulting solution, the crossover algorithm needs more time to succeed.
PDHG and termination#
By the nature of first-order methods, PDHG converges only at a linear rate to an optimal solution. In practice, this means that it will seldom achieve highly accurate solutions to absolute tolerances. For this reason, it is customary to terminate the algorithm upon achieving a reasonably small violation relative to the norm of the objective or right-hand-side vector. Consequently, the individual constraint violations may be much larger than for solutions obtained by Gurobi’s other LP algorithms.
To mitigate this accuracy problem, Gurobi’s crossover algorithm is automatically invoked from PDHG’s final solution. Crossover may be able to turn a low accuracy solution from PDHG into a high accuracy one. Crossover is an essential component when you seek highly accurate solutions. If your optimization application doesn’t require accurate solutions, you can experiment with switching it off through the Crossover parameter. You can assess the quality of the solution through the quality attributes.
As seen in the previous example, changing the convergence tolerances can have a profound effect on both PDHG and crossover work, and thus the overall run time. We will now explain the termination tolerances in detail.
Mathematical background#
Consider the following primal LP and its corresponding dual formulation:
Primal#
Dual#
These can be written as the following saddle-point problem:
This problem can be solved using the iterative primal–dual hybrid gradient (PDHG) scheme from Chambolle and Pock (2011) [1]:
Where \(\tau\) and \(\sigma\) are the primal and dual step sizes.
Based on this work, Applegate et al. (2021) [2] developed a practical algorithm to solve large LPs. This includes algorithmic improvements such as restarts and an adaptive step size calculation.
Termination#
PDHG will terminate with GRB_OPTIMAL
status if both of the following
conditions are satisfied by the solution vectors:
The relative difference between the primal and dual objective values is less than the specified tolerance, i.e., \(\left|b^\top y - c^\top x \right| \le \epsilon_{\text{conv}}\left(1 + |c^\top x|\right)\)
The primal and dual solution values meet the specified feasibility tolerance. This is satisfied if either
the absolute feasibility tolerance is met, i.e.
\(\lVert b - A x \rVert_{\infty} < \epsilon_{\text{abs}}\) and
\(\lVert c - A^\top y - \lambda \rVert_{\infty} < \epsilon_{\text{abs}}\);
or, the relative feasibility tolerance is met, i.e.
\(\lVert b - A x \rVert_2 \le \epsilon_{\text{rel}}\left(1 + \lVert b \rVert_2\right)\) and
\(\lVert c - A^\top y - \lambda \rVert_2 \le \epsilon_{\text{rel}}\left(1 + \lVert c\rVert_2\right)\).
The first criterion is controlled by GURO_PAR_PDHGCONVTOL (\(\epsilon_{\text{conv}}\)), the other two by GURO_PAR_PDHGABSTOL (\(\epsilon_{\text{abs}}\)) and GURO_PAR_PDHGRELTOL (\(\epsilon_{\text{rel}}\)), respectively.
Note that due to presolve and a scaling intrinsic to the PDHG algorithm, the solution quality with respect to the original problem may be worse than that suggested by the selected tolerances. Typically, this phenomenon occurs only if crossover was disabled and the solution has low accuracy in the first place.
PDHG Parameters#
This section details the relevant parameters for PDHG.
GURO_PAR_PDHGGPU#
Run PDHG on a GPU (if available)
Type:
int
Default value:
0
Minimum value:
0
Maximum value:
1
try to run PDHG on the GPU
GURO_PAR_PDHGABSTOL#
PDHG absolute feasibility tolerance
Type:
double
Default value:
1e-6
Minimum value:
1e-9
Maximum value:
1e-2
GURO_PAR_PDHGRELTOL#
PDHG relative feasibility tolerance
Type:
double
Default value:
1e-6
Minimum value:
0
Maximum value:
Infinity
Note that setting this parameter to 0
disables the convergence
checks with respect to relative residual norms. PDHG will then only
terminate if the absolute feasibility tolerances are met.
GURO_PAR_PDHGCONVTOL#
PDHG convergence tolerance
Type:
double
Default value:
1e-6
Minimum value:
0
Maximum value:
1
GURO_PAR_PDHGITERLIMIT#
PDHG iteration limit
Type:
double
Default value:
Infinity
Minimum value:
0
Maximum value:
Infinity
Found a bug?#
If you are experiencing strange behavior or think you’ve found a bug please don’t hesitate to open a ticket. We will aim to help as soon as we can!