PDHG on GPU (alpha release)#

PDHG and linear optimization#

The Primal-Dual Hybrid Gradient (PDHG) algorithm is a first-order method for solving large-scale linear optimization problems (LP). It can be a viable alternative to our Barrier or Simplex algorithms for very large or computationally challenging problems. The algorithm lends itself well to parallelization on modern GPUs.

Note

This feature is part of a preview release and intended for experimentation rather than production use.

Recommended hardware and platform#

Using Gurobi’s PDHG algorithm on the GPU requires NVIDIA hardware. We recommend using an NVIDIA H100 or newer (our internal testing was conducted on a GH200 Grace Hopper Superchip). You must have NVIDIA drivers installed, but a full CUDA installation is not required. The required driver version is >= 525, < 580. See the CUDA driver compatibility table for details.

Only the armlinux64 and linux64 platforms are supported. We provide wheels for Python 3.11 and Python 3.13 (just download your version of choice and run pip install <wheel file name>) as well as our standard installer. The standard installer includes our command line tool gurobi_cl, which is used in the examples on this page.

Quickstart#

You can download the GPU alpha release from the download page. Once installed, you should run a simple test. A model that is particularly favorable for PDHG is the root relaxation of rwth-timetable, a MIPLIB model whose root relaxation is challenging to solve with Simplex or Barrier. You can download the LP relaxation model file using this command:

wget -O rwth-timetable-rel.mps.bz2 "https://share.gurobi.com/public.php/dav/files/pk3xYHcGqRqFZm9/?accept=zip"

To solve this model using PDHG on the GPU you should to set the Method parameter to 6, and the GURO_PAR_PDHGGPU parameter to 1:

./gurobi_cl Method=6 GURO_PAR_PDHGGPU=1 rwth-timetable-rel.mps.bz2

From this command, you should see output similar to this:

Gurobi Optimizer version 12.0.4 build v12.0.4a2 (armlinux64 - "Ubuntu 24.04.1 LTS")
Copyright (c) 2025, Gurobi Optimization, LLC

Read MPS format model from file rwth-timetable-rel.mps.bz2
Reading time = 7.12 seconds
rwth-timetable-rel: 440134 rows, 923564 columns, 4510786 nonzeros

Using Gurobi shared library libgurobi.so.12.0.4

CPU model: ARM64
Thread count: 72 physical cores, 72 logical processors, using up to 32 threads

Non-default parameters:
Method  6
GURO_PAR_PDHGGPU  1

Optimize a model with 440134 rows, 923564 columns and 4510786 nonzeros
Model fingerprint: 0xf4eab31e
Coefficient statistics:
  Matrix range     [1e+00, 2e+00]
  Objective range  [1e+00, 4e+05]
  Bounds range     [1e+00, 1e+01]
  RHS range        [1e+00, 3e+00]
Presolve removed 206440 rows and 549304 columns
Presolve time: 3.05s
Presolved: 233694 rows, 374260 columns, 3675892 nonzeros


Start PDHG on GPU

                       Objective                Residual
     Iter       Primal          Dual         Primal    Dual     Compl    Time
        0   2.43298400e+07 -1.18242230e+07  6.32e-01 0.00e+00  6.45e+01    4s
     9900   7.14911032e+05  7.15056557e+05  1.52e-03 1.92e-03  2.91e-04    5s
    47100   7.15088403e+05  7.15136961e+05  7.56e-04 5.31e-03  4.29e-05   10s
    84300   7.15261039e+05  7.15230778e+05  2.17e-08 9.48e-03  5.41e-05   15s
   121450   7.15261038e+05  7.15241276e+05  7.12e-09 4.91e-03  3.53e-05   20s
   158750   7.15261032e+05  7.15251680e+05  4.50e-08 2.75e-03  1.67e-05   25s
   196100   7.15260987e+05  7.15259205e+05  8.42e-06 3.18e-04  5.31e-06   30s
   197100   7.15259349e+05  7.15258791e+05  1.03e-05 5.69e-04  6.53e-06   30s

PDHG solved model in 197100 iterations and 30.14 seconds (1996.89 work units)
Optimal objective 7.15259349e+05

Crossover log...

   58232 DPushes remaining with DInf 2.2131730e-02                30s
     267 DPushes remaining with DInf 4.4412808e-03                35s
       0 DPushes remaining with DInf 3.6698525e-04                36s

   26221 PPushes remaining with PInf 1.5546474e+01                36s
       0 PPushes remaining with PInf 6.4717492e-02                39s

  Push phase complete: Pinf 6.4717492e-02, Dinf 6.7581825e+02     39s

Iteration    Objective       Primal Inf.    Dual Inf.      Time
   74917    7.1526099e+05   0.000000e+00   6.758183e+02     39s
   75117    7.1526098e+05   0.000000e+00   6.872417e+02     40s
Extra simplex iterations after uncrush: 2
   75797    7.1526097e+05   0.000000e+00   0.000000e+00     45s

Solved in 75797 iterations and 44.55 seconds (2015.85 work units)
Optimal objective  7.152609726e+05

After presolve, the PDHG iteration log is displayed. If you don’t see the line Start PDHG on GPU, then your GPU was not correctly detected, and the algorithm will run on the CPU instead. The PDHG iteration log is similar to what you’re used to from Barrier: it displays current primal and dual objective values, primal and dual residuals, and complementarity.

As the iterations progress, the gap between the primal and dual objectives reduces, and the residuals and complementarity decrease. Once the termination criteria are met (see Termination below for more details), crossover converts the PDHG solution to a basic solution. Note that only the PDHG algorithm itself runs on the GPU.

It can be interesting to experiment with different termination criteria. For example, you can loosen PDHG termination tolerances like the relative LP duality gap tolerance GURO_PAR_PDHGCONVTOL:

./gurobi_cl Method=6 GURO_PAR_PDHGGPU=1 GURO_PAR_PDHGCONVTOL=1e-4 rwth-timetable-rel.mps.gz

You should see log output similar to this:

Gurobi Optimizer version 12.0.4 build v12.0.4a2 (armlinux64 - "Ubuntu 24.04.1 LTS")
Copyright (c) 2025, Gurobi Optimization, LLC

Read MPS format model from file rwth-timetable-rel.mps.bz2
Reading time = 7.24 seconds
rwth-timetable-rel: 440134 rows, 923564 columns, 4510786 nonzeros

Using Gurobi shared library libgurobi.so.12.0.4

CPU model: ARM64
Thread count: 72 physical cores, 72 logical processors, using up to 32 threads

Non-default parameters:
Method  6
GURO_PAR_PDHGGPU  1
GURO_PAR_PDHGCONVTOL  0.0001

Optimize a model with 440134 rows, 923564 columns and 4510786 nonzeros
Model fingerprint: 0xf4eab31e
Coefficient statistics:
  Matrix range     [1e+00, 2e+00]
  Objective range  [1e+00, 4e+05]
  Bounds range     [1e+00, 1e+01]
  RHS range        [1e+00, 3e+00]
Presolve removed 206440 rows and 549304 columns
Presolve time: 3.04s
Presolved: 233694 rows, 374260 columns, 3675892 nonzeros


Start PDHG on GPU

                       Objective                Residual
     Iter       Primal          Dual         Primal    Dual     Compl    Time
        0   2.43298400e+07 -1.18242230e+07  6.32e-01 0.00e+00  6.45e+01    4s
    10000   7.14911940e+05  7.15057092e+05  1.50e-03 2.07e-03  2.87e-04    5s
    47450   7.15089030e+05  7.15139180e+05  7.56e-04 4.90e-03  4.14e-05   10s
    57800   7.15236468e+05  7.15167180e+05  1.63e-04 2.72e-02  1.66e-04   11s

PDHG solved model in 57800 iterations and 11.38 seconds (587.93 work units)
Optimal objective 7.15236468e+05

Crossover log...

   62844 DPushes remaining with DInf 8.5181171e-03                11s
     506 DPushes remaining with DInf 4.3306250e+02                15s
       0 DPushes remaining with DInf 4.5860113e+02                16s

   30135 PPushes remaining with PInf 3.8201461e-01                16s
       0 PPushes remaining with PInf 0.0000000e+00                19s

  Push phase complete: Pinf 0.0000000e+00, Dinf 8.2959619e+02     19s

Iteration    Objective       Primal Inf.    Dual Inf.      Time
   83704    7.1526101e+05   0.000000e+00   8.295962e+02     19s
   83974    7.1526100e+05   0.000000e+00   1.457673e+03     20s
   85194    7.1526098e+05   0.000000e+00   6.639458e+01     25s
Extra simplex iterations after uncrush: 3
   86171    7.1526097e+05   0.000000e+00   0.000000e+00     30s

Solved in 86171 iterations and 29.93 seconds (617.40 work units)
Optimal objective  7.152609726e+05

Increasing this tolerance results in significantly less work spent in the PDHG algorithm. However, due to the lower quality of the resulting solution, the crossover algorithm needs more time to succeed.

PDHG and termination#

By the nature of first-order methods, PDHG converges only at a linear rate to an optimal solution. In practice, this means that it will seldom achieve highly accurate solutions to absolute tolerances. For this reason, it is customary to terminate the algorithm upon achieving a reasonably small violation relative to the norm of the objective or right-hand-side vector. Consequently, the individual constraint violations may be much larger than for solutions obtained by Gurobi’s other LP algorithms.

To mitigate this accuracy problem, Gurobi’s crossover algorithm is automatically invoked from PDHG’s final solution. Crossover may be able to turn a low accuracy solution from PDHG into a high accuracy one. Crossover is an essential component when you seek highly accurate solutions. If your optimization application doesn’t require accurate solutions, you can experiment with switching it off through the Crossover parameter. You can assess the quality of the solution through the quality attributes.

As seen in the previous example, changing the convergence tolerances can have a profound effect on both PDHG and crossover work, and thus the overall run time. We will now explain the termination tolerances in detail.

Mathematical background#

Consider the following primal LP and its corresponding dual formulation:

Primal#

\[\begin{split}\begin{align*} \min ~& c^\top x \\ \text{s.t.}~& A x = b \\ & x \geq 0 \end{align*}\end{split}\]

Dual#

\[\begin{split}\begin{align*} \max ~& b^\top y \\ \text{s.t.}~& A^\top y \leq c \end{align*}\end{split}\]

These can be written as the following saddle-point problem:

\[\min_x \max_y L(x, y) = c^\top x - y^\top A x + b^\top y\]

This problem can be solved using the iterative primal–dual hybrid gradient (PDHG) scheme from Chambolle and Pock (2011) [1]:

\[x_{k+1} = \operatorname{proj}_{\mathbb{R}^n_{+}}\!\left(x_k + \tau A^\top y_k - \tau c\right)\]

\[y_{k+1} = y_{k} - \sigma A \bigl(2x_{k+1} - x_{k}\bigr) + \sigma b\]

Where \(\tau\) and \(\sigma\) are the primal and dual step sizes.

Based on this work, Applegate et al. (2021) [2] developed a practical algorithm to solve large LPs. This includes algorithmic improvements such as restarts and an adaptive step size calculation.

Termination#

PDHG will terminate with GRB_OPTIMAL status if both of the following conditions are satisfied by the solution vectors:

The relative difference between the primal and dual objective values is less than the specified tolerance, i.e., \(\left|b^\top y - c^\top x \right| \le \epsilon_{\text{conv}}\left(1 + |c^\top x|\right)\)
The primal and dual solution values meet the specified feasibility tolerance. This is satisfied if either
1. the absolute feasibility tolerance is met, i.e.
  - \(\lVert b - A x \rVert_{\infty} < \epsilon_{\text{abs}}\) and
  - \(\lVert c - A^\top y - \lambda \rVert_{\infty} < \epsilon_{\text{abs}}\);
2. or, the relative feasibility tolerance is met, i.e.
  - \(\lVert b - A x \rVert_2 \le \epsilon_{\text{rel}}\left(1 + \lVert b \rVert_2\right)\) and
  - \(\lVert c - A^\top y - \lambda \rVert_2 \le \epsilon_{\text{rel}}\left(1 + \lVert c\rVert_2\right)\).

The first criterion is controlled by GURO_PAR_PDHGCONVTOL (\(\epsilon_{\text{conv}}\)), the other two by GURO_PAR_PDHGABSTOL (\(\epsilon_{\text{abs}}\)) and GURO_PAR_PDHGRELTOL (\(\epsilon_{\text{rel}}\)), respectively.

Note that due to presolve and a scaling intrinsic to the PDHG algorithm, the solution quality with respect to the original problem may be worse than that suggested by the selected tolerances. Typically, this phenomenon occurs only if crossover was disabled and the solution has low accuracy in the first place.

PDHG Parameters#

This section details the relevant parameters for PDHG.

GURO_PAR_PDHGGPU#

Run PDHG on a GPU (if available)

Type: int
Default value: 0
Minimum value: 0
Maximum value: 1 try to run PDHG on the GPU

GURO_PAR_PDHGABSTOL#

PDHG absolute feasibility tolerance

Type: double
Default value: 1e-6
Minimum value: 1e-9
Maximum value: 1e-2

GURO_PAR_PDHGRELTOL#

PDHG relative feasibility tolerance

Type: double
Default value: 1e-6
Minimum value: 0
Maximum value: Infinity

Note that setting this parameter to 0 disables the convergence checks with respect to relative residual norms. PDHG will then only terminate if the absolute feasibility tolerances are met.

GURO_PAR_PDHGCONVTOL#

PDHG convergence tolerance

Type: double
Default value: 1e-6
Minimum value: 0
Maximum value: 1

GURO_PAR_PDHGITERLIMIT#

PDHG iteration limit

Type: double
Default value: Infinity
Minimum value: 0
Maximum value: Infinity

Found a bug?#

If you are experiencing strange behavior or think you’ve found a bug please don’t hesitate to open a ticket. We will aim to help as soon as we can!

Press `/` or click to search this documentation	Gurobi website	Gurobi community forum
Go to other Gurobi documentation	Gurobi download	Contact support

PDHG on GPU (alpha release)#

PDHG and linear optimization#

Recommended hardware and platform#

Quickstart#

PDHG and termination#

Mathematical background#

Primal#

Dual#

Termination#

PDHG Parameters#

GURO_PAR_PDHGGPU#

GURO_PAR_PDHGABSTOL#

GURO_PAR_PDHGRELTOL#

GURO_PAR_PDHGCONVTOL#

GURO_PAR_PDHGITERLIMIT#

Found a bug?#

References#