Tolerances and User-Scaling#
Gurobi will solve the model as defined by the user. However, when evaluating a candidate solution for feasibility, in order to account for possible round-off errors in the floating-point evaluations, we must allow for some tolerances.
To be more precise, satisfying Optimality Conditions requires us to test at least the following three criteria:
- IntFeasTol
Integrality of solutions, i.e., whether an integer variable \(x\) takes an integer value or not. More precisely, \(x\) will be considered integral if \(\text{abs}(x - \text{floor}(x + 0.5)) \leq \text{IntFeasTol}\).
- FeasibilityTol
Feasibility of primal constraints, i.e., whether \(a \cdot x \leq b\) holds for the primal solution. More precisely, \(a \cdot x \leq b\) will be considered to hold if \(a \cdot x - b \leq \text{FeasibilityTol}\).
- OptimalityTol
Feasibility of dual constraints, i.e., whether \(a \cdot y \leq c\) holds for the dual solution. More precisely, \(a \cdot y \leq c\) will be considered to hold if \(a \cdot y - c \leq \text{OptimalityTol}\).
Note that these tolerances are absolute; they do not depend on the scale of the quantities involved in the computation. This means that when formulating a problem, these tolerances should be taken into account, especially when selecting the units in which variables and constraints will be expressed.
It is very important to note that the usage of these tolerances implicitly defines a gray zone in the search space in which solutions that are very slightly infeasible can still be accepted as feasible. However, the solver will not explicitly search for such solutions.
For this reason, it is actually possible (although highly unlikely for well-posed problems) for a model to be reported as being both feasible and infeasible (in the sense stated above). This can occur if the model is infeasible in exact arithmetic, but there exists a solution that is feasible within the solver tolerances. For instance, consider the constraints \(x \geq 0\) and \(x \leq -10^{-9}\): in exact arithmetic they are contradictory, yet \(x = 0\) violates the second constraint by only \(10^{-9}\), well within the default FeasibilityTol of \(10^{-6}\), and may therefore be accepted as feasible.
Models at the Edge of Infeasibility#
As we saw in the introduction, seemingly contradictory results regarding the feasibility or infeasibility of a model can legitimately occur for models that are at the boundary between feasibility and infeasibility.
A more complicated example is \(x + 10^{8} y = -1,\ x, y \geq 0\). It has two bases, one where \(x\) is basic and one where \(y\) is basic. If \(x\) is basic, we get \(x = -1\), which is clearly infeasible. However, if \(y\) is basic, we get \(y = -10^{-8}\), which is feasible within tolerance. Different algorithms can lead to either of these bases and thus produce apparently contradictory feasibility results.
Presolve reductions can also play a role. A presolve reduction, e.g., fixing a variable to a bound, implicitly forces a tolerance of 0 for that variable. When solving the reduced model, the optimizer therefore no longer has the option to “spread” a slight infeasibility of the model over these variables and produce a solution that is feasible within tolerances. This can lead to seemingly contradictory feasibility results when solving the model with presolve enabled versus disabled.
What can be done to diagnose such cases:
- The first step is to tighten FeasibilityTol to \(10^{-9}\) and try again. In many cases this will lead to a consistent declaration of infeasibility of the model at hand, which tells you that the model is on this boundary of infeasibility.
- Use feasRelax (e.g., Model.feasRelax in Python) to solve your model, again with a tight FeasibilityTol. This boundary case is identified by a non-zero relaxation value (see the sketch after this list).
- Compute the IIS (again with a tight FeasibilityTol) to analyze the infeasibility.
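For illustration, here is a minimal gurobipy sketch that combines these steps. The file name model.lp is a placeholder for your own model, and this is only one way to arrange the calls, not a definitive recipe.

```python
import gurobipy as gp
from gurobipy import GRB

# Minimal diagnostic sketch; "model.lp" is a placeholder for your own model.
m = gp.read("model.lp")
m.Params.FeasibilityTol = 1e-9      # step 1: tighten the feasibility tolerance
m.optimize()

if m.Status == GRB.INFEASIBLE:
    # Step 2: feasibility relaxation that minimizes the total constraint
    # violation. A tiny but non-zero relaxation value indicates a model at
    # the boundary between feasibility and infeasibility.
    relaxed = m.copy()
    relaxed.Params.FeasibilityTol = 1e-9
    relaxed.feasRelaxS(0, False, False, True)  # relaxobjtype, minrelax, vrelax, crelax
    relaxed.optimize()
    print("Minimum total constraint violation:", relaxed.ObjVal)

    # Step 3: compute an IIS (still with the tight tolerance) for analysis.
    m.computeIIS()
    m.write("model.ilp")
```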
Another source of seemingly contradictory results is numerical issues in the model; these are discussed in the following subsections.
Gurobi Tolerances and the Limitations of Double-Precision Arithmetic#
The default values for these primal and dual feasibility tolerances are \(10^{-6}\), and the default for the integrality tolerance is \(10^{-5}\). If you choose the range for your inequalities and variables correctly, you can typically ignore tolerance issues entirely.
To give an example, if your constraint right-hand side is on the order of \(10^3\), then relative numeric errors from computations involving the constraint (if any) are likely to be less than \(10^{-9}\), i.e., less than one in a billion. This is usually far more accurate than the accuracy of input data, or even of what can be measured in practice.
However, if you define a variable \(x\in[-10^{-6},10^{-6}]\), then relative numeric error may be as big as 50% of the variable range.
If, on the other hand, you have a variable \(x\in[-10^{10},10^{10}]\) and you are using the default primal feasibility tolerance, then what you are really asking is for the relative numeric error (if any) to be less than \(10^{-16}\). However, this is beyond what double-precision numbers can resolve. In effect, you are not allowing any round-off error at all when testing feasible solutions for this particular variable. And although this might sound like a good idea, in fact it is not: any round-off in the computations may result in a truly optimal solution being rejected as infeasible.
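As a quick illustration of this limit, the spacing between adjacent double-precision values near \(10^{10}\) can be checked with Python's standard math module:

```python
import math

# The gap between adjacent doubles near 1e10 is about 1.9e-6, i.e., larger
# than the default feasibility tolerance of 1e-6, so that tolerance cannot
# even be resolved at this magnitude.
print(math.ulp(1e10))          # 1.9073486328125e-06
print(1e10 + 1e-7 == 1e10)     # True: a 1e-7 perturbation is lost entirely
```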
Why Scaling and Geometry is Relevant#
This section provides a simple example of how scaling problems can slow down problem solving and, in extreme cases, result in unexpected answers. Consider a linear program of the form

\[(P)\qquad \min\{c^{\top}x \;:\; Ax = b,\ \ell \leq x \leq u\},\]
and let \(D\) be a diagonal matrix where \(D_{ii} > 0,\,\forall i\). In theory, solving \((P)\) should be equivalent to solving the related problem \((P_D)\):

\[(P_D)\qquad \min\{(Dc)^{\top}x' \;:\; ADx' = b,\ D^{-1}\ell \leq x' \leq D^{-1}u\},\]

where \(x = Dx'\).
However, in practice, the two models behave very differently. To demonstrate this, we use a simple script, rescale.py, that randomly rescales the columns of the model; a minimal sketch of such a script is shown below. Let's consider the impact of rescaling on the problem pilotnov.mps.bz2.
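The exact rescale.py script is not reproduced here; the following is a rough gurobipy sketch of what such a column-rescaling script might look like. The -f and -s options mirror the calls shown below, and the actual script may differ in detail.

```python
import argparse
import random

import gurobipy as gp
from gurobipy import GRB

# Hypothetical sketch of a column-rescaling script in the spirit of rescale.py.
parser = argparse.ArgumentParser()
parser.add_argument("-f", "--file", required=True, help="model file to read")
parser.add_argument("-s", "--scale", type=float, default=1e3,
                    help="maximum factor by which a column is scaled up or down")
args = parser.parse_args()

model = gp.read(args.file)

for v in model.getVars():
    # Random diagonal entry D_ii in [1/s, s] for this column.
    d = args.scale ** random.uniform(-1.0, 1.0)
    col = model.getCol(v)
    # Substitute x = d * x': matrix and objective coefficients are multiplied
    # by d, while the bounds are divided by d.
    for i in range(col.size()):
        model.chgCoeff(col.getConstr(i), v, col.getCoeff(i) * d)
    v.Obj *= d
    if v.LB > -GRB.INFINITY:
        v.LB /= d
    if v.UB < GRB.INFINITY:
        v.UB /= d

model.update()
model.optimize()
```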
Solving the original problem gives the following output:
Optimize a model with 975 rows, 2172 columns and 13057 nonzeros
Model fingerprint: 0x658fc661
Coefficient statistics:
Matrix range [2e-06, 6e+06]
Objective range [3e-03, 9e-01]
Bounds range [1e-05, 6e+04]
RHS range [1e-05, 4e+04]
Warning: Model contains large matrix coefficient range
Consider reformulating model or setting NumericFocus parameter
to avoid numerical issues.
Presolve removed 254 rows and 513 columns
Presolve time: 0.00s
Presolved: 721 rows, 1659 columns, 11455 nonzeros
Iteration Objective Primal Inf. Dual Inf. Time
0 -3.2008682e+05 1.311826e+05 0.000000e+00 0s
Extra simplex iterations after uncrush: 1
1022 -4.4972762e+03 0.000000e+00 0.000000e+00 0s
Solved in 1022 iterations and 0.03 seconds (0.05 work units)
Optimal objective -4.497276188e+03
Kappa: 3.9032407e+7
Note the message regarding the matrix coefficient range in the log (which in this case shows a range of [2e-06, 6e+06]).
If we run rescale.py -f pilotnov.mps.bz2 -s 1e3 (randomly rescaling columns up or down by as much as \(10^3\)), we obtain:
Optimize a model with 975 rows, 2172 columns and 13057 nonzeros
Model fingerprint: 0xf23dce03
Coefficient statistics:
Matrix range [3e-09, 9e+09]
Objective range [2e-06, 9e+02]
Bounds range [5e-09, 6e+07]
RHS range [1e-05, 4e+04]
Warning: Model contains large matrix coefficient range
Consider reformulating model or setting NumericFocus parameter
to avoid numerical issues.
Presolve time: 0.00s
Presolved: 852 rows, 1801 columns, 11632 nonzeros
Iteration Objective Primal Inf. Dual Inf. Time
0 -6.1770301e+32 7.412884e+31 6.177030e+02 0s
Extra simplex iterations after uncrush: 2
1476 -4.4972762e+03 0.000000e+00 0.000000e+00 0s
Solved in 1476 iterations and 0.05 seconds (0.10 work units)
Optimal objective -4.497276188e+03
Kappa: 1.953834e+17
This time, the optimization process takes more iterations. In both runs we get a warning:
Extra simplex iterations after uncrush
This indicates that extra simplex iterations were performed on the unpresolved model: one extra iteration in the first run and two in the second.
Also, note the very large value for Kappa; its meaning will be discussed later.
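The basis condition number reported in these logs can also be queried programmatically after an LP solve, via the Kappa and KappaExact solution-quality attributes. A minimal sketch, using the same model file as above:

```python
import gurobipy as gp

# Query the basis condition number after solving the LP from the runs above.
m = gp.read("pilotnov.mps.bz2")
m.optimize()
print("Kappa:      ", m.Kappa)       # estimated condition number of the basis
print("KappaExact: ", m.KappaExact)  # exact value, more expensive to compute
```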
If we run rescale.py -f pilotnov.mps.bz2 -s 1e7, we obtain:
Optimize a model with 975 rows, 2172 columns and 13057 nonzeros
Model fingerprint: 0xe7790ac5
Coefficient statistics:
Matrix range [4e-13, 9e+13]
Objective range [4e-09, 2e+07]
Bounds range [5e-13, 3e+11]
RHS range [1e-05, 4e+04]
Warning: Model contains large matrix coefficient range
Warning: Model contains large bounds
Consider reformulating model or setting NumericFocus parameter
to avoid numerical issues.
Presolve time: 0.00s
Presolved: 849 rows, 1801 columns, 11626 nonzeros
Iteration Objective Primal Inf. Dual Inf. Time
0 -7.8700728e+35 7.269558e+31 7.870073e+05 0s
Warning: 1 variables dropped from basis
Warning: switch to quad precision
Warning: 1 variables dropped from basis
Extra simplex iterations after uncrush: 12023
16200 -4.4972762e+03 0.000000e+00 0.000000e+00 1s
Solved in 16200 iterations and 1.30 seconds (1.40 work units)
Optimal objective -4.497276188e+03
Warning: unscaled primal violation = 839220 and residual = 0.459669
Kappa: 3.621029e+14
Now we get a much larger number of extra simplex iterations, additional warnings, and more troublingly, we get a warning about the quality of the resulting solution:
Warning: unscaled primal violation = 839220 and residual = 0.459669
This message indicates that the solver had trouble finding a solution that satisfies the default tolerances.
Finally, if we run rescale.py -f pilotnov.mps.bz2 -s 1e8, we obtain:
Optimize a model with 975 rows, 2172 columns and 13054 nonzeros
Model fingerprint: 0xe0cb48da
Coefficient statistics:
Matrix range [3e-13, 1e+15]
Objective range [3e-11, 1e+08]
Bounds range [5e-14, 9e+12]
RHS range [1e-05, 4e+04]
Warning: Model contains large matrix coefficient range
Warning: Model contains large bounds
Consider reformulating model or setting NumericFocus parameter
to avoid numerical issues.
Presolve time: 0.00s
Presolved: 836 rows, 1787 columns, 11441 nonzeros
Iteration Objective Primal Inf. Dual Inf. Time
0 -6.5442915e+36 7.208386e+31 6.544291e+06 0s
Solved in 488 iterations and 0.01 seconds (0.01 work units)
Infeasible or unbounded model
In this case, the optimization run terminates almost instantly, but with the unexpected Infeasible or unbounded model result.
As you can see, as we performed larger and larger rescalings, we continued to obtain the same optimal value, but there were clear signs that the solver struggled: warning messages, as well as increasing iteration counts, runtimes, and Kappa values. However, once we pass a certain rescaling magnitude, the solver is no longer able to solve the model and instead reports that it is Infeasible or unbounded.
Note that this is not a bug in Gurobi. It has to do with the way the precision of floating-point numbers changes with their magnitude, the use of fixed (absolute) tolerances, and the changing geometry of the problem due to scaling. We will discuss this topic further in Instability and the geometry of optimization problems.
Recommended Ranges for Variables and Constraints#
Keeping the lessons of the previous section in mind, we recommend that right-hand sides of inequalities representing physical quantities (even budgets) be scaled so that they are on the order of \(10^4\) or less. The same applies to variable domains, as variable bounds are again linear constraints.
In the case of objective functions, we recommend that good solutions should have an optimal value that is less than \(10^4\), and ideally also above one (unless the objective coefficients are all zero). This is because the OptimalityTol is used to determine whether reduced costs are close enough to zero. If the objective coefficients are too large, we again face difficulties in determining whether an LP solution truly satisfies the optimality conditions or not. If, on the other hand, the coefficients are too small, it may become too easy to satisfy the dual feasibility conditions.
The coefficients of the constraint matrix are actually more important than the right-hand side values, variable bounds, and objective coefficients mentioned here. We’ll discuss those shortly.
Improving Ranges for Variables and Constraints#
There are three common ways to improve ranges for objectives, constraints and variables:
Use problem-specific information to tighten bounds:
Although presolve, and in particular bound strengthening, is quite good at deriving implied variable bounds, it may not have access to all of the information known to the modeler. Incorporating tighter bounds directly into the model can not only improve the numerical behavior, but it can also speed up the optimization process.
Choose the right units to express your variables and constraints:
When defining your variables and constraints, it is important to choose units that are consistent with tolerances. To give an example, a constraint with a \(10^{10}\) right-hand side value is not going to work well with the default \(10^{-6}\) feasibility tolerance. By changing the units (e.g., replacing pounds with tons, or dollars with millions of dollars, or …), it is often possible to significantly improve the numerics of the problems.
Disaggregate multiple objectives:
A common source of very large ranges of objective coefficients is the practice of modeling hierarchical objectives as an aggregation of objective functions with large multipliers. For example, if the user wants to optimize a problem \(P\) with objective function \(f_1(x)\) and then, subject to \(f_1(x)\) being optimal, optimize \(f_2(x)\), a common trick is to use the surrogate objective \(\bar{f}(x) = M f_1(x) + f_2(x)\), where \(M\) is a large constant. When you combine a large \(M\) with a relatively tight dual feasibility tolerance, it becomes much harder for the solver to find solutions that achieve dual feasibility. We recommend that you either choose \(M\) as small as possible or reformulate your model using a hierarchical objective (which is made easier by our multi-objective optimization features; see the sketch after this list).
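As an illustration of the last option, here is a small gurobipy sketch of a hierarchical objective using the multi-objective API; the variables, coefficients, and constraint are placeholders chosen only for the example.

```python
import gurobipy as gp
from gurobipy import GRB

# Hierarchical (lexicographic) objectives without a big multiplier M.
m = gp.Model("hierarchical")
x = m.addVars(3, lb=0.0, ub=10.0, name="x")
m.addConstr(x[0] + x[1] + x[2] <= 15.0, name="budget")

# f1 is optimized first (higher priority); f2 only breaks ties among
# solutions that keep f1 at its optimal value.
m.setObjectiveN(x[0] + 2 * x[1], index=0, priority=2, name="f1")
m.setObjectiveN(x[2] - x[1], index=1, priority=1, name="f2")
m.ModelSense = GRB.MAXIMIZE
m.optimize()
```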
These techniques are usually sufficient to eliminate the problems that arise from bad scaling.
Advanced User-Scaling#
In the previous sections, we presented some simple strategies to limit the ranges of variable bounds, constraint right-hand sides, objective values, and constraint matrix coefficients. However, it could happen that by scaling constraints or variables, some constraint coefficients become too small. Note that Gurobi will treat any constraint coefficient with absolute value under \(10^{-13}\) as zero. Consider, for example, the constraints

\[10^{-7} x + 10^{-2} y \geq 1,\qquad x + 10^{4} y \geq 10^{4}.\]
In this example, the matrix coefficients range in \([10^{-7},10^4]\). If we multiply all \(x\) coefficients by \(10^5\) and divide all coefficients in the second constraint by \(10^3\), we obtain

\[10^{-2} x' + 10^{-2} y \geq 1,\qquad 10^{2} x' + 10 y \geq 10,\]
where \(x=10^5x'\). The resulting matrix coefficients have a range in \([10^{-2},10^2]\). Essentially the trick is to simultaneously scale a column and a row to achieve a smaller range in the coefficient matrix.
We recommend that you scale the matrix coefficients so that their range is contained in six orders of magnitude or less, and hopefully within \([10^{-3},10^{6}]\).
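To check whether a model meets these recommendations, the coefficient statistics printed at the top of the log can also be queried through model attributes. The following sketch assumes the model-statistics attributes (MinCoeff, MaxCoeff, and friends) available in recent Gurobi versions, with model.lp as a placeholder file name.

```python
import gurobipy as gp

# Report the same coefficient ranges that appear in the solver log.
m = gp.read("model.lp")
print("Matrix range:    [%g, %g]" % (m.MinCoeff, m.MaxCoeff))
print("Objective range: [%g, %g]" % (m.MinObjCoeff, m.MaxObjCoeff))
print("Bounds range:    [%g, %g]" % (m.MinBound, m.MaxBound))
print("RHS range:       [%g, %g]" % (m.MinRHS, m.MaxRHS))
# Aim for a matrix range spanning at most about six orders of magnitude,
# ideally contained in [1e-3, 1e6].
```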
Avoid Hiding Large Coefficients#
As we said before, a typical recommendation for improving numerics is to limit the range of constraint matrix coefficients. The rationale behind this guideline is that terms to be added in a linear expression should be of comparable magnitudes so that rounding errors are minimized. For example, the constraint

\[x - 10^{6} y \geq 0,\qquad y \in [0, 10]\]
is usually considered a potential source of numerical instabilities due to the wide range of the coefficients in the constraint. However, it is easy to implement a simple (but useless) alternative:

\[x - 10^{3} y_1 \geq 0,\qquad y_1 = 10^{3} y,\qquad y \in [0, 10].\]
This form certainly has nicer values in the matrix. However, the solution \(y=-10^{-6}\), \(y_1=-10^{-3}\), \(x=-1\) might still be considered feasible, as the bounds on variables and constraints might be violated within the tolerances. A better alternative is to reformulate the original constraint as

\[x - 10^{3} y' \geq 0,\qquad y' \in [0, 10^{4}],\]
where \(10^{-3} y' = y\). In this setting, the most negative values for \(x\) which might be considered feasible would be \(-10^{-3}\), and for the original \(y\) variable it would be \(-10^{-9}\), which is a clear improvement over the original situation.
Dealing with Big-M Constraints#
Big-M constraints are a regular source of instability for optimization problems. They are so named because they typically involve a large coefficient \(M\) that is chosen to be larger than any reasonable value that a continuous variable or expression may take. Here is a simple example:

\[x \leq 10^{6}\, y,\qquad x \geq 0,\qquad y \in \{0, 1\}.\]
Big-M constraints are typically used to propagate the implications of a binary, on-off decision to a continuous variable. For example, a big-M might be used to enforce the condition that an edge can only admit flow if you pay the fixed charge associated with opening the edge, or a facility can only produce products if you build it. In our example, note that the value \(y = 0.0000099999\) satisfies the default integrality tolerance (IntFeasTol=\(10^{-5}\)), which allows \(x\) to take a value of \(9.999\). In other words, \(x\) can take a positive value without incurring an expensive fixed charge on \(y\), which subverts the intent of only allowing a non-zero value for \(x\) when the binary variable \(y\) has the value 1. You can reduce the effect of this behavior by adjusting the IntFeasTol parameter, but you can't avoid it entirely.
However, if the modeler has additional information that the \(x\) variable will never be larger than \(10^3\), then the earlier constraint could be reformulated as

\[x \leq 10^{3}\, y.\]
And now, \(y = 0.0000099999\) would only allow for \(x \leq 0.01\).
For cases where it is not possible to either rescale variable \(x\) or tighten its bounds, an SOS constraint or an indicator constraint (of the form \(y = 0 \Rightarrow x = 0\)) may produce more accurate solutions, but often at the expense of additional processing time; see the sketch below.
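For illustration, the following gurobipy sketch shows the two modeling options side by side. The bound of \(10^3\) on \(x\) is an assumed, problem-specific bound, and in practice you would keep only one of the two formulations.

```python
import gurobipy as gp
from gurobipy import GRB

# Two ways to model "x may be positive only if y = 1".
m = gp.Model("on_off")
x = m.addVar(lb=0.0, ub=1e3, name="x")   # assumed problem-specific bound on x
y = m.addVar(vtype=GRB.BINARY, name="y")

# Option 1: big-M constraint with M no larger than the known bound on x.
m.addConstr(x <= 1e3 * y, name="bigM")

# Option 2 (use instead of the big-M row): an indicator constraint
# y = 0  =>  x = 0, which avoids the large coefficient entirely.
# m.addConstr((y == 0) >> (x == 0), name="indicator")

m.setObjective(x, GRB.MAXIMIZE)
m.optimize()
```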