Instability and the Geometry of Optimization Problems#
As we have seen, whenever we solve a problem numerically, we have to
accept that the input we provide and the output we obtain may differ
from the theoretical or mathematical solution to the given problem.
For example,
This is the idea behind the notion of the Condition Number for a given problem. While it is true that for most practical optimization problems small perturbations in the input only induce small perturbations in the final answer, there are some special situations where this is not the case. These ill-behaved problems are called Ill Conditioned or Numerically Unstable.
This section aims to show, in the context of linear optimization problems, the most common sources of this behavior and how to avoid it altogether. We first review the problem of solving linear systems with unique solutions, then move to the more central issue of linear optimization problems and their geometric interpretation, and then describe some of the most common bad cases. We then provide two thought experiments with interactive material to help illustrate the concepts of this section. We conclude with some further thoughts on this topic.
Note that although the notion of the Condition Number has received a lot of attention from the academic community, reviewing this literature is beyond the scope of this document. If you want to start looking into this topic, a good entry point can be the Condition Number page at Wikipedia.
The Case of Linear Systems#
Solving linear systems is a very common sub-routine in any MI(QC)P-solver, as we have to solve many linear systems during the full execution of the algorithm.
So, consider that we have a linear system Ax = b with a unique solution (i.e., A is a square matrix of full rank), and suppose we perturb the right-hand side from b to b + ε. The worst-case ratio between the relative change in the solution and the relative change in the right-hand side is given by

κ(A) = ||A|| · ||A^-1||.

Note that the above definition is independent of the magnitudes of b and ε; it depends only on the matrix A itself.
This quantity is known as the condition number of the matrix A.
A common interpretation of κ(A) = 10^k is that, when solving the system Ax = b, you may lose up to k digits of accuracy in the computed solution.
The condition number for the optimal simplex basis in an LP is captured
in the KappaExact attribute. A very large value of this attribute indicates that the optimal basis matrix is close to singular, and hence that small changes in the input data may induce large changes in the computed solution.
When this is indeed the case, the best advice is to scale the constraint
matrix coefficients so that the resulting range of coefficients is
small. This transformation will typically reduce the condition number (the κ value) of the basis matrices encountered during the solve.
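To make this concrete, here is a small self-contained sketch (using NumPy rather than Gurobi, with an arbitrarily chosen nearly singular matrix) that computes κ(A) and shows how a tiny perturbation of the right-hand side is amplified in the solution:

import numpy as np

# An ill-conditioned 2x2 matrix: its two rows are nearly identical.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-8]])
b = np.array([2.0, 2.0])

# kappa(A) = ||A|| * ||A^-1||, computed here in the 2-norm.
print("condition number:", np.linalg.cond(A))

# Solve the original system and a slightly perturbed one.
x = np.linalg.solve(A, b)
eps = np.array([0.0, 1e-8])              # tiny right-hand-side perturbation
x_pert = np.linalg.solve(A, b + eps)

# The relative change in the solution can be up to kappa(A) times
# larger than the relative change in the input.
print("relative input change: ", np.linalg.norm(eps) / np.linalg.norm(b))
print("relative output change:", np.linalg.norm(x_pert - x) / np.linalg.norm(x))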
The Geometry of Linear Optimization Problems#
Before showing optimization models that exhibit bad behavior, we first need to understand the geometry behind them. Consider a problem of the form max{c'x : Ax ≤ b}.
For example:
Note that if we denote
(Figure: the feasible region, the direction of improvement, and the optimal solution.)
Note that whenever we move in the direction of the objective vector c, the value of the objective function improves.
To understand how changes in the input data affect the feasible region
and the optimal solution, consider a small modification:
Note that although we changed the right-hand side, this change had no effect on the optimal solution to the problem, but it did change the feasible region by enlarging the bottom part of the feasible area.
Changing the objective vector tilts the corresponding vector in the
graphical representation. This of course also changes the optimal
objective value. Perturbing a coefficient of a constraint tilts the graphical
representation of that constraint, while changing its right-hand side value shifts the constraint parallel to itself.
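To experiment with these effects yourself, the following is a minimal gurobipy sketch; the two-variable model and all coefficients are invented for illustration and are not the example discussed above. It solves a small LP, then changes one right-hand side (a parallel shift of that constraint) and one objective coefficient (a tilt of the objective vector), re-solving after each change:

import gurobipy as gp
from gurobipy import GRB

m = gp.Model("geometry")
m.Params.OutputFlag = 0

x = m.addVar(name="x")
y = m.addVar(name="y")
c1 = m.addConstr(x + 2 * y <= 4, name="c1")
c2 = m.addConstr(3 * x + y <= 6, name="c2")
m.setObjective(x + y, GRB.MAXIMIZE)

m.optimize()
print(f"original      : x={x.X:.4f} y={y.X:.4f} obj={m.ObjVal:.4f}")

# Shift constraint c1 parallel to itself by changing its right-hand side.
c1.RHS = 4.1
m.optimize()
print(f"perturbed RHS : x={x.X:.4f} y={y.X:.4f} obj={m.ObjVal:.4f}")

# Tilt the objective vector slightly; the optimal vertex may or may not change.
x.Obj = 1.05
m.optimize()
print(f"perturbed obj : x={x.X:.4f} y={y.X:.4f} obj={m.ObjVal:.4f}")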
Multiple Optimal Solutions#
A common misconception among beginners in optimization is the idea that optimization problems really have just one optimal solution. Surprisingly, this is typically not true. For many practical problems, the objective (whether it is cost or revenue or …) is dominated by a handful of variables, while most variables are just there to ensure that the actual operation of the solution is possible. Consider a staffing problem, for example, where cost is typically driven by the number of people who work on a given day, not by the specific people.
These kinds of situations naturally lead to problems similar to
Graphically this can be depicted as
In this situation, it is clear that the objective function is parallel to one of the faces of the feasible region, and that every point on that face is an optimal solution to the problem.
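To see multiple optima in action, here is a small gurobipy sketch (the model is an invented toy, not the staffing example above). The objective is exactly parallel to one constraint, so every point on the corresponding face is optimal; pinning x to different values and re-optimizing returns different solutions with exactly the same objective value:

import gurobipy as gp
from gurobipy import GRB

m = gp.Model("multiple_optima")
m.Params.OutputFlag = 0

x = m.addVar(ub=1, name="x")
y = m.addVar(ub=1, name="y")
# The objective x + y is parallel to this constraint, so the whole
# edge x + y = 1.5 of the feasible region is optimal.
m.addConstr(x + y <= 1.5, name="cap")
m.setObjective(x + y, GRB.MAXIMIZE)

for fixed in (0.5, 0.7, 1.0):
    x.LB = fixed                     # pin x to a point on the optimal edge
    x.UB = fixed
    m.optimize()
    print(f"x={x.X:.2f} y={y.X:.2f} obj={m.ObjVal:.4f}")
    x.LB = 0.0                       # restore the original bounds
    x.UB = 1.0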
Dealing with Epsilon-Optimal Solutions#
The previous section considered the case of multiple (true) optimal
solutions. What happens when we have several ε-optimal solutions, i.e. several solutions whose objective values differ by less than the tolerances the solver works with?
Graphically this can be depicted as
If
The above statement is true whenever the distance between
Note that both situations share one ingredient: the objective function
is (almost) parallel to one of the sides of the feasible region. In the
first case, this side is relatively short, so jumping from one (nearly) optimal vertex to another has little practical impact. In the second case, the nearly parallel side is very long, so two solutions that are equally good within the tolerances can be very far apart from each other.
If you take out either of these two ingredients, namely the objective vector being almost parallel to a constraint, or the edge induced by this nearly-parallel constraint being very long, then this problem cannot arise. For the reasons discussed at the beginning of this section, it is common for the objective function to be close to parallel to one or more constraints. Thus, the best way to avoid this situation is to avoid the second condition. The simplest way to do this is to ensure that the ranges for your variables are not too large. Please refer to the Tolerances and User-Scaling section for guidance on this.
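The sketch below (again with invented numbers) builds exactly this situation: the objective is almost parallel to a very long edge of the feasible region. The two end points of that edge are about 1.4e6 apart in the solution space, yet their objective values differ by only about 1e-3, so within typical solver tolerances either vertex may be reported as optimal:

import gurobipy as gp
from gurobipy import GRB

BIG = 1e6     # length scale of the long edge (invented value)
TILT = 1e-9   # how far the objective is from being parallel to that edge

m = gp.Model("eps_optimal")
m.Params.OutputFlag = 0

x = m.addVar(ub=BIG, name="x")
y = m.addVar(ub=BIG, name="y")
# The edge x + y = BIG runs between the vertices (BIG, 0) and (0, BIG).
m.addConstr(x + y <= BIG, name="edge")
# Objective almost parallel to that edge: (0, BIG) is the true optimum,
# but it is better than (BIG, 0) by only TILT * BIG = 1e-3.
m.setObjective(x + (1 + TILT) * y, GRB.MAXIMIZE)

m.optimize()
print(f"reported solution: x={x.X:.1f} y={y.X:.1f} obj={m.ObjVal:.6f}")
print(f"objective at (BIG, 0): {BIG:.6f}")
print(f"objective at (0, BIG): {(1 + TILT) * BIG:.6f}")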
Thin Feasible Regions#
We now consider another extreme situation that can lead to unexpected results. Consider the problem defined as
and its graphical representation
It is clear from the graphical representation that the optimal solution
for the problem will be at the intersection of the two nearly parallel constraints that bound the thin feasible region.
If we perturb the right-hand side vector
This quantity can be either small or very large, depending on the
relative magnitude between
A similar issue arises if we perturb
What is driving this bad behavior? The problem is that the optimal point
is defined by two constraints that are nearly parallel. The smaller the angle between these two constraints, the more sensitive their intersection point is to small perturbations in the data.
Note, however, that if the original problem had an additional bound on the variables, the perturbed solution could not move arbitrarily far from the original one, and the impact of the perturbation would be limited.
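The following sketch (with made-up coefficients) creates exactly this geometry: two nearly parallel constraints whose intersection is the optimum. A right-hand-side perturbation of 1e-6 moves the optimal point by roughly half a unit, i.e. about half a million times the size of the perturbation:

import gurobipy as gp
from gurobipy import GRB

EPS = 1e-6    # controls how close to parallel the two constraints are

m = gp.Model("thin_region")
m.Params.OutputFlag = 0

x = m.addVar(lb=-GRB.INFINITY, name="x")
y = m.addVar(lb=-GRB.INFINITY, name="y")
# Two almost parallel constraints; their intersection defines the optimum.
c1 = m.addConstr(y - EPS * x <= 1, name="c1")
c2 = m.addConstr(y + EPS * x <= 1, name="c2")
m.setObjective(y, GRB.MAXIMIZE)

m.optimize()
print(f"original : x={x.X:.4f} y={y.X:.6f}")

# Perturb one right-hand side by 1e-6 and re-solve.
c1.RHS = 1 + 1e-6
m.optimize()
print(f"perturbed: x={x.X:.4f} y={y.X:.6f}")
# The optimal x moves from 0 to about -0.5: a change of 1e-6 in the data
# produces a change of about 0.5 in the solution.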
Optimizing Over the Circle#
Now we provide our first thought experiment: Consider the problem of optimizing a linear function over a feasible region defined by a very large number of linear constraints, each tangent to the unit circle; i.e. the feasible region is essentially a unit circle in R², described by over a million tangent constraints.
To perform the experiment, we execute the code circleOpt.py, where we randomly select an objective vector, find the optimal solution to the resulting optimization problem, and compute several relevant quantities:
The worst distance between the reported primal solution, and the theoretical solution to the problem of actually optimizing over a perfect circle, over all previous runs.
The worst bound violation reported by Gurobi over all previous runs.
The worst constraint violation reported by Gurobi over all previous runs.
The worst dual violation reported by Gurobi over all previous runs.
The number of previous experiments.
Accumulated number of simplex iterations.
The κ (KappaExact attribute) value for the current optimal basis.
Sample output is shown below:
Added 2 Vars and 1048576 constraints in 19.19 seconds
Errors: 8.65535e-08 0 2.94137e-07 2.77556e-17 Iter 0 10 Kappa 3150.06
Errors: 4.81978e-07 0 3.22359e-07 2.77556e-17 Iter 1 21 Kappa 3009.12
Errors: 4.81978e-07 0 3.4936e-07 1.11022e-16 Iter 2 33 Kappa 2890.58
Errors: 1.53201e-06 0 9.78818e-07 1.11022e-16 Iter 6 79 Kappa 1727.89
Errors: 1.61065e-06 0 8.26005e-07 1.11022e-16 Iter 46 536 Kappa 1880.73
Errors: 1.61065e-06 0 8.84782e-07 1.11022e-16 Iter 52 602 Kappa 1817.27
Errors: 1.61065e-06 0 9.4557e-07 1.11022e-16 Iter 54 625 Kappa 1757.96
Errors: 1.69167e-06 0 9.78818e-07 1.11022e-16 Iter 64 742 Kappa 1727.89
Errors: 1.69167e-06 0 3.8268e-07 1.66533e-16 Iter 88 1022 Kappa 2761.99
Errors: 1.69167e-06 0 9.04817e-07 1.66533e-16 Iter 92 1067 Kappa 1797.06
Errors: 1.69167e-06 0 2.94137e-07 2.22045e-16 Iter 94 1089 Kappa 3150.06
Errors: 1.69167e-06 0 3.29612e-07 2.22045e-16 Iter 95 1101 Kappa 2975.84
Errors: 1.69167e-06 0 3.4936e-07 2.22045e-16 Iter 98 1137 Kappa 2890.58
Errors: 1.69167e-06 0 9.25086e-07 2.22045e-16 Iter 99 1147 Kappa 1777.3
Errors: 1.69167e-06 0 9.78818e-07 2.22045e-16 Iter 107 1237 Kappa 1727.89
Errors: 1.69167e-06 0 9.99895e-07 2.22045e-16 Iter 112 1293 Kappa 1709.61
Errors: 1.84851e-06 0 9.78818e-07 2.22045e-16 Iter 132 1523 Kappa 1727.89
Errors: 1.96603e-06 0 9.99895e-07 2.22045e-16 Iter 134 1545 Kappa 1709.61
Surprisingly, the reported errors are rather small. Why is this? There
are at least two contributing factors: the model has a bounded feasible
region (in this case the range of both variables is [-1, 1]), and the vertices of the polygonal feasible region are extremely close to one another, so even if the solver stops at a vertex adjacent to the true one, the resulting error remains tiny.
We encourage you to play with this code, perturb some of the input data, and analyze the results. You will see that the discrepancies between the theoretical and the numerical optimal solutions are comparable to the sizes of the perturbations.
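If you do not have circleOpt.py at hand, the following is a rough, simplified sketch in the same spirit (it is not the actual circleOpt.py: it uses far fewer tangent constraints, a fixed random seed, and reports only a subset of the statistics listed above):

import math
import random
import gurobipy as gp
from gurobipy import GRB

N = 2 ** 14          # number of tangent constraints (circleOpt.py uses 2**20)
random.seed(0)

m = gp.Model("circle")
m.Params.OutputFlag = 0
x = m.addVar(lb=-1, ub=1, name="x")
y = m.addVar(lb=-1, ub=1, name="y")

# Tangent lines to the unit circle: cos(t)*x + sin(t)*y <= 1.
for i in range(N):
    t = 2 * math.pi * i / N
    m.addConstr(math.cos(t) * x + math.sin(t) * y <= 1)

for k in range(10):
    # Random objective direction.
    cx, cy = random.uniform(-1, 1), random.uniform(-1, 1)
    norm = math.hypot(cx, cy)
    m.setObjective(cx * x + cy * y, GRB.MAXIMIZE)
    m.optimize()

    # The theoretical optimum over the exact circle is the unit vector (cx, cy)/norm.
    diff = math.hypot(x.X - cx / norm, y.X - cy / norm)
    print(f"run {k}: dist to circle optimum {diff:.3e} "
          f"ConstrVio {m.ConstrVio:.3e} Kappa {m.KappaExact:.1f}")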
Optimizing Over Thin Regions#
Now we move to our second thought experiment: Consider a feasible region
consisting of a triangle in R² with a very wide base and a very small height.
Consider the case where the ratio of the base to the height is on the
order of
In theory, the optimal solution should be the apex of the triangle, but
assume that we randomly perturb both the right-hand side and the
objective function with terms in the order of
To perform the experiment, we execute the code thinOpt.py, where we perform a series of re-optimizations with different random perturbations as described above. To be more precise, whenever the newly computed solution is farther from the true mathematical solution than in any previous trial, we print:
The new maximum distance among solutions.
The current iteration.
The κ (KappaExact attribute) value for the current optimal basis.
The bound violation as reported by Gurobi for the current solution.
The constraint violation as reported by Gurobi for the current solution.
The dual violation as reported by Gurobi for the current solution.
Sample output is shown below:
New maxdiff 4e+16 Iter 0 Kappa 3.31072 Violations: 0 0 0
New maxdiff 4e+16 Iter 1 Kappa 3.31072 Violations: 0 0 0
New maxdiff 4e+16 Iter 2 Kappa 3.31072 Violations: 0 0 0
New maxdiff 4e+16 Iter 7 Kappa 3.31072 Violations: 0 0 0
New maxdiff 4e+16 Iter 83 Kappa 3.31072 Violations: 0 0 2.64698e-23
New maxdiff 4e+16 Iter 194 Kappa 3.31072 Violations: 0 0 0
New maxdiff 4e+16 Iter 1073 Kappa 3.31072 Violations: 0 1.13687e-13 0
New maxdiff 4e+16 Iter 4981 Kappa 3.31072 Violations: 0 0 0
New maxdiff 4e+16 Iter 19514 Kappa 3.31072 Violations: 0 0 0
New maxdiff 4e+16 Iter 47117 Kappa 3.31072 Violations: 0 0 0
New maxdiff 4e+16 Iter 429955 Kappa 3.31072 Violations: 0 0 0
New maxdiff 4e+16 Iter 852480 Kappa 3.31072 Violations: 0 0 0
Results look very different from what we saw in our first test. The
distance between the solution to the unperturbed model and the solution
to the perturbed one is huge, even from the very first iteration. Also,
the κ (KappaExact) values are small and the reported bound, constraint, and dual violations are essentially zero: each perturbed problem is solved accurately, yet its optimal solution is extremely far from the solution of the unperturbed problem.
Again, we encourage you to play with this example. For example, what
would happen if the nominal objective function is
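If you want to reproduce the flavor of this experiment without the original thinOpt.py script, here is a rough sketch in the same spirit (it is not the actual thinOpt.py; the triangle dimensions, the perturbation size, and the number of trials are invented for illustration):

import random
import gurobipy as gp
from gurobipy import GRB

BASE = 1e6    # half-width of the triangle's base (invented value)
PERT = 1e-6   # size of the random right-hand-side perturbations
random.seed(0)

m = gp.Model("thin_triangle")
m.Params.OutputFlag = 0
x = m.addVar(lb=-GRB.INFINITY, name="x")
y = m.addVar(name="y")
# A very flat triangle: apex at (0, 1), base corners at (-BASE, 0) and (BASE, 0).
c1 = m.addConstr((1 / BASE) * x + y <= 1, name="c1")
c2 = m.addConstr((-1 / BASE) * x + y <= 1, name="c2")
m.setObjective(y, GRB.MAXIMIZE)

maxdiff = 0.0
for k in range(1000):
    c1.RHS = 1 + random.uniform(-PERT, PERT)
    c2.RHS = 1 + random.uniform(-PERT, PERT)
    m.optimize()
    # Horizontal distance of the computed optimum from the unperturbed apex (0, 1).
    diff = abs(x.X)
    if diff > maxdiff:
        maxdiff = diff
        print(f"New maxdiff {maxdiff:.4g} Iter {k} "
              f"Kappa {m.KappaExact:.4g} ConstrVio {m.ConstrVio:.3g}")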
Stability and Convergence#
The algorithms used to solve linear programming problems all rely on an assumption: that tiny changes to the system being solved (e.g., those induced by taking a small step in the barrier algorithm) lead to small changes in the solution. If this is not true (due to ill conditioning), then the algorithm may jump around in the solution space and have a hard time converging.
Finally, one way to improve the geometry of a problem is by suitably scaling variables and constraints as explained in the Tolerances and User-Scaling section, and working with bounded feasible sets with reasonable ranges for all variables.
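As a final illustration of the scaling advice, the sketch below solves the same invented problem in two equivalent formulations: one in which a variable is measured in units that force constraint coefficients of 1e6, and one in which that variable has been rescaled so that all coefficients are of order one. Internal scaling and presolve are switched off only so that the effect shows up directly in the KappaExact attribute; the exact values you see may differ between Gurobi versions:

import gurobipy as gp
from gurobipy import GRB

def solve_and_report(label, scale):
    # Solve max x + scale*y subject to two constraints and report KappaExact.
    m = gp.Model(label)
    m.Params.OutputFlag = 0
    m.Params.ScaleFlag = 0     # disable internal scaling for this illustration
    m.Params.Presolve = 0      # keep the basis matrix exactly as formulated
    x = m.addVar(name="x")
    y = m.addVar(name="y")
    m.addConstr(2 * x + scale * y <= 1e6)
    m.addConstr(x + 2 * scale * y <= 1e6)
    m.setObjective(x + scale * y, GRB.MAXIMIZE)
    m.optimize()
    print(f"{label}: obj={m.ObjVal:.1f} KappaExact={m.KappaExact:.3e}")

# Same problem, two formulations: y measured in "millions of units"
# (coefficients of 1e6) versus y rescaled to plain units (coefficients of 1).
solve_and_report("badly scaled", scale=1e6)
solve_and_report("well scaled ", scale=1.0)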