Nonlinear Constraints#
In Gurobi, you can express various types of nonlinear restrictions (integrality, quadratic constraints, …). By abuse of terminology, we refer to nonlinear constraints as those constraints that involve nonlinear functions such as logarithm, exponential, sine, cosine, … You can express those relationships in Gurobi either with function constraints or nonlinear constraints (see General Constraints for their definitions). Our intent in this chapter is to detail some aspects of our algorithms that are specific to those constraints.
Similarly to what is done for non-convex quadratic constraints, Gurobi solves models containing such nonlinear constraints to global optimality using a spatial branch-and-bound framework.
Note
As we mentioned for the quadratic case, note that this differs from other solvers that seek a locally optimal solution and don’t aim to prove global optimality.
The difference between function constraints and nonlinear constraints is that the former can only represent a nonlinear relationship between two variables while the latter can handle multivariate composite functions. Nonlinear constraints are more powerful but their representation is more complex. We detail this representation in the section Expression Trees.
We then discuss specific solution strategies for dealing with nonlinear functions. The basic principles of the spatial branch-and-bound is identical for both type of constraints and is discussed in Solution Strategies. But there are also a few differences. In particular, for function constraints, Gurobi can also perform a static piecewise-linear approximation using a MIP formulation and we describe this procedure and its parameters in Static Piecewise-Linear Approximation.
Expression Trees#
Gurobi supports nonlinear constraints of the form
Note
Note that in Python, we recommend using arithmetic operators and our
nonlinear function helpers to build nonlinear constraints (see
NLExpr
for details). Constructing these expressions as described
here is considered advanced usage.
Let’s start with a simple example to illustrate the concept.
Consider the multivariate expression
In order to pass such a directed tree to any of Gurobi’s APIs except for Python, we need to encode it with native data types. Speaking abstractly, our format uses three arrays to represent such an expression tree, where each three i-th entries together define the i-th node in the tree. The three arrays contain the following information:
The opcode (“operation code”) array. It contains the information which elementary operation this node represents. A list of possible opcode values is given further below.
The data array. It contains auxiliary information for this node, like the value of a constant, or the index of a variable in the optimization model. Fill a value of -1 if the corresponding opcode does not assume additional data.
The parent array. It defines the tree topology by providing the index of this node’s parent in the array representation. The (unique) root node of the tree must have the value -1 as its parent.
The nodes of the tree must be presented in these arrays in pre-order, that is, each node of the tree must be followed by all its descendants in a consecutive manner.
Continuing our
array index |
opcode |
data |
parent |
---|---|---|---|
0 |
plus |
-1 |
-1 |
1 |
sin |
-1 |
0 |
2 |
multiply |
-1 |
1 |
3 |
constant |
2.5 |
2 |
4 |
variable |
1 |
2 |
5 |
variable |
2 |
0 |
Let’s discuss the table line-by-line.
Index 0: This is the root node of the tree (its parent value is -1), and it represents a “plus” operation. It doesn’t need additional data, hence its data value is set to -1.
Index 1: This is the first child of its parent at index 0, and it represents an application of the sine function. It doesn’t need additional data, hence its data value is set to -1.
Index 2: This is the child of its parent sine node (parent value is 1), and represents a “multiply” elementary operation. It doesn’t need additional data, hence its data value is set to -1.
Index 3: This is the first child of its parent multiply node (parent value is 2) and represents the constant value 2.5, which is stored as the data value.
Index 4: This is the second child of its parent multiply node (parent value is 2), and represents the variable
which has the index 1 in the model, as designated by the corresponding value in the data array.Index 5: This is the second child of its parent plus node (parent value is 0), and represents the variable
which has the index 2 in the model, as designated by the corresponding value in the data array.
Operation Codes#
An operation code (“opcode”) describes an elementary operation that maps input values to an output value. As seen in Expression Trees, each node in an expression tree carries exactly one opcode, and we will discuss all available opcodes now.
OPCODE_CONSTANT
This opcode will “load” a constant value as an argument for its parent node. The value is specified by the corresponding entry in the
data
array. It cannot be a parent of a node in an expression tree.
OPCODE_VARIABLE
This opcode will “load” a variable as an argument for its parent node. The variable index is specified by the corresponding entry in the
data
array. Note that in some languages you may need to cast the variable index, which is an integer type, to a floating point value. It cannot be a parent of a node in an expression tree.
OPCODE_PLUS
This n-ary opcode will evaluate to the sum of all its arguments. It can have an arbitrary number of arguments, where a single argument operation amounts to a unary plus (a no-op), and an evaluation with zero arguments results in the value 0.0.
OPCODE_MINUS
This opcode is a binary operation and will evaluate to the difference of its first and second argument.
OPCODE_MULTIPLY
This n-ary opcode will evaluate to the product of all its arguments. It can have an arbitrary number of arguments, where a single argument operation amounts to a unary multiplication (a no-op), and an evaluation with zero arguments results in the value 1.0.
OPCODE_DIVIDE
This opcode is a binary operation and will evaluate to the quotient of its first argument (dividend) and second argument (divisor), viz.
. Gurobi will implicitly prevent the divisor to take values in the interval .
OPCODE_UMINUS
This unary opcode evaluates to the negative of its sole argument.
OPCODE_SQUARE
This unary opcode evaluates to the square of its sole argument.
OPCODE_SQRT
This unary opcode evaluates to the square root of its sole argument. Gurobi will implicitly enforce a lower bound of zero on the argument.
OPCODE_SIN
,OPCODE_COS
,OPCODE_TAN
Unary opcodes for the trigonometric functions sine, cosine and tangent.
OPCODE_POW
This binary opcode evaluates to
, where the base is the first argument, and the exponent is its second argument. A number of restrictions apply to this opcode:One of
and must evaluate to a constant. More precisely, if Gurobi’s presolve algorithm cannot determine that either the base or the exponent evaluates to a constant, the optimization will stop with an error.If the base evaluates to a constant, that constant must be strictly greater than zero. Otherwise the optimization will stop with an error.
If the exponent evaluates to a constant negative integer value, then Gurobi will implicitly prevent the base
to take values in the interval .If the exponent evaluates to a constant nonnegative fractional value, then Gurobi will implicitly enforce a lower bound of
on the base.If the exponent evaluates to a constant negative fractional value, then Gurobi will implicitly enforce a lower bound of
on the base.
OPCODE_EXP
This unary opcode evaluates to the exponential
of its sole argument , viz.
OPCODE_LOG
,OPCODE_LOG2
,OPCODE_LOG10
These unary opcodes will evaluate to the logarithm (with bases
, 2, and 10) of its argument. Gurobi will enforce a lower bound of on the argument.
OPCODE_LOGISTIC
This unary function will evaluate its argument
to the value of the standard logistic function, viz. .
A compact reference overview of all opcodes is given in Nonlinear Operation Codes.
Solution Strategies#
There is no good way to restate nonlinear constraints so that Gurobi’s (linear)
MIP solver can handle them efficiently and accurately. The most effective
approach to solving such problems to global optimality consists in forming a
linear outer approximation of the functions that is iteratively refined in a
branch-and-bound fashion. The linear outer approximations are valid in that they
contain all feasible solutions to the nonlinear constraints. The branching is
usually done on the variables
We note that while the spatial branch-and-bound is often able to compute very accurate solutions, depending on the complexity of the nonlinear functions and the size of the domain of the variables the solver needs to branch on, the search tree may be considerably larger than for traditional MIP.
For function constraints, Gurobi also offers an alternative approach where the constraint is replaced by a static piecewise-linear approximation and reformulated using our PWL general constraint. In this approach, the nonlinear function constraint is replaced before the MIP solution process begins by a fixed approximation. The accuracy of the approximation will of course depend heavily on the number of pieces used. This approach is discussed in the section Static Piecewise-Linear Approximation below.
The spatial branch-and-bound approach is usually a better approach in particular if you want solutions with a good accuracy. If this accuracy is not required for you, then the static approach may be faster in some cases.
Dynamic Piecewise-Linear Approximation#
As noted earlier, the preferred option for managing function constraints is using spatial branching and outer approximation. Gurobi will take this approach for any function where the FuncNonlinear attribute is set to 1 (or that attribute is set to -1 and the global FuncNonlinear parameter is set to -1 or 1).
Consider a linear approximation of
The shaded regions show the points that are feasible for the linear
approximation but not for the original quadratic constraint. We’ve
chosen one point on the left figure, labeled feasible, that lies in
this region. It may violate
Solving a model with (nonconvex) nonlinear constraints to global optimality is well known to be a hard task. The idea behind spatial branching is to divide the overall solution space into portions, and to compute valid primal and dual bounds for each portion. Such bounds are obtained by computing a linear approximation to the feasible space for each region (a so-called outer approximation) that contains all feasible solutions for the original nonlinear constraints in that region. Bounds for a region can often be tightened by spatial branching, where the domain of a variable (integer or continuous) is split, allowing hopefully tighter outer approximations to be computed for the resulting feasible subregions. Bounds for the original model can be obtained by combining the bounds from the leafs of the resulting branch-and-bound tree. This process continues until the desired optimality gap (MIPGap) is achieved.
Valid primal bounds come from feasible solutions found during the search. They can be found using various heuristics, or they may come from solutions to the current node relaxation that happen to satisfy the nonlinear and integrality constraints (relaxation solutions will always satisfy linear and bound constraints because these are still present in the relaxation). Valid dual bounds are found by solving an LP over a polyhedron that contains all feasible solutions to the original linear and nonlinear constraints in that region (the outer approximation).
Dealing with nonlinear expressions#
You might have noticed that our description of the spatial branch-and-bound only discusses the case of univariate nonlinear functions and might be wondering how more complex nonlinear expressions are handled.
It’s a property of the nonlinear expressions that we support that they can be disaggregated into a cascade of univariate functions. For example, consider the function
It can be disaggregated into
by adding an appropriate number of auxiliary variables. Gurobi will automatically compute such a formulation when you use nonlinear constraints.
Despite the disaggregated reformulation being mathematically equivalent to
the original function, small violations in the disaggregated formulation can
lead to much larger ones in the original functions.
For example, suppose we are using the default feasibility tolerance of
However, if we compute the value of the original function, we get:
The difference with the value for
This example illustrates that a solution that is feasible for the disaggregated model may not necessarily be feasible for the aggregated (original) model. Clearly, the above example is an extreme case, but this issue can definitely arise in other situations. This makes scaling and tight coefficient ranges extremely important when formulating and solving nonlinear models.
We strongly recommend using our nonlinear functions rather that trying to formulate complex nonlinear functions using our function constraints. Gurobi will compute the error in the original function and try to deal with it. In some cases, it might not be able to do so and will either return a solution that slightly exceeds the tolerance or exit with a sub-optimal status.
Static Piecewise-Linear Approximation#
The alternative for managing function constraints is to perform a static piecewise-linear approximation, yielding a MIP model that can be handed to the standard MIP solver. Gurobi will take this approach for any function where the FuncNonlinear attribute is set to 0 (or that attribute is set to -1 and the global FuncNonlinear parameter is set to 0).
With this approach to handling non-linearity, you face a fundamental cost-versus-accuracy tradeoff: adding more pieces produces smaller approximation errors, but also increases the cost of solving the problem. The tradeoff can be complex. Gurobi provides a set of three attributes that help to navigate this tradeoff: FuncPieces, FuncPieceLength, FuncPieceError. They are used as follows:
If you would like to choose the number of pieces to use for the approximation, set the FuncPieces attribute to the desired value. All pieces will have equal width. This approach allows you to control the size of the approximation.
If you would like to choose the width of each piece, set the FuncPieces attribute to a special value of 1 and set the FuncPieceLength attribute equal to the desired width of each piece. This approach provides some control over both the size and the error of the approximation. While this may appear to be a minor variation of the first option, note that presolve may tighten the domain of
, often substantially, which can make it difficult to predict the relationship between the width of each piece and the number of pieces.If you would like to set the maximum error you are willing to tolerate in the approximation, set the FuncPieces attribute to a special value of -1 and set the FuncPieceError attribute equal to the maximum absolute approximation you are willing to tolerate. Gurobi will choose pieces, typically of different sizes, to achieve that error bound. Note that the number of pieces required may be quite large if you set a tight error tolerance. You can control the maximum relative error rather than the absolute error by setting the FuncPieces attribute to -2 instead of -1.
These are attributes on the general constraints, so you can choose different values for each individual constraint.
The other relevant attribute is FuncPieceRatio, which controls whether the approximation is an underestimate of the function (0.0), an overestimate (1.0), or somewhere in between (any value strictly between 0.0 and 1.0). You can also choose the special value of -1, which will choose points that are on the original function.
Consider the following simple example:
The goal is to find an approximation of the polynomial
Recall that you can set FuncPieces to
Note that the approximations are guaranteed to be under- and over-estimates in all cases except for polynomials of degree greater than 5. Finding the roots of higher-degree polynomials, which would be required to guarantee this property, is quite difficult.
If you wish to experiment with different approaches to approximating a
set of functions, it is often convenient to be able to change the
approach for all functions at once. We provide a set of parameters with
the same names as the attributes to make this easier:
FuncPieces, FuncPieceLength,
FuncPieceError, and FuncPieceRatio. If
you set the FuncPieces attribute on a function constraint to
For some of the supported functions, modest
We should point out we handle violations and tolerances differently for PWL approximations, which can sometimes lead to unexpected results. For one, the feasibility tolerance for function constraints is specified in FuncPieceError rather than through the standard feasibility tolerance. We also interpret violations differently. The violation of a function constraint is computed as the Euclidian distance from the solution to the nearest point on the function. Computing this distance exactly can be quite involved for some functions, so we actually compute an over-estimate in some cases. The net result is that the reported violations may be larger than you expect.
Another possible source of unexpected results comes from the fact that
solutions that satisfy the original nonlinear function may not satisfy
the piecewise-linear approximation of that function. This may lead to
sub-optimal solutions or even conclusions of infeasibility. Consider a
simple example with two constraints:
While users could perform piecewise-linear approximations themselves, there are several advantages to asking Gurobi to do it instead. First, Gurobi can often reduce the domains of variables, by using bound strengthening in presolve, or by exploiting repetition in periodic functions like sine or cosine. Smaller domains means fewer pieces to achieve the same accuracy. Gurobi also provides many options to make experimentation easier (for error control, piece length, etc.). These options can be quite difficult to implement and maintain.