Nonlinear Constraints#

In Gurobi, you can express various types of nonlinear restrictions (integrality, quadratic constraints, …). By a slight abuse of terminology, we use the term nonlinear constraints for constraints that involve nonlinear functions such as the logarithm, exponential, sine, or cosine. You can express these relationships in Gurobi either with function constraints or with nonlinear constraints (see General Constraints for their definitions). Our intent in this chapter is to detail some aspects of our algorithms that are specific to these constraints.

Similarly to what is done for non-convex quadratic constraints, Gurobi solves models containing such nonlinear constraints to global optimality using a spatial branch-and-bound framework.

Note

As we mentioned for the quadratic case, note that this differs from other solvers that seek a locally optimal solution and don’t aim to prove global optimality.

The difference between function constraints and nonlinear constraints is that the former can only represent a nonlinear relationship between two variables, while the latter can handle multivariate composite functions. Nonlinear constraints are more powerful, but their representation is more complex. We detail this representation in the section Expression Trees.

We then discuss specific solution strategies for dealing with nonlinear functions. The basic principles of spatial branch-and-bound are identical for both types of constraints and are discussed in Solution Strategies. But there are also a few differences. In particular, for function constraints, Gurobi can also perform a static piecewise-linear approximation using a MIP formulation; we describe this procedure and its parameters in Static Piecewise-Linear Approximation.

Expression Trees#

Gurobi supports nonlinear constraints of the form \(y = f(x)\) where \(f: \mathbf{R}^n \rightarrow \mathbf{R}\) is given by an expression tree. Such a tree encodes how the function \(f\) is composed of various elementary operations, like arithmetic operations and univariate functions. We’ll discuss both concepts next.

Note

Note that in Python, we recommend using arithmetic operators and our nonlinear function helpers to build nonlinear constraints (see NLExpr for details). Constructing these expressions as described here is considered advanced usage.
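To give a flavor of the recommended route, here is a minimal gurobipy sketch. It assumes Gurobi 12 or later, where the nlfunc helper module and Model.addGenConstrNL are available; the variable names and bounds are illustrative only.

    import gurobipy as gp
    from gurobipy import GRB, nlfunc

    m = gp.Model("nl_helpers")
    x1 = m.addVar(ub=10, name="x1")
    x2 = m.addVar(ub=4, name="x2")
    y = m.addVar(lb=-GRB.INFINITY, name="y")

    # Build y = x1 * exp(x2) from operator overloading and an nlfunc helper;
    # Gurobi constructs the underlying expression tree internally.
    m.addGenConstrNL(y, x1 * nlfunc.exp(x2))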

Let’s start with a simple example to illustrate the concept. Consider the multivariate expression \(\sin(2.5 x_1) + x_2\). The elementary operations in this expression are multiplication, addition, and the sine function. The dependencies and order of operations are as follows: at the top level there is an addition of the terms \(\sin(2.5 x_1)\) and \(x_2\). The first term decomposes into a sine function, which depends on the product of the constant \(2.5\) and the variable \(x_1\). Orienting edges along the flow of data, we arrive at the following representation:

[Figure: the expression tree for \(\sin(2.5 x_1) + x_2\)]

In order to pass such a directed tree to any of Gurobi’s APIs except for Python, we need to encode it with native data types. Abstractly speaking, our format uses three arrays to represent such an expression tree, where the three i-th entries together define the i-th node in the tree. The three arrays contain the following information:

  • The opcode (“operation code”) array. It specifies which elementary operation each node represents. A list of possible opcode values is given further below.

  • The data array. It contains auxiliary information for the node, like the value of a constant or the index of a variable in the optimization model. Use a value of -1 if the corresponding opcode does not require additional data.

  • The parent array. It defines the tree topology by providing the index of each node’s parent in the array representation. The (unique) root node of the tree must have the value -1 as its parent.

The nodes of the tree must be presented in these arrays in pre-order, that is, each node of the tree must be followed by all its descendants in a consecutive manner.

Continuing our \(\sin(2.5 x_1) + x_2\) example from above, the three arrays would contain the following data:

array index   opcode     data   parent
-----------   --------   ----   ------
0             plus       -1     -1
1             sin        -1     0
2             multiply   -1     1
3             constant   2.5    2
4             variable   1      2
5             variable   2      0

Let’s discuss the table line by line; a code sketch follows the list.

  • Index 0: This is the root node of the tree (its parent value is -1), and it represents a “plus” operation. It doesn’t need additional data, hence its data value is set to -1.

  • Index 1: This is the first child of its parent at index 0, and it represents an application of the sine function. It doesn’t need additional data, hence its data value is set to -1.

  • Index 2: This is the only child of the sine node (parent value is 1), and represents a “multiply” elementary operation. It doesn’t need additional data, hence its data value is set to -1.

  • Index 3: This is the first child of its parent multiply node (parent value is 2) and represents the constant value 2.5, which is stored as the data value.

  • Index 4: This is the second child of its parent multiply node (parent value is 2), and represents the variable \(x_1\) which has the index 1 in the model, as designated by the corresponding value in the data array.

  • Index 5: This is the second child of its parent plus node (parent value is 0), and represents the variable \(x_2\) which has the index 2 in the model, as designated by the corresponding value in the data array.
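To make this concrete, here is a sketch that passes exactly these arrays through the Python API. It assumes Gurobi 12 or later, where Model.addGenConstrNLAdv and the GRB.OPCODE_* constants expose the array interface (the advanced usage mentioned in the note above). Here the result variable y happens to take column index 0, so x1 and x2 land at indices 1 and 2, matching the table; names and bounds are illustrative.

    import gurobipy as gp
    from gurobipy import GRB

    m = gp.Model("exprtree")
    y = m.addVar(lb=-GRB.INFINITY, name="y")  # column index 0
    x1 = m.addVar(lb=-1, ub=1, name="x1")     # column index 1
    x2 = m.addVar(lb=0, ub=10, name="x2")     # column index 2
    m.update()  # assign column indices before we reference them

    # One entry per tree node, in pre-order; -1 means "no data" / "no parent".
    opcode = [GRB.OPCODE_PLUS, GRB.OPCODE_SIN, GRB.OPCODE_MULTIPLY,
              GRB.OPCODE_CONSTANT, GRB.OPCODE_VARIABLE, GRB.OPCODE_VARIABLE]
    data = [-1, -1, -1, 2.5, x1.index, x2.index]  # constants or variable indices
    parent = [-1, 0, 1, 2, 2, 0]

    # Adds the constraint y = sin(2.5 * x1) + x2
    m.addGenConstrNLAdv(y, opcode, data, parent)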

Operation Codes#

An operation code (“opcode”) describes an elementary operation that maps input values to an output value. As seen in Expression Trees, each node in an expression tree carries exactly one opcode, and we will discuss all available opcodes now.

OPCODE_CONSTANT

This opcode will “load” a constant value as an argument for its parent node. The value is specified by the corresponding entry in the data array. A node with this opcode cannot be the parent of another node in an expression tree.

OPCODE_VARIABLE

This opcode will “load” a variable as an argument for its parent node. The variable index is specified by the corresponding entry in the data array. Note that in some languages you may need to cast the variable index, which is an integer type, to a floating point value. A node with this opcode cannot be the parent of another node in an expression tree.

OPCODE_PLUS

This n-ary opcode will evaluate to the sum of all its arguments. It can have an arbitrary number of arguments, where a single argument operation amounts to a unary plus (a no-op), and an evaluation with zero arguments results in the value 0.0.

OPCODE_MINUS

This opcode is a binary operation and will evaluate to the difference of its first and second argument.

OPCODE_MULTIPLY

This n-ary opcode will evaluate to the product of all its arguments. It can have an arbitrary number of arguments, where a single argument operation amounts to a unary multiplication (a no-op), and an evaluation with zero arguments results in the value 1.0.

OPCODE_DIVIDE

This opcode is a binary operation and will evaluate to the quotient of its first argument (dividend) and second argument (divisor), viz. \(\frac{x}{y}\). Gurobi will implicitly prevent the divisor \(y\) from taking values in the interval \([-10^{-10}, 10^{-10}]\).

OPCODE_UMINUS

This unary opcode evaluates to the negative of its sole argument.

OPCODE_SQUARE

This unary opcode evaluates to the square of its sole argument.

OPCODE_SQRT

This unary opcode evaluates to the square root of its sole argument. Gurobi will implicitly enforce a lower bound of zero on the argument.

OPCODE_SIN, OPCODE_COS, OPCODE_TAN

Unary opcodes for the trigonometric functions sine, cosine and tangent.

OPCODE_POW

This binary opcode evaluates to \(x^y\), where the base \(x\) is the first argument and the exponent \(y\) is the second argument. A number of restrictions apply to this opcode (a short encoding sketch follows the list):

  • One of \(x\) and \(y\) must evaluate to a constant. More precisely, if Gurobi’s presolve algorithm cannot determine that either the base or the exponent evaluates to a constant, the optimization will stop with an error.

  • If the base evaluates to a constant, that constant must be strictly greater than zero. Otherwise the optimization will stop with an error.

  • If the exponent evaluates to a constant negative integer value, then Gurobi will implicitly prevent the base \(x\) from taking values in the interval \([-10^{-10}, 10^{-10}]\).

  • If the exponent evaluates to a constant nonnegative fractional value, then Gurobi will implicitly enforce a lower bound of \(0\) on the base.

  • If the exponent evaluates to a constant negative fractional value, then Gurobi will implicitly enforce a lower bound of \(10^{-10}\) on the base.
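As an illustration in the same array format (again assuming gurobipy 12’s addGenConstrNLAdv; names and bounds are illustrative), the constraint \(y = x^{0.3}\) has a constant fractional exponent, so Gurobi implicitly enforces \(x \geq 0\):

    import gurobipy as gp
    from gurobipy import GRB

    m = gp.Model("pow")
    x = m.addVar(ub=100, name="x")
    y = m.addVar(lb=-GRB.INFINITY, name="y")
    m.update()

    # Node 0 is POW; node 1 is the base (a variable), node 2 the exponent (a constant).
    opcode = [GRB.OPCODE_POW, GRB.OPCODE_VARIABLE, GRB.OPCODE_CONSTANT]
    data = [-1, x.index, 0.3]
    parent = [-1, 0, 0]
    m.addGenConstrNLAdv(y, opcode, data, parent)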

OPCODE_EXP

This unary opcode evaluates to the exponential \(y\) of its sole argument \(x\), viz. \(y = \exp(x) = e^x.\)

OPCODE_LOG, OPCODE_LOG2, OPCODE_LOG10

These unary opcodes evaluate to the logarithm of their argument, with base \(e\), 2, and 10, respectively. Gurobi will enforce a lower bound of \(10^{-10}\) on the argument.

OPCODE_LOGISTIC

This unary function will evaluate its argument \(x\) to the value \(y\) of the standard logistic function, viz. \(y = \frac{1}{1 + \exp(-x)}\).

A compact reference overview of all opcodes is given in Nonlinear Operation Codes.

Solution Strategies#

There is no good way to restate nonlinear constraints so that Gurobi’s (linear) MIP solver can handle them efficiently and accurately. The most effective approach to solving such problems to global optimality consists of forming a linear outer approximation of the functions that is iteratively refined in a branch-and-bound fashion. The linear outer approximations are valid in the sense that they contain all feasible solutions to the nonlinear constraints. The branching is usually done on the variables \(x\) that are the arguments of the functions. This family of algorithms is generally called spatial branch-and-bound. It differs from the more traditional branch-and-bound employed in the MIP algorithm in that (i) branching may happen on continuous variables, and (ii) the linear relaxation is modified and refined at every node of the branch-and-bound tree.

We note that while spatial branch-and-bound is often able to compute very accurate solutions, the search tree may be considerably larger than for traditional MIP, depending on the complexity of the nonlinear functions and the size of the domains of the variables the solver needs to branch on.

For function constraints, Gurobi also offers an alternative approach in which the constraint is replaced by a static piecewise-linear approximation and reformulated using our PWL general constraint. In this approach, the nonlinear function constraint is replaced by a fixed approximation before the MIP solution process begins. The accuracy of the approximation will of course depend heavily on the number of pieces used. This approach is discussed in the section Static Piecewise-Linear Approximation below.

Spatial branch-and-bound is usually the better approach, in particular if you need solutions with good accuracy. If you do not require this accuracy, the static approach may be faster in some cases.

Dynamic Piecewise-Linear Approximation#

As noted earlier, the preferred option for managing function constraints is using spatial branching and outer approximation. Gurobi will take this approach for any function where the FuncNonlinear attribute is set to 1 (or that attribute is set to -1 and the global FuncNonlinear parameter is set to -1 or 1).
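For instance, to request this dynamic treatment for a single function constraint (a sketch assuming a recent gurobipy where addGenConstrExp and the FuncNonlinear attribute are available; bounds are illustrative):

    import gurobipy as gp
    from gurobipy import GRB

    m = gp.Model("dynamic")
    x = m.addVar(lb=-2, ub=2, name="x")
    y = m.addVar(lb=-GRB.INFINITY, name="y")

    gc = m.addGenConstrExp(x, y)  # function constraint y = exp(x)
    gc.FuncNonlinear = 1          # treat via outer approximation + spatial branching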

Consider a linear approximation of \(y \leq x^2\):

[Figure: a linear outer approximation of \(y \leq x^2\) (left) and the refined approximations after branching at \(x=1\) (right)]

The shaded regions show the points that are feasible for the linear approximation but not for the original quadratic constraint. We’ve chosen one point on the left figure, labeled feasible, that lies in this region. It may violate \(y \leq x^2\) by more than the desired amount, but with a static approximation we would have no option other than to accept this solution. A dynamic approximation could branch at \(x=1\), creating a pair of refined linear approximations, one on either side of the branching point. As is evident in the figure on the right, the original feasible point is excluded from both sides.

Solving a model with (nonconvex) nonlinear constraints to global optimality is well known to be a hard task. The idea behind spatial branching is to divide the overall solution space into portions, and to compute valid primal and dual bounds for each portion. Such bounds are obtained by computing a linear approximation of the feasible space for each region (a so-called outer approximation) that contains all feasible solutions for the original nonlinear constraints in that region. Bounds for a region can often be tightened by spatial branching, where the domain of a variable (integer or continuous) is split, hopefully allowing tighter outer approximations to be computed for the resulting feasible subregions. Bounds for the original model can be obtained by combining the bounds from the leaves of the resulting branch-and-bound tree. This process continues until the desired optimality gap (MIPGap) is achieved.

Valid primal bounds come from feasible solutions found during the search. They can be found using various heuristics, or they may come from solutions to the current node relaxation that happen to satisfy the nonlinear and integrality constraints (relaxation solutions will always satisfy linear and bound constraints because these are still present in the relaxation). Valid dual bounds are found by solving an LP over a polyhedron that contains all feasible solutions to the original linear and nonlinear constraints in that region (the outer approximation).

Dealing with Nonlinear Expressions#

You might have noticed that our description of the spatial branch-and-bound only discusses the case of univariate nonlinear functions and might be wondering how more complex nonlinear expressions are handled.

It’s a property of the nonlinear expressions that we support that they can be disaggregated into a cascade of univariate functions. For example, consider the function

\[y = f(x_1, x_2) = 100000 \cdot \frac{x_1}{x_1 + 1000 \cdot x_2}.\]

It can be disaggregated into

\[\begin{split}z_1 &= x_1 + 1000 \cdot x_2, \\ z_2 &= z_1^{-1}, \\ y &= 100000 \cdot x_1 \cdot z_2,\end{split}\]

by adding an appropriate number of auxiliary variables. Gurobi will automatically compute such a formulation when you use nonlinear constraints.

Despite the disaggregated reformulation being mathematically equivalent to the original function, small violations in the disaggregated formulation can lead to much larger ones in the original function. For example, suppose we are using the default feasibility tolerance of \(10^{-6}\); then the following solution is feasible for the disaggregated model:

\[\begin{split}x_1 &= 1, \\ x_2 &= 1, \\ z_1 &= 1001.000001, \\ z_2 &= 0.001000000998003, \\ y &= 100.0000998003.\end{split}\]

However, if we compute the value of the original function, we get:

\[f(1, 1) = 100000 \cdot \frac{1}{1 + 1000 \cdot 1} = 99.9000999001.\]

The difference from the value of \(y\) above is almost \(0.1\), which is much larger than the specified feasibility tolerance.

This example illustrates that a solution that is feasible for the disaggregated model may not necessarily be feasible for the aggregated (original) model. Clearly, the above example is an extreme case, but this issue can definitely arise in other situations. This makes scaling and tight coefficient ranges extremely important when formulating and solving nonlinear models.

We strongly recommend using our nonlinear constraints rather than trying to formulate complex nonlinear functions using our function constraints. Gurobi will compute the error in the original function and try to deal with it. In some cases, it might not be able to do so and will either return a solution that slightly exceeds the tolerance or exit with a sub-optimal status.
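For instance, the function from this section can be stated as a single nonlinear constraint in Python, leaving the disaggregation and the error handling to Gurobi. This is a sketch assuming gurobipy 12’s operator overloading for nonlinear expressions (where dividing affine expressions yields a nonlinear expression); the bounds are illustrative.

    import gurobipy as gp
    from gurobipy import GRB

    m = gp.Model("ratio")
    x1 = m.addVar(lb=1, ub=1000, name="x1")
    x2 = m.addVar(lb=1, ub=1000, name="x2")
    y = m.addVar(lb=-GRB.INFINITY, name="y")

    # y = 100000 * x1 / (x1 + 1000 * x2) as one nonlinear constraint
    m.addGenConstrNL(y, 100000 * x1 / (x1 + 1000 * x2))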

Static Piecewise-Linear Approximation#

The alternative for managing function constraints is to perform a static piecewise-linear approximation, yielding a MIP model that can be handed to the standard MIP solver. Gurobi will take this approach for any function where the FuncNonlinear attribute is set to 0 (or that attribute is set to -1 and the global FuncNonlinear parameter is set to 0).

With this approach to handling nonlinearity, you face a fundamental cost-versus-accuracy tradeoff: adding more pieces produces smaller approximation errors, but also increases the cost of solving the problem. The tradeoff can be complex. Gurobi provides a set of three attributes that help you navigate it: FuncPieces, FuncPieceLength, and FuncPieceError. They are used as follows:

  • If you would like to choose the number of pieces to use for the approximation, set the FuncPieces attribute to the desired value. All pieces will have equal width. This approach allows you to control the size of the approximation.

  • If you would like to choose the width of each piece, set the FuncPieces attribute to a special value of 1 and set the FuncPieceLength attribute equal to the desired width of each piece. This approach provides some control over both the size and the error of the approximation. While this may appear to be a minor variation of the first option, note that presolve may tighten the domain of \(x\), often substantially, which can make it difficult to predict the relationship between the width of each piece and the number of pieces.

  • If you would like to set the maximum error you are willing to tolerate in the approximation, set the FuncPieces attribute to a special value of -1 and set the FuncPieceError attribute equal to the maximum absolute approximation error you are willing to tolerate. Gurobi will choose pieces, typically of different sizes, to achieve that error bound. Note that the number of pieces required may be quite large if you set a tight error tolerance. You can control the maximum relative error rather than the absolute error by setting the FuncPieces attribute to -2 instead of -1.

These are attributes on the general constraints, so you can choose different values for each individual constraint.
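For example, to request an error-controlled static approximation for one function constraint (a sketch assuming a recent gurobipy; values are illustrative):

    import gurobipy as gp
    from gurobipy import GRB

    m = gp.Model("static_pwl")
    x = m.addVar(lb=0, ub=4, name="x")
    y = m.addVar(lb=-GRB.INFINITY, name="y")

    gc = m.addGenConstrExp(x, y)  # function constraint y = exp(x)
    gc.FuncNonlinear = 0          # use the static PWL approximation
    gc.FuncPieces = -1            # choose pieces to meet an error bound...
    gc.FuncPieceError = 1e-3      # ...of at most 1e-3 absolute error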

The other relevant attribute is FuncPieceRatio, which controls whether the approximation is an underestimate of the function (0.0), an overestimate (1.0), or somewhere in between (any value strictly between 0.0 and 1.0). You can also choose the special value of -1, which will choose points that are on the original function.

Consider the following simple example:

[Figure: two-piece approximations of \(y = x^2\) on \([0,2]\), with points \(P_{u1}, P_{u2}, P_{u3}\) on the function and \(P_{l1}, P_{l2}, P_{l3}\) below it]

The goal is to find an approximation of the polynomial \(y=x^2\). We’ve set FuncPieces to \(1\) and FuncPieceLength to \(1.0\), so we’re performing an approximation with fixed-width pieces of width 1.0. The domain of \(x\) is \([0,2]\), so the approximation has two pieces. The figure shows six points: \(P_{u1}(0,0), P_{u2}(1,1), P_{u3}(2,4)\), and \(P_{l1}(0,-0.25), P_{l2}(1, 0.75), P_{l3}(2,3.75)\). If FuncPieceRatio is set to \(0.0\), the approximation is built from the points below the function (\(P_{l1}, P_{l2}\), and \(P_{l3}\)). Similarly, if it is set to \(1.0\), the approximation is built from the points above the function (\(P_{u1}, P_{u2}\), and \(P_{u3}\)). A value of \(0.6\) would use weighted combinations of the points: \(0.6\) times \(P_{ui}\) plus \(0.4\) times \(P_{li}\). In this case, the line segments would be built from the points \((0, -0.1), (1, 0.9)\), and \((2, 3.9)\). If FuncPieceRatio is set to \(-1\), the approximation is built from points that lie on the original function; in this case, the upper points (\(P_{u1}, P_{u2}\), and \(P_{u3}\)) fit the bill. This will always be the case for a convex function.

Recall that you can set FuncPieces to \(-1\) to control the maximum absolute error. In this case, choosing a FuncPieceError value of \(0.25\) would give the piecewise approximation shown in the figure, since the distance between the upper and lower curves is always \(0.25\). A smaller error value would of course lead to more pieces. We should add that piece widths will typically be non-uniform when limiting the maximum approximation error. The approximation algorithms we use try to limit the number of pieces needed to meet the error targets, which often requires more refinement in some portions of the domain than in others.

Note that the approximations are guaranteed to be under- and over-estimates in all cases except for polynomials of degree greater than 5. Finding the roots of higher-degree polynomials, which would be required to guarantee this property, is quite difficult.

If you wish to experiment with different approaches to approximating a set of functions, it is often convenient to be able to change the approach for all functions at once. We provide a set of parameters with the same names as the attributes to make this easier: FuncPieces, FuncPieceLength, FuncPieceError, and FuncPieceRatio. If you set the FuncPieces attribute on a function constraint to \(0\), then the approximation approach for that constraint will be determined by the parameter settings instead.
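A sketch of the parameter-level route (values illustrative):

    import gurobipy as gp

    m = gp.Model("pwl_params")
    # Applies to every function constraint whose FuncPieces attribute is 0:
    m.Params.FuncPieces = -2        # bound the relative approximation error...
    m.Params.FuncPieceError = 1e-4  # ...at 1e-4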

For some of the supported functions, modest \(x\) values can lead to enormous \(y\) values (and vice-versa). This can cause numerical issues when solving the resulting piecewise-linear MIP model. To avoid such issues, we limit the range of any \(x\) or \(y\) that participates in a function constraint to [-1e+6, 1e+6]. The parameter FuncMaxVal allows you to change these limits, but we recommend that you proceed with caution.

We should point out that we handle violations and tolerances differently for PWL approximations, which can sometimes lead to unexpected results. For one, the feasibility tolerance for function constraints is specified in FuncPieceError rather than through the standard feasibility tolerance. We also interpret violations differently. The violation of a function constraint is computed as the Euclidean distance from the solution to the nearest point on the function. Computing this distance exactly can be quite involved for some functions, so we actually compute an over-estimate in some cases. The net result is that the reported violations may be larger than you expect.

Another possible source of unexpected results comes from the fact that solutions that satisfy the original nonlinear function may not satisfy the piecewise-linear approximation of that function. This may lead to sub-optimal solutions or even conclusions of infeasibility. Consider a simple example with two constraints: \(y = 2x-1\) and \(y = x^2\). Clearly \((x, y) = (1, 1)\) is a feasible solution, but a piecewise-linear approximation could introduce breakpoints at \(x=0.9\) and \(x=1.1\). The resulting approximation gives a \(y\) value of \(1.01\) at \(x=1\), which is sufficiently far from the actual function value that Gurobi will not consider it a valid solution; since there are no other solutions to the constraints, it will declare the model infeasible. Reducing the maximum approximation error (by setting FuncPieces to -1 and FuncPieceError to a much smaller value) would help, but this isn’t always the best way to address the problem, since tighter error tolerances can substantially increase the number of pieces in the approximation and thus the cost.

We recommend the following approach when you encounter unexpected results. For inequalities, you should ask for an approximation that always overestimates or underestimates the function (depending on the sense of the constraint), to ensure that your approximation will always satisfy the constraint; the FuncPieceRatio parameter allows you to do this, as sketched below. For equalities, if you have a sense of where your solution is likely to lie, one option for managing the size of the approximation is to introduce additional variables to capture your function in different ranges, and then perform approximations with different levels of accuracy on these different pieces.
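As a sketch of the inequality recommendation (assuming a recent gurobipy; bounds illustrative): to enforce \(y \leq \log(x)\) via a function constraint \(t = \log(x)\), an underestimating approximation guarantees that any point feasible for the approximation also satisfies the original inequality.

    import gurobipy as gp
    from gurobipy import GRB

    m = gp.Model("underestimate")
    x = m.addVar(lb=1, ub=10, name="x")
    t = m.addVar(lb=-GRB.INFINITY, name="t")
    y = m.addVar(lb=-GRB.INFINITY, ub=5, name="y")

    m.addGenConstrLog(x, t)        # t = log(x), to be approximated piecewise-linearly
    m.addConstr(y <= t)            # intended constraint: y <= log(x)
    m.Params.FuncNonlinear = 0     # use the static PWL approach
    m.Params.FuncPieceRatio = 0.0  # underestimate log, so t_approx <= log(x)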

While users could perform piecewise-linear approximations themselves, there are several advantages to asking Gurobi to do it instead. First, Gurobi can often reduce the domains of variables, by using bound strengthening in presolve, or by exploiting repetition in periodic functions like sine or cosine. Smaller domains mean fewer pieces to achieve the same accuracy. Gurobi also provides many options to make experimentation easier (for error control, piece length, etc.). These options can be quite difficult to implement and maintain yourself.