Constraints#

A constraint in Gurobi captures a restriction on the values that a set of variables may take. The simplest example is a linear constraint, which states that a linear expression on a set of variables take a value that is either less-than-or-equal, greater-than-or-equal, or equal to another linear expression. More complicated constraints are also supported, including quadratic constraints (e.g., \(x^2 + y^2 <= z^2\)), logical constraints (e.g., logical AND on binary variables, if-then, etc.), and a few non-linear functions (e.g., \(y = sin(x)\)).

We now consider a few more details about linear, SOS, quadratic (both convex and non-convex), and general constraints. General constraints are the catch-all we use for the constraint types that don’t fit in the other categories.

Recall that Gurobi works in finite-precision arithmetic, so constraints are only satisfied to tolerances. Tolerances can be tightened to reduce such violations, but there are limits to how small the violations can be – errors are inherent in floating-point arithmetic. This point will be reiterated in several places in this section.

Linear Constraints#

A linear constraint allows you to restrict the value of a linear expression. For example, you may require that any feasible solution satisfies the constraint \(3 x + 4 y \leq 5z\). Note that the matrix-oriented Gurobi APIs (C, MATLAB, and R) require the right-hand side of a linear constraint to be a constant, while the object-oriented APIs (C++, Java, .NET, and Python) allow arbitrary linear expressions on both sides of the comparator.

The computed solution should satisfy the stated constraint to within FeasibilityTol (although it may not in cases of numerical ill-conditioning – we’ll discuss this shortly).

Gurobi supports a limited set of comparators. Specifically, you can constrain an expression to be less-than-or-equal, greater-than-or-equal, or equal another. We do not support strict less-than, strict greater-than, or not-equal comparators. While these other comparators may seem appropriate for mathematical programming, we exclude them to avoid potential confusion related to numerical tolerances. Consider a simple example of a strict inequality constraint on a pair of continuous variables: \(x > y\). How large would \(x-y\) need to be in order to satisfy the constraint? Rather than trying to embed a subtle and potentially confusing strategy for handling such constraints into the solver, we’ve chosen not to support them instead.

SOS Constraints#

A Special-Ordered Set, or SOS constraint, is a highly specialized constraint that places restrictions on the values that variables in a given list can take. There are two types of SOS constraints. In an SOS constraint of type 1 (an SOS1 constraint), at most one variable in the specified list is allowed to take a non-zero value. In an SOS constraint of type 2 (an SOS2 constraint), at most two variables in the specified, ordered list are allowed to take a non-zero value, and those non-zero variables must be contiguous in the list. The variables in an SOS constraint can be continuous, integer, or binary.

Again, tolerances play an important role in SOS constraints. Specifically, variables that take values less than IntFeasTol (in absolute value) are considered to be zero for the purposes of determining whether an SOS constraint is satisfied.

An SOS constraint is described using a list of variables and a list of corresponding weights. While the weights have historically had intuitive meanings associated with them, we simply use them to order the list of variables. The weights should be unique. This is especially important for an SOS2 constraint, which relies on the notion of contiguous variables. Since the variables in the SOS are ordered by weight, contiguity becomes ambiguous when multiple variables have the same weight.

It is often more efficient to capture SOS structure using linear constraints rather than SOS constraints. The optimizer will often perform this reformulation automatically. This is controlled with four parameters: PreSOS1BigM, PreSOS1Encoding, PreSOS2BigM and PreSOS2Encoding.

The reformulation adds constraints of the form \(x \leq M b\), where \(x\) is the variable that participates in the SOS constraint, \(b\) is a binary variable, and \(M\) is an upper bound on the value of variable \(x\). Large values of \(M\) can lead to numerical issues. The two parameters PreSOS1BigM and PreSOS2BigM control the maximum value of \(M\) that can be introduced by this reformulation. SOS constraints that would require a larger value aren’t converted. Setting one of these parameters to 0 disables the corresponding reformulation.

Additionally, there are several known integer formulations for SOS1 and SOS2 constraints. These reformulations differ in the number of variables that they introduce to the problem, in the complexity of the resulting LP relaxations, and in their properties in terms of branching and cutting planes. The two parameters PreSOS1Encoding and PreSOS2Encoding control the choice of the reformulation performed.

Quadratic Constraints#

A quadratic constraint allows you to restrict the value of a quadratic expression. For example, you may require that any feasible solution satisfy the constraint \(3 x^2 + 4 y^2 + 5 z \leq 10\). Note that the matrix-oriented Gurobi APIs (C, MATLAB, and R) require the right-hand side of a quadratic constraint to be a constant, while the object-oriented APIs (C++, Java, .NET, and Python) allow arbitrary quadratic expressions on both sides of the comparator.

The computed solution should satisfy the stated constraint to within FeasibilityTol. Quadratic constraints are often much more challenging to satisfy than linear constraints, so tightening the parameter may increase runtimes dramatically.

Gurobi can handle both convex and non-convex quadratic constraints. However, there are some subtle and important differences in how the different constraint types are handled. In general, it is much easier to solve a model whose constraints all have convex feasible regions. It is actually quite difficult to recognize all such cases, but the following forms are always recognized:

  • \(x^TQx + q^Tx \le b\), where \(Q\) is Positive Semi-Definite (PSD)

  • \(x^TQx \le y^{2}\), where \(Q\) is Positive Semi-Definite (PSD), \(x\) is a vector of variables, and \(y\) is a non-negative variable (a Second-Order Cone constraint, if \(Q = I\), identity matrix)

  • \(x^TQx \le y z\), where \(Q\) is Positive Semi-Definite (PSD), \(x\) is a vector of variables, and \(y\) and \(z\) are non-negative variables (a rotated Second-Order Cone constraint, if \(Q = I\), identity matrix)

To be more precise, a quadratic constraint will always be recognized as convex if presolve is able to transform it into one of these forms. Note that if all quadratic terms in a quadratic constraint contain at least one binary variable, then presolve will always be able to transform it to a convex form.

Why distinguish between convex and non-convex quadratic constraints? In some situations you may know that your problem should be convex, and thus it may be a sign of a modeling error if your model isn’t recognized as such. To avoid accidentally solving a much harder problem than you may have intended, you can set the NonConvex parameter to either 0 or 1. In the default setting of -1 or if the NonConvex parameter is set to 2, Gurobi will accept arbitrary quadratic constraints and attempt to solve the resulting model using the appropriate algorithm.

Note that other non-convex quadratic solvers often only find locally optimal solutions. The algorithms in Gurobi explore the entire search space, so they provide a globally valid lower bound on the optimal objective value, and given enough time they will find a globally optimal solution (subject to tolerances).

We would like to note a subtle point here regarding terminology. A quadratic constraint that involves only products of disjoint pairs of variables is often called a bilinear constraint, and a model that contains bilinear constraints is often called a bilinear program. Bilinear constraints are a special case of non-convex quadratic constraints, and the algorithms Gurobi uses to handle the latter are also well suited to solving bilinear programming problems.

General Constraints#

The previously-described constraints are typically (but not always) handled directly by the underlying optimization algorithms. Gurobi includes an additional set of higher-level constraints, which we collectively refer to as general constraints, that require special handling. We think of these as belonging to two types: simple constraints and function constraints

Simple general constraints are a modeling convenience. They allow you to state fairly simple relationships between variables (min, max, absolute value, logical OR, etc.). Techniques for translating these constraints into lower-level modeling objects (typically using auxiliary binary variables and linear or SOS constraints) are well known and can be found in optimization modeling textbooks. By automating these translations and removing the need for the modeler to perform them, the hope is that these simple general constraints will allow you to write more readable and more maintainable models.

Function constraints allow you to state much more complex relationships between variables; you can require that \(y=f(x)\), where \(x\) and \(y\) are Gurobi decision variables and \(f()\) is chosen from a predefined list of nonlinear functions. The common theme among these functions is that there is no simple way to restate the associated constraints using the primitive objects (like integer variables and linear constraints) that a standard MIP solver wants to work with. Alternate algorithms are required, and the resulting models are typically much more difficult to solve than models that do not contain these constraints.

Simple General Constraints#

Gurobi supports the following simple general constraints, each with its own syntax and semantics:

  • MAX constraint: The constraint \(r = \max\{x_1,\ldots,x_k,c\}\) states that the resultant variable \(r\) should be equal to the maximum of the operand variables \(x_1,\ldots,x_k\) and the constant \(c\). For example, a solution \((r=3, x_1=2, x_2=3, x_3=0)\) would be feasible for the constraint \(r = \max\{x_1,x_2,x_3,1.7\}\) because \(3\) is indeed the maximum of \(2\), \(3\), \(0\), and \(1.7\).

  • MIN constraint: Similar to a MAX constraint, the constraint \(r = \min\{x_1,\ldots,x_k,c\}\) states that the resultant variable \(r\) should be equal to the minimum of the operand variables \(x_1,\ldots,x_k\) and the constant \(c\).

  • ABS constraint: The constraint \(r = \mbox{abs}\{x\}\) states that the resultant variable \(r\) should be equal to the absolute value of the operand variable \(x\). For example, a solution \((r=3, x=-3)\) would be feasible for the constraint \(r = \mbox{abs}\{x\}\).

  • AND constraint: The constraint \(r = \mbox{and}\{x_1,\ldots,x_k\}\) states that the binary resultant variable \(r\) should be \(1\) if and only if all of the binary operand variables \(x_1,\ldots,x_k\) are equal to \(1\). For example, a solution \((r=1, x_1=1, x_2=1, x_3=1)\) would be feasible for the constraint \(r = \mbox{and}\{x_1,x_2,x_3\}\). Note that any involved variables that are not already binary are converted to binary.

  • OR constraint: Similar to an AND constraint, the constraint \(r = \mbox{or}\{x_1,\ldots,x_k\}\) states that the binary resultant variable \(r\) should be \(1\) if and only if at least one of the binary operand variables \(x_1,\ldots,x_k\) is equal to \(1\). Note that any involved variables that are not already binary are converted to binary.

  • NORM constraint: The constraint \(r = \mbox{norm}\{x_1,\ldots,x_k\}\) states that the resultant variable \(r\) should be equal to the vector norm of the operand variables \(x_1,\ldots,x_k\). A few options are available: the 0-norm, 1-norm, 2-norm, and infinity-norm.

  • INDICATOR constraints: An indicator constraint \(y = f \rightarrow a^Tx \leq b\) states that if the binary indicator variable \(y\) is equal to \(f\) in a given solution, where \(f \in \{0,1\}\), then the linear constraint \(a^Tx \leq b\) has to be satisfied. On the other hand, if \(y \neq f\) (i.e., \(y = 1-f\)) then the linear constraint may be violated. Note that the sense of the linear constraint can also be \(=\) or \(\geq\); refer to this earlier section for a more detailed description of linear constraints. Note also that declaring an INDICATOR constraint implicitly declares the indicator variable to be of binary type.

  • Piecewise-linear constraints: A piecewise-linear constraint \(y = f(x)\) states that the point \((x, y)\) must lie on the piecewise-linear function \(f()\) defined by a set of points \((x_1, y_1), (x_2, y_2), ..., (x_n, y_n)\). Refer to the description of piecewise-linear objectives for details of how piecewise-linear functions are defined.

Note that adding any of these constraints to an otherwise continuous model will transform it into a MIP

As stated above, each general constraint has an equivalent MIP formulation that consists of linear and SOS constraints, and possibly auxiliary variables. Thus, you could always model such constraints yourself without using a Gurobi general constraint. For example, the MAX constraint \(r = \max\{x_1,\ldots,x_k,c\}\) can be modeled as follows:

\[\begin{split}\begin{array}{rcll} r & = & x_j + s_j & \mbox{ for all } j = 1,\ldots,k \\ r & = & c + s_{k+1} & \\ z_1 + \ldots + z_{k+1} & = & 1 & \\ SOS1(s_j, z_j) & & & \mbox{ for all } j = 1,\ldots,k+1 \\ s_j & \geq & 0 & \mbox{ for all } j = 1,\ldots,k+1 \\ z_j & \in & \{0,1\} & \mbox{ for all } j = 1,\ldots,k+1 \end{array}\end{split}\]

The first two constraints state that \(r \geq \max\{x_1,\ldots,x_k,c\}\), i.e., that the resultant variable \(r\) must be at least as large as each of the operand variables \(x_j\) and the constant \(c\). This could be modeled using inequalities, but the explicit slack variables play an important role in the constraints that follow.

The next two constraints enforce \(r \leq \max\{x_1,\ldots,x_k,c\}\), which ensures that \(r\) is equal to the MAX expression. Enforcing this side of the equality is actually a lot more complicated. We need to introduce binary auxiliary variables \(z_j \in \{0,1\}\), and SOS1 constraints to required that at most one of the two variables \(s_j\) and \(z_j\) can be non-zero, which models the implication \(z_j = 1 \rightarrow s_j = 0\). Due to the third constraint, one \(z_j\) will be equal to \(1\) and thus at least one \(s_j\) will be zero. Hence, \(r = x_j\) for at least one \(j\) due to the first constraint, or \(r = c\) due to the second constraint.

Tolerances play a role in general constraints, although as you might expect, the exact role depends on the constraint type. As a general rule, violations in the resultant will be smaller than the feasibility tolerance, and integrality violations in integer resultants will also satisfy the integrality tolerance.

By most measures, general constraints are just a means of concisely capturing relationships between variables while removing the burden of creating an equivalent MIP formulation. However, general constraints have another potential advantage: Gurobi might be able to simplify the MIP formulation if it can prove during presolve that the simplified version suffices for the correctness of the model. For this reason, Gurobi might be able to produce a smaller or tighter representation of the general constraint than you would get from the most general formulation. For example, it might be the case that \(r \leq \max\{x_1,\ldots,x_k,c\}\) is already implied by the other constraints in the model, so that a simple set of inequalities

\[\begin{split}\begin{array}{rcll} r & \geq & x_j \;\;\mbox{ for all } j = 1,\ldots,k \\ r & \geq & c \end{array}\end{split}\]

to describe \(r \geq \max\{x_1,\ldots,x_k,c\}\) suffices to model the relevant part of the MAX constraint.

Norm Constraint#

The norm constraint introduces a few complications that are important to be aware of. As mentioned above, this constraint allows you to set one variable equal to the norm of a vector of variables. A few norms are available. The L1 norm is equal to the sum of the absolute values of the operand variables. The L-infinity norm is equal to the maximum absolute value of any operand. The L2 norm is equal to the square root of the sum of the squares of the operands. The L0 norm counts the number of non-zero values among the operands.

Regarding the L2 norm, one obvious complication comes from the fact that enforcing it requires a quadratic constraint. If your model only ever bounds the result from above (e.g., \(r=||x||<=1\)), then the resulting constraint will be convex. If your model was otherwise convex, the resulting model will be a (convex) QCP. However, if you try to bound the result from below (e.g., \(r=||x||>=y\)), adding the L2 norm constraint will lead to a non-convex QCP model, which will typically be significantly harder to solve.

Regarding the L0 norm, note that results obtained with this constraint can be counter-intuitive. This is a consequence of the fact that for nearly any feasible solution with a variable at exactly 0, you can add a small value into that variable while still satisfying all associated constraints to tolerances. The net result is that a lower bound on the L0 norm is often satisfied by “cheating” - by setting enough variables to values that are slightly different from zero. We strongly recommend that you only bound the result from above. That is, you should avoid using the resultant in situations where the model incentivizes a larger value. This would include situations where the objective coefficient is negative, as well as situations where a larger value for the variable could help to satisfy a constraint (e.g., a greater-than constraint where the resultant appears with a positive coefficient).

Function Constraints#

Gurobi supports the following function constraints, each with somewhat different syntax and semantics (\(x\) and \(y\) below are Gurobi decision variables, and other terms are constants provided as input when the constraint is added to the model):

  • Polynomial: \(y = p_0 x^n + p_1 x^{n-1} + ... + p_{n-1} x + p_n\)

  • Natural exponential: \(y = \exp(x)\) or \(y = e^x\)

  • Exponential: \(y = a^x\), where \(a > 0\) is the base for the exponential function

  • Natural logarithm: \(y = \log_e(x)\) or \(y = \ln(x)\)

  • Logarithm: \(y = \log_a(x)\), where \(a > 0\) is the base for the logarithmic function

  • Logistic: \(y = \frac{1}{1 + \exp(-x)}\) or \(y = \frac{1}{1 + e^{-x}}\)

  • Power: \(y = x^a\), where \(x \geq 0\) for any \(a\) and \(x > 0\) for \(a < 0\)

  • Sine: \(y = \sin(x)\)

  • Cosine: \(y = \cos(x)\)

  • Tangent: \(y = \tan(x)\)

As noted earlier, there is no good way to restate these nonlinear constraints so that a standard MIP solver can handle them efficiently and accurately. The most effective approaches to solving such problems involve replacing the constraints with piecewise-linear approximations. Gurobi provides two different mechanisms for doing so:

  • A static piecewise-linear approximation, where the nonlinear function is approximated once, before the MIP solution process begins. That model can then be solved using the standard MIP solver, but the accuracy of the approximation will of course depend heavily on the number of pieces used. For some nonlinear functions, the number of pieces required to achieve the desired accuracy may be lead to excessive solution times.

  • A dynamic piecewise-linear approximation, using outer approximation and spatial branch-and-bound. In this approach, the MIP solver is extended to compute an approximation in the neighborhood of the current relaxation solution at each node of the branch-and-bound search. The solver can compute very accurate solutions using this approach, but having the model change at each node in the branch-and-bound tree greatly complicates the search, which can lead to much less efficient tree exploration.

The most effective approach will depend on your model and your accuracy goals. That’s why we provide both options.

Function Constraints with Static Piecewise-Linear Approximation#

As noted earlier, the first option for managing function constraints is to perform a static piecewise-linear approximation, yielding a MIP model that can be handed to the standard MIP solver. Gurobi will take this approach for any function where the FuncNonlinear attribute is set to 0 (or that attribute is set to -1 and the global FuncNonlinear parameter is set to 0).

With this approach to handling non-linearity, you face a fundamental cost-versus-accuracy tradeoff: adding more pieces produces smaller approximation errors, but also increases the cost of solving the problem. The tradeoff can be complex. Gurobi provides a set of three attributes that help to navigate this tradeoff: FuncPieces, FuncPieceLength, FuncPieceError. They are used as follows:

  • If you would like to choose the number of pieces to use for the approximation, set the FuncPieces attribute to the desired value. All pieces will have equal width. This approach allows you to control the size of the approximation.

  • If you would like to choose the width of each piece, set the FuncPieces attribute to a special value of 1 and set the FuncPieceLength attribute equal to the desired width of each piece. This approach provides some control over both the size and the error of the approximation. While this may appear to be a minor variation of the first option, note that presolve may tighten the domain of \(x\), often substantially, which can make it difficult to predict the relationship between the width of each piece and the number of pieces.

  • If you would like to set the maximum error you are willing to tolerate in the approximation, set the FuncPieces attribute to a special value of -1 and set the FuncPieceError attribute equal to the maximum absolute approximation you are willing to tolerate. Gurobi will choose pieces, typically of different sizes, to achieve that error bound. Note that the number of pieces required may be quite large if you set a tight error tolerance. You can control the maximum relative error rather than the absolute error by setting the FuncPieces attribute to -2 instead of -1.

These are attributes on the general constraints, so you can choose different values for each individual constraint.

The other relevant attribute is FuncPieceRatio, which controls whether the approximation is an underestimate of the function (0.0), an overestimate (1.0), or somewhere in between (any value strictly between 0.0 and 1.0). You can also choose the special value of -1, which will choose points that are on the original function.

Consider the following simple example:

../../_images/func.svg
../../_images/func-dark.svg

The goal is to find an approximation of the polynomial \(y=x^2\). We’ve set FuncPieces to \(1\) and FuncPieceLength to \(1.0\), so we’re performing an approximation with fixed-width pieces of width 1.0. The domain of \(x\) is \([0,2]\), so the approximation has two pieces. The figure shows 6 points: \(P_{u1}(0,0), P_{u2}(1,1), P_{u3}(2,4)\), and \(P_{l1}(0,-0.25), P_{l2}(1, 0.75), P_{l3}(2,3.75)\). If FuncPieceRatio is set to \(0.0\), the approximation would be built from the points below the function (\(P_{l1}, P_{l2}\), and \(P_{l3}\)). Similarly, if it is set to \(1.0\), the approximation would be built from the points above the function (\(P_{u1}, P_{u2}\), and \(P_{u3}\)). A value of \(0.6\) would use weighted combinations of the points: \(0.6\) times \(P_{ui}\) plus \(0.4\) times \(P_{li}\). In this case, the line segments would be built from the points \((0, -0.1), (1, 0.9)\), and \((2, 3.9)\). If FuncPieceRatio is set to \(-1\), meaning that the approximation would be built from points that are on the original function, in this case the upper points (\(P_{u1}, P_{u2}\), and \(P_{u3}\)) fit the bill. This will always be the case for a convex function.

Recall that you can set FuncPieces to \(-1\) to control the maximum absolute error. In this case, choosing a FuncPieceError value of \(0.25\) would give the piecewise approximation shown in the figure, since the distance between the upper and lower curves is always \(0.25\). A smaller error value would of course lead to more pieces. We should add that piece widths will typically be non-uniform when limiting the maximum approximation error. The approximation algorithms we use try to limit the number of pieces needed to meet the error targets, which often requires more refinement in some portions of the domain than in others.

Note that the approximations are guaranteed to be under- and over-estimates in all cases except for polynomials of degree greater than 5. Finding the roots of higher-degree polynomials, which would be required to guarantee this property, is quite difficult.

If you wish to experiment with different approaches to approximating a set of functions, it is often convenient to be able to change the approach for all functions at once. We provide a set of parameters with the same names as the attributes to make this easier: FuncPieces, FuncPieceLength, FuncPieceError, and FuncPieceRatio. If you set the FuncPieces attribute on a function constraint to \(0\), then the approximation approach for that constraint will be determined by the parameter settings instead.

For some of the supported functions, modest \(x\) values can lead to enormous \(y\) values (and vice-versa). This can cause numerical issues when solving the resulting piecewise-linear MIP model. To avoid such issues, we limit the range of any \(x\) or \(y\) that participates in a function constraint to [-1e+6, 1e+6]. The parameter FuncMaxVal allows you to change these limits, but we recommend that you proceed with caution.

We should point out we handle violations and tolerances differently for PWL approximations, which can sometimes lead to unexpected results. For one, the feasibility tolerance for function constraints is specified in FuncPieceError rather than through the standard feasibility tolerance. We also interpret violations differently. The violation of a function constraint is computed as the Euclidian distance from the solution to the nearest point on the function. Computing this distance exactly can be quite involved for some functions, so we actually compute an over-estimate in some cases. The net result is that the reported violations may be larger than you expect.

Another possible source of unexpected results comes from the fact that solutions that satisfy the original nonlinear function may not satisfy the piecewise-linear approximation of that function. This may lead to sub-optimal solutions or even conclusions of infeasibility. Consider a simple example with two constraints: \(y = 2x-1\) and \(y = x^2\). Clearly \((x, y) = (1, 1)\) is a feasible solution, but a piecewise-linear approximation could introduce breakpoints at \(x=0.9\) and \(x=1.1\). The resulting approximation gives a \(y\) value of \(1.01\) at \(x=1\), which is sufficiently far from the actual function value that Gurobi will not consider that a valid solution and declare the model infeasible, since there are no other solutions to the constraints. Reducing the maximum approximation error (by setting FuncPieces to -1 and FuncPieceError to a much smaller value) would help, but this isn’t always the best way to address the problem, since tighter error tolerances can substantially increase the number of pieces in the approximation and thus the cost. We recommend the following approach when you encounter unexpected results. For inequalities, you should ask for an approximation that always overestimates or underestimates the function (depending on the sense of the constraint), to ensure that your approximation will always satisfy the constraint. The FuncPieceRatio parameter allows you to do this. For equalities, if you have a sense of where your solution is likely to lie, one option for managing the size of the approximation is to introduce additional variables to capture your function in different ranges, and then perform approximations with different levels of accuracy on these different pieces.

While users could perform piecewise-linear approximations themselves, there are several advantages to asking Gurobi to do it instead. First, Gurobi can often reduce the domains of variables, by using bound strengthening in presolve, or by exploiting repetition in periodic functions like sine or cosine. Smaller domains means fewer pieces to achieve the same accuracy. Gurobi also provides many options to make experimentation easier (for error control, piece length, etc.). These options can be quite difficult to implement and maintain.

Function Constraints With Dynamic Piecewise-Linear Approximation#

The alternative for managing function constraints is a more dynamic approach, using spatial branching and outer approximation. Gurobi will take this approach for any function where the FuncNonlinear attribute is set to 1 (or that attribute is set to -1 and the global FuncNonlinear parameter is set to 1).

Solving a model with (nonconvex) nonlinear constraints to global optimality is well known to be a hard task. The idea behind spatial branching is to divide the overall solution space into portions, and to compute valid primal and dual bounds for each portion. Such bounds are obtained by computing a linear approximation to the feasible space for each region (a so-called outer approximation) that contains all feasible solutions for the original nonlinear constraints in that region. Bounds for a region can often be tightened by spatial branching, where the domain of a variable (integer or continuous) is split, allowing hopefully tighter outer approximations to be computed for the resulting feasible subregions. Bounds for the original model can be obtained by combining the bounds from the leafs of the resulting branch-and-bound tree. This process continues until the desired optimality gap (MIPGap) is achieved.

Valid primal bounds come from feasible solutions found during the search. They can be found using various heuristics, or they may come from solutions to the current node relaxation that happen to satisfy the nonlinear and integrality constraints (relaxation solutions will always satisfy linear and bound constraints because these are still present in the relaxation). Valid dual bounds are found by solving an LP over a polyhedron that contains all feasible solutions to the original linear and nonlinear constraints in that region (the outer approximation).

Currently, Gurobi only directly supports the univariate nonlinear functions listed in the Function Constraints section. More complex functions must be disaggregated into a cascade of supported univariate functions. For example, the nonlinear equality constraint

\[x_3 = 100000 \cdot \frac{x_1}{x_1 + 1000 \cdot x_2}\]

can be disaggregated into

\[\begin{split}\begin{array}{ll} z_1 = & x_1 + 1000 \cdot x_2 \\ z_2 = & z_1^{-1} \\ x_3 = & 100000 \cdot x_1 \cdot z_2. \end{array}\end{split}\]

Such disaggregation is standard practice for most global non-linear solvers (but it is typically performend behind the scenes).

While the disaggregated reformulation is mathematically equivalent to the original model, we should note that the presence of large coefficients in the expression can lead to larger violations of the original constraints than you might expect, due to the accumulation of violations of the disaggregated constraints.

Suppose we are using the default feasibility tolerance of \(10^{-6}\) and Gurobi produces the following solution for the disaggregated model:

\[\begin{split}\begin{array}{ll} x_1 = & 1 \\ x_2 = & 1 \\ z_1 = & 1001.000001 \\ z_2 = & 0.001000000998003 \\ x_3 = & 100.0000998003 \end{array}\end{split}\]

As one can see, this solution is indeed feasible within the tolerance for the disaggregated model. However,

\[100000 \cdot \frac{1}{1 + 1000 \cdot 1} = 99.9000999001 \neq 100.0000998003 = x_3.\]

The 100000 coefficient magnifies the error in the intermediate result, leading to a violation of almost \(0.1\) on the original model, which is much larger than the specified feasibility tolerance.

This example illustrates that a solution that is feasible for the disaggregated model may not necessarily be feasible for the aggregated (original) model. Clearly, the above example is an extreme case, but this issue can definitely arise in other situations. This makes scaling and tight coefficient ranges extremely important when formulating and solving nonlinear models.