Constraints#
A constraint in Gurobi captures a restriction on the values that a set of variables may take. The simplest example is a linear constraint, which states that a linear expression on a set of variables take a value that is either less-than-or-equal, greater-than-or-equal, or equal to another linear expression. Recall that Gurobi works in finite-precision arithmetic, so constraints are only satisfied to tolerances. Tolerances can be tightened to reduce such violations, but there are limits to how small the violations can be – errors are inherent in floating-point arithmetic.
The available constraint types are linear, SOS, quadratic (both convex and non-convex), and general.
Linear Constraints#
A linear constraint allows you to restrict the value of a linear expression. For example, you may require that any feasible solution satisfies the constraint \(3 x + 4 y \leq 5z\). Note that the matrix-oriented Gurobi APIs (C, MATLAB, and R) require the right-hand side of a linear constraint to be a constant, while the object-oriented APIs (C++, Java, .NET, and Python) allow arbitrary linear expressions on both sides of the comparator.
The computed solution should satisfy the stated constraint to within FeasibilityTol (although it may not in cases of numerical ill-conditioning – we’ll discuss this shortly).
Gurobi supports a limited set of comparators. Specifically, you can constrain an expression to be less-than-or-equal, greater-than-or-equal, or equal another. We do not support strict less-than, strict greater-than, or not-equal comparators. While these other comparators may seem appropriate for mathematical programming, we exclude them to avoid potential confusion related to numerical tolerances. Consider a simple example of a strict inequality constraint on a pair of continuous variables: \(x > y\). How large would \(x-y\) need to be in order to satisfy the constraint? Rather than trying to embed a subtle and potentially confusing strategy for handling such constraints into the solver, we’ve chosen not to support them instead.
SOS Constraints#
A Special-Ordered Set, or SOS constraint, is a highly specialized constraint that places restrictions on the values that variables in a given list can take. There are two types of SOS constraints. In an SOS constraint of type 1 (an SOS1 constraint), at most one variable in the specified list is allowed to take a non-zero value. In an SOS constraint of type 2 (an SOS2 constraint), at most two variables in the specified, ordered list are allowed to take a non-zero value, and those non-zero variables must be contiguous in the list. The variables in an SOS constraint can be continuous, integer, or binary.
Again, tolerances play an important role in SOS constraints. Specifically, variables that take values less than IntFeasTol (in absolute value) are considered to be zero for the purposes of determining whether an SOS constraint is satisfied.
An SOS constraint is described using a list of variables and a list of corresponding weights. While the weights have historically had intuitive meanings associated with them, we simply use them to order the list of variables. The weights should be unique. This is especially important for an SOS2 constraint, which relies on the notion of contiguous variables. Since the variables in the SOS are ordered by weight, contiguity becomes ambiguous when multiple variables have the same weight.
It is often more efficient to capture SOS structure using linear constraints rather than SOS constraints. The optimizer will often perform this reformulation automatically. This is controlled with four parameters: PreSOS1BigM, PreSOS1Encoding, PreSOS2BigM and PreSOS2Encoding.
The reformulation adds constraints of the form \(x \leq M b\), where \(x\) is the variable that participates in the SOS constraint, \(b\) is a binary variable, and \(M\) is an upper bound on the value of variable \(x\). Large values of \(M\) can lead to numerical issues. The two parameters PreSOS1BigM and PreSOS2BigM control the maximum value of \(M\) that can be introduced by this reformulation. SOS constraints that would require a larger value aren’t converted. Setting one of these parameters to 0 disables the corresponding reformulation.
Additionally, there are several known integer formulations for SOS1 and SOS2 constraints. These reformulations differ in the number of variables that they introduce to the problem, in the complexity of the resulting LP relaxations, and in their properties in terms of branching and cutting planes. The two parameters PreSOS1Encoding and PreSOS2Encoding control the choice of the reformulation performed.
Quadratic Constraints#
A quadratic constraint allows you to restrict the value of a quadratic expression. For example, you may require that any feasible solution satisfy the constraint \(3 x^2 + 4 y^2 + 5 z \leq 10\). Note that the matrix-oriented Gurobi APIs (C, MATLAB, and R) require the right-hand side of a quadratic constraint to be a constant, while the object-oriented APIs (C++, Java, .NET, and Python) allow arbitrary quadratic expressions on both sides of the comparator.
The computed solution should satisfy the stated constraint to within FeasibilityTol. Quadratic constraints are often much more challenging to satisfy than linear constraints, so tightening the parameter may increase runtimes dramatically.
Gurobi can handle both convex and non-convex quadratic constraints. However, there are some subtle and important differences in how the different constraint types are handled. The default algorithms in Gurobi only accept a few forms of quadratic constraints that are known to have convex feasible regions. Constraints of the following forms are always accepted:
\(x^TQx + q^Tx \le b\), where \(Q\) is Positive Semi-Definite (PSD)
\(x^Tx \le y^{2}\), where \(x\) is a vector of variables, and \(y\) is a non-negative variable (a Second-Order Cone constraint)
\(x^Tx \le y z\), where \(x\) is a vector of variables, and \(y\) and \(z\) are non-negative variables (a rotated Second-Order Cone constraint)
To be more precise, a constraint will be accepted if presolve is able to transform it into one of these forms. Note that if the quadratic terms each contain at least one binary variable, then presolve will always be able to transform it.
If you add a constraint that can’t be transformed into one of these forms, then with default settings you will get an error (Q_NOT_PSD) when you try to solve the model. Quadratic equality constraints are always non-convex; they will give a QCP_EQUALITY_CONSTRAINT error with default settings.
Why distinguish between quadratic constraints in this form and other types of quadratic constraints? Solving models with non-convex quadratic constraints is typically much more expensive. To avoid accidentally solving a much harder problem than may have been intended, Gurobi rejects such constraints by default. If you set the NonConvex parameter to 2, however, then Gurobi will accept arbitrary quadratic constraints and attempt to solve the resulting model.
Note that other non-convex quadratic solvers often only find locally optimal solutions. The algorithms in Gurobi explore the entire search space, so they provide a globally valid lower bound on the optimal objective value, and given enough time they will find a globally optimal solution (subject to tolerances).
We would like to note a subtle point here regarding terminology. A quadratic constraint that involves only products of disjoint pairs of variables is often called a bilinear constraint, and a model that contains bilinear constraints is often called a bilinear program. Bilinear constraints are a special case of non-convex quadratic constraints, and the algorithms Gurobi uses to handle the latter are also well suited to solving bilinear programming problems.
General Constraints#
The previously-described constraints are typically handled directly by the underlying optimization algorithms (but not always). Gurobi includes an additional set of constraints, which we collectively refer to as general constraints. General constraints are mostly a convenience feature, designed to allow you to define certain variable relationships easily without having to immerse yourself in the often esoteric details of how to model these relationships in terms of the more fundamental constraints of MIP. Capturing a single one of these general constraints can often require a large set of linear and SOS constraints, plus a number of auxiliary decision variables. By supporting them directly in the Gurobi API, we simplify the modeling process by performing the transformation to a corresponding MIP formulation automatically and transparently during the solution process.
What sorts of variable relationships can be captured with general constraints? We think of them as belonging to two types: function constraints and simple constraints. Function constraints allow you to state a relationship \(y=f(x)\), where \(x\) and \(y\) are Gurobi decision variables and \(f()\) is chosen from a predefined list of functions. Gurobi performs a piecewise-linear approximation of that function within the domain of \(x\). Simple general constraints allow you to state common but more direct relationships between decision variables. The translation that goes on under the hood for these is much simpler, and the result is an exact representation of the original constraint (not an approximation).
Simple General Constraints#
Gurobi supports the following simple general constraints, each with its own syntax and semantics:
MAX constraint: The constraint \(r = \max\{x_1,\ldots,x_k,c\}\) states that the resultant variable \(r\) should be equal to the maximum of the operand variables \(x_1,\ldots,x_k\) and the constant \(c\). For example, a solution \((r=3, x_1=2, x_2=3, x_3=0)\) would be feasible for the constraint \(r = \max\{x_1,x_2,x_3,1.7\}\) because \(3\) is indeed the maximum of \(2\), \(3\), \(0\), and \(1.7\).
MIN constraint: Similar to a MAX constraint, the constraint \(r = \min\{x_1,\ldots,x_k,c\}\) states that the resultant variable \(r\) should be equal to the minimum of the operand variables \(x_1,\ldots,x_k\) and the constant \(c\).
ABS constraint: The constraint \(r = \mbox{abs}\{x\}\) states that the resultant variable \(r\) should be equal to the absolute value of the operand variable \(x\). For example, a solution \((r=3, x=-3)\) would be feasible for the constraint \(r = \mbox{abs}\{x\}\).
AND constraint: The constraint \(r = \mbox{and}\{x_1,\ldots,x_k\}\) states that the binary resultant variable \(r\) should be \(1\) if and only if all of the binary operand variables \(x_1,\ldots,x_k\) are equal to \(1\). For example, a solution \((r=1, x_1=1, x_2=1, x_3=1)\) would be feasible for the constraint \(r = \mbox{and}\{x_1,x_2,x_3\}\). Note that any involved variables that are not already binary are converted to binary.
OR constraint: Similar to an AND constraint, the constraint \(r = \mbox{or}\{x_1,\ldots,x_k\}\) states that the binary resultant variable \(r\) should be \(1\) if and only if at least one of the binary operand variables \(x_1,\ldots,x_k\) is equal to \(1\). Note that any involved variables that are not already binary are converted to binary.
NORM constraint: The constraint \(r = \mbox{norm}\{x_1,\ldots,x_k\}\) states that the resultant variable \(r\) should be equal to the vector norm of the operand variables \(x_1,\ldots,x_k\). A few options are available: the 0-norm, 1-norm, 2-norm, and infinity-norm.
INDICATOR constraints: An indicator constraint \(y = f \rightarrow a^Tx \leq b\) states that if the binary indicator variable \(y\) is equal to \(f\) in a given solution, where \(f \in \{0,1\}\), then the linear constraint \(a^Tx \leq b\) has to be satisfied. On the other hand, if \(y \neq f\) (i.e., \(y = 1-f\)) then the linear constraint may be violated. Note that the sense of the linear constraint can also be \(=\) or \(\geq\); refer to this earlier section for a more detailed description of linear constraints. Note also that declaring an INDICATOR constraint implicitly declares the indicator variable to be of binary type.
Piecewise-linear constraints: A piecewise-linear constraint \(y = f(x)\) states that the point \((x, y)\) must lie on the piecewise-linear function \(f()\) defined by a set of points \((x_1, y_1), (x_2, y_2), ..., (x_n, y_n)\). Refer to the description of piecewise-linear objectives for details of how piecewise-linear functions are defined.
Note that adding any of these constraints to an otherwise continuous model will transform it into a MIP
As stated above, each general constraint has an equivalent MIP formulation that consists of linear and SOS constraints, and possibly auxiliary variables. Thus, you could always model such constraints yourself without using a Gurobi general constraint. For example, the MAX constraint \(r = \max\{x_1,\ldots,x_k,c\}\) can be modeled as follows:
The first two constraints state that \(r \geq \max\{x_1,\ldots,x_k,c\}\), i.e., that the resultant variable \(r\) has to be at least as large as each of the operand variables \(x_j\) and the constant \(c\). This can be modeled using inequalities, but we turned them into equations by introducing explicit continuous slack variables \(s_j \geq 0\), which we will reuse below.
Those slack variables and the remaining constraints model \(r \leq \max\{x_1,\ldots,x_k,c\}\), which is more complicated. In addition to the explicit slacks, this requires the introduction of binary auxiliary variables \(z_j \in \{0,1\}\). The SOS1 constraints state that at most one of the two variables \(s_j\) and \(z_j\) can be non-zero, which models the implication \(z_j = 1 \rightarrow s_j = 0\). Due to the third constraint, one \(z_j\) will be equal to \(1\) and thus at least one \(s_j\) will be zero. Hence, \(r = x_j\) for at least one \(j\) due to the first constraint, or \(r = c\) due to the second constraint.
Tolerances play a role in general constraints, although as you might expect, the exact role depends on the constraint type. Generally, violations in the resultant will be smaller than the feasibility tolerance, and integrality violations in integer resultants will also satisfy the integrality tolerance.
By most measures, general constraints are just a means of concisely capturing relationships between variables while removing the burden of creating an equivalent MIP formulation. However, general constraints have another potential advantage: Gurobi might be able to simplify the MIP formulation if it can prove during presolve that the simplified version suffices for the correctness of the model. For this reason, Gurobi might be able to produce a smaller or tighter representation of the general constraint than you would get from the most general formulation. For example, it might be the case that \(r \leq \max\{x_1,\ldots,x_k,c\}\) is already implied by the other constraints in the model, so that a simple set of inequalities
to describe \(r \geq \max\{x_1,\ldots,x_k,c\}\) suffices to model the relevant part of the MAX constraint.
Norm Constraint#
The norm constraint introduces a few complications that are important to be aware of. As mentioned above, this constraint allows you to set one variable equal to the norm of a vector of variables. A few norms are available. The L1 norm is equal to the sum of the absolute values of the operand variables. The L-infinity norm is equal to the maximum absolute value of any operand. The L2 norm is equal to the square root of the sum of the squares of the operands. The L0 norm counts the number of non-zero values among the operands.
Regarding the L2 norm, one obvious complication comes from the fact that enforcing it requires a quadratic constraint. If your model only ever bounds the result from above (e.g., \(r=||x||<=1\)), then the resulting constraint will be convex. If your model was otherwise convex, the resulting model will be a (convex) QCP. However, if you try to bound the result from below (e.g., \(r=||x||>=y\)), adding the L2 norm constraint will lead to a non-convex QCP model, which will typically be significantly harder to solve.
Regarding the L0 norm, note that results obtained with this constraint can be counter-intuitive. This is a consequence of the fact that for nearly any feasible solution with a variable at exactly 0, you can add a small value into that variable while still satisfying all associated constraints to tolerances. The net result is that a lower bound on the L0 norm is often satisfied by “cheating” - by setting enough variables to values that are slightly different from zero. We strongly recommend that you only bound the result from above. That is, you should avoid using the resultant in situations where the model incentivizes a larger value. This would include situations where the objective coefficient is negative, as well as situations where a larger value for the variable could help to satisfy a constraint (e.g., a greater-than constraint where the resultant appears with a positive coefficient).
Function Constraints#
Gurobi supports the following function constraints, each with somewhat different syntax and semantics (\(x\) and \(y\) below are Gurobi decision variables, and other terms are constants provided as input when the constraint is added to the model):
Polynomial: \(y = p_0 x^n + p_1 x^{n-1} + ... + p_{n-1} x + p_n\)
Natural exponential: \(y = \exp(x)\) or \(y = e^x\)
Exponential: \(y = a^x\), where \(a > 0\) is the base for the exponential function
Natural logarithm: \(y = \log_e(x)\) or \(y = \ln(x)\)
Logarithm: \(y = \log_a(x)\), where \(a > 0\) is the base for the logarithmic function
Logistic: \(y = \frac{1}{1 + \exp(-x)}\) or \(y = \frac{1}{1 + e^{-x}}\)
Power: \(y = x^a\), where \(x \geq 0\) for any \(a\) and \(x > 0\) for \(a < 0\)
Sine: \(y = \sin(x)\)
Cosine: \(y = \cos(x)\)
Tangent: \(y = \tan(x)\)
As noted earlier, Gurobi will automatically add a piecewise-linear approximation of the function to the model. You face a fundamental cost-versus-accuracy tradeoff when performing such an approximation, though: adding more pieces produces smaller approximation errors, but also increases the cost of solving the problem. The tradeoff can be complex. Gurobi provides a set of three attributes that help to navigate this tradeoff: FuncPieces, FuncPieceLength, FuncPieceError. They are used as follows:
If you would like to choose the number of pieces to use for the approximation, set the FuncPieces attribute to the desired value. All pieces will have equal width. This approach allows you to control the size of the approximation.
If you would like to choose the width of each piece, set the FuncPieces attribute to a special value of 1 and set the FuncPieceLength attribute equal to the desired width of each piece. This approach provides some control over both the size and the error of the approximation. While this may appear to be a minor variation of the first option, note that presolve may tighten the domain of \(x\), often substantially, which can make it difficult to predict the relationship between the width of each piece and the number of pieces.
If you would like to set the maximum error you are willing to tolerate in the approximation, set the FuncPieces attribute to a special value of -1 and set the FuncPieceError attribute equal to the maximum absolute approximation you are willing to tolerate. Gurobi will choose pieces, typically of different sizes, to achieve that error bound. Note that the number of pieces required may be quite large if you set a tight error tolerance. You can control the maximum relative error rather than the absolute error by setting the FuncPieces attribute to -2 instead of -1.
These are attributes on the general constraints, so you can choose different values for each individual constraint.
The other relevant attribute is FuncPieceRatio, which controls whether the approximation is an underestimate of the function (0.0), an overestimate (1.0), or somewhere in between (any value strictly between 0.0 and 1.0). You can also choose the special value of -1, which will choose points that are on the original function.
Consider the following simple example:
The goal is to find an approximation of the polynomial \(y=x^2\). We’ve set FuncPieces to \(1\) and FuncPieceLength to \(1.0\), so we’re performing an approximation with fixed-width pieces of width 1.0. The domain of \(x\) is \([0,2]\), so the approximation has two pieces. The figure shows 6 points: \(P_{u1}(0,0), P_{u2}(1,1), P_{u3}(2,4)\), and \(P_{l1}(0,-0.25), P_{l2}(1, 0.75), P_{l3}(2,3.75)\). If FuncPieceRatio is set to \(0.0\), the approximation would be built from the points below the function (\(P_{l1}, P_{l2}\), and \(P_{l3}\)). Similarly, if it is set to \(1.0\), the approximation would be built from the points above the function (\(P_{u1}, P_{u2}\), and \(P_{u3}\)). A value of \(0.6\) would use weighted combinations of the points: \(0.6\) times \(P_{ui}\) plus \(0.4\) times \(P_{li}\). In this case, the line segments would be built from the points \((0, -0.1), (1, 0.9)\), and \((2, 3.9)\). If FuncPieceRatio is set to \(-1\), meaning that the approximation would be built from points that are on the original function, in this case the upper points (\(P_{u1}, P_{u2}\), and \(P_{u3}\)) fit the bill. This will always be the case for a convex function.
Recall that you can set FuncPieces to \(-1\) to control the maximum absolute error. In this case, choosing a FuncPieceError value of \(0.25\) would give the piecewise approximation shown in the figure, since the distance between the upper and lower curves is always \(0.25\). A smaller error value would of course lead to more pieces. We should add that piece widths will typically be non-uniform when limiting the maximum approximation error. The approximation algorithms we use try to limit the number of pieces needed to meet the error targets, which often requires more refinement in some portions of the domain than in others.
Note that the approximations are guaranteed to be under- and over-estimates in all cases except for polynomials of degree greater than 5. Finding the roots of higher-degree polynomials, which would be required to guarantee this property, is quite difficult.
If you wish to experiment with different approaches to approximating a set of functions, it is often convenient to be able to change the approach for all functions at once. We provide a set of parameters with the same names as the attributes to make it easier to do this: FuncPieces, FuncPieceLength, FuncPieceError, and FuncPieceRatio. If you set the FuncPieces attribute on a function constraint to \(0\), then the approximation approach for that constraint will be determined by the parameter settings instead.
For some of the supported functions, modest \(x\) values can lead to enormous \(y\) values (and vice-versa). This can cause numerical issues when solving the resulting piecewise-linear MIP model. To avoid such issues, we limit the range of any \(x\) or \(y\) that participates in a function constraint to [-1e+6, 1e+6]. The parameter FuncMaxVal allows you to change these limits, but we recommend that you proceed with caution.
We should point out that PWL approximations can sometimes cause unexpected results, including sub-optimal solutions or even infeasible conclusions on feasible models. Consider a simple example with two constraints: \(y = 2x-1\) and \(y = x^2\). Clearly \((x, y) = (1, 1)\) is a feasible solution, but a piecewise-linear approximation could introduce breakpoints at \(x=0.9\) and \(x=1.1\). The resulting approximation gives a \(y\) value of \(1.01\) at \(x=1\), which is sufficiently far from the actual function value that Gurobi will not consider that a valid solution and declare the model infeasible, since there are no other solutions to the constraints. Reducing the maximum approximation error (by setting FuncPieces to -1 and FuncPieceError to a much smaller value) would help, but this isn’t always the best way to address the problem, since tighter error tolerances can substantially increase the number of pieces in the approximation and thus the cost. We recommend the following approach when you encounter unexpected results. For inequalities, you should ask for an approximation that always overestimates or underestimates the function (depending on the sense of the constraint), to ensure that your approximation will always satisfy the constraint. The FuncPieceRatio parameter allows you to do this. For equalities, if you have a sense of where your solution is likely to lie, one option for managing the size of the approximation is to introduce additional variables to capture your function in different ranges, and then perform approximations with different levels of accuracy on these different pieces.
While users could perform piecewise-linear approximations themselves, there are several advantages to asking Gurobi to do it instead. First, Gurobi can often reduce the domains of variables, by using bound strengthening in presolve, or by exploiting repetition in periodic functions like sine or cosine. Smaller domains means fewer pieces to achieve the same accuracy. Gurobi also provides many options to make experimentation easier (for error control, piece length, etc.). These options can be quite difficult to implement and maintain.