
Engineering Design Optimization. Cambridge University Press, Jan 2022

11 Convex Optimization

General nonlinear optimization problems are difficult to solve. Depending on the particular optimization algorithm, they may require tuning parameters, providing derivatives, adjusting scaling, and trying multiple starting points. Convex optimization problems do not have any of those issues and are thus easier to solve. The challenge is that these problems must meet strict requirements. Even for candidate problems with the potential to be convex, significant experience is usually needed to recognize and utilize techniques that reformulate the problems into an appropriate form.

11.1 Introduction

Convex optimization problems have desirable characteristics that make them more predictable and easier to solve. Because a convex problem has provably only one optimum, convex optimization methods always converge to the global minimum. Solving convex problems is straightforward and does not require a starting point, parameter tuning, or derivatives, and such problems scale well up to millions of design variables.¹

All we need to solve a convex problem is to set it up appropriately; there is no need to worry about convergence, local optima, or noisy functions. Some convex problems are so straightforward that they are not recognized as an optimization problem and are just thought of as a function or operation. A familiar example of the latter is the linear-least-squares problem (described previously in Section 10.3.1 and revisited in a subsequent section).

Although these are desirable properties, the catch is that convex problems must satisfy strict requirements. Namely, the objective and all inequality constraints must be convex functions, and the equality constraints must be affine.[1] A function $f$ is convex if

$$f \left( (1-\eta) x_1 + \eta x_2 \right) \le (1-\eta) f(x_1) + \eta f(x_2)$$

for all $x_1$ and $x_2$ in the domain, where $0 \le \eta \le 1$. This requirement is illustrated in Figure 11.1 for the one-dimensional case. The right-hand side of the inequality is just the equation of a line from $f(x_1)$ to $f(x_2)$ (the blue line), whereas the left-hand side is the function $f(x)$ evaluated at all points between $x_1$ and $x_2$ (the black curve).


Figure 11.1: Convex function definition in the one-dimensional case: The function (black line) must be below a line that connects any two points in the domain (blue line).

The inequality says that the function must always be below a line joining any two points in the domain. Stated informally, a convex function looks something like a bowl.
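This definition can be tested numerically. The following sketch (an illustration, not from the text) samples pairs of points and interpolation weights and checks the convexity inequality directly; it confirms that $x^2$ passes on $[-2, 2]$ while $x^3$ does not.

```python
# Numerical spot-check of the convexity inequality:
# f((1-eta)*x1 + eta*x2) <= (1-eta)*f(x1) + eta*f(x2) for all sampled pairs.

def satisfies_convexity_inequality(f, points, n_eta=11, tol=1e-12):
    """Check the convexity inequality on all pairs drawn from `points`."""
    etas = [i / (n_eta - 1) for i in range(n_eta)]
    for x1 in points:
        for x2 in points:
            for eta in etas:
                lhs = f((1 - eta) * x1 + eta * x2)
                rhs = (1 - eta) * f(x1) + eta * f(x2)
                if lhs > rhs + tol:
                    return False  # found a violating pair: not convex
    return True

xs = [i / 4 - 2.0 for i in range(17)]                        # grid on [-2, 2]
print(satisfies_convexity_inequality(lambda x: x * x, xs))   # True: a "bowl"
print(satisfies_convexity_inequality(lambda x: x ** 3, xs))  # False: x^3 dips below its chords
```

Passing such a sampled check does not prove convexity, but a single violation does disprove it.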

Unfortunately, even satisfying these strict requirements is not enough on its own. Unless we can identify a given problem as convex and take advantage of its structure to solve it efficiently, we must treat it as a general nonlinear problem. There are two approaches to taking advantage of convexity. The first one is to directly formulate the problem in a known convex form, such as a linear program or a quadratic program (discussed later in this chapter). The second option is to use disciplined convex optimization, a specific set of rules and mathematical functions that we can use to build up a convex problem. By following these rules, we can automatically translate the problem into an efficiently solvable form.

Although both of these approaches are straightforward to apply, they also expose the main weakness of these methods: we need to express the objective and inequality constraints using only these elementary functions and operations. In most cases, this requirement means that the model must be simplified.

Often, a problem is not directly expressed in a convex form, and a combination of experience and creativity is needed to reformulate the problem in an equivalent manner that is convex.

Simplifying models usually results in a reduction in fidelity. This is less of an issue for optimization problems intended to be solved repeatedly, such as in optimal control and machine learning, which are domains in which convex optimization is heavily used. In these cases, simplification by local linearization, for example, is acceptable because the linearization can be updated at the next time step. However, this fidelity reduction is problematic for design applications.

In design scenarios, the optimization is typically performed once, and the design cannot continue to be updated after it is created. For this reason, convex optimization is less frequently used for design applications, except for some limited uses in geometric programming, a topic discussed in more detail in Section 11.6.

This chapter just introduces convex optimization and is not a replacement for more comprehensive textbooks on the topic.[2] We focus on understanding what convex optimization is useful for and describing the most widely used forms.

The known categories of convex optimization problems include linear programming, quadratic programming, second-order cone programming, semidefinite programming, cone programming, and graph form programming. Each of these categories is a subset of the next (Figure 11.2).[3]

We focus on the first three because they are the most widely used, including in other chapters in this book. The latter three forms are less frequently formulated directly. Instead, users apply elementary functions and operations and the rules specified by disciplined convex programming, and a software tool transforms the problem into a suitable conic form that can be solved. Section 11.5 describes this procedure.


Figure 11.2: Relationship between various convex optimization problems.

After covering the three main categories of convex optimization problems, we discuss geometric programming. Geometric programming problems are not convex, but with a change of variables, they can be transformed into an equivalent convex form, thus extending the types of problems that can be solved with convex optimization.

11.2 Linear Programming

A linear program (LP) is an optimization problem with a linear objective and linear constraints and can be written as

$$\begin{aligned} \underset{x}{\text{minimize}} &\quad f^\intercal x\\ \text{subject to} &\quad A x + b = 0\\ &\quad Cx + d \le 0 \, , \end{aligned}$$

where $f$, $b$, and $d$ are vectors, and $A$ and $C$ are matrices. All LPs are convex.
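As a concrete illustration (the numbers below are made up, and SciPy is just one of many LP solvers), the chapter's form maps directly onto `scipy.optimize.linprog`, which expects inequality constraints as $A_{ub} x \le b_{ub}$, so $Cx + d \le 0$ becomes $Cx \le -d$:

```python
# Small LP in the chapter's form, minimize f^T x subject to C x + d <= 0.
import numpy as np
from scipy.optimize import linprog

f = np.array([1.0, 2.0])            # objective coefficients
C = np.array([[-1.0, 0.0],          # -x1 + 1 <= 0  ->  x1 >= 1
              [0.0, -1.0]])         # -x2 + 1 <= 0  ->  x2 >= 1
d = np.array([1.0, 1.0])

# Note: linprog's bounds default to x >= 0; passing (None, None) matches the
# stated form, where x is restricted only by the explicit constraints.
res = linprog(f, A_ub=C, b_ub=-d, bounds=(None, None))
print(res.x, res.fun)  # optimum at x = [1, 1] with objective value 3
```

Because both constraints push the variables up while the objective pushes them down, the solution sits at the constraint intersection, as expected for an LP vertex.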

LPs frequently occur with allocation or assignment problems, such as choosing an optimal portfolio of stocks, deciding what mix of products to build, deciding which tasks should be assigned to each worker, or determining which goods to ship to which locations. These types of problems frequently occur in domains such as operations research, finance, supply chain management, and transportation.[4]

A common consideration with LPs is whether or not the variables should be discrete. In Example 11.1, $x_i$ is a continuous variable, and purchasing fractional amounts of food may or may not be possible, depending on the type of food. If we were performing an optimal stock allocation, we could purchase fractional amounts of stock. However, if we were optimizing how much of each product to manufacture, it might not be feasible to build 32.4 products. In these cases, we need to restrict the variables to be integers using integer constraints. These types of problems require discrete optimization algorithms, which are covered in Chapter 8. Specifically, we discussed a mixed-integer LP in Section 8.3.

11.3 Quadratic Programming

A quadratic program (QP) has a quadratic objective and linear constraints. Quadratic programming was introduced in Section 5.5 in the context of sequential quadratic programming. A general QP can be expressed as follows:

$$\begin{aligned} \underset{x}{\text{minimize}} &\quad \frac{1}{2}x^\intercal Q x + f^\intercal x\\ \text{subject to} &\quad Ax + b = 0\\ &\quad C x + d \le 0 \,. \end{aligned}$$

A QP is convex only if the matrix $Q$ is positive semidefinite. If $Q = 0$, a QP reduces to an LP.
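Positive semidefiniteness is easy to check numerically: a symmetric matrix is positive semidefinite exactly when its smallest eigenvalue is nonnegative. A minimal sketch (an illustration, not from the text):

```python
import numpy as np

def is_convex_qp(Q, tol=1e-10):
    """A QP is convex iff its symmetric Q matrix is positive semidefinite,
    which we check via the smallest eigenvalue (with a small tolerance)."""
    return bool(np.min(np.linalg.eigvalsh(Q)) >= -tol)

print(is_convex_qp(np.array([[2.0, 0.0], [0.0, 1.0]])))   # True: PSD
print(is_convex_qp(np.array([[1.0, 0.0], [0.0, -1.0]])))  # False: indefinite
```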

One of the most common QP examples is least squares regression, which was discussed previously in Section 10.3.1 and is used in many applications such as data fitting.

The linear least-squares problem has an analytic solution if $A$ has full rank, so the machinery of a QP is not necessary. However, we can add constraints in QP form to solve constrained least-squares problems, which do not have analytic solutions in general.
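The contrast can be sketched as follows (illustration data; `scipy.optimize.lsq_linear` is one solver for the bound-constrained case): the unconstrained problem is solved analytically, while adding bounds requires an iterative solver and can only increase the optimal residual.

```python
# Unconstrained vs. bound-constrained linear least squares.
import numpy as np
from scipy.optimize import lsq_linear

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Unconstrained: analytic solution (here via lstsq).
x_free, *_ = np.linalg.lstsq(A, b, rcond=None)

# Constrained: require both coefficients to lie in [0, 0.4].
res = lsq_linear(A, b, bounds=(0.0, 0.4))

r_free = np.linalg.norm(A @ x_free - b)
r_con = np.linalg.norm(A @ res.x - b)
print(x_free, res.x)
print(r_con >= r_free - 1e-12)  # True: constraints cannot improve the fit
```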

11.4 Second-Order Cone Programming

A second-order cone program (SOCP) has a linear objective and a second-order cone constraint:

$$\begin{aligned} \underset{x}{\text{minimize}} &\quad f^\intercal x\\ \text{subject to} &\quad \|A_i x + b_i\|_2 \le c_i^\intercal x + d_i\\ &\quad Gx + h = 0 \, . \end{aligned}$$

If $A_i = 0$ for all $i$, then this form reduces to an LP.

One useful subset of SOCP is a quadratically constrained quadratic program (QCQP). A QCQP is the same as a QP but has quadratic inequality constraints instead of linear ones, that is,

$$\begin{aligned} \underset{x}{\text{minimize}} &\quad \frac{1}{2}x^\intercal Q x + f^\intercal x\\ \text{subject to} &\quad A x + b = 0 \\ &\quad \frac{1}{2}x^\intercal R_i x + c_i^\intercal x + d_i \le 0 \quad \text{for } i = 1, \ldots, m \, , \end{aligned}$$

where $Q$ and each $R_i$ must be positive semidefinite for the QCQP to be convex. A QCQP reduces to a QP if $R_i = 0$ for all $i$. We formulated QCQPs when solving trust-region problems in Section 4.5. However, for trust-region problems, only an approximate solution method is typically used.

Every QCQP can be expressed as an SOCP (although not vice versa). The QCQP in Equation 11.5 can be written in the equivalent form,

$$\begin{aligned} \underset{x,\beta}{\text{minimize}} &\quad \beta \\ \text{subject to} &\quad \|Fx + g\|_2 \le \beta\\ &\quad A x + b = 0\\ &\quad \|G_i x + h_i\|_2 \le 0 \, . \end{aligned}$$

If we square both sides of the first and last constraints, this formulation is exactly equivalent to the QCQP where $Q = 2 F^\intercal F$, $f = 2 F^\intercal g$, $R_i = 2 G_i^\intercal G_i$, $c_i = 2 G_i^\intercal h_i$, and $d_i = h_i^\intercal h_i$. The matrices $F$ and $G_i$ are the square roots of the matrices $Q$ and $R_i$, respectively (divided by 2), and would be computed from a factorization.
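One such factorization is a Cholesky decomposition of $Q/2$; the sketch below (an illustration with an arbitrary positive-definite $Q$, not from the text) recovers an $F$ satisfying the mapping $Q = 2F^\intercal F$ and verifies it:

```python
# Recover F from Q in the mapping Q = 2 F^T F via Cholesky on Q/2.
# Any factor with F^T F = Q/2 would work; Cholesky is one standard choice.
import numpy as np

Q = np.array([[4.0, 2.0],
              [2.0, 3.0]])          # symmetric positive definite example

L = np.linalg.cholesky(Q / 2.0)     # Q/2 = L L^T, L lower triangular
F = L.T                             # then F^T F = L L^T = Q/2

print(np.allclose(2.0 * F.T @ F, Q))  # True: reproduces Q = 2 F^T F
```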

11.5 Disciplined Convex Optimization

Disciplined convex optimization builds convex problems using a specific set of rules and mathematical functions. By following this set of rules, the problem can be translated automatically into a conic form that we can solve efficiently using convex optimization algorithms.⁵ Table 1 shows several examples of convex functions that can be used to build convex problems. Notice that not all of these functions are continuously differentiable because that is not a requirement of convexity.

Table 1: Examples of convex functions.

| Function | Example |
| --- | --- |
| Exponential | $e^{ax}$ |
| Powers | $-x^a$ if $0 \le a \le 1$; $x^a$ otherwise |
| Negative logarithm | $-\log(x)$ |
| Norms | $\|x\|_1, \|x\|_2, \ldots$ |
| Maximum | $\max(x_1, x_2, \ldots, x_n)$ |
| Log-sum-exp | $\ln \left(e^{x_1} + e^{x_2} + \ldots + e^{x_n} \right)$ |

A disciplined convex problem can be formulated using any of these functions for the objective and inequality constraints. We can also use various operations that preserve convexity to build up more complex functions. Some of the more common operations are as follows:

  • Multiplying a convex function by a positive constant

  • Adding convex functions

  • Composing a convex function with an affine function (i.e., if f(x)f(x) is convex, then f(Ax+b)f(Ax + b) is also convex)

  • Taking the maximum of two convex functions

Although these functions and operations greatly expand the types of convex problems that we can solve beyond LPs and QPs, they are still restrictive within the broader scope of nonlinear optimization. Still, for objectives and constraints that require only simple mathematical expressions, there is the possibility that the problem can be posed as a disciplined convex optimization problem.
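The composition rules above can be exercised numerically. The sketch below (an illustration, not from the text) builds $g(x) = 2\,\|Ax + b\|_2 + \max(c_1^\intercal x,\, c_2^\intercal x)$ using only the listed operations (norm composed with an affine function, positive scaling, sum, and max of convex functions) and spot-checks the convexity inequality on random samples:

```python
# Spot-check that a function assembled from convexity-preserving operations
# satisfies the convexity inequality on random sample pairs.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
c1 = np.array([1.0, -2.0])
c2 = np.array([-1.0, 3.0])

def g(x):
    # norm of an affine function, scaled by a positive constant,
    # plus a max of two affine (hence convex) functions
    return 2.0 * np.linalg.norm(A @ x + b) + max(c1 @ x, c2 @ x)

ok = True
for _ in range(1000):
    x1, x2 = rng.normal(size=2), rng.normal(size=2)
    eta = rng.uniform()
    lhs = g((1 - eta) * x1 + eta * x2)
    rhs = (1 - eta) * g(x1) + eta * g(x2)
    ok = ok and (lhs <= rhs + 1e-9)
print(ok)  # True: no violations found on these samples
```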

The original expression of a problem is often not convex but can be made convex through a transformation to a mathematically equivalent problem. These transformation techniques include implementing a change of variables, adding slack variables, or expressing the objective in a different form. Successfully recognizing and applying these techniques is a skill requiring experience.

11.6 Geometric Programming

A geometric program (GP) is not convex but can be transformed into an equivalent convex problem. GPs are formulated using monomials and posynomials. A monomial is a function of the following form:

$$f(x) = c x_1^{a_1} x_2^{a_2} \cdots x_m^{a_m} \, ,$$

where $c > 0$ and all $x_i > 0$. A posynomial is a sum of monomials:

$$f(x) = \sum_{j=1}^n c_j x_1^{a_{1j}} x_2^{a_{2j}} \cdots x_m^{a_{mj}} \, ,$$

where all $c_j > 0$.

A GP in standard form is written as follows:

$$\begin{aligned} \underset{x}{\text{minimize}} &\quad f_0(x)\\ \text{subject to} &\quad f_i(x) \le 1 \\ &\quad h_i(x) = 1 \, , \end{aligned}$$

where the $f_i$ are posynomials and the $h_i$ are monomials. This problem does not fit any of the convex forms defined in the previous sections, and it is not convex. The formulation is nevertheless useful because we can convert it into an equivalent convex optimization problem.

First, we take the logarithm of the objective and of both sides of the constraints:

$$\begin{aligned} \underset{x}{\text{minimize}} &\quad \ln f_0(x)\\ \text{subject to} &\quad \ln f_i(x) \le 0 \\ &\quad \ln h_i(x) = 0 \, . \end{aligned}$$

Let us examine the equality constraints further. Recall that $h_i$ is a monomial, so writing one of the constraints explicitly results in the following form:

$$\ln \left(c x_1^{a_1} x_2^{a_2} \cdots x_m^{a_m} \right) = 0 \, .$$

Using the properties of logarithms, this can be expanded to the equivalent expression:

$$\ln c + a_1 \ln x_1 + a_2 \ln x_2 + \ldots + a_m \ln x_m = 0 \, .$$

Introducing the change of variables $y_i = \ln x_i$ results in the following equality constraint:

$$\begin{aligned} a_1 y_1 + a_2 y_2 + \ldots + a_m y_m + \ln c &= 0 \, , \\ a^\intercal y + \ln c &= 0 \, , \end{aligned}$$

which is an affine constraint in $y$.
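This identity is easy to confirm numerically (an illustration with arbitrary coefficient values, not from the text): the log of a monomial evaluated at $x = e^y$ matches the affine expression $a^\intercal y + \ln c$.

```python
# Check that the log of a monomial is affine in y = ln(x):
# ln(c * x1^a1 * ... * xm^am) = a^T y + ln(c).
import numpy as np

c = 2.5
a = np.array([1.5, -2.0, 0.5])
rng = np.random.default_rng(1)

for _ in range(100):
    x = rng.uniform(0.1, 10.0, size=3)   # monomials require x > 0
    y = np.log(x)
    monomial = c * np.prod(x ** a)
    affine = a @ y + np.log(c)
    assert np.isclose(np.log(monomial), affine)
print("log-monomial matches the affine form on all samples")
```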

The objective and inequality constraints are more complex because they are posynomials. The expression $\ln f_i$ written in terms of a posynomial results in the following:

$$\ln \left( \sum_{j=1}^n c_j x_1^{a_{1j}} x_2^{a_{2j}} \cdots x_m^{a_{mj}} \right) .$$

Because this is a sum of products, we cannot use the logarithm to expand each term. However, we can still introduce the same change of variables (expressed as $x_i = e^{y_i}$):

$$\begin{aligned} \ln f_i &= \ln \left( \sum_{j=1}^n c_j \exp\left(y_1 a_{1j}\right) \exp\left(y_2 a_{2j}\right) \cdots \exp\left(y_m a_{mj}\right) \right)\\ &= \ln \left( \sum_{j=1}^n c_j \exp\left(y_1 a_{1j} + y_2 a_{2j} + \ldots + y_m a_{mj}\right) \right)\\ &= \ln \left( \sum_{j=1}^n \exp\left(a_j^\intercal y + b_j\right) \right), \quad \text{where} \quad b_j = \ln c_j \, . \end{aligned}$$

This is a log-sum-exp of an affine function. As mentioned in the previous section, log-sum-exp is convex, and a convex function composed with an affine function is convex. Thus, the objective and inequality constraints are convex in $y$. Because the equality constraints are also affine, we have a convex optimization problem obtained through a change of variables.
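The whole transformation can be verified numerically as well (arbitrary illustration coefficients, not from the text): evaluating $\ln f$ at $x = e^y$ gives the same value as the log-sum-exp of the affine functions $a_j^\intercal y + \ln c_j$.

```python
# Verify the posynomial transformation: ln f(e^y) == logsumexp(A y + ln c).
import numpy as np

cs = np.array([2.0, 3.0, 0.5])            # coefficients c_j > 0
A = np.array([[1.0, -2.0],                # row j holds the exponents a_j
              [0.5, 1.5],
              [-1.0, 0.0]])

def posynomial(x):
    # f(x) = sum_j c_j * prod_i x_i^{a_ij}
    return np.sum(cs * np.prod(x ** A, axis=1))

rng = np.random.default_rng(2)
y = rng.normal(size=2)
x = np.exp(y)                             # change of variables x_i = e^{y_i}

lhs = np.log(posynomial(x))
rhs = np.log(np.sum(np.exp(A @ y + np.log(cs))))  # log-sum-exp form
print(np.isclose(lhs, rhs))  # True
```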

Unfortunately, many other functions do not fit this form (e.g., design variables that can be positive or negative, terms with negative coefficients, trigonometric functions, logarithms, and exponents). GP modelers use various techniques to extend usability, including using a Taylor series across a restricted domain, fitting functions to posynomials,⁷ and rearranging expressions into other equivalent forms, including implicit relationships.

Creativity and some sacrifice in fidelity are usually needed to create a corresponding GP from a general nonlinear programming problem. However, if the sacrifice in fidelity is not too great, there is a significant advantage because the formulation comes with all the benefits of convexity—guaranteed convergence, global optimality, efficiency, no parameter tuning, and limited scaling issues.

One extension to geometric programming is signomial programming. A signomial program has the same form, except that the coefficients $c_i$ can be positive or negative (the design variables $x_i$ must still be strictly positive). Unfortunately, this problem cannot be transformed into a convex one, so a global optimum is no longer guaranteed. Still, a signomial program can usually be solved using a sequence of geometric programs, so it is much more efficient to solve than a general nonlinear problem. Signomial programs have been used to extend the range of design problems that can be solved using geometric programming techniques.⁸,⁹

11.7 Summary

Convex optimization problems are highly desirable because they do not require parameter tuning, starting points, or derivatives and converge reliably and rapidly to the global optimum.

The trade-off is that the form of the objective and constraints must meet stringent requirements. These requirements often necessitate simplifying the physics models and implementing clever reformulations. The reduction in model fidelity is acceptable in domains where optimizations are performed repeatedly in time (e.g., controls, machine learning) or for high-level conceptual design studies. Linear programming and quadratic programming, in particular, are widely used across many domains and form the basis of many of the gradient-based algorithms used to solve general nonconvex problems.

Problems

Footnotes
  1. An affine function consists of a linear transformation and a translation. Informally, this type of function is often referred to as linear (including in this book), but strictly, these are distinct concepts. For example, $Ax$ is a linear function in $x$, whereas $Ax + b$ is an affine function in $x$.

  2. Boyd and Vandenberghe² is the most cited textbook on convex optimization.

  3. Several references exist with examples for those categories that we do not discuss in detail.³⁻⁴

  4. See Section 2.3 for a brief historical background on the development of LP and its applications.

  5. This is an example of a multiobjective function, which is explained in Chapter 9.

  6. In the machine learning community, this optimization problem is known as a support vector machine. This problem is an example of supervised learning because classification labels were provided. Classification can be done without labels but requires a different approach under the umbrella of unsupervised learning.

  7. Based on an example from Boyd et al.⁶

References
  1. Diamond, S., & Boyd, S. (2015). Convex optimization with abstract linear operators. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). IEEE. 10.1109/iccv.2015.84
  2. Boyd, S. P., & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
  3. Lobo, M. S., Vandenberghe, L., Boyd, S., & Lebret, H. (1998). Applications of second-order cone programming. Linear Algebra and Its Applications, 284(1–3), 193–228. 10.1016/s0024-3795(98)10032-0
  4. Parikh, N., & Boyd, S. (2013). Block splitting for distributed optimization. Mathematical Programming Computation, 6(1), 77–102. 10.1007/s12532-013-0061-8
  5. Grant, M., Boyd, S., & Ye, Y. (2006). Disciplined convex programming. In L. Liberti & N. Maculan (Eds.), Global Optimization—From Theory to Implementation (pp. 155–210). Springer. 10.1007/0-387-30528-9_7
  6. Boyd, S., Kim, S.-J., Vandenberghe, L., & Hassibi, A. (2007). A tutorial on geometric programming. Optimization and Engineering, 8(1), 67–127. 10.1007/s11081-007-9001-7
  7. Hoburg, W., Kirschen, P., & Abbeel, P. (2016). Data fitting with geometric-programming-compatible softmax functions. Optimization and Engineering, 17(4), 897–918. 10.1007/s11081-016-9332-3
  8. Kirschen, P. G., York, M. A., Ozturk, B., & Hoburg, W. W. (2018). Application of signomial programming to aircraft design. Journal of Aircraft, 55(3), 965–987. 10.2514/1.c034378
  9. York, M. A., Hoburg, W. W., & Drela, M. (2018). Turbofan engine sizing and tradeoff analysis via signomial programming. Journal of Aircraft, 55(3), 988–1003. 10.2514/1.c034463