
Engineering Design Optimization. Cambridge University Press, Jan 2022

Appendix B: Linear Solvers

In Section 3.6, we present an overview of solution methods for discretized systems of equations, followed by an introduction to Newton-based methods for solving nonlinear equations. Here, we review the linear solvers required at each step of those Newton-based methods.[1]

B.1 Systems of Linear Equations

If the equations are linear, they can be written as

$$ Au = b \, , $$

where $A$ is a square ($n \times n$) matrix, and $b$ is a vector, and neither of these depends on $u$. If this system of equations has a unique solution, then the system and the matrix $A$ are nonsingular. This is equivalent to saying that $A$ has an inverse, $A^{-1}$. If $A^{-1}$ does not exist, the matrix and the system are singular.

A matrix $A$ is singular if its rows (or equivalently, its columns) are linearly dependent (i.e., if one of the rows can be written as a linear combination of the others).

If the matrix $A$ is nonsingular and we know its inverse $A^{-1}$, the solution of the linear system (Equation B.1) can be written as $u = A^{-1} b$. However, the numerical methods described here do not form $A^{-1}$. The main reason for this is that forming $A^{-1}$ is expensive: the computational cost is proportional to $n^3$.

For practical problems with large $n$, it is typical for the matrix $A$ to be sparse, that is, for most of its entries to be zeros. An entry $A_{ij}$ represents the interaction between variables $i$ and $j$. When solving differential equations on a discretized grid, for example, a given variable $i$ only interacts with variables $j$ in its vicinity in the grid. These interactions correspond to nonzero entries, whereas all other entries are zero. Sparse linear systems tend to have a number of nonzero terms that is proportional to $n$. This is in contrast with a dense matrix, which has $n^2$ nonzero entries. Solvers should take advantage of sparsity to remain efficient for large $n$.
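For example, a 1-D finite-difference discretization couples each variable only with its two grid neighbors, so the resulting matrix is tridiagonal and its nonzero count grows linearly with $n$. A minimal sketch using SciPy's sparse tools (the specific matrix is our illustration, not from the text):

```python
import numpy as np
from scipy.sparse import diags

# Tridiagonal matrix from a 1-D finite-difference discretization:
# each variable i interacts only with its neighbors i-1 and i+1.
n = 100_000
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

print(A.nnz)   # roughly 3n stored nonzeros
print(n * n)   # versus n^2 entries for an equivalent dense matrix
```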

We rewrite the linear system (Equation B.1) as a set of residuals,

$$ r(u) = Au - b = 0 \, . $$

To solve this system of equations, we can use either a direct method or an iterative method. We explain these briefly in the rest of this appendix, but we do not cover more advanced techniques that take advantage of sparsity.

B.2 Conditioning

The distinction between singular and nonsingular systems blurs once we have to deal with finite-precision arithmetic. Systems that are nearly singular in the exact sense are ill-conditioned: a small change in the data (entries of $A$ or $b$) results in a large change in the solution. This large sensitivity of the solution to the problem parameters is an issue because the parameters themselves have finite precision. Any imprecision in these parameters can then lead to significant errors in the solution, even if no errors are introduced in the numerical solution of the linear system.

The conditioning of a linear system can be quantified by the condition number of the matrix, which is defined as the scalar

$$ \mathrm{cond}(A) = \|A\| \cdot \|A^{-1}\| \, , $$

where any matrix norm can be used.

Because $\|A\| \cdot \|A^{-1}\| \ge \|A A^{-1}\| = \|I\| \ge 1$, we have

$$ \mathrm{cond}(A) \ge 1 $$

for all matrices. A matrix $A$ is well-conditioned if $\mathrm{cond}(A)$ is small and ill-conditioned if $\mathrm{cond}(A)$ is large.
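As a quick numerical illustration (a sketch using NumPy; the particular matrix and perturbation are our choices):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])            # rows are nearly linearly dependent
print(np.linalg.cond(A))                 # large condition number, about 4e4

b = np.array([2.0, 2.0001])
b_perturbed = b + np.array([0.0, 1e-4])  # small change in the data
print(np.linalg.solve(A, b))             # [1. 1.]
print(np.linalg.solve(A, b_perturbed))   # roughly [0. 2.]: an O(1) change in the solution
```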

B.3 Direct Methods

The standard way to solve linear systems of equations with a computer is Gaussian elimination, which in matrix form is equivalent to $LU$ factorization. This is a factorization (or decomposition) of $A$ such that $A = LU$, where $L$ is a unit lower triangular matrix and $U$ is an upper triangular matrix, as shown in Figure B.1.

Figure B.1: $LU$ factorization.

The factorization transforms the matrix $A$ into an upper triangular matrix $U$ by introducing zeros below the diagonal, one column at a time, starting with the first one and progressing from left to right. This is done by subtracting multiples of each row from subsequent rows. These operations can be expressed as a sequence of multiplications with lower triangular matrices $L_i$,

$$ \underbrace{L_{n-1} \cdots L_2 L_1}_{L^{-1}} A = U \, . $$

After completing these operations, we have $U$, and we can find $L$ by computing $L = L_1^{-1} L_2^{-1} \cdots L_{n-1}^{-1}$.

Once we have this factorization, we have $LUu = b$. Setting $Uu$ to $y$, we can solve $Ly = b$ for $y$ by forward substitution. Now we have $Uu = y$, which we can solve for $u$ by back substitution.

This process is not stable in general because roundoff errors are magnified in the back substitution when diagonal elements of $A$ have a small magnitude. This issue is resolved by partial pivoting, which interchanges rows to obtain more favorable diagonal elements.
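A sketch of this factor-then-substitute workflow using SciPy's LAPACK-based routines, which perform the partial pivoting described above (the small test system is our example):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])
b = np.array([5.0, -2.0, 9.0])

lu, piv = lu_factor(A)        # LU factorization with partial pivoting (row interchanges)
u = lu_solve((lu, piv), b)    # forward substitution (Ly = b), then back substitution (Uu = y)
print(np.allclose(A @ u, b))  # True
```

Reusing the stored factorization for additional right-hand sides costs only the two triangular solves, which is much cheaper than factorizing again.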

Cholesky factorization is an $LU$ factorization specialized for the case where the matrix $A$ is symmetric and positive definite. In this case, pivoting is unnecessary because Gaussian elimination is always stable for symmetric positive-definite matrices. The factorization can be written as

$$ A = L D L^\intercal \, , $$

where $D = \text{diag} [U_{11}, \ldots, U_{nn}]$.

This can be expressed as the matrix product

$$ A = G G^\intercal \, , $$

where $G = L D^{1/2}$.
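A sketch using NumPy/SciPy (the matrix is our example); `numpy.linalg.cholesky` returns the lower-triangular factor $G$ such that $A = G G^\intercal$:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])           # symmetric positive definite
G = np.linalg.cholesky(A)            # lower-triangular factor with A = G G^T
print(np.allclose(G @ G.T, A))       # True

# Solving A u = b with the factor: forward substitution, then back substitution.
b = np.array([1.0, 2.0])
y = solve_triangular(G, b, lower=True)
u = solve_triangular(G.T, y, lower=False)
print(np.allclose(A @ u, b))         # True
```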

B.4 Iterative Methods

Although direct methods are usually more efficient and robust, iterative methods have several advantages:

  • Iterative methods make it possible to trade between computational cost and precision because they can be stopped at any point and still yield an approximation of $u$. On the other hand, direct methods only get the solution at the end of the process with the final precision.

  • Iterative methods have the advantage when a good guess for $u$ exists. This is often the case in optimization, where the $u$ from the previous optimization iteration can be used as the guess for the new evaluations (called a warm start).

  • Iterative methods do not require forming and manipulating the matrix $A$, which can be computationally costly in terms of both time and memory. Instead, iterative methods require the computation of the residuals $r(u) = Au - b$ and, in the case of Krylov subspace methods, products of $A$ with a given vector. Therefore, iterative methods can be more efficient than direct methods for cases where $A$ is large and sparse. All that is needed is an efficient process to get the product of $A$ with a given vector, as shown in Figure B.2.

Figure B.2: Iterative methods just require a process (which can be a black box) to compute products of $A$ with an arbitrary vector $v$.

Iterative methods are divided into stationary methods (also known as fixed-point iteration methods) and Krylov subspace methods.

B.4.1 Jacobi, Gauss–Seidel, and SOR

Fixed-point methods generate a sequence of iterates $u_1, \ldots, u_k, \ldots$ using a function

$$ u_{k+1} = G \left( u_k \right), \quad k = 0, 1, \ldots $$

starting from an initial guess $u_0$. The function $G(u)$ is devised such that the iterates converge to the solution $u^*$, which satisfies $r(u^*) = 0$.

Many stationary methods can be derived by splitting the matrix such that $A = M - N$. Then, $Au = b$ leads to $Mu = Nu + b$, and solving for $u$ yields

$$ u = M^{-1} \left( Nu + b \right) \, . $$

Because $Nu = Mu - Au$, substituting this into the previous equation results in the iteration

$$ u_{k+1} = u_k + M^{-1} \left( b - A u_k \right) \, . $$

Defining the residual at iteration $k$ as

$$ r\left(u_k\right) = b - A u_k \, , $$

we can write

$$ u_{k+1} = u_k + M^{-1} r\left(u_k\right) \, . $$

The splitting matrix $M$ is fixed and constructed so that it is easy to invert. The closer $M^{-1}$ is to the inverse of $A$, the better the iterations work.

We now introduce three stationary methods corresponding to three different splitting matrices.

The Jacobi method consists of setting $M$ to be a diagonal matrix $D$, where the diagonal entries are those of $A$. Then,

$$ u_{k+1} = u_k + D^{-1} r\left(u_k\right) \, . $$

In component form, this can be written as

$$ {u_i}_{k+1} = \frac{1}{A_{ii}} \left[ b_i - \sum_{j=1, j \neq i}^{n_u} A_{ij} \, {u_j}_k \right], \quad i = 1, \ldots, n_u \, . $$

Using this method, each component of $u_{k+1}$ is independent of the others at a given iteration; the components depend only on the previous iteration values, $u_k$, and can therefore be computed in parallel.
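A minimal sketch of the Jacobi iteration in Python (the function name and the small test system are our own, used only for illustration):

```python
import numpy as np

def jacobi(A, b, u0, tol=1e-10, max_iter=500):
    """Jacobi iteration: M = D, so u_{k+1} = u_k + D^{-1} r(u_k)."""
    d = np.diag(A)                 # diagonal splitting matrix D
    u = u0.copy()
    for _ in range(max_iter):
        r = b - A @ u              # residual r(u_k) = b - A u_k
        if np.linalg.norm(r) < tol:
            break
        u = u + r / d              # all components update independently (parallelizable)
    return u

# Usage on a small, diagonally dominant system (for which Jacobi converges)
A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([1.0, 2.0])
print(jacobi(A, b, np.zeros(2)))   # approaches the solution [1/6, 1/3]
```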

The Gauss–Seidel method is obtained by setting $M$ to be the lower triangular portion of $A$ and can be written as

$$ u_{k+1} = u_k + E^{-1} r(u_k) \, , $$

where $E$ is the lower triangular matrix. Because of the triangular matrix structure, each component in $u_{k+1}$ is dependent on the previous elements in the vector, but the iteration can be performed in a single forward sweep. Writing this in component form yields

$$ {u_i}_{k+1} = \frac{1}{A_{ii}} \left[ b_i - \sum_{j < i} A_{ij} \, {u_j}_{k+1} - \sum_{j > i} A_{ij} \, {u_j}_k \right], \quad i = 1, \ldots, n_u \, . $$

Unlike the Jacobi iterations, a Gauss–Seidel iteration cannot be performed in parallel because of the terms where $j < i$, which require the latest values. Instead, the states must be updated sequentially. However, the advantage of Gauss–Seidel is that it generally converges faster than Jacobi iterations.

The successive over-relaxation (SOR) method uses an update that is a weighted average of the Gauss–Seidel update and the previous iteration,

$$ u_{k+1} = u_k + \omega \left( \left( 1 - \omega \right) D + \omega E \right)^{-1} r(u_k) \, , $$

where $\omega$, the relaxation factor, is a scalar between 1 and 2. Setting $\omega = 1$ yields the Gauss–Seidel method. SOR in component form is as follows:

$$ {u_i}_{k+1} = (1 - \omega) \, {u_i}_k + \frac{\omega}{A_{ii}} \left[ b_i - \sum_{j < i} A_{ij} \, {u_j}_{k+1} - \sum_{j > i} A_{ij} \, {u_j}_k \right], \quad i = 1, \ldots, n_u \, . $$

With the correct value of $\omega$, SOR converges faster than Gauss–Seidel.
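A minimal sketch of SOR in component form (the function name and test system are our own); setting `omega = 1` recovers the Gauss–Seidel sweep given earlier:

```python
import numpy as np

def sor(A, b, u0, omega=1.0, tol=1e-10, max_iter=500):
    """SOR in component form; omega = 1 reduces to Gauss-Seidel."""
    n = len(b)
    u = u0.copy()
    for _ in range(max_iter):
        for i in range(n):  # forward sweep: earlier components already hold their new values
            sigma = A[i, :i] @ u[:i] + A[i, i + 1:] @ u[i + 1:]
            u[i] = (1 - omega) * u[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(b - A @ u) < tol:
            break
    return u

A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([1.0, 2.0])
print(sor(A, b, np.zeros(2)))              # Gauss-Seidel (omega = 1)
print(sor(A, b, np.zeros(2), omega=1.1))   # over-relaxed variant
```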

B.4.2 Conjugate Gradient Method

The conjugate gradient method applies to linear systems where $A$ is symmetric and positive definite. This method can be adapted to solve nonlinear minimization problems (see Section 4.4.2).

We want to solve a linear system (Equation B.2) iteratively. This means that at a given iteration $u_k$, the residual is not necessarily zero and can be written as

$$ r_k = A u_k - b \, . $$

Solving this linear system is equivalent to minimizing the quadratic function

$$ f(u) = \frac{1}{2} u^\intercal A u - b^\intercal u \, . $$

This is because the gradient of this function is

$$ \nabla f(u) = A u - b \, . $$

Thus, the gradient of the quadratic is the residual of the linear system,

$$ r_k = \nabla f \left( u_k \right) \, . $$

We can express the path from any starting point to a solution $u^*$ as a sequence of $n$ steps with directions $p_k$ and lengths $\alpha_k$:

$$ u^* = \sum_{k=0}^{n-1} \alpha_k p_k \, . $$

Substituting this into the quadratic (Equation B.20), we get

$$
\begin{aligned}
f(u^*) &= f \left( \sum_{k=0}^{n-1} \alpha_k p_k \right) \\
&= \frac{1}{2} \left( \sum_{k=0}^{n-1} \alpha_k p_k \right)^\intercal A \left( \sum_{k=0}^{n-1} \alpha_k p_k \right) - b^\intercal \left( \sum_{k=0}^{n-1} \alpha_k p_k \right) \\
&= \frac{1}{2} \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} \alpha_k \alpha_j \, {p_k}^\intercal A p_j - \sum_{k=0}^{n-1} \alpha_k \, b^\intercal p_k \, .
\end{aligned}
$$

The conjugate gradient method uses a set of $n$ vectors $p_k$ that are conjugate with respect to matrix $A$. Such vectors have the following property:

$$ {p_k}^\intercal A p_j = 0, \quad \text{for all} \quad k \neq j \, . $$

Using this conjugacy property, the double-sum term can be simplified to a single sum,

$$ \frac{1}{2} \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} \alpha_k \alpha_j \, {p_k}^\intercal A p_j = \frac{1}{2} \sum_{k=0}^{n-1} {\alpha_k}^2 \, {p_k}^\intercal A p_k \, . $$

Then, Equation B.24 simplifies to

$$ f(u^*) = \sum_{k=0}^{n-1} \left( \frac{1}{2} {\alpha_k}^2 \, {p_k}^\intercal A p_k - \alpha_k \, b^\intercal p_k \right) \, . $$

Because each term in this sum involves only one direction $p_k$, we have reduced the original problem to a series of one-dimensional quadratic functions that can be minimized one at a time. Each one-dimensional problem corresponds to minimizing the quadratic with respect to the step length $\alpha_k$. Differentiating each term and setting it to zero yields the following:

$$ \alpha_k \, {p_k}^\intercal A p_k - b^\intercal p_k = 0 \;\Rightarrow\; \alpha_k = \frac{b^\intercal p_k}{{p_k}^\intercal A p_k} \, . $$

Now, the question is: How do we find this set of directions? There are many sets of directions that satisfy conjugacy. For example, the eigenvectors of $A$ satisfy Equation B.25.[2] However, it is costly to compute the eigenvectors of a matrix. We want a more convenient way to compute a sequence of conjugate vectors.

The conjugate gradient method sets the first direction to the steepest-descent direction of the quadratic at the first point. Because the gradient of the function is the residual of the linear system (Equation B.22), this first direction is obtained from the residual at the starting point,

$$ p_1 = -r \left( u_0 \right) \, . $$

Each subsequent direction is set to a new conjugate direction using the update

$$ p_{k+1} = -r_{k+1} + \beta_k p_k \, , $$

where $\beta$ is set such that $p_{k+1}$ and $p_k$ are conjugate with respect to $A$.

We can find the expression for $\beta$ by starting with the conjugacy property that we want to achieve,

$$ {p_{k+1}}^\intercal A p_k = 0 \, . $$

Substituting the new direction $p_{k+1}$ with the update (Equation B.30), we get

$$ \left( -r_{k+1} + \beta_k p_k \right)^\intercal A p_k = 0 \, . $$

Expanding the terms and solving for $\beta$, we get

$$ \beta_k = \frac{{r_{k+1}}^\intercal A p_k}{{p_k}^\intercal A p_k} \, . $$

For each search direction $p_k$, we can perform an exact line search by minimizing the quadratic analytically. The directional derivative of the quadratic at a point $x$ along the search direction $p$ is as follows:

$$
\begin{aligned}
\frac{\partial f(x + \alpha p)}{\partial \alpha} &= \frac{\partial}{\partial \alpha} \left( \frac{1}{2} (x + \alpha p)^\intercal A (x + \alpha p) - b^\intercal (x + \alpha p) \right) \\
&= p^\intercal A (x + \alpha p) - p^\intercal b \\
&= p^\intercal (A x - b) + \alpha \, p^\intercal A p \\
&= p^\intercal r(x) + \alpha \, p^\intercal A p \, .
\end{aligned}
$$

By setting this derivative to zero, we can get the step size that minimizes the quadratic along the line to be

$$ \alpha_k = -\frac{{r_k}^\intercal p_k}{{p_k}^\intercal A p_k} \, . $$

The numerator can be written as a function of the residual alone. Replacing $p_k$ with the conjugate direction update (Equation B.30), we get

$$
\begin{aligned}
{r_k}^\intercal p_k &= {r_k}^\intercal \left( -r_k + \beta_{k-1} p_{k-1} \right) \\
&= -{r_k}^\intercal r_k + \beta_{k-1} \, {r_k}^\intercal p_{k-1} \\
&= -{r_k}^\intercal r_k \, .
\end{aligned}
$$

Here we have used the property of the conjugate directions stating that the residual vector is orthogonal to all previous conjugate directions, so that ${r_k}^\intercal p_i = 0$ for $i = 0, 1, \ldots, k-1$.[3] Thus, we can now write

$$ \alpha_k = \frac{{r_k}^\intercal r_k}{{p_k}^\intercal A p_k} \, . $$

The numerator of the expression for $\beta$ (Equation B.33) can also be written in terms of the residual alone. Using the expression for the residual (Equation B.19) and taking the difference between two subsequent residuals, we get

$$
\begin{aligned}
r_{k+1} - r_k &= \left( A u_{k+1} - b \right) - \left( A u_k - b \right) = A \left( u_{k+1} - u_k \right) \\
&= A \left( u_k + \alpha_k p_k - u_k \right) \\
&= \alpha_k A p_k \, .
\end{aligned}
$$

Using this result in the numerator of $\beta$ in Equation B.33, we can write

$$
\begin{aligned}
{r_{k+1}}^\intercal A p_k &= \frac{1}{\alpha_k} {r_{k+1}}^\intercal \left( r_{k+1} - r_k \right) \\
&= \frac{1}{\alpha_k} \left( {r_{k+1}}^\intercal r_{k+1} - {r_{k+1}}^\intercal r_k \right) \\
&= \frac{1}{\alpha_k} \, {r_{k+1}}^\intercal r_{k+1} \, ,
\end{aligned}
$$

where we have used the property that the residual at any conjugate gradient iteration is orthogonal to the residuals at all previous iterations, so ${r_{k+1}}^\intercal r_k = 0$.[4]

Now, using this new numerator and using Equation B.37 to write the denominator as a function of the previous residual, we obtain

$$ \beta_k = \frac{{r_k}^\intercal r_k}{{r_{k-1}}^\intercal r_{k-1}} \, . $$

We use this result in the nonlinear conjugate gradient method for function minimization in Section 4.4.2.

The linear conjugate gradient steps are listed in Algorithm 2. The advantage of this method relative to the direct method is that $A$ does not need to be stored or given explicitly. Instead, we only need to provide a function that computes matrix-vector products with $A$. These products are required to compute residuals ($r = Au - b$) and the $Ap$ term in the computation of $\alpha$. Assuming a well-conditioned problem with good enough arithmetic precision, the algorithm should converge to the solution in $n$ steps.[5]
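As a concrete illustration of these steps, here is a minimal Python sketch (the function name and interface are ours, not from the text); only a matrix-vector product routine `matvec` is required, consistent with Figure B.2:

```python
import numpy as np

def conjugate_gradient(matvec, b, u0, tol=1e-10, max_iter=None):
    """Linear conjugate gradient sketch using the residual convention r = A u - b.
    matvec(v) must return the product A v, so A never needs to be formed."""
    u = u0.copy()
    r = matvec(u) - b              # r_0 = A u_0 - b
    p = -r                         # first direction: steepest descent
    rr = r @ r
    for _ in range(max_iter or b.size):
        Ap = matvec(p)
        alpha = rr / (p @ Ap)      # exact line search along p
        u = u + alpha * p
        r = r + alpha * Ap         # r_{k+1} = r_k + alpha_k A p_k
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        beta = rr_new / rr         # beta_k = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
        p = -r + beta * p          # next conjugate direction
        rr = rr_new
    return u

# Usage on a small symmetric positive-definite system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(lambda v: A @ v, b, np.zeros(2)))
```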

B.4.3 Krylov Subspace Methods

Krylov subspace methods are a more general class of iterative methods.[6] The conjugate gradient method is a special case of a Krylov subspace method that applies only to symmetric positive-definite matrices. However, more general Krylov subspace methods, such as the generalized minimum residual (GMRES) method, do not have such restrictions on the matrix. Compared with the stationary methods of Section B.4.1, Krylov methods have the advantage that they use information gathered throughout the iterations. Instead of using a fixed splitting matrix, Krylov methods effectively vary the splitting so that $M$ is changed at each iteration according to some criteria that use the information gathered so far. For this reason, Krylov methods are usually more efficient than stationary methods.

Like stationary iteration methods, Krylov methods do not require forming or storing $A$. Instead, the iterations require only matrix-vector products of the form $Av$, where $v$ is some vector given by the Krylov algorithm. The matrix-vector product could be given by a black box, as shown in Figure B.2.

For the linear conjugate gradient method (Section B.4.2), we found conjugate directions and minimized the residual of the linear system in a sequence of these directions.

Krylov subspace methods minimize the residual in a space,

$$ x_0 + \mathcal{K}_k \, , $$

where $x_0$ is the initial guess, and $\mathcal{K}_k$ is the Krylov subspace,

$$ \mathcal{K}_k(A; r_0) \equiv \text{span} \left\{ r_0, A r_0, A^2 r_0, \ldots, A^{k-1} r_0 \right\} \, . $$

In other words, a Krylov subspace method seeks a solution that is a linear combination of the vectors $r_0, A r_0, \ldots, A^{k-1} r_0$. The definition of this particular sequence is convenient because these terms can be computed recursively with the matrix-vector product black box as $r_0, A(r_0), A(A(r_0)), A(A(A(r_0))), \ldots$

Under certain conditions, it can be shown that the solution of the linear system of size $n$ is contained in the subspace $\mathcal{K}_n$.

Krylov subspace methods (including the conjugate gradient method) converge much faster when using preconditioning. Instead of solving $Ax = b$, we solve

$$ \left( M^{-1} A \right) x = M^{-1} b \, , $$

where $M$ is the preconditioning matrix (or simply the preconditioner). The matrix $M$ should be similar to $A$ and correspond to a linear system that is easier to solve.

The inverse, $M^{-1}$, should be available explicitly, and we do not need an explicit form for $M$. The matrix resulting from the product $M^{-1} A$ should have a smaller condition number so that the new linear system is better conditioned.

In the extreme case where $M = A$, we have effectively computed the inverse of $A$, and we can obtain $x$ explicitly. At the other extreme, $M$ could be a diagonal matrix with the diagonal elements of $A$, which would scale $A$ such that the diagonal elements are 1.[7]

Krylov subspace solvers require three main components: (1) an orthogonal basis for the Krylov subspace, (2) an optimality property that determines the solution within the subspace, and (3) an effective preconditioner. Various Krylov subspace methods are possible, depending on the choice for each of these three components. One of the most popular Krylov subspace methods is GMRES [4].[8]
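As a sketch of how this looks with an off-the-shelf solver (SciPy's `gmres`; the tridiagonal test matrix and the diagonal, Jacobi-style preconditioner are our choices for illustration), note that both the matrix and the preconditioner enter only through black-box matrix-vector products, as in Figure B.2:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres

n = 1000
A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# The matrix is supplied only as a matrix-vector product (it is never formed densely).
A_op = LinearOperator((n, n), matvec=lambda v: A @ v)

# Diagonal (Jacobi-style) preconditioner: the operator applies M^{-1} v = v / diag(A).
d = A.diagonal()
M_inv = LinearOperator((n, n), matvec=lambda v: v / d)

x, info = gmres(A_op, b, M=M_inv)  # info == 0 indicates convergence to the default tolerance
print(info, np.linalg.norm(A @ x - b))
```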

Footnotes
  1. Trefethen and Bau [1] provide a much more detailed explanation of linear solvers.

  2. Suppose we have two eigenvectors, $v_k$ and $v_j$. Then ${v_k}^\intercal A v_j = {v_k}^\intercal (\lambda_j v_j) = \lambda_j \, {v_k}^\intercal v_j$. This dot product is zero because the eigenvectors of a symmetric matrix are mutually orthogonal.

  3. For a proof of this property, see Theorem 5.2 in Nocedal and Wright [2].

  4. For a proof of this property, see Theorem 5.3 in Nocedal and Wright [2].

  5. Because the linear conjugate gradient method converges in $n$ steps, it was originally thought of as a direct method. It was initially dismissed in favor of more efficient direct methods, such as $LU$ factorization. However, the conjugate gradient method was later reframed as an effective iterative method to obtain approximate solutions to large problems.

  6. This is just an overview of Krylov subspace methods; for more details, see Trefethen and Bau [1] or Saad [3].

  7. The splitting matrix $M$ we used in the equation for the stationary methods (Section B.4.1) is effectively a preconditioner. An $M$ using the diagonal entries of $A$ corresponds to the Jacobi method (Equation B.13).

  8. GMRES and other Krylov subspace methods are available in most programming languages, including C/C++, Fortran, Julia, MATLAB, and Python.

References
  1. Trefethen, L. N., & Bau III, D. (1997). Numerical Linear Algebra. SIAM.
  2. Nocedal, J., & Wright, S. J. (2006). Numerical Optimization (2nd ed.). Springer. 10.1007/978-0-387-40065-5
  3. Saad, Y. (2003). Iterative Methods for Sparse Linear Systems (2nd ed.). SIAM. https://www.google.ca/books/edition/Iterative_Methods_for_Sparse_Linear_Syst/qtzmkzzqFmcC
  4. Saad, Y., & Schultz, M. H. (1986). GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7(3), 856–869. 10.1137/0907058