
Engineering Design Optimization. Cambridge University Press, Jan 2022

Appendix B: Linear Solvers

In Section 3.6, we present an overview of solution methods for discretized systems of equations, followed by an introduction to Newton-based methods for solving nonlinear equations. Here, we review the linear solvers required at each step of those Newton-based methods.[1]

B.1 Systems of Linear Equations

If the equations are linear, they can be written as

$$ Au = b \, , $$

where $A$ is a square ($n \times n$) matrix, and $b$ is a vector, and neither of these depends on $u$. If this system of equations has a unique solution, then the system and the matrix $A$ are nonsingular. This is equivalent to saying that $A$ has an inverse, $A^{-1}$. If $A^{-1}$ does not exist, the matrix and the system are singular.

A matrix $A$ is singular if its rows (or equivalently, its columns) are linearly dependent (i.e., if one of the rows can be written as a linear combination of the others).

If the matrix $A$ is nonsingular and we know its inverse $A^{-1}$, the solution of the linear system (Equation B.1) can be written as $u = A^{-1} b$. However, the numerical methods described here do not form $A^{-1}$. The main reason for this is that forming $A^{-1}$ is expensive: the computational cost is proportional to $n^3$.

For practical problems with large $n$, it is typical for the matrix $A$ to be sparse, that is, for most of its entries to be zeros. An entry $A_{ij}$ represents the interaction between variables $i$ and $j$. When solving differential equations on a discretized grid, for example, a given variable $i$ only interacts with variables $j$ in its vicinity in the grid. These interactions correspond to nonzero entries, whereas all other entries are zero. Sparse linear systems tend to have a number of nonzero terms that is proportional to $n$. This is in contrast with a dense matrix, which has $n^2$ nonzero entries. Solvers should take advantage of sparsity to remain efficient for large $n$.
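For example, a 1-D finite-difference discretization couples each variable only with its two grid neighbors, so the resulting matrix is tridiagonal and its nonzero count grows linearly with $n$. A minimal sketch using SciPy's sparse tools (the specific matrix is our illustration, not from the text):

```python
import numpy as np
from scipy.sparse import diags

# Tridiagonal matrix from a 1-D finite-difference discretization:
# each variable i interacts only with its neighbors i-1 and i+1.
n = 100_000
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

print(A.nnz)   # roughly 3n stored nonzeros
print(n * n)   # versus n^2 entries for an equivalent dense matrix
```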

We rewrite the linear system (Equation B.1) as a set of residuals,

$$ r(u) = Au - b = 0 \, . $$

To solve this system of equations, we can use either a direct method or an iterative method. We explain these briefly in the rest of this appendix, but we do not cover more advanced techniques that take advantage of sparsity.

B.2 Conditioning

The distinction between singular and nonsingular systems blurs once we have to deal with finite-precision arithmetic. Systems that are nearly singular in the exact sense are ill-conditioned: a small change in the data (entries of $A$ or $b$) results in a large change in the solution. This large sensitivity of the solution to the problem parameters is an issue because the parameters themselves have finite precision. Any imprecision in these parameters can then lead to significant errors in the solution, even if no errors are introduced in the numerical solution of the linear system.

The conditioning of a linear system can be quantified by the condition number of the matrix, which is defined as the scalar

$$ \mathrm{cond}(A) = \|A\| \cdot \|A^{-1}\| \, , $$

where any matrix norm can be used.

Because $\|A\| \cdot \|A^{-1}\| \ge \|A A^{-1}\| = \|I\| \ge 1$, we have

$$ \mathrm{cond}(A) \ge 1 $$

for all matrices. A matrix $A$ is well-conditioned if $\mathrm{cond}(A)$ is small and ill-conditioned if $\mathrm{cond}(A)$ is large.
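As a quick numerical illustration (a sketch using NumPy; the particular matrix and perturbation are our choices):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])            # rows are nearly linearly dependent
print(np.linalg.cond(A))                 # large condition number, about 4e4

b = np.array([2.0, 2.0001])
b_perturbed = b + np.array([0.0, 1e-4])  # small change in the data
print(np.linalg.solve(A, b))             # [1. 1.]
print(np.linalg.solve(A, b_perturbed))   # roughly [0. 2.]: an O(1) change in the solution
```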

B.3 Direct Methods

The standard way to solve linear systems of equations with a computer is Gaussian elimination, which in matrix form is equivalent to $LU$ factorization. This is a factorization (or decomposition) of $A$ such that $A = LU$, where $L$ is a unit lower triangular matrix and $U$ is an upper triangular matrix, as shown in Figure B.1.

Figure B.1: $LU$ factorization.

The factorization transforms the matrix $A$ into an upper triangular matrix $U$ by introducing zeros below the diagonal, one column at a time, starting with the first one and progressing from left to right. This is done by subtracting multiples of each row from subsequent rows. These operations can be expressed as a sequence of multiplications with lower triangular matrices $L_i$,

$$ \underbrace{L_{n-1} \cdots L_2 L_1}_{L^{-1}} A = U \, . $$

After completing these operations, we have $U$, and we can find $L$ by computing $L = L_1^{-1} L_2^{-1} \cdots L_{n-1}^{-1}$.

Once we have this factorization, we have $LUu = b$. Setting $Uu$ to $y$, we can solve $Ly = b$ for $y$ by forward substitution. Now we have $Uu = y$, which we can solve for $u$ by back substitution.

This process is not stable in general because roundoff errors are magnified in the back substitution when diagonal elements of $A$ have a small magnitude. This issue is resolved by partial pivoting, which interchanges rows to obtain more favorable diagonal elements.
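A sketch of this factor-then-substitute workflow using SciPy's LAPACK-based routines, which perform the partial pivoting described above (the small test system is our example):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])
b = np.array([5.0, -2.0, 9.0])

lu, piv = lu_factor(A)        # LU factorization with partial pivoting (row interchanges)
u = lu_solve((lu, piv), b)    # forward substitution (Ly = b), then back substitution (Uu = y)
print(np.allclose(A @ u, b))  # True
```

Reusing the stored factorization for additional right-hand sides costs only the two triangular solves, which is much cheaper than factorizing again.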

Cholesky factorization is an $LU$ factorization specialized for the case where the matrix $A$ is symmetric and positive definite. In this case, pivoting is unnecessary because Gaussian elimination is always stable for symmetric positive-definite matrices. The factorization can be written as

$$ A = L D L^\intercal \, , $$

where $D = \text{diag} [U_{11}, \ldots, U_{nn}]$.

This can be expressed as the matrix product

$$ A = G G^\intercal \, , $$

where $G = L D^{1/2}$.
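A sketch using NumPy/SciPy (the matrix is our example); `numpy.linalg.cholesky` returns the lower-triangular factor $G$ such that $A = G G^\intercal$:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])           # symmetric positive definite
G = np.linalg.cholesky(A)            # lower-triangular factor with A = G G^T
print(np.allclose(G @ G.T, A))       # True

# Solving A u = b with the factor: forward substitution, then back substitution.
b = np.array([1.0, 2.0])
y = solve_triangular(G, b, lower=True)
u = solve_triangular(G.T, y, lower=False)
print(np.allclose(A @ u, b))         # True
```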

B.4 Iterative Methods

Although direct methods are usually more efficient and robust, iterative methods have several advantages:

  • Iterative methods make it possible to trade between computational cost and precision because they can be stopped at any point and still yield an approximation of $u$. On the other hand, direct methods only get the solution at the end of the process with the final precision.

  • Iterative methods have the advantage when a good guess for $u$ exists. This is often the case in optimization, where the $u$ from the previous optimization iteration can be used as the guess for the new evaluations (called a warm start).

  • Iterative methods do not require forming and manipulating the matrix $A$, which can be computationally costly in terms of both time and memory. Instead, iterative methods require the computation of the residuals $r(u) = Au - b$ and, in the case of Krylov subspace methods, products of $A$ with a given vector. Therefore, iterative methods can be more efficient than direct methods for cases where $A$ is large and sparse. All that is needed is an efficient process to get the product of $A$ with a given vector, as shown in Figure B.2.

Figure B.2: Iterative methods just require a process (which can be a black box) to compute products of $A$ with an arbitrary vector $v$.

Iterative methods are divided into stationary methods (also known as fixed-point iteration methods) and Krylov subspace methods.

B.4.1 Jacobi, Gauss–Seidel, and SOR

Fixed-point methods generate a sequence of iterates $u_1, \ldots, u_k, \ldots$ using a function

$$ u_{k+1} = G \left( u_k \right), \quad k = 0, 1, \ldots $$

starting from an initial guess $u_0$. The function $G(u)$ is devised such that the iterates converge to the solution $u^*$, which satisfies $r(u^*) = 0$.

Many stationary methods can be derived by splitting the matrix such that $A = M - N$. Then, $Au = b$ leads to $Mu = Nu + b$, and solving for $u$ yields

$$ u = M^{-1} \left( Nu + b \right) \, . $$

Because $Nu = Mu - Au$, substituting this into the previous equation results in the iteration

$$ u_{k+1} = u_k + M^{-1} \left( b - A u_k \right) \, . $$

Defining the residual at iteration $k$ as

$$ r\left(u_k\right) = b - A u_k \, , $$

we can write

$$ u_{k+1} = u_k + M^{-1} r\left(u_k\right) \, . $$

The splitting matrix $M$ is fixed and constructed so that it is easy to invert. The closer $M^{-1}$ is to the inverse of $A$, the better the iterations work.

We now introduce three stationary methods corresponding to three different splitting matrices.

The Jacobi method consists of setting $M$ to be a diagonal matrix $D$, where the diagonal entries are those of $A$. Then,

$$ u_{k+1} = u_k + D^{-1} r\left(u_k\right) \, . $$

In component form, this can be written as

$$ {u_i}_{k+1} = \frac{1}{A_{ii}} \left[ b_i - \sum_{j=1, j \neq i}^{n_u} A_{ij} \, {u_j}_k \right], \quad i = 1, \ldots, n_u \, . $$

Using this method, each component of $u_{k+1}$ is independent of the others at a given iteration; the components depend only on the previous iteration values, $u_k$, and can therefore be computed in parallel.
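A minimal sketch of the Jacobi iteration in Python (the function name and the small test system are our own, used only for illustration):

```python
import numpy as np

def jacobi(A, b, u0, tol=1e-10, max_iter=500):
    """Jacobi iteration: M = D, so u_{k+1} = u_k + D^{-1} r(u_k)."""
    d = np.diag(A)                 # diagonal splitting matrix D
    u = u0.copy()
    for _ in range(max_iter):
        r = b - A @ u              # residual r(u_k) = b - A u_k
        if np.linalg.norm(r) < tol:
            break
        u = u + r / d              # all components update independently (parallelizable)
    return u

# Usage on a small, diagonally dominant system (for which Jacobi converges)
A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([1.0, 2.0])
print(jacobi(A, b, np.zeros(2)))   # approaches the solution [1/6, 1/3]
```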

The Gauss–Seidel method is obtained by setting $M$ to be the lower triangular portion of $A$ and can be written as

$$ u_{k+1} = u_k + E^{-1} r(u_k) \, , $$

where $E$ is the lower triangular matrix. Because of the triangular matrix structure, each component in $u_{k+1}$ is dependent on the previous elements in the vector, but the iteration can be performed in a single forward sweep. Writing this in component form yields

$$ {u_i}_{k+1} = \frac{1}{A_{ii}} \left[ b_i - \sum_{j < i} A_{ij} \, {u_j}_{k+1} - \sum_{j > i} A_{ij} \, {u_j}_k \right], \quad i = 1, \ldots, n_u \, . $$

Unlike the Jacobi iterations, a Gauss–Seidel iteration cannot be performed in parallel because of the terms where $j < i$, which require the latest values. Instead, the states must be updated sequentially. However, the advantage of Gauss–Seidel is that it generally converges faster than Jacobi iterations.

The successive over-relaxation (SOR) method uses an update that is a weighted average of the Gauss–Seidel update and the previous iteration,

$$ u_{k+1} = u_k + \omega \left( \left( 1 - \omega \right) D + \omega E \right)^{-1} r(u_k) \, , $$

where $\omega$, the relaxation factor, is a scalar between 1 and 2. Setting $\omega = 1$ yields the Gauss–Seidel method. SOR in component form is as follows:

$$ {u_i}_{k+1} = (1 - \omega) \, {u_i}_k + \frac{\omega}{A_{ii}} \left[ b_i - \sum_{j < i} A_{ij} \, {u_j}_{k+1} - \sum_{j > i} A_{ij} \, {u_j}_k \right], \quad i = 1, \ldots, n_u \, . $$

With the correct value of $\omega$, SOR converges faster than Gauss–Seidel.
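A minimal sketch of SOR in component form (the function name and test system are our own); setting `omega = 1` recovers the Gauss–Seidel sweep given earlier:

```python
import numpy as np

def sor(A, b, u0, omega=1.0, tol=1e-10, max_iter=500):
    """SOR in component form; omega = 1 reduces to Gauss-Seidel."""
    n = len(b)
    u = u0.copy()
    for _ in range(max_iter):
        for i in range(n):  # forward sweep: earlier components already hold their new values
            sigma = A[i, :i] @ u[:i] + A[i, i + 1:] @ u[i + 1:]
            u[i] = (1 - omega) * u[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(b - A @ u) < tol:
            break
    return u

A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([1.0, 2.0])
print(sor(A, b, np.zeros(2)))              # Gauss-Seidel (omega = 1)
print(sor(A, b, np.zeros(2), omega=1.1))   # over-relaxed variant
```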

B.4.2 Conjugate Gradient Method

The conjugate gradient method applies to linear systems where $A$ is symmetric and positive definite. This method can be adapted to solve nonlinear minimization problems (see Section 4.4.2).

We want to solve a linear system (Equation B.2) iteratively. This means that at a given iteration $u_k$, the residual is not necessarily zero and can be written as

$$ r_k = A u_k - b \, . $$

Solving this linear system is equivalent to minimizing the quadratic function

$$ f(u) = \frac{1}{2} u^\intercal A u - b^\intercal u \, . $$

This is because the gradient of this function is

$$ \nabla f(u) = A u - b \, . $$

Thus, the gradient of the quadratic is the residual of the linear system,

$$ r_k = \nabla f \left( u_k \right) \, . $$

We can express the path from any starting point to a solution $u^*$ as a sequence of $n$ steps with directions $p_k$ and lengths $\alpha_k$:

$$ u^* = \sum_{k=0}^{n-1} \alpha_k p_k \, . $$

Substituting this into the quadratic (Equation B.20), we get

$$
\begin{aligned}
f(u^*) &= f \left( \sum_{k=0}^{n-1} \alpha_k p_k \right) \\
&= \frac{1}{2} \left( \sum_{k=0}^{n-1} \alpha_k p_k \right)^\intercal A \left( \sum_{k=0}^{n-1} \alpha_k p_k \right) - b^\intercal \left( \sum_{k=0}^{n-1} \alpha_k p_k \right) \\
&= \frac{1}{2} \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} \alpha_k \alpha_j \, {p_k}^\intercal A p_j - \sum_{k=0}^{n-1} \alpha_k \, b^\intercal p_k \, .
\end{aligned}
$$

The conjugate gradient method uses a set of $n$ vectors $p_k$ that are conjugate with respect to matrix $A$. Such vectors have the following property:

$$ {p_k}^\intercal A p_j = 0, \quad \text{for all} \quad k \neq j \, . $$

Using this conjugacy property, the double-sum term can be simplified to a single sum,

$$ \frac{1}{2} \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} \alpha_k \alpha_j \, {p_k}^\intercal A p_j = \frac{1}{2} \sum_{k=0}^{n-1} {\alpha_k}^2 \, {p_k}^\intercal A p_k \, . $$

Then, Equation B.24 simplifies to

$$ f(u^*) = \sum_{k=0}^{n-1} \left( \frac{1}{2} {\alpha_k}^2 \, {p_k}^\intercal A p_k - \alpha_k \, b^\intercal p_k \right) \, . $$

Because each term in this sum involves only one direction $p_k$, we have reduced the original problem to a series of one-dimensional quadratic functions that can be minimized one at a time. Each one-dimensional problem corresponds to minimizing the quadratic with respect to the step length $\alpha_k$. Differentiating each term and setting it to zero yields the following:

$$ \alpha_k \, {p_k}^\intercal A p_k - b^\intercal p_k = 0 \;\Rightarrow\; \alpha_k = \frac{b^\intercal p_k}{{p_k}^\intercal A p_k} \, . $$

Now, the question is: How do we find this set of directions? There are many sets of directions that satisfy conjugacy. For example, the eigenvectors of $A$ satisfy Equation B.25.[2] However, it is costly to compute the eigenvectors of a matrix. We want a more convenient way to compute a sequence of conjugate vectors.

The conjugate gradient method sets the first direction to the steepest-descent direction of the quadratic at the first point. Because the gradient of the function is the residual of the linear system (Equation B.22), this first direction is obtained from the residual at the starting point,

$$ p_1 = -r \left( u_0 \right) \, . $$

Each subsequent direction is set to a new conjugate direction using the update

$$ p_{k+1} = -r_{k+1} + \beta_k p_k \, , $$

where $\beta$ is set such that $p_{k+1}$ and $p_k$ are conjugate with respect to $A$.

We can find the expression for $\beta$ by starting with the conjugacy property that we want to achieve,

$$ {p_{k+1}}^\intercal A p_k = 0 \, . $$

Substituting the new direction $p_{k+1}$ with the update (Equation B.30), we get

$$ \left( -r_{k+1} + \beta_k p_k \right)^\intercal A p_k = 0 \, . $$

Expanding the terms and solving for $\beta$, we get

$$ \beta_k = \frac{{r_{k+1}}^\intercal A p_k}{{p_k}^\intercal A p_k} \, . $$

For each search direction $p_k$, we can perform an exact line search by minimizing the quadratic analytically. The directional derivative of the quadratic at a point $x$ along the search direction $p$ is as follows:

$$
\begin{aligned}
\frac{\partial f(x + \alpha p)}{\partial \alpha} &= \frac{\partial}{\partial \alpha} \left( \frac{1}{2} (x + \alpha p)^\intercal A (x + \alpha p) - b^\intercal (x + \alpha p) \right) \\
&= p^\intercal A (x + \alpha p) - p^\intercal b \\
&= p^\intercal (A x - b) + \alpha \, p^\intercal A p \\
&= p^\intercal r(x) + \alpha \, p^\intercal A p \, .
\end{aligned}
$$

By setting this derivative to zero, we can get the step size that minimizes the quadratic along the line to be

$$ \alpha_k = -\frac{{r_k}^\intercal p_k}{{p_k}^\intercal A p_k} \, . $$

The numerator can be written as a function of the residual alone. Replacing $p_k$ with the conjugate direction update (Equation B.30), we get

$$
\begin{aligned}
{r_k}^\intercal p_k &= {r_k}^\intercal \left( -r_k + \beta_{k-1} p_{k-1} \right) \\
&= -{r_k}^\intercal r_k + \beta_{k-1} \, {r_k}^\intercal p_{k-1} \\
&= -{r_k}^\intercal r_k \, .
\end{aligned}
$$

Here we have used the property of the conjugate directions stating that the residual vector is orthogonal to all previous conjugate directions, so that ${r_k}^\intercal p_i = 0$ for $i = 0, 1, \ldots, k-1$.[3] Thus, we can now write

$$ \alpha_k = \frac{{r_k}^\intercal r_k}{{p_k}^\intercal A p_k} \, . $$

The numerator of the expression for $\beta$ (Equation B.33) can also be written in terms of the residual alone. Using the expression for the residual (Equation B.19) and taking the difference between two subsequent residuals, we get

$$
\begin{aligned}
r_{k+1} - r_k &= \left( A u_{k+1} - b \right) - \left( A u_k - b \right) = A \left( u_{k+1} - u_k \right) \\
&= A \left( u_k + \alpha_k p_k - u_k \right) \\
&= \alpha_k A p_k \, .
\end{aligned}
$$

Using this result in the numerator of $\beta$ in Equation B.33, we can write

$$
\begin{aligned}
{r_{k+1}}^\intercal A p_k &= \frac{1}{\alpha_k} {r_{k+1}}^\intercal \left( r_{k+1} - r_k \right) \\
&= \frac{1}{\alpha_k} \left( {r_{k+1}}^\intercal r_{k+1} - {r_{k+1}}^\intercal r_k \right) \\
&= \frac{1}{\alpha_k} \, {r_{k+1}}^\intercal r_{k+1} \, ,
\end{aligned}
$$

where we have used the property that the residual at any conjugate gradient iteration is orthogonal to the residuals at all previous iterations, so ${r_{k+1}}^\intercal r_k = 0$.[4]

Now, using this new numerator and using Equation B.37 to write the denominator as a function of the previous residual, we obtain

$$ \beta_k = \frac{{r_k}^\intercal r_k}{{r_{k-1}}^\intercal r_{k-1}} \, . $$

We use this result in the nonlinear conjugate gradient method for function minimization in Section 4.4.2.

The linear conjugate gradient steps are listed in Algorithm 2. The advantage of this method relative to the direct method is that $A$ does not need to be stored or given explicitly. Instead, we only need to provide a function that computes matrix-vector products with $A$. These products are required to compute residuals ($r = Au - b$) and the $Ap$ term in the computation of $\alpha$. Assuming a well-conditioned problem with good enough arithmetic precision, the algorithm should converge to the solution in $n$ steps.[5]
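As a concrete illustration of these steps, here is a minimal Python sketch (the function name and interface are ours, not from the text); only a matrix-vector product routine `matvec` is required, consistent with Figure B.2:

```python
import numpy as np

def conjugate_gradient(matvec, b, u0, tol=1e-10, max_iter=None):
    """Linear conjugate gradient sketch using the residual convention r = A u - b.
    matvec(v) must return the product A v, so A never needs to be formed."""
    u = u0.copy()
    r = matvec(u) - b              # r_0 = A u_0 - b
    p = -r                         # first direction: steepest descent
    rr = r @ r
    for _ in range(max_iter or b.size):
        Ap = matvec(p)
        alpha = rr / (p @ Ap)      # exact line search along p
        u = u + alpha * p
        r = r + alpha * Ap         # r_{k+1} = r_k + alpha_k A p_k
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        beta = rr_new / rr         # beta_k = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
        p = -r + beta * p          # next conjugate direction
        rr = rr_new
    return u

# Usage on a small symmetric positive-definite system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(lambda v: A @ v, b, np.zeros(2)))
```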

B.4.3 Krylov Subspace Methods

Krylov subspace methods are a more general class of iterative methods.[6] The conjugate gradient method is a special case of a Krylov subspace method that applies only to symmetric positive-definite matrices. However, more general Krylov subspace methods, such as the generalized minimum residual (GMRES) method, do not have such restrictions on the matrix. Compared with the stationary methods of Section B.4.1, Krylov methods have the advantage that they use information gathered throughout the iterations. Instead of using a fixed splitting matrix, Krylov methods effectively vary the splitting so that $M$ is changed at each iteration according to some criteria that use the information gathered so far. For this reason, Krylov methods are usually more efficient than stationary methods.

Like stationary iteration methods, Krylov methods do not require forming or storing $A$. Instead, the iterations require only matrix-vector products of the form $Av$, where $v$ is some vector given by the Krylov algorithm. The matrix-vector product could be given by a black box, as shown in Figure B.2.

For the linear conjugate gradient method (Section B.4.2), we found conjugate directions and minimized the residual of the linear system in a sequence of these directions.

Krylov subspace methods minimize the residual in a space,

$$ x_0 + \mathcal{K}_k \, , $$

where $x_0$ is the initial guess, and $\mathcal{K}_k$ is the Krylov subspace,

$$ \mathcal{K}_k(A; r_0) \equiv \text{span} \left\{ r_0, A r_0, A^2 r_0, \ldots, A^{k-1} r_0 \right\} \, . $$

In other words, a Krylov subspace method seeks a solution that is a linear combination of the vectors $r_0, A r_0, \ldots, A^{k-1} r_0$. The definition of this particular sequence is convenient because these terms can be computed recursively with the matrix-vector product black box as $r_0, A(r_0), A(A(r_0)), A(A(A(r_0))), \ldots$

Under certain conditions, it can be shown that the solution of the linear system of size $n$ is contained in the subspace $\mathcal{K}_n$.

Krylov subspace methods (including the conjugate gradient method) converge much faster when using preconditioning. Instead of solving $Ax = b$, we solve

$$ \left( M^{-1} A \right) x = M^{-1} b \, , $$

where $M$ is the preconditioning matrix (or simply the preconditioner). The matrix $M$ should be similar to $A$ and correspond to a linear system that is easier to solve.

The inverse, $M^{-1}$, should be available explicitly, and we do not need an explicit form for $M$. The matrix resulting from the product $M^{-1} A$ should have a smaller condition number so that the new linear system is better conditioned.

In the extreme case where $M = A$, we have effectively computed the inverse of $A$, and we can obtain $x$ explicitly. At the other extreme, $M$ could be a diagonal matrix with the diagonal elements of $A$, which would scale $A$ such that the diagonal elements are 1.[7]

Krylov subspace solvers require three main components: (1) an orthogonal basis for the Krylov subspace, (2) an optimality property that determines the solution within the subspace, and (3) an effective preconditioner. Various Krylov subspace methods are possible, depending on the choice for each of these three components. One of the most popular Krylov subspace methods is GMRES [4].[8]
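As a sketch of how this looks with an off-the-shelf solver (SciPy's `gmres`; the tridiagonal test matrix and the diagonal, Jacobi-style preconditioner are our choices for illustration), note that both the matrix and the preconditioner enter only through black-box matrix-vector products, as in Figure B.2:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres

n = 1000
A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# The matrix is supplied only as a matrix-vector product (it is never formed densely).
A_op = LinearOperator((n, n), matvec=lambda v: A @ v)

# Diagonal (Jacobi-style) preconditioner: the operator applies M^{-1} v = v / diag(A).
d = A.diagonal()
M_inv = LinearOperator((n, n), matvec=lambda v: v / d)

x, info = gmres(A_op, b, M=M_inv)  # info == 0 indicates convergence to the default tolerance
print(info, np.linalg.norm(A @ x - b))
```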

Footnotes
  1. Trefethen and Bau [1] provide a much more detailed explanation of linear solvers.

  2. Suppose we have two eigenvectors, $v_k$ and $v_j$. Then ${v_k}^\intercal A v_j = {v_k}^\intercal (\lambda_j v_j) = \lambda_j \, {v_k}^\intercal v_j$. This dot product is zero because the eigenvectors of a symmetric matrix are mutually orthogonal.

  3. For a proof of this property, see Theorem 5.2 in Nocedal and Wright [2].

  4. For a proof of this property, see Theorem 5.3 in Nocedal and Wright [2].

  5. Because the linear conjugate gradient method converges in $n$ steps, it was originally thought of as a direct method. It was initially dismissed in favor of more efficient direct methods, such as $LU$ factorization. However, the conjugate gradient method was later reframed as an effective iterative method to obtain approximate solutions to large problems.

  6. This is just an overview of Krylov subspace methods; for more details, see Trefethen and Bau [1] or Saad [3].

  7. The splitting matrix $M$ we used in the equation for the stationary methods (Section B.4.1) is effectively a preconditioner. An $M$ using the diagonal entries of $A$ corresponds to the Jacobi method (Equation B.13).

  8. GMRES and other Krylov subspace methods are available in most programming languages, including C/C++, Fortran, Julia, MATLAB, and Python.

References
  1. Trefethen, L. N., & Bau III, D. (1997). Numerical Linear Algebra. SIAM.
  2. Nocedal, J., & Wright, S. J. (2006). Numerical Optimization (2nd ed.). Springer. 10.1007/978-0-387-40065-5
  3. Saad, Y. (2003). Iterative Methods for Sparse Linear Systems (2nd ed.). SIAM. https://www.google.ca/books/edition/Iterative_Methods_for_Sparse_Linear_Syst/qtzmkzzqFmcC
  4. Saad, Y., & Schultz, M. H. (1986). GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7(3), 856–869. 10.1137/0907058