12 Optimization Under Uncertainty
Uncertainty is always present in engineering design. Manufacturing processes create deviations from the specifications, operating conditions vary from the ideal, and some parameters are inherently variable. Optimization with deterministic inputs can lead to poorly performing designs. Optimization under uncertainty (OUU) is the optimization of systems in the presence of random parameters or design variables. The objective is to produce robust and reliable designs. A design is robust when the objective function is less sensitive to inherent variability. A design is reliable when it is less prone to violating a constraint when accounting for the variability.[1]

This chapter discusses how uncertainty can be used in the objective function to obtain robust designs and how it can be used in constraints to get reliable designs. We introduce methods that propagate input uncertainties through a computational model to produce output statistics.
We assume familiarity with basic statistics concepts such as expected value, variance, probability density functions (PDFs), cumulative distribution functions (CDFs), and some common probability distributions. A brief review of these topics is provided in Section A.9 if needed.
12.1 Robust Design
We call a design robust if its performance is less sensitive to inherent variability. In optimization, “performance” is directly associated with the objective function. Satisfying the design constraints is a requirement, but adding a margin to a constraint does not increase performance in the standard optimization formulation. Thus, for a robust design, the objective function is less sensitive to variations in the random design variables and parameters. We can achieve this by formulating an objective function that considers such variations and reflects uncertainty.
A common example of robust design is considering the performance of an engineering device at different operating conditions. If we had deterministic operating conditions, it would make sense to maximize the performance for those conditions. For example, suppose we knew the exact wind speeds and wind directions a sailboat would experience in a race. In that case, we could optimize the hull and sail design to minimize the time around the course. Unfortunately, if variability does exist, the sailboat designed for deterministic conditions will likely perform poorly in off-design conditions. A better strategy considers the uncertainty in the operating conditions and maximizes the expected performance across a range of conditions. A robust design achieves good performance even with uncertain wind speeds and directions.
There are many options for formulating robust design optimization problems. The most common OUU objective is to minimize the expected value of the objective function, $E[f]$. This yields robust designs because the average performance under variability is considered.
Consider the function $f(x)$ shown on the left in Figure 12.1. If $x$ is deterministic, minimizing this function yields the global minimum on the right. Now consider what happens when $x$ is uncertain. “Uncertain” means that $x$ is no longer a deterministic input. Instead, it is a random variable with some probability distribution. For example, $x$ could be normally distributed with mean $\mu_x$ and standard deviation $\sigma_x$. We can compute the average value of the objective at each $\mu_x$ from the expected value of a function (Equation A.65):

$$E[f](\mu_x) = \int_{-\infty}^{\infty} f(z)\, p(z)\, \mathrm{d}z,$$

where $p$ is the probability density of $x$ (with mean $\mu_x$) and $z$ is a dummy variable for integration. Repeating this integral at each mean value $\mu_x$ gives the expected value as a function of $\mu_x$.

Figure 12.1: The global minimum of the expected value can shift depending on the standard deviation of $x$, $\sigma_x$. The bottom row of figures shows the corresponding normal probability distributions of $x$.
Figure 12.1 shows the expected value of the objective for three different standard deviations. The probability distribution of $x$ for a given mean value and the three standard deviations is shown in the bottom row of the figure. For the smallest standard deviation, the expected value function is indistinguishable from the deterministic function $f(x)$, and the global minimum is the same for both functions. However, for the largest standard deviation, the minimum of the expected value function is different from that of the deterministic function. Therefore, the minimum on the right is not as robust as the one on the left. The minimum on the right lies in a narrow valley, so the expected value increases rapidly with increased variance. The opposite is true for the minimum on the left. Because it is in a broad valley, the expected value is less sensitive to variability in $x$. Thus, a design whose performance changes rapidly with respect to variability is not robust.
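To make the expected-value objective concrete, here is a minimal sketch that numerically evaluates $E[f]$ for a one-dimensional objective with a normally distributed input. The function `f` and all numerical values are hypothetical illustrations, not the function plotted in Figure 12.1.

```python
import numpy as np

def f(x):
    # hypothetical objective with both broad and narrow valleys
    return 0.05 * x**4 - 0.5 * x**2 + 0.2 * np.sin(8 * x)

def expected_f(mu_x, sigma_x, n=2001, half_width=6.0):
    """Estimate E[f(x)] for x ~ N(mu_x, sigma_x^2) by integrating f against the PDF."""
    z = np.linspace(mu_x - half_width * sigma_x, mu_x + half_width * sigma_x, n)
    pdf = np.exp(-0.5 * ((z - mu_x) / sigma_x) ** 2) / (sigma_x * np.sqrt(2.0 * np.pi))
    dz = z[1] - z[0]
    return np.sum(f(z) * pdf) * dz  # simple rectangle-rule integration

# sweeping the mean shows how the robust and deterministic objectives can differ
for mu in (-2.0, 0.0, 2.0):
    print(mu, expected_f(mu, sigma_x=0.5))
```

Minimizing `expected_f` over the design variable's mean, rather than minimizing `f` directly, is the robust formulation described above.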
Of course, the mean is just one possible statistical output metric. Variance, or standard deviation ($\sigma$), is another common metric. However, directly minimizing the variance is less common because although low variability is often desirable, such an objective has no incentive to improve mean performance and so usually performs poorly. These two metrics represent a trade-off between risk (variance) and reward (mean). The compromise between these two metrics can be quantified through multiobjective optimization (see Chapter 9), which would result in a Pareto front with the notional behavior illustrated in Figure 12.2.

Figure 12.2: When designing for robustness, there is an inherent trade-off between risk (represented by the variance, $\sigma^2$) and reward (represented by the expected value, $E[f]$).
Because both multiobjective optimization and uncertainty quantification are costly, the overall cost of producing such a Pareto front might be prohibitive. Therefore, we might instead seek to minimize the expected value while constraining the variance to a value that the designer can tolerate. Another option is to minimize the mean plus some multiple of the standard deviation.
Many other relevant statistical objectives do not involve statistical moments like mean or variance. Examples include minimizing the 95th percentile of the distribution or employing a reliability metric that minimizes the probability of the objective exceeding some critical value.
12.2 Reliable Design
We call a design reliable when it is less prone to failure under variability. In other words, the constraints have a lower probability of being violated under variations in the random design variables and parameters. In a robust design, we consider the effect of uncertainty on the objective function. In reliable design, we consider that effect on the constraints.
A common example of reliability is structural safety. Consider Example 3.9, where we formulated a mass minimization subject to stress constraints. In such structural optimization problems, many of the stress constraints are active at the optimum. Constraining the stress to be equal to or below the yield stress value as if this value were deterministic is probably not a good idea because variations in the material properties or manufacturing could result in structural failure. Instead, we might want to include this variability so that we can reduce the probability of failure.
To generate a reliable design, we want the probability of satisfying the constraints to exceed some preselected reliability level. Thus, we change deterministic inequality constraints to ensure that the probability of constraint satisfaction exceeds a specified reliability level $r_j$, that is,

$$P\left[g_j(x) \le 0\right] \ge r_j, \quad j = 1, \ldots, n_g.$$

For example, if we set $r_j = 0.999$, then constraint $j$ must be satisfied with a probability of 99.9 percent. Thus, we can explicitly set the reliability level that we wish to achieve, with associated trade-offs in the level of performance for the objective function.
In some engineering disciplines, increasing reliability is handled simply through safety factors. These safety factors are deterministic but are usually derived through statistical means.
12.3 Forward Propagation
In the previous sections, we have assumed that we know the statistics (e.g., mean and standard deviation) of the outputs of interest (objectives and constraints). However, we generally do not have that information. Instead, we might only know the PDFs of the inputs.[5] Forward-propagation methods propagate input uncertainties through a numerical model to compute output statistics.
Uncertainty quantification is a large field unto itself, and we only provide an introduction to it in this chapter. We introduce four well-known nonintrusive methods for forward propagation: first-order perturbation methods, direct quadrature, Monte Carlo methods, and polynomial chaos.
12.3.1 First-Order Perturbation Method
Perturbation methods are based on a local Taylor series expansion of the functional output. In the following, $f$ represents an output of interest, and $x$ represents all the random variables (not necessarily all the variables that $f$ depends on). A first-order Taylor series approximation of $f$ about the mean of $x$ is given by

$$f(x) \approx f(\mu_x) + \sum_{i=1}^{n_x} \frac{\partial f}{\partial x_i}\bigg|_{\mu_x} \left(x_i - \mu_{x_i}\right),$$

where $n_x$ is the dimensionality of $x$. We can estimate the average value of $f$ by taking the expected value of both sides and using the linearity of expectation as follows:

$$\mu_f = E[f(x)] \approx f(\mu_x) + \sum_{i=1}^{n_x} \frac{\partial f}{\partial x_i}\bigg|_{\mu_x} E\left[x_i - \mu_{x_i}\right].$$

The expected value of each first-order term is zero because $E[x_i - \mu_{x_i}] = 0$, so we can write

$$\mu_f \approx f(\mu_x).$$
That is, when considering only first-order terms, the mean of the function is the function evaluated at the mean of the input.
The variance of $f$ is given by

$$\sigma_f^2 = E\left[\left(f(x) - \mu_f\right)^2\right] \approx \sum_{i=1}^{n_x} \sum_{j=1}^{n_x} \frac{\partial f}{\partial x_i}\bigg|_{\mu_x} \frac{\partial f}{\partial x_j}\bigg|_{\mu_x} E\left[\left(x_i - \mu_{x_i}\right)\left(x_j - \mu_{x_j}\right)\right].$$

The expectation term in this equation is the covariance matrix $\Sigma_x$, so we can write this in matrix notation as

$$\sigma_f^2 \approx \nabla f(\mu_x)^\mathsf{T}\, \Sigma_x\, \nabla f(\mu_x).$$
We often assume that each random input variable is mutually independent. This is true for the design variables for a well-posed optimization problem, but the parameters may or may not be independent.
When the parameters are independent (this assumption is often made even if not strictly true), the covariance matrix is diagonal, and the variance estimation simplifies to

$$\sigma_f^2 \approx \sum_{i=1}^{n_x} \left(\frac{\partial f}{\partial x_i}\bigg|_{\mu_x}\right)^2 \sigma_{x_i}^2.$$
These equations are frequently used to propagate errors from experimental measurements. Major limitations of this approach are that (1) it relies on a linearization (first-order Taylor series), which has limited accuracy;[6] (2) it assumes that all uncertain parameters are uncorrelated, which is true for design variables but is not necessarily true for parameters (this assumption can be relaxed by providing the covariances); and (3) it implicitly assumes symmetry in the input distributions because we neglect all higher-order moments (e.g., skewness, kurtosis) and is, therefore, less applicable for problems that are highly asymmetric, such as the wind farm example (Example 12.2).
We have not assumed that the input or output distributions are normal probability distributions (i.e., Gaussian). However, we can only estimate the mean and variance with a first-order series and not the higher-order moments.
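As an illustration, the following sketch implements the first-order mean and variance estimates above for a hypothetical output function of two independent random inputs; the function, gradient, and numerical values are made up for the example.

```python
import numpy as np

def perturbation_moments(f, grad_f, mu_x, cov_x):
    """First-order estimates: mu_f ~ f(mu_x) and sigma_f^2 ~ grad^T Sigma grad."""
    mu_f = f(mu_x)           # mean: function evaluated at the mean input
    g = grad_f(mu_x)         # gradient evaluated at the mean input
    var_f = g @ cov_x @ g    # reduces to sum((df/dx_i)^2 sigma_i^2) for a diagonal covariance
    return mu_f, var_f

# hypothetical output function and its gradient
f = lambda x: x[0] ** 2 + 3.0 * x[1]
grad_f = lambda x: np.array([2.0 * x[0], 3.0])

mu_f, var_f = perturbation_moments(
    f, grad_f, mu_x=np.array([1.0, 2.0]), cov_x=np.diag([0.1**2, 0.2**2])
)
print(mu_f, np.sqrt(var_f))
```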
The equation for the variance (Equation 12.8) is straightforward, but the derivative terms can be challenging when using gradient-based optimization.
The first-order derivatives in Equation 12.7 can be computed using any of the methods from Chapter 6. If they are computed efficiently using a method appropriate to the problem, the forward propagation is efficient as well. However, second-order derivatives are required to use gradient-based optimization (assuming some of the design variables are also random variables). That is because the uncertain objectives and constraints now contain derivatives, and we need derivatives of those functions.
Because computing accurate second derivatives is costly, these methods are used less often than the other techniques discussed in this chapter.
We can use a simpler approach if we ignore variability in the objective and focus only on the variability in the constraints (reliability-based optimization). In this case, we can approximate the effect of the uncertainty by pulling it outside of the optimization iterations. We demonstrate one such approach, where we make the additional assumption that each constraint is normally distributed.6
If $g_j$ is normally distributed, we can rewrite the probabilistic constraint (Equation 12.2) as

$$g_j(x) + k\,\sigma_{g_j} \le 0,$$

where $k$ is chosen for the desired reliability level $r_j$. For example, $k = 2$ implies a reliability level of 97.72 percent (one-sided tail of the normal distribution). In many cases, an output distribution is reasonably approximated as normal, but this method tends to be less effective for cases with nonnormal output.
With multiple active constraints, we must be careful to appropriately choose the reliability level for each constraint such that the overall reliability is in the desired range. We often simplify the problem by assuming that the constraints are uncorrelated. Thus, the total reliability is the product of the reliabilities of each constraint.
This simplified approach has the following steps:
Compute the deterministic optimum.
Estimate the standard deviation of each constraint using Equation 12.8.
Adjust the constraints to $g_j(x) + k\,\sigma_{g_j} \le 0$ for some desired reliability level and re-optimize.
Repeat steps 1–3 as needed.
This method is easy to use, and although approximate, the magnitude of error is usually appropriate for the conceptual design phase. If the errors are unacceptable, the standard deviation can be computed inside the optimization. The major limitation of this method is that it only applies to reliability-based optimization.
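A minimal sketch of this loop is shown below for a hypothetical two-variable problem (minimize $x_1^2 + x_2^2$ subject to $2 - x_1 - x_2 \le 0$) with assumed input standard deviations; the problem, the reliability level, and the use of SciPy's SLSQP solver are all illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# hypothetical problem: minimize x1^2 + x2^2 subject to g(x) = 2 - x1 - x2 <= 0
sigma_x = np.array([0.05, 0.05])              # assumed input standard deviations
f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: 2.0 - x[0] - x[1]
grad_g = np.array([-1.0, -1.0])               # constant constraint gradient

# first-order estimate of the constraint standard deviation (Equation 12.8)
sigma_g = np.sqrt(np.sum((grad_g * sigma_x) ** 2))

k = norm.ppf(0.999)                           # about 3.09 sigma for 99.9 percent reliability
x0 = np.array([3.0, 3.0])

# step 1: deterministic optimum (SLSQP treats "ineq" constraints as fun(x) >= 0)
det = minimize(f, x0, method="SLSQP",
               constraints={"type": "ineq", "fun": lambda x: -g(x)})

# steps 2-3: shift the constraint by k standard deviations and re-optimize
rel = minimize(f, det.x, method="SLSQP",
               constraints={"type": "ineq", "fun": lambda x: -(g(x) + k * sigma_g)})

print(det.x, rel.x)   # step 4 would re-estimate sigma_g at rel.x and repeat if needed
```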
12.3.2 Direct Quadrature
Another approach to estimating statistical outputs of interest is to apply numerical integration (also known as quadrature) directly to their definitions. For example:

$$\mu_f = \int_{-\infty}^{\infty} f(x)\, p(x)\, \mathrm{d}x.$$

Discretizing using $n$ points, we get the summation

$$\mu_f \approx \sum_{i=1}^{n} w_i\, f(x_i)\, p(x_i).$$

The quadrature strategy determines the evaluation nodes ($x_i$) and the corresponding weights ($w_i$).
The most common quadratures originate from composite Newton–Cotes formulas: the composite midpoint, trapezoidal, and Simpson’s rules. These methods use equally spaced nodes, a specification that can be relaxed but still results in a predetermined set of fixed nodes. To reach a specified level of accuracy, it is often desirable to use nesting. In this strategy, a refined mesh (smaller spacing between nodes) reuses nodes from the coarser spacing. For example, a simple nesting strategy is to add a new node between all existing nodes. Thus, the accuracy of the integral can be improved up to a specified tolerance while reusing previous function evaluations.
Although straightforward to apply, the Newton–Cotes formulas are usually much less efficient than Gaussian quadrature, at least for smooth, nonperiodic functions. Efficiency is highly desirable because the output functions must be called many times for forward propagation, as well as throughout the optimization. The Newton–Cotes formulas are based on fitting polynomials: constant (midpoint), linear (trapezoidal), and quadratic (Simpson’s).
The weights are adjusted between the different methods, but the nodes are fixed. Gaussian quadrature includes the nodes as degrees of freedom selected by the quadrature strategy. The method approximates the integrand as a polynomial and then efficiently evaluates the integral for the polynomial exactly. Because some of the concepts from Gaussian quadrature are used later in this chapter, we review them here.
An $n$-point Gaussian quadrature has $2n$ degrees of freedom ($n$ node positions and $n$ corresponding weights), so it can be used to exactly integrate any polynomial up to order $2n - 1$ if the weights and nodes are appropriately chosen. For example, a 2-point Gaussian quadrature can exactly integrate all polynomials up to order 3. To illustrate, consider an integral over the bounds $-1$ to $1$ (we will later see that these bounds can be used as a general representation of any finite bounds through a change of variables):

$$\int_{-1}^{1} f(x)\, \mathrm{d}x \approx w_1 f(x_1) + w_2 f(x_2).$$

We want this model to be exact for all polynomials up to order 3. If the actual function were a constant ($f(x) = 1$), then the integral equation would result in the following:

$$\int_{-1}^{1} 1\, \mathrm{d}x = 2 = w_1 + w_2.$$

Repeating this process for polynomials of order 1, 2, and 3 yields four equations and four unknowns:

$$w_1 + w_2 = 2, \qquad w_1 x_1 + w_2 x_2 = 0, \qquad w_1 x_1^2 + w_2 x_2^2 = \frac{2}{3}, \qquad w_1 x_1^3 + w_2 x_2^3 = 0.$$

Solving these equations yields $w_1 = w_2 = 1$ and $x_2 = -x_1 = 1/\sqrt{3}$. Thus, we have the weights and node positions that integrate a cubic (or lower-order) polynomial exactly using just two function evaluations, that is,

$$\int_{-1}^{1} f(x)\, \mathrm{d}x \approx f\!\left(-\frac{1}{\sqrt{3}}\right) + f\!\left(\frac{1}{\sqrt{3}}\right).$$
More generally, this means that if we can reasonably approximate a general function with a cubic polynomial over the interval, we can provide a good estimate for its integral efficiently.
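A quick numerical check of this two-point rule, using an arbitrary cubic chosen for the example:

```python
import numpy as np

# two-point Gauss-Legendre rule on [-1, 1]: nodes at +/- 1/sqrt(3), unit weights
nodes = np.array([-1.0, 1.0]) / np.sqrt(3.0)
weights = np.array([1.0, 1.0])

# an arbitrary cubic; its exact integral on [-1, 1] is 4/3 + 2 = 10/3
p = lambda x: x**3 + 2.0 * x**2 - x + 1.0

approx = np.sum(weights * p(nodes))
print(approx, 10.0 / 3.0)   # the two values match to machine precision
```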
We would like to extend this procedure to any number of points without the cumbersome approach just applied. The derivation is lengthy (particularly for the weights), so it is not repeated here, other than to explain some of the requirements and the results. The derivation of Gaussian quadrature requires orthogonal polynomials. Two vectors are orthogonal if their dot product is zero. The definition is similar for functions, but because functions have an infinite dimension, we require an integral instead of a summation. Thus, two functions $\psi_i$ and $\psi_j$ are orthogonal over an interval from $a$ to $b$ if their inner product is zero. Different definitions can be used for the inner product. The simplest definition is as follows:

$$\langle \psi_i, \psi_j \rangle = \int_a^b \psi_i(x)\, \psi_j(x)\, \mathrm{d}x.$$

For the Gaussian quadrature derivation, we need a set of polynomials that are not only orthogonal to each other but also to any polynomial of lower order. For the previous inner product, it turns out that Legendre polynomials ($P_k$ is a Legendre polynomial of order $k$) possess the desired properties:

$$\int_{-1}^{1} P_i(x)\, P_j(x)\, \mathrm{d}x = 0 \quad \text{for } i \ne j.$$
Legendre polynomials can be generated by the recurrence relationship

$$(k + 1)\, P_{k+1}(x) = (2k + 1)\, x\, P_k(x) - k\, P_{k-1}(x),$$

where $P_0(x) = 1$ and $P_1(x) = x$. Figure 12.10 shows a plot of the first few Legendre polynomials.

Figure 12.10: The first few Legendre polynomials.
From the Gaussian quadrature derivation, we find that we can integrate any polynomial of order $2n - 1$ exactly by choosing the node positions $x_i$ as the roots of the Legendre polynomial $P_n$, with the corresponding weights given by

$$w_i = \frac{2}{\left(1 - x_i^2\right)\left[P_n'(x_i)\right]^2}.$$
Legendre polynomials are defined over the interval $[-1, 1]$, but we can reformulate them for an arbitrary interval $[a, b]$ through a change of variables:

$$x = \frac{b - a}{2}\,\xi + \frac{a + b}{2},$$

where $\xi \in [-1, 1]$.

Using the change of variables, we can write

$$\int_a^b f(x)\, \mathrm{d}x = \frac{b - a}{2} \int_{-1}^{1} f\!\left(\frac{b - a}{2}\,\xi + \frac{a + b}{2}\right) \mathrm{d}\xi.$$

Now, applying a quadrature rule, we can approximate the integral as

$$\int_a^b f(x)\, \mathrm{d}x \approx \frac{b - a}{2} \sum_{i=1}^{n} w_i\, f\!\left(\frac{b - a}{2}\,\xi_i + \frac{a + b}{2}\right),$$

where the node locations $\xi_i$ and respective weights $w_i$ come from the Legendre polynomials.
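The following sketch applies this change of variables with NumPy's built-in Gauss–Legendre nodes and weights; the quintic test function is an arbitrary choice for illustration.

```python
import numpy as np

def gauss_legendre_integral(f, a, b, n):
    """Integrate f on [a, b] with an n-point Gauss-Legendre rule,
    using the change of variables x = (b - a)/2 * xi + (a + b)/2."""
    xi, w = np.polynomial.legendre.leggauss(n)   # nodes and weights on [-1, 1]
    x = 0.5 * (b - a) * xi + 0.5 * (a + b)
    return 0.5 * (b - a) * np.sum(w * f(x))

# a 3-point rule is exact up to order 5: the integral of x^5 on [0, 2] is 32/3
print(gauss_legendre_integral(lambda x: x**5, 0.0, 2.0, 3), 32.0 / 3.0)
```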
Recall that what we are after in this section is not just any generic integral but, rather, metrics such as the expected value,

$$\mu_f = \int_{-\infty}^{\infty} f(x)\, p(x)\, \mathrm{d}x.$$

As compared to our original integral (Equation 12.12), we have an additional function $p(x)$, referred to as a weight function. Thus, we extend the definition of orthogonal polynomials (Equation 12.17) to orthogonality with respect to the weight $w(x)$, also known as a weighted inner product:

$$\langle \psi_i, \psi_j \rangle = \int_a^b \psi_i(x)\, \psi_j(x)\, w(x)\, \mathrm{d}x = 0 \quad \text{for } i \ne j.$$

For our purposes, the weight function is the PDF $p(x)$, or it is related to it through a change of variables.
Orthogonal polynomials for various weight functions are listed in Table 1. The weight function in the table does not always correspond exactly to the typically used PDF $p(x)$, so a change of variables (like Equation 12.22) might be needed. The formula described previously is known as Gauss–Legendre quadrature, whereas the variants listed in Table 1 are called Gauss–Hermite, and so on. Formulas and tables with node locations and corresponding weight values exist for most standard probability distributions. For any given weight function, we can generate a corresponding set of orthogonal polynomials,7 including for general distributions (e.g., ones that were empirically derived).
Table 1: Orthogonal polynomials that correspond to some common probability distributions.

| Prob. dist. | Weight function | Polynomial | Support range |
| --- | --- | --- | --- |
| Uniform | $1$ | Legendre | $[-1, 1]$ |
| Normal | $e^{-x^2}$ | Hermite | $(-\infty, \infty)$ |
| Exponential | $e^{-x}$ | Laguerre | $[0, \infty)$ |
| Beta | $(1 - x)^{\alpha} (1 + x)^{\beta}$ | Jacobi | $[-1, 1]$ |
| Gamma | $x^{\alpha} e^{-x}$ | Generalized Laguerre | $[0, \infty)$ |
We now provide more details on Gauss–Hermite quadrature because normal distributions are common. The Hermite polynomials ($H_k$) follow the recurrence relationship

$$H_{k+1}(x) = 2x\, H_k(x) - 2k\, H_{k-1}(x),$$

where $H_0(x) = 1$ and $H_1(x) = 2x$.

Figure 12.11: The first few Hermite polynomials.
The first few polynomials are plotted in Figure 12.11. For Gauss–Hermite quadrature, the nodes $x_i$ are positioned at the roots of $H_n$, and their weights are

$$w_i = \frac{2^{n-1}\, n!\, \sqrt{\pi}}{n^2 \left[H_{n-1}(x_i)\right]^2}.$$
A coordinate transformation is needed because the standard normal distribution differs slightly from the weight function in Table 1. For example, if we are seeking an expected value with $x$ normally distributed, $x \sim \mathcal{N}(\mu, \sigma^2)$, then the integral is given by

$$\mu_f = \int_{-\infty}^{\infty} f(x)\, \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \mathrm{d}x.$$

We use the change of variables

$$x = \mu + \sqrt{2}\,\sigma\, z.$$

Then, the resulting integral becomes

$$\mu_f = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} f\!\left(\mu + \sqrt{2}\,\sigma\, z\right) e^{-z^2}\, \mathrm{d}z.$$

This is now in the appropriate form, so the quadrature rule (using the Hermite nodes $z_i$ and weights $w_i$) is

$$\mu_f \approx \frac{1}{\sqrt{\pi}} \sum_{i=1}^{n} w_i\, f\!\left(\mu + \sqrt{2}\,\sigma\, z_i\right).$$
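As a sketch, the rule above can be evaluated with NumPy's Gauss–Hermite nodes and weights; the test function $f(x) = x^2$ is chosen only because its expected value, $\mu^2 + \sigma^2$, is known analytically.

```python
import numpy as np

def expected_value_gauss_hermite(f, mu, sigma, n=5):
    """E[f(x)] for x ~ N(mu, sigma^2) via n-point Gauss-Hermite quadrature
    and the change of variables x = mu + sqrt(2) * sigma * z."""
    z, w = np.polynomial.hermite.hermgauss(n)   # nodes and weights for weight exp(-z^2)
    return np.sum(w * f(mu + np.sqrt(2.0) * sigma * z)) / np.sqrt(np.pi)

# sanity check: E[x^2] = mu^2 + sigma^2 = 1.09
print(expected_value_gauss_hermite(lambda x: x**2, mu=1.0, sigma=0.3))
```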
Gaussian quadrature does not naturally lead to nesting, which, as previously mentioned, can increase the accuracy by adding points to a given quadrature. However, methods such as Gauss–Kronrod quadrature adapt Gaussian quadrature to utilize nesting. Although Gaussian quadrature is often used to compute one-dimensional integrals efficiently, it is not always the best method. For non-smooth functions, trapezoidal integration is usually preferable because polynomials are ill-suited for capturing discontinuities. Additionally, for periodic functions such as the one shown in Figure 12.4, the trapezoidal rule is better than Gaussian quadrature, exhibiting exponential convergence.8,9 This is most easily seen by using a Fourier series expansion.10
Clenshaw–Curtis quadrature applies this idea to a general function by employing a change of variables ($x = \cos\theta$) to create a periodic function that can then be efficiently integrated with the trapezoidal rule. Clenshaw–Curtis quadrature also has the advantage that nesting is straightforward, which is desirable for higher-dimensional functions, as discussed next.
The direct quadrature methods discussed so far focused on integration in one dimension, but most problems have more than one random variable. Extending numerical integration to multiple dimensions (also known as cubature) is much more challenging. The most obvious extension for multidimensional quadrature is a full tensor-product grid. This type of grid is created by discretizing each dimension and then evaluating at every combination of nodes. Mathematically, the quadrature formula can be written as

$$\int f(x)\, \mathrm{d}x \approx \sum_{i_1 = 1}^{n} \cdots \sum_{i_d = 1}^{n} w_{i_1} \cdots w_{i_d}\, f\!\left(x_{i_1}, \ldots, x_{i_d}\right).$$

Although conceptually straightforward, this approach is subject to the curse of dimensionality.[7] The number of points we need to evaluate grows exponentially with the number of input dimensions.
One approach to dealing with exponential growth is to use a sparse grid method.11 The basic idea is to neglect higher-order cross terms. For example, assume that we have a two-dimensional problem and that both variables use a fifth-degree polynomial in the quadrature strategy. The cross terms would include terms up to the 10th order. Although we can integrate these high-order polynomials exactly, their contributions become negligible beyond a specific order. We specify a maximum degree that we want to include and remove all higher-order terms from the evaluation. This method significantly reduces the number of evaluation nodes, with minimal loss in accuracy.
For a problem with dimension $d$ and $n$ sample points in each dimension, the entire tensor grid requires $\mathcal{O}(n^d)$ evaluations. In contrast, the sparse grid method has a complexity of $\mathcal{O}\!\left(n (\log n)^{d-1}\right)$ with comparable accuracy. This scaling alleviates the curse of dimensionality to some extent. However, the number of evaluation points is still strongly dependent on problem dimensionality, making it intractable in high dimensions.
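To illustrate the full tensor-product approach (and why its cost grows as $n^d$), here is a sketch that builds a two-dimensional Gauss–Hermite grid for independent normal inputs; the test function is again chosen so the exact answer is known.

```python
import numpy as np

def expected_value_tensor_gh(f, mu, sigma, n=5):
    """E[f(x)] for independent normal inputs using a full tensor product of
    one-dimensional Gauss-Hermite rules (n**d evaluation points in d dimensions)."""
    z, w = np.polynomial.hermite.hermgauss(n)
    d = len(mu)
    # every combination of the 1-D nodes, and the corresponding product weights
    Z = np.array(np.meshgrid(*([z] * d), indexing="ij")).reshape(d, -1).T
    W = np.prod(np.array(np.meshgrid(*([w] * d), indexing="ij")).reshape(d, -1).T, axis=1)
    X = mu + np.sqrt(2.0) * sigma * Z          # change of variables in each dimension
    values = np.array([f(x) for x in X])       # n**d model evaluations
    return np.sum(W * values) / np.pi ** (d / 2.0)

# sanity check with d = 2: E[x1^2 + x2^2] = 1 + 0.04 + 1 + 0.09 = 2.13
mu = np.array([1.0, -1.0])
sigma = np.array([0.2, 0.3])
print(expected_value_tensor_gh(lambda x: x[0] ** 2 + x[1] ** 2, mu, sigma))
```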
12.3.3 Monte Carlo Simulation
Monte Carlo simulation is a sampling-based procedure that computes statistics and output distributions. Sampling methods approximate the integrals mentioned in the previous section by using the law of large numbers. The concept is that output probability distributions can be approximated by running the simulation many times with randomly sampled inputs from the corresponding probability distributions. There are three steps:
Random sampling. Sample $n$ points from the input probability distributions using a random number generator.
Numerical experimentation. Evaluate the outputs at these points, $f(x_i)$.
Statistical analysis. Compute statistics on the resulting discrete output distribution.
For example, the discrete form of the mean is

$$\mu_f \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i),$$

and the unbiased estimate of the variance is computed as

$$\sigma_f^2 \approx \frac{1}{n - 1} \sum_{i=1}^{n} \left(f(x_i) - \mu_f\right)^2.$$

We can also estimate the probability of constraint satisfaction by counting how many times the constraint was satisfied and dividing by $n$. If we evaluate enough samples, our output statistics converge to the actual values by the law of large numbers. Therein also lies this method’s disadvantage: it requires a large number of samples.
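A minimal Monte Carlo sketch for a hypothetical problem with two independent normal inputs, one output, and one constraint (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical inputs: two independent normal random variables
n = 100_000
x = rng.normal(loc=[1.0, -1.0], scale=[0.2, 0.3], size=(n, 2))

f = x[:, 0] ** 2 + x[:, 1] ** 2        # output samples f(x_i)
g = x[:, 0] + x[:, 1] - 0.5            # constraint samples; feasible when g <= 0

mu_f = np.mean(f)                      # sample mean
var_f = np.var(f, ddof=1)              # unbiased sample variance
reliability = np.mean(g <= 0.0)        # fraction of samples satisfying the constraint
print(mu_f, var_f, reliability)
```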
Monte Carlo simulation has three main advantages. First, the convergence rate is independent of the number of inputs. Whether we have 3 or 300 random input variables, the convergence rate is similar because we randomize all input variables for each sample. This is an advantage over direct quadrature for high-dimensional problems because, unlike quadrature, Monte Carlo does not suffer from the curse of dimensionality. Second, the algorithm is easy to parallelize because all of the function evaluations are independent. Third, in addition to statistics like the mean and variance, Monte Carlo generates the output probability distributions. This is a unique advantage compared with first-order perturbation and direct quadrature, which provide summary statistics but not distributions.
The major disadvantage of the Monte Carlo method is that even though the convergence rate does not depend on the number of inputs, the convergence rate is slow: the error decreases as $\mathcal{O}(1/\sqrt{n})$. This means that every additional digit of accuracy requires about 100 times more samples. It is also hard to know which value of $n$ to use a priori. Usually, we need to determine an appropriate value for $n$ through convergence testing (trying larger values until the statistics converge).
One approach to achieving converged statistics with fewer iterations is to use Latin hypercube sampling (LHS) or low-discrepancy sequences, as discussed in Section 10.2. Both methods allow us to approximate the input distributions with fewer samples. Low-discrepancy sequences are particularly well suited for this application because convergence testing is iterative. When combined with low-discrepancy sequences, the method is called quasi-Monte Carlo, and the error scaling improves to approximately $\mathcal{O}(1/n)$. Thus, each additional digit of accuracy requires 10 times as many samples. Even with better sampling methods, many simulations are usually required, which can be prohibitive if used as part of an OUU problem.
12.3.4 Polynomial Chaos
Polynomial chaos (also known as spectral expansions) is a class of forward-propagation methods that take advantage of the inherent smoothness of the outputs of interest using polynomial approximations.[8]
The method extends the ideas of Gaussian quadrature to estimate the output function, from which the output distribution and other summary statistics can be efficiently generated. In addition to using orthogonal polynomials to evaluate integrals, we use them to approximate the output function. As in Gaussian quadrature, the polynomials are orthogonal with respect to a specified probability distribution (see Equation 12.25 and Table 1). A general function $f(x)$ that depends on uncertain variables $x$ can be represented as a sum of basis functions $\psi_i$ (which are usually polynomials) with weights $\alpha_i$,

$$f(x) = \sum_{i=0}^{\infty} \alpha_i\, \psi_i(x).$$

In practice, we truncate the series after $n_\psi$ terms and use

$$f(x) \approx \sum_{i=0}^{n_\psi - 1} \alpha_i\, \psi_i(x).$$

The required number of terms $n_\psi$ for a given input dimension $n_x$ and polynomial order $m$ is

$$n_\psi = \frac{(n_x + m)!}{n_x!\, m!}.$$
This approach amounts to a truncated generalized Fourier series.
By definition, we choose the first basis function to be $\psi_0 = 1$. This means that the first term in the series is a constant (polynomial of order 0). Because the basis functions are orthogonal, we know that

$$\langle \psi_0, \psi_i \rangle = 0 \quad \text{for all } i \ge 1.$$
Polynomial chaos consists of three main steps:
Select an orthogonal polynomial basis.
Compute coefficients to fit the desired function.
Compute statistics on the function of interest.
These three steps are described in the following sections. We begin with the last step because it provides insight for the first two.
Compute Statistics
Using the polynomial approximation (Equation 12.36) in the definition of the mean, we obtain

$$\mu_f = \int f(x)\, p(x)\, \mathrm{d}x \approx \int \left(\sum_{i=0}^{n_\psi - 1} \alpha_i\, \psi_i(x)\right) p(x)\, \mathrm{d}x.$$

The coefficients are constants that can be taken out of the integral, so we can write

$$\mu_f \approx \sum_{i=0}^{n_\psi - 1} \alpha_i \int \psi_i(x)\, p(x)\, \mathrm{d}x.$$

We can multiply all terms by $\psi_0$ without changing anything because $\psi_0 = 1$, so we can rewrite this expression in terms of the inner product as

$$\mu_f \approx \sum_{i=0}^{n_\psi - 1} \alpha_i\, \langle \psi_0, \psi_i \rangle.$$

Because the polynomials are orthogonal, all the terms except the first are zero (see Equation 12.38). From the definition of a PDF (Equation A.63), we know that the first term, $\langle \psi_0, \psi_0 \rangle = \int p(x)\, \mathrm{d}x$, is 1. Thus, the mean of the function is simply the zeroth coefficient,

$$\mu_f \approx \alpha_0.$$
We can derive a formula for the variance using a similar approach.
Substituting the polynomial representation (Equation 12.36) into the definition of variance and using the same techniques used in deriving the mean, we obtain

$$\sigma_f^2 = \int \left(f(x) - \mu_f\right)^2 p(x)\, \mathrm{d}x \approx \sum_{i=1}^{n_\psi - 1} \alpha_i^2 \int \psi_i(x)\, \psi_i(x)\, p(x)\, \mathrm{d}x,$$

where the cross terms vanish because of orthogonality. That last step is just the definition of the weighted inner product (Equation 12.25), providing the variance in terms of the coefficients and polynomials:

$$\sigma_f^2 \approx \sum_{i=1}^{n_\psi - 1} \alpha_i^2\, \langle \psi_i, \psi_i \rangle.$$
The inner product can often be computed analytically. For example, using Hermite polynomials that are orthogonal with respect to the standard normal PDF (the probabilists’ convention, $He_i$) yields

$$\langle He_i, He_i \rangle = i!.$$
For cases without analytic solutions, Gaussian quadrature of this inner product is still straightforward and exact because it only includes polynomials.
For multiple uncertain variables, the formulas are the same, but we use multidimensional basis polynomials. Denoting these multidimensional basis polynomials as $\Psi_i$, we can write

$$\mu_f \approx \alpha_0, \qquad \sigma_f^2 \approx \sum_{i=1}^{n_\psi - 1} \alpha_i^2\, \langle \Psi_i, \Psi_i \rangle.$$
The multidimensional basis polynomials are defined by products of one-dimensional polynomials, as detailed in the next section. Polynomial chaos computes the mean and variance using these equations and our definition of the inner product. Other statistics can be estimated by sampling the polynomial expansion. Because we now have a simple polynomial representation that no longer requires evaluating the original (potentially expensive) function , we can use sampling procedures (e.g., Monte Carlo) to create output distributions without incurring high costs. Of course, we have to evaluate the function to generate the coefficients, as we will discuss later.
Selecting an Orthogonal Polynomial Basis
As discussed in Section 12.3.2, we already know appropriate orthogonal polynomials for many continuous probability distributions (see Table 1[9]). We also have methods to generate other exponentially convergent polynomial sets for any given empirical distribution.13
The multidimensional basis functions we need are defined by tensor products. For example, if we had two variables from a uniform probability distribution (and thus Legendre bases), then the polynomials up through the second-order terms would be as follows:

$$\Psi_0 = 1, \quad \Psi_1 = P_1(x_1), \quad \Psi_2 = P_1(x_2), \quad \Psi_3 = P_2(x_1), \quad \Psi_4 = P_1(x_1)\, P_1(x_2), \quad \Psi_5 = P_2(x_2).$$

A term such as $P_2(x_1)\, P_1(x_2)$ does not appear in this list because it is a third-order polynomial, and we truncated the series after the second-order terms. We should expect this number of basis functions because Equation 12.37 with $n_x = 2$ and $m = 2$ yields $n_\psi = 6$.
Determine Coefficients
Now that we have selected an orthogonal polynomial basis, $\psi_i$, we need to determine the coefficients $\alpha_i$ in Equation 12.36. We discuss two approaches for determining the coefficients. The first approach is quadrature, which is also known as spectral projection. The second is regression, which is also known as stochastic collocation.
Let us start with the quadrature approach. Beginning with the polynomial approximation

$$f(x) \approx \sum_{i=0}^{n_\psi - 1} \alpha_i\, \psi_i(x),$$

we take the inner product of both sides with respect to $\psi_j$,

$$\langle f, \psi_j \rangle \approx \sum_{i=0}^{n_\psi - 1} \alpha_i\, \langle \psi_i, \psi_j \rangle.$$

Using the orthogonality property of the basis functions (Equation 12.38), all the terms in the summation are zero except for $i = j$:

$$\langle f, \psi_j \rangle \approx \alpha_j\, \langle \psi_j, \psi_j \rangle.$$

Thus, we can find each coefficient by

$$\alpha_j = \frac{\langle f, \psi_j \rangle}{\langle \psi_j, \psi_j \rangle} = \frac{\int f(x)\, \psi_j(x)\, w(x)\, \mathrm{d}x}{\int \psi_j(x)^2\, w(x)\, \mathrm{d}x},$$
where we replaced the inner product with the definition given by Equation 12.17.
As expected, the zeroth coefficient corresponds to the definition of the mean,

$$\alpha_0 = \frac{\langle f, \psi_0 \rangle}{\langle \psi_0, \psi_0 \rangle} = \int f(x)\, p(x)\, \mathrm{d}x = \mu_f.$$
These coefficients can be obtained through multidimensional quadrature (see Section 12.3.2) or Monte Carlo simulation (Section 12.3.3), which means that this approach inherits the same limitations of the chosen quadrature approach. However, the process can be more efficient if the selected basis functions are good approximations of the distributions. These integrals are usually evaluated using Gaussian quadrature (e.g., Gauss–Hermite quadrature if $p(x)$ is a normal distribution).
Suppose all we are interested in is the mean (Equation 12.41 and Equation 12.51). In that case, the polynomial chaos approach amounts to just Gaussian quadrature. However, if we want to compute other statistical properties or produce an output PDF, the additional effort of obtaining the higher-order coefficients produces a polynomial approximation of that we can then sample to predict other quantities of interest.
It may appear that to estimate $f$ (Equation 12.36), we need to know $f$ (Equation 12.50). The distinction is that we just need to be able to evaluate $f$ at some predefined quadrature points, which in turn gives a polynomial approximation of $f$ for any $x$.
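The sketch below carries out this spectral-projection step for a single normally distributed input, using probabilists’ Hermite polynomials (for which $\langle He_k, He_k \rangle = k!$) and NumPy's matching quadrature rule; the test function $f(x) = x^2$ is chosen because its mean and variance are known analytically.

```python
import numpy as np
from math import factorial, sqrt, pi

def pce_spectral_projection(f, mu, sigma, order=4, nquad=10):
    """Chaos coefficients for one normal input, x = mu + sigma*z with z ~ N(0, 1),
    using probabilists' Hermite polynomials and quadrature for the projections."""
    z, w = np.polynomial.hermite_e.hermegauss(nquad)   # nodes/weights for weight exp(-z^2 / 2)
    fz = f(mu + sigma * z)                             # model evaluated at the quadrature nodes
    coeffs = np.zeros(order + 1)
    for k in range(order + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0
        He_k = np.polynomial.hermite_e.hermeval(z, basis)
        # alpha_k = <f, He_k> / <He_k, He_k>, with <He_k, He_k> = k! for a standard normal
        coeffs[k] = np.sum(w * fz * He_k) / (sqrt(2.0 * pi) * factorial(k))
    mean = coeffs[0]
    variance = sum(coeffs[k] ** 2 * factorial(k) for k in range(1, order + 1))
    return coeffs, mean, variance

# sanity check with f(x) = x^2: mean = mu^2 + sigma^2 = 1.09,
# variance = 4 mu^2 sigma^2 + 2 sigma^4 = 0.3762
print(pce_spectral_projection(lambda x: x**2, mu=1.0, sigma=0.3)[1:])
```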
The second approach to determining the coefficients is regression. Equation 12.36 is linear in the coefficients, so we can estimate them using least squares (although an underdetermined system with regularization can be used as well). If we evaluate the function at $m$ sample points $x^{(k)}$, where $k$ is the sample index, the resulting linear system is as follows:

$$\begin{bmatrix} \psi_0\!\left(x^{(1)}\right) & \cdots & \psi_{n_\psi - 1}\!\left(x^{(1)}\right) \\ \vdots & \ddots & \vdots \\ \psi_0\!\left(x^{(m)}\right) & \cdots & \psi_{n_\psi - 1}\!\left(x^{(m)}\right) \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \vdots \\ \alpha_{n_\psi - 1} \end{bmatrix} = \begin{bmatrix} f\!\left(x^{(1)}\right) \\ \vdots \\ f\!\left(x^{(m)}\right) \end{bmatrix}.$$

As a rule of thumb, the number of sample points should be at least twice as large as the number of unknowns, $n_\psi$. The sampling points, also known as the collocation points, typically correspond to the nodes in the corresponding quadrature strategy or utilize random sequences.[10]
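A least-squares sketch of the same one-input example follows; the sample count and the use of random collocation points are illustrative choices.

```python
import numpy as np

def pce_regression(f, mu, sigma, order=4, nsamples=50, seed=0):
    """Least-squares estimate of the chaos coefficients for one normal input,
    using many more samples than unknowns (well above the 2x rule of thumb)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(nsamples)                  # random collocation points in z-space
    # matrix of probabilists' Hermite polynomials, one column per basis function
    A = np.polynomial.hermite_e.hermevander(z, order)
    y = f(mu + sigma * z)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

# for f(x) = x^2 the recovered coefficients should match the spectral projection above
print(pce_regression(lambda x: x**2, mu=1.0, sigma=0.3))
```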
12.4 Summary
Engineering problems are subject to variation under uncertainty. OUU deals with optimization problems where the design variables or other parameters have uncertain variability. Robust design optimization seeks designs that are less sensitive to inherent variability in the objective function. Common OUU objectives include minimizing the mean or standard deviation or performing multiobjective trade-offs between the mean performance and standard deviation. Reliable design optimization seeks designs with a reduced probability of failure, considering the variability in the constraint values. To quantify robustness and reliability, we need a forward-propagation procedure that propagates the probability distributions of the inputs (either design variables or parameters that are fixed during optimization) to the statistics or probability distributions of the outputs (objective and constraint functions). Four classes of forward-propagation methods were discussed in this chapter.[11]

Perturbation methods use a Taylor series expansion of the output functions to estimate the mean and variance. These methods can be efficient for a range of problem sizes, especially if accurate derivatives are available. Their main weaknesses are that they require derivatives (and hence second derivatives when using gradient-based optimization), only work well with symmetric input probability distributions, and only provide the mean and variance (for first-order methods).
Direct quadrature uses numerical quadrature to evaluate the summary statistics. This process is straightforward and effective. Its primary weakness is that it is limited to low-dimensional problems (number of random inputs). Sparse grids enable these methods to handle a higher number of dimensions, but the scaling is still lacking.
Monte Carlo methods approximate the summary statistics and output distributions using random sampling and the law of large numbers. These methods are straightforward to use and are independent of the problem dimension. Their major weakness is that they are inefficient. However, because the alternatives are intractable for a large number of random inputs, Monte Carlo is an appropriate choice for many high-dimensional problems.
Polynomial chaos represents uncertain variables as a sum of orthogonal basis functions. This method is often a more efficient way to characterize both statistical moments and output distributions. However, the methodology is usually limited to a small number of dimensions because the number of required basis functions grows exponentially.
Problems
Although we maintain a distinction in this book, some of the literature includes both of these concepts under the umbrella of “robust optimization”.
Instead of using expected power directly, wind turbine designers use annual energy production, which is the expected power multiplied by utilization time.
Even characterizing input uncertainty might not be straightforward, but for forward propagation, we assume this information is provided.
This is the same issue as with the full factorial sampling used to construct surrogate models in Section 10.2.
Other polynomials can be used, but these polynomials are optimal because they yield exponential convergence.
This list is not exhaustive. For example, the methods discussed in this chapter are nonintrusive. Intrusive polynomial chaos uses expansions inside governing equations. Like intrusive methods for derivative computation (Chapter 6), intrusive methods for forward propagation require more implementation effort but are more accurate and efficient.
- Martins, J. R. R. A. (2020). Perspectives on aerodynamic design optimization. In Proceedings of the AIAA SciTech Forum. American Institute of Aeronautics. 10.2514/6.2020-0043
- Stanley, A. P. J., & Ning, A. (2019). Coupled wind turbine design and layout optimization with non-homogeneous wind turbines. Wind Energy Science, 4(1), 99–114. 10.5194/wes-4-99-2019
- Gagakuma, B., Stanley, A. P. J., & Ning, A. (2021). Reducing wind farm power variance from wind direction using wind farm layout optimization. Wind Engineering. 10.1177/0309524X20988288
- Padrón, A. S., Thomas, J., Stanley, A. P. J., Alonso, J. J., & Ning, A. (2019). Polynomial chaos to efficiently compute the annual energy production in wind farm layout optimization. Wind Energy Science, 4, 211–231. 10.5194/wes-4-211-2019
- Cacuci, D. (2003). Sensitivity & Uncertainty Analysis (Vol. 1). Chapman. 10.1201/9780203498798
- Parkinson, A., Sorensen, C., & Pourhassan, N. (1993). A general approach for robust optimal design. Journal of Mechanical Design, 115(1), 74. 10.1115/1.2919328
- Golub, G. H., & Welsch, J. H. (1969). Calculation of Gauss quadrature rules. Mathematics of Computation, 23(106), 221–230. 10.1090/S0025-5718-69-99647-1
- Wilhelmsen, D. R. (1978). Optimal quadrature for periodic analytic functions. SIAM Journal on Numerical Analysis, 15(2), 291–296. 10.1137/0715020
- Trefethen, L. N., & Weideman, J. A. C. (2014). The exponentially convergent trapezoidal rule. SIAM Review, 56(3), 385–458. 10.1137/130932132
- Johnson, S. G. (2010). Notes on the convergence of trapezoidal-rule quadrature. http://math.mit.edu/~stevenj/trapezoidal.pdf
- Smolyak, S. A. (1963). Quadrature and interpolation formulas for tensor products of certain classes of functions. In Proceedings of the USSR Academy of Sciences (Vol. 148, pp. 1042–1045). 10.3103/S1066369X10030084
- Wiener, N. (1938). The homogeneous chaos. American Journal of Mathematics, 60(4), 897. 10.2307/2371268
- Eldred, M., Webster, C., & Constantine, P. (2008). Evaluation of non-intrusive approaches for Wiener–Askey generalized polynomial chaos. In Proceedings of the 49th AIAA Structures, Structural Dynamics, and Materials Conference. American Institute of Aeronautics. 10.2514/6.2008-1892
- Adams, B. M., Bohnhoff, W. J., Dalbey, K. R., Ebeida, M. S., Eddy, J. P., Eldred, M. S., Hooper, R. W., Hough, P. D., Hu, K. T., Jakeman, J. D., Khalil, M., Maupin, K. A., Monschke, J. A., Ridgway, E. M., Rushdi, A. A., Seidl, D. T., Stephens, J. A., Swiler, L. P., & Winokur, J. G. (2021). Dakota, a multilevel parallel object-oriented framework for design optimization, parameter estimation, uncertainty quantification, and sensitivity analysis: Version 6.14 user’s manual [Sandia Technical Report]. Sandia National Laboratories. https://dakota.sandia.gov/content/manuals
- Feinberg, J., & Langtangen, H. P. (2015). Chaospy: An open source tool for designing methods of uncertainty quantification. Journal of Computational Science, 11, 46–57. 10.1016/j.jocs.2015.08.008