# Background

## Multidisciplinary design optimization

The LSDO lab's core research falls within the field of multidisciplinary design optimization (MDO). MDO deals with the use of numerical optimization algorithms to solve engineering design problems. These problems often feature strong interactions between multiple engineering disciplines, a property that becomes an important consideration in the development of MDO methods and algorithms. Satellite design is one example of a problem with the strong interdisciplinary interactions that motivate the field [Hwang et al., JSR, 2014].

## Optimization

When performing design optimization, the starting point is a standalone computational model that predicts the performance or behavior of the engineering vehicle or system (model outputs) as a function of parameters that quantify the design and the operating conditions (model inputs). Based on this computational model, we formulate an optimization problem, where:

• the objective function is a quantity we wish to maximize or minimize (one of the model outputs),

• the design variables are the design parameters whose optimal values we wish to find (a subset of the model inputs), and

• the constraint functions are quantities on which we wish to place equality or inequality conditions (a subset of the model outputs).

Typically, optimization is performed using this standalone computational model and a standalone optimization algorithm, often called the optimizer. The optimizer repeatedly calls the model at different values of the design variables, and the model reports back the objective and constraint function values. The optimizer iterates until it finds the design variable values that maximize or minimize the objective while satisfying the constraints.
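As a minimal sketch of this paradigm (a hypothetical quadratic model and a plain gradient-descent optimizer, not code from the lab), the optimizer below repeatedly calls the standalone model until convergence:

```python
def model(x):
    """Standalone computational model: design variables -> objective output."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

def model_gradient(x):
    """Gradient of the objective with respect to the design variables."""
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]

def optimizer(model, grad, x0, step=0.1, tol=1e-8, max_iter=1000):
    """A simple gradient-descent optimizer that iteratively calls the model."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) < tol:  # stop when the gradient is small
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, model(x)

x_opt, f_opt = optimizer(model, model_gradient, [0.0, 0.0])
print(x_opt, f_opt)  # converges toward x = (3, -1), f = 0
```

The separation mirrors the description above: the model only maps inputs to outputs, and the optimizer decides where to evaluate it next.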

## Large-scale optimization

In the LSDO lab, we focus mostly on piecewise-differentiable, nonlinear optimization problems. We are primarily motivated by large-scale optimization problems, which we define for our purposes as those of high dimensionality, i.e., problems with hundreds, thousands, or more design variables.

As in many other settings in mathematics, the curse of dimensionality is a fundamental concern in optimization. To see this, consider a brute-force optimization approach, where we sample each design variable at, say, k uniformly spaced values between its lower and upper bounds, and then pick the design with the best objective value. If we have n design variables, this requires evaluating the model k^n times; for just k = 3 and n = 100, that is 3^100, or roughly 5 × 10^47 model evaluations. Large-scale optimization is only feasible with methods that scale far better than this with the number of design variables.
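The exponential count above can be checked directly (a trivial sketch; `brute_force_evals` is a hypothetical helper name):

```python
def brute_force_evals(k, n):
    """Number of model evaluations for brute-force sampling:
    k samples per design variable, n design variables."""
    return k ** n

print(brute_force_evals(3, 2))    # 9 evaluations for a tiny 2-variable problem
print(brute_force_evals(3, 100))  # ~5e47: hopeless for large-scale problems
```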

There are two types of optimization algorithms: gradient-free and gradient-based. As the names imply, they are distinguished by whether they use derivatives of the model. Derivatives provide valuable information that significantly accelerates and improves the accuracy of convergence to the optimal solution. Therefore, gradient-based optimizers scale much better to problems of high dimensionality than gradient-free optimizers, typically requiring orders of magnitude fewer model evaluations and, hence, orders of magnitude less total optimization time. However, gradient-based optimizers are more susceptible to local optima, impose additional differentiability requirements on the model, and, most significantly, require significant additional effort to compute model derivatives accurately and efficiently. Plots of optimization time versus dimensionality demonstrate the effectiveness of gradient-based optimization with the adjoint method [Hwang, PhD Thesis, 2015].
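To make the scaling contrast concrete, here is a small, hypothetical experiment (mine, not from the source): minimizing f(x) = Σ xᵢ² in 20 dimensions with gradient descent versus a naive random search, counting how far each gets.

```python
import random

n = 20
f = lambda x: sum(xi * xi for xi in x)  # simple quadratic objective

# (a) Gradient-based: gradient descent with the analytic gradient (grad f = 2x).
x = [1.0] * n
gd_iters = 0
while f(x) > 1e-6:
    x = [xi - 0.1 * (2.0 * xi) for xi in x]
    gd_iters += 1

# (b) Gradient-free: keep random perturbations of the incumbent if they improve f.
random.seed(0)
x = [1.0] * n
budget = 20000
rs_evals = 0
for _ in range(budget):
    cand = [xi + random.uniform(-0.1, 0.1) for xi in x]
    rs_evals += 1
    if f(cand) < f(x):
        x = cand

print(gd_iters, f(x))  # a few dozen gradient steps vs. thousands of stalled samples
```

Gradient descent converges in a few dozen iterations, while the random search exhausts its 20,000-evaluation budget without reaching the same tolerance, illustrating the orders-of-magnitude gap described above.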

## Efficient derivative computation

Because of their importance, there are decades of research on methods for computing derivatives accurately and efficiently. Derivatives are used not just for optimization but also for applications such as error estimation, mesh adaptation, surrogate modeling, and uncertainty quantification. The simplest approach is the family of finite-difference (FD) methods, where each derivative is approximated by evaluating the model at additional, perturbed points. In the context of large-scale optimization, finite-difference methods are inefficient because one or more model evaluations are required for each design variable. They are also inaccurate because of truncation error and rounding error: decreasing the perturbation step size reduces the former but amplifies the latter, so the two types of error cannot be eliminated simultaneously.

If we permit access to and modification of the model source code, we can consider two other methods that overcome at least the accuracy issue. The complex-step method uses a complex-number rather than a real-number perturbation, which avoids the rounding error and enables a step size small enough that the truncation error is negligible. Algorithmic differentiation (AD) uses software that takes the model's original source code and automatically generates new code that computes the derivatives by applying the chain rule to every elementary operation. AD produces derivatives that are exact to numerical precision, and it can compute them at a cost independent of the number of design variables. However, its computation time is higher than it could be for models that contain iterative solvers.
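A small numerical illustration (my own example, not from the source) of the FD error trade-off and the complex-step remedy, differentiating f(x) = sin x at x = 1, where the exact derivative is cos 1:

```python
import cmath
import math

x = 1.0
exact = math.cos(x)  # exact derivative of sin at x = 1

def fd(h):
    """Forward finite difference: truncation error O(h), rounding error O(eps/h)."""
    return (math.sin(x + h) - math.sin(x)) / h

def complex_step(h):
    """Complex-step derivative: no subtractive cancellation, so a tiny h is safe."""
    return cmath.sin(x + 1j * h).imag / h

for h in (1e-2, 1e-8, 1e-12):
    print(f"h={h:g}  FD error={abs(fd(h) - exact):.1e}  "
          f"CS error={abs(complex_step(h) - exact):.1e}")
```

The FD error first shrinks with h (truncation dominates) and then grows again (rounding dominates), while the complex-step error remains near machine precision even at h = 1e-12.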

The solution is the adjoint method. As with AD, the adjoint method can compute derivatives accurate to numerical precision, and the evaluation time is independent of the number of design variables. With the adjoint method, however, the full gradient can be computed in less than the model evaluation time. The adjoint method represents the state of the art in large-scale optimization.
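A standard statement of the adjoint method, in generic notation (this sketch is mine, not taken from the source): for a model with residual equations R(x, y) = 0 that implicitly define the states y as a function of the design variables x, and an objective f(x, y), the total derivative is

```latex
\frac{\mathrm{d}f}{\mathrm{d}x}
  = \frac{\partial f}{\partial x} - \psi^{T}\,\frac{\partial R}{\partial x},
\qquad \text{where the adjoint vector } \psi \text{ solves} \quad
\left(\frac{\partial R}{\partial y}\right)^{T}\psi
  = \left(\frac{\partial f}{\partial y}\right)^{T}.
```

The cost is dominated by one linear adjoint solve per objective or constraint function, independent of the number of design variables, which is why the method scales so well.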

## A new architecture for large-scale MDO

The adjoint method is a very powerful method that is the key to enabling large-scale optimization. However, it has two drawbacks that limit more widespread use. First, it is not applicable to all models; the adjoint method can only be used in models that have coupling, either through at least one discipline containing a system of equations or through one or more feedback loops between disciplines. If no coupling is present, the chain rule is the appropriate method for computing the derivatives analytically, assembling the partial derivatives from each discipline. Second, the adjoint method can be difficult and time-consuming to implement in multidisciplinary settings. When a change is made to one discipline in a multidisciplinary model, the software implementation of the other disciplines is often affected, requiring additional changes.

Prof. Hwang derived an equation that mathematically unifies the adjoint method with the chain rule, the equations for AD, and many other advanced derivative computation methods. He developed this 'unified derivative equation' (UDE) from the inverse function theorem, and the adjoint method, the chain rule, etc. can all be derived from this one unifying matrix equation. He then used the UDE as the basis for a computational framework, a software environment for constructing multidisciplinary models and performing MDO. This architecture is called MAUD (modular analysis and unified derivatives). Through a collaboration with satellite experts, he applied it to a satellite design problem. Based on these results, an early version of MAUD was adopted into NASA's OpenMDAO software framework.
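In the notation of the cited work (sketched here; r and u concatenate all model inputs, states, and outputs, and R is the residual function relating them), the unified derivative equation reads

```latex
\frac{\partial R}{\partial u}\,\frac{\mathrm{d}u}{\mathrm{d}r}
  = \mathcal{I}
  = \left(\frac{\partial R}{\partial u}\right)^{T}
    \left(\frac{\mathrm{d}u}{\mathrm{d}r}\right)^{T}
```

Solving the left equality recovers the direct (forward) methods, such as the chain rule, while solving the transposed right equality recovers the adjoint (reverse) methods.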

After the initial integration of MAUD into OpenMDAO, Prof. Hwang made several improvements to the MAUD architecture targeted at high-fidelity modeling and high-performance computing. He developed a parallel hierarchical solution architecture with matrix-free linear algebra so that high-fidelity models (such as CFD or FEA) developed within this paradigm incur at most a negligible computational overhead from the software framework. This architecture enables parallelism both across and within variables: parallelism across variables allocates different variables and disciplines to different processors, while parallelism within variables splits a large array (a single variable) across multiple processors so that the memory footprint is distributed. Parallelism across variables is the more commonly used of the two and can be automated using MAUD. For example, we can write a CFD solver in the framework and run it in parallel without writing a single line of parallel code: each processor is assigned a CFD simulation and computes the flow solution at its own unique set of operating conditions (e.g., flow Mach number).

Through OpenMDAO, MAUD has been used for many MDO applications in academia and industry. The list of applications of MAUD now includes: satellite design, wind turbine design, aircraft design considering operations, aircraft wing design, system design of an electric aircraft, engine cycle design, Hyperloop design, trajectory optimization, and structural topology optimization. MAUD provides one of the major pillars for the ongoing and future work in the LSDO lab. The unified derivative equation itself, along with the derivation of the direct (left equality) and adjoint (right equality) methods from it, is presented in [Hwang and Martins, ACM TOM, 2018].