Multi-condition multi-objective optimization using deep reinforcement learning
Abstract
A multi-condition multi-objective optimization method that can find the Pareto front over a defined condition space is developed for the first time using deep reinforcement learning. Unlike conventional methods, which perform optimization at a single condition, the present method learns the correlations between conditions and optimal solutions. The exclusive capability of the developed method is examined through the solutions of a novel modified Kursawe benchmark problem and an airfoil shape optimization problem, both of which include nonlinear characteristics that are difficult to resolve using conventional optimization methods. A Pareto front with high resolution over a defined condition space is successfully determined in each problem. Compared with multiple runs of a single-condition optimization method over multiple conditions, the present multi-condition optimization method based on deep reinforcement learning greatly accelerates the search for the Pareto front by reducing the number of required function evaluations. An analysis of the aerodynamic performance of airfoils with optimally designed shapes confirms that multi-condition optimization is indispensable for avoiding significant degradation of target performance under varying flow conditions.
keywords:
Multi-condition multi-objective optimization, Deep reinforcement learning, Shape optimization, Fluid dynamics, Airfoil
1 Introduction
Optimization is central to all decision-making problems, including those in economics, business administration, and engineering Chong and Zak (2004). In particular, in applied mechanics fields such as structural mechanics, electromagnetism, and biomechanics, designing an optimal shape that maximizes target performance, so-called shape optimization, has been actively studied to this day Semmler et al. (2015); Chu et al. (2021); Taylor and Dirks (2012); Park et al. (2018). Likewise, in fluid mechanics, studies on shape optimization for numerous real applications have been conducted Mohammadi and Pironneau (2004). For example, studies to improve aerodynamic or hydrodynamic characteristics through shape optimization of airplanes, ships, and automobiles have been continuing Droandi and Gibertini (2015); Peri et al. (2001); Percival et al. (2001); Yun et al. (2008). In addition, there have been many efforts in designing wind turbine blades that maximize power efficiency Xudong et al. (2009) and marine propellers that reduce underwater radiated noise Bertetta et al. (2012). Since fluids have nonlinear and high-dimensional characteristics, the performance of fluid machines can vary greatly depending on their shapes. Therefore, shape optimization is essential for the efficient operation of fluid machines.
In practical applications, generally, the operating conditions and target performance of fluid machines vary depending on the situation. For example, in the case of wind turbines, varying wind conditions alter the aerodynamic and structural performance of the blades Lachenal et al. (2013). Also, in the case of aircraft, the aerodynamic requirements of the wing vary according to flight situations such as cruise, departure, and landing Secanell et al. (2006). Therefore, in order to maintain the optimal state during operation, it is necessary to know the optimal shape under changing conditions and objectives and to modify its shape accordingly. In many application fields, numerous efforts have been made to maximize the target performance by changing the shape according to the situation Vasista et al. (2019). For aircraft, many studies have been conducted to improve the performance by modifying the shape using smart adaptive devices and morphing materials Diaconu et al. (2008); Barbarino et al. (2011); Ajaj et al. (2016). In addition, morphing hydrofoils and morphing composite propellers have been actively studied in marine applications Garg et al. (2015); Sacher et al. (2018); Chen et al. (2017).
Despite these efforts, ironically, related studies from the perspective of optimization methodology are insufficient. In order to remain optimal, one must first know the optimal solution under changing conditions and objectives. Optimization for various objectives is possible through conventional multi-objective (MO) optimization methods Srinivas and Deb (1994); Deb et al. (2002); Coello Coello and Lechuga (2002); Miettinen and Mäkelä (2002). However, to the best of our knowledge, there is no optimization method that can find the optimal solution considering various conditions. Therefore, in addition to MO optimization, an optimization method that can handle both various conditions and various objectives is needed.
An MO optimization method optimizes multiple objectives, which generally conflict with each other, at a single condition. The goal of MO optimization is to find the Pareto front, which is a set of optimal trade-off solutions among the objectives. However, it can only be applied to a single prescribed condition. To find optimal solutions within a condition range, it is necessary to prescribe some conditions as representatives in advance and to perform optimization at each of them separately. For example, Secanell et al. Secanell et al. (2006) performed optimization at seven prescribed flight conditions to design a morphing airfoil. In addition, Wang et al. Wang et al. (2020) conducted optimization considering three representative conditions to design a centrifugal pump. However, if optimization is performed only at some prescribed conditions, the obtained solutions are valid only at those predetermined conditions. Moreover, because optimization has to be repeated from scratch for each condition, it is very inefficient to perform optimization at sufficiently many conditions. Thus, in order to overcome these limitations, a multi-condition multi-objective (MCMO) optimization method that can efficiently find a set of optimal solutions (the Pareto front) over a condition range is needed.
Recently, with the advancement of artificial intelligence, studies combining it with optimization are being actively conducted Yan et al. (2019); Li et al. (2020). In particular, deep reinforcement learning (DRL) is emerging as a new trend in the field of shape optimization Rabault et al. (2020). Viquerat et al. Viquerat et al. (2021b) showed the capability of DRL in shape optimization by successfully performing airfoil shape optimization at a single flow condition. Thereafter, DRL has been adopted for many shape optimization problems. Qin et al. Qin et al. (2021) conducted MO optimization of a cascade blade at a target flow condition using DRL. Also, Li et al. Li et al. (2021) conducted airfoil shape optimization to reduce drag using DRL, and they showed that the learned network can extract improved shapes, compared to the original shape, at unlearned conditions. As these studies indicate, DRL has ample potential to be a key to the development of an MCMO optimization method. The basic concept of DRL is to find an optimal action for a given state. Thus, if the condition and objective of optimization are set as the state, DRL can take on the role of an MCMO optimizer. Moreover, in contrast to conventional methods, where each condition must be treated independently, it is expected to be more efficient by learning the correlations between conditions and optimal solutions.
In the present study, a DRL-based MCMO optimization method that can efficiently find the Pareto front over a condition space is developed. Two MCMO optimization problems are then solved by the developed method. The first is a benchmark problem to validate the method. As the benchmark problem, the Kursawe test function Kursawe (1991), a representative MO optimization problem, is newly extended to be suitable for MCMO optimization. Next, the method is applied to airfoil shape optimization to show its applicability to practical engineering applications. Airfoil shape optimization is a representative shape optimization problem involving fluid dynamics, where nonlinearity and high-dimensionality are combined. Despite these difficulties, it has been actively studied due to its direct applicability to numerous engineering fields Zhang et al. (2021); Wang et al. (2019); Gillebaart and De Breuker (2016). Finally, further analysis is conducted in each problem to identify the exclusive capability of the proposed method.
2 Background
2.1 Multiobjective optimization
2.1.1 Problem description
An MO optimization problem with $m$ objective functions can be defined as follows:
(1) $\min_{\mathbf{x} \in X} \mathbf{F}(\mathbf{x}) = (f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x}))$
where $\mathbf{x}$ is an element of the decision space $X$, $\mathbf{F}: X \rightarrow \mathbb{R}^m$ consists of $m$ real-valued objective functions, and $\mathbb{R}^m$ is the objective space.
In this problem, generally, no single solution can optimize all objectives simultaneously, as they conflict with each other. Instead, a set of optimal trade-off solutions among the different objectives exists according to the concepts of Pareto dominance and Pareto optimality, which are defined as follows:
Pareto dominance: $\mathbf{x}^1$ is said to Pareto dominate $\mathbf{x}^2$, denoted by $\mathbf{x}^1 \prec \mathbf{x}^2$, if and only if $f_i(\mathbf{x}^1) \le f_i(\mathbf{x}^2)$ for all $i \in \{1, \ldots, m\}$, and $f_j(\mathbf{x}^1) < f_j(\mathbf{x}^2)$ for at least one index $j$.
Pareto optimality: A solution $\mathbf{x}^* \in X$ is said to be Pareto optimal if and only if there is no $\mathbf{x} \in X$ such that $\mathbf{x} \prec \mathbf{x}^*$.
The goal of an MO optimization problem is to find the Pareto optimal set and the corresponding Pareto front, which is defined as the image of the Pareto optimal set in the objective space.
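The two definitions above translate directly into a dominance test and a non-dominated filter. The following is an illustrative Python sketch (the function names are ours, not the paper's), assuming minimization of all objectives:

```python
import numpy as np

def dominates(f_a, f_b):
    # f_a Pareto-dominates f_b (minimization): no worse in every
    # objective and strictly better in at least one.
    f_a, f_b = np.asarray(f_a), np.asarray(f_b)
    return bool(np.all(f_a <= f_b) and np.any(f_a < f_b))

def pareto_front(points):
    # Keep only the objective vectors not dominated by any other point.
    points = np.asarray(points)
    return np.array([p for i, p in enumerate(points)
                     if not any(dominates(q, p)
                                for j, q in enumerate(points) if j != i)])
```

For example, among the points (1, 4), (2, 3), (3, 3), and (4, 1), only (3, 3) is dominated, so the remaining three form the Pareto front.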
2.1.2 Weighted Chebyshev method
The weighted Chebyshev method is one of the decomposition-based methods for solving MO optimization problems. It scalarizes an MO optimization problem into multiple single-objective (SO) optimization problems by introducing a weight vector $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)$ and the Chebyshev scalarizing function $g$. $\boldsymbol{\lambda}$ determines the weight between objectives, and $g$ is the scalarized objective of each SO optimization problem. The original MO optimization problem can then be solved by performing a number of scalarized SO optimizations with different $\boldsymbol{\lambda}$. The scalarized SO optimization problem can be written as follows:
(2) $\min_{\mathbf{x} \in X} g(\mathbf{x} \mid \boldsymbol{\lambda}, \mathbf{z}^*) = \min_{\mathbf{x} \in X} \max_{1 \le i \le m} \lambda_i \left| f_i(\mathbf{x}) - z_i^* \right|$
Here, $\mathbf{z}^* = (z_1^*, \ldots, z_m^*)$ is a utopia point whose components are defined as $z_i^* = \min_{\mathbf{x} \in X} f_i(\mathbf{x}) - \epsilon_i$, where $\epsilon_i$ is a relatively small positive value.
Unlike other decomposition-based methods, the weighted Chebyshev method guarantees that all Pareto optimal solutions can be obtained for both convex and nonconvex problems Miettinen (2012). Because of this advantage, it has been widely used in the literature Zhang and Li (2007); Tan et al. (2013b) and has also been successfully used with DRL Van Moffaert et al. (Conference Proceedings). One of the difficulties in adopting the weighted Chebyshev method is that the utopia point $\mathbf{z}^*$ has to be known before optimization, which requires an SO optimization for each objective in advance. In the present study, this difficulty is overcome by integrating these overall processes into a single process, which will be further discussed in Section 4.1.1.
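As an illustration of Eq. (2), the scalarizing function can be evaluated as below (a minimal sketch under our own naming; `lam` is the weight vector and `z_star` the utopia point):

```python
import numpy as np

def chebyshev(f, lam, z_star):
    # Weighted Chebyshev scalarization g(x | lambda, z*): the largest
    # weighted deviation of the objective vector from the utopia point.
    f, lam, z_star = map(np.asarray, (f, lam, z_star))
    return float(np.max(lam * np.abs(f - z_star)))
```

Sweeping `lam` over the simplex and minimizing `chebyshev` for each weight traces out the Pareto front, including its nonconvex parts.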
2.2 Deep reinforcement learning based optimization
2.2.1 Deep reinforcement learning
Reinforcement learning is a process of learning a policy to determine an optimal action for a given state Sutton and Barto (2018). At each discrete step $n$, an action $a_n$ is determined according to the current policy $\pi$, $a_n = \pi(s_n)$. Then, through the execution of the action, a reward $r_n$ evaluating the decision and the next state $s_{n+1}$ are given. As the steps progress, data are accumulated and learning proceeds. The goal of learning is to find the optimal policy that maximizes the value function Bellman (1966), which is defined as the expected sum of the immediate reward $r_n$ and discounted future rewards as follows:
(3) $Q(s_n, a_n) = \mathbb{E}\left[ \sum_{k=0}^{\infty} \gamma^k r_{n+k} \right] = \mathbb{E}\left[ r_n + \gamma Q(s_{n+1}, a_{n+1}) \right]$
where $\gamma \in [0, 1)$ is a discount factor that determines the weight between short-term and long-term future rewards. This process is repeated until the terminal state is reached, and one such sequence is called an episode.
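The expectation in Eq. (3) is taken over trajectories; for a single realized reward sequence, the discounted sum can be accumulated backwards, as the following toy sketch (our own illustration) shows:

```python
def discounted_return(rewards, gamma):
    # Computes sum_k gamma^k * r_{n+k} by backward accumulation,
    # mirroring the recursive form of the value function.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For rewards (1, 1, 1) and gamma = 0.5 this gives 1 + 0.5 + 0.25 = 1.75.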
In particular, if deep learning is adopted for the learning, it is called DRL. For example, deep neural networks can be used as the policy itself or for predicting the value function. By incorporating deep neural networks, DRL is known to be able to handle complex and high-dimensional problems Mnih et al. (2015, 2013). Notably, DRL has shown outstanding ability in optimal control and optimization Rabault et al. (2020); Buşoniu et al. (2018); Garnier et al. (2021).
2.2.2 Single-step deep reinforcement learning based optimization
Single-step DRL based optimization was recently introduced by Viquerat et al. Viquerat et al. (2021b), where one learning episode consists of a single step: if an action is determined for a given state, a reward is given accordingly, and the episode ends without a next state. Since the future rewards in Eq. (3) do not exist, the discount factor $\gamma$ does not have to be defined, and learning proceeds to maximize only the immediate reward. As a result, the optimal action that maximizes the reward itself can be directly determined. Therefore, if the reward is set as the objective function to be optimized, the optimal solution that maximizes the objective function can be directly obtained. By virtue of this characteristic, single-step DRL is known to be suitable as an optimization method Viquerat et al. (2021a).
3 Problem description of multi-condition multi-objective optimization
An MCMO optimization problem is extended from an MO optimization problem to include not only the decision variable $\mathbf{x}$ but also the condition variable $\mathbf{c}$. The problem with $m$ objective functions is defined as follows:
(4) $\min_{\mathbf{x} \in X} \mathbf{F}(\mathbf{x}, \mathbf{c}) = (f_1(\mathbf{x}, \mathbf{c}), \ldots, f_m(\mathbf{x}, \mathbf{c}))$ for all $\mathbf{c} \in C$
where $\mathbf{x}$ is an element of the decision space $X$, $\mathbf{c}$ is an element of the condition space $C$, $\mathbf{F}: X \times C \rightarrow \mathbb{R}^m$ consists of $m$ real-valued objective functions, and $\mathbb{R}^m$ is the objective space.
Likewise, the concepts of Pareto dominance and Pareto optimality are extended to cover the condition variable $\mathbf{c}$ and are defined as follows:
Pareto dominance: $\mathbf{x}^1$ is said to Pareto dominate $\mathbf{x}^2$ at the condition variable $\mathbf{c}$, denoted by $\mathbf{x}^1 \prec_{\mathbf{c}} \mathbf{x}^2$, if and only if $f_i(\mathbf{x}^1, \mathbf{c}) \le f_i(\mathbf{x}^2, \mathbf{c})$ for all $i \in \{1, \ldots, m\}$, and $f_j(\mathbf{x}^1, \mathbf{c}) < f_j(\mathbf{x}^2, \mathbf{c})$ for at least one index $j$.
Pareto optimality: A solution $\mathbf{x}^* \in X$ is said to be Pareto optimal at the condition variable $\mathbf{c}$ if and only if there is no $\mathbf{x} \in X$ such that $\mathbf{x} \prec_{\mathbf{c}} \mathbf{x}^*$.
As in an MO optimization problem, solving an MCMO optimization problem is to find the Pareto optimal set and the corresponding Pareto front, which is defined as the image of the Pareto optimal set in the objective space, at every condition. If $\mathbf{c}$ is fixed, the MCMO optimization problem reduces to an MO optimization problem.
4 Method
4.1 Deep reinforcement learning algorithm for multi-condition multi-objective optimization
4.1.1 State, action, and reward
In MCMO optimization, the optimal solution varies depending on the condition and the objective. Therefore, the state of DRL is set to include the condition and objective, and it is defined as follows:
(5) $s = (\mathbf{c}, \boldsymbol{\lambda}, \mathbf{z}^*(\mathbf{c}))$
where $\mathbf{c}$ is a condition variable, $\boldsymbol{\lambda}$ is a weight vector, and $\mathbf{z}^*(\mathbf{c})$ is a utopia point at that condition. In the present study, $\mathbf{z}^*(\mathbf{c})$ is adaptively updated during optimization to a value slightly lower than the minimum observed value of each objective function. Since the Chebyshev scalarizing function $g$ differs depending on $\mathbf{z}^*$ as in Eq. (2), the changing utopia information is included in the state for stable learning.
The action of DRL determines $\mathbf{x}$, an element of the decision space, according to its policy, which is defined as follows:
(6) $a = \mathbf{x} = \pi(s)$
In addition, all variables in the state and action are normalized so that their absolute magnitudes are of order one for scaling.
Lastly, the reward of DRL is a quantitative evaluation of an action, which is defined as follows:
(7) $r = -g(\mathbf{x} \mid \boldsymbol{\lambda}, \mathbf{z}^*(\mathbf{c})) = -\max_{1 \le i \le m} \lambda_i \left| f_i(\mathbf{x}, \mathbf{c}) - z_i^*(\mathbf{c}) \right|$
where $f_i(\mathbf{x}, \mathbf{c})$ is the value of the $i$-th objective function obtained by executing the action. Note that the minus sign is added because the aim of optimization is to find $\mathbf{x}$ minimizing the Chebyshev scalarizing function.
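Putting Eqs. (5) and (7) together, a state vector and a reward can be assembled as in the sketch below (our own function names; the flattened, concatenated layout of the state is an assumption):

```python
import numpy as np

def make_state(c, lam, z_star):
    # State of Eq. (5): condition variable, weight vector, and the
    # current utopia point, flattened into one vector for the networks.
    return np.concatenate([np.atleast_1d(c), np.asarray(lam), np.asarray(z_star)])

def reward(f_values, lam, z_star):
    # Reward of Eq. (7): the negated Chebyshev scalarized objective, so
    # that maximizing the reward minimizes the scalarized objective.
    f, lam, z_star = map(np.asarray, (f_values, lam, z_star))
    return -float(np.max(lam * np.abs(f - z_star)))
```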
4.1.2 Data reproduction method
In the present study, a data reproduction method is applied to enlarge the number and diversity of data by exploiting the nature of the Chebyshev scalarizing function $g$. As in Eq. (7), the reward of DRL is a function of the objective values, $\boldsymbol{\lambda}$, and $\mathbf{z}^*$. Since the objective values are independent of $\boldsymbol{\lambda}$, different rewards can be determined for arbitrary $\boldsymbol{\lambda}$ once the objective functions are evaluated. Therefore, it is possible to reproduce an original data tuple $(s, a, r)$ by changing $\boldsymbol{\lambda}$ with only a single function evaluation. This method is expected to accelerate learning and is, thus, essential for optimization problems in which the function evaluation is costly. In the present study, at each episode, 100 data are reproduced from a single original datum by sampling $\boldsymbol{\lambda}$ from a uniform distribution.
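The reproduction step can be sketched as follows: from one evaluated objective vector, many training tuples are generated by resampling the weight vector only (an illustrative sketch; normalizing the sampled weights onto the simplex is our assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def reproduce(c, x, f_values, z_star, n_copies=100):
    # One function evaluation (c, x) -> f_values yields n_copies training
    # tuples (state, action, reward), one per resampled weight vector.
    f = np.asarray(f_values, dtype=float)
    z = np.asarray(z_star, dtype=float)
    data = []
    for _ in range(n_copies):
        lam = rng.uniform(0.0, 1.0, len(f))
        lam /= lam.sum()                      # keep the weights on the simplex
        r = -float(np.max(lam * np.abs(f - z)))
        state = np.concatenate([np.atleast_1d(c), lam, z])
        data.append((state, np.asarray(x, dtype=float), r))
    return data
```

A single costly evaluation thus populates the replay data with 100 distinct state-reward pairs.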
4.1.3 Learning procedure
The learning procedure of the present study is summarized in Algorithm 1. For the DRL algorithm, the actor-critic algorithm Konda and Tsitsiklis (2000), one of the representative DRL algorithms, is used. In the algorithm, two types of neural networks are introduced. One is the actor network, the policy itself, which determines an action in a continuous space. The other is the critic network, which predicts the value function depending on the state and action. As learning progresses, the critic network predicts the value function more and more accurately and, based on this, the probability that the actor network selects the optimal action increases.
Both networks are set as fully connected networks with four hidden layers, and the Leaky ReLU activation function Maas et al. (2013) is used for the hidden layers in both networks. At the output layer of the actor network, the Tanh activation function is added so that the action values range from −1 to 1. The learning rates of the two networks are set equal, and the Adam optimizer Kingma and Ba (2017) is used for updating the network parameters. In particular, the actor network is updated every two learning iterations for stable learning Fujimoto et al. (2018). The minibatch size is fixed, and the learning amount per one episode is set to 100, which is the same as the number of reproduced data per one original datum. The standard deviation of the exploration noise is set to a fixed value in the initial warm-up episodes and follows a cosine function of the episode afterward. The use of the cosine function enables both exploration for avoiding local minima and exploitation for accurately finding optimal solutions.
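The actor network described above can be sketched as a plain forward pass (our own illustration; the layer widths are placeholders, since the paper's values are not reproduced here):

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # Leaky ReLU: identity for positive inputs, small slope otherwise.
    return np.where(x > 0.0, x, slope * x)

def actor_forward(state, weights, biases):
    # Four Leaky-ReLU hidden layers followed by a tanh output layer,
    # so every action component lies in [-1, 1].
    h = np.asarray(state, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = leaky_relu(W @ h + b)
    return np.tanh(weights[-1] @ h + biases[-1])
```

In practice the weights are trained through the critic's value estimates; here they would simply be initialized randomly for the five weight matrices implied by four hidden layers plus the output layer.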
4.2 Selection of Pareto front
Pareto dominance in an MCMO optimization problem is defined at each condition variable $\mathbf{c}$. However, since the obtained data are scattered over $C$, no two data share exactly the same $\mathbf{c}$ at which dominance can be judged. Therefore, a decomposition of the condition space is introduced to derive approximate solutions of an MCMO optimization problem. $C$ is decomposed into $N_d$ subspaces as follows:
(8) $C = \bigcup_{k=1}^{N_d} C_k$
In each $C_k$, $\mathbf{c}$ is assumed to be the same, and a Pareto front is selected from the data in that subspace.
Note that the decomposition has no effect on the optimization process and can be modified during or after the optimization process. Therefore, $N_d$ can be freely adjusted according to the desired quality. For example, the denser the decomposition, the higher the resolution of the selected Pareto front, but the larger the number of episodes required for convergence. In the present study, $C$ is decomposed into 100 subspaces of the same size for selecting the Pareto front.
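The selection step can be sketched as follows: bin the evaluated data by condition into equal intervals and apply a non-dominated filter within each bin (an illustrative sketch for a scalar condition variable, with our own function names):

```python
import numpy as np

def pareto_per_bin(conditions, objectives, c_min, c_max, n_bins=100):
    # Decompose [c_min, c_max] into n_bins equal intervals and select the
    # non-dominated objective vectors (minimization) within each interval.
    conditions = np.asarray(conditions, dtype=float)
    objectives = np.asarray(objectives, dtype=float)
    edges = np.linspace(c_min, c_max, n_bins + 1)
    idx = np.clip(np.searchsorted(edges, conditions, side="right") - 1,
                  0, n_bins - 1)
    fronts = {}
    for k in range(n_bins):
        pts = objectives[idx == k]
        if len(pts) == 0:
            continue
        fronts[k] = np.array(
            [p for i, p in enumerate(pts)
             if not any(np.all(q <= p) and np.any(q < p)
                        for j, q in enumerate(pts) if j != i)])
    return fronts
```

Because the binning is applied only when selecting the front, `n_bins` can be changed after the optimization without re-running it.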
4.3 Convergence judgment
In order to judge the convergence of an optimization process, the hypervolume indicator (HV) Zitzler (1999) is adopted. It refers to the volume in the objective space between the Pareto front and a fixed reference point, as shown in Fig. 1. Due to its monotonic characteristic, the larger the HV, the more accurate the Pareto front. Therefore, it is one of the most frequently used indicators for convergence and capability assessment of MO optimization methods. A general guideline for determining the reference point is to use a point slightly worse than the nadir point, which consists of the worst objective values over the Pareto front Auger et al. (2012).
In MCMO optimization, as described in Section 4.2, the condition space $C$ is decomposed, and a Pareto front is selected in each decomposed subspace. Likewise, the HV is defined in each decomposed subspace. Therefore, in this study, the convergence of an optimization process is judged by the average HV over all the decomposed subspaces.
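For two objectives, the HV reduces to an area and can be computed with a simple sweep (an illustrative sketch assuming minimization, with our own function name):

```python
import numpy as np

def hypervolume_2d(front, ref):
    # Area dominated by a 2-D Pareto front (minimization) and bounded by
    # the reference point: sweep along f1 and add one rectangle per point.
    pts = np.asarray(sorted(map(tuple, front)), dtype=float)
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:             # only non-dominated slices add area
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv
```

For the front {(1, 3), (2, 2), (3, 1)} with reference point (4, 4), the swept rectangles have areas 3, 2, and 1, giving an HV of 6.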
5 Results and discussion
In this section, the proposed DRL-based MCMO optimization method is applied to two problems, and the results are analyzed. The first problem is the Kursawe test function newly modified into an MCMO optimization problem. The second problem is airfoil shape optimization, a representative shape optimization problem involving fluid dynamics.
5.1 Modified Kursawe test function
5.1.1 Problem setup
The modified Kursawe problem for MCMO optimization is defined as follows:
(9) $\min_{\mathbf{x}} \begin{pmatrix} f_1(\mathbf{x}, \theta) \\ f_2(\mathbf{x}, \theta) \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} g_1(\mathbf{x}) \\ g_2(\mathbf{x}) \end{pmatrix}, \quad g_1(\mathbf{x}) = \sum_{i=1}^{2} \left[ -10 \exp\left( -0.2 \sqrt{x_i^2 + x_{i+1}^2} \right) \right], \quad g_2(\mathbf{x}) = \sum_{i=1}^{3} \left[ |x_i|^{0.8} + 5 \sin x_i^3 \right], \quad x_i \in [-5, 5]$
where $g_1$ and $g_2$ are the objectives of the original Kursawe problem.
For the modification, the rotation angle $\theta$ is introduced as the condition variable for a rotational transformation of the objective space. If $\theta$ is set to $0$, the problem reduces to the original Kursawe problem, which is an MO optimization problem. Extending the problem through a rotational transformation has two advantages. First, the characteristics of the original problem are preserved. As the original Kursawe problem has a discontinuous and nonconvex Pareto front, it has been actively adopted to evaluate the capability of MO optimization methods Lim et al. (2015); Tan et al. (2013a); Leung et al. (2014); Naranjani et al. (2017). Thus, with the preserved characteristics, the modified Kursawe problem is a satisfactory benchmark problem for validating the developed MCMO optimization method. Second, real solutions can be readily obtained, which is crucial in designing a benchmark problem. The boundary shape of the feasible region in the objective space is unchanged from the original Kursawe problem by the rotational transformation. Therefore, the real Pareto front corresponding to each $\theta$ can be easily obtained by judging dominance on the rotated boundary.
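Under the rotational-transformation reading described above, the modified problem can be sketched as follows (an illustrative reconstruction: the original Kursawe objectives are standard, but the exact form of the rotation used in the paper is our assumption):

```python
import numpy as np

def kursawe(x):
    # Original Kursawe objectives (minimization) for x in [-5, 5]^3.
    x = np.asarray(x, dtype=float)
    f1 = np.sum(-10.0 * np.exp(-0.2 * np.sqrt(x[:-1] ** 2 + x[1:] ** 2)))
    f2 = np.sum(np.abs(x) ** 0.8 + 5.0 * np.sin(x ** 3))
    return np.array([f1, f2])

def modified_kursawe(x, theta):
    # Rotate the objective vector by the condition variable theta;
    # theta = 0 recovers the original problem.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ kursawe(x)
```

At the origin, `kursawe` returns (-20, 0): the two exponential terms each contribute -10, and every sine term vanishes.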
5.1.2 Optimization results
The modified Kursawe problem is solved as described in Algorithm 1. Fig. 2 shows the optimization process. As shown in Fig. 2(a), in the early episodes, data are widely scattered as the network is not developed enough. However, as the episodes progress, data are accumulated, and the network learns to find an optimal action for a given state. As a result, better solutions are obtained for newly given conditions and objectives, increasing the resolution of the Pareto front. Also, the clustered data near the Pareto front reinforce the learning again, forming a positive feedback loop. Finally, the Pareto front and the network converge, which can also be seen in terms of the HV as shown in Fig. 2(b).
Fig. 3 shows the optimization results at the converged episode. Through the optimization, a high-resolution Pareto front of solutions over the whole condition space is obtained. As shown in Fig. 3(a), it shows good agreement with the real Pareto front, including the highly nonlinear parts where the shape of the Pareto front drastically changes along $\theta$. This demonstrates the exclusive ability of the proposed MCMO optimization method. If several representative conditions are predetermined and optimization is performed at each condition, it is difficult to capture these nonlinear parts. On the other hand, the developed MCMO optimization is performed over the entire condition space, so that a high-resolution Pareto front can be found. Fig. 3(b) shows the optimization results in five decomposed condition spaces. Note that since the condition space $C$ is equally decomposed into 100 subspaces in the present study, each figure in Fig. 3(b) shows one decomposed condition space. Even in those decomposed spaces, the solutions match well with the real Pareto front.
5.1.3 Effectiveness of multicondition optimization
In this section, a computational experiment based on the modified Kursawe problem is set up to analyze how effective multi-condition (MC) optimization is compared with single-condition (SC) optimization. The experiment is designed to compare the number of function evaluations needed to reach the same quality of optimization. Since SC optimization cannot be performed over a condition space, equally distributed conditions are prescribed in the condition space for the comparison. To measure the quality of optimization, a reference HV is set at each prescribed condition.
Then, two cases are compared by the total number of function evaluations required to reach the reference HV at all prescribed conditions. The first case is SC optimization performed independently at each prescribed condition. The SC optimization method can be easily derived by fixing the condition in the developed method. The second case is MC optimization modified to be conducted only at the prescribed conditions. Although the proposed method in this study is conducted over a whole condition space, it is modified in this experiment for a fair comparison.
The reference HV at each condition is determined as the average HV of ten SC optimization runs, since the optimization process is stochastic due to the exploration of DRL. The HV of SC optimization at each condition shows convergence, and the obtained Pareto front matches well with the real Pareto front as shown in Fig. 4. This is quite comparable to other studies using the Kursawe test function for evaluating their optimization methods Lim et al. (2015); Tan et al. (2013a); Leung et al. (2014); Naranjani et al. (2017). In addition, when comparing the two cases, the average number of total function evaluations over ten runs is used for precise analysis.
Fig. 4 shows one example of the results with five prescribed conditions. As shown in Fig. 4(a), SC optimization is performed independently at each prescribed condition, while MC optimization is performed simultaneously at the five prescribed conditions. As shown in the figure, the number of function evaluations at each condition is reduced in the MC optimization, resulting in a significant reduction of the total number of function evaluations: a total of 49811 function evaluations is required in the SC optimization, while only 28481 function evaluations are required to reach the same reference HV in the MC optimization. This reduction is attributed to the fact that the MC optimization learns the correlations between the conditions and the optimal solutions. By utilizing the correlations, it can effectively find the Pareto front with a small number of function evaluations. Fig. 4(b) shows the Pareto fronts obtained from the two cases. Because both cases satisfy the same reference HV, the Pareto front shows good agreement with the real Pareto front in both cases.
Fig. 5 shows the experiment results according to the number of prescribed conditions. In SC optimization, the number of function evaluations increases linearly with the number of conditions. This is a natural result because optimization is performed at each condition independently. However, in MC optimization, the increment gradually decreases, so that the difference between the two cases grows with the number of conditions. In particular, for the largest number of prescribed conditions, the number of function evaluations of MC optimization is only a fraction of that of SC optimization. Considering that the proposed method in the present study is conducted continuously over a whole condition space, it can be inferred that the reduction in the number of required function evaluations will be much greater than this result. Therefore, we can conclude that MC optimization is much more effective than SC optimization, which is enabled by learning the correlations between conditions and optimal solutions.
5.2 Airfoil shape optimization
5.2.1 Problem setup
In numerous engineering fields, the flow condition and the aerodynamic requirement of an airfoil can vary depending on the situation. In this section, an MCMO airfoil shape optimization problem is defined reflecting such practical applications. First, the lift coefficient, $C_l$, and the lift-to-drag ratio, $C_l/C_d$, are set as the objectives of the optimization to be maximized. These objectives are crucial factors in airfoil design in which many researchers are interested Mukesh et al. (2014); Ribeiro et al. (2012); Zhang et al. (2019b); Huyse et al. (2002). Next, as a representative value of the flow condition, the chord Reynolds number, $Re$, is set as the condition variable of the optimization.
In this study, an airfoil shape is parameterized using the Kármán–Trefftz transformation Milne-Thomson (1973). A Kármán–Trefftz airfoil is generated by the transformation of a circle in the $\zeta$-plane to the physical $z$-plane. The circle in the $\zeta$-plane, centered on $\zeta_c = x_c + i y_c$, is defined to pass through $\zeta = 1$. Then, a complex variable $\zeta$ on the circle is transformed to $z$ to generate an airfoil as follows:
(10) $z = n \dfrac{(\zeta + 1)^n + (\zeta - 1)^n}{(\zeta + 1)^n - (\zeta - 1)^n}, \quad n = 2 - \dfrac{\beta}{\pi}$
where $\beta$ is the trailing-edge angle of the generated airfoil. Since it can generate various and realistic airfoils, this transformation has been utilized in many studies Puorger et al. (2007); Berci et al. (2014). Along with the shape of an airfoil itself, the angle of attack, $\alpha$, is an important factor that greatly influences the aerodynamic characteristics. Thus, when designing an airfoil, $\alpha$ relative to the flow direction has to be optimized to achieve optimal performance Huyse et al. (2002). In the present study, in addition to $x_c$, $y_c$, and $\beta$, which determine a Kármán–Trefftz airfoil, $\alpha$ is included as a design variable of the optimization.
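A sketch of the airfoil generation is given below (our own illustration; the convention that the circle passes through $\zeta = 1$ and the exact form of the mapping follow the standard Kármán–Trefftz transformation and may differ in detail from the paper's implementation):

```python
import numpy as np

def karman_trefftz(zeta_c, beta_deg, n_points=200):
    # Map a circle in the zeta-plane, centered at zeta_c and passing
    # through zeta = 1, to an airfoil in the physical z-plane.
    n = 2.0 - np.deg2rad(beta_deg) / np.pi   # exponent set by the trailing-edge angle
    radius = abs(1.0 - zeta_c)               # circle passes through zeta = 1
    phi = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    zeta = zeta_c + radius * np.exp(1j * phi)
    num = (zeta + 1.0) ** n + (zeta - 1.0) ** n
    den = (zeta + 1.0) ** n - (zeta - 1.0) ** n
    return n * num / den
```

With the circle centered at the origin and a zero trailing-edge angle, the mapping degenerates to the Joukowski flat plate, which makes a useful sanity check.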
The MCMO airfoil shape optimization problem is defined as follows:
(11) $\min_{\mathbf{x}} \mathbf{F}(\mathbf{x}, Re) = \left( -s\,C_l,\; -C_l/C_d \right), \quad \mathbf{x} = (x_c, y_c, \beta, \alpha)$
Since the goal of the optimization is to maximize $C_l$ and $C_l/C_d$, minus signs are added. A scaling constant $s$ is multiplied to $C_l$ to match the scale between $C_l$ and $C_l/C_d$. In order to evaluate $C_l$ and $C_d$, XFOIL, an analysis tool for airfoils Drela (1989), is adopted in the present study. It is widely used for airfoil shape optimization due to its low computational cost Hansen (2018); Zhang et al. (2019a); Ram et al. (2019). The ranges of the design variables $x_c$, $y_c$, $\beta$, and $\alpha$ are set to generate various airfoil shapes while excluding unrealistic ones, and the range of $Re$ is set to cover sufficiently wide applications Lissaman (1983).
5.2.2 Optimization results
The airfoil shape optimization problem is solved as described in Algorithm 1. Fig. 6 shows the results of the airfoil shape optimization. As shown in Fig. 6(a), the average HV is shown to converge. Fig. 6(b) shows the Pareto front at the converged episode. Overall, a Pareto front of sufficient resolution is successfully found within the defined condition space, indicating that the developed method can be applied to practical engineering applications. As shown in Fig. 6(b), the maximum $C_l/C_d$ increases with $Re$ while the maximum $C_l$ remains relatively constant. In particular, along the line where $C_l$ is maximized, $C_l$ does not change significantly, which indicates that $C_d$ decreases with $Re$. In addition, two distinct features are observed in the Pareto front. When $C_l$ is maximized, nonconvex parts are observed. Next, when $C_l/C_d$ is maximized, nonlinear parts are observed. These parts will be further discussed through an analysis of the optimal solutions and the optimal airfoil shapes.
Fig. 7 shows the optimal solutions and the optimal airfoil shapes. As can be seen in Fig. 7(a), various values of the design parameters are obtained depending on the weight between the objectives and $Re$. Among the design parameters, $x_c$ is a factor that determines the thickness of the airfoil, so the smaller its absolute value, the thinner the generated airfoil. $y_c$ determines the camber of the airfoil: $y_c = 0$ indicates a symmetric airfoil, and the larger the value, the more cambered the airfoil. $\beta$ and $\alpha$ are the trailing-edge angle and the angle of attack, respectively, which are expressed in degrees.
As shown in Fig. 7(a), nonlinear features are observed where the optimal design parameters change dramatically with respect to the weight and $Re$. These correspond to the aforementioned nonconvex and nonlinear parts observed in the Pareto front, respectively. Except for these nonlinear parts, overall trends are observed. As the weight of $C_l/C_d$ increases, thin and less cambered airfoils with low $\alpha$ are generated. On the other hand, as the weight of $C_l$ increases, thick and highly cambered airfoils with high $\alpha$ are generated. The trailing-edge angle, $\beta$, shows relatively less variation and keeps close to its minimum value.
Fig. 7(b) shows the optimal airfoil shapes according to the weight and $Re$. As mentioned above, as the weight of $C_l$ approaches its maximum, airfoils with high camber and high $\alpha$ are generated for maximizing the lift at all $Re$. However, the values of $\alpha$ do not increase to the maximum of their range, which is due to the consideration of the stall phenomenon caused by excessively high $\alpha$. On the contrary, as the weight of $C_l/C_d$ approaches its maximum, the opposite tendency is observed in order to reduce the drag. Also, when considering the drag, the thicknesses of the airfoils decrease, except where the aforementioned nonlinearity exists.
5.2.3 Aerodynamic performance analysis of optimal airfoil shapes
In this section, based on the previous optimization results, the need for MC optimization, which covers a whole condition space, is confirmed. To show this need, an analysis is conducted on whether the optimal shapes at a few representative conditions can provide sufficient performance over the entire condition space. For the analysis, the optimal airfoil shapes that maximize under a constraint are selected. Optimizing one objective while keeping the others above a certain level is a common situation in practice. For example, in many aviation applications, is optimized while a certain level of is maintained to sustain the aircraft weight Huyse et al. (2002); Buckley et al. (2010); Nemec et al. (2004). The optimal solutions of the constrained optimization problem can be easily obtained from the Pareto front, as shown in Fig. 8(a).
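Extracting such constrained optima from an already computed Pareto front amounts to a simple filter-and-maximize step. The sketch below (hypothetical names; Pareto points stored as (f1, f2) pairs) picks the point maximizing the first objective among those whose second objective satisfies the constraint:

```python
def constrained_max(front, c):
    """front: iterable of (f1, f2) Pareto points (maximization).
    Return the point with the largest f1 among those with f2 >= c."""
    feasible = [p for p in front if p[1] >= c]
    if not feasible:
        raise ValueError("no Pareto point satisfies the constraint")
    return max(feasible, key=lambda p: p[0])
```

For example, on the front [(1, 5), (2, 4), (3, 2)] with the constraint f2 >= 3, the feasible points are the first two and the selected optimum is (2, 4). Because every Pareto point is already nondominated, no re-optimization is needed when the constraint level changes.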
As shown by the black lines in Fig. 8(b), the optimal shapes exhibit greater than for all , and the maximized increases with . Then, two optimal shapes at different conditions are used for the analysis. The red lines show and of the optimal airfoil at . Compared with the optimal performance, both and decrease notably except at the optimized condition. In particular, drops significantly at slightly lower than the optimized condition, so the constraint cannot be satisfied at all. In the same way, the blue lines show and of the optimal airfoil at . Although it satisfies the constraint near the optimized condition, it also shows a substantial decrease in at slightly higher than the optimized condition.
Fig. 8(c) shows the optimal airfoil shapes at various conditions. The optimal shape at differs considerably from the optimal shape at , which results in the aforementioned difference in . However, the optimal shape at is very similar to the optimal shape at , although there is a large difference in as mentioned before. Likewise, except for the optimal shape at , there is no noticeable difference among the other optimal shapes even though significant performance differences exist. These results are attributed to the nonlinear characteristics of fluids: the optimal shape can change drastically with the condition, and even when there is no noticeable difference in shape, a slight variation in shape can cause a huge difference in performance.
This analysis shows that an optimal shape at a specific condition may not remain valid at nearby conditions, and this effect can be more severe in problems with nonlinear characteristics. Therefore, it is inadequate to perform optimization by discretizing the condition space into several representative conditions. To overcome this problem and remain optimal for varying conditions, it is essential to consider the whole condition space through the MC optimization method proposed in the present study.
6 Concluding remarks
For the first time in the literature, an MCMO optimization method based on DRL has been developed to find the Pareto front over a prescribed condition space. The main idea is that DRL can learn a policy for finding optimal solutions according to varying conditions and objectives. The method has been applied to two MCMO optimization problems. First, as a benchmark problem, the Kursawe test function has been newly modified into an MCMO optimization problem. Second, an airfoil shape optimization problem has been addressed as a practical engineering application. The present MCMO optimization method shows an outstanding ability to find a high-resolution Pareto front within the entire condition space, including nonlinear and nonconvex parts.
Two additional analyses have been conducted to show its exclusive capability. Firstly, a computational experiment based on the modified Kursawe test function has been carried out to show the effectiveness of MC optimization. Compared with multiple operations of SC optimization for multiple conditions, the number of function evaluations required to find the Pareto front is significantly reduced. This efficiency is enabled by learning the correlations between conditions and optimal solutions. Secondly, the necessity of MC optimization has been confirmed through an analysis of the aerodynamic performance of airfoils with optimally designed shapes. An optimal solution at a specific condition may not remain valid at nearby conditions, resulting in significant deterioration of target performance. Thus, it is essential to cover the entire condition space, which is possible through the proposed MC optimization method.
The proposed method is expected to show its outstanding capability in optimization problems where conditions and objectives are not fixed. A representative example is shape optimization involving fluid mechanics, in which the operating conditions are generally given as a range and the objectives differ depending on the situation. However, the proposed method is not limited to shape optimization, and the dimensions of the conditions and objectives are not restricted; it can be applied to any MCMO optimization problem. Through the developed MCMO optimization method, it is expected that the fields to which optimization can be practically applied will be greatly expanded. Moreover, from a methodological point of view, as the first MCMO optimization method, this study paves the way to a new category of optimization.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The work was supported by the National Research Foundation of Korea (NRF) under the Grant Number NRF2021R1A2C2092146 and the Samsung Research Funding Center of Samsung Electronics under Project Number SRFCTB170351.
References
 Morphing aircraft: the need for a new design philosophy. Aerospace Science and Technology 49, pp. 154–166.
 Hypervolume-based multiobjective optimization: theoretical foundations and practical implications. Theoretical Computer Science 425, pp. 75–103.
 A review of morphing aircraft. Journal of Intelligent Material Systems and Structures 22 (9), pp. 823–877.
 Dynamic programming. Science 153 (3731), pp. 34–37.
 Multidisciplinary multifidelity optimisation of a flexible wing aerofoil with reference to a small UAV. Structural and Multidisciplinary Optimization 50 (4), pp. 683–699.
 CPP propeller cavitation and noise optimization at different pitches with panel code and validation by cavitation tunnel measurements. Ocean Engineering 53, pp. 177–195.
 Airfoil optimization using practical aerodynamic design requirements. Journal of Aircraft 47 (5), pp. 1707–1719.
 Reinforcement learning for control: performance, stability, and deep approximators. Annual Reviews in Control 46, pp. 8–28.
 The study on the morphing composite propeller for marine vehicle. Part I: design and numerical analysis. Composite Structures 168, pp. 746–757.
 An introduction to optimization. John Wiley & Sons.
 Robust topology optimization for fiber-reinforced composite structures under loading uncertainty. Computer Methods in Applied Mechanics and Engineering 384, pp. 113935.
 MOPSO: a proposal for multiple objective particle swarm optimization. In Proceedings of the 2002 Congress on Evolutionary Computation, CEC'02, Vol. 2, pp. 1051–1056.
 A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6 (2), pp. 182–197.
 Concepts for morphing airfoil sections using bi-stable laminated composite structures. Thin-Walled Structures 46 (6), pp. 689–701.
 XFOIL: an analysis and design system for low Reynolds number airfoils. In Low Reynolds Number Aerodynamics, pp. 1–12.
 Aerodynamic blade design with multi-objective optimization for a tiltrotor aircraft. Aircraft Engineering and Aerospace Technology 87 (1), pp. 19–29.
 Addressing function approximation error in actor-critic methods. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80, pp. 1587–1596.
 High-fidelity hydrodynamic shape optimization of a 3D hydrofoil. Journal of Ship Research 59 (4), pp. 209–226.
 A review on deep reinforcement learning for fluid mechanics. Computers & Fluids 225, pp. 104973.
 Low-fidelity 2D isogeometric aeroelastic analysis and optimization method with application to a morphing airfoil. Computer Methods in Applied Mechanics and Engineering 305, pp. 512–536.
 Airfoil optimization for wind turbine application. Wind Energy 21 (7), pp. 502–514.
 Probabilistic approach to free-form airfoil shape optimization under uncertainty. AIAA Journal 40 (9), pp. 1764–1772.
 Adam: a method for stochastic optimization. arXiv:1412.6980.
 Actor-critic algorithms. In Advances in Neural Information Processing Systems, pp. 1008–1014.
 A variant of evolution strategies for vector optimization. In Parallel Problem Solving from Nature, pp. 193–197.
 Review of morphing concepts and materials for wind turbine blade applications. Wind Energy 16 (2), pp. 283–307.
 A new strategy for finding good local guides in MOPSO. In 2014 IEEE Congress on Evolutionary Computation (CEC), pp. 1990–1997.
 Efficient aerodynamic shape optimization with deep-learning-based geometric filtering. AIAA Journal 58 (10), pp. 4243–4259.
 Learning the aerodynamic design of supercritical airfoils through deep reinforcement learning. AIAA Journal 59 (10), pp. 3988–4001.
 Kursawe and ZDT functions optimization using hybrid micro genetic algorithm (HMGA). Soft Computing 19 (12), pp. 3571–3580.
 Low-Reynolds-number airfoils. Annual Review of Fluid Mechanics 15 (1), pp. 223–239.
 Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning, Vol. 30, pp. 3.
 On scalarizing functions in multiobjective optimization. OR Spectrum 24 (2), pp. 193–213.
 Nonlinear multiobjective optimization. Vol. 12, Springer Science & Business Media.
 Theoretical aerodynamics. Courier Corporation.
 Playing Atari with deep reinforcement learning. arXiv:1312.5602.
 Human-level control through deep reinforcement learning. Nature 518 (7540), pp. 529–533.
 Shape optimization in fluid mechanics. Annual Review of Fluid Mechanics 36 (1), pp. 255–279.
 Airfoil shape optimization using non-traditional optimization technique and its validation. Journal of King Saud University - Engineering Sciences 26 (2), pp. 191–197.
 A hybrid method of evolutionary algorithm and simple cell mapping for multi-objective optimization problems. International Journal of Dynamics and Control 5 (3), pp. 570–582.
 Multipoint and multi-objective aerodynamic shape optimization. AIAA Journal 42 (6), pp. 1057–1065.
 Design of complex bone internal structure using topology optimization with perimeter control. Computers in Biology and Medicine 94, pp. 74–84.
 Hydrodynamic optimization of ship hull forms. Applied Ocean Research 23 (6), pp. 337–355.
 Design optimization of ship hulls via CFD techniques. Journal of Ship Research 45 (2), pp. 140–149.
 Preliminary design of an amphibious aircraft by the multidisciplinary design optimization approach. In 48th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, pp. 1924.
 Multi-objective optimization of cascade blade profile based on reinforcement learning. Applied Sciences 11 (1), pp. 106.
 Deep reinforcement learning in fluid mechanics: a promising method for both active flow control and shape optimization. Journal of Hydrodynamics 32 (2), pp. 234–246.
 Design and optimization of airfoils and a 20 kW wind turbine using multi-objective genetic algorithm and HARP_Opt code. Renewable Energy 144, pp. 56–67.
 An airfoil optimization technique for wind turbines. Applied Mathematical Modelling 36 (10), pp. 4898–4907.
 Flexible hydrofoil optimization for the 35th America's Cup with constrained EGO method. Ocean Engineering 157, pp. 62–72.
 Design of a morphing airfoil using aerodynamic shape optimization. AIAA Journal 44 (7), pp. 1550–1562.
 Shape optimization in electromagnetic applications. In New Trends in Shape Optimization, A. Pratelli and G. Leugering (Eds.), pp. 251–269.
 Muiltiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary Computation 2 (3), pp. 221–248.
 Reinforcement learning: an introduction. MIT Press.
 A modified micro genetic algorithm for undertaking multi-objective optimization problems. Journal of Intelligent & Fuzzy Systems 24, pp. 483–495.
 MOEA/D + uniform design: a new version of MOEA/D for optimization problems with many objectives. Computers & Operations Research 40 (6), pp. 1648–1660.
 Shape optimization in exoskeletons and endoskeletons: a biomechanics analysis. Journal of The Royal Society Interface 9 (77), pp. 3480–3489.
 Scalarized multi-objective reinforcement learning: novel design techniques. In 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 191–199.
 Morphing structures, applications of. In Encyclopedia of Continuum Mechanics, H. Altenbach and A. Öchsner (Eds.), pp. 1–13.
 A review on deep reinforcement learning for fluid mechanics: an update. arXiv:2107.12206.
 Direct shape optimization through deep reinforcement learning. Journal of Computational Physics 428, pp. 110080.
 Adjoint-based airfoil optimization with adaptive isogeometric discontinuous Galerkin method. Computer Methods in Applied Mechanics and Engineering 344, pp. 602–625.
 Multi-condition optimization of cavitation performance on a double-suction centrifugal pump based on ANN and NSGA-II. Processes 8 (9), pp. 1124.
 Shape optimization of wind turbine blades. Wind Energy 12 (8), pp. 781–803.
 Aerodynamic shape optimization using a novel optimizer based on machine learning techniques. Aerospace Science and Technology 86, pp. 826–835.
 Application of function based design method to automobile aerodynamic shape optimization. In 12th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference.
 MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation 11 (6), pp. 712–731.
 Design methodology using characteristic parameters control for low Reynolds number airfoils. Aerospace Science and Technology 86, pp. 143–152.
 Multi-objective optimization design for airfoils with high lift-to-drag ratio based on geometric feature control. IOP Conference Series: Earth and Environmental Science 227, pp. 032014.
 Multi-fidelity deep neural network surrogate model for aerodynamic shape optimization. Computer Methods in Applied Mechanics and Engineering 373, pp. 113485.
 Evolutionary algorithms for multiobjective optimization: methods and applications. Vol. 63, Citeseer.
List of Figures
 1 Schematic of the HV when the number of objectives is two.
 2 Optimization process of the modified Kursawe test function.
 (a) Results during the optimization process. One data point is added for every episode. The Pareto front is selected from the data as described in Section 4.2. The network shows the values of objective functions corresponding to the actions determined by the network.
 (b) over episodes with a reference point .
 3 Optimization results of the modified Kursawe test function. The black dots show the Pareto front at episode 100000. The sky-blue surfaces in (a) and lines in (b) show the real Pareto front, depicted for comparison.
 (a) Pareto front from two different angles.
 (b) Pareto front in five decomposed condition spaces.
 4 One of the experiment results when .
 (a) The HVs over the total number of function evaluations with a reference point at five prescribed conditions. ——–, ; ——–, ; ——–, ; ——–, ; ——–, . The left figure shows SC optimization and the right figure shows MC optimization. The dashed lines in both figures indicate the same . If the HV at a specific condition satisfies , the condition is excluded. The numbers near the lines are the numbers of function evaluations at each condition.
 (b) Pareto front obtained from the two cases. The black crosses show the Pareto front obtained from SC optimization and the red crosses show the Pareto front obtained from MC optimization. The sky-blue lines show the real Pareto front at the five prescribed conditions.
 5 The number of function evaluations required to reach the same according to . The black line indicates the average of SC optimization and the red line indicates the average of MC optimization. The error bars show the minimum and maximum values over ten runs. The numbers below the red line are the percentages of MC optimization compared to SC optimization.
 6 Results of airfoil shape optimization.
 (a) over episodes with a reference point .
 (b) Pareto front at episode from two different angles.
 7 Optimal solutions and optimal airfoil shapes.
 (a) Optimal design parameters corresponding to and . is the weight of . Thereby, corresponds to the maximization of while corresponds to the maximization of .
 (b) Optimal airfoil shapes at different and . ——–, ; ——–, ; ——–, . of the airfoils are expressed assuming that the flow direction is horizontal.
 8 Performance analysis of optimal airfoil shapes.
 (a) Finding the optimal shapes that maximize with a constraint in Pareto front. The black line corresponds to the optimal shapes.
 (b) and of optimal shapes according to . The black lines correspond to the optimal shapes obtained in (a). The red and blue lines correspond to each optimal airfoil at and respectively, depicted as the stars.
 (c) Optimal airfoil shapes obtained in (a) at various . Note that the red and blue airfoils correspond to the airfoils selected in (b).
Highlights

A multicondition multiobjective optimization method is developed based on deep reinforcement learning.

A novel benchmark problem for multicondition multiobjective optimization is introduced.

A developed method is shown to efficiently find a high-resolution Pareto front over a condition space.

Learning the correlations between conditions and optimal solutions enables efficient optimization.

Critical degradation of target performance by optimization performed only at a specific condition is confirmed.