Multi-step Shape Optimization of Airfoil based on Reinforcement Learning

  • Feng, Yiqi (Technical University of Munich)
  • Schmidt, Steffen (Technical University of Munich)
  • Adams, Nikolaus (Technical University of Munich)

Airfoil shape optimization is a high-dimensional and strongly non-linear design problem: small geometric variations can trigger large changes in aerodynamic performance, making the objective multi-modal and difficult to explore with conventional one-shot optimizers. Reinforcement learning (RL) offers an alternative view of design as a decision process, in which an agent learns a deformation strategy that improves aerodynamic performance through interaction with an environment, rather than repeatedly solving a new optimization problem from scratch. In this work, we formulate airfoil design as a Markov decision process: at each step the agent predicts incremental updates to the control points, producing a sequence of geometries that progressively improve the target aerodynamic performance. The agent is coupled directly to an aerodynamic solver rather than a surrogate model, enabling exploration of a multi-modal design space while avoiding surrogate-induced bias and uncertainty. Specifically, the airfoil is parameterized by B-spline control points, and the state comprises the current control-point vector and the flow conditions, e.g., the Reynolds number. The action space is continuous and consists of bounded increments to multiple shape control points. After each action, the geometry is reconstructed, and a compressible flow solver evaluates the updated airfoil to obtain aerodynamic coefficients such as the drag coefficient $C_d$ and the lift coefficient $C_l$, from which the reward is computed. The agent is trained with Proximal Policy Optimization (PPO) to maximize the cumulative reward over the collected trajectories. Results demonstrate that the proposed multi-step RL method improves aerodynamic efficiency over baseline designs while maintaining geometric feasibility throughout the optimization trajectory.
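The decision process described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name AirfoilEnv, the solver callable evaluate, the increment bound max_step, the episode length horizon, and the choice of $C_l / C_d$ as the per-step reward are all assumptions made for illustration, and the B-spline reconstruction is left to the solver callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AirfoilEnv:
    """Toy multi-step shape-design environment (illustrative assumptions only)."""

    control_points: List[float]  # flattened B-spline control-point coordinates
    reynolds: float              # flow condition carried in the state
    # Hypothetical solver hook: (control_points, reynolds) -> (C_l, C_d).
    evaluate: Callable[[List[float], float], Tuple[float, float]]
    max_step: float = 0.01       # assumed bound on each control-point increment
    horizon: int = 10            # assumed number of design steps per episode
    t: int = 0                   # current step counter

    def state(self) -> List[float]:
        # State = current control-point vector + flow condition.
        return list(self.control_points) + [self.reynolds]

    def step(self, action: List[float]) -> Tuple[List[float], float, bool]:
        if len(action) != len(self.control_points):
            raise ValueError("action must have one increment per control point")
        # Continuous, bounded action: clip every increment to [-max_step, max_step].
        clipped = [max(-self.max_step, min(self.max_step, a)) for a in action]
        self.control_points = [p + d for p, d in zip(self.control_points, clipped)]
        # Evaluate the updated shape with the (hypothetical) flow solver.
        cl, cd = self.evaluate(self.control_points, self.reynolds)
        reward = cl / cd  # assumed reward: lift-to-drag ratio
        self.t += 1
        done = self.t >= self.horizon
        return self.state(), reward, done
```

In such a setup, a PPO agent would call step repeatedly within an episode, so that each trajectory is a sequence of incrementally deformed geometries rather than a single design proposal.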
Constraint handling is naturally integrated via reward shaping and termination criteria: invalid geometries (e.g., self-intersection, thickness violation, excessive curvature) and solver failures are penalized or rejected, which stabilizes training and increases simulation success rates. Overall, the proposed framework provides a flexible and scalable paradigm for sequential aerodynamic shape optimization and can be extended to additional operating conditions and objective formulations.
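As a rough illustration of constraint handling through reward shaping and termination, the sketch below penalizes infeasible shapes and solver failures. The penalty value, the thickness threshold t_min, the function names, and the use of RuntimeError to signal a failed simulation are assumptions, not details from the abstract; the curvature check is omitted for brevity.

```python
from typing import Callable, Sequence, Tuple

INVALID_PENALTY = -1.0  # assumed penalty for infeasible shapes or failed runs


def min_thickness_ok(upper: Sequence[float], lower: Sequence[float],
                     t_min: float) -> bool:
    """Check thickness at interior stations (leading/trailing edge excluded).

    Crossing surfaces give a negative thickness, so this simple test also
    rejects that form of self-intersection.
    """
    thickness = [u - l for u, l in zip(upper, lower)]
    return all(t >= t_min for t in thickness[1:-1])


def shaped_reward(upper: Sequence[float], lower: Sequence[float],
                  solver: Callable[[Sequence[float], Sequence[float]],
                                   Tuple[float, float]],
                  t_min: float = 0.01) -> Tuple[float, bool]:
    """Return (reward, terminated) for one candidate geometry."""
    # Reject invalid geometries before spending solver time on them.
    if not min_thickness_ok(upper, lower, t_min):
        return INVALID_PENALTY, True
    try:
        cl, cd = solver(upper, lower)  # hypothetical flow-solver call
    except RuntimeError:
        # A diverged or crashed simulation is penalized like an invalid shape.
        return INVALID_PENALTY, True
    return cl / cd, False  # assumed reward: lift-to-drag ratio
```

Checking geometry before calling the solver is one plausible way to raise the simulation success rate, since infeasible shapes never reach the flow solver.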