WCCM ECCOMAS 2026

Physics-Enhanced Reinforcement Learning for Real-time Optimal Control of Dynamical Systems

Tomasetto, Matteo (Politecnico di Milano)
Botteghi, Nicolò (Politecnico di Milano)
Bruni, Gabriele (Politecnico di Milano)
Manzoni, Andrea (Politecnico di Milano)

In session: MS285B - Cents & Sensitivities: Exploiting Differentiable Solvers for Low-Cost Optimization in Science and Engineering II


Deep Reinforcement Learning (DRL) has recently emerged as a promising feedback control strategy for complex physical systems governed by differential equations. However, DRL is typically sample inefficient, since many environment interactions are required to synthesize optimal control strategies. It is also limited to low-dimensional state and action variables, owing to the curse of dimensionality that affects exploration and exploitation in high-dimensional spaces. In this work, we present a novel Physics-EnhAnced Reinforcement Learning (PEARL) paradigm tailored to the control of (possibly high-dimensional and parametric) dynamical systems. PEARL leverages policy learning algorithms based on automatic differentiation, such as Short-Horizon Actor–Critic (SHAC), to guide policy learning through adjoint-based sensitivities, enabling the synthesis of optimal control strategies after only a few environment interactions. To further reduce sample complexity, we propose a model-based extension of the algorithm, in which differentiable proxies of the underlying dynamics are learned on the fly and used as the training environment, with minimal deterioration of the policy performance. Through applications to challenging optimal control problems, we show that PEARL (i) can effectively steer differentiable environments implemented in, e.g., PyTorch, JAX, and FEniCS, outperforming state-of-the-art DRL algorithms; (ii) is extremely sample efficient, thanks to the physics-guided optimization and to the differentiable surrogate models emulating the system evolution; (iii) generalizes across multiple scenarios when dealing with parametric systems; and (iv) can cope with high-dimensional problems characterized by large state and action spaces, without requiring low-dimensional state representations or multi-agent strategies.
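The two ingredients described above, policy gradients obtained from adjoint sensitivities over a short horizon and a differentiable surrogate of the dynamics learned during training, can be illustrated with a toy sketch. This is not the PEARL or SHAC implementation. Every name and setting here is a hypothetical assumption, including the scalar linear system, the cost weights, the fixed terminal weight `P_TERM` (a stand-in for SHAC's learned critic), the least-squares surrogate, and the helper functions. SHAC itself differentiates through general simulators with automatic differentiation. Here the reverse-mode (adjoint) sweep is written by hand so that the example runs without dependencies.

```python
import random

# Toy stand-ins (hypothetical, not from PEARL): scalar system
#   x_{t+1} = a * x_t + b * u_t  with linear policy  u_t = -k * x_t
A_TRUE, B_TRUE = 1.2, 1.0      # "real" environment; open loop unstable (a > 1)
Q, R = 1.0, 0.1                # stage cost  q * x_t^2 + r * u_t^2
P_TERM = 1.0                   # fixed terminal weight standing in for a learned critic
HORIZON = 10                   # short rollout horizon (SHAC-style, assumed value)


def true_step(x, u, rng):
    """One transition of the 'real' environment, with small process noise."""
    return A_TRUE * x + B_TRUE * u + rng.gauss(0.0, 0.01)


def rollout_grad(k, x0, a, b):
    """Cost J and exact dJ/dk of one short rollout, via a hand-written adjoint sweep."""
    c = a - b * k                          # closed-loop multiplier
    xs = [x0]
    for _ in range(HORIZON):               # forward pass: store the trajectory
        xs.append(c * xs[-1])
    stage = Q + R * k * k                  # with u = -k x, stage cost = stage * x_t^2
    cost = sum(stage * x * x for x in xs[:-1]) + P_TERM * xs[-1] ** 2
    lam = 2.0 * P_TERM * xs[-1]            # adjoint lambda_H = dJ/dx_H
    grad = 0.0
    for t in range(HORIZON - 1, -1, -1):   # backward pass, t = H-1, ..., 0
        x = xs[t]
        grad += 2.0 * R * k * x * x        # explicit k-dependence of the stage cost
        grad += lam * (-b * x)             # k-dependence through x_{t+1} = c * x_t
        lam = 2.0 * stage * x + c * lam    # lambda_t = dl_t/dx_t + c * lambda_{t+1}
    return cost, grad


def fit_surrogate(data):
    """Least-squares fit of (a_hat, b_hat) from (x, u, x_next) transitions."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu            # solve the 2x2 normal equations
    return (suu * sxy - sxu * suy) / det, (sxx * suy - sxu * sxy) / det


def train_model_based(iters=400, lr=0.1, clip=1.0, seed=0):
    """Alternate sparse real interactions, surrogate refits, and gradient steps."""
    rng = random.Random(seed)
    k, data = 0.0, []
    for it in range(iters):
        if it % 20 == 0:                   # query the real environment only rarely
            for _ in range(5):
                x = rng.uniform(-1.0, 1.0)
                u = -k * x + rng.gauss(0.0, 0.5)   # exploratory action
                data.append((x, u, true_step(x, u, rng)))
            a_hat, b_hat = fit_surrogate(data)
        # policy gradient computed on the differentiable surrogate, not the real system
        x0s = [rng.uniform(-1.0, 1.0) for _ in range(8)]
        g = sum(rollout_grad(k, x0, a_hat, b_hat)[1] for x0 in x0s) / len(x0s)
        k -= lr * max(-clip, min(clip, g))  # clipped gradient-descent step
    return k, a_hat, b_hat, len(data)


if __name__ == "__main__":
    k, a_hat, b_hat, n_real = train_model_based()
    print(f"k = {k:.4f}, surrogate a = {a_hat:.4f}, b = {b_hat:.4f}")
    print(f"real transitions used: {n_real}, closed loop a - b*k = {A_TRUE - B_TRUE * k:.4f}")
```

In this sketch the policy is trained for 400 gradient steps but touches the "real" system only 100 times. Every other rollout runs on the fitted surrogate, whose gradient is exact because the adjoint sweep differentiates it directly. In the setting the abstract targets, the backward pass would instead come from automatic differentiation in the differentiable environment (e.g. PyTorch or JAX), and the surrogate would be a learned model rather than a two-parameter least-squares fit.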