Parameter-Varying Control Using Reference Policies and Transformer-Based PPO

  • Meena, Lokendra (The Ohio State University)
  • Han Veiga, Maria (The Ohio State University)
  • Liu, Xinyu (The Ohio State University)


We present a generalized, data-driven framework for solving parameter-varying control problems in complex dynamical systems. In many engineering applications, from additive manufacturing to robotics, the underlying physical parameters (e.g., thermal coefficients, stiffness, or friction) can vary significantly during operation. Current strategies often rely on Linear Parameter-Varying (LPV) techniques and gain scheduling, which require extensive manual tuning, or on Model Predictive Control (MPC), which requires an underlying physics model and online computational resources. Although Deep Reinforcement Learning (DRL) offers a model-agnostic alternative, standard methods struggle to generalize across wide parameter ranges. This study addresses the question of how to use sparse, heterogeneous reference policies (each of which works well for fixed parameters or for small local variations) to obtain a global control law for a wide range of parameter values without explicit model knowledge. We propose a custom Residual Transformer policy that combines the multi-head attention mechanism of Transformers, residual learning, and Proximal Policy Optimization (PPO) to learn how to dynamically interpolate between the reference policies and to add a small correction term to the final control. We validate this methodology on a numerical benchmark that exhibits a transition from non-stiff to stiff dynamics as the parameters vary, as well as discontinuities with respect to the parameters. In terms of average reward and the amount of training data required, our Transformer-based policy outperforms both the individual reference policies and RL baselines in which the parameters are simply added as another state variable for PPO. Additionally, we observe that the choice of input embedding (e.g., Fourier features) plays a role in resolving parameter sensitivity, though the attention mechanism remains robust across embedding types.
The overall significance of this work lies in providing a data-driven meta-controller that is both interpretable and real-time capable for a wide range of parameter variations.
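
To make the interpolation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single attention head (the abstract describes multi-head attention), a scalar parameter, and untrained random projection matrices; PPO training is not shown. All names (`fourier_features`, `blended_action`, `W_q`, `W_k`, `residual_fn`, `residual_scale`) and the toy proportional reference controllers are hypothetical and chosen only for illustration. The sketch embeds the current parameter and the reference parameters with Fourier features, turns their similarity into attention weights, blends the reference actions with those weights, and adds a small residual correction.

```python
import numpy as np

def fourier_features(p, num_freqs=4):
    """Embed parameter(s) p as [sin(2^k * pi * p), cos(2^k * pi * p)] for k = 0..num_freqs-1."""
    p = np.atleast_1d(np.asarray(p, dtype=float))
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi      # shape (num_freqs,)
    angles = np.outer(freqs, p).ravel()                # shape (num_freqs * d,)
    return np.concatenate([np.sin(angles), np.cos(angles)])

def softmax(x):
    e = np.exp(x - np.max(x))                          # shift for numerical stability
    return e / e.sum()

def blended_action(p, state, ref_params, ref_policies, W_q, W_k,
                   residual_fn, residual_scale=0.1):
    """Attention-weighted blend of reference actions plus a scaled residual term."""
    query = W_q @ fourier_features(p)
    keys = np.stack([W_k @ fourier_features(rp) for rp in ref_params])
    scores = keys @ query / np.sqrt(query.shape[0])    # scaled dot-product scores
    weights = softmax(scores)                          # one weight per reference policy
    ref_actions = np.stack([pi(state) for pi in ref_policies])
    base = weights @ ref_actions                       # convex combination of references
    return base + residual_scale * residual_fn(state, p), weights

# Toy usage: three hypothetical proportional controllers tuned at p = 0.1, 0.5, 0.9.
rng = np.random.default_rng(0)
ref_params = [0.1, 0.5, 0.9]
ref_policies = [lambda s, g=g: -g * s for g in (1.0, 2.0, 4.0)]
emb_dim = fourier_features(0.0).shape[0]
W_q = rng.normal(size=(16, emb_dim))                   # untrained, for illustration only
W_k = rng.normal(size=(16, emb_dim))
zero_residual = lambda s, p: np.zeros_like(s)          # stand-in for a learned correction

action, weights = blended_action(0.3, np.array([1.0]), ref_params, ref_policies,
                                 W_q, W_k, zero_residual)
print("attention weights:", weights)
print("blended action:", action)
```

Because the weights are non-negative and sum to one, they can be read directly as how much each reference policy contributes at a given parameter value, which is one plausible sense in which such a meta-controller is interpretable. In a trained version, the projections and the residual term would be learned rather than fixed as here.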