WCCM ECCOMAS 2026

Deep Reinforcement Learning for Energy-Efficient Control in Smart Buildings

Sapargali, Almaz (DigitAlem LLP)
Aibagarov, Serik (DigitAlem LLP)
Mukhanbet, Aksultan (DigitAlem LLP)

In session: MS362A - Scientific Machine Learning to Enable Real-time Inference for Digital Twins I

Improving the energy efficiency of smart buildings while maintaining acceptable indoor thermal comfort is a challenging control problem, mainly because of nonlinear system behavior, delayed thermal responses, and conflicting comfort and energy objectives. HVAC systems further complicate the task through their thermal inertia and actuator limitations, so simple rule-based controllers are often insufficient. Recent studies have explored deep reinforcement learning for HVAC optimization; however, most existing approaches assume idealized action spaces and neglect actuator rate limits and physical response constraints. This work focuses on improving the realism of the action space used in multi-objective deep reinforcement learning (MORL) for HVAC control in smart buildings. The control task is defined as a balance between thermal comfort and energy consumption, expressed through a weighted multi-objective reward. Instead of sending control actions directly to the environment, the method passes them through an additional action-processing stage that applies basic safety constraints, normalization, limits on how quickly actions can change, and simple smoothing. These elements are included to better reflect actuator behavior and thermal inertia and to avoid unrealistic control actions during training. The proposed approach is evaluated in a physics-based EnergyPlus simulation of a multi-zone building and compared against random and rule-based baseline controllers. The MORL-based policy achieves a substantially improved comfort–energy trade-off: average HVAC power consumption is reduced from approximately 480–500 W to 410–420 W (12–15% reduction), while the mean comfort penalty decreases from about 2.3 to 1.8. Moreover, the 95th-percentile comfort penalty is reduced from roughly 1.21 to 0.42. Throughout evaluation, indoor temperatures remain within a stable range of 15–20.5 °C and exhibit smoother dynamics than under the baseline control strategies.
Overall, the results provide promising initial validation of actuator-aware MORL and establish a basis for future extensions toward more advanced digital twin control frameworks.
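
As an illustration only, the action-processing stage and weighted reward described in the abstract might be sketched as follows. The abstract gives no implementation details, so every name and value here is an assumption: the class `ActionProcessor`, the function `weighted_reward`, the setpoint bounds, the rate limit, the smoothing factor, the reward weights, and the power scale are all hypothetical, and the ordering of the four steps is one plausible choice rather than the authors' method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionProcessor:
    """Hypothetical post-processing of a raw policy action for one setpoint."""
    low: float = 18.0        # assumed lower setpoint bound (deg C)
    high: float = 26.0       # assumed upper setpoint bound (deg C)
    max_delta: float = 1.0   # assumed max setpoint change per control step
    alpha: float = 0.5       # assumed smoothing weight on the newest action
    prev: Optional[float] = None  # last action actually applied

    def __call__(self, raw: float) -> float:
        # 1. Safety constraint: clip the raw policy output to [-1, 1].
        raw = max(-1.0, min(1.0, raw))
        # 2. Normalization: map [-1, 1] linearly onto [low, high].
        target = self.low + 0.5 * (raw + 1.0) * (self.high - self.low)
        if self.prev is None:
            # First step: nothing to rate-limit or smooth against.
            self.prev = target
            return target
        # 3. Rate limit: bound the change relative to the previous action.
        delta = max(-self.max_delta, min(self.max_delta, target - self.prev))
        limited = self.prev + delta
        # 4. Smoothing: exponential moving average with the previous action.
        smoothed = self.alpha * limited + (1.0 - self.alpha) * self.prev
        self.prev = smoothed
        return smoothed


def weighted_reward(comfort_penalty: float, power_w: float,
                    w_comfort: float = 0.5, w_energy: float = 0.5,
                    power_scale: float = 500.0) -> float:
    """Negative weighted sum of comfort penalty and scaled HVAC power."""
    return -(w_comfort * comfort_penalty + w_energy * power_w / power_scale)


if __name__ == "__main__":
    proc = ActionProcessor()
    for raw_action in (-1.0, 5.0, 1.0, 1.0):
        print(round(proc(raw_action), 3))
    print(weighted_reward(comfort_penalty=2.0, power_w=250.0))
```

In this sketch an out-of-range raw action is first clipped, then moved toward its target by at most `max_delta` per step and blended with the previous action, so the applied setpoint cannot jump even when the policy output does.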