Investigation and Optimization of Multi-Objective Reinforcement Learning Algorithms for Resource Management in Smart Buildings
The building sector accounts for approximately 40% of global primary energy consumption, making the optimization of HVAC systems a critical task [1]. Traditional control methods often fail to balance conflicting objectives, such as minimizing energy consumption while maximizing thermal comfort. This work investigates Multi-Objective Reinforcement Learning (MORL) algorithms that address this challenge by learning a set of Pareto-optimal policies, which allows the controller to adapt to changing user preferences without retraining the model. We used the BOPTEST framework (BESTEST case 900) to simulate a residential building with a hydronic heat pump [2]. The control problem was formulated as a Markov Decision Process (MDP) with a vector-valued reward function. Three actor-critic architectures were implemented and compared: Scalar-UVFA, a Conditioned Network with MSE loss (CN-MSE), and a Conditioned Network with Huber loss (CN-Huber). Experimental results over 500 episodes show that CN-MSE achieved the best performance, reaching an Inverted Generational Distance (IGD; lower is better) of 0.005. It significantly outperformed CN-Huber, which suffered from "cold start" issues and slower convergence, attributed to the behavior of the Huber loss function in this specific domain. The CN-MSE agent successfully approximated the Pareto front and revealed non-linear trade-offs: at high comfort-preference levels, slight improvements in comfort require disproportionately large energy expenditures. We conclude that CN-MSE is a robust solution for smart building resource management, enabling real-time switching between economy and comfort modes.
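As a rough illustration of the two quantities the abstract relies on, the sketch below computes IGD in its standard form (the mean Euclidean distance from each reference Pareto point to the nearest point of the approximated front) and collapses a vector reward into a scalar under a preference weight vector. The function names, the two-objective (energy, comfort) ordering, the linear form of the scalarization, and the sample points are all assumptions made for this example; the abstract does not specify how the authors normalized objectives or combined rewards.

```python
import math


def igd(reference_front, approx_front):
    """Inverted Generational Distance (lower is better).

    Mean Euclidean distance from each point of the reference Pareto
    front to its nearest neighbour in the approximated front.
    """
    if not reference_front or not approx_front:
        raise ValueError("both fronts must be non-empty")
    total = 0.0
    for ref in reference_front:
        total += min(math.dist(ref, cand) for cand in approx_front)
    return total / len(reference_front)


def scalarize(reward_vec, weights):
    """Linear scalarization of a vector reward (an assumed form,
    not necessarily the one used in the paper)."""
    if len(reward_vec) != len(weights):
        raise ValueError("reward and weight vectors must have equal length")
    return sum(w * r for w, r in zip(weights, reward_vec))


# Hypothetical normalized (energy, comfort) points, for illustration only.
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
approximation = [(0.0, 1.0), (0.5, 0.6), (1.0, 0.0)]

igd_value = igd(reference, approximation)

# Preference of 0.25 on energy and 0.75 on comfort (hypothetical values).
scalar_reward = scalarize((-2.0, -1.0), (0.25, 0.75))
```

Because IGD averages over the reference front, it penalizes both poor convergence and gaps in coverage, which is why a small value indicates that the learned policies span the trade-off curve rather than clustering at one end.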
