Automatic Job-specific Energy Optimization within the ClusterCockpit Monitoring Framework

  • Eitzinger, Jan (NHR@FAU)
  • Panzlaff, Michael (NHR@FAU)
  • Kluge, Christoph (NHR@FAU)
  • Ujeniya, Aditya (NHR@FAU)
  • Gruber, Thomas (NHR@FAU)
  • Wellein, Gerhard (NHR@FAU)


Energy consumption is a key concern in the design and operation of high-performance computing (HPC) systems. Improving energy efficiency requires coordinated consideration of both application behavior and hardware configuration. This contribution presents an automated, job-specific approach for optimizing the energy efficiency of HPC systems through dynamic adaptation of hardware parameters such as power caps and frequency settings. The proposed solution continuously monitors application performance and node-level energy consumption and applies direct optimization at runtime. The default optimization policy minimizes the energy–delay product to balance energy-to-solution and throughput. The solution consists of three components: a node metric collector, a node setting agent, and a centralized energy manager. It was implemented as part of the BMBF EE-HPC project [1] and is inspired by the PowerSched framework originally introduced in [2]. It is integrated into the job-specific monitoring framework ClusterCockpit [3]. We evaluate the approach using multiple benchmarks on multicore and GPU-accelerated systems, demonstrating its potential for significant energy savings with minimal performance impact.
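As a rough illustration of the default policy, the energy–delay product (EDP) of a run is its energy-to-solution multiplied by its time-to-solution, and the preferred setting is the one with the lowest EDP. The sketch below assumes a set of measurements is already available for a few candidate power caps and simply picks the minimum; the actual system optimizes at runtime rather than by an offline sweep. The names `Sample`, `energy_delay_product`, and `best_power_cap` are hypothetical, and the numbers are made up for illustration, not results from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical measurement of a fixed amount of work under a power cap."""
    power_cap_watts: float   # node power cap that was applied (assumed knob)
    energy_joules: float     # energy consumed for the fixed amount of work
    runtime_seconds: float   # time needed for that same amount of work


def energy_delay_product(sample: Sample) -> float:
    """EDP = energy-to-solution multiplied by time-to-solution."""
    return sample.energy_joules * sample.runtime_seconds


def best_power_cap(samples: list[Sample]) -> float:
    """Return the power cap whose sample has the lowest EDP."""
    if not samples:
        raise ValueError("at least one sample is required")
    return min(samples, key=energy_delay_product).power_cap_watts


# Made-up illustrative numbers, not measurements from the evaluation.
samples = [
    Sample(power_cap_watts=300.0, energy_joules=30000.0, runtime_seconds=100.0),
    Sample(power_cap_watts=250.0, energy_joules=26000.0, runtime_seconds=104.0),
    Sample(power_cap_watts=200.0, energy_joules=24000.0, runtime_seconds=120.0),
]

if __name__ == "__main__":
    print(best_power_cap(samples))
```

In this toy data set the lowest cap uses the least energy but slows the job enough that its EDP rises again, which is the trade-off between energy-to-solution and throughput that the EDP criterion is meant to capture.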