Ensemble-Based Testing for Chaotic Weather and Climate Models and Investigations into Fused Multiply-Add Sensitivities

  • Price-Broncucia, Teo (NSF National Center for Atmospheric Research)
  • Baker, Allison (NSF National Center for Atmospheric Research)
  • Duda, Michael (NSF National Center for Atmospheric Research)


Large-scale weather and climate model simulations are critical tools in the effort to understand our changing climate. As with any software, ensuring the correctness and code quality of these models is necessary for maintaining confidence in simulation results. However, evaluating correctness for these codes is non-trivial: most climate and weather model codes are very large, chaotic, and constantly changing, both to adapt to new computing technologies and to incorporate new scientific capabilities and developments. The straightforward approach of requiring bit-for-bit (BFB) identical results is not feasible for chaotic weather and climate models, given the variety of hardware and software environments in which they run. Relying on expert evaluation instead is expensive, time-consuming, and subjective. Consequently, there have been a number of alternative efforts over the years to evaluate correctness in climate and weather model codes, beginning with Rosinski et al., 1997, and, more recently (often motivated by increasingly heterogeneous computing platforms), Wan et al., 2017 and Mahajan et al., 2017. Our focus in this work is the Ensemble Consistency Test (ECT) (Baker et al., 2015), which employs an asymmetric ensemble-based hypothesis test to detect changes in expensive chaotic models. The ECT and its variants have been used at the National Center for Atmospheric Research for over 10 years to help ensure correctness during model development. Over the course of the method's development and application (Price-Broncucia et al., 2024), one test scenario has repeatedly stood out: why does the use of the Fused Multiply-Add (FMA) operation cause model configurations to be flagged as failures, while changes to compiler choice, optimization level, processor type and number, etc. pass as expected?
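The asymmetric structure of an ensemble consistency test can be sketched as follows: a large ensemble of runs from an accepted configuration defines the expected distribution of output statistics, and only a handful of runs from the new configuration are checked against it. This is a simplified, hypothetical sketch, not the ECT itself: the published ECT works with principal components of ensemble summary statistics, whereas this version tests each variable separately, and the thresholds `z_crit`, `max_outliers`, and `runs_to_fail` are illustrative assumptions.

```python
import random
import statistics


def ensemble_consistency_test(ensemble, test_runs, z_crit=3.0,
                              max_outliers=2, runs_to_fail=2):
    """Simplified per-variable ensemble consistency check (hypothetical sketch).

    ensemble:  many runs of the accepted configuration; each run is a list
               of per-variable summary values (e.g. global means).
    test_runs: a few runs of the new configuration, in the same layout.
    Returns True if the new configuration looks statistically consistent.
    """
    n_vars = len(ensemble[0])
    # The ensemble's per-variable mean and spread define the accepted distribution.
    means = [statistics.fmean([run[v] for run in ensemble]) for v in range(n_vars)]
    stdevs = [statistics.stdev([run[v] for run in ensemble]) for v in range(n_vars)]

    failing_runs = 0
    for run in test_runs:
        # Count variables whose standardized value lies outside +/- z_crit.
        outliers = sum(
            1 for v in range(n_vars)
            if abs(run[v] - means[v]) / stdevs[v] > z_crit
        )
        if outliers > max_outliers:
            failing_runs += 1
    # The configuration fails only if enough of the test runs are flagged.
    return failing_runs < runs_to_fail


# Synthetic data: 100 ensemble members, 20 output variables, 3 test runs each.
rng = random.Random(0)
ensemble = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(100)]
same = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(3)]
shifted = [[rng.gauss(10.0, 1.0) for _ in range(20)] for _ in range(3)]
passes = ensemble_consistency_test(ensemble, same)
fails = ensemble_consistency_test(ensemble, shifted)
```

Test runs drawn from the ensemble's own distribution should pass, while runs whose statistics are shifted far outside the ensemble spread should fail. Requiring several test runs to be flagged keeps the false-positive rate low without needing a large ensemble for the new configuration.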
This work explores the impacts of FMA on general circulation model (GCM) simulation output, demonstrating why these codes are uniquely sensitive and why correctness-testing methods are so important. In addition, it provides directions for future work that would enable model developers and users to apply numerical optimization techniques with confidence.
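A fused multiply-add computes a*b + c with a single rounding, whereas a separate multiply followed by an add rounds twice, so the two can return different results. The hedged sketch below, which is an illustration and not the authors' analysis, shows one such case and then feeds the resulting 2**-54 discrepancy into the logistic map, a toy chaotic system assumed here as a stand-in for a GCM. It shows only how a tiny rounding change can grow in a chaotic system; it does not explain why FMA, unlike other changes that also perturb rounding, is flagged by the ECT.

```python
from fractions import Fraction


def multiply_then_add(a, b, c):
    # Two roundings: a*b is rounded to a double, then the sum is rounded again.
    return a * b + c


def fused_multiply_add(a, b, c):
    # One rounding: form a*b + c exactly as a rational number, then round once.
    # This emulates FMA for finite, non-overflowing inputs
    # (Python 3.13+ also provides math.fma).
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def logistic_trajectory(x0, steps, r=4.0):
    # Toy chaotic map standing in for a chaotic model; r = 4 is fully chaotic.
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


separate = multiply_then_add(0.1, 10.0, -1.0)   # 0.0
fused = fused_multiply_add(0.1, 10.0, -1.0)     # 2**-54, about 5.55e-17

# Start two trajectories that differ only by the FMA discrepancy.
traj_a = logistic_trajectory(0.3, 100)
traj_b = logistic_trajectory(0.3 + (fused - separate), 100)
max_gap = max(abs(p - q) for p, q in zip(traj_a, traj_b))
```

The product 0.1 * 10.0 rounds to exactly 1.0, so the two-rounding path loses the residual that the single-rounding path keeps. Over 100 iterations of the map, `max_gap` grows from about 5.6e-17 by many orders of magnitude, which is why BFB comparison is not a practical correctness criterion for chaotic codes.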