Evaluating the Robustness of Generative Models in Emulating Non-stationary Dynamical Systems
As climate change shifts the Earth's thermodynamic equilibrium, atmospheric and oceanic variables become inherently non-stationary. While machine learning (ML) emulators excel at reproducing historical patterns, their ability to generalize to "unseen" future climates, characterized by higher energy states and shifting distributions, remains a critical challenge for trustworthy prediction [1]. This work investigates the out-of-distribution (OOD) generalization capabilities of generative models using two-dimensional Rayleigh-Bénard convection as a surrogate for complex atmospheric dynamics. Though simplified, this setup provides a proxy problem for evaluating non-stationary dynamics in a controlled environment. To simulate a changing climate, we implement a quasi-adiabatic ramping protocol that moves the system from moderate to high turbulence through step-wise increases of the Rayleigh number (Ra). This creates a rigorous OOD benchmark in which the test set contains the most energetic states. Notably, the scaling of global heat transport, measured by the Nusselt number (Nu), shifts from Nu ∼ Ra^0.26 in the training set to Nu ∼ Ra^0.32 in the test set, representing a fundamental regime shift. We evaluate a generative emulator based on the Flow Matching framework [2]. Physical consistency is assessed via derived quantities, including the global Nu and the spectral density. Our results demonstrate that, out of the box, the model fails to capture the evolving physics and heat-transport statistics in the highly turbulent regime. This benchmark provides a framework for testing ML extrapolation and contributes to the development of reliable ML climate emulators capable of predicting beyond historical data, where distributions are inherently changing.
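As a rough illustration of the two quantitative ingredients mentioned above, the following minimal Python sketch builds a step-wise Ra schedule and estimates a power-law exponent from (Ra, Nu) pairs by a log-log least-squares fit. It is not the authors' implementation. The function names, the Ra range, the number of steps, the hold time, the geometric spacing of the steps, the train/test split index, and the prefactors are all assumptions chosen for illustration, and the Nu values are synthetic, generated directly from the two exponents quoted in the abstract rather than taken from any simulation.

```python
import numpy as np


def stepwise_ra_schedule(ra_start, ra_end, n_steps, hold_time):
    """Return (start_times, ra_levels) for a step-wise Ra ramp.

    Assumption: levels are geometrically spaced and each level is held
    for the same duration; the abstract does not specify either choice.
    """
    ra_levels = np.geomspace(ra_start, ra_end, n_steps)
    start_times = np.arange(n_steps) * hold_time
    return start_times, ra_levels


def fit_scaling_exponent(ra, nu):
    """Fit Nu = c * Ra**beta by least squares in log-log space."""
    beta, log_c = np.polyfit(np.log(ra), np.log(nu), 1)
    return beta, np.exp(log_c)


# Hypothetical ramp: 13 plateaus between Ra = 1e6 and Ra = 1e9.
times, ra = stepwise_ra_schedule(1e6, 1e9, n_steps=13, hold_time=100.0)

# Hypothetical split: lower plateaus for training, highest for testing.
ra_train, ra_test = ra[:8], ra[8:]

# Synthetic Nu values built from the quoted exponents (prefactors invented).
nu_train = 0.2 * ra_train ** 0.26
nu_test = 0.05 * ra_test ** 0.32

beta_train, _ = fit_scaling_exponent(ra_train, nu_train)
beta_test, _ = fit_scaling_exponent(ra_test, nu_test)
print(f"train exponent: {beta_train:.2f}, test exponent: {beta_test:.2f}")
```

In practice the same fit could be applied to the time-averaged Nu on each plateau, for both the reference simulation and the emulator output, to check whether the emulator reproduces the change in exponent.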
