Physics-Consistent Generative Modeling via Function-Space Diffusion Operators
In the era of generative modeling, diffusion models have demonstrated remarkable progress across diverse domains. These models iteratively corrupt data with Gaussian white noise and learn a reverse process that estimates the score function to generate novel samples. However, because they were developed primarily in Euclidean space, classical score-based diffusion models struggle to approximate the true data distribution accurately. Moreover, when used as surrogates for scientific systems, the samples they generate are often inconsistent with the underlying physics. To address these limitations, our work operates in function space by combining diffusion models with neural operators. Neural operators learn the solution operator governing partial differential equations (PDEs), providing a principled basis for learning data distributions while reducing approximation error in score estimation. In this setting, the forward diffusion process perturbs the data with Gaussian random field noise instead of white noise. We consider data from both classical benchmark PDEs and complex simulation-based systems, including the Cahn-Hilliard equation and related models, evaluated using spectral methods and state-of-the-art architectures. This provides a systematic testbed for assessing generative performance across increasing physical and geometric complexity. In this context, recent work has shown that diffusion models implicitly learn the intrinsic dimensionality (ID) of data manifolds, which helps mitigate the curse of dimensionality in high-dimensional scientific systems whose effective dynamics evolve on low-dimensional structures. However, geometric and approximation errors still limit estimation accuracy. By leveraging neural operators, our approach aims to reduce these bottlenecks, enabling improved score estimation, physically consistent sample generation, and reduced computational cost, since data requirements scale with the ID.
We further demonstrate how this framework supports a better understanding of system topology, enhances generation strategies, and integrates physics-based objectives.
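
As a rough illustration of the function-space forward process described above, the following sketch draws spatially correlated Gaussian random field (GRF) noise by filtering white noise in Fourier space and uses it in one noising step. The abstract does not specify the covariance kernel, the noise schedule, or the discretization, so the squared-exponential spectral filter, the periodic grid on the unit square, the variance-preserving update, and the names `sample_grf`, `forward_noise`, `length_scale`, and `alpha_bar` are all assumptions made for illustration only.

```python
import numpy as np


def sample_grf(n, length_scale=0.1, rng=None):
    """Draw one periodic GRF sample on an n x n grid over the unit square.

    Assumption: squared-exponential covariance, imposed as a Gaussian
    filter on the Fourier coefficients of white noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Wavenumbers in cycles per unit length (integers for a unit domain).
    k = np.fft.fftfreq(n, d=1.0 / n)
    k2 = k[:, None] ** 2 + k[None, :] ** 2
    # Square root of the assumed power spectrum.
    filt = np.exp(-0.5 * (2.0 * np.pi * length_scale) ** 2 * k2)
    # Rescale so the filtered field has unit pointwise variance.
    filt /= np.sqrt(np.mean(filt ** 2))
    white = rng.standard_normal((n, n))
    # The filter is real and even in k, so the result is real up to rounding.
    return np.fft.ifft2(np.fft.fft2(white) * filt).real


def forward_noise(u0, alpha_bar, length_scale=0.1, rng=None):
    """One variance-preserving noising step with GRF noise (assumed schedule).

    Returns the noised field and the noise sample that was used.
    """
    xi = sample_grf(u0.shape[0], length_scale, rng)
    u_t = np.sqrt(alpha_bar) * u0 + np.sqrt(1.0 - alpha_bar) * xi
    return u_t, xi


# Usage: noise a smooth test field halfway (alpha_bar = 0.5).
grid = np.linspace(0.0, 1.0, 64, endpoint=False)
u0 = np.sin(2.0 * np.pi * grid)[:, None] * np.cos(2.0 * np.pi * grid)[None, :]
u_t, xi = forward_noise(u0, alpha_bar=0.5, rng=np.random.default_rng(0))
```

Because the noise is smooth at the scale set by `length_scale`, refining the grid refines the same underlying random function rather than adding independent per-pixel noise, which is the usual motivation for replacing white noise in a function-space setting.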
