WCCM ECCOMAS 2026

Advancing Large-Scale CFD Simulations via online Training & Inference using SOD2D solver and TorchFort

Spiga, Filippo (NVIDIA Corporation)
Gasparino, Lucas (Barcelona Supercomputing Centre (BSC))
Lehmkuhl, Oriol (Barcelona Supercomputing Centre (BSC))

In session: MS084C - Integrating HPC and AI for Real-World Applications III


Modern high-resolution computational fluid dynamics (CFD) simulations produce massive amounts of data, often hundreds of gigabytes per snapshot, creating severe I/O bottlenecks when these data must be stored and managed for offline AI training. As a result, valuable intermediate data that capture the evolution of physical phenomena are often discarded. Although advanced AI models such as CNNs, U-Nets, GNNs, Fourier Neural Operators (FNOs), and Physics-Informed Neural Networks (PINNs) are effective for complex PDE problems, they struggle to scale efficiently to large real-world engineering domains. To address this “data deluge” and improve time-to-solution for large-scale simulations, we integrate TorchFort into the SOD2D spectral element solver. TorchFort, an open-source library developed by NVIDIA, provides seamless coupling between AI models and Fortran-based solvers within a unified MPI framework. Unlike alternative methods such as native Python wrappers or external orchestration tools (e.g., SmartSim), TorchFort removes the need for standalone driver programs and handles all learning tasks directly within the simulation environment. This work demonstrates a methodology for online training and inference, where the AI model operates concurrently with the running simulation. Data are accessed directly from memory, eliminating file I/O overhead and enabling real-time responses, such as early termination when a simulation diverges. The framework is generalizable to other CFD solvers. As a proof of concept, a U-Net architecture for feature detection is used to illustrate the integration process and component coupling. The method uses TorchScript-exported PyTorch models and YAML-based configuration files loaded dynamically by SOD2D, enabling flexible experimentation without modifying the core solver. This integrated approach streamlines the creation of high-fidelity AI surrogate models by leveraging live simulation data instead of stored datasets. It removes storage constraints, accelerates model development, and supports adaptive AI-assisted simulations for complex, long-term CFD studies, transforming how large-scale engineering problems are modeled and optimized.
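The online training pattern described in the abstract can be illustrated with a deliberately small sketch. It is not the SOD2D or TorchFort implementation: the abstract's coupling lives inside a Fortran solver, whereas this is a standard-library Python toy. The names `solver_step`, `OnlineLinearModel`, and `run_coupled`, the 1D periodic diffusion field, the linear surrogate, and the blow-up threshold are all assumptions made for illustration. What the sketch does show is the control flow: each time step hands its in-memory state to the model for one training update, nothing is written to disk, and the loop stops early once the field diverges.

```python
import math


def solver_step(state, dt=0.01, growth=0.0):
    """Hypothetical stand-in for one solver time step on a periodic 1D field.

    Applies explicit diffusion; a positive `growth` adds an artificial
    instability so that divergence can be provoked on purpose.
    """
    n = len(state)
    return [
        state[i]
        + dt * (state[i - 1] - 2.0 * state[i] + state[(i + 1) % n])
        + growth * state[i]
        for i in range(n)
    ]


class OnlineLinearModel:
    """Toy surrogate: predicts a point's next value from (centre, neighbour mean)."""

    def __init__(self, lr=0.1):
        self.w = [0.0, 0.0]
        self.lr = lr

    def predict(self, x):
        return self.w[0] * x[0] + self.w[1] * x[1]

    def train_step(self, inputs, targets):
        """One gradient-descent update; returns the mean squared error before it."""
        n = len(inputs)
        grad = [0.0, 0.0]
        loss = 0.0
        for x, y in zip(inputs, targets):
            err = self.predict(x) - y
            loss += err * err
            grad[0] += 2.0 * err * x[0]
            grad[1] += 2.0 * err * x[1]
        self.w[0] -= self.lr * grad[0] / n
        self.w[1] -= self.lr * grad[1] / n
        return loss / n


def run_coupled(n_steps=200, growth=0.0, blowup_limit=1.0e3, n_points=32):
    """Advance the toy solver and train the model on each step's in-memory data."""
    state = [math.sin(2.0 * math.pi * i / n_points) for i in range(n_points)]
    model = OnlineLinearModel()
    losses = []
    for step in range(1, n_steps + 1):
        prev = state
        state = solver_step(prev, growth=growth)
        # Early termination: stop as soon as the field is non-finite or too large.
        if any(not math.isfinite(v) or abs(v) > blowup_limit for v in state):
            return {"steps_run": step, "diverged": True, "losses": losses}
        # Training pairs are built directly from solver arrays (no file I/O).
        inputs = [
            (prev[i], 0.5 * (prev[i - 1] + prev[(i + 1) % n_points]))
            for i in range(n_points)
        ]
        losses.append(model.train_step(inputs, state))
    return {"steps_run": n_steps, "diverged": False, "losses": losses}


if __name__ == "__main__":
    stable = run_coupled(growth=0.0)
    unstable = run_coupled(growth=0.5)
    print("stable run:", stable["steps_run"], "steps, diverged =", stable["diverged"])
    print("unstable run:", unstable["steps_run"], "steps, diverged =", unstable["diverged"])
```

In the setup the abstract describes, the model would instead be a TorchScript-exported PyTorch network selected through a YAML configuration file, and the training call would be issued from within the solver's time loop; the toy above only mirrors that structure.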