HORSES3D-GPU: A high order discontinuous Galerkin accelerated solver for multi-GPU systems
HORSES3D is a high-order discontinuous Galerkin finite element solver tailored for large-scale computational fluid dynamics (CFD) simulations, capable of handling compressible, incompressible, and multiphase flows. In this work, we present the porting of the compressible version of HORSES3D [1] to multi-GPU architectures using OpenACC, enabling scalable and efficient execution on modern high-performance computing systems, specifically those equipped with NVIDIA H100 GPUs, such as MareNostrum5. We demonstrate strong scaling with near-ideal parallel efficiency on up to 256 GPUs for simulations involving more than 1 billion degrees of freedom (DOF). We also investigate the impact of polynomial order on performance and accuracy, finding that simulations with polynomial order P = 7 (8th-order accuracy) achieve the same computational efficiency as those with P = 4 (5th-order accuracy), allowing higher-fidelity solutions without additional run-time cost. To validate the capabilities of the GPU-accelerated solver, we present large-scale simulations of complex engineering applications, including the Taylor-Green vortex problem and 3D simulations of an aircraft and of wind turbines [2]. These applications demonstrate the robustness and versatility of HORSES3D in simulating real-world scenarios across a wide range of flow regimes. Our results underline the effectiveness of accelerating high-order solvers and provide actionable insights into achieving optimal performance and accuracy on emerging exascale GPU architectures for advanced CFD applications.
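To make the quantities in the abstract concrete, the following minimal Python sketch relates polynomial order to problem size and defines strong-scaling efficiency. It is an illustration under stated assumptions, not code from HORSES3D: it assumes hexahedral elements with a tensor-product nodal basis, so that each element carries (P + 1)^3 nodes, and it counts one DOF per node (whether the reported DOF figure also counts each conserved variable is not stated in the abstract). The function names are hypothetical.

```python
def nodes_per_element(p: int) -> int:
    """Nodes in one hexahedral element of polynomial order p.

    Assumption: tensor-product nodal basis, (p + 1) nodes per direction.
    """
    return (p + 1) ** 3


def elements_for_dof(target_dof: int, p: int) -> int:
    """Smallest number of elements that reaches target_dof nodes at order p."""
    per_element = nodes_per_element(p)
    return -(-target_dof // per_element)  # integer ceiling division


def strong_scaling_efficiency(t_ref: float, n_ref: int, t_n: float, n: int) -> float:
    """Parallel efficiency for a fixed problem size.

    t_ref is the wall time on n_ref GPUs (the baseline) and t_n the wall
    time on n GPUs. A value of 1.0 corresponds to ideal scaling.
    """
    return (t_ref * n_ref) / (t_n * n)


if __name__ == "__main__":
    for p in (4, 7):
        print(f"P = {p}: {nodes_per_element(p)} nodes per element, "
              f"{elements_for_dof(10**9, p)} elements for 1e9 DOF")
```

Under these assumptions, a mesh at P = 7 needs roughly four times fewer elements than one at P = 4 to reach the same DOF count, since each element holds 512 nodes instead of 125. The efficiency function takes measured wall times as input; no timings are given in the abstract, so none are shown here.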
