A Story of CPUs and GPUs: Performance Portability and Scaling Results for Nektar++

  • Wüstenberg, Henrik (Imperial College London)
  • Xing, Jacques (Imperial College London)
  • Ye, Junjie (Imperial College London)
  • Renner, Diego (Imperial College London)
  • Xia, Boyang (King's College London)
  • Sherwin, Spencer (Imperial College London)
  • Moxey, David (King's College London)
  • Cantwell, Chris (Imperial College London)

Nektar++ is an open-source spectral/hp element framework [1,2] for the numerical solution of partial differential equations (PDEs), widely used for high-order simulations in computational fluid dynamics (CFD) and related applications. To better exploit modern heterogeneous high-performance computing (HPC) systems, including GPU-accelerated platforms, Nektar++ is undergoing a major redesign focused on performance portability while retaining high efficiency on CPU-only architectures. The redesigned framework adopts a unified backend architecture supporting CUDA, HIP, SYCL, and SIMD vectorisation, so that a single high-level solver implementation can target diverse hardware platforms. Performance-critical components, such as elemental and global operators, are implemented as backend-specialised kernels, while the solver logic remains device-independent. Building on the previously demonstrated performance of elemental operators, this work presents early scaling results, progressing from single-GPU execution to multi-GPU configurations. We report performance and scaling behaviour for optimised global operators, and demonstrate solver-level results for the incompressible and compressible Navier-Stokes equations using continuous and discontinuous Galerkin discretisations within the new operator-based design. These early results illustrate the feasibility of scaling high-order spectral element methods on unstructured meshes across multiple GPUs using a unified, maintainable codebase, and represent an important step toward scalable, portable, and production-ready CFD simulations on current and future heterogeneous HPC systems.
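The separation described above, with backend-specialised kernels underneath device-independent solver logic, can be illustrated with a minimal C++ sketch. It is not taken from Nektar++ and does not use its API: the names `SerialBackend`, `BlockedBackend`, `ElementalOp`, `apply_elemental`, and `energy` are hypothetical, and the "blocked" backend only mimics SIMD lanes on the CPU as a stand-in for a real CUDA, HIP, SYCL, or vectorised kernel. The elemental operator is assumed to be one dense matrix shared by all elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backend tags (illustration only, not Nektar++ types).
// A real framework would add tags for CUDA, HIP, SYCL, and so on.
struct SerialBackend {};
struct BlockedBackend { static constexpr std::size_t width = 4; };

// Elemental operator: y_e = A * x_e for every element e, where A is a
// dense n x n matrix (row-major) assumed to be shared by all elements.
struct ElementalOp {
    std::size_t n;          // modes per element
    std::vector<double> A;  // n * n entries, row-major
};

// Backend-specialised kernel 1: plain element-by-element loop.
inline void apply_elemental(SerialBackend, const ElementalOp& op,
                            const std::vector<double>& x,
                            std::vector<double>& y) {
    assert(x.size() % op.n == 0 && y.size() == x.size());
    const std::size_t nElmt = x.size() / op.n;
    for (std::size_t e = 0; e < nElmt; ++e) {
        for (std::size_t i = 0; i < op.n; ++i) {
            double s = 0.0;
            for (std::size_t j = 0; j < op.n; ++j) {
                s += op.A[i * op.n + j] * x[e * op.n + j];
            }
            y[e * op.n + i] = s;
        }
    }
}

// Backend-specialised kernel 2: handles `width` elements at a time with
// the innermost loop running over elements, mimicking SIMD lanes.
inline void apply_elemental(BlockedBackend, const ElementalOp& op,
                            const std::vector<double>& x,
                            std::vector<double>& y) {
    assert(x.size() % op.n == 0 && y.size() == x.size());
    constexpr std::size_t W = BlockedBackend::width;
    const std::size_t nElmt = x.size() / op.n;
    for (std::size_t e0 = 0; e0 < nElmt; e0 += W) {
        const std::size_t w = std::min(W, nElmt - e0);  // remainder block
        for (std::size_t i = 0; i < op.n; ++i) {
            double s[W] = {};
            for (std::size_t j = 0; j < op.n; ++j) {
                const double a = op.A[i * op.n + j];
                for (std::size_t l = 0; l < w; ++l) {
                    s[l] += a * x[(e0 + l) * op.n + j];
                }
            }
            for (std::size_t l = 0; l < w; ++l) {
                y[(e0 + l) * op.n + i] = s[l];
            }
        }
    }
}

// Device-independent "solver logic": written once, it only selects a
// backend through the tag and never touches kernel details.
// Returns the sum over elements of x_e^T A x_e.
template <class Backend>
double energy(Backend backend, const ElementalOp& op,
              const std::vector<double>& x) {
    std::vector<double> y(x.size());
    apply_elemental(backend, op, x, y);
    double sum = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        sum += x[k] * y[k];
    }
    return sum;
}
```

Calling `energy(SerialBackend{}, op, x)` and `energy(BlockedBackend{}, op, x)` on the same data should give the same value, since only the kernel underneath changes. In a GPU setting the tag would additionally determine where the data lives, which this sketch leaves out.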