Containerizing CODA: Deployment and Performance of Large Industrial HPC Workflows
Containerization has become a key enabler in high-performance computing (HPC), addressing challenges in software reproducibility, portability, and dependency management, especially for complex scientific and industrial workflows. By encapsulating applications, libraries, runtime environments, and complete workflows within lightweight, isolated containers, users can ensure consistent execution across diverse computing infrastructures, from workstations and local clusters to exascale systems. In this talk we present our experiences with building, deploying, and using container images of the computational fluid dynamics (CFD) software CODA on large-scale production clusters. CODA is being developed as part of a collaboration between the French Aerospace Lab ONERA, the German Aerospace Center (DLR), Airbus, and their European research partners. It offers a simulation platform for generating detailed aerodynamic data for the design and optimization of aircraft, including both fixed-wing airplanes and rotary-wing helicopters. To support real-world deployment, CODA is packaged in pre-built, containerized environments using Singularity or Apptainer. These containers include the CODA CFD software, the FlowSimulator framework, the high-performance sparse linear solver Spliss, and all further necessary dependencies, all of which are optimized for heterogeneous high-performance hardware including NVIDIA and AMD GPUs. The talk highlights challenges and solutions in building and deploying container images of the complete workflow for CODA simulations. A detailed evaluation using the NASA Common Research Model shows the performance of CODA on DLR’s two HPC production systems, which are based on the AMD Naples and Rome architectures. The evaluation includes a comparison of compute performance and an assessment of strong and weak scaling behaviour on the two systems.
In addition, the talk discusses the specific challenges that arise from increasingly heterogeneous systems, including different CPU and GPU architectures.
