WCCM ECCOMAS 2026

OpenACC-Accelerated MPM for Slope Failure Simulation with Single-Codebase MPI and OpenMP Benchmarking

Hidano, Soma (Tohoku University)
Terada, Kenjiro (Tohoku University)

In session: MS075B - Advances in Natural Hazard Simulation II


The material point method (MPM), a particle method with a FEM-like background grid, is widely used in geotechnical engineering to simulate large deformations such as slope failure. GPU-accelerated MPM has been actively studied, yet fully aligned “apples-to-apples” performance evaluations across distributed-memory MPI CPU runs, shared-memory OpenMP CPU runs, and directive-based GPU offloading within a single codebase remain limited. In this study, we develop an OpenACC-offloaded MPM while retaining MPI and OpenMP execution paths in the same Fortran code, enabling consistent benchmarking across CPU (MPI/OpenMP) and GPU (OpenACC, optionally with MPI) configurations. We report end-to-end performance together with a transparent timing breakdown over the major stages: particle-to-grid (P2G) transfer, grid updates, grid-to-particle (G2P) transfer, particle management, and I/O. To reduce contention in P2G scatter updates, we maintain node-based particle lists and perform P2G by looping over grid nodes and their associated particles, so that each node accumulates its own contributions. All experiments are conducted on the “Miyabi” supercomputer operated by the Joint Center for Advanced High Performance Computing (JCAHPC). The Miyabi-G GPU nodes use NVIDIA GH200 Grace Hopper Superchips, in which the CPU and GPU are connected via NVLink-C2C and share a cache-coherent, unified memory system. We further assess multi-node, multi-GPU execution by reusing the MPI domain decomposition with one GPU per node, and demonstrate feasibility for a real-scale three-dimensional slope-failure simulation while discussing scalability limits and dominant cost components. Looking ahead, to address long-duration infiltration-induced failure, we will extend the framework toward semi-implicit/implicit time integration and accelerate preconditioners and linear solvers for robust, scalable simulations.
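The node-based P2G idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' Fortran/OpenACC implementation: it is a serial, one-dimensional Python example that assumes linear (hat) shape functions, a uniform grid starting at x = 0, and a CSR-style layout for the per-node particle lists. All function names (build_node_lists, p2g_gather, p2g_scatter) are hypothetical and chosen only for illustration.

```python
def hat(xp, xi, dx):
    """Linear shape function of the node at xi, evaluated at particle position xp."""
    return max(0.0, 1.0 - abs(xp - xi) / dx)


def build_node_lists(x, dx, n_nodes):
    """Build CSR-style lists: for each node, the particles whose cell touches it.

    Assumes every particle lies in [0, (n_nodes - 1) * dx].
    """
    counts = [0] * n_nodes
    cells = []
    for xp in x:
        c = min(int(xp / dx), n_nodes - 2)  # left node of the particle's cell
        cells.append(c)
        counts[c] += 1
        counts[c + 1] += 1
    # Prefix sum turns per-node counts into start offsets.
    offsets = [0] * (n_nodes + 1)
    for i in range(n_nodes):
        offsets[i + 1] = offsets[i] + counts[i]
    cursor = offsets[:-1]  # slicing copies, so offsets is not modified below
    plist = [0] * offsets[-1]
    for p, c in enumerate(cells):
        for i in (c, c + 1):
            plist[cursor[i]] = p
            cursor[i] += 1
    return offsets, plist


def p2g_gather(x, m, v, dx, n_nodes):
    """P2G as a gather: loop over nodes, then over each node's particle list."""
    offsets, plist = build_node_lists(x, dx, n_nodes)
    mass = [0.0] * n_nodes
    mom = [0.0] * n_nodes
    for i in range(n_nodes):
        # Node i is written only in this iteration, so a parallel version of
        # this outer loop would not need atomic updates on mass[i] or mom[i].
        xi = i * dx
        for k in range(offsets[i], offsets[i + 1]):
            p = plist[k]
            w = hat(x[p], xi, dx)
            mass[i] += w * m[p]
            mom[i] += w * m[p] * v[p]
    return mass, mom


def p2g_scatter(x, m, v, dx, n_nodes):
    """Reference P2G as a scatter: loop over particles and add into nodes."""
    mass = [0.0] * n_nodes
    mom = [0.0] * n_nodes
    for p, xp in enumerate(x):
        c = min(int(xp / dx), n_nodes - 2)
        for i in (c, c + 1):
            # Several particles may write to the same node here; this is the
            # contention that the node-based gather avoids.
            w = hat(xp, i * dx, dx)
            mass[i] += w * m[p]
            mom[i] += w * m[p] * v[p]
    return mass, mom
```

In this sketch the gather and scatter variants produce the same nodal mass and momentum; the difference is only in which loop owns the writes. The trade-off is that the node lists must be rebuilt (or updated) as particles move between cells, which is one plausible reason particle management appears as its own stage in a timing breakdown.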