Multigrid preconditioned asymmetric iterative solver for the Incompressible SPH(2) for multi-GPU environments

  • Morikawa, Daniel Shigueo (Tongji University)
  • Osaki, Haruki (Kozo Keikaku Engineering Inc.)
  • Morito, Yusuke (Kyushu University)
  • Tsutsumi, Shigenobu (Claris Techno)
  • Asai, Mitsuteru (Kyushu University)


In this study, we present a multigrid-preconditioned iterative solver for asymmetric linear systems in the context of the Incompressible Smoothed Particle Hydrodynamics (SPH) method [1]. The work is based on SPH(2) [2], a high-order accurate formulation of the spatial derivatives that yields an asymmetric coefficient matrix for the pressure Poisson equation. To adapt the multigrid preconditioner to SPH, we propose building the restriction operator by mapping particles onto the background grid already used for neighbor search, followed by standard grid coarsening. Key contributions include adapting the multigrid preconditioner to multi-GPU cluster environments, with communication-computation overlap and communication-hiding techniques. We also adapted the dynamic slice-grid domain decomposition technique so that it respects the coarse-grid hierarchy of the multigrid preconditioner. We evaluate the method's computational efficiency through dam-break simulations on three supercomputers (JAMSTEC's Earth Simulator, Tongji University's supercomputer, and the University of Tokyo's Miyabi), obtaining weak-scaling efficiencies between 0.82 and 0.89. The multigrid preconditioner substantially improved solver performance: the average iteration count of the pressure solver fell from over 300 to a nearly constant 60 across problem sizes ranging from 5 to 320 million particles, giving an overall speedup of 2.3 times. Furthermore, we demonstrate the method's applicability to real-world engineering problems by simulating the complex 2011 tsunami inundation of the Fukushima Daiichi Nuclear Power Plant using approximately 100 million fluid particles on 32 GPUs. While the preconditioner proves highly effective for solver convergence, we conclude by discussing its limitations, namely the constraints that the coarse grid imposes on load balancing and the associated communication overhead.
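The restriction idea described above (particles mapped to the neighbor-search grid, then standard grid coarsening) can be illustrated with a minimal two-dimensional sketch in plain Python. This is not the authors' implementation: the function names, the use of simple per-cell averaging as the restriction weight, and the factor-2 coarsening are all assumptions made for illustration, and a production solver would operate on residual vectors on the GPU.

```python
from collections import defaultdict


def particles_to_cells(positions, values, cell_size):
    """Accumulate particle values on a background grid.

    Returns {(i, j): (sum_of_values, particle_count)}.
    Hypothetical helper: per-cell accumulation is an assumed restriction weight.
    """
    cells = defaultdict(lambda: [0.0, 0])
    for (x, y), v in zip(positions, values):
        key = (int(x // cell_size), int(y // cell_size))  # floor to cell index
        cells[key][0] += v
        cells[key][1] += 1
    return {k: (s, n) for k, (s, n) in cells.items()}


def coarsen(cells):
    """Merge each 2x2 block of cells into one coarse cell (assumed factor-2 coarsening)."""
    coarse = defaultdict(lambda: [0.0, 0])
    for (i, j), (s, n) in cells.items():
        key = (i // 2, j // 2)
        coarse[key][0] += s
        coarse[key][1] += n
    return {k: (s, n) for k, (s, n) in coarse.items()}


def cell_means(cells):
    """Convert (sum, count) pairs into particle-weighted cell averages."""
    return {k: s / n for k, (s, n) in cells.items()}


# Four particles carrying a scalar (e.g. a residual), grid spacing 1.0.
positions = [(0.1, 0.1), (0.3, 0.2), (1.2, 0.4), (2.5, 2.5)]
values = [1.0, 3.0, 5.0, 7.0]

fine = particles_to_cells(positions, values, cell_size=1.0)
coarse = coarsen(fine)
coarse_means = cell_means(coarse)
```

Carrying the particle count alongside the sum keeps the coarse-level average weighted by the number of particles rather than by the number of occupied cells, which matters when cells are unevenly populated near free surfaces. Calling `coarsen` repeatedly would produce the successive levels of a grid hierarchy.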