Performance-Portable Implementation of the Conforming Reproducing Kernel Method for Explicit Dynamics
The Conforming Reproducing Kernel (CRK) method combines meshfree approximation techniques with mesh-based boundary handling to improve robustness and automation in computational mechanics simulations [1]. A key challenge in deploying such methods for production use is achieving efficient execution across diverse hardware architectures, from multi-core CPUs to modern GPUs, without maintaining separate codebases.

This presentation describes the performance-portable implementation of CRK within our aperi-mech code. The implementation leverages the Kokkos programming model [2] for portable parallelism and the Sierra Toolkit (STK) for mesh and field data management. This combination enables a single codebase to target multiple execution backends, including OpenMP for CPU threading, CUDA for NVIDIA GPUs, and HIP for AMD GPUs.

We discuss several design decisions that enable efficient execution across architectures. Compressed sparse row (CSR) data structures store node-neighbor relationships and shape function information in a GPU-friendly format. Eigen matrix types with runtime striding wrap STK field data, avoiding excessive memory copies while accommodating different memory layouts across backends. The reproducing kernel shape function construction, which involves solving small linear systems at each evaluation point, maps naturally to data-parallel execution.

Preliminary benchmarks on explicit transient dynamics problems demonstrate approximately 300× speedup on NVIDIA H100 GPUs compared to a single AMD EPYC CPU core. CPU threading via OpenMP provides expected scaling on multi-core configurations. Additional performance comparisons will be presented at the conference. These results suggest that meshfree-inspired methods like CRK can achieve competitive performance on modern accelerators while maintaining the code portability necessary for deployment across heterogeneous computing environments.

REFERENCES
[1] J.J. Koester and J.-S. Chen, Conforming Window Functions for Meshfree Methods, Computer Methods in Applied Mechanics and Engineering, Vol. 347, pp. 588–621, (2019).
[2] C.R. Trott et al., Kokkos 3: Programming Model Extensions for the Exascale Era, IEEE Transactions on Parallel and Distributed Systems, Vol. 33, No. 4, pp. 805–817, (2022).
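
The CSR storage and the per-point reproducing kernel construction described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the aperi-mech implementation: the names `CsrShapeFunctions`, `Kernel`, `BuildShapeFunctions`, and `Interpolate` are hypothetical, the kernel is a plain compactly supported function rather than the conforming window functions of [1], the basis is linear so the small linear system is a 2×2 moment matrix solved in closed form, and an ordinary serial loop stands in for the Kokkos parallel dispatch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CSR container: row i holds the neighbors of evaluation point i.
struct CsrShapeFunctions {
    std::vector<std::size_t> row_offsets;   // size = number of points + 1
    std::vector<std::size_t> neighbor_ids;  // node index of each neighbor
    std::vector<double> values;             // shape function value per neighbor
};

// Illustrative compactly supported kernel (an assumption, not a CRK conforming kernel).
inline double Kernel(double s, double support) {
    const double r = std::abs(s) / support;
    return r < 1.0 ? (1.0 - r * r) * (1.0 - r * r) : 0.0;
}

// Builds 1D reproducing kernel shape functions with a linear basis H(s) = [1, s].
CsrShapeFunctions BuildShapeFunctions(const std::vector<double>& nodes,
                                      const std::vector<double>& points,
                                      double support) {
    CsrShapeFunctions csr;
    csr.row_offsets.assign(points.size() + 1, 0);

    // Pass 1: count neighbors per point, then prefix-sum the counts into offsets.
    for (std::size_t i = 0; i < points.size(); ++i)
        for (std::size_t j = 0; j < nodes.size(); ++j)
            if (std::abs(nodes[j] - points[i]) < support) ++csr.row_offsets[i + 1];
    for (std::size_t i = 0; i < points.size(); ++i)
        csr.row_offsets[i + 1] += csr.row_offsets[i];

    csr.neighbor_ids.resize(csr.row_offsets.back());
    csr.values.resize(csr.row_offsets.back());

    // Pass 2: rows are independent, so this is the loop a parallel-for would distribute.
    for (std::size_t i = 0; i < points.size(); ++i) {
        std::size_t k = csr.row_offsets[i];
        double m0 = 0.0, m1 = 0.0, m2 = 0.0;  // entries of the 2x2 moment matrix
        for (std::size_t j = 0; j < nodes.size(); ++j) {
            const double s = nodes[j] - points[i];
            if (std::abs(s) >= support) continue;
            const double w = Kernel(s, support);
            csr.neighbor_ids[k] = j;
            csr.values[k] = w;  // kernel weight, corrected below
            m0 += w; m1 += w * s; m2 += w * s * s;
            ++k;
        }
        const double det = m0 * m2 - m1 * m1;
        assert(det > 0.0 && "each point needs at least two distinct neighbors");
        // Solve M b = H(0) = [1, 0] in closed form.
        const double b0 = m2 / det;
        const double b1 = -m1 / det;
        for (k = csr.row_offsets[i]; k < csr.row_offsets[i + 1]; ++k) {
            const double s = nodes[csr.neighbor_ids[k]] - points[i];
            csr.values[k] *= b0 + b1 * s;
        }
    }
    return csr;
}

// Evaluates sum_j phi_j * u_j over the neighbors of one evaluation point.
double Interpolate(const CsrShapeFunctions& csr,
                   const std::vector<double>& nodal_values, std::size_t point) {
    double result = 0.0;
    for (std::size_t k = csr.row_offsets[point]; k < csr.row_offsets[point + 1]; ++k)
        result += csr.values[k] * nodal_values[csr.neighbor_ids[k]];
    return result;
}
```

With a linear basis, the resulting shape functions reproduce any linear field, so calling `Interpolate` with nodal values sampled from `2 + 3x` should return `2 + 3x` at each evaluation point up to round-off. The flat offset, index, and value arrays are what make the layout convenient to copy to a device as contiguous buffers.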
