Improving Neural Cellular Automata Training for Grain Growth through GPU-Parallelized Data Generation

  • Sitarz, Szymon (AGH University of Krakow)
  • Sitko, Mateusz (AGH University of Krakow)
  • Madej, Lukasz (AGH University of Krakow)


Cellular Automata (CA) are a well-established computational method for simulating microstructural evolution phenomena such as grain growth and static recrystallisation. To reduce the computing time of CA models, various CPU and GPU parallelisation schemes have been analysed and implemented in many research works [1,2]. Recently, an alternative approach has emerged in the form of Neural Cellular Automata (NCA), where transition rules are learned from data rather than explicitly programmed [3]. Effective NCA training, however, requires large datasets comprising millions of samples, acquired from experimental investigations or full-field simulations. This shifts the computational bottleneck from simulation execution to training data generation, motivating renewed interest in GPU parallelisation of CA. This work presents an analysis of GPU parallelisation strategies for CA-based training data generation and demonstrates their application to NCA model training. Two grain growth models are considered: an unconstrained grain growth model and a curvature-driven grain boundary migration model. The CUDA implementation addresses key challenges, including efficient neighbourhood access via shared memory, minimising host-to-device data transfers, and balancing computation with I/O during dataset generation. Appropriate data representations are also developed for each model so that the NCA can learn transition rules applicable to arbitrary microstructure configurations. The results demonstrate that NCA models trained on GPU-generated datasets reproduce the underlying CA transition rules with high accuracy. The performance analysis identifies the conditions under which GPU acceleration provides significant benefits over CPU implementations for large-scale data generation.
The presented framework establishes a foundation for extending the approach to more complex microstructure evolution phenomena, including static recrystallisation in the presence of continuous physical fields, such as stored-energy and temperature distributions.
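
As an illustration of the kind of data generation described above, the following is a minimal CPU reference sketch in plain Python, not the authors' CUDA implementation. It assumes a simple unconstrained grain growth rule (an empty cell adopts the most common grain ID among its Moore neighbours, with periodic boundaries and ties resolved to the smallest ID); the abstract does not specify the rule, and the function names `ca_step` and `generate_pairs` are hypothetical. Each cell's new state depends only on the previous grid, which is the property that allows a one-thread-per-cell mapping on a GPU.

```python
from collections import Counter


def ca_step(grid):
    """One synchronous step of an assumed unconstrained grain growth CA.

    grid is a list of lists of ints: 0 = empty cell, k > 0 = grain ID.
    An empty cell adopts the most common grain ID among its eight Moore
    neighbours (periodic boundaries); ties go to the smallest ID.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # write to a copy so reads see the old state
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 0:
                continue  # cells that already belong to a grain never change
            counts = Counter()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    g = grid[(r + dr) % rows][(c + dc) % cols]
                    if g != 0:
                        counts[g] += 1
            if counts:
                best = max(counts.values())
                new[r][c] = min(g for g, n in counts.items() if n == best)
    return new


def generate_pairs(grid, max_steps=100):
    """Collect (state_t, state_t_plus_1) pairs until the grid stops changing."""
    pairs = []
    for _ in range(max_steps):
        nxt = ca_step(grid)
        if nxt == grid:
            break  # fully grown microstructure: no further training samples
        pairs.append((grid, nxt))
        grid = nxt
    return pairs
```

Calling `generate_pairs` on a grid seeded with a few non-zero grain IDs returns consecutive state pairs that could serve as input/target samples for training; a production pipeline would run many such simulations in parallel and stream the results to disk.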