Transformers versus Recurrent Neural Networks for Path-Dependent Composites
Short Fiber Reinforced Composites (SFRCs) have gained significant industrial attention due to their high strength-to-weight ratio, cost-effectiveness, and ability to be injection-molded into complex 3D geometries. Injection molding produces diverse fiber orientation distributions, so predicting the effective material response requires generating and homogenizing 3D Representative Volume Elements (RVEs). However, high-fidelity micro-mechanical simulations of SFRCs are computationally expensive, particularly for nonlinear, path-dependent elasto-plastic analyses. To address this, data-driven surrogate models based on neural networks have been used, with Recurrent Neural Networks (RNNs) frequently adopted to predict the homogenized response of RVEs [1-2]. While RNNs model path-dependent sequences effectively, they exhibit limitations in retaining long-range temporal dependencies. Recently, transformer models utilizing self-attention mechanisms have been proposed as a scalable and parallelized alternative [3]. Despite their success in capturing complex sequence-to-sequence dependencies in 2D composite modeling, transformers have not yet been systematically compared with RNNs for 3D plasticity or for training under data scarcity. This work systematically compares a classical RNN and a transformer-based surrogate model for the nonlinear behavior of 3D RVEs of SFRCs. We use Bayesian optimization to tune both training and architectural hyperparameters, ensuring a fair and reproducible evaluation. To mitigate the scarcity of high-fidelity simulation data, we incorporate a previously developed rotation-based augmentation strategy that improves model robustness without requiring additional expensive simulations [1]. Results indicate that while transformer models are competitive on large datasets, RNNs demonstrate superior accuracy in small-data regimes and more reliable extrapolation under cyclic loading. Specifically, the transformer model failed to reliably capture the material behavior in extrapolation tests where the RNN remained accurate. Conversely, the transformer architecture was approximately seven times faster during inference due to its parallelized structure. These findings provide a practical basis for selecting surrogate architectures according to data availability and computational constraints.
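The abstract does not spell out the rotation-based augmentation of [1], so the following is only a minimal, dependency-free sketch of one plausible reading: if an RVE and its loading path are rotated together by an orthogonal matrix Q, the homogenized response rotates with them, so each second-order tensor T in a simulated sample can be mapped to Q T Qᵀ to obtain a new training sample at no simulation cost. The function names (`rotation_z`, `rotate_tensor`, `augment_sample`) and the choice of a second-order fiber orientation tensor as the microstructure descriptor are assumptions made for illustration, not the authors' implementation.

```python
import math


def rotation_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    """Transpose of a 3x3 matrix stored as nested lists."""
    return [list(row) for row in zip(*a)]


def rotate_tensor(q, t):
    """Return Q T Q^T for a second-order tensor T."""
    return matmul(matmul(q, t), transpose(q))


def augment_sample(q, orientation, strain_path, stress_path):
    """Rotate one (hypothetical) training sample.

    orientation: 3x3 fiber orientation tensor (assumed descriptor).
    strain_path, stress_path: lists of 3x3 tensors, one per load step.
    """
    return (
        rotate_tensor(q, orientation),
        [rotate_tensor(q, eps) for eps in strain_path],
        [rotate_tensor(q, sig) for sig in stress_path],
    )


# Usage: a uniaxial stress along x becomes uniaxial along y
# after a 90-degree rotation about z.
q = rotation_z(math.pi / 2)
stress_x = [[100.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
stress_y = rotate_tensor(q, stress_x)
```

Rotating the input strain path and the output stress path with the same Q keeps each augmented pair physically consistent; invariants such as the trace of each tensor are unchanged by the mapping.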
