Entropy-Guided Ensemble Active Learning for Efficient Structural Selection in Machine-Learning Interatomic Potentials

  • Nonaka, Hirona (The University of Osaka)
  • Liu, Lijun (The University of Osaka)
  • Jia, Weile (Chinese Academy of Sciences)
  • Hirotani, Jun (Kyoto University)


Machine-learning interatomic potentials (MLIPs) have become an essential component of modern computational mechanics and materials modeling, enabling accurate and efficient atomistic simulations. However, the construction of reliable MLIPs still relies heavily on large and diverse training datasets, making data generation a major computational bottleneck. This challenge has motivated the development of active learning strategies, in which training data are adaptively selected based on model uncertainty to reduce the cost of reference calculations while maintaining accuracy [1]. In this work, we propose an ensemble-based active learning framework for efficient and robust structural selection in MLIP development. Multiple models with different initial conditions are trained using the same reference dataset, and their predictions of mechanical responses—such as elastic properties under applied strain—are interpreted as statistical ensembles. Assuming a normal distribution of model outputs, the Shannon information entropy of the ensemble predictions is evaluated as a quantitative measure of predictive uncertainty. Atomic configurations associated with high entropy are selectively added to the training set, thereby prioritizing data that maximally reduces model uncertainty. This entropy-guided selection principle provides a mathematically well-defined and physically interpretable criterion for uncertainty quantification in data-driven atomistic modeling. Through iterative active learning cycles, the proposed approach exhibits a systematic reduction in entropy, indicating progressive convergence of ensemble predictions. Benchmark tests demonstrate consistent improvements in the accuracy of energies and forces with a limited number of additional training samples. 
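As an illustration of the selection step described above, the following is a minimal sketch, not the authors' implementation. It assumes that each candidate structure has one scalar prediction per ensemble member (for example, a stress component under applied strain), fits a normal distribution to those predictions, and uses the standard closed-form entropy of a normal distribution, H = 0.5 ln(2πeσ²), as the uncertainty score. The function names, the array layout, and the variance floor are hypothetical choices made for this example.

```python
import numpy as np


def gaussian_entropy(predictions):
    """Entropy (in nats) of a normal distribution fitted to ensemble predictions.

    predictions: array-like of shape (n_models, n_structures), one row per
    ensemble member and one column per candidate structure (assumed layout).
    """
    preds = np.asarray(predictions, dtype=float)
    # Unbiased sample variance across ensemble members for each structure.
    var = preds.var(axis=0, ddof=1)
    # Floor the variance so that identical predictions do not give log(0).
    var = np.maximum(var, 1e-300)
    return 0.5 * np.log(2.0 * np.pi * np.e * var)


def select_high_entropy(predictions, n_select):
    """Return indices of the n_select highest-entropy structures and all entropies."""
    entropy = gaussian_entropy(predictions)
    order = np.argsort(entropy)[::-1]  # highest entropy first
    return order[:n_select], entropy


# Hypothetical data: 4 ensemble members, 3 candidate structures.
preds = np.array([
    [1.0, 2.00, 3.0],
    [1.1, 2.01, 3.5],
    [0.9, 1.99, 2.5],
    [1.0, 2.00, 3.0],
])
selected, entropy = select_high_entropy(preds, n_select=2)
print(selected)  # structures on which the ensemble disagrees most
```

Under the normal assumption the entropy increases monotonically with the ensemble variance, so this ranking coincides with ranking by ensemble standard deviation; the entropy form mainly supplies a common scale for tracking convergence across active learning cycles.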
Moreover, key physical properties, including structural parameters, defect-related energies, surface energetics, and elastic constants, show rapid and stable convergence toward reference values obtained from experiments or first-principles calculations. The present framework provides a statistically grounded and scalable strategy for MLIP construction, offering a promising pathway toward efficient multiscale simulations and data-driven computational mechanics.

REFERENCES

[1] L. Zhang, D.-Y. Lin, H. Wang, R. Car and W. E, Active learning of uniformly accurate interatomic potentials, Phys. Rev. Mater., 3, 023804, 2019.