Scalable Natural Gradient Descent for Neural PDE Solvers: Low-Rank Preconditioning and Efficient Implementations

  • Bioli, Ivan (University of Pavia)
  • Sangalli, Giancarlo (University of Pavia)
  • Marcati, Carlo (Institut Camille Jordan)


Natural Gradient Descent (NGD) has recently emerged as a powerful optimization strategy for deep-learning-based solvers of partial differential equations (PDEs), notably Physics-Informed Neural Networks (PINNs). By incorporating geometric information from the neural network manifold, NGD can converge in significantly fewer iterations than standard first-order optimizers. However, its practical adoption has been limited by the high computational cost of forming and inverting the Gramian matrix, whose direct inversion scales cubically with the number of network parameters. In this talk, we present a computationally efficient NGD framework for neural PDE solvers that overcomes these limitations by combining matrix-free implementations with low-rank preconditioning strategies. We extend matrix-free NGD methods to a broad class of neural PDE formulations, including PINNs, Variational PINNs, Finite Element Interpolated Neural Networks, and Robust VPINNs, as well as to general choices of the underlying metric. By exploiting the empirically observed low-rank structure of the Gramian matrix, we develop preconditioners based on randomized numerical linear algebra techniques, such as Nyström approximations and partial pivoted Cholesky factorizations. These preconditioners significantly accelerate the convergence of the inner iterative solvers while keeping memory usage and computational cost under control. We systematically compare different NGD variants (explicit inversion, unpreconditioned matrix-free methods, and various preconditioned approaches) in terms of computational complexity and practical efficiency, identifying the regimes in which each strategy is preferable. We also discuss efficient implementations based on automatic differentiation, together with guidelines for integrating NGD into existing optimization and autodiff software frameworks.
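To make the combination of matrix-free products and low-rank preconditioning concrete, here is a minimal NumPy sketch, not the authors' implementation, of solving a damped NGD system (G + mu*I) x = g with conjugate gradient and a randomized Nyström preconditioner (the variant of Frangella, Tropp and Udell; the partial pivoted Cholesky alternative is not shown). It assumes a Gauss-Newton-type Gramian G = J^T J, where J is a synthetic stand-in for the Jacobian of the discretized residuals with respect to the network parameters; in an autodiff framework G v would instead be computed as a JVP followed by a VJP, without ever forming J. The function names, the damping `mu`, the sketch size `rank`, and the synthetic spectrum are all illustrative assumptions.

```python
import numpy as np


def gramian_matvec(J, v):
    # Matrix-free Gramian-vector product G v = J^T (J v). J is an explicit
    # stand-in here; with autodiff this would be a JVP followed by a VJP.
    return J.T @ (J @ v)


def nystrom_preconditioner(matvec, n, rank, mu, rng):
    # Randomized Nystrom approximation G ~ U diag(lam) U^T built from `rank`
    # Gramian-vector products; returns a function applying P^{-1} for G + mu*I.
    omega, _ = np.linalg.qr(rng.standard_normal((n, rank)))
    Y = np.column_stack([matvec(omega[:, j]) for j in range(rank)])
    nu = np.sqrt(n) * np.finfo(float).eps * np.linalg.norm(Y, 2)  # stability shift
    Y_nu = Y + nu * omega
    M = omega.T @ Y_nu
    L = np.linalg.cholesky(0.5 * (M + M.T))  # symmetrized M = L L^T
    B = np.linalg.solve(L, Y_nu.T).T  # B = Y_nu L^{-T}
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(s**2 - nu, 0.0)  # approximate top eigenvalues of G
    lam_r = lam[-1]

    def apply_inv(v):
        # P^{-1} v = (lam_r + mu) U (Lam + mu I)^{-1} U^T v + (I - U U^T) v
        Utv = U.T @ v
        return (lam_r + mu) * (U @ (Utv / (lam + mu))) + (v - U @ Utv)

    return apply_inv


def pcg(matvec, b, apply_prec=lambda v: v, tol=1e-8, maxiter=1000):
    # Preconditioned conjugate gradient for a symmetric positive definite system.
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for it in range(1, maxiter + 1):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter


rng = np.random.default_rng(0)
m, n, rank, mu = 500, 400, 80, 1e-3

# Synthetic Jacobian with rapidly decaying singular values, mimicking a
# low-rank Gramian (eigenvalues of G are 10^(-i/10)).
Uq, _ = np.linalg.qr(rng.standard_normal((m, n)))
Vq, _ = np.linalg.qr(rng.standard_normal((n, n)))
J = (Uq * 10.0 ** (-np.arange(n) / 20)) @ Vq.T
g = rng.standard_normal(n)  # stand-in for the loss gradient


def damped_matvec(v):
    # (G + mu*I) v, using only Gramian-vector products
    return gramian_matvec(J, v) + mu * v


x_plain, it_plain = pcg(damped_matvec, g)
prec = nystrom_preconditioner(lambda v: gramian_matvec(J, v), n, rank, mu, rng)
x_prec, it_prec = pcg(damped_matvec, g, prec)
print(f"CG iterations: plain={it_plain}, Nystrom-preconditioned={it_prec}")
```

In this sketch the preconditioner costs `rank` Gramian-vector products plus an O(n * rank^2) factorization once per NGD step, after which each CG iteration adds only O(n * rank) work; whether this pays off depends on how quickly the Gramian spectrum decays relative to the damping `mu`.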
Finally, we benchmark the proposed methods against state-of-the-art optimizers on a range of PDE problems, demonstrating substantial reductions in training time as well as improved accuracy and robustness. Overall, the results highlight low-rank preconditioned NGD as a scalable and competitive optimization tool for modern neural PDE solvers.