An Optimal Weighted Least-Squares Method for Operator Learning
Data-driven surrogate models can accelerate outer-loop tasks (uncertainty quantification, inference, design, and control) in which repeated evaluations of high-fidelity PDE solvers are prohibitively expensive. A promising surrogate paradigm is operator learning, which seeks to approximate PDE solution operators (maps between function spaces) from supervised functional data. Many current approaches use neural operators, i.e., neural-network parameterizations of these maps. While effective in some regimes, such methods can be data-hungry and often come with limited approximation and stability guarantees. Motivated by the data-limited setting, where only a small number of expensive training solves is available, we focus on two recurring issues: generalization error and ill-conditioning of the training procedure. We develop a non-neural operator-learning approach based on optimally weighted least squares that addresses both issues while retaining near-optimal sample complexity. The analysis is carried out in the Bochner space $L^2_\rho(\mathcal X; \mathcal Y)$, where $\mathcal X$ and $\mathcal Y$ are separable Hilbert spaces of input and output functions and $\rho$ is a prescribed probability measure on the input space $\mathcal X$. For an $N$-dimensional approximation space $V\subset L^2_\rho(\mathcal X; \mathcal Y)$ equipped with an orthonormal operator basis, we construct the training-input sampling measure $\mu$ from an operator-level Christoffel (leverage) function and compute the corresponding weighted least-squares estimator. This yields uniformly well-conditioned weighted Gram matrices and requires only $M=\mathcal O(N\log N)$ training pairs for stability. Moreover, we provide high-probability error bounds in the Bochner norm that separate the projection error, solver and noise effects, and the output discretization error.
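The sampling-and-weighting pipeline described above can be illustrated in a small finite-dimensional toy setting. This is a minimal sketch under stated assumptions, not the authors' implementation: input functions are (hypothetically) parameterized by $d=2$ coordinates with $\rho$ uniform on $[-1,1]^2$, $\mathcal Y$ is discretized as $\mathbb R^{33}$ with the Euclidean inner product, and $V$ is spanned by rank-one operators $\psi_{\alpha,k}(x)=\phi_\alpha(x)\,e_k$ built from orthonormal tensor Legendre polynomials $\phi_\alpha$ of total degree at most 4 ($N=15$ scalar functions, so $\dim V = 15\cdot 33$). Taking the operator-level Christoffel function to be $k_V(x)=\sum_i\|\psi_i(x)\|_{\mathcal Y}^2$ (an assumed definition), $k_V$ is a constant multiple of the scalar $k_N(x)=\sum_\alpha\phi_\alpha(x)^2$, so $d\mu=(k_N/N)\,d\rho$ and the weights are $w=N/k_N$. Because the Gram matrix of this rank-one family is block diagonal with identical $N\times N$ blocks, its condition number equals that of the scalar block computed below. Samples from $\mu$ are drawn by rejection from $\rho$; the helper names, the toy operator, and the oversampling constant 10 are illustrative choices.

```python
import itertools

import numpy as np
from numpy.polynomial import legendre


def legendre_orthonormal(x, deg):
    # sqrt(2j+1) * P_j(x): orthonormal w.r.t. the uniform probability measure on [-1, 1].
    # Output shape: x.shape + (deg + 1,).
    return legendre.legvander(x, deg) * np.sqrt(2 * np.arange(deg + 1) + 1)


def total_degree_indices(d, p):
    # Multi-indices alpha in N^d with |alpha| <= p.
    return [a for a in itertools.product(range(p + 1), repeat=d) if sum(a) <= p]


def basis(X, idx, p):
    # Tensor Legendre features Phi[i, n] = prod_l phi_{alpha_n[l]}(X[i, l]); X has shape (M, d).
    L = legendre_orthonormal(X, p)                   # (M, d, p + 1)
    Phi = np.ones((X.shape[0], len(idx)))
    for n, alpha in enumerate(idx):
        for l, a in enumerate(alpha):
            Phi[:, n] *= L[:, l, a]
    return Phi


def sample_optimal(M, idx, p, d, rng):
    # Rejection sampling from d(mu) = (k_N / N) d(rho), with rho uniform on [-1, 1]^d.
    # |phi_j| attains its maximum sqrt(2j+1) at x = 1, so k_N is maximal at the corner (1, ..., 1).
    k_max = (basis(np.ones((1, d)), idx, p) ** 2).sum()
    batches, count = [], 0
    while count < M:
        Xp = rng.uniform(-1.0, 1.0, size=(20 * M, d))
        k = (basis(Xp, idx, p) ** 2).sum(axis=1)
        accepted = Xp[rng.uniform(size=len(k)) < k / k_max]
        batches.append(accepted)
        count += len(accepted)
    return np.concatenate(batches)[:M]


def weighted_lsq(X, Y, idx, p, w):
    # Normal equations G c = b; E[G] = I for both (mu, w = N / k_N) and (rho, w = 1).
    Phi = basis(X, idx, p)
    G = (Phi.T * w) @ Phi / len(X)                   # weighted Gram matrix (N x N)
    b = (Phi.T * w) @ Y / len(X)                     # one right-hand side per output coordinate
    return np.linalg.solve(G, b), np.linalg.cond(G)


s = np.linspace(0.0, 1.0, 33)                        # hypothetical output grid discretizing Y


def toy_operator(X):
    # Hypothetical smooth input-to-output map standing in for a PDE solution operator.
    return np.sin(np.pi * np.outer(1.0 + 0.3 * X[:, 0], s)) + 0.2 * X[:, [1]] * s**2


rng = np.random.default_rng(0)
d, p = 2, 4
idx = total_degree_indices(d, p)
N = len(idx)                                         # number of scalar basis functions
M = int(np.ceil(10 * N * np.log(N)))                 # M = O(N log N) training pairs

X_opt = sample_optimal(M, idx, p, d, rng)
w_opt = N / (basis(X_opt, idx, p) ** 2).sum(axis=1)  # optimal weights N / k_N
C_opt, cond_opt = weighted_lsq(X_opt, toy_operator(X_opt), idx, p, w_opt)

X_uni = rng.uniform(-1.0, 1.0, size=(M, d))          # plain rho-sampling, unit weights
C_uni, cond_uni = weighted_lsq(X_uni, toy_operator(X_uni), idx, p, np.ones(M))

# Monte Carlo estimate of the relative Bochner-norm error of the weighted estimator.
X_test = rng.uniform(-1.0, 1.0, size=(2000, d))
Y_test = toy_operator(X_test)
rel_err = np.linalg.norm(basis(X_test, idx, p) @ C_opt - Y_test) / np.linalg.norm(Y_test)
print(f"N={N}, M={M}, cond(G): optimal={cond_opt:.2f}, uniform={cond_uni:.2f}, rel. error={rel_err:.1e}")
```

The uniform-sampling condition number is printed only for comparison; how much larger it is depends on the seed and on the gap between $\max_x k_N(x)$ (155 for this basis) and $N=15$, which governs how many plain $\rho$-samples stability requires.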
To make the framework concrete, we construct explicit orthonormal operator families and establish density results for (i) rank-one linear operators dense in $\mathcal L(\mathcal X;\mathcal Y)$, and (ii) rank-one orthogonal-polynomial (polynomial-chaos) operator families dense in $L^2_\rho(\mathcal X; \mathcal Y)$ under mild moment assumptions. Benchmarks on Poisson, viscous Burgers’, and incompressible Navier–Stokes solution operators show substantially improved conditioning relative to standard sampling and reliable accuracy in physically meaningful norms.
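One natural way to realize an orthonormal rank-one polynomial-chaos family is sketched below. The setting is assumed for illustration and is not taken from the source: $\rho$ is a centered Gaussian measure whose covariance is diagonal in an orthonormal basis $\{e_j\}$ of $\mathcal X$ with eigenvalues $\lambda_j=j^{-2}$ (truncated to 16 coordinates), only the first two whitened coordinates $\xi_l=\langle x,e_l\rangle_{\mathcal X}/\sqrt{\lambda_l}$ are used, and $\mathcal Y$ is replaced by $\mathbb R^3$. The scalar factors are normalized probabilists' Hermite polynomials $\phi_\alpha(x)=\prod_l \mathrm{He}_{\alpha_l}(\xi_l)/\sqrt{\alpha_l!}$ of total degree at most 2, and each operator is $\psi_{\alpha,k}(x)=\phi_\alpha(x)\,e_k$. A Monte Carlo estimate of the Bochner Gram matrix $\mathbb E_\rho[\langle\psi_a(x),\psi_b(x)\rangle_{\mathcal Y}]$ should then be close to the identity. All sizes and helper names are hypothetical.

```python
import itertools
from math import factorial

import numpy as np
from numpy.polynomial import hermite_e

# Hypothetical sizes: input coefficients D, active whitened coordinates d, degree p, output dimension m.
D, d, p, m = 16, 2, 2, 3
lam = 1.0 / np.arange(1, D + 1) ** 2        # assumed covariance eigenvalues of the centered Gaussian rho


def scalar_chaos(X, idx):
    # phi_alpha(x) = prod_l He_{alpha_l}(xi_l) / sqrt(alpha_l!), with xi_l = <x, e_l> / sqrt(lam_l) ~ N(0, 1).
    xi = X[:, :d] / np.sqrt(lam[:d])
    H = hermite_e.hermevander(xi, p) / np.sqrt([factorial(n) for n in range(p + 1)])
    Phi = np.ones((X.shape[0], len(idx)))
    for n, alpha in enumerate(idx):
        for l, a in enumerate(alpha):
            Phi[:, n] *= H[:, l, a]
    return Phi


def operator_basis(X, idx):
    # All rank-one operators psi_{alpha,k}(x) = phi_alpha(x) e_k evaluated at once.
    # Shape (M, N*m, m): sample, operator index (alpha outer, k inner), output coordinate in Y = R^m.
    Phi = scalar_chaos(X, idx)
    E = np.eye(m)                                # orthonormal basis {e_k} of Y
    return (Phi[:, :, None, None] * E).reshape(X.shape[0], len(idx) * m, m)


rng = np.random.default_rng(1)
idx = [a for a in itertools.product(range(p + 1), repeat=d) if sum(a) <= p]
X = rng.standard_normal((100_000, D)) * np.sqrt(lam)   # samples x ~ rho in coefficient form

Psi = operator_basis(X, idx)
# Monte Carlo Bochner Gram matrix: G[a, b] approximates E_rho[<psi_a(x), psi_b(x)>_Y].
G = np.tensordot(Psi, Psi, axes=([0, 2], [0, 2])) / len(X)
print(G.shape, np.abs(G - np.eye(G.shape[0])).max())
```

In this Gaussian example the degree-one members $\psi_{e_l,k}(x)=\langle x, e_l/\sqrt{\lambda_l}\rangle_{\mathcal X}\,e_k$ are rank-one linear operators, so family (i) appears inside family (ii). For a product-form non-Gaussian $\rho$, the Hermite factors would be replaced by polynomials orthonormal with respect to the corresponding one-dimensional marginals.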
