WCCM ECCOMAS 2026

Risk-Optimal Multifidelity Linear Regression for Surrogate Modeling from Scarce Data

Rex, Atticus (Georgia Institute of Technology)
Qian, Elizabeth (Georgia Institute of Technology)

In session: MS305A - The Fast and The Curious: Exploring data-efficient sampling in science and engineering I

Across many scientific disciplines, new theory and algorithms enable more accurate high-fidelity simulation, but these advances often come with increased computational cost. When high-fidelity simulation is prohibitively expensive, practitioners often substitute cheaper low-fidelity models, which arise from simplifying assumptions and/or lower spatiotemporal resolution. These low-fidelity models are often biased and may omit important high-fidelity features. Scientific machine learning (SciML) has garnered interest as a way to replace expensive high-fidelity simulations with data-driven surrogate models. These surrogate models aim to be much cheaper than the high-fidelity model while remaining more accurate than the low-fidelity models. Under a fixed computational budget, often only a few high-fidelity model evaluations are available to train a surrogate model. When training data is scarce, learned models typically have high generalization risk, that is, high expected error across the full input domain. In this work, we consider multifidelity linear regression, a SciML approach that combines high- and low-fidelity data to train an unbiased linear surrogate model. Existing formulations of this method do not address how to optimally nest the training data, nor how control variate coefficients and sample sizes affect the expected accuracy of the model. Our contributions are as follows: (1) we implement a nesting strategy that leverages correlation across all fidelity levels, (2) we connect the generalization risk of the learned model to control variates and sample allocation, (3) we provide closed-form expressions for risk-optimal control variate coefficients and sample sizes at each fidelity level subject to a fixed computational budget, and (4) we demonstrate increased predictive accuracy over existing single- and multifidelity surrogate modeling techniques on a variety of regression problems.
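
The abstract does not give formulas, so the following is only a minimal sketch of the general idea of a control-variate correction for regression coefficients, not the authors' method. It assumes two fidelity levels, a one-dimensional input, and nested samples in which the first n inputs carry both high- and low-fidelity outputs. The names `fit_line`, `multifidelity_fit`, `f_hi`, `f_lo`, and the hand-picked value of `alpha` are all hypothetical; the risk-optimal coefficients and sample sizes described above are not reproduced here.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y ~ b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return (mean_y - b1 * mean_x, b1)


def multifidelity_fit(xs, y_hi, y_lo, alpha):
    """Control-variate-corrected coefficients (illustrative assumption).

    xs    : all m inputs; the first n = len(y_hi) are the nested
            inputs that also have high-fidelity outputs
    y_hi  : n high-fidelity outputs at xs[:n]
    y_lo  : m low-fidelity outputs at xs
    alpha : control variate coefficient, chosen by hand here
    """
    n = len(y_hi)
    hi = fit_line(xs[:n], y_hi)            # scarce high-fidelity fit
    lo_small = fit_line(xs[:n], y_lo[:n])  # low-fidelity fit, same n inputs
    lo_large = fit_line(xs, y_lo)          # low-fidelity fit, all m inputs
    # Correct each coefficient by the difference of the two
    # low-fidelity estimates of the same quantity.
    return tuple(
        h + alpha * (big - small)
        for h, big, small in zip(hi, lo_large, lo_small)
    )


# Hypothetical toy models standing in for expensive and cheap simulators.
def f_hi(x):
    return 1.0 + 2.0 * x


def f_lo(x):
    return 0.5 + 1.8 * x


rng = random.Random(0)
m, n = 200, 5
xs = [rng.uniform(-1.0, 1.0) for _ in range(m)]
# Shared noise makes the two fidelities correlated, which is what a
# control variate relies on.
noise = [rng.gauss(0.0, 0.2) for _ in range(m)]
y_lo = [f_lo(x) + e for x, e in zip(xs, noise)]
y_hi = [f_hi(x) + e for x, e in zip(xs[:n], noise[:n])]

print("high-fidelity only:", fit_line(xs[:n], y_hi))
print("multifidelity     :", multifidelity_fit(xs, y_hi, y_lo, alpha=1.0))
```

Setting `alpha=0.0` recovers the high-fidelity-only fit, so the coefficient controls how much the additional low-fidelity data is allowed to adjust the scarce high-fidelity estimate.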