VPBoost: Accelerating Gradient Boosting with Variable Projection for Separable Neural Networks
Gradient boosting (GB) constructs predictive models by sequentially training weak learners that serve as descent directions in function space. While highly successful for decision trees, particularly through frameworks such as XGBoost, applying GB to neural networks (NNs) remains computationally expensive because each weak learner requires a high-dimensional, non-convex optimization. Recent advances demonstrate that boosted NNs can outperform decision trees on complex data, yet training efficiency remains a critical bottleneck. Variable projection (VarPro) has emerged as a powerful optimization strategy for separable NNs, which decompose into a nonlinear feature extractor followed by a linear mapping. By analytically eliminating the linear weights through partial optimization, VarPro reduces the parameter space and accelerates convergence. We propose VPBoost, a novel algorithm that combines VarPro with the XGBoost algorithm for gradient boosting. For each weak learner, VPBoost solves a reduced optimization problem over only the nonlinear parameters while keeping the linear weights at their analytically computed optimum. Exploiting separability in this way yields a bi-level optimization framework in which the outer problem minimizes over the featurizer parameters and the inner problem provides closed-form solutions for the linear weights. We establish convergence guarantees analogous to those of standard gradient descent while achieving faster training per weak learner and requiring fewer ensemble members. Moreover, gradient boosting integrates naturally with VarPro, lowering the computational and implementation barriers that VarPro traditionally presents. Numerical experiments demonstrate substantial computational savings and improved generalization compared to traditional gradient boosting approaches for NNs, positioning VPBoost as an efficient alternative for scientific machine learning and data-intensive applications.
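The bi-level structure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a squared-error loss (so the boosting targets reduce to residuals), a single tanh layer as the featurizer, a ridge-regularized inner problem, and plain gradient descent with step halving for the outer problem. All names (`features`, `solve_linear`, `reduced_loss_and_grad`, `fit_weak_learner`, `vpboost_sketch`) and all hyperparameter values are hypothetical.

```python
import numpy as np

def features(X, A, b):
    """Nonlinear featurizer phi(x; theta) with theta = (A, b); assumed tanh layer."""
    return np.tanh(X @ A + b)

def solve_linear(Phi, r, lam):
    """Inner problem: closed-form ridge solution for the linear weights W."""
    k = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(k), Phi.T @ r)

def reduced_loss_and_grad(X, r, A, b, lam):
    """Reduced objective in theta only, with W eliminated analytically."""
    Phi = features(X, A, b)
    W = solve_linear(Phi, r, lam)
    resid = Phi @ W - r
    loss = 0.5 * np.sum(resid**2) + 0.5 * lam * np.sum(W**2)
    # Since W is optimal for this theta, the gradient of the reduced
    # objective equals the partial gradient of the full objective in theta.
    dZ = (resid @ W.T) * (1.0 - Phi**2)   # backprop through tanh
    return loss, X.T @ dZ, dZ.sum(axis=0)

def fit_weak_learner(X, r, width, lam, steps, rng):
    """Outer problem: gradient descent on theta with simple step halving."""
    A = rng.normal(size=(X.shape[1], width))
    b = rng.normal(size=width)
    for _ in range(steps):
        loss, gA, gb = reduced_loss_and_grad(X, r, A, b, lam)
        step = 1.0
        while step > 1e-8:
            A_new, b_new = A - step * gA, b - step * gb
            if reduced_loss_and_grad(X, r, A_new, b_new, lam)[0] < loss:
                A, b = A_new, b_new
                break
            step *= 0.5
    W = solve_linear(features(X, A, b), r, lam)
    return A, b, W

def vpboost_sketch(X, y, rounds=5, width=4, lam=1e-3, shrink=0.5,
                   steps=30, seed=0):
    """Boosting loop: each round fits one separable NN to the residual."""
    rng = np.random.default_rng(seed)
    pred = np.full_like(y, y.mean())
    learners = []
    history = [float(np.mean((y - pred) ** 2))]
    for _ in range(rounds):
        r = y - pred                      # negative gradient of squared loss
        A, b, W = fit_weak_learner(X, r, width, lam, steps, rng)
        pred = pred + shrink * (features(X, A, b) @ W)
        learners.append((A, b, W))
        history.append(float(np.mean((y - pred) ** 2)))
    return learners, history

# Toy usage on synthetic 1-D regression data (illustrative only).
data_rng = np.random.default_rng(1)
X_demo = data_rng.uniform(-2.0, 2.0, size=(200, 1))
y_demo = np.sin(2.0 * X_demo) + 0.05 * data_rng.normal(size=(200, 1))
learners_demo, history_demo = vpboost_sketch(X_demo, y_demo)
print("training MSE per round:", history_demo)
```

In this sketch the outer loop only ever updates the featurizer parameters `A` and `b`; the linear weights are recomputed in closed form whenever the objective is evaluated, which is the reduction in parameter space that the abstract attributes to VarPro. With a squared-error loss the Hessian of the loss is constant, so fitting residuals coincides with a second-order (XGBoost-style) step; other losses would need per-sample gradient and Hessian weights in the inner solve.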
