Additively Preconditioned Trust-Region Methods for Large-Scale Training Problems
Training deep neural networks (NNs) gives rise to large-scale, highly nonconvex optimization problems, which makes optimizer performance sensitive to hyperparameter choices. We propose an Additively Preconditioned Trust-Region Strategy (APTS) that combines (i) a nonlinear right-preconditioner assembled from parallel subdomain solves with (ii) a global trust-region (TR) mechanism for globalization. Building on this framework, we further introduce a non-monotone variant (NAPTS) that avoids needless rejection of effective coarse steps. Concretely, the model is partitioned into subdomains distributed across independent devices. In each preconditioner iteration, the subdomains are optimized in parallel to obtain local nonlinear corrections, which are assembled additively into a global preconditioned trial step. A subsequent global TR iteration evaluates the full objective and updates the radius, reducing sensitivity to learning-rate tuning. NAPTS replaces the monotone acceptance test with a windowed non-monotone TR acceptance rule, so that steps are accepted more frequently and computational cost is reduced. In particular, coarse steps that improve long-range coupling are no longer rejected solely because they do not immediately decrease the objective. In our experiments, NAPTS preserves accuracy while reducing CPU time by 30%.
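As an illustration of the windowed non-monotone acceptance idea, the following is a minimal sketch and not the authors' implementation. It assumes the common max-over-window reference value: a trial step is compared against the worst objective among the last few accepted iterates instead of only the most recent one. The function name `nonmonotone_tr_step`, the thresholds `eta_accept` and `eta_expand`, and the radius factors are all hypothetical choices; the abstract does not specify them.

```python
from collections import deque


def nonmonotone_tr_step(f_history, f_trial, predicted_reduction, radius,
                        eta_accept=0.1, eta_expand=0.75,
                        shrink=0.5, expand=2.0, max_radius=10.0):
    """One windowed non-monotone trust-region acceptance test (sketch).

    f_history is a deque(maxlen=window) holding the objective values of the
    most recently accepted iterates. All thresholds are assumed defaults.
    Returns (accepted, new_radius).
    """
    if predicted_reduction <= 0.0:
        # The model predicts no decrease: reject and shrink the radius.
        return False, radius * shrink

    # Reference value: the worst objective inside the window. With a window
    # of length 1 this reduces to the classical monotone trust-region test.
    f_ref = max(f_history)
    rho = (f_ref - f_trial) / predicted_reduction

    accepted = rho >= eta_accept
    if accepted:
        f_history.append(f_trial)  # the oldest value drops out automatically
        if rho >= eta_expand:
            radius = min(expand * radius, max_radius)
    else:
        radius *= shrink
    return accepted, radius


# A trial step that is slightly worse than the latest iterate (0.9) but
# better than the worst value in the window (1.0).
history = deque([1.0, 0.8, 0.9], maxlen=3)
accepted, radius = nonmonotone_tr_step(history, f_trial=0.95,
                                       predicted_reduction=0.1, radius=1.0)
```

In this toy call the step is accepted because it improves on the window maximum, whereas a monotone test against the latest value 0.9 would reject it and shrink the radius. The window length controls how much temporary increase is tolerated.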
