Majorization Minimization for Neural Network Training
We extend a provably convergent majorization minimization method for training neural networks with piecewise affine activations. The method relaxes the multicomposite structure of neural networks by lifting the training problem to a higher-dimensional space. Our extension covers a broader class of losses, including cross-entropy, and a proximal trust region method for the majorization minimization.
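
For readers unfamiliar with the general principle, the following is a minimal sketch of a proximal majorization minimization iteration on a toy one-dimensional problem. It is not the method of this work: the objective (a smoothed sum of absolute deviations), the quadratic majorizer, and the names `smoothed_abs_loss`, `proximal_mm`, `rho`, and `eps` are all assumptions chosen for illustration. Each step minimizes a surrogate that upper-bounds the objective and touches it at the current iterate, plus a proximal term that keeps the new iterate close to the old one, so the objective cannot increase.

```python
import math


def smoothed_abs_loss(x, data, eps):
    """Toy objective (an assumption): sum_i sqrt((x - a_i)^2 + eps^2)."""
    return sum(math.sqrt((x - a) ** 2 + eps ** 2) for a in data)


def proximal_mm(data, x0, rho=0.1, eps=1e-8, iters=200):
    """Proximal majorization minimization on the toy objective.

    Since sqrt is concave, each term is majorized at the current iterate x_k by
        ((x - a_i)^2 + eps^2) / (2 * s_i) + s_i / 2,
    with s_i = sqrt((x_k - a_i)^2 + eps^2). Adding the proximal term
    (rho / 2) * (x - x_k)^2 keeps the surrogate above the objective and
    equal to it at x_k, and the surrogate has a closed-form minimizer.
    """
    x = x0
    history = [smoothed_abs_loss(x, data, eps)]
    for _ in range(iters):
        # Weights 1 / s_i define the quadratic majorizer at the current x.
        weights = [1.0 / math.sqrt((x - a) ** 2 + eps ** 2) for a in data]
        # Closed-form minimizer of majorizer + proximal term.
        numerator = sum(w * a for w, a in zip(weights, data)) + rho * x
        x = numerator / (sum(weights) + rho)
        history.append(smoothed_abs_loss(x, data, eps))
    return x, history


# Hypothetical data; the iteration starts from the sample mean.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
x_hat, history = proximal_mm(data, x0=sum(data) / len(data))
```

In this sketch the iterate moves from the sample mean toward the sample median, and the recorded objective values in `history` are non-increasing, which is the descent property that majorization minimization schemes rely on.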
