Multilevel Training and Preconditioning for Kolmogorov-Arnold Networks
One of the greatest challenges in training large-scale neural networks is poor algorithmic scaling, which stems from the lack of structure guaranteed by the function compositions in such networks. In contrast to the commonly used multilayer perceptrons (MLPs), Kolmogorov-Arnold Networks (KANs) are a class of neural networks that express their activation functions as expansions over a fixed basis. Training a KAN then amounts to learning the coefficients of the basis expansion, which allows for better interpretability of network dynamics. The original formulation of KANs presented a multilevel approach that uses a least-squares transfer between bases. In this work, we use geometric multigrid techniques to show that nested bases such as B-splines have a "properly-nested hierarchy". This allows one to construct a new KAN with double the number of parameters while retaining the learned structure of the previous model through the basis transfer operator, which greatly improves the algorithmic scaling of the training process. We also show that there is a change of basis between B-splines and powers of ReLUs, which establishes an equivalence between multichannel MLPs and KANs. This linear transformation shows that the spline basis greatly accelerates the training process, and we show how these insights may be used to develop preconditioners for gradient descent methods. We present results for multilevel methods and preconditioning on a variety of applications, including regression, physics-informed neural networks, and transformers. Finally, we discuss extensions of these approaches to more general geometric mappings and more advanced multilevel approaches.
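The nesting property and the ReLU change of basis can be illustrated in the simplest setting. The following is a minimal sketch, not the authors' implementation: it assumes piecewise-linear B-splines (hat functions) on a uniform grid over [0, 1], and the names `hat_basis`, `prolong`, and `hat_from_relus` are hypothetical. It checks that transferring coefficients to a grid with twice as many intervals leaves the represented function unchanged, and that each hat function is a linear combination of three shifted ReLUs.

```python
def hat_basis(x, n):
    """Values at x of the n+1 hat functions on a uniform grid of n intervals over [0, 1]."""
    h = 1.0 / n
    return [max(0.0, 1.0 - abs(x - i * h) / h) for i in range(n + 1)]


def evaluate(coeffs, x):
    """Evaluate sum_i c_i * B_i(x); the grid size is implied by len(coeffs)."""
    n = len(coeffs) - 1
    return sum(c * b for c, b in zip(coeffs, hat_basis(x, n)))


def prolong(coeffs):
    """Transfer coefficients from n intervals to 2n intervals.

    Assumption: degree-1 splines, where coarse nodes keep their value and
    each new midpoint takes the average of its two neighbours.
    """
    fine = []
    for left, right in zip(coeffs, coeffs[1:]):
        fine.append(left)
        fine.append(0.5 * (left + right))
    fine.append(coeffs[-1])
    return fine


def relu(t):
    return max(0.0, t)


def hat_from_relus(x, i, n):
    """The i-th hat function written as a combination of three shifted ReLUs."""
    h = 1.0 / n
    xi = i * h
    return (relu(x - (xi - h)) - 2.0 * relu(x - xi) + relu(x - (xi + h))) / h


# Stand-in for coefficients learned on the coarse level (illustrative values).
coarse = [0.3, -1.2, 0.7, 2.0, -0.5]
fine = prolong(coarse)

samples = [k / 200 for k in range(201)]

# Largest difference between the coarse and the refined representation.
max_gap = max(abs(evaluate(coarse, x) - evaluate(fine, x)) for x in samples)

# Largest difference between the hat basis and its ReLU form on the coarse grid.
n_coarse = len(coarse) - 1
relu_gap = max(
    abs(hat_basis(x, n_coarse)[i] - hat_from_relus(x, i, n_coarse))
    for x in samples
    for i in range(n_coarse + 1)
)
```

Both gaps should sit at the level of floating-point round-off, so the refined model starts from exactly the function the coarse model had learned. Higher-degree B-splines admit analogous refinement rules with wider averaging stencils; the degree-1 case is used here only because it is short enough to verify by hand.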
