Multilevel Initialization for Layer-Parallel Deep Neural Network Training
This paper investigates multilevel initialization strategies for training very deep neural networks with a layer-parallel multigrid solver. The scheme is based on a continuous interpretation of the training problem as an optimal control problem, in which neural networks are represented as discretizations of time-dependent ordinary differential equations. A key goal is to develop a method that can intelligently initialize the network parameters for the very deep networks enabled by scalable layer-parallel training. To do this, we apply a uniform refinement strategy across the time domain, which is equivalent to refining in the layer dimension. This refinement algorithm builds good initializations for deep networks, with the network parameters taken from coarser trained networks. The effectiveness of this multilevel strategy (called nested iteration) for training is investigated using the Peaks and Indian Pines classification data sets. In both cases, the validation accuracy achieved with nested iteration is higher than with non-nested training. Moreover, the run time needed to reach the same validation accuracy is reduced; for instance, the Indian Pines example takes roughly 25% less time to train with the nested iteration algorithm. Finally, using the Peaks problem, we present preliminary anecdotal evidence that the initialization strategy provides a regularizing effect on the training process, reducing sensitivity to hyperparameters and to randomness in the initial network parameters.
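As a rough illustration of the refinement idea described above (not the paper's actual implementation), the sketch below prolongs the per-layer parameters of a trained coarse network onto a network with twice as many layers, viewing each layer as one time step of the underlying ODE. Piecewise-constant prolongation is used here as one simple choice; the function name `prolong_parameters` and the NumPy representation of layer weights are assumptions made for this example.

```python
import numpy as np

def prolong_parameters(coarse_weights):
    """Piecewise-constant prolongation in the layer (time) dimension.

    coarse_weights: list of per-layer parameter arrays from a trained
    coarse network with N layers (time step h).

    Returns parameters for a refined network with 2N layers (time step
    h/2), where each coarse layer's weights initialize the two fine
    layers covering the same time interval.
    """
    fine_weights = []
    for w in coarse_weights:
        fine_weights.append(np.copy(w))  # first half of the coarse interval
        fine_weights.append(np.copy(w))  # second half of the coarse interval
    return fine_weights

# Example: prolong a coarse network with 4 layers of 8x8 weights
coarse = [np.random.randn(8, 8) for _ in range(4)]
fine = prolong_parameters(coarse)
assert len(fine) == 8  # the refined network has twice as many layers
```

In a nested-iteration loop, this prolongation would be applied after each coarse-level training phase, so that every deeper network starts from parameters already adapted to the data rather than from a random initialization.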