RALR: Random Amplify Learning Rates for Training Neural Networks

2021 ◽ Vol 12 (1) ◽ pp. 268
Author(s): Jiali Deng ◽ Haigang Gong ◽ Minghui Liu ◽ Tianshu Xie ◽ Xuan Cheng ◽ ...

It has been shown that the learning rate is one of the most critical hyper-parameters for the overall performance of deep neural networks. In this paper, we propose a new method for setting the global learning rate, named random amplify learning rates (RALR), to improve the performance of any optimizer in training deep neural networks. Instead of monotonically decreasing the learning rate, we aim to escape saddle points or local minima by amplifying the learning rate between reasonable boundary values according to a given probability. Training with RALR rather than a conventionally decreasing learning rate further improves network performance without extra computational cost. Remarkably, RALR is complementary to state-of-the-art data augmentation and regularization methods. In addition, we empirically study its performance on image classification, fine-grained classification, object detection, and machine translation tasks. Experiments demonstrate that RALR brings a notable improvement while preventing overfitting when training deep neural networks. For example, the classification accuracy of ResNet-110 trained on the CIFAR-100 dataset with RALR is 1.34% higher than that of a conventionally trained ResNet-110.
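
The abstract describes the mechanism only at a high level: rather than monotonically decaying the global learning rate, RALR occasionally amplifies it within boundary values according to a given probability. A minimal sketch of that idea, applied to a PyTorch-style optimizer, is given below; the function name, the uniform sampling of the amplification factor, and the default values of `p_amplify`, `low`, and `high` are our assumptions, not the paper's specification.

```python
import random

def maybe_amplify_lr(optimizer, base_lr, p_amplify=0.05, low=2.0, high=5.0):
    """With probability p_amplify, amplify the scheduled base learning rate
    by a factor drawn uniformly from [low, high]; otherwise restore base_lr.
    Works with any object exposing torch-style param_groups."""
    factor = random.uniform(low, high) if random.random() < p_amplify else 1.0
    for group in optimizer.param_groups:
        group["lr"] = base_lr * factor
    return base_lr * factor

# Hypothetical usage, once per epoch before training:
#   optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
#   current_lr = maybe_amplify_lr(optimizer, base_lr=cosine_schedule(epoch))
```

Occasional amplification plays a role similar to a warm restart: a temporarily larger step can kick the iterate out of a saddle point or sharp local minimum, after which the scheduled rate resumes.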

2021
Author(s): Tianyi Liu ◽ Zhehui Chen ◽ Enlu Zhou ◽ Tuo Zhao

The momentum stochastic gradient descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning (e.g., training deep neural networks, variational Bayesian inference). Despite its empirical success, there is still a lack of theoretical understanding of the convergence properties of MSGD. To fill this gap, we analyze the algorithmic behavior of MSGD via diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima. Our study shows that momentum helps escape saddle points but hurts convergence within the neighborhood of optima (unless the step size or the momentum is annealed). Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.
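
For reference, MSGD is the classic heavy-ball recursion v ← μv − η g(θ), θ ← θ + v, where g is a stochastic gradient. The following is a generic sketch of that update with assumed variable names and hyper-parameters, not the authors' analysis code.

```python
import numpy as np

def msgd(grad, theta0, lr=0.01, momentum=0.9, steps=1000):
    """Momentum SGD: v <- mu*v - lr*grad(theta); theta <- theta + v."""
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = momentum * v - lr * grad(theta)
        theta = theta + v
    return theta

# Example on a strict saddle f(x, y) = x**2 - y**2 (saddle at the origin):
#   grad_f = lambda th: np.array([2.0 * th[0], -2.0 * th[1]])
#   msgd(grad_f, theta0=[1.0, 1e-6])  # momentum accelerates escape along y
```

The abstract's finding maps directly onto this loop: a large `momentum` builds velocity along the saddle's escape direction, but that same accumulated velocity causes overshooting near an optimum unless `lr` or `momentum` is annealed.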


2021 ◽ Vol 5 (3) ◽ pp. 1-10
Author(s): Melih Öz ◽ Taner Danışman ◽ Melih Günay ◽ Esra Zekiye Şanal ◽ Özgür Duman ◽ ...

The human eye contains valuable information about an individual’s identity and health. Therefore, segmenting the eye into distinct regions is an essential step towards gathering this useful information precisely. The main challenges in segmenting the human eye include low-light conditions, reflections on the eye, and variations in eyelid and head position, all of which make an eye image hard to segment. For this reason, deep neural networks are preferred, owing to their success on segmentation problems. However, deep neural networks need a large amount of manually annotated data to be trained. Manual annotation is a labor-intensive task, and to tackle this problem, we used data augmentation methods to improve synthetic data. In this paper, we explore whether, when real data is limited, performance can be enhanced by combining similar-context data with image augmentation methods. Our training and test sets consist of 3D synthetic eye images generated with the UnityEyes application and manually annotated real-life eye images, respectively. We examined the effect of using synthetic eye images with the Deeplabv3+ network under different conditions, applying image augmentation methods to the synthetic data. According to our experiments, the network trained with processed synthetic images alongside real-life images produced better mIoU results than the network trained only with the real-life images of the Base dataset. We also observed an mIoU increase on the test set we created from MICHE II competition images.
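
The comparison above is reported in mean intersection-over-union (mIoU). For concreteness, here is a standard per-class IoU computation over integer label maps; it is a generic sketch of the metric, not the authors' evaluation code, and skipping classes absent from both masks is our assumption.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes for two integer label maps
    of equal shape (e.g., 0=background, 1=sclera, 2=iris, 3=pupil)."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else 0.0
```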

