An effective learning rate scheduler for stochastic gradient descent-based deep learning model in healthcare diagnosis system

2022, Vol 12 (1), pp. 1
Author(s): K. Sathyabama, K. Saruladha

2018
Author(s): Kazunori D Yamada

Abstract: In the deep learning era, stochastic gradient descent is the most common method used for optimizing neural network parameters. Among the various mathematical optimization methods, gradient descent is the most naive. With plain gradient descent, the learning rate must be adjusted manually to achieve quick convergence. Many optimizers have been developed to control the learning rate and increase convergence speed; generally, they adjust the learning rate automatically in response to the learning status. These optimizers have been improved gradually by incorporating the effective aspects of earlier methods. In this study, we developed a new optimizer: YamAdam. Our optimizer is based on Adam, which utilizes the first and second moments of previous gradients. In addition to the moment estimation system, we incorporated an advantageous part of AdaDelta, namely its unit correction system, into YamAdam. According to benchmark tests on several common datasets, our optimizer showed convergence performance similar to or faster than existing methods. YamAdam is a viable alternative optimizer for deep learning.
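
For illustration only, the sketch below combines an Adam-style moment update with an AdaDelta-like unit correction, which is the general idea the abstract describes. The exact YamAdam update rule is defined in the paper; the class name, hyperparameter defaults, and moving-average choices below are assumptions.

```python
# Illustrative sketch only: Adam-style moments plus an AdaDelta-like
# "unit correction" term; not the exact YamAdam update from the paper.
import numpy as np

class AdamWithUnitCorrection:
    def __init__(self, beta1=0.9, beta2=0.999, eps=1e-8):
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = self.v = self.s = None   # 1st moment, 2nd moment, squared-update average
        self.t = 0

    def step(self, w, grad):
        if self.m is None:
            self.m = np.zeros_like(w)
            self.v = np.zeros_like(w)
            self.s = np.zeros_like(w)
        self.t += 1
        # Adam-style biased moment estimates with bias correction
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        # AdaDelta-like unit correction: scale by the RMS of previous updates,
        # so the step has the same "units" as the parameters themselves.
        update = np.sqrt(self.s + self.eps) / (np.sqrt(v_hat) + self.eps) * m_hat
        self.s = self.beta2 * self.s + (1 - self.beta2) * update ** 2
        return w - update
```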


2020, Vol 63 (6), pp. 900-912
Author(s): Oswalt Manoj S, Ananth J P

Abstract: Rainfall prediction is an active area of research, as it enables farmers to make effective decisions about both cultivation and irrigation. Existing prediction models are cumbersome, as rainfall prediction depends on three major factors, including humidity, rainfall, and rainfall recorded in previous years, which results in high time consumption and substantial computational effort for the analysis. This paper therefore introduces a rainfall prediction model based on a deep learning network, the convolutional long short-term memory (convLSTM) system, which makes predictions from spatio-temporal patterns. The weights of the convLSTM are tuned optimally using the proposed Salp-stochastic gradient descent algorithm (S-SGD), which integrates the Salp swarm algorithm (SSA) into the stochastic gradient descent (SGD) algorithm in order to facilitate globally optimal tuning of the weights and to ensure better prediction accuracy. In addition, the proposed deep learning framework is built on the MapReduce framework, which enables effective handling of big data. The analysis using the rainfall prediction database reveals that the proposed model achieved a minimal mean square error (MSE) of 0.001 and a percentage root mean square difference (PRD) of 0.0021.
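
As a rough sketch of the kind of spatio-temporal model the abstract describes (not the authors' implementation), the snippet below builds a small convLSTM stack in Keras. The input shape and layer sizes are assumptions, and plain SGD stands in for the proposed S-SGD weight tuning.

```python
# Illustrative sketch only: a minimal convLSTM stack for gridded rainfall
# sequences. Shapes, layer sizes, and the optimizer are assumptions; the
# paper tunes the weights with its proposed S-SGD (Salp swarm + SGD).
import tensorflow as tf

def build_convlstm_model(time_steps=10, height=32, width=32, channels=1):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(time_steps, height, width, channels)),
        # ConvLSTM layers capture spatial structure and temporal dynamics jointly.
        tf.keras.layers.ConvLSTM2D(16, kernel_size=3, padding="same",
                                   return_sequences=True),
        tf.keras.layers.ConvLSTM2D(8, kernel_size=3, padding="same",
                                   return_sequences=False),
        tf.keras.layers.Conv2D(1, kernel_size=1, activation="linear"),
    ])
    # Stand-in optimizer: the paper replaces this with S-SGD weight tuning.
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="mse")
    return model
```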


2020, Vol 34 (04), pp. 6861-6868
Author(s): Yikai Zhang, Hui Qu, Dimitris Metaxas, Chao Chen

Regularization plays an important role in the generalization of deep learning. In this paper, we study the generalization power of an unbiased regularizer for training algorithms in deep learning. We focus on a training method called Locally Regularized Stochastic Gradient Descent (LRSGD), which leverages a proximal-type penalty in the gradient descent steps to regularize SGD during training. We show that by carefully choosing the relevant parameters, LRSGD generalizes better than SGD. Our thorough theoretical analysis is supported by experimental evidence. It advances our theoretical understanding of deep learning and provides new perspectives on designing training algorithms. The code is available at https://github.com/huiqu18/LRSGD.
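
The snippet below is a minimal sketch of a proximal-type penalty added to an SGD step, in the spirit of LRSGD as summarized above. The exact update rule, reference-point schedule, and penalty weight are assumptions; the authors' code at the linked repository is the authoritative version.

```python
# Illustrative sketch only: SGD with a proximal-type penalty that pulls the
# parameters toward a local reference point. Not the exact LRSGD update.
import numpy as np

def proximal_sgd_step(w, grad, w_ref, lr=0.1, lam=0.01):
    """One step of w <- w - lr * (grad + lam * (w - w_ref)).

    The lam * (w - w_ref) term is the gradient of the proximal penalty
    (lam / 2) * ||w - w_ref||^2 added to the training loss.
    """
    return w - lr * (grad + lam * (w - w_ref))

# Toy usage: minimize f(w) = ||w||^2 while staying close to a reference point.
w = np.array([3.0, -2.0])
w_ref = w.copy()                      # e.g., a periodic snapshot of the weights
for step in range(100):
    grad = 2.0 * w                    # gradient of ||w||^2
    w = proximal_sgd_step(w, grad, w_ref, lr=0.1, lam=0.01)
    if step % 20 == 0:
        w_ref = w.copy()              # refresh the local reference point
```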


2018, Vol 24 (5), pp. 497-502
Author(s): Dipendra Jha, Saransh Singh, Reda Al-Bahrani, Wei-keng Liao, Alok Choudhary, et al.

Abstract: We present a deep learning approach to the indexing of electron backscatter diffraction (EBSD) patterns. We design and implement a deep convolutional neural network architecture to predict crystal orientation from the EBSD patterns. We design a differentiable approximation to the disorientation function between the predicted crystal orientation and the ground truth; the deep learning model optimizes the mean disorientation error between the predicted crystal orientation and the ground truth using stochastic gradient descent. The deep learning model is trained using 374,852 EBSD patterns of polycrystalline nickel from simulation and evaluated using 1,000 experimental EBSD patterns of polycrystalline nickel. The deep learning model achieves a mean disorientation error of 0.548° compared to 0.652° for dictionary-based indexing.
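
A minimal sketch of a differentiable orientation loss is shown below, assuming quaternion outputs and ignoring crystal symmetry. The paper's disorientation approximation accounts for symmetry operators, so this is only a simplified stand-in; the clamping constant and batch shapes are assumptions.

```python
# Illustrative sketch only: a simplified, differentiable misorientation loss
# between predicted and ground-truth orientations as unit quaternions.
# Crystal symmetry (part of the true disorientation function) is ignored here.
import torch

def misorientation_loss(q_pred, q_true, eps=1e-7):
    """Mean rotation angle (radians) between batches of unit quaternions."""
    q_pred = torch.nn.functional.normalize(q_pred, dim=-1)
    q_true = torch.nn.functional.normalize(q_true, dim=-1)
    # |<q_pred, q_true>| handles the q / -q ambiguity of quaternions.
    dot = torch.abs(torch.sum(q_pred * q_true, dim=-1))
    # Clamping keeps acos numerically stable and differentiable near |dot| = 1.
    angle = 2.0 * torch.acos(torch.clamp(dot, -1.0 + eps, 1.0 - eps))
    return angle.mean()

# Toy usage with a network head that outputs 4-dimensional quaternions.
q_pred = torch.randn(8, 4, requires_grad=True)
q_true = torch.nn.functional.normalize(torch.randn(8, 4), dim=-1)
loss = misorientation_loss(q_pred, q_true)
loss.backward()   # usable as a training objective with SGD
```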


Entropy, 2020, Vol 22 (5), pp. 560
Author(s): Shrihari Vasudevan

This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. The MI between the output of the neural network and the true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. The idea is extended to layer-wise setting of the LR, as MI naturally provides a layer-wise performance metric. An LR range test for determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as gradient-based adaptive LR algorithms, including Adam, RMSprop, and LARS. Accuracy outcomes that are competitive with or better than these alternatives, obtained in comparable or shorter time, demonstrate the feasibility of the metric and the approach.
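
As a hedged illustration of the idea (not the paper's algorithm), the snippet below sets an epoch's learning rate from the mutual information between predicted classes and true labels. The specific scaling rule, the log-of-classes upper bound, and the constants are assumptions.

```python
# Illustrative sketch only: decay the learning rate as the mutual information
# between predictions and labels grows. The paper defines its own MI-based
# decay and LR range test; this scaling rule is an assumption.
import numpy as np
from sklearn.metrics import mutual_info_score

def mi_scaled_lr(y_true, y_pred_classes, lr_max=0.1, lr_min=1e-4):
    """Return a smaller LR as predictions carry more information about labels."""
    mi = mutual_info_score(y_true, y_pred_classes)     # in nats
    mi_max = np.log(len(np.unique(y_true)))            # an upper bound on the MI
    progress = min(mi / mi_max, 1.0) if mi_max > 0 else 1.0
    # High MI -> training is far along -> use a smaller learning rate.
    return lr_min + (lr_max - lr_min) * (1.0 - progress)

# Toy usage at the end of an epoch:
y_true = np.array([0, 1, 2, 0, 1, 2, 0, 1])
y_pred = np.array([0, 1, 2, 0, 1, 1, 0, 2])   # current model predictions
next_epoch_lr = mi_scaled_lr(y_true, y_pred)
```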

