Developing a Loss Prediction-based Asynchronous Stochastic Gradient Descent Algorithm for Distributed Training of Deep Neural Networks

Author(s): Junyu Li, Ligang He, Shenyuan Ren, Rui Mao
Electronics, 2021, Vol. 10 (22), p. 2761

Author(s): Vaios Ampelakiotis, Isidoros Perikos, Ioannis Hatzilygeroudis, George Tsihrintzis

In this paper, we present a handwritten character recognition (HCR) system that aims to recognize handwritten first-order logic formulas and create editable text files of the recognized formulas. Dense feedforward neural networks (NNs) are utilized, and their performance is examined under various training conditions and methods. More specifically, after testing three training algorithms (backpropagation, resilient propagation and stochastic gradient descent), we created and trained an NN with the stochastic gradient descent algorithm optimized by the Adam update rule, which proved to be the best, using a training set of 16,750 handwritten image samples of 28 × 28 pixels each and a test set of 7,947 samples. The final accuracy achieved is 90.13%. The general methodology consists of two stages: image processing, and NN design and training. Finally, an application has been created that implements the methodology and automatically recognizes handwritten logic formulas. An interesting feature of the application is that it allows users to create new, user-oriented training sets and parameter settings, and thus new NN models.
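
As a rough illustration of the setup described in this abstract (not the authors' code), the sketch below builds a dense feedforward network for 28 × 28 character images and trains it with the Adam update rule; the hidden-layer sizes, the class count num_classes and the dataset handling are assumptions made for the example.

```python
# Illustrative sketch only -- not the authors' implementation.
# Assumptions: num_classes, the hidden-layer sizes and the training
# hyperparameters are placeholders; dataset loading is omitted.
import tensorflow as tf

num_classes = 36  # assumption: the actual number of logic-symbol classes is not stated above

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),             # 28 x 28 grayscale input
    tf.keras.layers.Dense(256, activation="relu"),             # hidden layers (sizes illustrative)
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),  # one output per symbol class
])

# Stochastic gradient descent with the Adam update rule, as in the abstract.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# x_train: (16750, 28, 28) float32 in [0, 1]; y_train: integer labels (hypothetical arrays)
# model.fit(x_train, y_train, epochs=20, batch_size=32, validation_data=(x_test, y_test))
```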


2021
Author(s): Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao

The momentum stochastic gradient descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning (e.g., training deep neural networks, variational Bayesian inference, etc.). Despite its empirical success, there is still a lack of theoretical understanding of the convergence properties of MSGD. To fill this gap, we analyze the algorithmic behavior of MSGD via diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima. Our study shows that momentum helps escape saddle points but hurts convergence within the neighborhood of optima (unless step-size or momentum annealing is applied). Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.
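
For reference, a minimal NumPy sketch of the MSGD update analyzed above follows; the noisy-gradient oracle, the test function and all hyperparameters are illustrative choices, not taken from the paper.

```python
# Minimal sketch of momentum SGD (MSGD); illustrative only.
import numpy as np

def msgd(grad_fn, x0, lr=0.01, momentum=0.9, noise=0.01, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x) + noise * rng.standard_normal(x.shape)  # stochastic gradient
        v = momentum * v - lr * g                              # momentum accumulates past gradients
        x = x + v                                              # parameter update
    return x

# f(x, y) = (x^2 - 1)^2 + y^2 has a strict saddle at the origin and isolated
# minima at (+-1, 0); started near the saddle, MSGD escapes toward one of them.
grad = lambda p: np.array([4 * p[0] * (p[0] ** 2 - 1), 2 * p[1]])
print(msgd(grad, [1e-3, 0.0]))
```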


Author(s): Simone Göttlich, Claudia Totzeck

We propose a neural network approach to model general interaction dynamics and an adjoint-based stochastic gradient descent algorithm to calibrate its parameters. The parameter calibration problem is cast as an optimal control problem that is investigated from a theoretical and numerical point of view. We prove the existence of optimal controls, derive the corresponding first-order optimality system and formulate a stochastic gradient descent algorithm to identify parameters for given data sets. To validate the approach, we use real data sets from traffic and crowd dynamics to fit the parameters. The results are compared to forces corresponding to well-known interaction models such as the Lighthill–Whitham–Richards model for traffic and the social force model for crowd motion.
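
A minimal sketch of this calibration idea, under strong simplifying assumptions: a toy linear force model stands in for the neural network, and a finite-difference gradient stands in for the adjoint-based gradient derived in the paper; all parameter values and step sizes are illustrative.

```python
# Illustrative sketch of SGD-based parameter calibration; not the paper's adjoint code.
import numpy as np

def simulate(theta, x0, dt=0.1, steps=50):
    """Toy dynamics dx/dt = -theta[0] * x + theta[1] (stand-in for the NN interaction force)."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * (-theta[0] * x + theta[1])
        traj.append(x.copy())
    return np.array(traj)

def loss(theta, x0, observed):
    """Least-squares misfit between simulated and observed trajectories."""
    return 0.5 * np.mean((simulate(theta, x0) - observed) ** 2)

def fd_grad(theta, x0, observed, eps=1e-6):
    """Central finite differences as a simple stand-in for the adjoint gradient."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (loss(theta + e, x0, observed) - loss(theta - e, x0, observed)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
theta_true = np.array([0.8, 0.2])   # synthetic ground truth generating the "data"
theta = np.array([0.1, 0.0])        # initial guess
for _ in range(500):
    x0 = rng.uniform(-1.0, 1.0, size=3)          # random batch of initial states (the stochastic part)
    observed = simulate(theta_true, x0)
    theta -= 0.1 * fd_grad(theta, x0, observed)  # SGD step
print(theta)  # should move toward theta_true
```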

