Fractional Stochastic Gradient Descent Based Learning Algorithm For Multi-layer Perceptron Neural Networks

Author(s):  
Alishba Sadiq ◽  
Norashikin Yahya
Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2761


Author(s):  
Vaios Ampelakiotis ◽  
Isidoros Perikos ◽  
Ioannis Hatzilygeroudis ◽  
George Tsihrintzis

In this paper, we present a handwritten character recognition (HCR) system that aims to recognize first-order logic handwritten formulas and create editable text files of the recognized formulas. Dense feedforward neural networks (NNs) are utilized, and their performance is examined under various training conditions and methods. More specifically, after testing three training algorithms (backpropagation, resilient propagation and stochastic gradient descent), we created and trained an NN with the stochastic gradient descent algorithm optimized by the Adam update rule, which proved to be the best, using a training set of 16,750 handwritten image samples of 28 × 28 pixels each and a test set of 7947 samples. The final accuracy achieved is 90.13%. The general methodology consists of two stages: image processing, and NN design and training. Finally, an application has been created that implements the methodology and automatically recognizes handwritten logic formulas. An interesting feature of the application is that it allows for creating new, user-oriented training sets and parameter settings, and thus new NN models.
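A minimal Keras sketch of the kind of dense feedforward classifier described above, trained with the Adam-optimized update on 28 × 28 inputs. The hidden-layer width and the number of symbol classes are illustrative assumptions, not the authors' values.

```python
import tensorflow as tf

NUM_CLASSES = 36  # hypothetical number of logic symbols/characters

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28 x 28 image samples
    tf.keras.layers.Dense(256, activation="relu"),    # assumed hidden width
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=20, validation_data=(x_test, y_test))
```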


2021 ◽  
Author(s):  
Tianyi Liu ◽  
Zhehui Chen ◽  
Enlu Zhou ◽  
Tuo Zhao

The momentum stochastic gradient descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning (e.g., training deep neural networks, variational Bayesian inference, etc.). Despite its empirical success, there is still a lack of theoretical understanding of the convergence properties of MSGD. To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima. Our study shows that momentum helps escape from saddle points but hurts convergence within the neighborhood of optima (absent step size or momentum annealing). Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.
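The iteration under analysis is the standard MSGD update, v_{t+1} = μ v_t − η g_t, x_{t+1} = x_t + v_{t+1}, with g_t a noisy gradient. The following NumPy sketch shows its dynamics on a toy nonconvex landscape; the test function and all parameter values are illustrative assumptions.

```python
import numpy as np

def msgd_step(x, v, grad_fn, eta=0.01, mu=0.9, noise=0.1, rng=None):
    """One MSGD update with additive gradient noise."""
    rng = rng or np.random.default_rng()
    g = grad_fn(x) + noise * rng.standard_normal(x.shape)  # stochastic gradient
    v = mu * v - eta * g   # momentum accumulates past descent directions
    return x + v, v

# f(x, y) = (x^2 - 1)^2 + y^2: strict saddle at the origin,
# isolated local minima at (+/-1, 0).
grad = lambda p: np.array([4 * p[0] * (p[0]**2 - 1), 2 * p[1]])

x, v = np.zeros(2), np.zeros(2)   # start exactly at the saddle
for _ in range(2000):
    x, v = msgd_step(x, v, grad)
print(x)  # gradient noise plus momentum drives the iterate toward a minimum
```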


2021 ◽  
Author(s):  
Ruthvik Vaila

Spiking neural networks are biologically plausible counterparts of artificial neural networks. Artificial neural networks are usually trained with stochastic gradient descent (SGD), whereas spiking neural networks are trained with bioinspired spike timing dependent plasticity (STDP). Spiking networks could potentially help reduce power usage owing to their binary activations. In this work, we use unsupervised STDP in the feature extraction layers of a neural network with instantaneous neurons to extract meaningful features. The extracted binary feature vectors are then classified using classification layers containing neurons with binary activations. Gradient descent (backpropagation) is used only on the output layer to perform training for classification. Surrogate gradients are proposed to perform backpropagation through the binary activations. The accuracies obtained for MNIST and the balanced EMNIST dataset compare favorably with other approaches. The effect of the SGD approximations on the learning capabilities of our network is also explored. We also study catastrophic forgetting and its effect on spiking neural networks (SNNs). In the catastrophic forgetting experiments, the classification sections of the network use a modified synaptic-intelligence regularizer, which we refer to as the cost-per-synapse metric, to immunize the network against catastrophic forgetting in a Single-Incremental-Task (SIT) scenario; the MNIST and EMNIST handwritten digit datasets are divided into five and ten incremental subtasks, respectively. We also examine the behavior of the spiking neural network and empirically study the effect of various hyperparameters on its learning capabilities using SPYKEFLOW, a software tool that we developed. We employ the MNIST, EMNIST and NMNIST datasets to produce our results.
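The surrogate-gradient idea can be illustrated with a generic PyTorch sketch (not the SPYKEFLOW implementation): the forward pass applies a hard binary threshold, while the backward pass substitutes an assumed boxcar surrogate derivative so that backpropagation can flow through the binary nonlinearity.

```python
import torch

class BinaryAct(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()              # binary activation (0 or 1)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        surrogate = (x.abs() < 1.0).float() # pass gradient near threshold only
        return grad_out * surrogate

x = torch.randn(8, requires_grad=True)
loss = BinaryAct.apply(x).sum()
loss.backward()                             # x.grad holds surrogate gradients
```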


2021 ◽  
Author(s):  
Justin Sirignano ◽  
Konstantinos Spiliopoulos

We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution that is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on independent and identically distributed data with stochastic gradient descent under the widely used Xavier initialization.
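An illustrative NumPy sketch (not the authors' construction) of the setting studied: a single-hidden-layer network with Xavier-style 1/√fan-in initialization, updated by a semi-gradient Q-learning step toward the Bellman target. The sizes, step size, and the restriction of the update to the output weights are placeholder simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, width, n_actions = 4, 128, 2
W1 = rng.normal(0.0, 1.0 / np.sqrt(d_state), (width, d_state))  # Xavier-style
W2 = rng.normal(0.0, 1.0 / np.sqrt(width), (n_actions, width))

def q_values(s):
    return W2 @ np.tanh(W1 @ s)             # Q(s, .) for all actions

def q_learning_step(s, a, r, s_next, gamma=0.99, lr=1e-3):
    """One semi-gradient Q-learning update on the output weights."""
    target = r + gamma * np.max(q_values(s_next))   # Bellman backup
    delta = target - q_values(s)[a]                 # temporal-difference error
    W2[a] += lr * delta * np.tanh(W1 @ s)           # dQ(s, a)/dW2[a]
```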


2020 ◽  
Vol 2020 (12) ◽  
pp. 124010
Author(s):  
Sebastian Goldt ◽  
Madhu S Advani ◽  
Andrew M Saxe ◽  
Florent Krzakala ◽  
Lenka Zdeborová

2020 ◽  
Vol 34 (5) ◽  
pp. 631-636
Author(s):  
Sama Ranjeeth ◽  
Thamarai Pugazhendhi Latchoumi

The ability to predict malnutrition in children under five years of age is highly beneficial, as it enables timely remedial action. In this article, a predictive model for child malnutrition is created and tested on our own collected dataset. We detect child malnutrition using machine learning (ML) models; among them, a multilayer perceptron (MLP) is used to classify the data. The stochastic gradient descent (SGD) optimization technique is integrated with the MLP classifier to classify the data more effectively. A filter-based method from the feature selection (FS) family is used to select the best features, which are then passed to the classifier model for classification. Results with the MLP-SGD classifier were better than with the other classifiers, and performance improved further after feature selection. This will help improve the analysis of child malnutrition data. The sample data were collected from parents with children under five years of age in Repalle town, Andhra Pradesh, India.
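A minimal scikit-learn sketch of the described pipeline, assuming a tabular dataset (X_train, y_train): filter-based feature selection feeding an MLP trained with the SGD solver. The scoring function, number of selected features, and network size are placeholder assumptions.

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

pipeline = make_pipeline(
    SelectKBest(score_func=f_classif, k=10),        # filter-based FS method
    MLPClassifier(solver="sgd", hidden_layer_sizes=(32,),
                  learning_rate_init=0.01, max_iter=500),
)
# pipeline.fit(X_train, y_train)
# print(pipeline.score(X_test, y_test))
```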

