Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistance to Local Minima

1994 ◽  
Vol 6 (2) ◽  
pp. 285-295 ◽  
Author(s):  
William Finnoff

In this paper we discuss the asymptotic properties of the most commonly used variant of the backpropagation algorithm, in which network weights are trained by local gradient descent on examples drawn randomly from a fixed training set and the learning rate η of the gradient updates is held constant (simple backpropagation). Using stochastic approximation results, we show that for η → 0 this training process approaches a batch training process. Further, we show that for small η one can approximate simple backpropagation by the sum of a batch training process and a Gaussian diffusion, which is the unique solution to a linear stochastic differential equation. Using this approximation, we indicate why simple backpropagation is less likely to get stuck in local minima than the batch training process and demonstrate this empirically on a number of examples.
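
To make the limiting picture concrete, here is a sketch of the decomposition the abstract describes, in generic stochastic-approximation notation (the precise drift and covariance terms are the paper's; this is only the standard form of the result):

```latex
% Simple backpropagation: update on a randomly drawn example i_k
w_{k+1} = w_k - \eta \, \nabla E_{i_k}(w_k)

% As \eta \to 0 the iterates track the batch (mean) trajectory
\dot{\bar w}(t) = -\nabla E(\bar w(t)), \qquad E = \frac{1}{N}\sum_{i=1}^{N} E_i

% For small \eta, the rescaled fluctuation Z = (w - \bar w)/\sqrt{\eta}
% is approximated by the Gaussian diffusion solving the linear SDE
\mathrm{d}Z(t) = -\nabla^2 E(\bar w(t))\, Z(t)\, \mathrm{d}t
               + \Sigma^{1/2}(\bar w(t))\, \mathrm{d}W(t)
```

Here Σ is the covariance of the per-example gradients; the diffusion term is what allows simple backpropagation to escape shallow local minima that trap the deterministic batch process.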

2021 ◽  
Vol 4 (2) ◽  
pp. 127
Author(s):  
Untari Novia Wisesty ◽  
Febryanti Sthevanie ◽  
Rita Rismala

Early detection of cancer can increase the success of treatment in cancer patients. Recent research shows that cancer can be detected through DNA microarrays: a person with cancer exhibits changes in the expression values of certain genes. In previous studies, the Genetic Algorithm as a feature selection method and the Momentum Backpropagation algorithm as a classification method provided fairly high classification performance, but Momentum Backpropagation still converges slowly because its learning rate is static, so the training process needs more time to converge. Therefore, this research optimizes the Momentum Backpropagation algorithm by adding an adaptive learning rate scheme. The proposed scheme reduces the number of epochs needed in the training process from 390 to 76 compared with Momentum Backpropagation, and achieves high accuracy: 90.51% on Colon Tumor data and 100% on Leukemia, Lung Cancer, and Ovarian Cancer data.
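
A minimal sketch of the idea, assuming one common adaptive rule (raise the learning rate while the error falls, cut it when the error rises); the increase/decrease factors and the momentum reset are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def train_momentum_adaptive(grad_fn, loss_fn, w, lr=0.01, mu=0.9,
                            inc=1.05, dec=0.7, max_epochs=500, tol=1e-4):
    """Gradient descent with momentum and a simple adaptive learning rate.

    grad_fn(w) -> gradient, loss_fn(w) -> scalar loss (assumed callables).
    """
    v = np.zeros_like(w)
    prev_loss = loss_fn(w)
    for epoch in range(max_epochs):
        v = mu * v - lr * grad_fn(w)   # momentum update
        w = w + v
        loss = loss_fn(w)
        if loss < prev_loss:
            lr *= inc                  # error fell: accelerate
        else:
            lr *= dec                  # error rose: back off
            v = np.zeros_like(v)       # and reset the momentum term
        if loss < tol:
            break                      # stopping criterion reached
        prev_loss = loss
    return w, epoch
```

An adaptive rate of this kind is what lets training take large steps early on and still settle precisely, which is how the epoch count can drop as reported.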


2012 ◽  
Vol 09 ◽  
pp. 448-455 ◽  
Author(s):  
NORHAMREEZA ABDUL HAMID ◽  
NAZRI MOHD NAWI ◽  
ROZAIDA GHAZALI ◽  
MOHD NAJIB MOHD SALLEH

This paper presents a new method to keep the back propagation algorithm from getting stuck in local minima and to address the slow convergence caused by neuron saturation in the hidden layer. In the proposed algorithm, each training pattern has its own activation functions for the neurons in the hidden layer, adjusted through the adaptation of gain parameters together with adaptive momentum and learning rate values during the learning process. The efficiency of the proposed algorithm is compared with conventional back propagation gradient descent and the current back propagation gradient descent with adaptive gain by means of simulation on three benchmark problems, namely iris, glass, and thyroid.
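
A sketch of the gain idea for a single hidden unit: the sigmoid gets a trainable gain c, and c is updated by gradient descent alongside the weights. The plain gradient step shown is a generic illustration, not the authors' exact adaptation rule:

```python
import numpy as np

def sigmoid(net, c):
    """Logistic activation with gain parameter c: f(net) = 1/(1 + exp(-c*net))."""
    return 1.0 / (1.0 + np.exp(-c * net))

def hidden_unit_grads(x, w, c, delta):
    """Gradients of the loss w.r.t. weights and gain for one hidden unit.

    x: input vector, w: weights, c: gain, delta: error backpropagated to
    this unit's output. Uses df/dnet = c*f*(1-f) and df/dc = net*f*(1-f).
    """
    net = x @ w
    f = sigmoid(net, c)
    dw = delta * c * f * (1 - f) * x   # weight gradient
    dc = delta * net * f * (1 - f)     # gain gradient
    return dw, dc
```

Raising the gain sharpens the sigmoid's slope where the unit would otherwise saturate, which is the mechanism the paper exploits to keep gradients flowing.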


2020 ◽  
Author(s):  
B Wang ◽  
Y Sun ◽  
Bing Xue ◽  
Mengjie Zhang

Image classification is a difficult machine learning task, to which Convolutional Neural Networks (CNNs) have been applied for over 20 years. In recent years, instead of the traditional way of only connecting the current layer with its next layer, shortcut connections have been proposed that connect the current layer with layers further forward, which has been shown to facilitate the training of deep CNNs. However, since there are various ways to build shortcut connections, it is hard to manually design the best ones for a particular problem, especially given that designing the network architecture is already very challenging. In this paper, a hybrid evolutionary computation (EC) method is proposed to automatically evolve both the architecture of deep CNNs and the shortcut connections. The three major contributions of this work are: firstly, a new encoding strategy is proposed to encode a CNN, where the architecture and the shortcut connections are encoded separately; secondly, a hybrid two-level EC method, which combines particle swarm optimisation and genetic algorithms, is developed to search for the optimal CNNs; lastly, an adjustable learning rate is introduced for the fitness evaluations, which provides a better learning rate for the training process given a fixed number of epochs. The proposed algorithm is evaluated on three widely used benchmark datasets for image classification and compared with 12 non-EC based competitors and one EC based competitor. The experimental results demonstrate that the proposed method outperforms all of the peer competitors in terms of classification accuracy.
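
A hedged sketch of what "encoding the architecture and the shortcut connections separately" could look like; the gene layout, the helper `train_eval_fn`, and the depth-based learning-rate rule are our illustrative assumptions, not the paper's actual encoding:

```python
import random

def random_individual(max_layers=8):
    """Hypothetical two-part encoding: layer genes (architecture) and
    shortcut genes (binary connectivity) are kept in separate parts."""
    n = random.randint(3, max_layers)
    layers = [{"filters": random.choice([16, 32, 64, 128]),
               "kernel": random.choice([1, 3, 5])} for _ in range(n)]
    # shortcuts[i][j] == 1 adds a connection from layer i to layer j (j > i+1)
    shortcuts = [[random.randint(0, 1) if j > i + 1 else 0
                  for j in range(n)] for i in range(n)]
    return {"layers": layers, "shortcuts": shortcuts}

def fitness(individual, train_eval_fn, epochs=10, base_lr=0.1):
    """Fitness under a fixed epoch budget, with an adjustable learning rate.

    train_eval_fn(individual, lr, epochs) -> validation accuracy is an
    assumed interface; the depth-scaled rate below is only one plausible
    instance of an 'adjustable learning rate' for fitness evaluation.
    """
    lr = base_lr / max(1, len(individual["layers"]) // 4)
    return train_eval_fn(individual, lr=lr, epochs=epochs)
```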


Author(s):  
Ade chandra Saputra

One of the weaknesses of the backpropagation artificial neural network (ANN) is getting stuck in local minima. The learning rate is an important parameter in determining how fast the ANN learns. This research develops a method for finding the value of the learning rate using a genetic algorithm when ANN learning stalls and the error has not reached the stopping criterion or converged. The genetic algorithm determines the learning rate based on a fitness function computed from the ANN weights, error gradient, and bias. The fitness evaluation produces an error value for each candidate learning rate, each of which represents an individual in the genetic algorithm. Each individual is scored by its sum of squared errors (SSE); the one with the smallest SSE is the best individual. The chosen learning rate is then used to continue learning, lowering the error or speeding progress toward convergence. The final result of this study is a new solution to the common problem of choosing learning parameters in backpropagation. The results indicate that the genetic algorithm method can decrease the SSE when ANN learning has stagnated at a large error or become stuck in a local minimum.
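
A minimal sketch of the selection loop, assuming a simple real-valued GA (arithmetic crossover, multiplicative mutation); `sse_after_steps` stands in for the paper's fitness function over the ANN weights, gradient, and bias:

```python
import random

def ga_select_learning_rate(sse_after_steps, pop_size=20, generations=10,
                            lr_bounds=(1e-4, 1.0)):
    """Pick a learning rate with a simple genetic algorithm.

    sse_after_steps(lr) -> SSE after resuming training with that rate
    (assumed callable). Individuals are candidate learning rates; the
    smallest SSE marks the best individual.
    """
    lo, hi = lr_bounds
    pop = [random.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=sse_after_steps)   # smallest SSE first
        parents = scored[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = 0.5 * (a + b)                   # arithmetic crossover
            child *= random.uniform(0.8, 1.25)      # multiplicative mutation
            children.append(min(max(child, lo), hi))
        pop = parents + children
    return min(pop, key=sse_after_steps)
```

Training would pause when the error curve flattens, call this search, and resume with the returned rate.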


2021 ◽  
Author(s):  
Kun-Cheng Ke ◽  
Ming-Shyan Huang

Injection molding is broadly used in the mass production of plastic parts and must meet requirements of efficiency and quality consistency. Machine learning can effectively predict the quality of injection-molded parts; however, the performance of machine learning models largely depends on the accuracy of training. Hyperparameters such as the activation function, momentum, and learning rate are crucial to the accuracy and efficiency of model training. This research analyzed the influence of these hyperparameters on testing accuracy, explored the corresponding optimal learning rates, and provides an optimal training model for predicting the quality of injection-molded parts. In this study, stochastic gradient descent (SGD) and SGD with momentum were used to optimize an artificial neural network model. Through optimization of these training hyperparameters, the testing accuracy for predicting the width of the injection-molded part improved. The experimental results indicated that, in the absence of momentum, all five activation functions achieved more than 90% training accuracy with a learning rate of 0.1. Moreover, when optimized with SGD at a learning rate of 0.1, the Sigmoid activation function reached a testing accuracy of 95.8%. Although momentum had the least influence on accuracy, it affected the convergence speed of the Sigmoid function, reducing the number of required learning iterations by 82.4%. Optimizing hyperparameter settings can thus improve model testing accuracy and markedly reduce training time.
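
For reference, the two update rules the study compares, in a minimal sketch (parameter values here are the study's reported settings; the function names are ours):

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain stochastic gradient descent."""
    return w - lr * grad

def sgd_momentum_step(w, v, grad, lr=0.1, momentum=0.9):
    """SGD with momentum: the velocity v accumulates past gradients,
    smoothing and accelerating updates. This acceleration is what cut
    the Sigmoid network's iteration count in the study."""
    v = momentum * v - lr * grad
    return w + v, v
```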


2021 ◽  
Author(s):  
Ryan Santoso ◽  
Xupeng He ◽  
Marwa Alsinan ◽  
Hyung Kwak ◽  
Hussein Hoteit

Automatic fracture recognition from borehole images or outcrops is applicable to the construction of fractured reservoir models. Deep learning for fracture recognition is subject to uncertainty due to sparse and imbalanced training sets and random initialization. We present a new workflow to optimize a deep learning model under uncertainty using U-Net, considering both the epistemic and aleatoric uncertainty of the model. We propose a U-Net architecture with a dropout layer inserted after every "weighting" layer and vary the dropout probability to investigate its impact on the uncertainty response. We build the training set and assign a uniform distribution to each training parameter, such as the number of epochs, batch size, and learning rate. We then perform uncertainty quantification by running the model multiple times for each realization, capturing the aleatoric response. In this approach, which is based on Monte Carlo Dropout, the variance map and F1-scores are used to decide whether to craft additional augmentations or stop the process. This work demonstrates that uncertainty within deep learning, caused by sparse and imbalanced training sets, leads to unstable predictions. The overall responses are accommodated in the form of aleatoric uncertainty. Our workflow uses the uncertainty response (variance map) as a measure for crafting additional augmentations of the training set. High variance in certain features indicates the need to add new augmented images containing those features, either through affine transformations (rotation, translation, and scaling) or by using similar images. The augmentation improves prediction accuracy, reduces prediction variance, and stabilizes the output. The architecture, number of epochs, batch size, and learning rate are optimized under a fixed but uncertain training set; we perform the optimization by searching for the global maximum of accuracy across multiple realizations. Besides the quality of the training set, the learning rate is the heavy-hitter in the optimization process: the selected learning rate controls the diffusion of information through the model, and under imbalanced conditions a fast learning rate causes the model to miss the main features. Another challenge in fracture recognition on a real outcrop is optimally picking the parental images for the initial training set. We suggest picking images from multiple sides of the outcrop that show significant variation in the features; this avoids long iterations within the workflow. We introduce a new approach to address the uncertainties associated with the training process and with the physical problem. The proposed approach is general in concept and can be applied to various deep-learning problems in geoscience.
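
The Monte Carlo Dropout step at the heart of the workflow is easy to state in code; a minimal sketch, assuming `predict_fn` wraps the U-Net with dropout kept active at inference (an assumed interface):

```python
import numpy as np

def mc_dropout_uncertainty(predict_fn, image, n_runs=30):
    """Monte Carlo Dropout: run the network n_runs times with dropout
    sampled anew on each forward pass, then summarize per pixel.

    predict_fn(image) -> fracture-probability map (assumed callable).
    Returns the averaged prediction and the variance (uncertainty) map.
    """
    samples = np.stack([predict_fn(image) for _ in range(n_runs)])
    mean_map = samples.mean(axis=0)
    variance_map = samples.var(axis=0)
    return mean_map, variance_map
```

High values in `variance_map` flag the features the model is unsure about; the workflow then adds augmented images containing exactly those features.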


1973 ◽  
Vol 26 (9) ◽  
pp. 1955 ◽  
Author(s):  
RJ Mathews

A simplified learning machine technique is presented which is suitable for forming preliminary classifications of patterns. This technique allows preliminary compression of the data prior to the training process and generates a reliable classifier even when the training set contains linearly inseparable data. The method has been used to form an eight-feature pattern classifier which identifies, directly from their mass spectra, compounds of the structure (RO)2P(=X)Y where R is H, Me, or Et; X is O or S; and Y is any functional group.


1989 ◽  
Vol 01 (02) ◽  
pp. 187-192 ◽  
Author(s):  
H.-U. Bauer ◽  
T. Geisel

We present a model for motion and direction detection of moving pulses whose performance is independent of pulse velocity, size and shape. The input signal activates one row of instantaneous nodes and one row of time integrating input nodes acting as short-term memories. Motion detection is achieved locally by subnetworks which are trained with a synthetic training set using the backpropagation algorithm. The global network is constructed from these subnetworks, one for each position. We test its performance with different pulse shapes and sizes and find the response to be invariant in a window of pulse velocities an order of magnitude wide. The window can be shifted by adjusting the memory time of the input nodes.
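
One common way to realize a "time-integrating input node acting as a short-term memory" is a leaky integrator; a minimal sketch under that assumption (the abstract does not specify the exact integration rule):

```python
import numpy as np

def leaky_integrator(signal, tau=10.0):
    """Time-integrating input node: y[t] = (1 - 1/tau) * y[t-1] + x[t].

    A larger tau lengthens the memory time, which is how the velocity
    window described in the abstract can be shifted.
    """
    y = np.zeros_like(signal, dtype=float)
    y[0] = signal[0]
    decay = 1.0 - 1.0 / tau
    for t in range(1, len(signal)):
        y[t] = decay * y[t - 1] + signal[t]
    return y
```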


1994 ◽  
Vol 05 (01) ◽  
pp. 67-75 ◽  
Author(s):  
BYOUNG-TAK ZHANG

Much previous work on training multilayer neural networks has attempted to speed up the backpropagation algorithm using more sophisticated weight modification rules, whereby all the given training examples are used in a random or predetermined sequence. In this paper we investigate an alternative approach in which the learning proceeds on an increasing number of selected training examples, starting with a small training set. We derive a measure of criticality of examples and present an incremental learning algorithm that uses this measure to select a critical subset of given examples for solving the particular task. Our experimental results suggest that the method can significantly improve training speed and generalization performance in many real applications of neural networks. This method can be used in conjunction with other variations of gradient descent algorithms.
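
A hedged sketch of the incremental scheme: train on a small subset, score the remaining examples by a criticality measure, and add the most critical ones. Here criticality is taken to be the current prediction error, one plausible instantiation; the paper derives its own measure, and the `fit`/`predict` interface is an assumption:

```python
import numpy as np

def incremental_training(model, X, y, init_size=50, grow=25, rounds=10):
    """Train on a growing, criticality-selected subset of the examples."""
    idx = np.random.permutation(len(X))
    selected = list(idx[:init_size])
    pool = list(idx[init_size:])
    for _ in range(rounds):
        model.fit(X[selected], y[selected])
        if not pool:
            break
        # Criticality proxy: absolute prediction error on unused examples.
        errors = np.abs(model.predict(X[pool]) - y[pool])
        errors = errors.reshape(len(pool), -1).sum(axis=1)
        ranked = np.argsort(errors)[::-1]          # most critical first
        take = [pool[i] for i in ranked[:grow]]
        pool = [pool[i] for i in ranked[grow:]]
        selected.extend(take)
    return model
```

Starting small and adding only informative examples is what yields the reported gains in training speed and generalization.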

