ACCELERATED LEARNING BY ACTIVE EXAMPLE SELECTION

1994 ◽  
Vol 05 (01) ◽  
pp. 67-75 ◽  
Author(s):  
BYOUNG-TAK ZHANG

Much previous work on training multilayer neural networks has attempted to speed up the backpropagation algorithm using more sophisticated weight modification rules, whereby all the given training examples are used in a random or predetermined sequence. In this paper we investigate an alternative approach in which the learning proceeds on an increasing number of selected training examples, starting with a small training set. We derive a measure of criticality of examples and present an incremental learning algorithm that uses this measure to select a critical subset of given examples for solving the particular task. Our experimental results suggest that the method can significantly improve training speed and generalization performance in many real applications of neural networks. This method can be used in conjunction with other variations of gradient descent algorithms.
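The paper's exact criticality measure is derived in the text and not reproduced in this abstract; as a rough illustrative stand-in, one can grow the training subset by repeatedly adding the examples the current network handles worst. A minimal sketch of that selection step, assuming per-example error scores as the criticality proxy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: per-example errors of some current network on 100 examples.
# Using the raw error as the selection score is only an illustrative
# stand-in for the paper's derived criticality measure.
errors = rng.uniform(size=100)

def grow_subset(errors, subset, n_add):
    """Add the n_add not-yet-selected examples with the largest score."""
    masked = errors.copy()
    masked[list(subset)] = -np.inf          # exclude already-chosen examples
    new = np.argsort(masked)[-n_add:]       # indices of the most critical examples
    return subset | {int(i) for i in new}

subset = set(range(5))                      # start training on a small set
subset = grow_subset(errors, subset, 5)     # then add 5 critical examples
print(len(subset))                          # 10
```

Training then proceeds on `subset` until the next selection round, so early epochs touch far fewer examples than plain backpropagation over the full set.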

2014 ◽  
pp. 99-106
Author(s):  
Leonid Makhnist ◽  
Nikolaj Maniakov

Two new techniques for training multilayer neural networks are proposed. Both are based on the gradient descent method. For each technique, formulas for calculating the adaptive training steps are derived. Matrix formulations of all of these techniques are presented, which are very helpful for their software implementation.
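The step formulas themselves are not given in this abstract. As a generic illustration of what an adaptive training step looks like in gradient descent, the sketch below recomputes the step length at every iteration using the exact line-search step for a quadratic objective, eta = (g·g)/(g·Ag); this is a standard formula standing in for the authors' own:

```python
import numpy as np

# Minimize E(w) = 0.5 * w^T A w - b^T w by gradient descent with a step
# length chosen afresh each iteration (the "adaptive training step").
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w = np.zeros(2)
for _ in range(50):
    g = A @ w - b                       # gradient of E at the current w
    if g @ g < 1e-20:                   # already at the minimum
        break
    eta = (g @ g) / (g @ (A @ g))       # adaptive step: exact line search
    w = w - eta * g

# The minimizer solves A w = b.
print(np.allclose(w, np.linalg.solve(A, b), atol=1e-8))   # True
```

The matrix form (everything expressed as products with `A` and `g`) is what makes such methods convenient to implement, which is the point the abstract emphasizes.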


1994 ◽  
Vol 05 (02) ◽  
pp. 115-122
Author(s):  
MOSTEFA GOLEA

We describe a Hebb-type algorithm for learning unions of nonoverlapping perceptrons with binary weights. Two perceptrons are said to be nonoverlapping if they do not share any input variables. The learning algorithm is able to find both the network architecture and the weight values necessary to represent the target function. Moreover, the algorithm is local, homogeneous, and simple enough to be biologically plausible. We investigate the average behavior of this algorithm as a function of the size of the training set. We find that, as the size of the training set increases, the hypothesis network built by the algorithm “converges” to the target network, both in terms of the number of perceptrons and the connectivity. Moreover, the generalization rate converges exponentially to perfect generalization as a function of the number of training examples. The analytic expressions are in excellent agreement with the numerical simulations. To our knowledge, this is the first average case analysis of an algorithm that finds both the weight values and the network connectivity.
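The target class itself is easy to make concrete. A small sketch of a union (logical OR) of two perceptrons with binary weights over disjoint input variables — the weights and thresholds below are invented for illustration, not taken from the paper:

```python
import numpy as np

def perceptron(x, w, theta):
    """Threshold unit with binary weights."""
    return x @ w >= theta

w1 = np.array([1, 1, 1, 0, 0, 0])   # reads only inputs 0..2
w2 = np.array([0, 0, 0, 1, 1, 1])   # reads only inputs 3..5
assert not np.any(w1 & w2)          # nonoverlapping: no shared input variable

def target(x):
    """Union of the two perceptrons."""
    return bool(perceptron(x, w1, 2) | perceptron(x, w2, 2))

print(target(np.array([1, 1, 0, 0, 0, 0])))   # True: first perceptron fires
print(target(np.array([1, 0, 0, 1, 0, 0])))   # False: neither reaches threshold
```

The learner's task, per the abstract, is to recover both this connectivity (which inputs feed which perceptron) and the binary weight values from examples alone.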


1994 ◽  
Vol 6 (3) ◽  
pp. 469-490 ◽  
Author(s):  
K. P. Unnikrishnan ◽  
K. P. Venugopal

We present a learning algorithm for neural networks, called Alopex. Instead of the error gradient, Alopex uses local correlations between changes in individual weights and changes in the global error measure. The algorithm does not make any assumptions about transfer functions of individual neurons, and does not explicitly depend on the functional form of the error measure. Hence, it can be used in networks with arbitrary transfer functions and for minimizing a large class of error measures. The learning algorithm is the same for feedforward and recurrent networks. All the weights in a network are updated simultaneously, using only local computations. This allows complete parallelization of the algorithm. The algorithm is stochastic and it uses a “temperature” parameter in a manner similar to that in simulated annealing. A heuristic “annealing schedule” is presented that is effective in finding global minima of error surfaces. In this paper, we report extensive simulation studies illustrating these advantages and show that learning times are comparable to those for standard gradient descent methods. Feedforward networks trained with Alopex are used to solve the MONK's problems and symmetry problems. Recurrent networks trained with the same algorithm are used for solving temporal XOR problems. Scaling properties of the algorithm are demonstrated using encoder problems of different sizes and advantages of appropriate error measures are illustrated using a variety of problems.


1994 ◽  
Vol 05 (02) ◽  
pp. 153-156
Author(s):  
R. MONASSON

A learning algorithm for the two-layered committee machine is proposed. The proof of its convergence in a finite time is given. Its efficiency is compared to the simple exhaustive enumeration of the internal representations of the training set.


2021 ◽  
Author(s):  
Justin Sirignano ◽  
Konstantinos Spiliopoulos

We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution that is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on independent and identically distributed data with stochastic gradient descent under the widely used Xavier initialization.
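Xavier (Glorot) initialization, mentioned in the last sentence, is a standard scheme: weights are drawn with variance 2/(fan_in + fan_out), e.g. uniformly on (-a, a) with a = sqrt(6/(fan_in + fan_out)). A minimal sketch:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform init: draws in U(-a, a) with
    a = sqrt(6 / (fan_in + fan_out)), so the weight variance
    is a^2 / 3 = 2 / (fan_in + fan_out)."""
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = xavier_uniform(512, 512, rng)
# Empirical variance matches the designed 2/(512+512).
print(abs(float(W.var()) - 2.0 / 1024.0) < 1e-4)   # True
```

Keeping the weight variance tied to the layer widths in this way is what makes the large-width limit studied in the paper well behaved.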

