A Convergence Result for Learning in Recurrent Neural Networks

1994 ◽  
Vol 6 (3) ◽  
pp. 420-440 ◽  
Author(s):  
Chung-Ming Kuan ◽  
Kurt Hornik ◽  
Halbert White

We give a rigorous analysis of the convergence properties of a backpropagation algorithm for recurrent networks containing either output or hidden-layer recurrence. The conditions permit data generated by stochastic processes with considerable dependence. Restrictions are offered that may help ensure convergence of the network parameters to a local optimum, as some simulations illustrate.
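
As a rough illustration of the setting (not the authors' algorithm or conditions), the sketch below runs constant-step-size stochastic gradient updates for a small network with output recurrence, y_t = tanh(W x_t + V y_{t-1}), treating the fed-back output as a constant during each update (a one-step truncation); the data and target are toy placeholders.

    import numpy as np

    # Hedged sketch: one-step-truncated gradient updates for a network with
    # output recurrence, y_t = tanh(W x_t + V y_{t-1}).  Toy data; not the
    # algorithm analyzed in the paper, only the general shape of such updates.
    rng = np.random.default_rng(0)
    n_in, n_out = 3, 2
    W = 0.1 * rng.standard_normal((n_out, n_in))
    V = 0.1 * rng.standard_normal((n_out, n_out))
    y_prev = np.zeros(n_out)
    eta = 0.01  # constant learning rate

    for t in range(500):
        x_t = rng.standard_normal(n_in)
        target = np.tanh(x_t[:n_out])              # placeholder target
        y_t = np.tanh(W @ x_t + V @ y_prev)
        delta = (y_t - target) * (1.0 - y_t ** 2)  # output error times tanh'
        W -= eta * np.outer(delta, x_t)            # gradient w.r.t. W
        V -= eta * np.outer(delta, y_prev)         # y_prev treated as constant
        y_prev = y_t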

2004 ◽  
Vol 213 ◽  
pp. 483-486
Author(s):  
David Brodrick ◽  
Douglas Taylor ◽  
Joachim Diederich

A recurrent neural network was trained to detect the time-frequency domain signature of narrowband radio signals against a background of astronomical noise. The objective was to investigate the use of recurrent networks for signal detection in the Search for Extra-Terrestrial Intelligence, though the problem is closely analogous to the detection of some classes of Radio Frequency Interference in radio astronomy.
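
No implementation details are given in the abstract; as an illustration of the kind of input such a detector might see, the sketch below (with an assumed sample rate and a toy narrowband tone) turns a noisy time series into log-spectrogram frames that a recurrent classifier could consume one frame per time step.

    import numpy as np
    from scipy.signal import spectrogram

    # Illustrative preprocessing only; the paper's actual pipeline is not given.
    fs = 8000.0                                   # assumed sample rate (Hz)
    t = np.arange(0, 2.0, 1.0 / fs)
    background = np.random.randn(t.size)          # broadband noise stand-in
    narrowband = 0.05 * np.sin(2 * np.pi * 1500.0 * t)  # weak narrowband tone
    x = background + narrowband

    f, frame_times, Sxx = spectrogram(x, fs=fs, nperseg=256)
    frames = np.log1p(Sxx).T                      # shape (n_frames, n_freq_bins)
    # frames[k] is the time-frequency feature vector for step k; an RNN would
    # consume these sequentially and output a signal-present / noise-only label.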


2003 ◽  
Vol 15 (8) ◽  
pp. 1897-1929 ◽  
Author(s):  
Barbara Hammer ◽  
Peter Tiňo

Recent experimental studies indicate that recurrent neural networks initialized with “small” weights are inherently biased toward definite memory machines (Tiňo, Čerňanský, & Beňušková, 2002a, 2002b). This article establishes a theoretical counterpart: the transition function of a recurrent network with small weights and a squashing activation function is a contraction. We prove that recurrent networks with a contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite memory machine. Conversely, every definite memory machine can be simulated by a recurrent network with a contractive transition function. Hence, initialization with small weights induces an architectural bias into learning with recurrent neural networks. This bias might have benefits from the point of view of statistical learning theory: it emphasizes one possible region of the weight space where generalization ability can be formally proved. It is well known that standard recurrent neural networks are not distribution independent learnable in the probably approximately correct (PAC) sense if arbitrary precision and inputs are considered. We prove that recurrent networks with a contractive transition function and a fixed contraction parameter fulfill the so-called distribution independent uniform convergence of empirical distances property and hence, unlike general recurrent networks, are distribution independent PAC learnable.
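
A small numerical check of the central observation, assuming a tanh transition f(s, x) = tanh(W s + U x + b): since tanh is 1-Lipschitz, the distance between two states shrinks by at least the spectral norm of W after one step, so "small" weights (norm below one) make the transition a contraction. The dimensions and values below are arbitrary.

    import numpy as np

    # For f(s, x) = tanh(W s + U x + b), tanh is 1-Lipschitz, so
    # ||f(s1, x) - f(s2, x)|| <= ||W||_2 * ||s1 - s2||.
    # With small weights the spectral norm ||W||_2 < 1 and f contracts in s.
    rng = np.random.default_rng(1)
    n_state, n_in = 8, 4
    W = 0.05 * rng.standard_normal((n_state, n_state))   # "small" recurrent weights
    U = rng.standard_normal((n_state, n_in))
    b = rng.standard_normal(n_state)

    contraction = np.linalg.norm(W, 2)                   # Lipschitz constant in s
    s1, s2 = rng.standard_normal(n_state), rng.standard_normal(n_state)
    x = rng.standard_normal(n_in)
    d_before = np.linalg.norm(s1 - s2)
    d_after = np.linalg.norm(np.tanh(W @ s1 + U @ x + b) - np.tanh(W @ s2 + U @ x + b))
    print(contraction, d_after <= contraction * d_before)  # contraction < 1, check holds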


2021 ◽  
Author(s):  
Bojian Yin ◽  
Federico Corradi ◽  
Sander M. Bohté

Inspired by more detailed modeling of biological neurons, spiking neural networks (SNNs) have been investigated both as more biologically plausible and potentially more powerful models of neural computation, and with the aim of capturing the energy efficiency of biological neurons; the performance of such networks has, however, remained lacking compared to classical artificial neural networks (ANNs). Here, we demonstrate how a novel surrogate gradient combined with recurrent networks of tunable and adaptive spiking neurons yields state-of-the-art performance for SNNs on challenging time-domain benchmarks such as speech and gesture recognition. This also exceeds the performance of standard classical recurrent neural networks (RNNs) and approaches that of the best modern ANNs. As these SNNs exhibit sparse spiking, we show that they are theoretically one to three orders of magnitude more computationally efficient than RNNs with comparable performance. Together, this positions SNNs as an attractive solution for AI hardware implementations.
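
The paper's specific surrogate gradient and adaptive neuron model are not reproduced here; the sketch below only illustrates the generic surrogate-gradient idea, a hard threshold in the forward pass paired with a smooth pseudo-derivative (a fast-sigmoid form is assumed) in the backward pass, for a single leaky integrate-and-fire unit.

    import numpy as np

    # Generic surrogate-gradient sketch (not the paper's exact surrogate or
    # adaptive neuron model): the forward pass emits a hard spike whose true
    # derivative is zero almost everywhere; backprop substitutes a smooth
    # pseudo-derivative peaked at the threshold.
    def spike(v, theta=1.0):
        return (v > theta).astype(float)

    def surrogate_grad(v, theta=1.0, beta=10.0):
        # assumed fast-sigmoid pseudo-derivative of the spike nonlinearity
        return 1.0 / (1.0 + beta * np.abs(v - theta)) ** 2

    def lif_step(v, x, w, tau=0.9):
        v = tau * v + w * x        # leaky integration of the input current
        s = spike(v)               # non-differentiable spike emission
        v = v - s                  # soft reset after a spike
        return v, s

    # During backpropagation through time, every occurrence of d s / d v is
    # replaced by surrogate_grad(v) so that errors can flow through spikes.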


2009 ◽  
Vol 19 (02) ◽  
pp. 115-125 ◽  
Author(s):  
GHEORGHE PUSCASU ◽  
BOGDAN CODRES ◽  
ALEXANDRU STANCU ◽  
GABRIEL MURARIU

A novel approach for the identification of complex nonlinear systems based on internal recurrent neural networks (IRNN) is proposed in this paper. The computational complexity of neural identification can be greatly reduced if the whole system is decomposed into several subsystems. This approach employs internal state estimation when no sensor measurements of the system states are available. A modified backpropagation algorithm is introduced in order to train the IRNN for nonlinear system identification. The performance of the proposed design approach is demonstrated on a car simulator case study.
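
The IRNN structure and modified backpropagation rule are not reproduced here; the sketch below only illustrates the decomposition idea, giving each subsystem its own small recurrent model whose hidden state stands in for the unmeasured physical states. Class and subsystem names are hypothetical.

    import numpy as np

    # Sketch of the decomposition idea only (not the authors' IRNN design):
    # each subsystem gets its own small recurrent model; its hidden state
    # acts as an internal estimate of the unmeasured subsystem states.
    class SubsystemModel:
        def __init__(self, n_in, n_state, n_out, seed=0):
            rng = np.random.default_rng(seed)
            self.Wx = 0.1 * rng.standard_normal((n_state, n_in))
            self.Ws = 0.1 * rng.standard_normal((n_state, n_state))
            self.Wo = 0.1 * rng.standard_normal((n_out, n_state))
            self.s = np.zeros(n_state)          # internal state estimate

        def step(self, u):
            self.s = np.tanh(self.Wx @ u + self.Ws @ self.s)
            return self.Wo @ self.s             # predicted subsystem output

    # e.g. a car split into hypothetical drivetrain and steering submodels,
    # each trained on its own input/output data.
    drivetrain = SubsystemModel(n_in=2, n_state=4, n_out=1, seed=1)
    steering = SubsystemModel(n_in=2, n_state=3, n_out=1, seed=2)
    speed = drivetrain.step(np.array([0.3, -0.1]))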


Acta Numerica ◽  
1994 ◽  
Vol 3 ◽  
pp. 145-202 ◽  
Author(s):  
S.W. Ellacott

This article starts with a brief introduction to neural networks for those unfamiliar with the basic concepts, together with a very brief overview of mathematical approaches to the subject. This is followed by a more detailed look at three areas of research which are of particular interest to numerical analysts. The first area is approximation theory. If K is a compact set in ℝ^n, for some n, then it is proved that a semilinear feedforward network with one hidden layer can uniformly approximate any continuous function in C(K) to any required accuracy. A discussion of known results and open questions on the degree of approximation is included. We also consider the relevance of radial basis functions to neural networks. The second area considered is that of learning algorithms. A detailed analysis of one popular algorithm (the delta rule) will be given, indicating why one implementation leads to a stable numerical process, whereas an initially attractive variant (essentially a form of steepest descent) does not. Similar considerations apply to the backpropagation algorithm. The effect of filtering and other preprocessing of the input data will also be discussed systematically. Finally, some applications of neural networks to numerical computation are considered.
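
As a minimal illustration of the learning algorithm whose stability the article analyzes, the sketch below runs the delta rule in its online, pattern-by-pattern form for a single linear unit on synthetic data; the batch steepest-descent variant contrasted in the article is not shown.

    import numpy as np

    # Online delta rule for a single linear unit: after each pattern x with
    # desired output d, update w <- w + eta * (d - w.x) * x.  Synthetic data.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))              # input patterns
    w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    d = X @ w_true                                 # desired outputs

    w = np.zeros(5)
    eta = 0.05                                     # small step keeps the process stable
    for epoch in range(20):
        for x, target in zip(X, d):
            w += eta * (target - w @ x) * x        # delta rule update

    print(np.round(w, 2))                          # close to w_true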


2002 ◽  
Vol 14 (8) ◽  
pp. 1907-1927 ◽  
Author(s):  
Alex Aussem

This article extends previous analysis of the gradient decay to a class of discrete-time fully recurrent networks, called dynamical recurrent neural networks, obtained by modeling synapses as finite impulse response (FIR) filters instead of multiplicative scalars. Using elementary matrix manipulations, we provide an upper bound on the norm of the weight matrix, ensuring that the gradient vector, when propagated in a reverse manner in time through the error-propagation network, decays exponentially to zero. This bound applies to all recurrent FIR architecture proposals, as well as fixed-point recurrent networks, regardless of delay and connectivity. In addition, we show that the computational overhead of the learning algorithm can be reduced drastically by taking advantage of the exponential decay of the gradient.
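
A quick numeric illustration of the phenomenon the bound formalizes, in the plain fixed-point recurrent case rather than the FIR setting of the article: when the spectral norm of the recurrent weight matrix is below one (and the activation derivative is bounded by one), the backpropagated error shrinks geometrically with the time lag.

    import numpy as np

    # Backpropagating an error vector k steps multiplies it by k Jacobians,
    # each of norm at most ||W||_2 (activation derivative bounded by 1 assumed),
    # so its norm decays roughly like ||W||_2 ** k once ||W||_2 < 1.
    rng = np.random.default_rng(0)
    n = 10
    W = rng.standard_normal((n, n))
    W *= 0.8 / np.linalg.norm(W, 2)        # rescale so the spectral norm is 0.8

    g = rng.standard_normal(n)             # error signal at the final time step
    norms = []
    for k in range(30):
        norms.append(np.linalg.norm(g))
        g = W.T @ g                        # propagate back one time step

    print(norms[0], norms[10], norms[29])  # approximately geometric decay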


1989 ◽  
Vol 1 (4) ◽  
pp. 552-558 ◽  
Author(s):  
David Zipser

An algorithm, called RTRL, for training fully recurrent neural networks has recently been studied by Williams and Zipser (1989a, b). Whereas RTRL has been shown to have great power and generality, it has the disadvantage of requiring a great deal of computation time. A technique is described here for reducing the amount of computation required by RTRL without changing the connectivity of the networks. This is accomplished by dividing the original network into subnets for the purpose of error propagation while leaving them undivided for activity propagation. An example is given of a 12-unit network that learns to be the finite-state part of a Turing machine and runs 10 times faster using the subgrouping strategy than the original algorithm.
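
A back-of-envelope comparison of the per-step operation counts, under one common accounting of RTRL's cost (assumed here for illustration; the paper itself reports the empirical 10x speedup rather than this formula): full RTRL updates n x n^2 sensitivities with an O(n)-term sum each, while subgrouping restricts both the tracked weights and the recurrent sum to each subnet.

    # Back-of-envelope operation counts per time step; one common accounting,
    # assumed for illustration rather than taken from the paper.
    def rtrl_ops(n):
        # n units, n*n weights, n^3 sensitivities, each updated via an O(n) sum
        return n ** 4

    def subgrouped_rtrl_ops(n, g):
        # g subnets of m = n // g units: each tracks its m units' sensitivities
        # w.r.t. its own m*n incoming weights, with the recurrent sum limited
        # to the m units of the subnet.
        m = n // g
        return g * (m * (m * n) * m)

    n_units, n_groups = 12, 3
    print(rtrl_ops(n_units),                       # 20736
          subgrouped_rtrl_ops(n_units, n_groups),  # 2304
          rtrl_ops(n_units) / subgrouped_rtrl_ops(n_units, n_groups))  # 9.0
    # The ratio grows as g**2, broadly consistent with the reported ~10x speedup.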


2016 ◽  
Author(s):  
Thomas Miconi

Neural activity during cognitive tasks exhibits complex dynamics that flexibly encode task-relevant variables. Chaotic recurrent networks, which spontaneously generate rich dynamics, have been proposed as a model of cortical computation during cognitive tasks. However, existing methods for training these networks are either biologically implausible or require a continuous, real-time error signal to guide learning. Here we show that a biologically plausible learning rule can train such recurrent networks, guided solely by delayed, phasic rewards at the end of each trial. Networks endowed with this learning rule can successfully learn nontrivial tasks requiring flexible (context-dependent) associations, memory maintenance, nonlinear mixed selectivities, and coordination among multiple outputs. The resulting networks replicate complex dynamics previously observed in animal cortex, such as dynamic encoding of task features and selective integration of sensory inputs. We conclude that recurrent neural networks offer a plausible model of cortical dynamics during both learning and performance of flexible behavior.
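
The exact plasticity rule is not reproduced here; the sketch below only illustrates a rule of the same family, node-perturbation with an eligibility trace that is gated by a single delayed reward (relative to a running baseline) at the end of each trial. The network, perturbation size, and toy reward are arbitrary choices.

    import numpy as np

    # Delayed-reward, node-perturbation-style sketch (illustrative, not the
    # paper's rule): accumulate an eligibility trace during the trial, then
    # scale it by the end-of-trial reward minus a running baseline.
    rng = np.random.default_rng(0)
    n = 50
    W = rng.standard_normal((n, n)) / np.sqrt(n)
    r_baseline, eta = 0.0, 1e-3

    def run_trial(W):
        x = 0.1 * rng.standard_normal(n)
        trace = np.zeros_like(W)
        for t in range(100):
            noise = 0.1 * rng.standard_normal(n)   # exploratory perturbation
            x_pre = x
            x = np.tanh(W @ x_pre) + noise
            trace += np.outer(noise, x_pre)        # perturbation x presynaptic activity
        reward = -np.mean((x - 0.5) ** 2)          # toy objective: final state near 0.5
        return trace, reward

    for trial in range(200):
        trace, reward = run_trial(W)
        W += eta * (reward - r_baseline) * trace   # reward-gated update at trial end
        r_baseline += 0.05 * (reward - r_baseline) # running reward baseline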


2021 ◽  
Author(s):  
Quan Wan ◽  
Jorge A. Menendez ◽  
Bradley R. Postle

How does the brain prioritize among the contents of working memory to appropriately guide behavior? Using inverted encoding modeling (IEM), previous work (Wan et al., 2020) showed that unprioritized memory items (UMI) are actively represented in the brain, but in a “flipped”, or opposite, format compared to prioritized memory items (PMI). To gain insight into the mechanisms underlying the UMI-to-PMI representational transformation, we trained recurrent neural networks (RNNs) with an LSTM architecture to perform a 2-back working memory task. Visualization of the LSTM hidden layer activity using Principal Component Analysis (PCA) revealed that the UMI representation is rotationally remapped to that of the PMI, and this was quantified and confirmed via demixed PCA. The application of the same analyses to the EEG dataset of Wan et al. (2020) revealed similar rotational remapping between the UMI and PMI representations. These results identify rotational remapping as a candidate neural computation employed in the dynamic prioritization among the contents of working memory.
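
The training setup is not reproduced here; the sketch below only illustrates the analysis step, projecting hidden-state trajectories onto their leading principal components so that UMI- and PMI-labeled trajectories can be compared for a rotation. Shapes and the random stand-in for LSTM activity are illustrative.

    import numpy as np

    # PCA of hidden-state trajectories (illustrative shapes; random data stands
    # in for LSTM hidden activity recorded during the 2-back task).
    rng = np.random.default_rng(0)
    n_trials, n_time, n_hidden = 200, 12, 64
    H = rng.standard_normal((n_trials, n_time, n_hidden))

    mu = H.reshape(-1, n_hidden).mean(axis=0)
    X = H.reshape(-1, n_hidden) - mu             # center across all samples
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    pcs = Vt[:2]                                 # top two principal axes
    proj = (H - mu) @ pcs.T                      # (n_trials, n_time, 2) trajectories
    # Comparing the 2-D trajectories of UMI-labeled and PMI-labeled trials
    # (e.g. fitting a rotation between them) is the kind of test used to
    # quantify rotational remapping.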


2005 ◽  
Vol 15 (06) ◽  
pp. 435-443 ◽  
Author(s):  
XIAOMING CHEN ◽  
ZHENG TANG ◽  
CATHERINE VARIAPPAN ◽  
SONGSONG LI ◽  
TOSHIMI OKADA

The complex-valued backpropagation algorithm has been widely used in fields such as telecommunications, speech recognition, and image processing with Fourier transforms. However, learning frequently becomes trapped in local minima. To solve this problem and to speed up the learning process, we propose a modified error function that adds to the conventional error function a term corresponding to the hidden-layer error. The simulation results show that the proposed algorithm is capable of preventing learning from becoming stuck in local minima and of speeding up learning.
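
The exact hidden-layer term is not reproduced here; the sketch below only shows the general shape of the idea, augmenting the usual output error of a small complex-valued network with a hidden-layer penalty, E = E_out + lam * E_hidden, where the hidden-layer term used below is a placeholder.

    import numpy as np

    # Shape of the idea only: a complex-valued forward pass with a split
    # tanh activation, and a modified error E = E_out + lam * E_hidden.
    # The hidden-layer term here is a placeholder, not the paper's definition.
    rng = np.random.default_rng(0)
    W1 = 0.1 * (rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3)))
    W2 = 0.1 * (rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4)))

    x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    t = np.array([1.0 + 0.0j, 0.0 + 1.0j])

    h_lin = W1 @ x
    h = np.tanh(h_lin.real) + 1j * np.tanh(h_lin.imag)   # split complex activation
    y = W2 @ h

    lam = 0.1
    E_out = 0.5 * np.sum(np.abs(y - t) ** 2)       # conventional output error
    E_hidden = 0.5 * np.sum(np.abs(h) ** 2)        # placeholder hidden-layer term
    E = E_out + lam * E_hidden                     # modified error function
    # Backpropagation then differentiates E, so the hidden layer receives an
    # additional error signal from the lam * E_hidden term.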

