Recurrent Neural Networks with Small Weights Implement Definite Memory Machines

2003 ◽  
Vol 15 (8) ◽  
pp. 1897-1929 ◽  
Author(s):  
Barbara Hammer ◽  
Peter Tiňo

Recent experimental studies indicate that recurrent neural networks initialized with “small” weights are inherently biased toward definite memory machines (Tiňo, Čerňanský, & Beňušková, 2002a, 2002b). This article establishes a theoretical counterpart: the transition function of a recurrent network with small weights and a squashing activation function is a contraction. We prove that recurrent networks with a contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite memory machine. Conversely, every definite memory machine can be simulated by a recurrent network with a contractive transition function. Hence, initialization with small weights induces an architectural bias into learning with recurrent neural networks. This bias might have benefits from the point of view of statistical learning theory: it emphasizes one possible region of the weight space where generalization ability can be formally proved. It is well known that standard recurrent neural networks are not distribution-independent learnable in the probably approximately correct (PAC) sense if arbitrary precision and inputs are considered. We prove that recurrent networks with a contractive transition function and a fixed contraction parameter fulfill the so-called distribution-independent uniform convergence of empirical distances property and hence, unlike general recurrent networks, are distribution-independent PAC learnable.
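
A minimal numerical illustration of the contraction argument (a sketch, not the paper's construction): with a squashing activation such as tanh, whose derivative is at most 1, the transition h ↦ tanh(Wh + Ux) is a contraction in h whenever the spectral norm of W is below 1, which small random initializations typically satisfy. Two hidden states driven by the same input then converge, so only a bounded suffix of the input matters (definite-memory behaviour).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
# "Small" recurrent weights: spectral norm well below 1.
W = 0.1 * rng.standard_normal((n, n))
U = rng.standard_normal((n, n))

def step(h, x):
    """One RNN transition h_{t+1} = tanh(W h_t + U x_t)."""
    return np.tanh(W @ h + U @ x)

# Since |tanh'| <= 1, the Lipschitz constant of the transition
# (with respect to the hidden state) is at most the spectral norm of W.
L = np.linalg.norm(W, 2)
print(f"contraction coefficient <= {L:.3f}")  # < 1 => contraction

# Consequence: two different initial states converge under the same input.
h1, h2 = rng.standard_normal(n), rng.standard_normal(n)
for t in range(30):
    x = rng.standard_normal(n)
    h1, h2 = step(h1, x), step(h2, x)
print(f"state distance after 30 steps: {np.linalg.norm(h1 - h2):.2e}")
```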

2017 ◽  
Vol 43 (4) ◽  
pp. 761-780 ◽  
Author(s):  
Ákos Kádár ◽  
Grzegorz Chrupała ◽  
Afra Alishahi

We present novel methods for analyzing the activation patterns of recurrent neural networks from a linguistic point of view and explore the types of linguistic structure they learn. As a case study, we use a standard standalone language model and a multi-task gated recurrent network architecture consisting of two parallel pathways with shared word embeddings: the Visual pathway is trained to predict the representations of the visual scene corresponding to an input sentence, and the Textual pathway is trained to predict the next word in the same sentence. We propose a method for estimating the contribution of individual input tokens to the final prediction of the networks. Using this method, we show that the Visual pathway pays selective attention to lexical categories and grammatical functions that carry semantic information, and learns to treat word types differently depending on their grammatical function and their position in the sequential structure of the sentence. In contrast, the language models are comparatively more sensitive to words with a syntactic function. Further analysis of the most informative n-gram contexts for each model shows that, in comparison with the Visual pathway, the language models react more strongly to abstract contexts that represent syntactic constructions.
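
One common way to quantify such per-token contributions is an omission score: the change in the network's prediction when a token is left out. The sketch below illustrates the idea with a toy stand-in predictor; it is a schematic of this family of methods, not the authors' exact procedure.

```python
import numpy as np

def omission_scores(predict, tokens):
    """Contribution of each token = distance between the model's
    prediction on the full sentence and on the sentence with that
    token omitted. `predict` maps a token list to a vector."""
    full = predict(tokens)
    scores = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        scores.append(float(np.linalg.norm(full - predict(reduced))))
    return scores

# Toy stand-in for a trained pathway: bag-of-characters "embedding".
def toy_predict(tokens):
    v = np.zeros(26)
    for tok in tokens:
        for ch in tok.lower():
            if ch.isalpha():
                v[ord(ch) - ord('a')] += 1.0
    return v / max(len(tokens), 1)

sentence = "the black dog chased a ball".split()
for tok, s in zip(sentence, omission_scores(toy_predict, sentence)):
    print(f"{tok:>8s}  {s:.3f}")
```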


Author(s):  
Diyar Qader Zeebaree ◽  
Adnan Mohsin Abdulazeez ◽  
Lozan M. Abdullrhman ◽  
Dathar Abas Hasan ◽  
Omar Sedqi Kareem

Prediction is vital in our daily lives, as it is used in various ways, such as learning, adapting, and classifying. The predictive capacity of RNNs is very high; they provide more accurate results than conventional statistical methods for prediction. This paper studies the impact of a hierarchy of recurrent neural networks on the prediction process. In such a hierarchy, each recurrent layer takes the hidden state of the previous layer as input and generates the hidden state of the current layer as output, as sketched below. Deep learning algorithms of this kind can be utilized as prediction tools in video analysis, music information retrieval, and time-series applications. Recurrent networks process a sequence one element at a time, maintaining a state or memory that captures an arbitrarily long context window. Long Short-Term Memory (LSTM) and Bidirectional RNN (BRNN) networks are examples of recurrent networks. This paper aims to give a comprehensive assessment of predictions based on RNNs. For each surveyed paper, the relevant facts are reported, such as the dataset, method, architecture, and the accuracy of the predictions delivered.
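
A minimal sketch of such a stack for one-step-ahead time-series prediction, where each recurrent layer consumes the hidden-state sequence emitted by the layer below (layer sizes and shapes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class StackedRNNForecaster(nn.Module):
    """Hierarchy of recurrent layers for one-step-ahead prediction:
    each LSTM layer takes the hidden-state sequence of the previous
    layer as its input sequence."""
    def __init__(self, n_features=1, hidden=32, layers=2):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, num_layers=layers,
                           batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):                # x: (batch, time, features)
        states, _ = self.rnn(x)          # hidden states of the top layer
        return self.head(states[:, -1])  # predict the next value

model = StackedRNNForecaster()
dummy = torch.randn(8, 50, 1)            # 8 series, 50 time steps each
print(model(dummy).shape)                # torch.Size([8, 1])
```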


Author(s):  
Nicola Capuano ◽  
Santi Caballé ◽  
Jordi Conesa ◽  
Antonio Greco

Massive open online courses (MOOCs) allow students and instructors to discuss through messages posted on a forum. However, instructors should limit their interaction to the most critical tasks during MOOC delivery, so teacher-led scaffolding activities, such as forum-based support, can be very limited or even impossible in such environments. In addition, students who try to clarify concepts through such collaborative tools may not receive useful answers, and the lack of interactivity may cause a permanent abandonment of the course. The purpose of this paper is to report the experimental findings obtained by evaluating the performance of a text categorization tool capable of detecting the intent, the subject area, the domain topics, the sentiment polarity, and the level of confusion and urgency of a forum post, so that the results may be exploited by instructors to carefully plan their interventions. The proposed approach is based on attention-based hierarchical recurrent neural networks, in which both a recurrent network for word encoding and an attention mechanism for word aggregation at the sentence and document levels are used before classification. The integration of the developed classifier into an existing tool for conversational agents, based on the academically productive talk framework, is also presented, as well as the accuracy of the proposed method in the classification of forum posts.
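
A minimal sketch of an attention-based hierarchical recurrent classifier of the kind described: a word-level recurrent encoder with attention builds sentence vectors, and a sentence-level encoder with attention builds the post vector. Vocabulary size, layer widths, and the label count are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Additive attention that aggregates a sequence of encodings
    into a single vector (used at both word and sentence level)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.ctx = nn.Linear(dim, 1, bias=False)

    def forward(self, seq):                  # seq: (batch, steps, dim)
        w = torch.softmax(self.ctx(torch.tanh(self.proj(seq))), dim=1)
        return (w * seq).sum(dim=1)          # weighted sum over steps

class HierarchicalPostClassifier(nn.Module):
    """Words -> sentence vectors -> document vector -> label."""
    def __init__(self, vocab=10000, emb=100, hid=50, n_labels=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.word_rnn = nn.GRU(emb, hid, batch_first=True,
                               bidirectional=True)
        self.word_attn = Attention(2 * hid)
        self.sent_rnn = nn.GRU(2 * hid, hid, batch_first=True,
                               bidirectional=True)
        self.sent_attn = Attention(2 * hid)
        self.out = nn.Linear(2 * hid, n_labels)

    def forward(self, docs):   # docs: (batch, sents, words) token ids
        b, s, w = docs.shape
        words = self.embed(docs.view(b * s, w))
        enc, _ = self.word_rnn(words)
        sent_vecs = self.word_attn(enc).view(b, s, -1)
        enc, _ = self.sent_rnn(sent_vecs)
        return self.out(self.sent_attn(enc))

posts = torch.randint(0, 10000, (4, 6, 20))       # 4 posts, 6 sentences
print(HierarchicalPostClassifier()(posts).shape)  # torch.Size([4, 5])
```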


2021 ◽  
Vol 26 (jai2021.26(1)) ◽  
pp. 32-41
Author(s):  
Bodyanskiy Y ◽  
Antonenko T

Modern approaches in deep neural networks have a number of issues related to the learning process and computational costs. This article considers an architecture grounded on an alternative approach to the basic unit of the neural network. This approach achieves optimization in the calculations and gives rise to an alternative way to solve the problems of the vanishing and exploding gradient. The main subject of the article is the use of a deep stacked neo-fuzzy system, which uses a generalized neo-fuzzy neuron to optimize the learning process. This approach is non-standard from a theoretical point of view, so the paper presents the necessary mathematical calculations and describes all the intricacies of using this architecture from a practical point of view. The network learning process is fully disclosed: all calculations necessary for training the network with the backpropagation algorithm are derived. A feature of the network is the rapid calculation of the derivative of the neurons' activation functions, achieved through the use of fuzzy membership functions. The paper shows that the derivative of such a function is a constant, which underpins the claim of an increased optimization rate in comparison with neural networks whose neurons use more common activation functions (ReLU, sigmoid). The paper highlights the main points that can be improved in further theoretical developments on this topic; in general, these issues are related to the calculation of the activation function. The proposed methods cope with these points and allow approximation using the network, and the authors already have theoretical justifications for improving the speed and approximation properties of the network. The results of a comparison of the proposed network with standard neural network architectures are shown.
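
The speed claim rests on the derivatives of the membership functions. For the triangular membership functions typically used in neo-fuzzy neurons, the derivative is piecewise constant (±1/width or 0), so backpropagation needs no transcendental-function evaluations. A minimal sketch, assuming an equally spaced membership layout:

```python
import numpy as np

def triangular_memberships(x, centers):
    """Evaluate triangular membership functions with the given
    (sorted, equally spaced) centers at scalar input x."""
    width = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(x - centers) / width, 0.0, None)

def triangular_derivatives(x, centers):
    """Derivative of each membership w.r.t. x: piecewise constant,
    +1/width on the rising edge, -1/width on the falling edge, and 0
    outside the support -- no exp/tanh evaluation needed."""
    width = centers[1] - centers[0]
    d = np.where(x < centers, 1.0 / width, -1.0 / width)
    return np.where(np.abs(x - centers) < width, d, 0.0)

centers = np.linspace(0.0, 1.0, 5)
x = 0.33
print(triangular_memberships(x, centers))
print(triangular_derivatives(x, centers))
```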


2021 ◽  
pp. 36-43
Author(s):  
L. A. Demidova ◽  
A. V. Filatov

The article considers an approach to the problem of monitoring and classifying the states of hard disks, which is solved on a regular basis within the framework of the concept of non-destructive testing. It is proposed to solve this problem by developing a classification model using machine learning algorithms, in particular recurrent neural networks with the Simple RNN, LSTM, and GRU architectures. To develop the classification model, a data set based on the values of SMART sensors installed on hard disks is used; it represents a group of multidimensional time series. The structure of the classification model contains two layers of a neural network with one of the recurrent architectures, as well as a Dropout layer and a Dense layer. The results of experimental studies confirming the advantages of the LSTM and GRU architectures as part of hard disk state classification models are presented.
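
A minimal Keras-style sketch consistent with the structure described (two recurrent layers, then a Dropout layer and a Dense layer); the window length, feature count, class count, and layer sizes are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative shapes: windows of 30 days of 10 SMART attributes,
# classified into 2 states (e.g., healthy / pre-failure).
TIMESTEPS, FEATURES, CLASSES = 30, 10, 2

model = keras.Sequential([
    layers.Input(shape=(TIMESTEPS, FEATURES)),
    # Two recurrent layers; LSTM could be swapped for SimpleRNN or GRU.
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(32),
    layers.Dropout(0.3),
    layers.Dense(CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```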


2004 ◽  
Vol 213 ◽  
pp. 483-486
Author(s):  
David Brodrick ◽  
Douglas Taylor ◽  
Joachim Diederich

A recurrent neural network was trained to detect the time-frequency domain signature of narrowband radio signals against a background of astronomical noise. The objective was to investigate the use of recurrent networks for signal detection in the Search for Extra-Terrestrial Intelligence, though the problem is closely analogous to the detection of some classes of Radio Frequency Interference in radio astronomy.


2020 ◽  
Vol 34 (04) ◽  
pp. 5150-5157
Author(s):  
Fandong Meng ◽  
Jinchao Zhang ◽  
Yang Liu ◽  
Jie Zhou

Recurrent neural networks (RNNs) have been widely used to deal with sequence learning problems. The input-dependent transition function, which folds new observations into hidden states to sequentially construct fixed-length representations of arbitrary-length sequences, plays a critical role in RNNs. Because they are based on single-space composition, transition functions in existing RNNs often have difficulty capturing complicated long-range dependencies. In this paper, we introduce a new Multi-zone Unit (MZU) for RNNs. The key idea is to design a transition function that is capable of modeling multiple-space composition. The MZU consists of three components: zone generation, zone composition, and zone aggregation. Experimental results on multiple datasets of the character-level language modeling task and the aspect-based sentiment analysis task demonstrate the superiority of the MZU.
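
The abstract names the three stages but not their equations; the sketch below is one plausible reading, and every concrete operation in it (linear zone generation, gated composition, attention-style aggregation) is an assumption of this note rather than the authors' design.

```python
import torch
import torch.nn as nn

class MultiZoneUnitSketch(nn.Module):
    """Illustrative three-stage transition in the spirit of the MZU
    abstract (zone generation / composition / aggregation)."""
    def __init__(self, dim, zones=4):
        super().__init__()
        self.zones = zones
        # Zone generation: project [x; h] into `zones` candidate spaces.
        self.gen = nn.Linear(2 * dim, zones * dim)
        # Zone composition: gate mixing each zone with the previous state.
        self.gate = nn.Linear(2 * dim, zones * dim)
        # Zone aggregation: attention weights over zones.
        self.score = nn.Linear(dim, 1, bias=False)

    def forward(self, x, h):                 # (batch, dim) each
        b, d = h.shape
        xh = torch.cat([x, h], dim=-1)
        z = torch.tanh(self.gen(xh)).view(b, self.zones, d)
        g = torch.sigmoid(self.gate(xh)).view(b, self.zones, d)
        composed = g * z + (1 - g) * h.unsqueeze(1)
        w = torch.softmax(self.score(composed), dim=1)
        return (w * composed).sum(dim=1)     # next hidden state

cell = MultiZoneUnitSketch(dim=8)
x, h = torch.randn(3, 8), torch.randn(3, 8)
print(cell(x, h).shape)                      # torch.Size([3, 8])
```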


2006 ◽  
Vol 15 (04) ◽  
pp. 623-650
Author(s):  
Judy A. Franklin

Recurrent (neural) networks have been deployed as models for learning musical processes by computational scientists who study processes such as dynamic systems. Over time, more intricate music has been learned as the state of the art in recurrent networks improves. One particular recurrent network, the Long Short-Term Memory (LSTM) network, shows promise for learning long songs and generating new songs. We are experimenting with a module containing two inter-recurrent LSTM networks to cooperatively learn several human melodies, based on the songs' harmonic structures and on the feedback inherent in the network. We show that these networks can learn to reproduce four human melodies. We then present new harmonizations as input, so as to generate new songs. We describe the reharmonizations and show the new melodies that result. We also present a hierarchical structure for using reinforcement learning to choose LSTM modules during the course of melody generation.


1999 ◽  
Vol 11 (5) ◽  
pp. 1069-1077 ◽  
Author(s):  
Danilo P. Mandic ◽  
Jonathon A. Chambers

A relationship is provided between the learning rate η in the learning algorithm and the slope β in the nonlinear activation function for a class of recurrent neural networks (RNNs) trained by the real-time recurrent learning algorithm. It is shown that an arbitrary RNN can be obtained from a referent RNN by imposing some deterministic rules on its weights and learning rate. Such relationships reduce the number of degrees of freedom when solving the nonlinear optimization task of finding the optimal RNN parameters.
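
The flavor of such a rule can be checked numerically. For a single logistic neuron, a network with slope β, weights w, and learning rate η follows the same trajectory as a referent slope-1 neuron with weights βw and learning rate β²η; the sketch below verifies this toy case, which illustrates, but does not reproduce, the paper's full RTRL result.

```python
import numpy as np

def sigmoid(z, beta=1.0):
    return 1.0 / (1.0 + np.exp(-beta * z))

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
w = rng.standard_normal(4)           # weights of the beta-slope neuron
beta, eta = 2.5, 0.1
target = 0.7

# Referent neuron: slope 1, weights beta*w, learning rate beta**2 * eta.
v = beta * w

for _ in range(5):
    # Beta-slope neuron: y = sigmoid(beta * w.x), squared-error gradient.
    y = sigmoid(w @ x, beta)
    grad_w = (y - target) * y * (1 - y) * beta * x
    w -= eta * grad_w
    # Referent neuron: y' = sigmoid(v.x) with the scaled learning rate.
    yr = sigmoid(v @ x)
    grad_v = (yr - target) * yr * (1 - yr) * x
    v -= (beta ** 2) * eta * grad_v

print(np.allclose(beta * w, v))      # True: the trajectories coincide
```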


1994 ◽  
Vol 25 (8) ◽  
pp. 27-39
Author(s):  
Tatsumi Watanabe ◽  
Yoshiki Uchikawa ◽  
Kazutoshi Gouhara
