EXPERIMENTAL COMPARISON OF THE EFFECT OF ORDER IN RECURRENT NEURAL NETWORKS

Author(s):  
CLIFFORD B. MILLER
C. LEE GILES

There has been much interest in increasing the computational power of neural networks, as well as in "designing" neural networks better suited to particular problems. Increasing the "order" of the connectivity of a neural network permits both. Though order has played a significant role in feedforward neural networks, its role in dynamically driven recurrent networks is still being understood. This work explores the effect of order in learning grammars. We present an experimental comparison of first order and second order recurrent neural networks applied to the task of grammatical inference. We show that for the small grammars studied these two neural net architectures have comparable learning and generalization power, and that both are reasonably capable of extracting the correct finite state automata for the language in question. However, for a larger randomly generated ten-state grammar, second order networks significantly outperformed the first order networks, both in convergence time and in generalization capability. We show that these networks learn faster the more neurons they have (our experiments used up to 10 hidden neurons), but that the solutions found by smaller networks are usually of better quality (in terms of generalization performance after training). Second order nets converge to a solution more quickly and find it more reliably than first order nets, but the second order solutions tend to be of poorer quality than the first order ones when both architectures are trained to the same error tolerance. Despite this, finite state machines can be extracted more successfully from second order nets, using heuristic clustering techniques applied to the internal state representations. We speculate that this may be due to restrictions on the ability of the first order architecture to make full use of its internal state representation power, and that this may have implications for the performance of the two architectures when scaled up to larger problems.
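The architectural difference at issue is easiest to see in the state-update rules themselves. Below is a minimal NumPy sketch, not taken from the paper, contrasting a first order recurrent layer (state and input combined additively) with a Giles-style second order layer (each weight coupling a state-input pair multiplicatively); the variable names, sizes, and sigmoid activation are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def first_order_step(s, x, W, U, b):
    # s: current state (n,), x: one-hot input symbol (m,)
    # W: (n, n) state-to-state weights, U: (n, m) input-to-state weights
    return sigmoid(W @ s + U @ x + b)

def second_order_step(s, x, W3, b):
    # W3: (n, n, m) third-order weight tensor; output unit i receives
    # sum_{j,k} W3[i, j, k] * s[j] * x[k], a product of state and input
    return sigmoid(np.einsum('ijk,j,k->i', W3, s, x) + b)

# Tiny usage example: 3 state neurons, 2 input symbols (binary alphabet)
rng = np.random.default_rng(0)
n, m = 3, 2
s = rng.uniform(size=n)
x = np.array([1.0, 0.0])          # one-hot encoding of symbol '0'
print(first_order_step(s, x, rng.normal(size=(n, n)),
                       rng.normal(size=(n, m)), np.zeros(n)))
print(second_order_step(s, x, rng.normal(size=(n, n, m)), np.zeros(n)))
```

With one-hot input symbols, the second order form amounts to selecting a separate state-transition weight matrix W3[:, :, k] for each symbol k, which is one intuition for why this architecture maps so naturally onto the transition tables of finite state automata.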

1994
Vol 6 (6)
pp. 1155-1173
Author(s):
Peter Manolios
Robert Fanelli

We examine the correspondence between first-order recurrent neural networks and deterministic finite state automata. We begin with the problem of inducing deterministic finite state automata from finite training sets that include both positive and negative examples, an NP-hard problem (Angluin and Smith 1983). We use a neural network architecture with two recurrent layers, which we argue can approximate any discrete-time, time-invariant dynamic system, and compute the full gradient during learning. The networks are trained to classify strings as belonging or not belonging to the grammar. The training sets contain only short strings and are constructed in a way that does not require a priori knowledge of the grammar. After training, the networks are tested on various test sets with strings of length up to 1000, and are often able to classify all the test strings correctly. These results are comparable to those obtained with second-order networks (Giles et al. 1992; Watrous and Kuhn 1992a; Zeng et al. 1993). We observe that the networks emulate finite state automata, confirming the results of other authors, and we use a vector quantization algorithm to extract deterministic finite state automata after training and during testing, obtaining a table that lists the start state, the accept and reject states, and all transitions from the states, together with some useful statistics. We examine the correspondence between finite state automata and neural networks in detail, identifying two major stages in the learning process. To this end, we use a graphics module that depicts the states of the network during the learning and testing phases. We also examine the networks' performance when tested on strings much longer than those in the training set, noting a clustering-based measure that is correlated with the stability of the networks. Finally, we observe that with sufficiently long training times, neural networks can become true finite state automata, owing to the attractor structure of their dynamics.
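The extraction step is the part most easily made concrete. The sketch below is a simplified stand-in rather than the authors' code: it quantizes recurrent activations onto a fixed coarse grid (instead of a learned vector quantization codebook) and reads a transition table off the visited (cluster, symbol) pairs; rnn_step, is_accepting, and the toy parity "network" in the usage example are hypothetical placeholders.

```python
import numpy as np

def quantize(state, grid=2):
    # Map a continuous activation vector in [0, 1]^n to a discrete cluster
    # label by splitting each unit's range into `grid` equal bins.
    return tuple(np.minimum((np.asarray(state) * grid).astype(int), grid - 1))

def extract_automaton(rnn_step, is_accepting, s0, strings):
    # Feed each string through the network, recording which cluster every
    # visited (cluster, symbol) pair leads to, plus the accepting clusters.
    transitions, accepting = {}, set()
    for w in strings:
        s = s0
        for sym in w:
            q = quantize(s)
            s = rnn_step(s, sym)
            transitions[(q, sym)] = quantize(s)
        if is_accepting(s):
            accepting.add(quantize(s))
    return transitions, accepting

# Toy usage: a hand-written one-unit "network" that already behaves like the
# parity automaton (accept strings over {0, 1} with an even number of 1s).
def toy_step(s, sym):
    return np.array([1.0 - s[0]]) if sym == '1' else s

trans, acc = extract_automaton(toy_step, lambda s: s[0] < 0.5,
                               np.array([0.0]),
                               ['0', '1', '01', '10', '11', '110'])
print(trans)   # e.g. {((0,), '0'): (0,), ((0,), '1'): (1,), ...}
print(acc)     # {(0,)} -- the "even number of 1s" cluster is accepting
```

In practice a learned codebook replaces the fixed grid, but the read-off of start state, accepting clusters, and transitions from the visited clusters follows the same pattern.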


1995
Vol 7 (5)
pp. 931-949
Author(s):
R. Alquézar
A. Sanfeliu

In this paper we present an algebraic framework for representing finite state machines (FSMs) in single-layer recurrent neural networks (SLRNNs), which unifies and generalizes several previous proposals. The framework is based on formulating both the state transition function and the output function of an FSM as a linear system of equations, and it permits an analytical explanation of the representational capabilities of first-order and higher-order SLRNNs. The framework can be used to insert symbolic knowledge into RNNs prior to learning from examples and to preserve this knowledge while training the network. The approach is valid for a wide range of activation functions, provided that certain stability conditions are met. The framework has already been used in practice in a hybrid method for grammatical inference reported elsewhere (Sanfeliu and Alquézar 1994).
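A schematic rendering of the central construction may help; the notation below is mine, not the paper's, and the encoding maps are left abstract. Each FSM transition pins down one linear equation in the recurrent weight matrix once the activation function is inverted on the target state code:

```latex
\[
  \mathbf{s}^{\,t+1} = g\!\bigl(W\,\phi(\mathbf{s}^{\,t}, \mathbf{x}^{\,t})\bigr),
  \qquad
  W\,\phi\bigl(\mathbf{c}(q_j), \mathbf{c}(a_k)\bigr) = g^{-1}\bigl(\mathbf{c}(q_l)\bigr)
  \quad \text{for every transition } \delta(q_j, a_k) = q_l ,
\]
```

where $g$ is the activation function, $\mathbf{c}(\cdot)$ denotes the chosen codes for states and input symbols, and $\phi$ is the first-order (concatenated) or higher-order (product) combination of the two. Stacking one such equation per transition, together with analogous equations for the output function, gives a linear system whose solvability delimits what a given SLRNN can represent, which is the sense in which the formulation is algebraic.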


1994
Vol 5 (3)
pp. 511-513
Author(s):
M.W. Goudreau
C.L. Giles
S.T. Chakradhar
D. Chen

2003
Vol 15 (8)
pp. 1931-1957
Author(s):
Peter Tiňo
Barbara Hammer

We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased toward Markov models; even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tiňo, 2002; Tiňo, Čerňanský, & Beňušková, 2002a, 2002b). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this article, we extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has frequently been considered in the past, for example, when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as the box counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be exploited to build Markovian predictive models, but the activations also form fractal clusters whose dimension can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
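The setup being analysed is easy to reproduce numerically. The sketch below, which is illustrative rather than taken from the article, drives a small-weight, untrained sigmoid RNN with symbols from a simple two-state Markov source and forms a crude box-counting estimate of the dimension of the resulting cloud of recurrent activations; the weight scale, the driving source, and the grid of box sizes are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 2, 2                              # 2 recurrent units, 2 input symbols
W = 0.5 * rng.normal(size=(n, n))        # "small" recurrent weights
U = 0.5 * rng.normal(size=(n, m))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Finite-state driving source: a two-state Markov chain over symbols {0, 1}.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
sym, s, activations = 0, np.zeros(n), []
for _ in range(20000):
    s = sigmoid(W @ s + U @ np.eye(m)[sym])   # untrained RNN state update
    activations.append(s.copy())
    sym = rng.choice(m, p=P[sym])
A = np.array(activations[100:])               # drop a short transient

# Crude box-counting estimate: normalize to the unit square, count occupied
# boxes at several scales, and fit log N(eps) against log(1/eps).
A = (A - A.min(0)) / (A.max(0) - A.min(0) + 1e-12)
eps_list = np.array([0.2, 0.1, 0.05, 0.025])
counts = [len({tuple((a / eps).astype(int)) for a in A}) for eps in eps_list]
slope = np.polyfit(np.log(1.0 / eps_list), np.log(counts), 1)[0]
print(f"estimated box-counting dimension of the activation set: {slope:.2f}")
```

Because the small-weight dynamics are contractive, activations reached after sequences sharing a recent suffix stay close together, which is what gives the cloud both its Markovian cluster structure and its fractal geometry.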


1992
Vol 03 (03)
pp. 233-244
Author(s):
A. SAOUDI
D.E. MULLER
P.E. SCHUPP

We introduce four classes of Z-regular grammars for generating bi-infinite words (i.e., Z-words) and prove that they generate exactly the Z-regular languages. We extend the second order monadic theory of one successor to the set of integers (i.e., Z) and give some characterizations of this theory in terms of Z-regular grammars and Z-regular languages. We prove that this theory is decidable and equivalent to the weak theory. We also extend linear temporal logic to Z-temporal logic and prove that each Z-temporal formula is equivalent to a first order monadic formula. Finally, we prove that the correctness problem for finite state processes is decidable.

