First-Order Recurrent Neural Networks and Deterministic Finite State Automata

We examine the correspondence between first-order recurrent neural networks and deterministic finite state automata. We begin with the problem of inducing deterministic finite state automata from finite training sets that include both positive and negative examples, an NP-hard problem (Angluin and Smith 1983). We use a neural network architecture with two recurrent layers, which we argue can approximate any discrete-time, time-invariant dynamic system, and we compute the full gradient during learning. The networks are trained to classify strings as belonging or not belonging to the grammar. The training sets contain only short strings and are constructed in a way that does not require a priori knowledge of the grammar. After training, the networks are tested on various test sets with strings of length up to 1000, and are often able to classify all the test strings correctly. These results are comparable to those obtained with second-order networks (Giles et al. 1992; Watrous and Kuhn 1992a; Zeng et al. 1993). We observe that the networks emulate finite state automata, confirming the results of other authors. We use a vector quantization algorithm to extract deterministic finite state automata after training and during testing of the networks, obtaining a table that lists the start state, accept states, reject states, and all transitions from the states, as well as some useful statistics. We examine the correspondence between finite state automata and neural networks in detail, showing two major stages in the learning process. To this end, we use a graphics module that depicts the states of the network during the learning and testing phases. We examine the networks' performance when tested on strings much longer than those in the training set, noting a measure based on clustering that is correlated with the stability of the networks. Finally, we observe that with sufficiently long training times, neural networks can become true finite state automata, due to the attractor structure of their dynamics.
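
The extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify its vector quantization algorithm, so a uniform grid quantizer is substituted here, and `toy_step` is a hypothetical stand-in for a trained network's state update (it tracks the parity of 1s in the input). The names `quantize`, `extract_dfa`, and the `bins` parameter are likewise assumptions made for illustration.

```python
import math
from collections import deque


def toy_step(state, symbol):
    """Hypothetical stand-in for a trained network: a 1-D state in [0, 1]
    that flips on symbol 1 and is squashed toward 0 or 1 at every step."""
    x = state[0]
    target = x if symbol == 0 else 1.0 - x
    return (1.0 / (1.0 + math.exp(-10.0 * (target - 0.5))),)


def quantize(state, bins=2):
    """Map each coordinate in [0, 1] to one of `bins` equal-width cells."""
    return tuple(min(int(v * bins), bins - 1) for v in state)


def extract_dfa(step, start, alphabet, bins=2):
    """Breadth-first search over quantized states reachable from `start`.

    Returns the quantized start state and a transition table
    {(quantized_state, symbol): quantized_state}.
    """
    start_q = quantize(start, bins)
    reps = {start_q: start}  # one representative continuous state per cell
    table = {}
    queue = deque([start_q])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            nxt = step(reps[q], a)
            nq = quantize(nxt, bins)
            table[(q, a)] = nq
            if nq not in reps:
                reps[nq] = nxt
                queue.append(nq)
    return start_q, table


start_q, table = extract_dfa(toy_step, (0.0,), [0, 1])
for (q, a), nq in sorted(table.items()):
    print(q, a, "->", nq)
```

For the toy update this yields a two-state table. With a real network the result depends on the quantization resolution, and a coarse grid may merge states that the network keeps distinct.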

EXPERIMENTAL COMPARISON OF THE EFFECT OF ORDER IN RECURRENT NEURAL NETWORKS

International Journal of Pattern Recognition and Artificial Intelligence, 1993, Vol 07 (04), pp. 849-872
DOI: 10.1142/s0218001493000431
Cited By ~ 30

Author(s): CLIFFORD B. MILLER, C. LEE GILES

Keyword(s): Neural Networks, Recurrent Neural Networks, Internal State, Second Order, Convergence Time, Experimental Comparison, Grammatical Inference, Neural Net, First Order, Finite State

There has been much interest in increasing the computational power of neural networks. In addition, there has been much interest in “designing” neural networks better suited to particular problems. Increasing the “order” of the connectivity of a neural network permits both. Though order has played a significant role in feedforward neural networks, its role in dynamically driven recurrent networks is still being understood. This work explores the effect of order in learning grammars. We present an experimental comparison of first order and second order recurrent neural networks, as applied to the task of grammatical inference. We show that for the small grammars studied these two neural net architectures have comparable learning and generalization power, and that both are reasonably capable of extracting the correct finite state automata for the language in question. However, for a larger randomly-generated ten-state grammar, second order networks significantly outperformed the first order networks, both in convergence time and generalization capability. We show that these networks learn faster the more neurons they have (our experiments used up to 10 hidden neurons), but that the solutions found by smaller networks are usually of better quality (in terms of generalization performance after training). Second order nets have the advantage that they converge more quickly to a solution and can find it more reliably than first order nets, but the second order solutions tend to be of poorer quality than those of the first order if both architectures are trained to the same error tolerance. Despite this, second order nets can more successfully extract finite state machines using heuristic clustering techniques applied to the internal state representations. We speculate that this may be due to restrictions on the ability of the first order architecture to make full use of its internal state representation power, and that this may have implications for the performance of the two architectures when scaled up to larger problems.
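
The abstract gives no equations, so as a reminder of what "order" usually means in this setting, one common formulation of the two state updates is sketched below. All symbols are assumptions introduced here: $S_j^{(t)}$ is the activation of state neuron $j$ at time $t$, $I_k^{(t)}$ is input neuron $k$, $g$ is a sigmoid, and $W$, $V$, $b$ are weights and biases. The architectures used in the paper may differ in detail.

```latex
% First order: state and input contribute additively.
S_i^{(t+1)} = g\Big( \sum_{j} W_{ij}\, S_j^{(t)} + \sum_{k} V_{ik}\, I_k^{(t)} + b_i \Big)

% Second order: each weight multiplies a (state, input) product.
S_i^{(t+1)} = g\Big( \sum_{j,k} W_{ijk}\, S_j^{(t)}\, I_k^{(t)} + b_i \Big)
```

In the second order form, a one-hot input selects a separate state-to-state weight matrix for each symbol, which maps directly onto the transition function of an automaton.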

Representation and Identification of Finite State Automata by Recurrent Neural Networks

Neural Information Processing - Lecture Notes in Computer Science, 2004, pp. 261-268
DOI: 10.1007/978-3-540-30499-9_39
Cited By ~ 1

Author(s): Yasuaki Kuroe

Keyword(s): Neural Networks, Recurrent Neural Networks, Finite State Automata, Finite State

Constructing deterministic finite-state automata in recurrent neural networks

Journal of the ACM, 1996, Vol 43 (6), pp. 937-972
DOI: 10.1145/235809.235811
Cited By ~ 100

Author(s): Christian W. Omlin, C. Lee Giles

Keyword(s): Neural Networks, Recurrent Neural Networks, Finite State Automata, Finite State

Call classification using recurrent neural networks, support vector machines and finite state automata

Knowledge and Information Systems, 2005, Vol 9 (2), pp. 131-156
DOI: 10.1007/s10115-005-0198-5
Cited By ~ 7

Author(s): Sheila Garfield, Stefan Wermter

Keyword(s): Neural Networks, Support Vector Machines, Recurrent Neural Networks, Support Vector, Finite State Automata, Vector Machines, Finite State, Call Classification

Fuzzy finite-state automata can be deterministically encoded into recurrent neural networks

IEEE Transactions on Fuzzy Systems, 1998, Vol 6 (1), pp. 76-89
DOI: 10.1109/91.660809
Cited By ~ 60

Author(s): C.W. Omlin, K.K. Thornber, C.L. Giles

Keyword(s): Neural Networks, Recurrent Neural Networks, Finite State Automata, Finite State

On-Line Identification and Rule Extraction of Finite State Automata with Recurrent Neural Networks

Artificial Neural Nets and Genetic Algorithms, 2001, pp. 78-81
DOI: 10.1007/978-3-7091-6230-9_18
Cited By ~ 3

Author(s): Ivan Gabrijel, Andrej Dobnikar

Keyword(s): Neural Networks, Recurrent Neural Networks, Rule Extraction, Finite State Automata, Line Identification, Finite State, On Line

Representation of fuzzy finite state automata in continuous recurrent, neural networks

Proceedings of International Conference on Neural Networks (ICNN'96), 2002
DOI: 10.1109/icnn.1996.549038
Cited By ~ 6

Author(s): C.W. Omlin, K.K. Thornber, C.L. Giles

Keyword(s): Neural Networks, Recurrent Neural Networks, Finite State Automata, Finite State

Constructing deterministic finite-state automata in sparse recurrent neural networks

Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94), 2002
DOI: 10.1109/icnn.1994.374417
Cited By ~ 9

Author(s): C.W. Omlin, C.L. Giles

Keyword(s): Neural Networks, Recurrent Neural Networks, Finite State Automata, Finite State

Symbolic Priors for RNN-based Semantic Parsing

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017
DOI: 10.24963/ijcai.2017/585
Cited By ~ 1

Author(s): Chunyang Xiao, Marc Dymetman, Claire Gardent

Keyword(s): Neural Networks, Prior Knowledge, Recurrent Neural Networks, Logical Form, Finite State Automata, Semantic Parsing, Context Free Grammar, Finite State, Intersection Algorithm, Context Free

Seq2seq models based on Recurrent Neural Networks (RNNs) have recently received a lot of attention in the domain of Semantic Parsing. While in principle they can be trained directly on pairs (natural language utterances, logical forms), their performance is limited by the amount of available data. To alleviate this problem, we propose to exploit various sources of prior knowledge: the well-formedness of the logical forms is modeled by a weighted context-free grammar; the likelihood that certain entities present in the input utterance are also present in the logical form is modeled by weighted finite-state automata. The grammar and automata are combined through an efficient intersection algorithm to form a soft guide (“background”) to the RNN. We test our method on an extension of the Overnight dataset and show that it not only strongly improves over an RNN baseline, but also outperforms non-RNN models based on rich sets of hand-crafted features.

Stable Encoding of Large Finite-State Automata in Recurrent Neural Networks with Sigmoid Discriminants

Neural Computation, 1996, Vol 8 (4), pp. 675-696
DOI: 10.1162/neco.1996.8.4.675
Cited By ~ 32

Author(s): Christian W. Omlin, C. Lee Giles

Keyword(s): Neural Networks, Recurrent Neural Networks, Case Analysis, Network Size, Worst Case Analysis, Small Subset, Finite State Automata, State Behavior, Worst Case, Finite State

We propose an algorithm for encoding deterministic finite-state automata (DFAs) in second-order recurrent neural networks with sigmoidal discriminant functions, and we prove that the languages accepted by the constructed network and the DFA are identical. The desired finite-state network dynamics is achieved by programming a small subset of all weights. A worst case analysis reveals a relationship between the weight strength and the maximum allowed network size, which guarantees finite-state behavior of the constructed network. We illustrate the method by encoding random DFAs with 10, 100, and 1000 states. While the theory predicts that the weight strength scales with the DFA size, we find empirically that the weight strength is almost constant for all the random DFAs. These results can be explained by noting that the generated DFAs represent average cases. We empirically demonstrate the existence of extreme DFAs for which the weight strength scales with DFA size.
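
The idea of programming a small subset of weights can be illustrated with a minimal sketch. The abstract does not spell out the weight scheme, so the one below is an assumption: one state neuron per DFA state, a weight of `+H` from the source state to the target state for each transition, `-H` on the source's self-connection when the transition leaves it, and a bias of `-H/2`. The weight strength `H = 8.0`, the function names, and the argmax readout (used here in place of a dedicated output neuron) are likewise hypothetical choices.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode_dfa(n_states, n_symbols, delta, H=8.0):
    """Program second-order weights W[i][j][k] from delta[(j, k)] = i.

    Only the weights tied to a transition are set; all others stay zero.
    """
    W = [[[0.0] * n_symbols for _ in range(n_states)] for _ in range(n_states)]
    for (j, k), i in delta.items():
        W[i][j][k] = H           # drive the target state neuron high
        if i != j:
            W[j][j][k] = -H      # drive the abandoned source neuron low
    b = [-H / 2.0] * n_states    # bias keeps untouched neurons near zero
    return W, b


def run(W, b, string, start=0):
    """Apply the second-order update for each one-hot input symbol."""
    n = len(b)
    s = [1.0 if i == start else 0.0 for i in range(n)]
    for k in string:
        s = [sigmoid(b[i] + sum(W[i][j][k] * s[j] for j in range(n)))
             for i in range(n)]
    return s


def accepts(W, b, string, accepting, start=0):
    """Read out the most active state neuron and test membership."""
    s = run(W, b, string, start)
    return max(range(len(s)), key=lambda i: s[i]) in accepting


# Example DFA: accept strings with an even number of 1s (state 0 = even).
PARITY = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
W, b = encode_dfa(2, 2, PARITY)
print(accepts(W, b, [1, 1], {0}), accepts(W, b, [1], {0}))
```

For this two-state example the activations stay well separated on long strings. The worst case analysis mentioned in the abstract concerns how large the weight strength must be for that separation to hold as the automaton grows, which this sketch does not attempt to reproduce.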
