Rule Extraction from Recurrent Neural Networks: A Taxonomy and Review

2005 ◽  
Vol 17 (6) ◽  
pp. 1223-1263 ◽  
Author(s):  
Henrik Jacobsson

Rule extraction (RE) from recurrent neural networks (RNNs) refers to finding models of the underlying RNN, typically in the form of finite state machines, that mimic the network to a satisfactory degree while having the advantage of being more transparent. RE from RNNs arguably allows a deeper form of analysis of RNNs than other, more or less ad hoc methods. RE may give us understanding of RNNs at the intermediate levels between quite abstract theoretical knowledge of RNNs as a class of computing devices and quantitative performance evaluations of RNN instantiations. The development of techniques for extracting rules from RNNs has been an active field since the early 1990s. This article reviews the progress of this development and analyzes it in detail. In order to structure the survey and evaluate the techniques, a taxonomy specifically designed for this purpose has been developed. Moreover, important open research issues are identified that, if addressed properly, could give the field a significant push forward.
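Most of the extraction techniques the review covers share a common skeleton: drive the trained RNN over sample strings, quantize its continuous hidden-state space into finitely many abstract states, and read the transitions between those abstract states off as a finite state machine. A minimal sketch of that skeleton, assuming a simple tanh RNN and k-means quantization (the function names, shapes, and parameters here are illustrative, not taken from the article):

```python
# Quantization-based rule extraction sketch: cluster hidden states, then record
# which cluster each (cluster, input symbol) pair leads to.
import numpy as np
from sklearn.cluster import KMeans

def rnn_step(h, x, W, U):
    # one recurrent update of a hypothetical trained tanh RNN
    return np.tanh(W @ h + U @ x)

def extract_fsm(strings, one_hot, W, U, h0, n_states=8, seed=0):
    # 1. collect hidden-state transitions while replaying sample strings
    visited, transitions = [], []
    for s in strings:
        h = h0
        for sym in s:
            h_next = rnn_step(h, one_hot[sym], W, U)
            visited.append(h_next)
            transitions.append((h.copy(), sym, h_next.copy()))
            h = h_next
    # 2. quantize the hidden space into a finite set of abstract states
    km = KMeans(n_clusters=n_states, n_init=10, random_state=seed).fit(np.array(visited))
    state_of = lambda h: int(km.predict(h.reshape(1, -1))[0])
    # 3. read off a (state, symbol) -> state transition table
    fsm = {}
    for h_prev, sym, h_next in transitions:
        fsm[(state_of(h_prev), sym)] = state_of(h_next)
    return fsm
```

The extracted transition table can then be minimized and compared against the network's input-output behaviour to judge how faithfully the rules mimic the network.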

2003 ◽  
Vol 15 (8) ◽  
pp. 1931-1957 ◽  
Author(s):  
Peter Tiňo ◽  
Barbara Hammer

We have recently shown that when initialized with “small” weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased toward Markov models; even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tiňo, 2002; Tiňo, Čerňanský, & Beňušková, 2002a, 2002b). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this article, we extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has been frequently considered in the past, for example, when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be exploited to build Markovian predictive models, but the activations also form fractal clusters whose dimension can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
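A toy illustration of the architectural bias itself (not of the fractal analysis), assuming a sigmoid RNN with small random weights and one-hot symbol inputs; all names and constants below are illustrative. Because the recurrent map is contractive, hidden states reached after input histories that share a recent suffix end up close together, which is what allows finite-memory (Markov) predictors to be read off the untrained network:

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_symbols, weight_scale = 16, 2, 0.1   # "small" weights

W = rng.normal(0.0, weight_scale, (n_hidden, n_hidden))
U = rng.normal(0.0, weight_scale, (n_hidden, n_symbols))

def run(seq, h=None):
    # drive the *untrained* sigmoid RNN with a symbol sequence
    h = np.zeros(n_hidden) if h is None else h
    for sym in seq:
        x = np.eye(n_symbols)[sym]
        h = 1.0 / (1.0 + np.exp(-(W @ h + U @ x)))
    return h

seq = rng.integers(0, n_symbols, 200).tolist()
# A state reached after the full history and a state reached after only the
# shared three-symbol suffix are nearly identical under contractive dynamics.
print(np.linalg.norm(run(seq) - run(seq[-3:])))
```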


Author(s):  
CLIFFORD B. MILLER ◽  
C. LEE GILES

There has been much interest in increasing the computational power of neural networks. In addition there has been much interest in “designing” neural networks better suited to particular problems. Increasing the “order” of the connectivity of a neural network permits both. Though order has played a significant role in feedforward neural networks, its role in dynamically driven recurrent networks is still being understood. This work explores the effect of order in learning grammars. We present an experimental comparison of first order and second order recurrent neural networks, as applied to the task of grammatical inference. We show that for the small grammars studied these two neural net architectures have comparable learning and generalization power, and that both are reasonably capable of extracting the correct finite state automata for the language in question. However, for a larger, randomly generated ten-state grammar, second order networks significantly outperformed the first order networks, both in convergence time and generalization capability. We show that these networks learn faster the more neurons they have (our experiments used up to 10 hidden neurons), but that the solutions found by smaller networks are usually of better quality (in terms of generalization performance after training). Second order nets have the advantage that they converge more quickly to a solution and can find it more reliably than first order nets, but the second order solutions tend to be of poorer quality than those of the first order if both architectures are trained to the same error tolerance. Despite this, second order nets can more successfully extract finite state machines using heuristic clustering techniques applied to the internal state representations. We speculate that this may be due to restrictions on the ability of the first order architecture to fully make use of its internal state representation power and that this may have implications for the performance of the two architectures when scaled up to larger problems.
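The structural difference between the two architectures is confined to the recurrent update: in a first order network, state and input enter additively, whereas a second order network assigns a separate multiplicative weight to every state-input pair, which maps naturally onto the (state, symbol) transitions of a finite state automaton. A minimal sketch of the two updates (shapes and names are illustrative, not the authors' code):

```python
import numpy as np

def first_order_step(h, x, W, U, b):
    # first order: h_i(t+1) = f( sum_j W[i,j] h_j(t) + sum_k U[i,k] x_k(t) + b_i )
    return np.tanh(W @ h + U @ x + b)

def second_order_step(h, x, W2, b):
    # second order: h_i(t+1) = f( sum_{j,k} W2[i,j,k] h_j(t) x_k(t) + b_i )
    return np.tanh(np.einsum('ijk,j,k->i', W2, h, x) + b)
```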


2018 ◽  
Vol 30 (9) ◽  
pp. 2568-2591 ◽  
Author(s):  
Qinglong Wang ◽  
Kaixuan Zhang ◽  
Alexander G. Ororbia II ◽  
Xinyu Xing ◽  
Xue Liu ◽  
...  

Rule extraction from black box models is critical in domains that require model validation before implementation, as can be the case in credit scoring and medical diagnosis. Though rule extraction is already a challenging problem in statistical learning in general, the difficulty is even greater when highly nonlinear, recursive models, such as recurrent neural networks (RNNs), are fit to data. Here, we study the extraction of rules from second-order RNNs trained to recognize the Tomita grammars. We show that production rules can be stably extracted from trained RNNs and that in certain cases, the rules outperform the trained RNNs.
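The Tomita grammars are seven small regular languages over the alphabet {0, 1} that serve as the standard benchmark in this line of work. A few of them as membership tests, usable for generating labelled training strings (a sketch showing a subset of the seven grammars, not the paper's code):

```python
import re

def tomita_1(s):  # strings of 1s only: 1*
    return '0' not in s

def tomita_2(s):  # alternating pairs: (10)*
    return re.fullmatch(r'(10)*', s) is not None

def tomita_4(s):  # no run of three or more consecutive 0s
    return '000' not in s

def tomita_7(s):  # 0*1*0*1*
    return re.fullmatch(r'0*1*0*1*', s) is not None

def labelled_strings(grammar, max_len=10):
    # exhaustively enumerate strings up to max_len with accept/reject labels
    data = [('', grammar(''))]
    for n in range(1, max_len + 1):
        for i in range(2 ** n):
            s = format(i, 'b').zfill(n)
            data.append((s, grammar(s)))
    return data
```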


1989 ◽  
Vol 1 (4) ◽  
pp. 552-558 ◽  
Author(s):  
David Zipser

An algorithm, called RTRL, for training fully recurrent neural networks has recently been studied by Williams and Zipser (1989a, b). Whereas RTRL has been shown to have great power and generality, it has the disadvantage of requiring a great deal of computation time. A technique is described here for reducing the amount of computation required by RTRL without changing the connectivity of the networks. This is accomplished by dividing the original network into subnets for the purpose of error propagation while leaving them undivided for activity propagation. An example is given of a 12-unit network that learns to be the finite-state part of a Turing machine and runs 10 times faster using the subgrouping strategy than the original algorithm.
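For reference, the quantities that make full RTRL expensive are the sensitivities p^k_ij(t) = ∂y_k(t)/∂w_ij, one for every combination of unit k and weight w_ij, updated at every step by the standard Williams and Zipser recursion (shown here as it is commonly stated; z(t) concatenates the external inputs with the unit outputs y(t), and U is the set of recurrent units):

```latex
\begin{aligned}
  s_k(t) &= \sum_{l} w_{kl}\, z_l(t), \qquad y_k(t+1) = f_k\!\bigl(s_k(t)\bigr),\\
  p^{k}_{ij}(t+1) &= f_k'\!\bigl(s_k(t)\bigr)\Bigl[\sum_{l \in U} w_{kl}\, p^{l}_{ij}(t) + \delta_{ki}\, z_j(t)\Bigr],
  \qquad p^{k}_{ij}(0) = 0,\\
  \Delta w_{ij}(t) &= \alpha \sum_{k} e_k(t)\, p^{k}_{ij}(t).
\end{aligned}
```

The subgrouping strategy keeps activity propagation (the first line) over the full network but restricts the sensitivity recursion so that each subnet tracks only its own units' sensitivities with respect to its own incoming weights; by a back-of-the-envelope count (not a figure from the paper), splitting n fully recurrent units into g equal subnets reduces the dominant per-step cost from roughly O(n^4) to roughly O(n^4/g^2).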

