Markov Information Bottleneck to Improve Information Flow in Stochastic Neural Networks

While rate distortion theory compresses data under a distortion constraint, information bottleneck (IB) generalizes rate distortion theory to learning problems by replacing a distortion constraint with a constraint of relevant information. In this work, we further extend IB to multiple Markov bottlenecks (i.e., latent variables that form a Markov chain), namely Markov information bottleneck (MIB), which particularly fits better in the context of stochastic neural networks (SNNs) than the original IB. We show that Markov bottlenecks cannot simultaneously achieve their information optimality in a non-collapse MIB, and thus devise an optimality compromise. With MIB, we take the novel perspective that each layer of an SNN is a bottleneck whose learning goal is to encode relevant information in a compressed form from the data. The inference from a hidden layer to the output layer is then interpreted as a variational approximation to the layer’s decoding of relevant information in the MIB. As a consequence of this perspective, the maximum likelihood estimate (MLE) principle in the context of SNNs becomes a special case of the variational MIB. We show that, compared to MLE, the variational MIB can encourage better information flow in SNNs in both principle and practice, and empirically improve performance in classification, adversarial robustness, and multi-modal learning in MNIST.

Download Full-text

State Abstraction as Compression in Apprenticeship Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013134 ◽

2019 ◽

Vol 33 ◽

pp. 3134-3142

Author(s):

David Abel ◽

Dilip Arumugam ◽

Kavosh Asadi ◽

Yuu Jinnai ◽

Michael L. Littman ◽

...

Keyword(s):

Rate Distortion ◽

Sequential Decision Making ◽

Sequential Decision ◽

Rate Distortion Theory ◽

Apprenticeship Learning ◽

Information Bottleneck ◽

State Abstraction ◽

And Performance ◽

Algorithmic Structure ◽

Made In

State abstraction can give rise to models of environments that are both compressed and useful, thereby enabling efficient sequential decision making. In this work, we offer the first formalism and analysis of the trade-off between compression and performance made in the context of state abstraction for Apprenticeship Learning. We build on Rate-Distortion theory, the classic Blahut-Arimoto algorithm, and the Information Bottleneck method to develop an algorithm for computing state abstractions that approximate the optimal tradeoff between compression and performance. We illustrate the power of this algorithmic structure to offer insights into effective abstraction, compression, and reinforcement learning through a mixture of analysis, visuals, and experimentation.

Download Full-text

Nonextensive Variational Principles for Rate Distortion Theory and the Information Bottleneck Method

2007 IEEE Symposium on Foundations of Computational Intelligence ◽

10.1109/foci.2007.371520 ◽

2007 ◽

Cited By ~ 1

Author(s):

R. C. Venkatesan

Keyword(s):

Variational Principles ◽

Rate Distortion ◽

Rate Distortion Theory ◽

Information Bottleneck

Download Full-text

The evolution of lossy compression

Journal of The Royal Society Interface ◽

10.1098/rsif.2017.0166 ◽

2017 ◽

Vol 14 (130) ◽

pp. 20170166 ◽

Cited By ~ 20

Author(s):

Sarah E. Marzen ◽

Simon DeDeo

Keyword(s):

Information Theory ◽

Rate Distortion ◽

Relevant Information ◽

Lossy Compression ◽

High Fidelity ◽

Complex Environments ◽

Environmental Complexity ◽

Trade Off ◽

Rate Distortion Theory ◽

Unstructured Environments

In complex environments, there are costs to both ignorance and perception. An organism needs to track fitness-relevant information about its world, but the more information it tracks, the more resources it must devote to perception. As a first step towards a general understanding of this trade-off, we use a tool from information theory, rate–distortion theory, to study large, unstructured environments with fixed, randomly drawn penalties for stimuli confusion (‘distortions’). We identify two distinct regimes for organisms in these environments: a high-fidelity regime where perceptual costs grow linearly with environmental complexity, and a low-fidelity regime where perceptual costs are, remarkably, independent of the number of environmental states. This suggests that in environments of rapidly increasing complexity, well-adapted organisms will find themselves able to make, just barely, the most subtle distinctions in their environment.

Download Full-text

Information-Theoretic Understanding of Population Risk Improvement with Model Compression

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5730 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3300-3307

Author(s):

Yuheng Bu ◽

Weihao Gao ◽

Shaofeng Zou ◽

Venugopal Veeravalli

Keyword(s):

Neural Networks ◽

Linear Regression ◽

Rate Distortion ◽

Generalization Error ◽

Regularization Technique ◽

Information Theoretic ◽

Rate Distortion Theory ◽

Model Compression ◽

Empirical Risk ◽

Theoretical Results

We show that model compression can improve the population risk of a pre-trained model, by studying the tradeoff between the decrease in the generalization error and the increase in the empirical risk with model compression. We first prove that model compression reduces an information-theoretic bound on the generalization error; this allows for an interpretation of model compression as a regularization technique to avoid overfitting. We then characterize the increase in empirical risk with model compression using rate distortion theory. These results imply that the population risk could be improved by model compression if the decrease in generalization error exceeds the increase in empirical risk. We show through a linear regression example that such a decrease in population risk due to model compression is indeed possible. Our theoretical results further suggest that the Hessian-weighted K-means clustering compression approach can be improved by regularizing the distance between the clustering centers. We provide experiments with neural networks to support our theoretical assertions.

Download Full-text

A Reward-Maximizing Spiking Neuron as a Bounded Rational Decision Maker

Neural Computation ◽

10.1162/neco_a_00758 ◽

2015 ◽

Vol 27 (8) ◽

pp. 1686-1720 ◽

Cited By ~ 5

Author(s):

Felix Leibfried ◽

Daniel A. Braun

Keyword(s):

Mutual Information ◽

Decision Maker ◽

Sensory Input ◽

Rate Distortion ◽

Relevant Information ◽

Motor Output ◽

Spiking Neuron ◽

Rational Decision ◽

Rate Distortion Theory ◽

The Many

Rate distortion theory describes how to communicate relevant information most efficiently over a channel with limited capacity. One of the many applications of rate distortion theory is bounded rational decision making, where decision makers are modeled as information channels that transform sensory input into motor output under the constraint that their channel capacity is limited. Such a bounded rational decision maker can be thought to optimize an objective function that trades off the decision maker’s utility or cumulative reward against the information processing cost measured by the mutual information between sensory input and motor output. In this study, we interpret a spiking neuron as a bounded rational decision maker that aims to maximize its expected reward under the computational constraint that the mutual information between the neuron’s input and output is upper bounded. This abstract computational constraint translates into a penalization of the deviation between the neuron’s instantaneous and average firing behavior. We derive a synaptic weight update rule for such a rate distortion optimizing neuron and show in simulations that the neuron efficiently extracts reward-relevant information from the input by trading off its synaptic strengths against the collected reward.

Download Full-text

Analysis of Non-Linear Activation Functions for Classification Tasks Using Convolutional Neural Networks

Recent Patents on Computer Science ◽

10.2174/2213275911666181025143029 ◽

2019 ◽

Vol 12 (3) ◽

pp. 156-161 ◽

Cited By ~ 3

Author(s):

Aman Dureja ◽

Payal Pahwa

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Activation Function ◽

Primary Objective ◽

Experimental Comparison ◽

Activation Functions ◽

Practical Applications ◽

Network Activation ◽

Non Linear ◽

Hidden Layer

Background: In making the deep neural network, activation functions play an important role. But the choice of activation functions also affects the network in term of optimization and to retrieve the better results. Several activation functions have been introduced in machine learning for many practical applications. But which activation function should use at hidden layer of deep neural networks was not identified. Objective: The primary objective of this analysis was to describe which activation function must be used at hidden layers for deep neural networks to solve complex non-linear problems. Methods: The configuration for this comparative model was used by using the datasets of 2 classes (Cat/Dog). The number of Convolutional layer used in this network was 3 and the pooling layer was also introduced after each layer of CNN layer. The total of the dataset was divided into the two parts. The first 8000 images were mainly used for training the network and the next 2000 images were used for testing the network. Results: The experimental comparison was done by analyzing the network by taking different activation functions on each layer of CNN network. The validation error and accuracy on Cat/Dog dataset were analyzed using activation functions (ReLU, Tanh, Selu, PRelu, Elu) at number of hidden layers. Overall the Relu gave best performance with the validation loss at 25th Epoch 0.3912 and validation accuracy at 25th Epoch 0.8320. Conclusion: It is found that a CNN model with ReLU hidden layers (3 hidden layers here) gives best results and improve overall performance better in term of accuracy and speed. These advantages of ReLU in CNN at number of hidden layers are helpful to effectively and fast retrieval of images from the databases.

Download Full-text

Hardware implementation of radial-basis neural networks with Gaussian activation functions on FPGA

Neural Computing and Applications ◽

10.1007/s00521-021-05706-3 ◽

2021 ◽

Author(s):

Volodymyr Shymkovych ◽

Sergii Telenyk ◽

Petro Kravets

Keyword(s):

Neural Networks ◽

Hardware Implementation ◽

Gaussian Function ◽

Activation Function ◽

Rbf Neural Networks ◽

Activation Functions ◽

Rbf Network ◽

Combination Scheme ◽

Radial Basis ◽

Hidden Layer

AbstractThis article introduces a method for realizing the Gaussian activation function of radial-basis (RBF) neural networks with their hardware implementation on field-programmable gaits area (FPGAs). The results of modeling of the Gaussian function on FPGA chips of different families have been presented. RBF neural networks of various topologies have been synthesized and investigated. The hardware component implemented by this algorithm is an RBF neural network with four neurons of the latent layer and one neuron with a sigmoid activation function on an FPGA using 16-bit numbers with a fixed point, which took 1193 logic matrix gate (LUTs—LookUpTable). Each hidden layer neuron of the RBF network is designed on an FPGA as a separate computing unit. The speed as a total delay of the combination scheme of the block RBF network was 101.579 ns. The implementation of the Gaussian activation functions of the hidden layer of the RBF network occupies 106 LUTs, and the speed of the Gaussian activation functions is 29.33 ns. The absolute error is ± 0.005. The Spartan 3 family of chips for modeling has been used to get these results. Modeling on chips of other series has been also introduced in the article. RBF neural networks of various topologies have been synthesized and investigated. Hardware implementation of RBF neural networks with such speed allows them to be used in real-time control systems for high-speed objects.

Download Full-text

Exponential Synchronization of Stochastic Neural Networks with Time-Varying Delays and Lévy Noises via Event-Triggered Control

Neural Processing Letters ◽

10.1007/s11063-021-10509-7 ◽

2021 ◽

Author(s):

Danni Lu ◽

Dongbing Tong ◽

Qiaoyu Chen ◽

Wuneng Zhou ◽

Jun Zhou ◽

...

Keyword(s):

Neural Networks ◽

Exponential Synchronization ◽

Stochastic Neural Networks ◽

Time Varying ◽

Event Triggered

Download Full-text

Classification of the Microstructural Elements of the Vegetal Tissue of the Pumpkin (Cucurbita pepo L.) Using Convolutional Neural Networks

Applied Sciences ◽

10.3390/app11041581 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1581

Author(s):

Jimy Oblitas ◽

Jezreel Mejia ◽

Miguel De-la-Torre ◽

Himer Avila-George ◽

Lucía Seguí Gil ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Cucurbita Pepo ◽

Performance Metrics ◽

Microstructural Analysis ◽

Relevant Information ◽

Machine Learning Techniques ◽

Radial Basis Neural Networks ◽

Vegetable Foods ◽

Vegetal Tissue

Although knowledge of the microstructure of food of vegetal origin helps us to understand the behavior of food materials, the variability in the microstructural elements complicates this analysis. In this regard, the construction of learning models that represent the actual microstructures of the tissue is important to extract relevant information and advance in the comprehension of such behavior. Consequently, the objective of this research is to compare two machine learning techniques—Convolutional Neural Networks (CNN) and Radial Basis Neural Networks (RBNN)—when used to enhance its microstructural analysis. Two main contributions can be highlighted from this research. First, a method is proposed to automatically analyze the microstructural elements of vegetal tissue; and second, a comparison was conducted to select a classifier to discriminate between tissue structures. For the comparison, a database of microstructural elements images was obtained from pumpkin (Cucurbita pepo L.) micrographs. Two classifiers were implemented using CNN and RBNN, and statistical performance metrics were computed using a 5-fold cross-validation scheme. This process was repeated one hundred times with a random selection of images in each repetition. The comparison showed that the classifiers based on CNN produced a better fit, obtaining F1–score average of 89.42% in front of 83.83% for RBNN. In this study, the performance of classifiers based on CNN was significantly higher compared to those based on RBNN in the discrimination of microstructural elements of vegetable foods.

Download Full-text

Exploiting heterogeneity in operational neural networks by synaptic plasticity

Neural Computing and Applications ◽

10.1007/s00521-020-05543-w ◽

2021 ◽

Author(s):

Serkan Kiranyaz ◽

Junaid Malik ◽

Habib Ben Abdallah ◽

Turker Ince ◽

Alexandros Iosifidis ◽

...

Keyword(s):

Neural Networks ◽

Synaptic Plasticity ◽

Network Model ◽

Neuron Model ◽

Linear Operators ◽

Training Data ◽

Learning Performance ◽

Minimal Network ◽

Hidden Layer ◽

Hidden Neurons

AbstractThe recently proposed network model, Operational Neural Networks (ONNs), can generalize the conventional Convolutional Neural Networks (CNNs) that are homogenous only with a linear neuron model. As a heterogenous network model, ONNs are based on a generalized neuron model that can encapsulate any set of non-linear operators to boost diversity and to learn highly complex and multi-modal functions or spaces with minimal network complexity and training data. However, the default search method to find optimal operators in ONNs, the so-called Greedy Iterative Search (GIS) method, usually takes several training sessions to find a single operator set per layer. This is not only computationally demanding, also the network heterogeneity is limited since the same set of operators will then be used for all neurons in each layer. To address this deficiency and exploit a superior level of heterogeneity, in this study the focus is drawn on searching the best-possible operator set(s) for the hidden neurons of the network based on the “Synaptic Plasticity” paradigm that poses the essential learning theory in biological neurons. During training, each operator set in the library can be evaluated by their synaptic plasticity level, ranked from the worst to the best, and an “elite” ONN can then be configured using the top-ranked operator sets found at each hidden layer. Experimental results over highly challenging problems demonstrate that the elite ONNs even with few neurons and layers can achieve a superior learning performance than GIS-based ONNs and as a result, the performance gap over the CNNs further widens.

Download Full-text