The Deterministic Information Bottleneck

Lossy compression and clustering fundamentally involve a decision about which features are relevant and which are not. The information bottleneck method (IB) by Tishby, Pereira, and Bialek (1999) formalized this notion as an information-theoretic optimization problem and proposed an optimal trade-off between throwing away as many bits as possible and selectively keeping those that are most important. In the IB, compression is measured by mutual information. Here, we introduce an alternative formulation that replaces mutual information with entropy, which we call the deterministic information bottleneck (DIB) and which, we argue, better captures this notion of compression. As suggested by its name, the solution to the DIB problem turns out to be a deterministic encoder, or hard clustering, as opposed to the stochastic encoder, or soft clustering, that is optimal under the IB. We compare the IB and DIB on synthetic data, showing that the IB and DIB perform similarly in terms of the IB cost function, but that the DIB significantly outperforms the IB in terms of the DIB cost function. We also find empirically that the DIB offers a considerable gain in computational efficiency over the IB, over a range of convergence parameters. Our derivation of the DIB also suggests a method for continuously interpolating between the soft clustering of the IB and the hard clustering of the DIB.
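To make the two cost functions concrete, the following is a minimal sketch that evaluates both for a given encoder. It assumes the usual forms of the objectives, which the abstract itself does not spell out: the IB cost I(X;T) − β I(T;Y) and the DIB cost H(T) − β I(T;Y), with T the compressed variable and Y the relevance variable. The function names and the toy distributions are hypothetical, chosen only for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries are skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    """I(A;B) in bits from a joint probability table p(a, b)."""
    joint = np.asarray(joint, dtype=float)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint.ravel())

def ib_and_dib_costs(p_xy, q_t_given_x, beta):
    """Return (IB cost, DIB cost) for an encoder q(t|x) whose rows are indexed by x.

    Assumed objectives (not stated in the abstract):
      IB:  I(X;T) - beta * I(T;Y)
      DIB: H(T)   - beta * I(T;Y)
    """
    p_xy = np.asarray(p_xy, dtype=float)
    q_t_given_x = np.asarray(q_t_given_x, dtype=float)
    p_x = p_xy.sum(axis=1)
    q_xt = p_x[:, None] * q_t_given_x   # joint q(x, t)
    q_ty = q_t_given_x.T @ p_xy         # joint q(t, y), since T depends on Y only through X
    relevance = mutual_information(q_ty)
    ib_cost = mutual_information(q_xt) - beta * relevance
    dib_cost = entropy(q_xt.sum(axis=0)) - beta * relevance
    return ib_cost, dib_cost

# Toy joint distribution over 4 values of X and 2 values of Y (illustrative only).
p_xy = np.array([[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]])
hard = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)       # deterministic encoder
soft = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])   # stochastic encoder

for name, encoder in [("hard", hard), ("soft", soft)]:
    ib, dib = ib_and_dib_costs(p_xy, encoder, beta=2.0)
    print(f"{name}: IB cost = {ib:.4f}, DIB cost = {dib:.4f}")
```

For a deterministic encoder H(T|X) = 0, so I(X;T) = H(T) and the two costs coincide; for a stochastic encoder the DIB cost is never smaller than the IB cost, because H(T) ≥ I(X;T).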

The Effect of Evidence Transfer on Latent Feature Relevance for Clustering

Informatics ◽

10.3390/informatics6020017 ◽

2019 ◽

Vol 6 (2) ◽

pp. 17

Author(s):

Athanasios Davvetas ◽

Iraklis A. Klampanos ◽

Spiros Skiadopoulos ◽

Vangelis Karkaletsis

Keyword(s):

Mutual Information ◽

Ground Truth ◽

Original Data ◽

Information Theoretic ◽

Information Bottleneck ◽

Latent Space ◽

Before And After ◽

Feature Relevance ◽

Latent Representations ◽

Transfer Method

Evidence transfer for clustering is a deep learning method that manipulates the latent representations of an autoencoder according to external categorical evidence, with the effect of improving a clustering outcome. Applied to clustering, evidence transfer is designed to be robust when presented with low-quality evidence, while improving clustering accuracy when the evidence is relevant. We interpret the effects of evidence transfer on the latent representation of an autoencoder by comparing our method to the information bottleneck method. Information bottleneck is an optimisation problem of finding the best tradeoff between maximising the mutual information of data representations and a task outcome while at the same time being effective in compressing the original data source. We posit that the evidence transfer method has essentially the same objective regarding the latent representations produced by an autoencoder. We verify our hypothesis using information theoretic metrics from feature selection in order to perform an empirical analysis over the information that is carried through the bottleneck of the latent space. We use the relevance metric to compare the overall mutual information between the latent representations and the ground truth labels before and after their incremental manipulation, as well as to study the effects of evidence transfer regarding the significance of each latent feature.

Bottleneck Problems: An Information and Estimation-Theoretic View

Entropy ◽

10.3390/e22111325 ◽

2020 ◽

Vol 22 (11) ◽

pp. 1325

Author(s):

Shahab Asoodeh ◽

Flavio P. Calmon

Keyword(s):

Mutual Information ◽

Closed Form ◽

Optimization Problems ◽

Source Coding ◽

Auxiliary Variable ◽

Information Theoretic ◽

Bottleneck Problems ◽

Information Bottleneck ◽

Discrete Random Variables ◽

Binary Case

Information bottleneck (IB) and privacy funnel (PF) are two closely related optimization problems which have found applications in machine learning, design of privacy algorithms, capacity problems (e.g., Mrs. Gerber’s Lemma), and strong data processing inequalities, among others. In this work, we first investigate the functional properties of IB and PF through a unified theoretical framework. We then connect them to three information-theoretic coding problems, namely hypothesis testing against independence, noisy source coding, and dependence dilution. Leveraging these connections, we prove a new cardinality bound on the auxiliary variable in IB, making its computation more tractable for discrete random variables. In the second part, we introduce a general family of optimization problems, termed “bottleneck problems”, by replacing mutual information in IB and PF with other notions of mutual information, namely f-information and Arimoto’s mutual information. We then argue that, unlike IB and PF, these problems lead to easily interpretable guarantees in a variety of inference tasks with statistical constraints on accuracy and privacy. While the underlying optimization problems are non-convex, we develop a technique to evaluate bottleneck problems in closed form by equivalently expressing them in terms of lower convex or upper concave envelope of certain functions. By applying this technique to a binary case, we derive closed form expressions for several bottleneck problems.

Efficient compression in color naming and its evolution

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1800521115 ◽

2018 ◽

Vol 115 (31) ◽

pp. 7937-7942 ◽

Cited By ~ 28

Author(s):

Noga Zaslavsky ◽

Charles Kemp ◽

Terry Regier ◽

Naftali Tishby

Keyword(s):

Structural Phase ◽

Color Space ◽

Language Variation ◽

Color Naming ◽

Color Category ◽

Trade Off ◽

Information Theoretic ◽

Information Bottleneck ◽

Single Process ◽

Cross Language

We derive a principled information-theoretic account of cross-language semantic variation. Specifically, we argue that languages efficiently compress ideas into words by optimizing the information bottleneck (IB) trade-off between the complexity and accuracy of the lexicon. We test this proposal in the domain of color naming and show that (i) color-naming systems across languages achieve near-optimal compression; (ii) small changes in a single trade-off parameter account to a large extent for observed cross-language variation; (iii) efficient IB color-naming systems exhibit soft rather than hard category boundaries and often leave large regions of color space inconsistently named, both of which phenomena are found empirically; and (iv) these IB systems evolve through a sequence of structural phase transitions, in a single process that captures key ideas associated with different accounts of color category evolution. These results suggest that a drive for information-theoretic efficiency may shape color-naming systems across languages. This principle is not specific to color, and so it may also apply to cross-language variation in other semantic domains.

Do galactic bars depend on environment?: an information theoretic analysis of Galaxy Zoo 2

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa3665 ◽

2020 ◽

Vol 501 (1) ◽

pp. 994-1001

Author(s):

Suman Sarkar ◽

Biswajit Pandey ◽

Snehasish Bhattacharjee

Keyword(s):

Spatial Distribution ◽

Mutual Information ◽

Local Density ◽

Statistical Significance ◽

Distribution Functions ◽

Cumulative Distribution ◽

Host Galaxy ◽

Data Sets ◽

Data Set ◽

Information Theoretic

We use an information theoretic framework to analyse data from the Galaxy Zoo 2 project and study whether there are any statistically significant correlations between the presence of bars in spiral galaxies and their environment. We measure the mutual information between the barredness of galaxies and their environments in a volume limited sample ($M_r \leq -21$) and compare it with the same in data sets where (i) the bar/unbar classifications are randomized and (ii) the spatial distribution of galaxies is shuffled on different length scales. We assess the statistical significance of the differences in the mutual information using a t-test and find that neither randomization of morphological classifications nor shuffling of spatial distribution alters the mutual information in a statistically significant way. The non-zero mutual information between the barredness and environment arises due to the finite and discrete nature of the data set and can be entirely explained by mock Poisson distributions. We also separately compare the cumulative distribution functions of the barred and unbarred galaxies as a function of their local density. Using a Kolmogorov–Smirnov test, we find that the null hypothesis cannot be rejected even at the 75 per cent confidence level. Our analysis indicates that environments do not play a significant role in the formation of a bar, which is largely determined by the internal processes of the host galaxy.
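The comparison against randomized classifications can be sketched as follows. This is a simplified stand-in, not the authors' pipeline: it uses a plug-in mutual information estimate on integer-coded labels and a plain permutation of one label array, and the variable names (`barred`, `density_bin`) and the synthetic data are hypothetical.

```python
import numpy as np

def mutual_information_bits(a, b):
    """Plug-in estimate of I(A;B) in bits from two integer-coded label arrays."""
    a = np.asarray(a)
    b = np.asarray(b)
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)            # contingency table of co-occurrence counts
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)    # marginal of A, shape (na, 1)
    pb = joint.sum(axis=0, keepdims=True)    # marginal of B, shape (1, nb)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

def shuffled_mi(a, b, n_shuffles, rng):
    """Mutual information after randomly permuting the labels in `a`."""
    return np.array([mutual_information_bits(rng.permutation(a), b)
                     for _ in range(n_shuffles)])

# Synthetic example: bar labels drawn independently of the environment bin.
rng = np.random.default_rng(0)
barred = rng.integers(0, 2, size=2000)        # 0 = unbarred, 1 = barred
density_bin = rng.integers(0, 5, size=2000)   # coarse local-density bin

observed = mutual_information_bits(barred, density_bin)
null = shuffled_mi(barred, density_bin, n_shuffles=200, rng=rng)
print(f"observed MI = {observed:.5f} bits, shuffled mean = {null.mean():.5f} bits")
```

Even with independent labels the plug-in estimate is slightly positive in a finite sample, which is why the observed value has to be compared with the shuffled distribution rather than with zero.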

Multi-Domain Communication Systems and Networks: A Tensor-Based Approach

Network ◽

10.3390/network1020005 ◽

2021 ◽

Vol 1 (2) ◽

pp. 50-74

Author(s):

Divyanshu Pandey ◽

Adithya Venugopal ◽

Harry Leib

Keyword(s):

Communication Systems ◽

Physical Layer ◽

Mathematical Framework ◽

Trade Off ◽

Information Theoretic ◽

Time Frequency ◽

Tensor Formulation ◽

Processing Techniques ◽

Multiple Domains ◽

Signal Processing Techniques

Most modern communication systems, such as those intended for deployment in IoT applications or 5G and beyond networks, utilize multiple domains for transmission and reception at the physical layer. Depending on the application, these domains can include space, time, frequency, users, code sequences, and transmission media, to name a few. As such, the design criteria of future communication systems must be cognizant of the opportunities and the challenges that exist in exploiting the multi-domain nature of the signals and systems involved for information transmission. Focussing on the Physical Layer, this paper presents a novel mathematical framework using tensors, to represent, design, and analyze multi-domain systems. Various domains can be integrated into the transceiver design scheme using tensors. Tools from multi-linear algebra can be used to develop simultaneous signal processing techniques across all the domains. In particular, we present tensor partial response signaling (TPRS) which allows the introduction of controlled interference within elements of a domain and also across domains. We develop the TPRS system using the tensor contracted convolution to generate a multi-domain signal with desired spectral and cross-spectral properties across domains. In addition, by studying the information theoretic properties of the multi-domain tensor channel, we present the trade-off between different domains that can be harnessed using this framework. Numerical examples for capacity and mean square error are presented to highlight the domain trade-off revealed by the tensor formulation. Furthermore, an application of the tensor framework to MIMO Generalized Frequency Division Multiplexing (GFDM) is also presented.

A New Information-Theoretic Measure to Control the Robustness-Sensitivity Trade-Off for DMFFD Point-Set Registration

Lecture Notes in Computer Science - Information Processing in Medical Imaging ◽

10.1007/978-3-642-02498-6_18 ◽

2009 ◽

pp. 215-226 ◽

Cited By ~ 3

Author(s):

Nicholas J. Tustison ◽

Suyash P. Awate ◽

Gang Song ◽

Tessa S. Cook ◽

James C. Gee

Keyword(s):

Trade Off ◽

Information Theoretic ◽

Point Set Registration ◽

New Information ◽

Point Set

Bulk private curves require large conditional mutual information

Journal of High Energy Physics ◽

10.1007/jhep09(2021)042 ◽

2021 ◽

Vol 2021 (9) ◽

Author(s):

Alex May

Keyword(s):

Mutual Information ◽

Boundary Region ◽

Theoretic Approach ◽

Strong Correlations ◽

Conditional Mutual Information ◽

Information Theoretic ◽

Causal Curve ◽

Resource Requirements ◽

Theoretic Argument ◽

Information Theoretic Approach

We prove a theorem showing that the existence of “private” curves in the bulk of AdS implies two regions of the dual CFT share strong correlations. A private curve is a causal curve which avoids the entanglement wedge of a specified boundary region $\mathcal{U}$. The implied correlation is measured by the conditional mutual information $I(\mathcal{V}_1 : \mathcal{V}_2 \mid \mathcal{U})$, which is $O(1/G_N)$ when a private causal curve exists. The regions $\mathcal{V}_1$ and $\mathcal{V}_2$ are specified by the endpoints of the causal curve and the placement of the region $\mathcal{U}$. This gives a causal perspective on the conditional mutual information in AdS/CFT, analogous to the causal perspective on the mutual information given by earlier work on the connected wedge theorem. We give an information theoretic argument for our theorem, along with a bulk geometric proof. In the geometric perspective, the theorem follows from the maximin formula and entanglement wedge nesting. In the information theoretic approach, the theorem follows from resource requirements for sending private messages over a public quantum channel.

Unsupervised Learning via Total Correlation Explanation

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/740 ◽

2017 ◽

Cited By ~ 2

Author(s):

Greg Ver Steeg

Keyword(s):

Mutual Information ◽

Unsupervised Learning ◽

Supervised Learning ◽

Human Behavior ◽

Learning Problems ◽

Information Theoretic ◽

Sensory Environment ◽

Total Correlation ◽

Environment Dependence ◽

Multivariate Mutual Information

Learning by children and animals occurs effortlessly and largely without obvious supervision. Successes in automating supervised learning have not translated to the more ambiguous realm of unsupervised learning where goals and labels are not provided. Barlow (1961) suggested that the signal that brains leverage for unsupervised learning is dependence, or redundancy, in the sensory environment. Dependence can be characterized using the information-theoretic multivariate mutual information measure called total correlation. The principle of Total Cor-relation Ex-planation (CorEx) is to learn representations of data that "explain" as much dependence in the data as possible. We review some manifestations of this principle along with successes in unsupervised learning problems across diverse domains including human behavior, biology, and language.
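Total correlation has a simple definition that can be checked numerically: the sum of the marginal entropies minus the joint entropy, TC(X_1, ..., X_n) = Σ_i H(X_i) − H(X_1, ..., X_n). The sketch below computes a plug-in estimate from discrete samples. It illustrates only the measure itself, not the CorEx learning procedure, and the function names are hypothetical.

```python
import numpy as np

def entropy_bits(rows):
    """Plug-in entropy in bits of the empirical distribution over the rows of a 2-D array."""
    _, counts = np.unique(rows, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def total_correlation_bits(samples):
    """Sum of per-column entropies minus the joint entropy of all columns."""
    samples = np.asarray(samples)
    marginals = sum(entropy_bits(samples[:, [i]]) for i in range(samples.shape[1]))
    return marginals - entropy_bits(samples)

# Three perfectly redundant copies of one fair bit: 3 bits of marginal
# entropy, 1 bit of joint entropy.
redundant = np.array([[0, 0, 0], [1, 1, 1]])
# Two independent fair bits: every combination equally likely.
independent = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

print(total_correlation_bits(redundant))
print(total_correlation_bits(independent))
```

Total correlation is zero exactly when the variables are independent and grows with redundancy, which is the sense in which a representation can "explain" dependence in the data.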

Synthetic Business Microdata

Journal of Privacy and Confidentiality ◽

10.29012/jpc.733 ◽

2020 ◽

Vol 10 (2) ◽

Author(s):

Chien-Hung Chien ◽

Alan Hepburn Welsh ◽

John D Moore

Keyword(s):

Synthetic Data ◽

Data Access ◽

Australian Bureau ◽

Trade Off ◽

Information Reduction ◽

Input And Output ◽

Business Data ◽

Business Survey ◽

Australian Bureau Of Statistics ◽

Inform Decision Making

Enhancing microdata access is one of the strategic priorities for the Australian Bureau of Statistics (ABS) in its transformation program. However, balancing the trade-off between enhancing data access and protecting confidentiality is a delicate act. The ABS could use synthetic data to make its business microdata more accessible for researchers to inform decision making while maintaining confidentiality. This study explores the synthetic data approach for the release and analysis of business data. Australian businesses in some industries are characterised by oligopoly or duopoly. This means the existing microdata protection techniques such as information reduction or perturbation may not be as effective as for household microdata. The research focuses on addressing the following questions: Can a synthetic data approach enhance microdata access for the longitudinal business data? What is the utility and protection trade-off using the synthetic data approach? The study compares confidentialised input and output approaches for protecting confidentiality and analysing Australian microdata from business survey or administrative data sources.

An Information-Theoretic Account of Semantic Interference in Word Production

Frontiers in Psychology ◽

10.3389/fpsyg.2021.672408 ◽

2021 ◽

Vol 12 ◽

Author(s):

Richard Futrell

Keyword(s):

Mutual Information ◽

Semantic Similarity ◽

Rate Distortion ◽

Word Production ◽

Interference Effects ◽

Semantic Interference ◽

Information Theoretic ◽

Human Data ◽

Level Model ◽

Computational Level

I present a computational-level model of semantic interference effects in online word production within a rate–distortion framework. I consider a bounded-rational agent trying to produce words. The agent's action policy is determined by maximizing accuracy in production subject to computational constraints. These computational constraints are formalized using mutual information. I show that semantic similarity-based interference among words falls out naturally from this setup, and I present a series of simulations showing that the model captures some of the key empirical patterns observed in Stroop and Picture–Word Interference paradigms, including comparisons to human data from previous experiments.
