Tree Sampling Divergence: An Information-Theoretic Metric for Hierarchical Graph Clustering

Author(s):  
Bertrand Charpentier ◽  
Thomas Bonald

We introduce the tree sampling divergence (TSD), an information-theoretic metric for assessing the quality of a hierarchical clustering of a graph. Any hierarchical clustering of a graph can be represented as a tree whose nodes correspond to clusters of the graph. The TSD is the Kullback-Leibler divergence between two probability distributions over the nodes of this tree: those induced, respectively, by sampling edges and node pairs of the graph at random. A fundamental property of the proposed metric is that it is interpretable in terms of graph reconstruction: it quantifies, as an information loss, the ability to reconstruct the graph from the tree. In particular, the TSD is maximal when perfect reconstruction is feasible, i.e., when the graph has a complete hierarchical structure. Another key property of the TSD is that it applies to any tree, not only binary trees. In particular, the TSD can be used to compress a binary tree while minimizing the information loss in terms of graph reconstruction, so as to obtain a compact representation of the hierarchical structure of a graph. We illustrate the behavior of the TSD, compared with existing metrics, in experiments on both synthetic and real datasets.
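To make the construction concrete, here is a minimal Python sketch of one plausible reading of the TSD computation on a toy graph: p is induced by taking the lowest common ancestor (LCA) of a sampled edge's endpoints, q by sampling two nodes independently in proportion to degree. The toy graph, hierarchy, and function names are illustrative assumptions, not the authors' reference implementation.

```python
import math

# Toy undirected graph as an edge list (unit weights) -- illustrative.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

# Hierarchy over the 6 leaves as parent pointers: leaves 0-2 merge
# under internal node 6, leaves 3-5 under 7, and the root 8 joins them.
parent = {0: 6, 1: 6, 2: 6, 3: 7, 4: 7, 5: 7, 6: 8, 7: 8, 8: None}

def ancestors(u):
    path = []
    while u is not None:
        path.append(u)
        u = parent[u]
    return path

def lca(u, v):
    anc_u = set(ancestors(u))
    return next(a for a in ancestors(v) if a in anc_u)

# p: distribution over tree nodes from sampling an edge uniformly
# and taking the LCA of its endpoints.
p = {}
for u, v in edges:
    a = lca(u, v)
    p[a] = p.get(a, 0.0) + 1.0 / len(edges)

# q: distribution from sampling two nodes independently, each in
# proportion to its degree (a configuration-model style baseline).
deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1
total = float(sum(deg.values()))
pi = {u: d / total for u, d in deg.items()}

q = {}
for u in pi:
    for v in pi:  # self-pairs land on leaves, outside p's support
        a = lca(u, v)
        q[a] = q.get(a, 0.0) + pi[u] * pi[v]

# TSD as the KL divergence between p and q over the tree's nodes.
tsd = sum(w * math.log(w / q[a]) for a, w in p.items())
print(f"TSD = {tsd:.4f}")
```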

Author(s):  
Ryan Ka Yau Lai ◽  
Youngah Do

This article explores a method of creating confidence bounds for information-theoretic measures in linguistics, such as entropy, Kullback-Leibler Divergence (KLD), and mutual information. We show that a useful measure of uncertainty can be derived from simple statistical principles, namely the asymptotic distribution of the maximum likelihood estimator (MLE) and the delta method. Three case studies from phonology and corpus linguistics are used to demonstrate how to apply it and examine its robustness against common violations of its assumptions in linguistics, such as insufficient sample size and non-independence of data points.
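As a concrete illustration of the entropy case, here is a minimal Python sketch assuming counts over a discrete alphabet: the asymptotic variance of the plug-in estimator, Var[-log p(X)]/n, follows from the MLE's asymptotic normality together with the delta method. The function name and example counts are illustrative, not the article's own code.

```python
import numpy as np

def entropy_ci(counts, z=1.96):
    """Plug-in (MLE) entropy in nats, with a delta-method confidence
    interval: Var(H_hat) ~= Var[-log p(X)] / n."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts / n
    logp = np.log(p, where=p > 0, out=np.zeros_like(p))
    h = -np.sum(p * logp)                      # plug-in entropy
    var_surprise = np.sum(p * logp**2) - h**2  # Var[-log p(X)]
    se = np.sqrt(var_surprise / n)
    return h, (h - z * se, h + z * se)

# Example: counts of phoneme categories in a toy corpus.
h, (lo, hi) = entropy_ci([120, 80, 40, 10])
print(f"H = {h:.3f} nats, 95% CI = ({lo:.3f}, {hi:.3f})")
```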


2021 ◽  
Vol 18 (2) ◽  
pp. 172988142199958
Author(s):  
Larkin Folsom ◽  
Masahiro Ono ◽  
Kyohei Otsu ◽  
Hyoshin Park

Mission-critical exploration of uncertain environments requires reliable and robust mechanisms for achieving information gain. Typical measures of information gain, such as Shannon entropy and KL divergence, either cannot distinguish between different bimodal probability distributions or introduce bias toward one mode of the distribution. A standard deviation (SD) metric reduces this bias while retaining the ability to distinguish between higher- and lower-risk distributions. Areas of high SD can be safely explored through observation with an autonomous Mars Helicopter, allowing safer and faster path plans for ground-based rovers. First, this study presents a single-agent, information-theoretic, utility-based path planning method for a highly correlated uncertain environment. Then, an information-theoretic two-stage multiagent rapidly exploring random tree framework is presented, which guides the Mars Helicopter through regions of high SD to reduce uncertainty for the rover. In a Monte Carlo simulation, we compare our information-theoretic framework with a rover-only approach and a naive approach in which the helicopter scouts ahead of the rover along its planned path. Finally, the model is demonstrated in a case study on the Jezero region of Mars. Results show that the information-theoretic helicopter improves the travel time for the rover, on average, compared with the rover alone or with the helicopter scouting ahead along the rover's initially planned route.
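The following minimal Python sketch, with made-up numbers, illustrates the motivating point: two bimodal beliefs can have identical Shannon entropy yet very different standard deviations, so an SD-based utility can separate them where entropy cannot.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

def std_dev(values, p):
    mean = np.sum(values * p)
    return np.sqrt(np.sum(p * (values - mean) ** 2))

x = np.arange(11)  # e.g. a discretized terrain-uncertainty variable
near = np.zeros(11); near[[4, 6]] = 0.5    # bimodal, modes close together
far = np.zeros(11); far[[0, 10]] = 0.5     # bimodal, modes far apart

for name, p in [("near", near), ("far", far)]:
    print(name, f"entropy = {entropy(p):.3f}", f"sd = {std_dev(x, p):.2f}")
# Both distributions have entropy log(2) ~ 0.693 nats, but SD = 1 vs 5:
# SD flags the wider (riskier) belief that entropy cannot distinguish.
```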


2021 ◽  
Author(s):  
Jacob Atticus Armstrong Goodall

Abstract A duality theorem is stated and proved for a minimax vector optimisation problem where the vectors are elements of the set of products of compact Polish spaces. A special case of this theorem is derived to show that two metrics on the space of probability distributions on countable products of Polish spaces are identical. The appendix includes a proof that, under the appropriate conditions, the function studied in the optimisation problem is indeed a metric. The optimisation problem is comparable to multi-commodity optimal transport where there is dependence between commodities. This paper builds on the work of R.S. MacKay, who introduced the metrics in the context of complexity science in [4] and [5]. The metrics have the advantage of measuring distance uniformly over the whole network, while other metrics on probability distributions fail to do so (e.g. total variation, Kullback–Leibler divergence; see [5]). This opens up the potential of mathematical optimisation in the setting of complexity science.


2002 ◽  
Vol 11 (1) ◽  
pp. 79-95 ◽  
Author(s):  
DUDLEY STARK ◽  
A. GANESH ◽  
NEIL O’CONNELL

We study the asymptotic behaviour of the relative entropy (to stationarity) for a commonly used model for riffle shuffling a deck of n cards m times. Our results establish, and were motivated by, a prediction in a recent numerical study of Trefethen and Trefethen. Loosely speaking, the relative entropy decays approximately linearly (in m) for m < log₂ n, and approximately exponentially for m > log₂ n. The deck becomes random in this information-theoretic sense after m = (3/2) log₂ n shuffles.
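A Monte Carlo sketch of the setup, not the paper's analysis: simulate Gilbert-Shannon-Reeds (GSR) riffles of a small deck and track the plug-in estimate of the relative entropy to the uniform distribution, D(P_m ‖ U) = log(n!) − H(P_m). The plug-in estimator is biased upward when permutations are undersampled, so treat the numbers as qualitative.

```python
import math
import random
from collections import Counter

def gsr_shuffle(deck):
    """One Gilbert-Shannon-Reeds riffle: cut at a Binomial(n, 1/2)
    position, then interleave, dropping the next card from a packet
    with probability proportional to the packet's remaining size."""
    n = len(deck)
    cut = sum(random.random() < 0.5 for _ in range(n))
    left, right = deck[:cut], deck[cut:]
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        a, b = len(left) - i, len(right) - j
        if random.random() < a / (a + b):
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out

def kl_to_uniform(n=5, m=2, trials=100_000):
    """Plug-in estimate of D(P_m || U) = log(n!) - H(P_m)."""
    counts = Counter()
    for _ in range(trials):
        deck = list(range(n))
        for _ in range(m):
            deck = gsr_shuffle(deck)
        counts[tuple(deck)] += 1
    h = -sum(c / trials * math.log(c / trials) for c in counts.values())
    return math.log(math.factorial(n)) - h

for m in range(1, 7):  # log2(5) ~ 2.3, so the regime change is near m = 2
    print(f"m = {m}: KL to uniform ~ {kl_to_uniform(m=m):.3f} nats")
```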


Author(s):  
M. Vidyasagar

This chapter provides an introduction to some elementary aspects of information theory, including entropy in its various forms. Entropy refers to the level of uncertainty associated with a random variable (or more precisely, the probability distribution of the random variable). When there are two or more random variables, it is worthwhile to study the conditional entropy of one random variable with respect to another. The last concept is relative entropy, also known as the Kullback–Leibler divergence, which measures the “disparity” between two probability distributions. The chapter first considers convex and concave functions before discussing the properties of the entropy function, conditional entropy, uniqueness of the entropy function, and the Kullback–Leibler divergence.
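A minimal Python sketch of the three quantities the chapter covers, for discrete distributions; the joint table and values are illustrative.

```python
import numpy as np

def entropy(p):
    """H(X) = -sum p(x) log p(x), in bits."""
    p = np.asarray(p, float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

def conditional_entropy(joint):
    """H(Y|X) = H(X,Y) - H(X), for a joint table joint[x, y]."""
    joint = np.asarray(joint, float)
    return entropy(joint.ravel()) - entropy(joint.sum(axis=1))

def kl_divergence(p, q):
    """D(p || q) = sum p log(p/q); requires q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])          # joint distribution of (X, Y)
print(conditional_entropy(joint))       # H(Y|X) < H(Y): conditioning helps
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # "disparity" between p and q
```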


2020 ◽  
Vol 2020 (9) ◽  
Author(s):  
Steven B. Giddings ◽  
Gustavo J. Turiaci

Abstract We investigate contributions of spacetime wormholes, describing baby universe emission and absorption, to calculations of entropies and correlation functions, for example those based on the replica method. We find that the rules of the “wormhole calculus”, developed in the 1980s, together with standard quantum mechanical prescriptions for computing entropies and correlators, imply definite rules for limited patterns of connection between replica factors in simple calculations. These results stand in contrast with assumptions that all topologies connecting replicas should be summed over, and call into question the explanation for the latter. In a “free” approximation baby universes introduce probability distributions for coupling constants, and we review and extend arguments that successive experiments in a “parent” universe increasingly precisely fix such couplings, resulting in ultimately pure evolution. Once this has happened, the nontrivial question remains of how topology-changing effects can modify the standard description of black hole information loss.


2011 ◽  
Vol 11 (2-3) ◽  
pp. 263-296 ◽  
Author(s):  
SHAY B. COHEN ◽  
ROBERT J. SIMMONS ◽  
NOAH A. SMITH

Abstract Weighted logic programming, a generalization of bottom-up logic programming, is a well-suited framework for specifying dynamic programming algorithms. In this setting, proofs correspond to the algorithm's output space, such as a path through a graph or a grammatical derivation, and are given a real-valued score (often interpreted as a probability) that depends on the real weights of the base axioms used in the proof. The desired output is a function over all possible proofs, such as a sum of scores or an optimal score. We describe the product transformation, which can merge two weighted logic programs into a new one. The resulting program optimizes a product of proof scores from the original programs, constituting a scoring function known in machine learning as a “product of experts.” Through the addition of intuitive constraining side conditions, we show that several important dynamic programming algorithms can be derived by applying product to weighted logic programs corresponding to simpler weighted logic programs. In addition, we show how the computation of Kullback–Leibler divergence, an information-theoretic measure, can be interpreted using product.
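As a loose illustration only: in the Python sketch below, two tiny weighted automata over a shared alphabet stand in for weighted logic programs, their paths play the role of proofs, and the intersection construction plays the role of the product transformation, so the best joint path score is a "product of experts." This is an assumption-laden analogy, not the paper's logic-programming formalism.

```python
# Each "expert" is a tiny weighted automaton: a start state, accepting
# states, and transitions (state, symbol) -> [(next_state, weight)].
# A path's score is the product of its transition weights, standing in
# for the score of a proof in a weighted logic program.
expert1 = {
    "start": 0, "accept": {1},
    "trans": {(0, "a"): [(0, 0.6), (1, 0.4)], (1, "b"): [(1, 0.9)]},
}
expert2 = {
    "start": 0, "accept": {1},
    "trans": {(0, "a"): [(1, 0.5)], (1, "a"): [(1, 0.3)],
              (1, "b"): [(1, 0.8)]},
}

def product_best(e1, e2, length, alphabet=("a", "b")):
    """Viterbi over the product construction: states are pairs, a joint
    transition exists only when both experts read the same symbol, and
    its weight is the product of the two experts' weights."""
    best = {(e1["start"], e2["start"]): 1.0}
    for _ in range(length):
        nxt = {}
        for (s1, s2), w in best.items():
            for sym in alphabet:
                for t1, w1 in e1["trans"].get((s1, sym), []):
                    for t2, w2 in e2["trans"].get((s2, sym), []):
                        cand = w * w1 * w2
                        if cand > nxt.get((t1, t2), 0.0):
                            nxt[(t1, t2)] = cand
        best = nxt
    joint = {s: w for s, w in best.items()
             if s[0] in e1["accept"] and s[1] in e2["accept"]}
    return max(joint.values()) if joint else 0.0

# Best product-of-experts score over strings of length 3.
print(product_best(expert1, expert2, length=3))  # 0.10368
```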


1994 ◽  
Vol 72 (3-4) ◽  
pp. 130-133
Author(s):  
Paul B. Slater

Guiasu employed a statistical estimation principle to derive time-independent Schrödinger equations for the position but, as is usual, not the spin of a particle. Here, on the other hand, this principle is used to obtain Schrödinger-like equations for the spin but not the position of a particle. Steady states are described by continuous probability distributions, obtained by information-theoretic arguments, over spin measurements, states, and wave functions. These distributions serve as weight functions for orthogonal polynomials. Associated "wave functions," products of the polynomials and the square root of the weight function, satisfy differential equations, reducing to time-independent Schrödinger form at the point corresponding to the fully mixed spin-1/2 state.


2011 ◽  
Vol 139 (7) ◽  
pp. 2156-2162 ◽  
Author(s):  
Steven V. Weijs ◽  
Nick van de Giesen

Abstract Recently, an information-theoretical decomposition of Kullback–Leibler divergence into uncertainty, reliability, and resolution was introduced. In this article, this decomposition is generalized to the case where the observation is uncertain. Along with a modified decomposition of the divergence score, a second measure, the cross-entropy score, is presented, which measures the estimated information loss with respect to the truth instead of relative to the uncertain observations. The difference between the two scores is equal to the average observational uncertainty and vanishes when observations are assumed to be perfect. Not accounting for observation uncertainty can lead to both overestimation and underestimation of forecast skill, depending on the nature of the noise process.
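A minimal Python sketch of the relationship the abstract states, with made-up numbers: the cross-entropy score minus the divergence score equals the entropy of the observation distribution, which vanishes when the observation is perfect.

```python
import numpy as np

def divergence_score(obs, forecast):
    """KL(obs || forecast): skill relative to the (possibly uncertain)
    observation distribution."""
    mask = obs > 0
    return np.sum(obs[mask] * np.log(obs[mask] / forecast[mask]))

def cross_entropy_score(obs, forecast):
    """-sum obs * log forecast: estimated information loss w.r.t. truth."""
    mask = obs > 0
    return -np.sum(obs[mask] * np.log(forecast[mask]))

forecast = np.array([0.6, 0.3, 0.1])
uncertain_obs = np.array([0.8, 0.2, 0.0])  # observation with uncertainty
perfect_obs = np.array([1.0, 0.0, 0.0])    # certain observation

for obs in (uncertain_obs, perfect_obs):
    ds = divergence_score(obs, forecast)
    ce = cross_entropy_score(obs, forecast)
    h_obs = -np.sum(obs[obs > 0] * np.log(obs[obs > 0]))
    print(f"CE - DS = {ce - ds:.4f}, H(obs) = {h_obs:.4f}")
# The gap CE - DS equals the observational uncertainty H(obs),
# and it vanishes for the perfect observation.
```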


2013 ◽  
Vol 427-429 ◽  
pp. 1537-1543 ◽  
Author(s):  
Ya Fen Wang ◽  
Feng Zhen Zhang ◽  
Shan Jian Liu ◽  
Meng Huang

In this paper, we study an information-theoretic approach to image similarity measurement for content-based image retrieval. In this scheme, similarities are measured by the amount of information the images contain about one another, that is, their mutual information (MI). The approach is based on the premise that two similar images should have high mutual information, or equivalently, that the querying image should convey much information about those similar to it. The method first generates a set of statistically representative visual patterns and uses the distributions of these patterns as image content descriptors. To measure the similarity of two images, we develop a method to compute the mutual information between their content descriptors; two images with larger descriptor mutual information are regarded as more similar. We present experimental results demonstrating that mutual information is a more effective image similarity measure than those previously used in the literature, such as the Kullback-Leibler divergence and L2 norms.
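The paper's descriptors are distributions over learned visual patterns; since those are not reproduced here, the Python sketch below computes mutual information between two grayscale images from a joint intensity histogram instead, a standard stand-in, using synthetic images.

```python
import numpy as np

def mutual_information(img1, img2, bins=32):
    """MI between two equal-size grayscale images from their joint
    intensity histogram: I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def h(p):
        nz = p[p > 0]
        return -np.sum(nz * np.log2(nz))

    return h(px) + h(py) - h(pxy.ravel())

rng = np.random.default_rng(0)
base = rng.random((64, 64))
noisy = np.clip(base + 0.1 * rng.standard_normal((64, 64)), 0, 1)
unrelated = rng.random((64, 64))

print(mutual_information(base, noisy))      # high: images share content
print(mutual_information(base, unrelated))  # near zero: independent images
```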

