Joint-Conditional Entropy and Mutual Information Estimation Involving Three Random Variables and Asymptotic Normality

Author(s):  
Amadou Diadie Ba ◽  
Gane Samb Lo
Author(s):  
M. Vidyasagar

This chapter provides an introduction to some elementary aspects of information theory, including entropy in its various forms. Entropy refers to the level of uncertainty associated with a random variable (or more precisely, the probability distribution of the random variable). When there are two or more random variables, it is worthwhile to study the conditional entropy of one random variable with respect to another. The last concept is relative entropy, also known as the Kullback–Leibler divergence, which measures the “disparity” between two probability distributions. The chapter first considers convex and concave functions before discussing the properties of the entropy function, conditional entropy, uniqueness of the entropy function, and the Kullback–Leibler divergence.
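
As a concrete illustration of the quantities discussed in this chapter, the following sketch computes entropy, conditional entropy, and the Kullback–Leibler divergence for small discrete distributions; the joint distribution used is an invented example, not one taken from the text.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 log 0 is taken as 0
    return -np.sum(p * np.log2(p))

def conditional_entropy(p_xy):
    """H(Y|X) for a joint distribution given as a matrix p_xy[x, y]."""
    p_x = p_xy.sum(axis=1)                        # marginal distribution of X
    return entropy(p_xy.ravel()) - entropy(p_x)   # H(Y|X) = H(X,Y) - H(X)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits (assumes q > 0 wherever p > 0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Illustrative joint distribution of two binary random variables.
p_xy = np.array([[0.4, 0.2],
                 [0.1, 0.3]])
print("H(X,Y) =", entropy(p_xy.ravel()))
print("H(Y|X) =", conditional_entropy(p_xy))
print("D(P_X || uniform) =", kl_divergence(p_xy.sum(axis=1), [0.5, 0.5]))
```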


1973 ◽  
Vol 10 (4) ◽  
pp. 837-846 ◽  
Author(s):  
P. A. P. Moran

A central limit theorem is proved for the sum of random variables X_r, all having the same form of distribution, each of which depends on a parameter which is the number occurring in the r-th cell of a multinomial distribution with equal probabilities in N cells and a total n, where nN^{-1} tends to a non-zero constant. This result is used to prove the asymptotic normality of the distribution of the fractional volume of a large cube which is not covered by N interpenetrating spheres whose centres are at random, and for which NV^{-1} tends to a non-zero constant. The same theorem can be used to prove asymptotic normality for a large number of occupancy problems.
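
As a hedged numerical companion to this result, the sketch below simulates a classical occupancy statistic covered by the theorem: n balls are dropped into N equiprobable cells with n/N held at a fixed non-zero ratio, the number of empty cells (a sum of functions of the cell counts) is computed over many replications, and the skewness of the standardized statistic is inspected as a rough check of approximate normality. All numerical choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def empty_cell_counts(N, ratio, reps):
    """Number of empty cells when n = ratio*N balls fall into N equiprobable cells."""
    n = int(ratio * N)
    counts = rng.multinomial(n, np.full(N, 1.0 / N), size=reps)  # reps x N cell counts
    return (counts == 0).sum(axis=1)

# n/N is held at a fixed non-zero ratio while N grows.
for N in (50, 500, 5000):
    z = empty_cell_counts(N, ratio=2.0, reps=2000).astype(float)
    z = (z - z.mean()) / z.std()
    # The skewness of the standardized count should shrink as N grows.
    print(N, "skewness ~", round(float(np.mean(z**3)), 3))
```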


2009 ◽  
Vol 21 (3) ◽  
pp. 688-703 ◽  
Author(s):  
Vincent Q. Vu ◽  
Bin Yu ◽  
Robert E. Kass

Information estimates such as the direct method of Strong, Koberle, de Ruyter van Steveninck, and Bialek (1998) sidestep the difficult problem of estimating the joint distribution of response and stimulus by instead estimating the difference between the marginal and conditional entropies of the response. While this is an effective estimation strategy, it tempts the practitioner to ignore the role of the stimulus and the meaning of mutual information. We show here that as the number of trials increases indefinitely, the direct (or plug-in) estimate of the marginal entropy converges (with probability 1) to the entropy of the time-averaged conditional distribution of the response, and the direct estimate of the conditional entropy converges to the time-averaged entropy of the conditional distribution of the response. Under joint stationarity and ergodicity of the response and stimulus, the difference of these quantities converges to the mutual information. When the stimulus is deterministic or nonstationary, the direct estimate no longer estimates mutual information, which is then no longer meaningful, but it remains a measure of the variability of the response distribution across time.
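
A minimal sketch of the plug-in ("direct") estimate described above, assuming the response has already been discretized into symbols in a trials-by-time array; the function names and the toy data are illustrative and do not reproduce the authors' code.

```python
import numpy as np

def plugin_entropy(symbols):
    """Plug-in (empirical) entropy in bits of a 1-D array of discrete symbols."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def direct_information(responses):
    """Direct estimate: plug-in marginal entropy of the pooled response minus the
    time-averaged plug-in entropy of the across-trial (conditional) response
    distribution. `responses` has shape (trials, time) and holds discrete symbols."""
    h_marginal = plugin_entropy(responses.ravel())
    h_conditional = np.mean([plugin_entropy(responses[:, t])
                             for t in range(responses.shape[1])])
    return h_marginal - h_conditional

# Toy example: 50 trials of a binary response whose firing probability is locked
# to a repeated stimulus over 200 time bins.
rng = np.random.default_rng(1)
p_one = np.linspace(0.1, 0.9, 200)
responses = (rng.random((50, 200)) < p_one).astype(int)
print("direct information estimate (bits):", round(direct_information(responses), 3))
```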


1997 ◽  
Vol 07 (01) ◽  
pp. 97-105 ◽  
Author(s):  
Gustavo Deco ◽  
Christian Schittenkopf ◽  
Bernd Schürmann

We introduce an information-theory-based concept for the characterization of the information flow in chaotic systems in the framework of symbolic dynamics, for finite and infinitesimal measurement resolutions. The information flow characterizes the loss of information about the initial conditions, i.e. the decay of statistical correlations (nonlinear and non-Gaussian) between the entire past and a point p steps into the future, as a function of p. When the partition generating the symbolic dynamics is finite, the information loss is measured by the mutual information between the entire past and a point p steps into the future. When the partition used is a generator and only one step ahead is observed (p = 1), our definition includes the Kolmogorov–Sinai entropy concept. The profiles in p of the mutual information describe the short- and long-range forecasting possibilities for the given partition resolution. For chaos it is more relevant to study the information loss for infinitesimal partitions, which characterizes the intrinsic behavior of the dynamics on an extremely fine scale. Because the mutual information diverges for infinitesimal partitions, the "intrinsic" information flow is instead characterized by the conditional entropy, which generalizes the Kolmogorov–Sinai entropy to the case of observing the uncertainty more than one step into the future. The intrinsic information flow offers an instrument for characterizing deterministic chaos by the transmission of information from the past to the future.
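
A rough illustration of the finite-partition case, assuming the fully chaotic logistic map with the binary partition at x = 1/2 (a generator for this map); the mutual information between a short past block and the symbol p steps ahead is estimated by plug-in counts, so the figures are only indicative of the decaying information flow.

```python
import numpy as np
from collections import Counter

def logistic_symbols(n, x0=0.2, r=4.0):
    """Binary symbolic sequence of the logistic map under the partition at 1/2."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1 - x)
        out.append(int(x >= 0.5))
    return out

def mutual_information(past_len, p, symbols):
    """Plug-in mutual information (bits) between a block of `past_len` symbols
    and the single symbol p steps after the end of that block."""
    pairs = []
    for i in range(len(symbols) - past_len - p + 1):
        past = tuple(symbols[i:i + past_len])
        future = symbols[i + past_len + p - 1]
        pairs.append((past, future))
    n = len(pairs)
    joint = Counter(pairs)
    past_marg = Counter(a for a, _ in pairs)
    fut_marg = Counter(b for _, b in pairs)
    return sum(c / n * np.log2((c / n) /
               ((past_marg[a] / n) * (fut_marg[b] / n)))
               for (a, b), c in joint.items())

symbols = logistic_symbols(200000)
for p in (1, 2, 4, 8):
    print("p =", p, " I(past; future) ~", round(mutual_information(3, p, symbols), 3))
```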


2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Guoping Zeng

There are various definitions of mutual information. Essentially, these definitions can be divided into two classes: (1) definitions with random variables and (2) definitions with ensembles. However, there are some mathematical flaws in these definitions. For instance, Class 1 definitions either neglect the probability spaces or assume that the two random variables have the same probability space. Class 2 definitions redefine the marginal probabilities from the joint probabilities. In fact, the marginal probabilities are given by the ensembles and should not be redefined from the joint probabilities. Both Class 1 and Class 2 definitions assume that a joint distribution exists. Yet, they all ignore the important fact that the joint distribution, or joint probability measure, is not unique. In this paper, we first present a new unified definition of mutual information that covers the various existing definitions and fixes their mathematical flaws. Our idea is to define the joint distribution of two random variables by taking the marginal probabilities into consideration. Next, we establish some properties of the newly defined mutual information. We then propose a method to calculate mutual information in machine learning. Finally, we apply our newly defined mutual information to credit scoring.
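
As a hedged sketch of how mutual information might be computed in a machine-learning setting such as credit scoring, the code below estimates I(X; Y) between a binned feature and a binary default indicator from a contingency table of counts; the data and bin choices are invented for illustration and do not reproduce the paper's method.

```python
import numpy as np

def mutual_information_from_table(table):
    """I(X; Y) in bits from a contingency table of counts table[x, y]."""
    p_xy = table / table.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of the feature
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of the target
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask]))

# Hypothetical credit-scoring data: rows are income bands, columns are (good, bad).
counts = np.array([[400, 90],
                   [650, 60],
                   [800, 30]], dtype=float)
print("I(income band; default) =",
      round(float(mutual_information_from_table(counts)), 4), "bits")
```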


2015 ◽  
Vol 47 (04) ◽  
pp. 1175-1189 ◽  
Author(s):  
Raúl Gouet ◽  
F. Javier López ◽  
Gerardo Sanz

We prove strong convergence and asymptotic normality for the record and the weak record rate of observations of the form Y_n = X_n + T_n, n ≥ 1, where (X_n)_{n ∈ ℤ} is a stationary ergodic sequence of random variables and (T_n)_{n ≥ 1} is a stochastic trend process with stationary ergodic increments. The strong convergence result follows from the Dubins–Freedman law of large numbers and Birkhoff's ergodic theorem. For the asymptotic normality we rely on the approach of Ballerini and Resnick (1987), coupled with a moment bound for stationary sequences, which is used to deal with the random trend process. Examples of applications are provided. In particular, we obtain strong convergence and asymptotic normality for the number of ladder epochs in a random walk with stationary ergodic increments.
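
To make the setting concrete, the sketch below simulates Y_n = X_n + T_n with a stationary ergodic noise sequence and a trend with stationary (here i.i.d. positive) increments, counts the upper records up to n, and reports the record rate across replications; the specific processes are illustrative assumptions, not the paper's examples.

```python
import numpy as np

rng = np.random.default_rng(2)

def record_count(y):
    """Number of upper records in the sequence y."""
    running_max, records = -np.inf, 0
    for v in y:
        if v > running_max:
            records += 1
            running_max = v
    return records

def simulate_record_rates(n, reps):
    """Record rate N_n / n for Y_k = X_k + T_k, with stationary noise X and a
    trend T whose increments are stationary and ergodic (here i.i.d. exponential)."""
    rates = []
    for _ in range(reps):
        x = rng.standard_normal(n)              # stationary ergodic noise
        t = np.cumsum(rng.exponential(0.1, n))  # trend with stationary increments
        rates.append(record_count(x + t) / n)
    return np.array(rates)

rates = simulate_record_rates(n=5000, reps=200)
print("mean record rate:", round(float(rates.mean()), 4),
      " std:", round(float(rates.std()), 4))
```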

