Jackknife approach to the estimation of mutual information

2018 ◽ Vol 115 (40) ◽ pp. 9956-9961
Author(s): Xianli Zeng, Yingcun Xia, Howell Tong

Quantifying the dependence between two random variables is a fundamental issue in data analysis, and thus many measures have been proposed. Recent studies have focused on the renowned mutual information (MI) [Reshef DN, et al. (2011) Science 334:1518–1524]. However, “Unfortunately, reliably estimating mutual information from finite continuous data remains a significant and unresolved problem” [Kinney JB, Atwal GS (2014) Proc Natl Acad Sci USA 111:3354–3359]. In this paper, we examine the kernel estimation of MI and show that the bandwidths involved should be equalized. We consider a jackknife version of the kernel estimate with equalized bandwidth and allow the bandwidth to vary over an interval. We estimate the MI by the largest value among these kernel estimates and establish the associated theoretical underpinnings.
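
As an illustration of the kind of estimator described above, the sketch below computes a leave-one-out (jackknife-style) kernel estimate of MI with the same bandwidth used for the joint and both marginal densities, and then keeps the largest value over a bandwidth grid. It is a minimal sketch assuming standardized data and a Gaussian kernel; the names loo_kernel_mi and jackknife_mi and the bandwidth grid are illustrative choices, not the authors' implementation.

    import numpy as np

    def loo_kernel_mi(x, y, h):
        """Leave-one-out (jackknife-style) kernel MI estimate with the SAME
        (equalized) bandwidth h for the joint and both marginal densities."""
        n = len(x)
        kx = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
        ky = np.exp(-0.5 * ((y[:, None] - y[None, :]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
        np.fill_diagonal(kx, 0.0)                # leave the i-th observation out
        np.fill_diagonal(ky, 0.0)
        f_xy = (kx * ky).sum(axis=1) / (n - 1)   # joint density at (x_i, y_i)
        f_x = kx.sum(axis=1) / (n - 1)           # marginal density at x_i
        f_y = ky.sum(axis=1) / (n - 1)           # marginal density at y_i
        return np.mean(np.log(f_xy / (f_x * f_y)))

    def jackknife_mi(x, y, bandwidths=np.geomspace(0.05, 1.0, 25)):
        """Evaluate the equalized-bandwidth estimate on a grid and keep the maximum."""
        return max(loo_kernel_mi(x, y, h) for h in bandwidths)

As a quick sanity check, for strongly dependent data such as y = x + noise the estimate should come out clearly positive, while for two independent samples of moderate size it should stay near zero.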

Author(s): Dafydd Evans

Mutual information quantifies the determinism that exists in a relationship between random variables, and thus plays an important role in exploratory data analysis. We investigate a class of non-parametric estimators for mutual information, based on the nearest-neighbour structure of observations in both the joint and marginal spaces. We demonstrate that, unless both marginal spaces are one-dimensional, a well-known estimator of this type can be computationally expensive under certain conditions, and we propose a computationally efficient alternative that has a time complexity of order O(N log N) as the number of observations N → ∞.
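
The well-known estimator of this type is usually attributed to Kraskov, Stögbauer and Grassberger (KSG). The sketch below is a minimal k-d-tree implementation of that baseline estimator, not the efficient algorithm proposed in the paper; the name ksg_mi and the default k = 3 are illustrative choices.

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.special import digamma

    def ksg_mi(x, y, k=3):
        """Nearest-neighbour (KSG-type) MI estimate for samples x: (N, dx), y: (N, dy)."""
        x = np.asarray(x, dtype=float).reshape(len(x), -1)
        y = np.asarray(y, dtype=float).reshape(len(y), -1)
        n = len(x)
        joint = np.hstack([x, y])
        # distance to the k-th nearest neighbour in the joint space (max-norm)
        dist, _ = cKDTree(joint).query(joint, k=k + 1, p=np.inf)
        radius = dist[:, -1]
        # neighbours strictly inside that radius in each marginal space
        nx = np.asarray(cKDTree(x).query_ball_point(x, radius - 1e-12, p=np.inf, return_length=True)) - 1
        ny = np.asarray(cKDTree(y).query_ball_point(y, radius - 1e-12, p=np.inf, return_length=True)) - 1
        return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

The two marginal ball-counting queries are what the abstract points to as the expensive step once a marginal space has dimension greater than one.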


Symmetry ◽ 2020 ◽ Vol 12 (9) ◽ pp. 1415
Author(s): Jesús E. García, Verónica A. González-López

In this paper, we show how the longest non-decreasing subsequence, identified in the graph of the paired marginal ranks of the observations, allows the construction of a statistic for an independence test on bivariate vectors. The test works for both discrete and continuous data. Since the present procedure does not require continuity of the variables, it extends the proposal introduced in “Independence tests for continuous random variables based on the longest increasing subsequence” (2014). We show the efficiency of the procedure in detecting dependence in real cases and through simulations.
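
A minimal sketch of the main ingredient is given below, assuming the statistic is built from the length of the longest non-decreasing subsequence of the y-ranks once the pairs are ordered by their x-ranks; the function names and the permutation-based calibration in the final comment are illustrative, not the authors' exact construction.

    from bisect import bisect_right
    import numpy as np
    from scipy.stats import rankdata

    def longest_nondecreasing_length(seq):
        """Length of the longest non-decreasing subsequence (patience sorting, O(n log n))."""
        tails = []
        for v in seq:
            i = bisect_right(tails, v)
            if i == len(tails):
                tails.append(v)
            else:
                tails[i] = v
        return len(tails)

    def ln_statistic(x, y):
        """Longest non-decreasing run of y-ranks after sorting the pairs by x-rank.
        Ranks are used throughout, so ties (discrete data) are allowed."""
        order = np.argsort(rankdata(x, method="ordinal"))
        return longest_nondecreasing_length(rankdata(y, method="average")[order])

    # Usage: calibrate by permutation -- compare ln_statistic(x, y) with the values
    # obtained after repeatedly shuffling y, and reject independence for extreme values.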


1976 ◽ Vol 8 (04) ◽ pp. 806-819
Author(s): B. W. Silverman

Families of exchangeably dissociated random variables are defined and discussed. These include families of the form g(Y_i, Y_j, …, Y_z) for some function g of m arguments and some sequence (Y_n) of i.i.d. random variables on any suitable space. A central limit theorem for exchangeably dissociated random variables is proved, and some remarks on the closeness of the normal approximation are made. The weak convergence of the empirical distribution process to a Gaussian process is proved. Some applications to data analysis are discussed.
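
For concreteness, the sketch below simulates the m = 2 case of the construction described above, building the family {g(Y_i, Y_j) : i < j} from an i.i.d. sequence and checking by Monte Carlo that the standardized sum looks approximately Gaussian; the choice g(a, b) = |a − b| and the sample sizes are arbitrary illustrative choices, not taken from the paper.

    import numpy as np
    from scipy.stats import kstest

    rng = np.random.default_rng(0)

    def dissociated_family(g, Y):
        """The m = 2 case of the construction: the family {g(Y_i, Y_j) : i < j}
        built from a single i.i.d. sequence Y."""
        i, j = np.triu_indices(len(Y), k=1)
        return g(Y[i], Y[j])

    # Monte-Carlo look at the limiting behaviour of the family's standardized sum.
    n, reps = 100, 1000
    sums = np.array([dissociated_family(lambda a, b: np.abs(a - b),
                                        rng.standard_normal(n)).sum()
                     for _ in range(reps)])
    z = (sums - sums.mean()) / sums.std()
    print(kstest(z, "norm"))   # a large p-value is consistent with a Gaussian limit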


2015 ◽ Vol 2015 ◽ pp. 1-12
Author(s): Guoping Zeng

There are various definitions of mutual information. Essentially, these definitions can be divided into two classes: (1) definitions with random variables and (2) definitions with ensembles. However, there are some mathematical flaws in these definitions. For instance, Class 1 definitions either neglect the probability spaces or assume that the two random variables share the same probability space. Class 2 definitions redefine the marginal probabilities from the joint probabilities; in fact, the marginal probabilities are given by the ensembles and should not be re-derived from the joint probabilities. Both Class 1 and Class 2 definitions assume that a joint distribution exists, yet they all ignore the important fact that the joint distribution, or joint probability measure, is not unique. In this paper, we first present a new unified definition of mutual information that covers the various existing definitions and fixes their mathematical flaws. Our idea is to define the joint distribution of two random variables by taking the marginal probabilities into consideration. Next, we establish some properties of the newly defined mutual information. We then propose a method to calculate mutual information in machine learning. Finally, we apply our newly defined mutual information to credit scoring.
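
As a small illustration of the machine-learning use mentioned above, the sketch below computes the textbook discrete MI of a categorical feature against a binary outcome from a joint probability table, with the marginals passed in separately (in the spirit of treating them as given rather than re-derived). The credit-scoring numbers and the function name are hypothetical, and this is the standard formula, not the unified definition constructed in the paper.

    import numpy as np

    def mutual_information(joint, px, py):
        """Discrete mutual information (in nats) from a joint probability table,
        with the marginal distributions px, py supplied as given."""
        joint, px, py = (np.asarray(a, dtype=float) for a in (joint, px, py))
        nz = joint > 0                                  # 0 log 0 = 0 by convention
        return float((joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])).sum())

    # Hypothetical credit-scoring example: income band X (3 levels) vs. default flag Y.
    joint = np.array([[0.38, 0.05],
                      [0.28, 0.08],
                      [0.09, 0.12]])
    px, py = joint.sum(axis=1), joint.sum(axis=0)       # marginals read off the data
    print(mutual_information(joint, px, py))            # larger value = more predictive feature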

