Revisiting Probability Distribution Assumptions for Information Theoretic Feature Selection

2020, Vol 34 (04), pp. 5908–5915
Author(s): Yuan Sun, Wei Wang, Michael Kirley, Xiaodong Li, Jeffrey Chan

Feature selection has been shown to be beneficial for many data mining and machine learning tasks, especially for big data analytics. Mutual Information (MI) is a well-known information-theoretic approach used to evaluate the relevance of feature subsets to class labels. However, estimating high-dimensional MI poses significant challenges. Consequently, a great deal of research has focused on using low-order MI approximations or on computing a lower bound on MI called Variational Information (VI). These methods often require certain assumptions about the probability distributions of the features, so that the distributions are realistic yet tractable to compute. In this paper, we reveal two sets of distribution assumptions underlying many MI and VI based methods: Feature Independence Distribution and Geometric Mean Distribution. We systematically analyze their strengths and weaknesses and propose a logical extension called Arithmetic Mean Distribution, which leads to an unbiased and normalised estimation of probability densities. We conduct detailed empirical studies across a suite of 29 real-world classification problems and demonstrate the improved prediction accuracy of our methods based on the identification of more informative features, thus providing support for our theoretical findings.
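As a minimal sketch of the idea behind low-order MI approximations, features can be ranked by their pairwise MI with the class label, the simplest first-order scheme. The toy data and feature names below are hypothetical, and this is not the paper's Arithmetic Mean Distribution estimator:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with counts substituted in
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi

# Hypothetical toy data: rank features by MI with the class label.
features = {
    "f1": [0, 0, 1, 1, 0, 1],   # tracks the label exactly
    "f2": [0, 1, 0, 1, 0, 1],   # only weakly related to the label
}
labels = [0, 0, 1, 1, 0, 1]
ranking = sorted(features,
                 key=lambda f: mutual_information(features[f], labels),
                 reverse=True)
# "f1" ranks above "f2"
```

Pairwise scoring like this ignores feature redundancy and interaction, which is exactly why higher-order MI estimates, and hence distribution assumptions, become necessary.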

2014, Vol 14 (11&12), pp. 996–1013
Author(s): Alexey E. Rastegin

The information-theoretic approach to Bell's theorem is developed using conditional $q$-entropies. The $q$-entropic measures share many properties with the standard Shannon entropy. In general, both the locality and noncontextuality notions are usually treated by means of so-called marginal scenarios. These hypotheses lead to the existence of a joint probability distribution, which marginalizes to all the particular ones. Assuming the existence of such a joint probability distribution, we derive a family of inequalities of Bell's type in terms of conditional $q$-entropies for all $q\geq1$. Quantum violations of the new inequalities are exemplified within the Clauser–Horne–Shimony–Holt (CHSH) and Klyachko–Can–Binicioğlu–Shumovsky (KCBS) scenarios. An extension to the case of the $n$-cycle scenario is briefly mentioned. The new inequalities with conditional $q$-entropies allow one to expand the class of probability distributions for which nonlocality or contextuality can be detected within the entropic formulation. The $q$-entropic inequalities can also be useful in analyzing cases with detection inefficiencies. Using two models of such a kind, we consider some potential advantages of the $q$-entropic formulation.
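As a sketch of the basic quantity involved (a minimal assumed implementation, not the paper's Bell-type inequalities themselves), the Tsallis $q$-entropy is $S_q = (1 - \sum_i p_i^q)/(q-1)$ and recovers the Shannon entropy in the limit $q \to 1$:

```python
from math import log

def tsallis_entropy(probs, q):
    """Tsallis q-entropy S_q = (1 - sum p^q) / (q - 1), in nats.

    Reduces to the Shannon entropy -sum p ln p as q -> 1.
    """
    if q == 1:
        return -sum(p * log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

uniform = [0.25] * 4
# q = 1 gives the Shannon entropy ln(4) for a uniform 4-outcome distribution;
# q = 2 gives 1 - sum p^2 = 1 - 4 * (1/16) = 0.75.
```

Conditional $q$-entropies, on which the inequalities are built, are formed from such terms applied to joint and marginal distributions.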


2016, Vol 16 (3&4), pp. 313–331
Author(s): Alexey E. Rastegin

We address an information-theoretic approach to noise and disturbance in quantum measurements. Properties of the corresponding probability distributions are characterized by means of both the Rényi and Tsallis entropies. Related information-theoretic measures of noise and disturbance are introduced. These definitions are based on the concept of conditional entropy. To motivate the introduced measures, some important properties of the conditional Rényi and Tsallis entropies are discussed. There exist several formulations of entropic uncertainty relations for a pair of observables. Trade-off relations for noise and disturbance are derived on the basis of known formulations of this kind.
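Entropic uncertainty relations for a pair of observables can be illustrated numerically. As a hedged example in the Shannon case only (the Maassen–Uffink bound, not the paper's Rényi/Tsallis trade-offs), complementary qubit measurements satisfy $H(Z) + H(X) \geq \ln 2$; the state parameter `t` below is arbitrary:

```python
from math import log, cos, sin, sqrt

def shannon(probs):
    """Shannon entropy in nats."""
    return -sum(p * log(p) for p in probs if p > 0)

# Qubit state |psi> = cos(t)|0> + sin(t)|1>, measured in the Z and X bases.
t = 0.3
pz = [cos(t) ** 2, sin(t) ** 2]                 # outcome probabilities in Z
amp_plus = (cos(t) + sin(t)) / sqrt(2)          # overlap <+|psi>
px = [amp_plus ** 2, 1 - amp_plus ** 2]         # outcome probabilities in X

# Maassen-Uffink bound for complementary qubit observables: H(Z) + H(X) >= ln 2
assert shannon(pz) + shannon(px) >= log(2) - 1e-12
```

The same style of numerical check extends to Rényi and Tsallis entropies by swapping in the corresponding entropy functional.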

