Statistical estimation of conditional Shannon entropy
The new estimates of the conditional Shannon entropy are introduced in the framework of the model describing a discrete response variable depending on a vector of d factors having a density w.r.t. the Lebesgue measure in ℝd. Namely, the mixed-pair model (X, Y ) is considered where X and Y take values in ℝd and an arbitrary finite set, respectively. Such models include, for instance, the famous logistic regression. In contrast to the well-known Kozachenko–Leonenko estimates of unconditional entropy the proposed estimates are constructed by means of the certain spacial order statistics (or k-nearest neighbor statistics where k = kn depends on amount of observations n) and a random number of i.i.d. observations contained in the balls of specified random radii. The asymptotic unbiasedness and L2-consistency of the new estimates are established under simple conditions. The obtained results can be applied to the feature selection problem which is important, e.g., for medical and biological investigations.