Statistical estimation of conditional Shannon entropy

2019 ◽  
Vol 23 ◽  
pp. 350-386 ◽  
Author(s):  
Alexander Bulinski ◽  
Alexey Kozhevin

New estimates of the conditional Shannon entropy are introduced in the framework of a model describing a discrete response variable that depends on a vector of d factors having a density w.r.t. the Lebesgue measure in ℝ^d. Namely, the mixed-pair model (X, Y) is considered, where X and Y take values in ℝ^d and in an arbitrary finite set, respectively. Such models include, for instance, the famous logistic regression. In contrast to the well-known Kozachenko–Leonenko estimates of unconditional entropy, the proposed estimates are constructed by means of certain spatial order statistics (or k-nearest neighbor statistics, where k = k_n depends on the number of observations n) and a random number of i.i.d. observations contained in balls of specified random radii. The asymptotic unbiasedness and L²-consistency of the new estimates are established under simple conditions. The obtained results can be applied to the feature selection problem, which is important, e.g., for medical and biological investigations.
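
For orientation, the Kozachenko–Leonenko estimate mentioned in the abstract can be sketched in a few lines of Python. This is the classical k-nearest-neighbor estimate of unconditional differential entropy, not the conditional estimator the authors propose. The function names `digamma_int` and `kl_entropy`, the brute-force neighbor search, and the choice of the Euclidean norm are assumptions made purely for illustration.

```python
import math


def digamma_int(n: int) -> float:
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j<n} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))


def kl_entropy(points, k=1):
    """Kozachenko-Leonenko k-NN estimate of differential entropy, in nats.

    points: list of equal-length tuples of floats (n observations in R^d).
    Assumes no duplicate points (a zero distance would make the log undefined).
    """
    n = len(points)
    d = len(points[0])
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Log-volume of the unit Euclidean ball in R^d.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)

    # Sum of log distances from each point to its k-th nearest neighbor
    # (brute force, O(n^2 log n); illustrative only).
    log_radius_sum = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_radius_sum += math.log(dists[k - 1])

    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_radius_sum / n
```

Calling `kl_entropy(sample, k=3)` on a list of one-element tuples drawn from a standard normal distribution should give a value near the true entropy, 0.5 · log(2πe) ≈ 1.42 nats, with the error shrinking as the sample grows.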

2021 ◽  
Vol 12 (2) ◽  
pp. 85-99
Author(s):  
Nassima Dif ◽  
Zakaria Elberrichi

Hybrid metaheuristics have received a lot of attention lately for solving combinatorial optimization problems. The purpose of hybridization is to make metaheuristics cooperate in order to find better solutions. Most previously proposed works focused on static hybridization. The objective of this work is to propose a novel dynamic hybridization method (GPBD) that generates the most suitable sequential hybridization of the GA, PSO, BAT, and DE metaheuristics for each problem. The authors test this approach on the feature selection problem using a wrapper strategy, performed on face image recognition datasets with the k-nearest neighbor (KNN) learning algorithm. The comparative study of the individual metaheuristics and their hybridization GPBD shows that the proposed approach achieved the best results. It was also competitive with other filter approaches proposed in the literature, achieving an accuracy of 100% on the Orl10P, Pix10P, and PIE10P datasets.


2018 ◽  
Vol 9 (2) ◽  
pp. 48-71 ◽  
Author(s):  
Khadidja Belattar ◽  
Sihem Mostefai ◽  
Amer Draa

Feature selection is an important pre-processing technique in the pattern recognition domain. This article proposes a hybridization of a Genetic Algorithm (GA) and Linear Discriminant Analysis (LDA) for solving the feature selection problem in Content-Based Image Retrieval (CBIR) applied to dermatological images. In the first step, the input image is preprocessed and segmented, and color and texture features characterizing healthy skin and the segmented skin lesion are derived. At this stage, a binary GA is used to evolve chromosome subsets whose fitness is evaluated by a Logistic Regression classifier. The optimal identified features are then fed to LDA for a CBIR system based on K-Nearest Neighbor classification. To assess the proposed approach, the authors opted for a K-fold cross-validation method on a database of 1097 images of melanomas and other skin lesions. As a result, the authors obtained a reduced number of features and an improved CBDIR system compared to PCA, LDA and ICA methods.


2022 ◽  
Vol 13 (1) ◽  
pp. 0-0

This research presents an approach to the feature selection problem for sentiment classification using an ensemble-based classifier. It is based on a hybrid of the minimum redundancy and maximum relevance (mRMR) technique and the Forest Optimization Algorithm (FOA), i.e. mRMR-FOA feature selection. Before applying FOA to sentiment analysis, it was used as a feature selection technique on 10 different classification datasets publicly available in the UCI machine learning repository. The classifiers k-Nearest Neighbor (k-NN), Support Vector Machine (SVM) and Naïve Bayes (NB) were used in the ensemble-based algorithm on the available datasets. mRMR-FOA is applied to Blitzer's dataset (customer reviews from an electronic products survey) to select the significant features. Sentiment classification was observed to improve by 12 to 18%. The results are further enhanced by the ensemble of k-NN, NB and SVM, which reaches an accuracy of 88.47% on the sentiment classification task.


2007 ◽  
Vol 2 (3) ◽  
pp. 299-308 ◽  
Author(s):  
Magdalene Marinaki ◽  
Yannis Marinakis ◽  
Michael Doumpos ◽  
Nikolaos Matsatsinis ◽  
Constantin Zopounidis

2015 ◽  
Vol 83 ◽  
pp. 81-91 ◽  
Author(s):  
Aiguo Wang ◽  
Ning An ◽  
Guilin Chen ◽  
Lian Li ◽  
Gil Alterovitz
