Statistical estimation of conditional Shannon entropy

The new estimates of the conditional Shannon entropy are introduced in the framework of a model describing a discrete response variable that depends on a vector of d factors having a density with respect to Lebesgue measure on ℝ^d. Namely, the mixed-pair model (X, Y) is considered, where X and Y take values in ℝ^d and in an arbitrary finite set, respectively. Such models include, for instance, the widely used logistic regression. In contrast to the well-known Kozachenko–Leonenko estimates of unconditional entropy, the proposed estimates are constructed by means of certain spatial order statistics (or k-nearest neighbor statistics, where k = k_n depends on the number of observations n) and a random number of i.i.d. observations contained in balls of specified random radii. The asymptotic unbiasedness and L²-consistency of the new estimates are established under simple conditions. The obtained results can be applied to the feature selection problem, which is important, e.g., in medical and biological research.
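
The two ingredients named above, a k-nearest-neighbor radius and a random count of observations inside a ball of that radius, can be illustrated with a small sketch. It is NOT the estimator of this paper. It assumes a Kraskov–Stögbauer–Grassberger-style mixed-pair construction: H(Y | X) = H(Y) − I(X; Y), with a plug-in H(Y) from label frequencies and a k-NN estimate of I(X; Y) that uses Euclidean distance. The function names `conditional_entropy_knn` and `digamma_int` are hypothetical, and clipping I(X; Y) to [0, H(Y)] is a design choice of this sketch.

```python
import math
from collections import Counter

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def conditional_entropy_knn(xs, ys, k=3):
    """Hypothetical sketch: k-NN estimate of H(Y | X) in nats.

    xs: points in R^d (tuples of floats); ys: discrete labels.
    H(Y|X) = H(Y) - I(X;Y), using a plug-in H(Y) and a KSG-style mixed-pair
    estimate of I(X;Y). This is not the estimator proposed in the paper.
    """
    n = len(xs)
    label_counts = Counter(ys)
    if min(label_counts.values()) <= k:
        raise ValueError("every label needs more than k observations")

    # Plug-in entropy of the discrete response from empirical frequencies.
    h_y = -sum((c / n) * math.log(c / n) for c in label_counts.values())

    psi_m_sum = 0.0
    psi_nx_sum = 0.0
    for i in range(n):
        dists = [math.dist(xs[i], xs[j]) for j in range(n)]
        # Random radius: distance to the k-th nearest neighbor with the same label.
        same = sorted(dists[j] for j in range(n) if j != i and ys[j] == ys[i])
        radius = same[k - 1]
        # Random count: observations of any label inside the closed ball.
        m = sum(1 for j in range(n) if j != i and dists[j] <= radius)
        psi_m_sum += digamma_int(m)
        psi_nx_sum += digamma_int(label_counts[ys[i]])

    mi = digamma_int(n) + digamma_int(k) - (psi_nx_sum + psi_m_sum) / n
    # Clip I(X;Y) to [0, H(Y)] so that the returned H(Y|X) lies in [0, H(Y)].
    mi = min(max(mi, 0.0), h_y)
    return h_y - mi
```

In this sketch, two well-separated classes give a value near 0, because X determines Y. Labels interleaved along the same axis give a value near ln 2, because X carries little information about Y. The quadratic-time neighbor search is kept for clarity only.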

A Novel Dynamic Hybridization Method for Best Feature Selection

International Journal of Applied Metaheuristic Computing, Vol 12 (2), pp. 85-99, 2021, DOI 10.4018/ijamc.2021040106

Author(s): Nassima Dif, Zakaria Elberrichi

Keyword(s): Feature Selection, Nearest Neighbor, Optimization Problems, Learning Algorithm, Accuracy Score, Hybridization Method, K Nearest Neighbor, Feature Selection Problem, Combinatorial Optimization Problems, The Comparative Study

Hybrid metaheuristics have recently received much attention for solving combinatorial optimization problems. The purpose of hybridization is to make metaheuristics cooperate in order to obtain better solutions. Most previous works have focused on static hybridization. The objective of this work is to propose a novel dynamic hybridization method (GPBD) that generates, for each problem, the most suitable sequential hybridization of the GA, PSO, BAT, and DE metaheuristics. The authors test this approach on the best feature selection problem using a wrapper strategy, on face image recognition datasets, with the k-nearest neighbor (KNN) learning algorithm. A comparative study of the individual metaheuristics and their hybridization GPBD shows that the proposed approach achieved the best results. It was also competitive with filter approaches proposed in the literature, and it achieved a perfect accuracy score of 100% on the Orl10P, Pix10P, and PIE10P datasets.
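
In a wrapper strategy, each candidate solution of the metaheuristic (GA, PSO, BAT, or DE) is a binary feature mask, and its fitness is the accuracy of the learning algorithm trained on the selected features. The following is a minimal sketch of such a KNN wrapper fitness. Leave-one-out evaluation, Euclidean distance, majority voting, the choice of k, and the name `knn_loo_accuracy` are all assumptions; the abstract does not specify how GPBD evaluates accuracy.

```python
import math
from collections import Counter


def knn_loo_accuracy(xs, ys, mask, k=1):
    """Hypothetical wrapper fitness: leave-one-out k-NN accuracy on masked features.

    xs: samples (sequences of floats); ys: class labels;
    mask: binary list, feature f is used when mask[f] == 1.
    """
    selected = [f for f, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty feature subset cannot classify anything
    proj = [[x[f] for f in selected] for x in xs]
    correct = 0
    for i in range(len(proj)):
        # Distances from sample i to every other sample in the reduced space.
        neighbors = sorted(
            ((math.dist(proj[i], proj[j]), ys[j]) for j in range(len(proj)) if j != i),
            key=lambda pair: pair[0],
        )
        # Majority vote among the k nearest neighbors.
        votes = Counter(label for _, label in neighbors[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == ys[i]
    return correct / len(proj)
```

A metaheuristic would call this function once per candidate mask and keep the masks with the highest scores. Many wrapper formulations also subtract a small penalty proportional to the number of selected features, which is omitted here.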

A Hybrid GA-LDA Scheme for Feature Selection in Content-Based Image Retrieval

International Journal of Applied Metaheuristic Computing, Vol 9 (2), pp. 48-71, 2018, DOI 10.4018/ijamc.2018040103

Cited by ~3

Author(s): Khadidja Belattar, Sihem Mostefai, Amer Draa

Keyword(s): Feature Selection, Image Retrieval, Nearest Neighbor, Skin Lesions, Processing Technique, Input Image, Content Based Image Retrieval, K Nearest Neighbor, Feature Selection Problem, Linear Discriminant

Feature selection is an important pre-processing technique in pattern recognition. This article proposes a hybridization of a Genetic Algorithm (GA) and Linear Discriminant Analysis (LDA) to solve the feature selection problem in Content-Based Image Retrieval (CBIR) applied to dermatological images. In the first step, the input image is preprocessed and segmented, and color and texture features characterizing healthy skin and the segmented skin lesion are extracted. Next, a binary GA evolves chromosomes encoding feature subsets, and the fitness of each chromosome is evaluated by a Logistic Regression classifier. The identified optimal features are then fed to LDA in a CBIR system based on K-Nearest Neighbor classification. To assess the proposed approach, the authors used K-fold cross-validation on a database of 1097 images of melanomas and other skin lesions. As a result, the authors obtained a reduced number of features and an improved CBDIR system compared with the PCA, LDA, and ICA methods.
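
The binary GA step can be sketched as follows, with the fitness passed in as a function. In the article, that function would be the Logistic Regression evaluation of a feature subset; here it is left abstract. Tournament selection, one-point crossover, bit-flip mutation, a single elite individual, the default parameter values, and the name `binary_ga` are assumptions for illustration, not the authors' settings.

```python
import random


def binary_ga(fitness, n_bits, pop_size=20, generations=30,
              crossover_rate=0.9, mutation_rate=None, seed=0):
    """Hypothetical minimal binary GA over feature masks (maximizes fitness).

    Returns (best_mask, history), where history[g] is the best fitness
    after generation g.
    """
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament(scored):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(scored, 2)
        return (a if a[0] >= b[0] else b)[1]

    best = max(pop, key=fitness)
    history = []
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        new_pop = [list(best)]  # elitism: keep the best mask found so far
        while len(new_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate and n_bits > 1:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            # Bit-flip mutation: each gene toggles with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history
```

Because the elite mask is copied into every new population, the best fitness in `history` never decreases. Each 1 bit in the returned mask marks a feature that would be passed on to LDA.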

Product Review Based Customer Sentiment Analysis using an Ensemble of mRMR and Forest Optimization Algorithm (FOA)

International Journal of Applied Metaheuristic Computing, Vol 13 (1), 2022, DOI 10.4018/ijamc.2022010107

Keyword(s): Feature Selection, Sentiment Analysis, Optimization Algorithm, Nearest Neighbor, Hybrid Approach, Support Vector, K Nearest Neighbor, Feature Selection Technique, Feature Selection Problem

This research presents a feature selection approach for sentiment classification with an ensemble-based classifier. It uses a hybrid of the minimum redundancy maximum relevance (mRMR) technique and the Forest Optimization Algorithm (FOA), i.e., mRMR-FOA, for feature selection. Before being applied to sentiment analysis, FOA was used as a feature selection technique on 10 different classification datasets publicly available in the UCI Machine Learning Repository. On these datasets, classifiers such as k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Naïve Bayes (NB) were combined in the ensemble-based algorithm. mRMR-FOA then selects the significant features of Blitzer's dataset (a survey of customer reviews of electronic products). Sentiment classification was observed to improve by 12 to 18%. The results are further enhanced by the ensemble of k-NN, NB, and SVM, which reaches an accuracy of 88.47% on the sentiment classification task.
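
The mRMR part of mRMR-FOA ranks features by relevance to the class, measured as mutual information, minus their average redundancy with the features already chosen. The following is a minimal greedy sketch for discrete features using the difference criterion. The abstract does not state which mRMR variant, mutual information estimator, or discretization the authors used. The names `mutual_information` and `mrmr` are hypothetical, and the FOA search stage is not shown.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Plug-in mutual information (nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr(features, labels, n_select):
    """Hypothetical greedy mRMR: relevance minus mean redundancy with chosen features.

    features: list of discrete feature columns; returns indices in pick order.
    """
    relevance = [mutual_information(f, labels) for f in features]
    # Start with the single most relevant feature.
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < min(n_select, len(features)):
        candidates = [i for i in range(len(features)) if i not in selected]

        def score(i):
            redundancy = sum(mutual_information(features[i], features[j]) for j in selected)
            return relevance[i] - redundancy / len(selected)

        selected.append(max(candidates, key=score))
    return selected
```

Consider a label y = a + b, built from two independent binary features a and b, with an exact copy of a added as a third feature. Here the greedy criterion picks a and b and skips the redundant copy, even though the copy is as relevant as a on its own.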
