Feature selection algorithms in classification problems: an experimental evaluation

Feature selection is a multi-objective problem with the two main conflicting objectives of minimising the number of features and maximising the classification performance. However, most existing feature selection algorithms are single objective and do not appropriately reflect the actual need. There are a small number of multi-objective feature selection algorithms, which are wrapper based and accordingly are computationally expensive and less general than filter algorithms. Evolutionary computation techniques are particularly suitable for multi-objective optimisation because they use a population of candidate solutions and are able to find multiple non-dominated solutions in a single run. However, the two well-known evolutionary multi-objective algorithms, non-dominated sorting based multi-objective genetic algorithm II (NSGAII) and strength Pareto evolutionary algorithm 2 (SPEA2) have not been applied to filter based feature selection. In this work, based on NSGAII and SPEA2, we develop two multi-objective, filter based feature selection frameworks. Four multi-objective feature selection methods are then developed by applying mutual information and entropy as two different filter evaluation criteria in each of the two proposed frameworks. The proposed multi-objective algorithms are examined and compared with a single objective method and three traditional methods (two filters and one wrapper) on eight benchmark datasets. A decision tree is employed to test the classification performance. Experimental results show that the proposed multi-objective algorithms can automatically evolve a set of non-dominated solutions that include a smaller number of features and achieve better classification performance than using all features. NSGAII and SPEA2 outperform the single objective algorithm, the two traditional filter algorithms and even the traditional wrapper algorithm in terms of both the number of features and the classification performance in most cases. 
NSGAII achieves similar performance to SPEA2 on datasets with a small number of features and slightly better results when the number of features is large. This work represents the first study of NSGAII and SPEA2 for filter-based feature selection in classification problems, with both algorithms providing field-leading classification performance.
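The idea of a set of non-dominated solutions can be illustrated with a minimal sketch. It is not the NSGAII or SPEA2 procedure from the paper; it only shows how candidates are filtered once each has been scored. The two objectives (number of features and a cost to minimise), the function names, and the sample values are assumptions made for illustration.

```python
def dominates(a, b):
    """Return True if solution a dominates b.

    Each solution is a tuple of objectives that are all minimised, here
    (number_of_features, cost). a dominates b when it is no worse in
    every objective and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(t, s) for t in solutions if t is not s)
    ]


# Hypothetical candidates: (number of selected features, cost to minimise).
candidates = [(3, 0.20), (5, 0.15), (5, 0.25), (8, 0.15), (2, 0.30)]
front = non_dominated(candidates)
print(front)
```

Here (5, 0.25) is dropped because (3, 0.20) is better on both objectives, and (8, 0.15) is dropped because (5, 0.15) reaches the same cost with fewer features. The remaining candidates are different trade-offs, none of which is strictly better than another.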


Feature Selection Algorithm Using Relative Odds for Data Mining Classification

Big Data Analytics for Sustainable Computing - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-9750-6.ch005 ◽

2020 ◽

pp. 81-106 ◽

Cited By ~ 3

Author(s):

Donald Douglas Atsa'am

Keyword(s):

Feature Selection ◽

Binary Classification ◽

Initial Step ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Classification Problems ◽

Odds Ratios ◽

Relative Odds ◽

Importance Ranking ◽

Selection Algorithms

A filter feature selection algorithm is developed and its performance tested. In the initial step, the algorithm dichotomizes the dataset then separately computes the association between each predictor and the class variable using relative odds (odds ratios). The value of the odds ratios becomes the importance ranking of the corresponding explanatory variable in determining the output. Logistic regression classification is deployed to test the performance of the new algorithm in comparison with three existing feature selection algorithms: the Fisher index, Pearson's correlation, and the varImp function. A number of experimental datasets are employed, and in most cases, the subsets selected by the new algorithm produced models with higher classification accuracy than the subsets suggested by the existing feature selection algorithms. Therefore, the proposed algorithm is a reliable alternative in filter feature selection for binary classification problems.
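The steps described in this abstract (dichotomize, compute an odds ratio per predictor, rank) can be sketched roughly as follows. This is not the author's implementation. The median split, the 0.5 continuity correction for empty cells, and ranking by the absolute log odds ratio (so that strong negative associations also rank high) are all assumptions, as are the function names and the toy data.

```python
import math
from statistics import median


def dichotomize(values):
    """Split a numeric column at its median (assumed rule): 1 if above, else 0."""
    m = median(values)
    return [1 if v > m else 0 for v in values]


def odds_ratio(x, y):
    """Odds ratio of binary predictor x against binary class y.

    Adds 0.5 to every cell of the 2x2 table (an assumed correction)
    so that an empty cell does not cause a division by zero.
    """
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def rank_features(columns, y):
    """Rank feature names by |log(odds ratio)|, strongest association first."""
    scores = {
        name: abs(math.log(odds_ratio(dichotomize(col), y)))
        for name, col in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): one informative column, one uninformative column.
y = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "noise": [1, 8, 2, 7, 3, 6, 4, 5],
    "strong": [1, 2, 3, 4, 5, 6, 7, 8],
}
print(rank_features(columns, y))
```

A subset would then be chosen by keeping the top-ranked features and checking the result with a classifier such as logistic regression, as the abstract describes.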


Feature selection algorithms: a survey and experimental evaluation

2002 IEEE International Conference on Data Mining, 2002. Proceedings. ◽

10.1109/icdm.2002.1183917 ◽

2003 ◽

Cited By ~ 193

Author(s):

L.C. Molina ◽

L. Belanche ◽

A. Nebot

Keyword(s):

Feature Selection ◽

Experimental Evaluation ◽

Selection Algorithms


Fusion Approaches of Feature Selection Algorithms for Classification Problems

2016 5th Brazilian Conference on Intelligent Systems (BRACIS) ◽

10.1109/bracis.2016.075 ◽

2016 ◽

Cited By ~ 2

Author(s):

Jhoseph Jesus ◽

Daniel Araujo ◽

Anne Canuto

Keyword(s):

Feature Selection ◽

Classification Problems ◽

Selection Algorithms


An Enhancement of Feature Selection Algorithm for EDM: A Review

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v8i5.661 ◽

2018 ◽

Vol 8 (5) ◽

pp. 29

Author(s):

Manpreet Kaur ◽

Chamkaur Singh

Keyword(s):

Feature Selection ◽

Educational Data Mining ◽

Problem Formulation ◽

Research Area ◽

Education Quality ◽

Educational Institutions ◽

Selection Algorithm ◽

Positive Role ◽

Data Set ◽

Selection Algorithms

Educational Data Mining (EDM) is an emerging research area that helps educational institutions improve the performance of their students. Feature Selection (FS) algorithms remove irrelevant data from the educational dataset and hence increase the performance of the classifiers used in EDM techniques. This paper presents an analysis of the performance of feature selection algorithms on a student data set. It describes the different problems defined in the problem formulation, which are left to be resolved in future work. Furthermore, the paper attempts to play a positive role in improving education quality and to guide new researchers in making academic interventions.


Empowering Simultaneous Feature and Instance Selection in Classification Problems through the Adaptation of Two Selection Algorithms

2010 Ninth International Conference on Machine Learning and Applications ◽

10.1109/icmla.2010.121 ◽

2010 ◽

Author(s):

Rafael Augusto Ferreira do Carmo ◽

Fabricio Gomes de Freitas ◽

Jerffeson Teixeira de Souza

Keyword(s):

Instance Selection ◽

Classification Problems ◽

Selection Algorithms


A lazy feature selection method for multi-label classification

Intelligent Data Analysis ◽

10.3233/ida-194878 ◽

2021 ◽

Vol 25 (1) ◽

pp. 21-34

Author(s):

Rafael B. Pereira ◽

Alexandre Plastino ◽

Bianca Zadrozny ◽

Luiz H.C. Merschmann

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Feature Selection Method ◽

Selection Method ◽

Video Classification ◽

Classification Problems ◽

Class Label ◽

New Feature ◽

Feature Selection Techniques ◽

Biomolecular Analysis

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to allow the identification of relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and specific for the multi-label context. Experimental results show that the proposed technique is competitive when compared to multi-label feature selection techniques currently used in the literature, and is clearly more scalable, in a scenario where there is an increasing amount of data.


A robust SVM-based approach with feature selection and outliers detection for classification problems

Expert Systems with Applications ◽

10.1016/j.eswa.2021.115017 ◽

2021 ◽

pp. 115017

Author(s):

Marta Baldomero-Naranjo ◽

Luisa I. Martínez-Merino ◽

Antonio M. Rodríguez-Chía

Keyword(s):

Feature Selection ◽

Classification Problems ◽

Outliers Detection


Comparison of Feature Selection Algorithms for Minimization of Target Specific FFQs

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378246 ◽

2020 ◽

Author(s):

Nina Rescic ◽

Tome Eftimov ◽

Barbara Korousic Seljak

Keyword(s):

Feature Selection ◽

Selection Algorithms


An Ensemble Feature Selection Approach to Identify Relevant Features from EEG Signals

Applied Sciences ◽

10.3390/app11156983 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6983

Author(s):

Maritza Mera-Gaona ◽

Diego M. López ◽

Rubiel Vargas-Canas

Keyword(s):

Feature Selection ◽

Sensitivity And Specificity ◽

Eeg Signals ◽

Specificity And Sensitivity ◽

Selection Approach ◽

Feature Selection Approach ◽

The Stability ◽

Selection Algorithms ◽

Epileptiform Events

Identifying relevant data to support the automatic analysis of electroencephalograms (EEG) has become a challenge. Although there are many proposals to support the diagnosis of neurological pathologies, the current challenge is to improve the reliability of the tools to classify or detect abnormalities. In this study, we used an ensemble feature selection approach to integrate the advantages of several feature selection algorithms to improve the identification of the characteristics with high power of differentiation in the classification of normal and abnormal EEG signals. Discrimination was evaluated using several classifiers, i.e., decision tree, logistic regression, random forest, and Support Vector Machine (SVM); furthermore, performance was assessed by accuracy, specificity, and sensitivity metrics. The evaluation results showed that Ensemble Feature Selection (EFS) is a helpful tool to select relevant features from the EEGs. Thus, the stability calculated for the EFS method proposed was almost perfect in most of the cases evaluated. Moreover, the assessed classifiers evidenced that the models improved in performance when trained with the EFS approach’s features. In addition, the classifier of epileptiform events built using the features selected by the EFS method achieved an accuracy, sensitivity, and specificity of 97.64%, 96.78%, and 97.95%, respectively; finally, the stability of the EFS method evidenced a reliable subset of relevant features. Moreover, the accuracy, sensitivity, and specificity of the EEG detector are equal to or greater than the values reported in the literature.
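The three metrics named in this abstract follow directly from the confusion counts of a binary classifier. The sketch below uses the standard definitions only. The function name and the label vectors are made up for illustration and are unrelated to the EEG data or the reported figures.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of positives detected
        "specificity": tn / (tn + fp),  # share of negatives correctly rejected
    }


# Hypothetical labels: 4 abnormal (1) and 6 normal (0) recordings.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when classes are imbalanced, since a classifier can reach high accuracy while missing most of the rarer class.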
