A comparative study of various feature selection techniques in high-dimensional data set to improve classification accuracy

Author(s):  
Kandarp P. Shroff ◽  
Hardik H. Maheta
2020 ◽  
Vol 13 (2) ◽  
pp. 223-238
Author(s):  
Abhishek Dixit ◽  
Ashish Mani ◽  
Rohit Bansal

Purpose: Feature selection is an important data pre-processing step, especially for high-dimensional data sets. A model trained on a high-dimensional data set performs poorly and yields low classification accuracy, so feature selection should be applied to the data set before training in order to improve performance and classification accuracy.

Design/methodology/approach: A novel optimization approach that hybridizes binary particle swarm optimization (BPSO) and differential evolution (DE) for fine-tuning an SVM classifier is presented. The implemented classifier is named DEPSOSVM.

Findings: The approach is evaluated on 20 UCI benchmark text classification data sets. The performance of the proposed technique is also evaluated on a UCI benchmark image data set of cancer images. The results show that the proposed DEPSOSVM technique achieves a significant improvement over other feature selection algorithms in the literature, as well as better classification accuracy.

Originality/value: The proposed approach differs from previous work, in which the DE/rand/1 mutation strategy is used, whereas in this study DE/rand/2 is used and the mutation strategy is updated with BPSO. The crossover approach also differs: a novel approach of comparing the best particle with a sigmoid function is used. The core contribution of this paper is to hybridize DE with BPSO combined with an SVM classifier (DEPSOSVM) to handle feature selection problems.
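The general idea can be illustrated with a minimal sketch: a binary PSO feature-selection wrapper around an SVM in which the velocity update is perturbed with a DE/rand/2-style step and a sigmoid transfer function maps velocities to bit probabilities. The function names, parameters, and hybridization details below are assumptions for illustration only and do not reproduce the authors' DEPSOSVM implementation.

```python
# Illustrative sketch only: binary PSO + DE/rand/2-style velocity perturbation
# wrapped around an SVM fitness. Not the authors' DEPSOSVM implementation.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def subset_accuracy(mask, X, y):
    """Cross-validated accuracy of an SVM trained on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(), X[:, mask.astype(bool)], y, cv=3).mean()

def depso_feature_selection(X, y, n_particles=20, n_iter=30, F=0.5, seed=None):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.integers(0, 2, size=(n_particles, d))      # binary feature masks
    vel = rng.uniform(-1.0, 1.0, size=(n_particles, d))  # real-valued velocities
    pbest = pos.copy()
    pbest_fit = np.array([subset_accuracy(p, X, y) for p in pos])
    g = pbest_fit.argmax()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            # DE/rand/2-style perturbation of the velocity (assumed hybridization step)
            r = rng.choice(n_particles, size=5, replace=False)
            vel[i] = (vel[r[0]]
                      + F * (vel[r[1]] - vel[r[2]])
                      + F * (vel[r[3]] - vel[r[4]])
                      + rng.random(d) * (pbest[i] - pos[i])
                      + rng.random(d) * (gbest - pos[i]))
            # Sigmoid transfer maps each velocity to a bit-selection probability (standard BPSO)
            prob = 1.0 / (1.0 + np.exp(-vel[i]))
            pos[i] = (rng.random(d) < prob).astype(int)
            f = subset_accuracy(pos[i], X, y)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i].copy(), f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i].copy(), f
    return gbest.astype(bool)
```

A hypothetical call would be `mask = depso_feature_selection(X_train, y_train)`, after which the final SVM is trained only on `X_train[:, mask]`.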


2017 ◽  
Vol 2017 ◽  
pp. 1-18 ◽  
Author(s):  
Andrea Bommert ◽  
Jörg Rahnenführer ◽  
Michel Lang

Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is not only important to find a model with high predictive accuracy, but also that this model uses only a few features and that the selection of these features is stable. This is because, in bioinformatics, the models are used not only for prediction but also for drawing biological conclusions, which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. Also, we find that for the assessment of stability it is most important that a measure contains a correction for chance or for large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.
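A minimal sketch of the stability assessment, assuming the feature selections from repeated resampling are encoded as binary indicator vectors: stability is then the mean pairwise Pearson correlation between those vectors. The encoding and helper name are illustrative assumptions, not the paper's code.

```python
# Minimal sketch: feature-selection stability as the mean pairwise Pearson
# correlation between binary selection indicator vectors from B resamples.
import numpy as np
from itertools import combinations

def pearson_stability(selections):
    """selections: (B, p) 0/1 matrix; row b marks the features chosen in resample b."""
    S = np.asarray(selections, dtype=float)
    scores = []
    for a, b in combinations(range(S.shape[0]), 2):
        if S[a].std() == 0 or S[b].std() == 0:   # constant vectors have no defined correlation
            continue
        scores.append(np.corrcoef(S[a], S[b])[0, 1])
    return float(np.mean(scores)) if scores else float("nan")

# Example: three resamples over six features
sel = np.array([[1, 1, 0, 0, 1, 0],
                [1, 1, 0, 0, 0, 0],
                [1, 0, 1, 0, 1, 0]])
print(pearson_stability(sel))
```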


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Wan-Yu Deng ◽  
Dan Liu ◽  
Ying-Ying Dong

Due to missing values, incomplete data sets are ubiquitous in multimodal settings, yet complete data is a prerequisite of most existing multimodal data fusion methods. For incomplete multimodal high-dimensional data, we propose a feature selection and classification method. Our method mainly focuses on extracting the most relevant features from the high-dimensional features and then improving the classification accuracy. The experimental results show that our method produces considerably better performance on incomplete multimodal data, such as the ADNI data set and the Office data set, compared to the case of complete data.


Author(s):  
Bhanu Chander

High-dimensional data analysis is one of the major challenges for researchers and engineers in the domains of deep learning (DL), machine learning (ML), and data mining. Feature selection (FS) provides an efficient way to address these difficulties by eliminating irrelevant and redundant data, which can reduce computation time, improve learning accuracy, and facilitate a better understanding of the learning model or the data. To eliminate an irrelevant feature, an FS criterion is required that can measure the relevance of each feature with respect to the output class/labels. Filter schemes employ variable ranking as the principal criterion for variable selection by ordering. Ranking schemes are widely used because of their simplicity, and good performance has been reported for practical applications. The goal of this chapter is to provide comprehensive information on FS approaches, their applications, and future research directions.
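As a concrete illustration of a filter-style ranking scheme, the sketch below scores every feature against the class labels independently and keeps the top-ranked ones. The use of mutual information (via scikit-learn) and the cutoff of ten features are assumptions chosen for the example, not choices made in the chapter.

```python
# Minimal sketch of a filter-style ranking scheme: score each feature against the
# class labels, order the features, and keep the top-k. The mutual-information
# criterion and k=10 are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
scores = mutual_info_classif(X, y, random_state=0)   # relevance of each feature to y
ranking = np.argsort(scores)[::-1]                   # best features first
top_k = ranking[:10]                                 # keep the ten highest-ranked features
X_reduced = X[:, top_k]
print(top_k, X_reduced.shape)
```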


2018 ◽  
Vol 7 (2.11) ◽  
pp. 27 ◽  
Author(s):  
Kahkashan Kouser ◽  
Amrita Priyam

One of the open problems of modern data mining is clustering high-dimensional data. For this, a new technique called GA-HDClustering is proposed in this paper, which works in two steps. First, a GA-based feature selection algorithm is designed to determine the optimal feature subset, consisting of the important features of the entire data set. Next, the K-means algorithm is applied using the optimal feature subset to find the clusters. For comparison, the traditional K-means algorithm is applied on the full-dimensional feature space, and the result of GA-HDClustering is compared with that of the traditional clustering algorithm. Different validity metrics, such as the sum of squared error (SSE), within-group average distance (WGAD), between-group distance (BGD), and the Davies-Bouldin index (DBI), are used for the comparison. GA-HDClustering uses a genetic algorithm to search for an effective feature subspace in a large feature space made up of all dimensions of the data set. Experiments performed on standard data sets revealed that GA-HDClustering is superior to the traditional clustering algorithm.
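The two-step idea can be sketched as follows: a small genetic algorithm searches for a feature subset whose K-means clustering minimizes the Davies-Bouldin index. The encoding, operators, and parameters below are assumptions for illustration and are not the GA-HDClustering implementation itself.

```python
# Illustrative sketch only: a tiny GA searching for a feature subset whose K-means
# clustering minimizes the Davies-Bouldin index. Not the GA-HDClustering code.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def dbi_fitness(mask, X, k):
    if mask.sum() < 1:
        return np.inf
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X[:, mask])
    return davies_bouldin_score(X[:, mask], labels)   # lower is better

def ga_hd_clustering(X, k=3, pop_size=20, n_gen=30, p_mut=0.05, seed=None):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pop = rng.random((pop_size, d)) < 0.5             # boolean feature masks
    for _ in range(n_gen):
        fit = np.array([dbi_fitness(ind, X, k) for ind in pop])
        order = np.argsort(fit)                       # minimization: best first
        parents = pop[order[: pop_size // 2]]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, d)                  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(d) < p_mut            # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    fit = np.array([dbi_fitness(ind, X, k) for ind in pop])
    return pop[fit.argmin()]                          # best feature subset found
```

In this sketch the Davies-Bouldin index serves as the GA fitness; the other validity metrics mentioned in the abstract (SSE, WGAD, BGD) would be computed afterwards to compare the clusterings.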


Author(s):  
VLADIMIR NIKULIN ◽  
TIAN-HSIANG HUANG ◽  
GEOFFREY J. MCLACHLAN

The method presented in this paper is novel as a natural combination of two mutually dependent steps. Feature selection is a key element (first step) in our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier, such as linear regression, a support vector machine, or a neural network. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation-type Wilcoxon-based criterion for all final submissions. The method presented in this paper was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
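A minimal sketch of this kind of pipeline, assuming a two-class problem with labels coded 0/1: features are ranked by the absolute Wilcoxon rank-sum statistic inside every leave-one-out fold, and a linear SVM (an assumed stand-in for the unspecified final classifier) is evaluated on the top-ranked features. The cutoff of 50 features is also an assumption.

```python
# Minimal sketch: Wilcoxon rank-sum feature ranking combined with leave-one-out
# evaluation of a classifier. The classifier and cutoff are illustrative assumptions.
import numpy as np
from scipy.stats import ranksums
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut

def wilcoxon_ranking(X, y):
    """Return feature indices ordered by decreasing |Wilcoxon rank-sum statistic|."""
    stats = np.array([abs(ranksums(X[y == 0, j], X[y == 1, j]).statistic)
                      for j in range(X.shape[1])])
    return np.argsort(stats)[::-1]

def loo_accuracy(X, y, n_features=50):
    correct = 0
    for train, test in LeaveOneOut().split(X):
        top = wilcoxon_ranking(X[train], y[train])[:n_features]   # rank inside the fold
        clf = SVC(kernel="linear").fit(X[train][:, top], y[train])
        correct += int(clf.predict(X[test][:, top])[0] == y[test][0])
    return correct / len(y)
```

Recomputing the ranking inside each fold keeps the feature selection honest with respect to the held-out sample, which matters for an LOO protocol like the one described in the abstract.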

