A Jackknife and Voting Classifier Approach to Feature Selection and Classification

With technological advances now allowing measurement of thousands of genes, proteins and metabolites, researchers are using this information to develop diagnostic and prognostic tests and to discern the biological pathways underlying diseases. Often, an investigator's objective is to develop a classification rule that predicts group membership of unknown samples from a small set of features and that could ultimately be used in a clinical setting. While common classification methods such as random forest and support vector machines are effective at separating groups, they do not directly translate into a clinically applicable classification rule based on a small number of features. We present a simple feature selection and classification method for biomarker detection that is intuitively understandable and can be directly extended for application to a clinical setting. We first use a jackknife procedure to identify important features and then, for classification, we use voting classifiers, which are simple and easy to implement. We compared our method to random forest and support vector machines using three benchmark cancer ‘omics datasets with different characteristics. We found that our jackknife procedure and voting classifier performed comparably to these two methods in terms of accuracy. Further, the jackknife procedure yielded stable feature sets. Voting classifiers in combination with a robust feature selection method such as our jackknife procedure offer an effective, simple and intuitive approach to feature selection and classification with a clear extension to clinical applications.
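
The abstract does not spell out the procedure, so the following is only a minimal sketch of one way a jackknife feature filter and an unweighted voting classifier could fit together. The choices here are assumptions, not the authors' stated method: features are ranked by an absolute Welch t-statistic, a feature is kept only if it ranks in the top `top_k` in every leave-one-out resample, and each kept feature votes for the class whose training mean is closer. All function names are hypothetical, and labels are assumed to be 0 or 1.

```python
from math import sqrt
from statistics import mean, variance


def t_stat(a, b):
    """Absolute Welch t-statistic between two lists of values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(X, y, top_k):
    """Return the set of indices of the top_k features by |t|."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((t_stat(a, b), j))
    scores.sort(reverse=True)
    return {j for _, j in scores[:top_k]}


def jackknife_select(X, y, top_k):
    """Keep features ranked in the top_k in every leave-one-out resample.

    Assumes each class still has at least two samples after one is removed.
    """
    selected = None
    for i in range(len(X)):
        X_jk = X[:i] + X[i + 1:]  # drop sample i
        y_jk = y[:i] + y[i + 1:]
        top = rank_features(X_jk, y_jk, top_k)
        selected = top if selected is None else selected & top
    return sorted(selected)


def fit_voter(X, y, features):
    """Store the (class 0 mean, class 1 mean) pair for each selected feature."""
    means = {}
    for j in features:
        m0 = mean(row[j] for row, label in zip(X, y) if label == 0)
        m1 = mean(row[j] for row, label in zip(X, y) if label == 1)
        means[j] = (m0, m1)
    return means


def predict(means, x):
    """Each feature votes for the nearer class mean; ties go to class 0."""
    votes_for_1 = sum(
        1 for j, (m0, m1) in means.items() if abs(x[j] - m1) < abs(x[j] - m0)
    )
    return 1 if votes_for_1 * 2 > len(means) else 0
```

Requiring a feature to survive every resample is a deliberately strict stability criterion; a looser variant could keep features that appear in a chosen fraction of resamples instead.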

FS_SFS: a novel feature selection method for support vector machines

2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004. DOI: 10.1109/icassp.2004.1327231

Author(s): Yi Liu, Y.F. Zheng

Keyword(s): Feature Selection, Support Vector Machines, Feature Selection Method, Selection Method, Support Vector, Vector Machines

An innovative feature selection method for support vector machines and its test on the estimation of the credit risk of default

Review of Financial Economics, 2018, Vol 37 (3), pp. 404-427. DOI: 10.1002/rfe.1049. Cited By ~ 1

Author(s): Eduard Sariev, Guido Germano

Keyword(s): Feature Selection, Support Vector Machines, Credit Risk, Feature Selection Method, Selection Method, Support Vector, Vector Machines

A Texture-based Classification Method for Proteins in Two-Dimensional Electrophoresis Gel Images - A Feature Selection Method using Support Vector Machines and Genetic Algorithms

Proceedings of the International Conference on Computer Vision Theory and Applications, 2013. DOI: 10.5220/0004208704010404

Keyword(s): Genetic Algorithms, Feature Selection, Support Vector Machines, Feature Selection Method, Selection Method, Support Vector, Two Dimensional, Two Dimensional Electrophoresis, Vector Machines, Dimensional Electrophoresis

Feature Selection Method Based on Artificial Bee Colony Algorithm and Support Vector Machines for Medical Datasets Classification

The Scientific World Journal, 2013, Vol 2013, pp. 1-10. DOI: 10.1155/2013/419187. Cited By ~ 44

Author(s): Mustafa Serter Uzer, Nihat Yilmaz, Onur Inan

Keyword(s): Feature Selection, Support Vector Machines, Artificial Bee Colony, Hybrid Approach, Feature Selection Method, Support Vector, Svm Classifier, Bee Colony, Vector Machines

This paper offers a hybrid approach that uses the artificial bee colony (ABC) algorithm for feature selection and support vector machines (SVM) for classification. Its purpose is to test how eliminating unimportant and obsolete features from the datasets affects the success of classification with the SVM classifier. The approach is developed for the diagnosis of liver diseases and diabetes, which are commonly observed and reduce quality of life. For the diagnosis of these diseases, the hepatitis, liver disorders and diabetes datasets from the UCI database were used, and the proposed system reached classification accuracies of 94.92%, 74.81%, and 79.29%, respectively. These accuracies were obtained using 10-fold cross-validation. The results show that the method performs well compared with other reported results and seems very promising for pattern recognition applications.
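
In a wrapper-style hybrid like the one described, a search algorithm proposes feature subsets and each subset is scored by the cross-validated accuracy of a classifier trained on it. The sketch below illustrates only that scoring step, under loud assumptions: a nearest-centroid classifier stands in for the SVM to keep the example dependency-free, the ABC search itself is not shown, folds are contiguous (so the data is assumed to be shuffled already), and all function names are hypothetical.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def subset_accuracy(X, y, mask, k=10):
    """Mean k-fold accuracy using only the features where mask[j] is truthy."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    Xs = [[row[j] for j in cols] for row in X]
    correct = 0
    for test_idx in k_fold_indices(len(Xs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(Xs)) if i not in held_out]
        # Stand-in classifier (assumption): nearest class centroid, not an SVM.
        classes = {y[i] for i in train}
        cents = {
            c: centroid([Xs[i] for i in train if y[i] == c]) for c in classes
        }
        for i in test_idx:
            pred = min(cents, key=lambda c: sq_dist(Xs[i], cents[c]))
            correct += pred == y[i]
    return correct / len(Xs)
```

A search procedure would call `subset_accuracy` as its fitness function, passing a different binary `mask` for each candidate subset and keeping the masks that score best.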
