A Stable Instance Based Filter for Feature Selection in Small Sample Size Data Sets

Author(s): Afef Ben Brahim, Mohamed Limam
2012, Vol 108 (1), pp. 138-150
Author(s): Martin Macaš, Lenka Lhotská, Eduard Bakstein, Daniel Novák, Jiří Wild, ...

2013, Vol 25 (6), pp. 1548-1584
Author(s): Sascha Klement, Silke Anders, Thomas Martinetz

By minimizing the zero-norm of the separating hyperplane, the support feature machine (SFM) finds the smallest subspace (the least number of features) of a data set such that, within this subspace, two classes are linearly separable without error. In this way, the dimensionality of the data is reduced more efficiently than with support vector–based feature selection, which can be shown both theoretically and empirically. In this letter, we first provide a new formulation of the previously introduced concept of the SFM. With this new formulation, classification of unbalanced and nonseparable data is straightforward, which allows using the SFM for feature selection and classification in a large variety of different scenarios. To illustrate how the SFM can be used to identify both the smallest subset of discriminative features and the total number of informative features in biological data sets, we apply repetitive feature selection based on the SFM to a functional magnetic resonance imaging data set. We suggest that these capabilities qualify the SFM as a universal method for feature selection, especially for high-dimensional small-sample-size data sets that often occur in biological and medical applications.
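The zero-norm objective described above (count the nonzero weights of the separating hyperplane) is combinatorial, so practical methods replace it with a convex surrogate. The sketch below is not the authors' SFM implementation: it uses an L1 penalty on a hinge-loss linear classifier as a stand-in for the zero-norm, and the data, function names, and parameters are all illustrative assumptions. It only shows the core idea that features whose weights are driven exactly to zero are discarded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 20 samples, 10 features; only feature 0 separates the two classes.
n, d = 20, 10
y = np.repeat([1.0, -1.0], n // 2)
X = rng.normal(size=(n, d))
X[:, 0] = 2.0 * y + rng.normal(scale=0.1, size=n)  # the informative feature

def l1_hinge_select(X, y, lam=0.1, lr=0.01, iters=2000):
    """Minimize hinge loss + lam * ||w||_1 by proximal subgradient descent.

    The L1 term is a convex surrogate for the zero-norm minimized by the SFM;
    features whose weights shrink exactly to zero are dropped.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        margins = y * (X @ w)
        active = margins < 1.0                       # margin violators
        grad = -(y[active, None] * X[active]).sum(axis=0) / len(y)
        w -= lr * grad                               # subgradient step on hinge loss
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # L1 proximal step
    return w

w = l1_hinge_select(X, y)
selected = np.flatnonzero(np.abs(w) > 1e-8)
print("selected feature indices:", selected)
```

On this toy data the informative feature dominates the learned weight vector, while the soft-thresholding step keeps most noise weights at exactly zero, mimicking the subspace-shrinking behavior the abstract describes.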


2020
Author(s): Salem Alelyani

Abstract: In the medical field, distinguishing genes that are relevant to a specific disease, say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Medical datasets usually comprise immensely many dimensions with a considerably small sample size. Thus, for domain experts such as biologists, the task of identifying these genes has become very challenging, to say the least. Feature selection is a technique that aims to select these genes, or features in the machine-learning field, with respect to the disease. However, learning from a medical dataset to identify relevant features suffers from the curse of dimensionality. Because of the large number of features and the small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets, each of which suffers from high dimensionality and a relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select varying numbers of features. The results show that the bagging technique improves both selection stability and accuracy.
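The bagging idea above can be sketched in a few lines: run a base selector on many bootstrap resamples and aggregate its votes, so that the final subset depends less on any individual sample. This is a minimal illustration, not the paper's experiment: the standardized-mean-difference score, the synthetic "microarray-like" data, and all parameters are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "microarray-like" data: 60 samples, 200 features, features 0-4 informative.
n, d, k = 60, 200, 5
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))
X[y == 1, :k] += 2.0  # class 1 is shifted on the informative features

def top_k_by_score(X, y, k):
    """Base selector: top-k features by a simple standardized mean difference."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    score = np.abs(diff) / (X.std(axis=0) + 1e-12)
    return np.argsort(-score)[:k]

def bagged_select(X, y, k, n_bags=50):
    """Bagging: run the base selector on bootstrap samples, aggregate votes."""
    votes = np.zeros(X.shape[1])
    for _ in range(n_bags):
        idx = rng.integers(0, len(y), size=len(y))   # bootstrap resample
        yb = y[idx]
        if yb.min() == yb.max():                     # skip degenerate one-class bags
            continue
        votes[top_k_by_score(X[idx], yb, k)] += 1
    return np.argsort(-votes)[:k], votes

selected, votes = bagged_select(X, y, k)
print("bagged selection:", sorted(int(i) for i in selected))
```

A single run of the base selector on one resample can be swayed by a spurious feature, but because noise features rarely win votes across many bags, the aggregated selection concentrates on the truly informative features, which is the variance-reduction mechanism the abstract argues for.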


2016, Vol 143, pp. 127-142
Author(s): Kai Dong, Herbert Pang, Tiejun Tong, Marc G. Genton
