A Stable Instance Based Filter for Feature Selection in Small Sample Size Data Sets

Author(s): Afef Ben Brahim, Mohamed Limam
2012, Vol 108 (1), pp. 138-150
Author(s): Martin Macaš, Lenka Lhotská, Eduard Bakstein, Daniel Novák, Jiří Wild, ...

2013, Vol 25 (6), pp. 1548-1584
Author(s): Sascha Klement, Silke Anders, Thomas Martinetz

By minimizing the zero-norm of the separating hyperplane, the support feature machine (SFM) finds the smallest subspace (the least number of features) of a data set such that, within this subspace, two classes are linearly separable without error. In this way, the dimensionality of the data is reduced more efficiently than with support vector–based feature selection, which can be shown both theoretically and empirically. In this letter, we first provide a new formulation of the previously introduced concept of the SFM. With this new formulation, classification of unbalanced and nonseparable data is straightforward, which allows using the SFM for feature selection and classification in a large variety of different scenarios. To illustrate how the SFM can be used to identify both the smallest subset of discriminative features and the total number of informative features in biological data sets, we apply repetitive feature selection based on the SFM to a functional magnetic resonance imaging data set. We suggest that these capabilities qualify the SFM as a universal method for feature selection, especially for high-dimensional small-sample-size data sets that often occur in biological and medical applications.
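The zero-norm objective described above (count the nonzero weights of the separating hyperplane) is combinatorial, so practical methods replace it with a convex surrogate. The sketch below is not the authors' SFM implementation: it uses an L1 penalty on a hinge-loss linear classifier as a stand-in for the zero-norm, and the data, function names, and parameters are all illustrative assumptions. It only shows the core idea that features whose weights are driven exactly to zero are discarded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 20 samples, 10 features; only feature 0 separates the two classes.
n, d = 20, 10
y = np.repeat([1.0, -1.0], n // 2)
X = rng.normal(size=(n, d))
X[:, 0] = 2.0 * y + rng.normal(scale=0.1, size=n)  # the informative feature

def l1_hinge_select(X, y, lam=0.1, lr=0.01, iters=2000):
    """Minimize hinge loss + lam * ||w||_1 by proximal subgradient descent.

    The L1 term is a convex surrogate for the zero-norm minimized by the SFM;
    features whose weights shrink exactly to zero are dropped.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        margins = y * (X @ w)
        active = margins < 1.0                       # margin violators
        grad = -(y[active, None] * X[active]).sum(axis=0) / len(y)
        w -= lr * grad                               # subgradient step on hinge loss
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # L1 proximal step
    return w

w = l1_hinge_select(X, y)
selected = np.flatnonzero(np.abs(w) > 1e-8)
print("selected feature indices:", selected)
```

On this toy data the informative feature dominates the learned weight vector, while the soft-thresholding step keeps most noise weights at exactly zero, mimicking the subspace-shrinking behavior the abstract describes.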


2020
Author(s): Salem Alelyani

Abstract: In the medical field, distinguishing genes that are relevant to a specific disease, say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Medical datasets usually comprise immensely many dimensions with a considerably small sample size. Thus, for domain experts such as biologists, the task of identifying these genes has become very challenging, to say the least. Feature selection is a technique that aims to select these genes, or features in the machine-learning field, with respect to the disease. However, learning from a medical dataset to identify relevant features suffers from the curse of dimensionality. Because of the large number of features and the small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets, each of which suffers from high dimensionality and a relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select varying numbers of features. The results show that the bagging technique improves both selection stability and accuracy.
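The bagging idea above can be sketched in a few lines: run a base selector on many bootstrap resamples and aggregate its votes, so that the final subset depends less on any individual sample. This is a minimal illustration, not the paper's experiment: the standardized-mean-difference score, the synthetic "microarray-like" data, and all parameters are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "microarray-like" data: 60 samples, 200 features, features 0-4 informative.
n, d, k = 60, 200, 5
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))
X[y == 1, :k] += 2.0  # class 1 is shifted on the informative features

def top_k_by_score(X, y, k):
    """Base selector: top-k features by a simple standardized mean difference."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    score = np.abs(diff) / (X.std(axis=0) + 1e-12)
    return np.argsort(-score)[:k]

def bagged_select(X, y, k, n_bags=50):
    """Bagging: run the base selector on bootstrap samples, aggregate votes."""
    votes = np.zeros(X.shape[1])
    for _ in range(n_bags):
        idx = rng.integers(0, len(y), size=len(y))   # bootstrap resample
        yb = y[idx]
        if yb.min() == yb.max():                     # skip degenerate one-class bags
            continue
        votes[top_k_by_score(X[idx], yb, k)] += 1
    return np.argsort(-votes)[:k], votes

selected, votes = bagged_select(X, y, k)
print("bagged selection:", sorted(int(i) for i in selected))
```

A single run of the base selector on one resample can be swayed by a spurious feature, but because noise features rarely win votes across many bags, the aggregated selection concentrates on the truly informative features, which is the variance-reduction mechanism the abstract argues for.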


2016, Vol 143, pp. 127-142
Author(s): Kai Dong, Herbert Pang, Tiejun Tong, Marc G. Genton
