Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information

2016 ◽  
Vol 2016 ◽  
pp. 1-16 ◽  
Author(s):  
Atiyeh Mortazavi ◽  
Mohammad Hossein Moattar

High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, because of the high dimensionality of microarray data sets, the features are reduced using one of two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, the Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. Using Qualitative Mutual Information makes the selected features more stable, and this stability helps to deal with the problems of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. Results on eleven microarray data sets show that the proposed method improves both average classification accuracy and average stability compared to the other approaches.
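As a rough illustration of the second phase, the Shapley index of a feature can be read as its average marginal contribution over all orderings of the feature set. The sketch below computes it exactly for a tiny set. The names `shapley_values` and `toy_value` are hypothetical, and the toy value function is only a stand-in: it is not the QMI-based measure used in the paper.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley value of each player, obtained by averaging marginal
    contributions over every ordering (feasible only for small sets)."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal gain from adding p to the coalition built so far.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

def toy_value(coalition):
    """Hypothetical coalition value: f1 and f2 are redundant with each
    other, f3 carries independent information."""
    score = 0.0
    if "f1" in coalition or "f2" in coalition:
        score += 0.6
    if "f3" in coalition:
        score += 0.3
    return score

phi = shapley_values(["f1", "f2", "f3"], toy_value)
```

In this toy setting the two redundant features split their shared value, so each ends up with the same index as the independent feature. Exact enumeration grows factorially with the number of features, which is one reason a filter phase would precede this step on microarray data.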

2006 ◽  
Vol 2 ◽  
pp. 117693510600200 ◽  
Author(s):  
Jing Wang ◽  
Kim Anh Do ◽  
Sijin Wen ◽  
Spyros Tsavachidis ◽  
Timothy J. Mcdonnell ◽  
...  

Motivation: Individual microarray studies searching for prognostic biomarkers often have few samples and low statistical power; however, publicly accessible data sets make it possible to combine data across studies. Method: We present a novel approach for combining microarray data across institutions and platforms. We introduce a new algorithm, robust greedy feature selection (RGFS), to select predictive genes. Results: We combined two prostate cancer microarray data sets, confirmed the appropriateness of the approach with the Kolmogorov-Smirnov goodness-of-fit test, and built several predictive models. The best logistic regression model with stepwise forward selection used 7 genes and had a misclassification rate of 31%. Models that combined LDA with different feature selection algorithms had misclassification rates between 19% and 33%, and the sets of genes in the models varied substantially during cross-validation. When we combined RGFS with LDA, the best model used two genes and had a misclassification rate of 15%. Availability: Affymetrix U95Av2 array data are available at http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi. The cDNA microarray data are available through the Stanford Microarray Database (http://cmgm.stanford.edu/pbrown/). GeneLink software is freely available at http://bioinformatics.mdanderson.org/GeneLink/. DNA-Chip Analyzer software is publicly available at http://biosun1.harvard.edu/complab/dchip/.


2013 ◽  
Vol 11 (03) ◽  
pp. 1341006
Author(s):  
QIANG LOU ◽  
ZORAN OBRADOVIC

In order to more accurately predict an individual's health status, in clinical applications it is often important to perform analysis of high-dimensional gene expression data that varies with time. A major challenge in predicting from such temporal microarray data is that the number of biomarkers used as features is typically much larger than the number of labeled subjects. One way to address this challenge is to perform feature selection as a preprocessing step and then apply a classification method on selected features. However, traditional feature selection methods cannot handle multivariate temporal data without applying techniques that flatten temporal data into a single matrix in advance. In this study, a feature selection filter that can directly select informative features from temporal gene expression data is proposed. In our approach, we measure the distance between multivariate temporal data from two subjects. Based on this distance, we define the objective function of temporal margin based feature selection to maximize each subject's temporal margin in its own relevant subspace. The experimental results on synthetic and two real flu data sets provide evidence that our method outperforms the alternatives, which flatten the temporal data in advance.


Author(s):  
Srinivas Kolli et al.

Clustering is especially difficult in multi/high-dimensional data because a subset of features must be selected from all the features present in categorical data sources. Feature subset selection is an aggressive approach to reducing feature dimensionality in data mining and pattern identification. Its main aims are to select an optimal set of features and to decrease redundancy. To deal with redundant or irrelevant features when exploring high-dimensional sample data, this document describes a feature selection calculation based on data granules. We propose a Novel Granular Feature Multi-variant Clustering based Genetic Algorithm (NGFMCGA) model and evaluate its performance. The model consists of two main phases. In the first phase, a graph-theoretic grouping procedure divides the features into different clusters. In the second phase, the most strongly representative feature is selected from each cluster with respect to matching the subset of features. Because the features are selected from different clusters, they are largely independent, so the proposed clustering approach has a high probability of producing independent and useful features. Optimal feature subset selection improves the accuracy of clustering and feature classification. The proposed approach is applied to publicly available data sets and compared with traditional supervised evolutionary approaches, and it shows better accuracy with respect to optimal subset selection.


2005 ◽  
Vol 63 ◽  
pp. 325-343 ◽  
Author(s):  
D. Huang ◽  
Tommy W.S. Chow

2007 ◽  
Vol 19 (7) ◽  
pp. 1939-1961 ◽  
Author(s):  
Shay Cohen ◽  
Gideon Dror ◽  
Eytan Ruppin

We present and study the contribution-selection algorithm (CSA), a novel algorithm for feature selection. The algorithm is based on the multiperturbation Shapley analysis (MSA), a framework that relies on game theory to estimate usefulness. The algorithm iteratively estimates the usefulness of features and selects them accordingly, using either forward selection or backward elimination. It can optimize various performance measures over unseen data, such as accuracy, balanced error rate, and area under the receiver operating characteristic curve. Empirical comparison with several other existing feature selection methods shows that the backward elimination variant of CSA leads to the most accurate classification results on an array of data sets.
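The iterative estimate-then-select loop described above can be sketched roughly as follows, using the forward selection variant. This is a simplified illustration, not the authors' implementation: the function names, the `payoff` callback (standing in for a performance measure such as validation accuracy), the sampling budget `n_perms`, and the stopping `threshold` are all assumptions made for the example.

```python
import random

def estimate_contributions(candidates, selected, payoff, n_perms, rng):
    """Monte Carlo estimate of each candidate's average marginal
    contribution when added on top of the already selected features."""
    totals = {f: 0.0 for f in candidates}
    for _ in range(n_perms):
        order = list(candidates)
        rng.shuffle(order)  # one random ordering of the candidates
        current = set(selected)
        base = payoff(current)
        for f in order:
            current.add(f)
            new = payoff(current)
            totals[f] += new - base
            base = new
    return {f: t / n_perms for f, t in totals.items()}

def forward_contribution_selection(features, payoff, threshold=0.0,
                                   n_perms=50, seed=0):
    """Repeatedly add the candidate with the highest estimated
    contribution; stop when no estimate exceeds the threshold."""
    rng = random.Random(seed)
    selected, remaining = [], list(features)
    while remaining:
        contrib = estimate_contributions(remaining, selected, payoff,
                                         n_perms, rng)
        best = max(remaining, key=lambda f: contrib[f])
        if contrib[best] <= threshold:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical additive payoff, used only to exercise the loop.
weights = {"a": 0.5, "b": 0.2, "c": 0.0}
chosen = forward_contribution_selection(
    list(weights), lambda s: sum(weights[f] for f in s), threshold=0.05)
```

With a real classifier, `payoff` would train and score a model on the given feature subset, so the number of sampled orderings directly controls the cost of each selection round.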


2014 ◽  
Vol 24 (06) ◽  
pp. 1450021 ◽  
Author(s):  
RUDRASIS CHAKRABORTY ◽  
CHIN-TENG LIN ◽  
NIKHIL R. PAL

For many applications, to reduce the processing time and the cost of decision making, we need to reduce the number of sensors, where each sensor produces a set of features. This sensor selection problem is a generalized feature selection problem. Here, we first present a sensor (group-feature) selection scheme based on Multi-Layered Perceptron Networks. This scheme sometimes selects redundant groups of features. So, we propose a selection scheme which can control the level of redundancy between the selected groups. The idea is general and can be used with any learning scheme. We have demonstrated the effectiveness of our scheme on several data sets. In this context, we define different measures of sensor dependency (dependency between groups of features). We have also presented an alternative learning scheme which is more effective than our old scheme. The proposed scheme is also adapted to radial basis function (RBF) networks. The advantages of our scheme are threefold. It looks at all the groups together and hence can exploit nonlinear interaction between groups, if any. Our scheme can simultaneously select useful groups as well as learn the underlying system. The level of redundancy among groups can also be controlled.

