SVM-BT-RFE: An improved gene selection framework using Bayesian T-test embedded in support vector machine (recursive feature elimination) algorithm

2015 ◽  
Vol 1 (2) ◽  
pp. 86-96 ◽  
Author(s):  
Shruti Mishra ◽  
Debahuti Mishra
Author(s):  
JUANA CANUL-REICH ◽  
LAWRENCE O. HALL ◽  
DMITRY B. GOLDGOF ◽  
JOHN N. KORECKI ◽  
STEVEN ESCHRICH

Gene-expression microarray datasets often consist of a limited number of samples with a large number of gene-expression measurements, usually on the order of thousands. Dimensionality reduction is therefore critical prior to any classification task. In this work, the iterative feature perturbation method (IFP), an embedded gene selector, is introduced and applied to four microarray cancer datasets: colon cancer, leukemia, Moffitt colon cancer, and lung cancer. We compare the results obtained by IFP to those of support vector machine-recursive feature elimination (SVM-RFE) and of the t-test used as a feature filter, with a linear support vector machine as the base classifier. We also analyzed the intersection of the gene sets selected by the three methods across the four datasets. Additional experiments included an initial pre-selection of the top 200 genes based on their p values, after which IFP and SVM-RFE were applied to the reduced feature sets. These results showed up to 3.32% average performance improvement for IFP across the four datasets. A statistical analysis (using the Friedman/Holm test) of both scenarios showed that the highest accuracies came from the t-test used as a filter in the experiments without gene pre-selection. IFP and SVM-RFE had greater classification accuracy after gene pre-selection. The analysis showed that the t-test is a good gene selector for microarray data, and that IFP and SVM-RFE both improved when applied to a dataset first reduced by the t-test. The IFP approach resulted in comparable or superior average class accuracy when compared to SVM-RFE on three of the four datasets. The same or similar accuracies can be obtained with different sets of genes.
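The t-test filter and the top-k pre-selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes two classes labeled 0 and 1, uses Welch's t-statistic, and ranks genes by |t| as a stand-in for ranking by p value. The function names `welch_t` and `top_k_genes` and the toy matrix are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (lists of floats)."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0.0:
        return 0.0  # both groups constant: treat the gene as uninformative
    return (mean(a) - mean(b)) / denom


def top_k_genes(X, y, k):
    """Return the indices of the k genes with the largest |t|.

    X is a samples x genes matrix (list of rows); y holds labels 0 or 1.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(pos, neg)))
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


# Toy data (made up): gene 0 separates the classes, genes 1 and 2 do not.
X = [
    [5.0, 1.0, 0.2],
    [5.2, 0.9, 0.8],
    [4.9, 1.1, 0.5],
    [1.0, 1.0, 0.4],
    [1.1, 1.2, 0.9],
    [0.9, 0.8, 0.3],
]
y = [1, 1, 1, 0, 0, 0]
selected = top_k_genes(X, y, k=1)
```

In the setting of the abstract, `k` would be 200 and the reduced matrix would then be passed to an embedded selector such as IFP or SVM-RFE.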


2012 ◽  
Vol 2012 ◽  
pp. 1-12 ◽  
Author(s):  
Chen-An Tsai ◽  
Chien-Hsun Huang ◽  
Ching-Wei Chang ◽  
Chun-Houh Chen

The development of DNA microarrays enables researchers to screen thousands of genes simultaneously and helps identify genes with high and low expression levels in normal and diseased tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and pick an arbitrary threshold for choosing genes. However, the parameter setting may not be compatible with the selected classification algorithm. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in a support vector machine. We compared its performance with two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, the proposed method was more robust in selecting informative genes than SVMRFE and RSVM, and it attained good classification performance when the variations of informative and noninformative genes differ. In the analysis of the two microarray datasets, the proposed method performed better than SVMRFE and RSVM, identifying fewer genes with good prediction accuracy.
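For orientation, the SVM recursive feature elimination baseline named in this abstract can be sketched as follows. This is a simplified illustration, not the SVM-t method and not any paper's implementation: it assumes labels in {-1, +1}, trains a linear SVM with a hand-rolled full-batch subgradient solver, and removes the feature with the smallest squared weight in each round. The names `train_linear_svm` and `svm_rfe`, the hyperparameters, and the toy data are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    X is a list of rows, y holds labels -1 or +1. Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regulariser
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_rfe(X, y, n_keep=1):
    """Drop the feature with the smallest w_j**2 until n_keep remain.

    Returns (surviving feature indices, eliminated indices in removal order).
    """
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(Xa, y)
        worst = min(range(len(active)), key=lambda j: w[j] ** 2)
        eliminated.append(active.pop(worst))
    return active, eliminated


# Toy data (made up): feature 0 carries the class signal, 1 and 2 are noise.
X = [
    [2.0, 0.3, -0.1],
    [1.5, -0.2, 0.4],
    [1.8, 0.1, 0.2],
    [-2.0, 0.2, 0.1],
    [-1.5, -0.3, -0.3],
    [-1.7, 0.1, -0.2],
]
y = [1, 1, 1, -1, -1, -1]
survivors, eliminated = svm_rfe(X, y, n_keep=1)
```

A production experiment would more likely use a tested SVM solver and remove features in batches, since retraining once per gene is expensive with thousands of genes.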


Molecules ◽  
2019 ◽  
Vol 24 (12) ◽  
pp. 2220 ◽  
Author(s):  
Csaba Váradi ◽  
Károly Nehéz ◽  
Olivér Hornyák ◽  
Béla Viskolcz ◽  
Jonathan Bones

In this study, we present the application of a novel capillary electrophoresis (CE) method in combination with label-free quantitation and support vector machine-based feature selection (support vector machine-estimated recursive feature elimination or SVM-RFE) to identify potential glycan alterations in Parkinson’s disease. Specific focus was placed on the use of neutral coated capillaries, by a dynamic capillary coating strategy, to ensure stable and repeatable separations without the need of non-mass spectrometry (MS) friendly additives within the separation electrolyte. The developed online dynamic coating strategy was applied to identify serum N-glycosylation by CE-MS/MS in combination with exoglycosidase sequencing. The annotated structures were quantified in 15 controls and 15 Parkinson’s disease patients by label-free quantitation. Lower sialylation and increased fucosylation were found in Parkinson’s disease patients on tri-antennary glycans with 2 and 3 terminal sialic acids. The set of potential glycan alterations was narrowed by a recursive feature elimination algorithm resulting in the efficient classification of male patients.


2005 ◽  
Vol 2005 (2) ◽  
pp. 160-171 ◽  
Author(s):  
Yong Mao ◽  
Xiaobo Zhou ◽  
Daoying Pi ◽  
Youxian Sun ◽  
Stephen T. C. Wong

We investigate multiclass cancer classification with gene selection from gene expression data. Two multiclass classifiers with gene selection are proposed: a fuzzy support vector machine (FSVM) with gene selection, and a binary classification tree based on SVM with gene selection. Using the F test and SVM-based recursive feature elimination as gene selection methods, we test three combinations in our experiments: the SVM-based binary classification tree with the F test, the SVM-based binary classification tree with SVM-based recursive feature elimination, and FSVM with SVM-based recursive feature elimination. To accelerate computation, the strongest genes are preselected. The proposed techniques are applied to breast cancer data, small round blue-cell tumor data, and acute leukemia data. Compared with existing multiclass cancer classifiers and with the two binary classification tree variants tested in this paper, FSVM with SVM-based recursive feature elimination can find the most important genes affecting certain types of cancer with high recognition accuracy.
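The F-test preselection mentioned in this abstract can be sketched as a per-gene one-way ANOVA F-statistic followed by a ranking. This is only an illustration under stated assumptions: the abstract does not give the exact F-test variant or cutoff, so the standard one-way ANOVA formula is assumed, and the names `anova_f` and `rank_genes_by_f` and the toy data are hypothetical.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups (lists of floats)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0.0:
        # No within-class spread: perfectly separated or fully constant gene.
        return float("inf") if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def rank_genes_by_f(X, y):
    """Return gene indices sorted from the largest F-statistic to the smallest.

    X is a samples x genes matrix (list of rows); y holds class labels.
    """
    classes = sorted(set(y))
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        groups = [
            [row[g] for row, label in zip(X, y) if label == c]
            for c in classes
        ]
        scores.append(anova_f(groups))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


# Toy data (made up): three classes; gene 1 differs across classes, gene 0 does not.
X = [
    [1.0, 1.0],
    [2.0, 2.0],
    [2.0, 5.0],
    [1.0, 6.0],
    [1.0, 9.0],
    [2.0, 10.0],
]
y = [0, 0, 1, 1, 2, 2]
ranking = rank_genes_by_f(X, y)
```

Keeping only the first few entries of `ranking` would give a preselected gene set to pass on to a multiclass classifier.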


2017 ◽  
Vol 48 (3) ◽  
pp. 594-607 ◽  
Author(s):  
Xiaojuan Huang ◽  
Li Zhang ◽  
Bangjun Wang ◽  
Fanzhang Li ◽  
Zhao Zhang

Electronics ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 647
Author(s):  
Kathiravan Srinivasan ◽  
Nivedhitha Mahendran ◽  
Durai Raj Vincent ◽  
Chuan-Yu Chang ◽  
Shabbir Syed-Abdul

Unipolar depression (UD), also referred to as clinical depression, is a widespread mental disorder around the world. It affects a person’s daily routine, frame of mind, behavior, and bodily functions such as sleep and appetite, and it can lead a person to harm himself/herself or others. In several cases UD is arduous to detect, since it is a state of comorbidity. This research therefore proposes a more convenient approach for physicians to detect clinical depression at an initial phase, using an integrated multistage support vector machine model. First, the dataset is preprocessed using the multiple imputation by chained equations (MICE) technique. Next, support vector machine-based recursive feature elimination (SVM RFE) is used to select the appropriate features. The integrated multistage support vector machine classifier is then built by employing the bagging random sampling technique. Finally, the experimental outcomes indicate that the proposed model surpasses methods such as logistic regression, multilayer perceptron, random forest, and bagging SVM (majority voting) in terms of overall performance.

