Gene Selection from Microarray Data for Alzheimer's Disease Using Random Forest

Author(s):  
Kazutaka Nishiwaki ◽  
Katsutoshi Kanamori ◽  
Hayato Ohwada

A significant amount of microarray gene expression data is available on the Internet, and researchers are free to analyze it. However, microarray data include thousands of genes, which makes analysis with conventional techniques difficult. Selecting informative genes from such high-dimensional data is therefore important. In this study, the authors propose a gene selection method that uses random forest, a machine learning technique. They applied the method to microarray data on Alzheimer's disease and conducted an experiment to rank genes. The results identified several genes that have previously been investigated for their relevance to Alzheimer's disease, indicating that the proposed method can find disease-related genes in microarray data.

2020 ◽  
pp. 1391-1404
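The abstract does not say which importance measure the authors used to rank genes. As one illustration, the pure-Python sketch below ranks genes by Gini importance (mean decrease in impurity) accumulated while growing a small random forest. The function names (`rank_genes`, `grow_tree`, `best_split`), the depth limit, and the sqrt(p) candidate-gene default are assumptions for this sketch, not details from the study. A real analysis would normally rely on a library implementation, such as scikit-learn's `RandomForestClassifier.feature_importances_`.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, idx, genes):
    """Best (impurity decrease, gene, threshold) over the candidate genes at one node."""
    parent = gini([y[i] for i in idx]) * len(idx)
    best = None
    for g in genes:
        values = sorted({X[i][g] for i in idx})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [y[i] for i in idx if X[i][g] <= thr]
            right = [y[i] for i in idx if X[i][g] > thr]
            dec = parent - gini(left) * len(left) - gini(right) * len(right)
            if best is None or dec > best[0]:
                best = (dec, g, thr)
    return best


def grow_tree(X, y, idx, mtry, depth, max_depth, importance, rng):
    """Grow one tree recursively, crediting each split's impurity decrease to its gene."""
    if depth >= max_depth or len({y[i] for i in idx}) < 2:
        return
    candidates = rng.sample(range(len(X[0])), mtry)  # random subset of genes per node
    split = best_split(X, y, idx, candidates)
    if split is None or split[0] <= 0:
        return
    dec, g, thr = split
    importance[g] += dec
    left = [i for i in idx if X[i][g] <= thr]
    right = [i for i in idx if X[i][g] > thr]
    grow_tree(X, y, left, mtry, depth + 1, max_depth, importance, rng)
    grow_tree(X, y, right, mtry, depth + 1, max_depth, importance, rng)


def rank_genes(X, y, n_trees=100, max_depth=4, seed=0):
    """Rank genes by Gini importance summed over a forest of bootstrap trees."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = max(1, int(math.sqrt(p)))  # common default for classification forests
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # sample arrays with replacement
        grow_tree(X, y, boot, mtry, 0, max_depth, importance, rng)
    total = sum(importance) or 1.0
    scores = [v / total for v in importance]  # normalized to sum to one
    order = sorted(range(p), key=lambda g: scores[g], reverse=True)
    return order, scores
```

On a toy matrix in which only gene 0 separates the two classes, gene 0 should come out on top of `order`. Gini importance is cheap because it falls out of tree construction, but it is known to favor features with many distinct values; permutation importance is a common alternative.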


2005 ◽  
Vol 15 (06) ◽  
pp. 475-484 ◽  
Author(s):  
FENG CHU ◽  
LIPO WANG

Microarray gene expression data usually have a large number of dimensions (e.g., over ten thousand genes) and a small number of samples (e.g., a few tens of patients). In this paper, we use the support vector machine (SVM) for cancer classification with microarray data. Dimensionality reduction methods, such as principal component analysis (PCA), a class-separability measure, the Fisher ratio, and the t-test, are used for gene selection. A voting scheme is then employed for multi-group classification with k(k - 1)/2 binary SVMs. We obtain the same classification accuracy as other published results, but with far fewer features.
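The pipeline described above (score genes, keep the top few, then combine pairwise SVMs by voting) can be sketched in pure Python. This is a minimal illustration under stated assumptions. The multi-class Fisher-type score (between-class over within-class scatter), the Pegasos subgradient trainer for the linear SVM, the z-scoring step, and all function names are choices made here, not the paper's implementation. The one-vs-one part trains one binary classifier per unordered pair of classes and lets each one cast a vote.

```python
import random
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def fisher_score(X, y, g):
    """Between-class over within-class scatter of gene g (a multi-class Fisher-type ratio)."""
    values = [row[g] for row in X]
    overall = mean(values)
    between = within = 0.0
    for c in set(y):
        vc = [v for v, lab in zip(values, y) if lab == c]
        mc = mean(vc)
        between += len(vc) * (mc - overall) ** 2
        within += sum((v - mc) ** 2 for v in vc)
    return between / within if within > 0 else float("inf")


def select_genes(X, y, n_genes):
    """Indices of the n_genes genes with the highest Fisher-type score."""
    scores = [fisher_score(X, y, g) for g in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:n_genes]


def standardize(X):
    """Z-score each column so no gene dominates the SVM because of its scale."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def train_linear_svm(X, y, lam=0.01, iters=20000, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.
    Labels are +1/-1; a constant 1.0 feature is appended so the last weight acts as a bias."""
    rng = random.Random(seed)
    data = [(list(row) + [1.0], t) for row, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for step in range(1, iters + 1):
        xs, t = rng.choice(data)
        eta = 1.0 / (lam * step)
        margin = t * sum(wi * xi for wi, xi in zip(w, xs))
        w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: gradient of the regularizer
        if margin < 1.0:  # hinge loss active: move toward the correct side
            w = [wi + eta * t * xi for wi, xi in zip(w, xs)]
    return w


def decision(w, row):
    """Signed SVM output for one sample (bias feature appended)."""
    return sum(wi * xi for wi, xi in zip(w, list(row) + [1.0]))


def fit_one_vs_one(X, y, **svm_kwargs):
    """One binary SVM per unordered class pair, i.e. k(k-1)/2 models for k classes."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        pair = [(row, 1 if lab == a else -1) for row, lab in zip(X, y) if lab in (a, b)]
        models[(a, b)] = train_linear_svm([r for r, _ in pair], [t for _, t in pair], **svm_kwargs)
    return models


def predict_vote(models, row):
    """Each pairwise SVM votes for one of its two classes; the class with most votes wins."""
    votes = Counter(a if decision(w, row) >= 0 else b for (a, b), w in models.items())
    return votes.most_common(1)[0][0]
```

A typical call selects genes with `select_genes`, standardizes the reduced matrix, fits with `fit_one_vs_one`, and classifies each sample with `predict_vote`. Restricting the SVMs to a handful of high-scoring genes is what keeps the feature count small.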


2019 ◽  
Vol 9 ◽  
Author(s):  
Yang Hu ◽  
Tianyi Zhao ◽  
Tianyi Zang ◽  
Ying Zhang ◽  
Liang Cheng

2014 ◽  
Vol 6 ◽  
pp. 115-125 ◽  
Author(s):  
A.V. Lebedev ◽  
E. Westman ◽  
G.J.P. Van Westen ◽  
M.G. Kramberger ◽  
A. Lundervold ◽  
...  

2019 ◽  
Vol 10 ◽  
Author(s):  
Lei Xu ◽  
Guangmin Liang ◽  
Changrui Liao ◽  
Gin-Den Chen ◽  
Chi-Chang Chang

2020 ◽  
Author(s):  
Olivia M Bernstein ◽  
Joshua D. Grill ◽  
Daniel L. Gillen

Background: Early study exit is detrimental to statistical power and increases the risk of bias in Alzheimer’s disease (AD) clinical trials. Previous analyses of early-phase academic trials demonstrated associations between rates of trial incompletion and participants’ study partner type, with participants enrolling with non-spouse study partners at greater risk.
Methods: We conducted secondary analyses of two multinational phase III trials of semagacestat, an oral gamma secretase inhibitor, for mild-to-moderate AD dementia. Cox’s proportional hazards regression model was used to estimate the relationship between study partner type and the risk of early exit from the trial after adjustment for potential confounding factors identified a priori. Additionally, we used a random forest model to identify the top predictors of dropout.
Results: Among participants with spousal, adult child, and other study partners, 35%, 38%, and 36%, respectively, dropped out or died before protocol-defined study completion. In unadjusted models, the risk of trial incompletion differed by study partner type (unadjusted p-value = 0.027 for the test of differences by partner type), but in models adjusting for potential confounding factors the differences were not statistically significant (p-value = 0.928). In exploratory modeling, participant age was identified as the primary characteristic explaining the relationship between study partner type and the risk of failing to complete the trial. Participant age was also the strongest predictor of trial incompletion in the random forest model.
Conclusions: After adjustment for age, no qualitative differences in the risk of incompletion were observed among participants with different study partner types in these trials. Differences between our findings and those of previous studies may be explained by differences in trial phase, size, geographic region, or the mix of academic and non-academic sites.
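The Cox model in the Methods estimates a log hazard ratio by maximizing the partial likelihood. A minimal sketch for a single covariate, such as a hypothetical 0/1 indicator for a non-spouse study partner, is shown below. It runs Newton-Raphson on the partial log-likelihood and handles tied event times with the Breslow approximation. The function names and toy data are illustrative assumptions. The study adjusted for several confounders, which would require the multivariable form (as provided, for example, by R's `survival::coxph`).

```python
import math


def cox_score_info(beta, times, events, x):
    """Score (first derivative) and observed information of the Cox partial log-likelihood.
    Each event uses the full risk set {j : times[j] >= times[i]}, i.e. Breslow handling of ties."""
    score = info = 0.0
    for i, t in enumerate(times):
        if not events[i]:
            continue  # censored subjects contribute only through the risk sets
        risk = [j for j, tj in enumerate(times) if tj >= t]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        mean_x = s1 / s0  # risk-set weighted mean of the covariate
        score += x[i] - mean_x
        info += s2 / s0 - mean_x ** 2  # risk-set weighted variance
    return score, info


def cox_fit(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson estimate of the log hazard ratio beta for one covariate."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, times, events, x)
        if info <= 0:
            break  # covariate does not vary within any risk set
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

With `times = [1, 2, 3, 4, 5, 6]`, every event observed, and `x = [1, 0, 1, 1, 0, 0]`, the estimate is positive (a hazard ratio `math.exp(beta)` above one), because subjects with x = 1 tend to exit earlier. The partial log-likelihood is concave in beta, which is why plain Newton steps usually converge in a few iterations.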

