Bayesian clinical classification from high-dimensional data: Signatures versus variability

2016 ◽  
Vol 27 (2) ◽  
pp. 336-351 ◽  
Author(s):  
Akram Shalabi ◽  
Masato Inoue ◽  
Johnathan Watkins ◽  
Emanuele De Rinaldis ◽  
Anthony CC Coolen

When data exhibit imbalance between a large number d of covariates and a small number n of samples, clinical outcome prediction is impaired by overfitting and prohibitive computational demands. Here we study two simple Bayesian prediction protocols that can be applied to data of any dimension and any number of outcome classes. Calculating Bayesian integrals and optimal hyperparameters analytically leaves only a small number of numerical integrations, and CPU demands scale as O(nd). We compare the performance of both protocols on synthetic and genomic data with the mclustDA method of Fraley and Raftery. For small d they perform as well as mclustDA or better. For d = 10,000 or more, mclustDA breaks down computationally, while the Bayesian methods remain efficient. This allows us to explore phenomena typical of classification in high-dimensional spaces, such as overfitting and the reduced discriminative effectiveness of signatures compared to intra-class variability.
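To make the O(nd) scaling concrete, the following is a minimal sketch of a much simpler stand-in technique: a Gaussian classifier with one mean and one variance per class and covariate, fitted with plug-in estimates. It is not the authors' protocol, which integrates over parameters and hyperparameters analytically; the function names `fit`, `log_posteriors`, and `predict`, and the `var_floor` parameter, are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def fit(X, y, var_floor=1e-6):
    """Estimate class priors and per-covariate means/variances.

    Every sample value is visited a fixed number of times, so the
    cost is O(n * d) for n samples and d covariates.
    """
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n = len(X)
    model = {}
    for label, rows in groups.items():
        m = len(rows)
        d = len(rows[0])
        means = [sum(r[j] for r in rows) / m for j in range(d)]
        # Floor the variance so a constant covariate cannot cause division by zero.
        variances = [
            max(sum((r[j] - means[j]) ** 2 for r in rows) / m, var_floor)
            for j in range(d)
        ]
        model[label] = (math.log(m / n), means, variances)
    return model


def log_posteriors(model, x):
    """Return the normalized log posterior probability of each class for x."""
    scores = {}
    for label, (log_prior, means, variances) in model.items():
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * v) - (xj - mu) ** 2 / (2 * v)
            for xj, mu, v in zip(x, means, variances)
        )
        scores[label] = log_prior + log_lik
    # Log-sum-exp normalization keeps the computation stable for large d.
    top = max(scores.values())
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {label: s - log_norm for label, s in scores.items()}


def predict(model, x):
    """Return the class with the highest posterior probability."""
    posterior = log_posteriors(model, x)
    return max(posterior, key=posterior.get)


# Toy data (hypothetical): two well-separated classes in d = 2.
X = [[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.8, 2.9]]
y = ["a", "a", "b", "b"]
model = fit(X, y)
print(predict(model, [0.1, 0.0]))
print(predict(model, [2.9, 3.0]))
```

Working in log space matters once d is large, because a product of thousands of per-covariate densities underflows in floating point long before the sum of their logarithms does.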

2014 ◽  
Vol 22 (1) ◽  
pp. 109-120 ◽  
Author(s):  
Dokyoon Kim ◽  
Je-Gun Joung ◽  
Kyung-Ah Sohn ◽  
Hyunjung Shin ◽  
Yu Rang Park ◽  
...  

Abstract

Objective: Cancer can involve gene dysregulation via multiple mechanisms, so no single level of genomic data fully elucidates tumor behavior due to the presence of numerous genomic variations within or between levels in a biological system. We have previously proposed a graph-based integration approach that combines multi-omics data including copy number alteration, methylation, miRNA, and gene expression data for predicting clinical outcome in cancer. However, genomic features likely interact with other genomic features in complex signaling or regulatory networks, since cancer is caused by alterations in pathways or complete processes.

Methods: Here we propose a new graph-based framework for integrating multi-omics data and genomic knowledge to improve power in predicting clinical outcomes and elucidate interplay between different levels. To highlight the validity of our proposed framework, we used an ovarian cancer dataset from The Cancer Genome Atlas for predicting stage, grade, and survival outcomes.

Results: Integrating multi-omics data with genomic knowledge to construct pre-defined features resulted in higher performance in clinical outcome prediction and higher stability. For the grade outcome, the model with gene expression data produced an area under the receiver operating characteristic curve (AUC) of 0.7866. However, models of the integration with pathway, Gene Ontology, chromosomal gene set, and motif gene set consistently outperformed the model with genomic data only, attaining AUCs of 0.7873, 0.8433, 0.8254, and 0.8179, respectively.

Conclusions: Integrating multi-omics data and genomic knowledge to improve understanding of molecular pathogenesis and underlying biology in cancer should improve diagnostic and prognostic indicators and the effectiveness of therapies.
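The AUC values reported above summarize how well a model ranks positive cases above negative ones. As a minimal sketch of that metric only (not of the graph-based framework itself), the AUC can be computed as the fraction of positive-negative pairs that are ordered correctly, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0 (negative) or 1 (positive).
    scores: iterable of model scores, higher meaning "more positive".
    This O(P * N) form is meant for clarity, not for large datasets.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data): 3 of 4 pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values between 0.7866 and 0.8433 should be read.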


Methods ◽  
2014 ◽  
Vol 67 (3) ◽  
pp. 344-353 ◽  
Author(s):  
Dokyoon Kim ◽  
Hyunjung Shin ◽  
Kyung-Ah Sohn ◽  
Anurag Verma ◽  
Marylyn D. Ritchie ◽  
...  

PLoS ONE ◽  
2018 ◽  
Vol 13 (11) ◽  
pp. e0207001 ◽  
Author(s):  
Kang-Yi Su ◽  
Jeng-Sen Tseng ◽  
Keng-Mao Liao ◽  
Tsung-Ying Yang ◽  
Kun-Chieh Chen ◽  
...  

2022 ◽  
Vol 123 ◽  
pp. 102230
Author(s):  
Shuchao Pang ◽  
Matthew Field ◽  
Jason Dowling ◽  
Shalini Vinod ◽  
Lois Holloway ◽  
...  
