Genomic Machine Learning Model Predicts Radiation Therapy Benefit in Early-Stage Breast Cancer Patients with High Accuracy

Author(s):  
Kimberly Badal ◽  
Jerome E. Foster ◽  
Rajini Haraksingh ◽  
Melford John

Abstract

Background: Radiation therapy (RT) is frequently recommended as post-surgery treatment for early-stage breast cancer (BC) patients, though not all patients benefit. Clinical factors currently guide RT treatment decisions. At present, models that predict RT-benefit predominantly use statistical methods and achieve modest performance. In this paper we present a high-accuracy genomic Machine Learning (ML) model to predict RT-benefit in early-stage BC patients. We also present a novel method for selecting genomic features for training ML algorithms.

Methods: Gene expression data were obtained from 463 early-stage BC patients in the METABRIC cohort who were treated with surgery and RT. The Wilcoxon Rank Sum (Wilcoxon RS) test and Cox Proportional Hazards (Cox PH) regression were used to reduce the number of genes used to train eight ML algorithms. The algorithms were trained on 80% of the data using 10-fold cross-validation and tested on the remaining 20% to assess performance in predicting relapse status.

Results: Wilcoxon RS and Cox PH reduced the genome-wide gene expression data by 96%, yielding a 1,596-gene set and a 977-gene set. These gene sets were used to train eight ML algorithms, resulting in models with accuracies ranging from 54.01% to 95.6%. The highest accuracies were obtained with the Support Vector Machine (SVM977: 93.41%, SVM1596: 95.6%) and Neural Network (NN977: 92.31%, NN1596: 93.41%) algorithms. In RT-untreated patients, the accuracies of all models were 30% to 40% lower than in RT-treated patients. SVM977 had the highest sensitivity, at 91.09%. The 977-gene set was enriched with genes involved in cell cycle and differentiation, as well as genes associated with radiosensitivity and radioresistance.

Conclusion: This study presents a novel genomic feature selection approach that applies Wilcoxon RS followed by Cox PH to reduce, by 96%, the number of genes from genome-wide gene expression data used for training ML algorithms. This approach led to an SVM model that uses the expression values of 977 genes to predict RT-benefit in early-stage BC patients with 93.41% accuracy. This work demonstrates that ML models can be clinically useful for predicting cancer patient outcomes.
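The first filtering step named in the Methods, a per-gene Wilcoxon rank-sum test between relapsed and non-relapsed patients, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the `alpha=0.05` cutoff, and the toy data are assumptions made for illustration, and the p-value uses the normal approximation without a tie correction. The subsequent Cox PH step and the ML training are not shown.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p_value(group_a, group_b):
    """Two-sided rank-sum p-value (normal approximation, no tie correction)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])  # rank sum of the first group
    mean_w = n_a * (n_a + n_b + 1) / 2.0
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_genes(expression, relapsed, alpha=0.05):
    """Keep genes whose expression differs between relapse groups.

    expression: dict mapping gene name -> list of values, one per patient.
    relapsed:   list of booleans, one per patient (hypothetical labels).
    alpha:      assumed significance cutoff; the abstract does not state one.
    """
    kept = []
    for gene, values in expression.items():
        a = [v for v, r in zip(values, relapsed) if r]
        b = [v for v, r in zip(values, relapsed) if not r]
        if rank_sum_p_value(a, b) < alpha:
            kept.append(gene)
    return kept


# Toy example: 10 relapsed and 10 non-relapsed patients, two made-up genes.
relapsed = [True] * 10 + [False] * 10
expression = {
    "gene_separated": list(range(10, 20)) + list(range(0, 10)),
    "gene_interleaved": list(range(1, 20, 2)) + list(range(2, 21, 2)),
}
print(select_genes(expression, relapsed))
```

In a real analysis the filter would run over every probe in the expression matrix, and a multiple-testing correction would usually be considered; whether the study applied one is not stated in the abstract.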

2009 ◽  
Vol 15 (7) ◽  
pp. 1032-1038 ◽  
Author(s):  
Jørgen Olsen ◽  
Thomas A. Gerds ◽  
Jakob B. Seidelin ◽  
Claudio Csillag ◽  
Jacob T. Bjerrum ◽  
...  

2008 ◽  
Vol 9 (Suppl 2) ◽  
pp. S12 ◽  
Author(s):  
Margherita Mutarelli ◽  
Luigi Cicatiello ◽  
Lorenzo Ferraro ◽  
Olì MV Grober ◽  
Maria Ravo ◽  
...  

BMC Cancer ◽  
2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Yimei Jiang ◽  
Xiaowei Yan ◽  
Kun Liu ◽  
Yiqing Shi ◽  
Changgang Wang ◽  
...  

Abstract

Background: In recent years, the differences between left-sided colon cancer (LCC) and right-sided colon cancer (RCC) have received increasing attention because of the clinicopathological variation between them. However, some of these differences remain unclear, and conflicting results have been reported.

Methods: From The Cancer Genome Atlas (TCGA), we obtained RNA sequencing data and gene mutation data on 323 and 283 colon cancer patients, respectively. Differential analysis was first performed separately on the gene expression data and the mutation data, comparing LCC with RCC. Machine learning (ML) methods were then used to select key genes or mutations as features for models that classify LCC and RCC patients. Finally, we used logistic regression (LR) models to identify correlations between differentially expressed genes (DEGs) and mutations.

Results: We found distinct gene mutation and expression patterns between LCC and RCC patients and used ML methods to select the 30 most important mutations and the 17 most important gene expression features. The classification models built on these features classified LCC and RCC patients with high accuracy (areas under the curve (AUC) of 0.8 and 0.96 for mutation and gene expression data, respectively). The expression of PRAC1 and the BRAF V600E mutation (rs113488022) were the most important features in the gene expression and mutation models, respectively. Correlations between mutations and gene expression were also identified using LR models. Among them, rs113488022 was significantly associated with the expression of four genes and thus merits attention in further studies.

Conclusions: Using ML methods, we found key molecular differences between LCC and RCC that differentiate these two groups of patients with high accuracy. These differences might be key factors behind the variation in clinical features between LCC and RCC and could therefore help to improve treatment, for example by determining the appropriate therapy for patients.
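The AUC values reported above (0.8 and 0.96) measure how often a classifier scores a randomly chosen patient from one class above a randomly chosen patient from the other. A minimal standard-library sketch of that pairwise definition follows. It is illustrative only: the function name, the label coding (1 = RCC, 0 = LCC), and the scores are assumptions, not data or code from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from the pairwise-comparison definition.

    scores: classifier outputs, higher meaning "more likely positive".
    labels: 1 for the positive class, 0 for the negative class
            (assumed coding here: 1 = RCC, 0 = LCC).
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores for four patients: two positives, two negatives.
print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The double loop is quadratic in the number of patients, which is acceptable for cohorts of a few hundred; a rank-based computation gives the same value faster on larger data.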


Oncotarget ◽  
2015 ◽  
Vol 7 (3) ◽  
pp. 3002-3017 ◽  
Author(s):  
Sandeep K. Singhal ◽  
Nawaid Usmani ◽  
Stefan Michiels ◽  
Otto Metzger-Filho ◽  
Kamal S. Saini ◽  
...  

2016 ◽  
Vol 23 (11) ◽  
pp. 2702-2712 ◽  
Author(s):  
Ivana Bozovic-Spasojevic ◽  
Dimitrios Zardavas ◽  
Sylvain Brohée ◽  
Lieveke Ameye ◽  
Debora Fumagalli ◽  
...  

PLoS ONE ◽  
2012 ◽  
Vol 7 (2) ◽  
pp. e32394 ◽  
Author(s):  
Morten Hansen ◽  
Thomas Alexander Gerds ◽  
Ole Haagen Nielsen ◽  
Jakob Benedict Seidelin ◽  
Jesper Thorvald Troelsen ◽  
...  
