Improving survival prediction of high-grade glioma via machine learning techniques based on MRI radiomic, genetic and clinical risk factors

2019 ◽  
Vol 120 ◽  
pp. 108609 ◽  
Author(s):  
Yan Tan ◽  
Wei Mu ◽  
Xiao-chun Wang ◽  
Guo-qiang Yang ◽  
Robert James Gillies ◽  
...  
2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 897.2-897
Author(s):  
M. Maurits ◽  
T. Huizinga ◽  
M. Reinders ◽  
S. Raychaudhuri ◽  
E. Karlson ◽  
...  

Background: Heterogeneity in disease populations complicates discovery of risk factors. To identify risk factors for subpopulations of diseases, we need analytical methods that can deal with unidentified disease subgroups. Objectives: Inspired by successful approaches from the Big Data field, we developed a high-throughput approach to identify subpopulations within patients with heterogeneous, complex diseases using the wealth of information available in Electronic Medical Records (EMRs). Methods: We extracted longitudinal healthcare-interaction records coded by 1,853 PheCodes [1] for the 64,819 patients in Boston's Partners Biobank. Through dimensionality reduction using t-SNE [2] we created a 2D embedding of 32,424 of these patients (set A). We then identified distinct clusters post-t-SNE using DBSCAN [3] and visualized the relative importance of individual PheCodes within them using specialized spectrographs. We replicated this procedure in the remaining 32,395 records (set B). Results: Summary statistics of both sets were comparable (Table 1).

Table 1. Summary statistics of the total Partners Biobank dataset and the 2 partitions.

                      Set A        Set B        Total
Entries               12,200,311   12,177,131   24,377,442
Patients              32,424       32,395       64,819
Patient-years         369,546.33   368,597.92   738,144.2
Unique ICD codes      25,056       24,953       26,305
Unique PheCodes       1,851        1,853        1,853

We found 284 clusters in set A and 295 in set B, of which 63.4% from set A could be mapped to a cluster in set B with a median (range) correlation of 0.24 (0.03–0.58). Clusters represented similar yet distinct clinical phenotypes; e.g. patients diagnosed with “other headache syndrome” were separated into four distinct clusters characterized by migraines, neurofibromatosis, epilepsy or brain cancer, all resulting in patients presenting with headaches (Fig. 1 & 2). Though EMR databases tend to be noisy, our method was also able to differentiate misclassification from true cases; SLE patients with RA codes clustered separately from true RA cases.

Figure 1. Two-dimensional representation of Set A generated using dimensionality reduction (t-SNE) and clustering (DBSCAN).

Figure 2. Phenotype Spectrographs (PheSpecs) of four clusters characterized by “Other headache syndromes”, driven by codes relating to migraine, epilepsy, neurofibromatosis or brain cancer.

Conclusion: We have shown that EMR data can be used to identify and visualize latent structure in patient categorizations, using an approach based on dimension reduction and clustering machine learning techniques. Our method can identify misclassified patients as well as separate patients with similar problems into subsets with different associated medical problems. Our approach adds a new and powerful tool to aid in the discovery of novel risk factors in complex, heterogeneous diseases.

References:
[1] Denny, J.C. et al. Bioinformatics (2010)
[2] van der Maaten et al. Journal of Machine Learning Research (2008)
[3] Ester, M. et al. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (1996)

Disclosure of Interests: Marc Maurits: None declared, Thomas Huizinga Grant/research support from: Ablynx, Bristol-Myers Squibb, Roche, Sanofi, Consultant of: Ablynx, Bristol-Myers Squibb, Roche, Sanofi, Marcel Reinders: None declared, Soumya Raychaudhuri: None declared, Elizabeth Karlson: None declared, Erik van den Akker: None declared, Rachel Knevel: None declared
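
As an illustration of the clustering step described in the abstract above, the following is a minimal pure-Python sketch of density-based clustering in the style of DBSCAN, applied to a 2D embedding that is assumed to have been computed already (for example by t-SNE). The function names, the toy coordinates, and the `eps` and `min_samples` values are hypothetical and are not taken from the study.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also core: keep expanding
    return labels


# Toy 2D "embedding": two dense groups and one isolated point (made-up data).
embedding = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 0.0),
]
labels = dbscan(embedding, eps=0.3, min_samples=3)
print(labels)
```

With real data, the choice of `eps` and `min_samples` strongly affects how many clusters are found, so the cluster counts reported in the abstract cannot be reproduced from this sketch.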


Author(s):  
John J. Squiers ◽  
Jeffrey E. Thatcher ◽  
David Bastawros ◽  
Andrew J. Applewhite ◽  
Ronald D. Baxter ◽  
...  

2020 ◽  
Author(s):  
Georgios Kantidakis ◽  
Hein Putter ◽  
Carlo Lancia ◽  
Jacob de Boer ◽  
Andries E Braat ◽  
...  

Abstract Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML is related to unsuitable performance measures and lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to large data of 62,294 patients from the United States with 97 predictors, selected on clinical/statistical grounds out of more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results: Well-established predictive measures are employed from the survival field (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index. Neural networks show better performance than both Cox models and random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as calibrated as the Cox model with all variables.
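
The C-index mentioned in this abstract can be illustrated with a small sketch of Harrell's concordance index for right-censored data. This is a generic textbook formulation, not the authors' implementation; the function name and the toy data are hypothetical, and pairs with tied follow-up times are simply ignored here.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event and
    did so before subject j's follow-up time. The pair is concordant when
    the subject who failed earlier was given the higher risk score.
    Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up follow-up times (years), event indicators (1 = graft failure or
# death observed, 0 = censored) and model risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risk)
print(round(c_index, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why small differences between models, such as those reported above, are usually read alongside a calibration measure.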


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Georgios Kantidakis ◽  
Hein Putter ◽  
Carlo Lancia ◽  
Jacob de Boer ◽  
Andries E. Braat ◽  
...  

Abstract Background Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML is related to unsuitable performance measures and lack of interpretability, which is important for clinicians. Methods In this paper, ML techniques such as random forests and neural networks are applied to large data of 62,294 patients from the United States with 97 predictors, selected on clinical/statistical grounds out of more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results Well-established predictive measures are employed from the survival field (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index. Neural networks show better performance than both Cox models and random survival forest based on the Integrated Brier Score at 10 years. Conclusion In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as calibrated as the Cox model with all variables. Trial registration Retrospective data were provided by the Scientific Registry of Transplant Recipients under Data Use Agreement number 9477 for analysis of risk factors after liver transplantation.
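
To make the Brier score referred to in this abstract concrete, here is a deliberately simplified sketch that scores predicted survival probabilities at a single time horizon. It is an assumption-laden illustration: subjects censored before the horizon are skipped, whereas the Brier score used in survival analysis normally reweights for censoring (inverse probability of censoring weights). The function name and toy data are hypothetical.

```python
def brier_score(times, events, surv_probs, horizon):
    """Mean squared error between predicted S(horizon) and observed status.

    times      : follow-up time per subject
    events     : 1 if the event was observed, 0 if censored
    surv_probs : predicted probability of being event-free at `horizon`
    Subjects censored before the horizon are skipped (a simplification).
    """
    total = 0.0
    used = 0
    for t, e, p in zip(times, events, surv_probs):
        if t > horizon:
            observed = 1.0  # still event-free at the horizon
        elif e:
            observed = 0.0  # event occurred by the horizon
        else:
            continue  # censored before the horizon: status unknown
        total += (p - observed) ** 2
        used += 1
    if used == 0:
        raise ValueError("no subjects with known status at the horizon")
    return total / used


# Made-up data: follow-up in years, predictions for the 10-year horizon.
times = [1, 3, 5, 12]
events = [1, 0, 1, 0]
pred_survival_10y = [0.2, 0.5, 0.4, 0.9]
score = brier_score(times, events, pred_survival_10y, horizon=10)
print(round(score, 4))
```

Lower is better: a model that always predicts 0.5 scores 0.25, and the Integrated Brier Score averages this quantity over a range of horizons.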


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Zi-Qi Pan ◽  
Shu-Jun Zhang ◽  
Xiang-Lian Wang ◽  
Yu-Xin Jiao ◽  
Jian-Jian Qiu

Background and Objective. Although radiotherapy has become one of the main treatment methods for cancer, there is no noninvasive method to predict the radiotherapeutic response of individual glioblastoma (GBM) patients before surgery. The purpose of this study is to develop and validate a machine learning-based radiomics signature to predict the radiotherapeutic response of GBM patients. Methods. The MRI images, genetic data, and clinical data of 152 patients with GBM were analyzed. 122 patients from the TCIA dataset were divided into a training set (n = 82) and a validation set (n = 40), and 30 patients from local hospitals were used as an independent test dataset. Radiomics features were extracted from multiple regions of multiparameter MRI. Kaplan-Meier survival analysis was used to verify the ability of the imaging signature to predict the response of GBM patients to radiotherapy before an operation. Multivariate Cox regression including the radiomics signature and preoperative clinical risk factors was used to further improve the ability to predict the overall survival (OS) of individual GBM patients, which was presented in the form of a nomogram. Results. The radiomics signature was built from eight selected features. The C-index of the radiomics signature in the TCIA and independent test cohorts was 0.703 (P < 0.001) and 0.757 (P = 0.001), respectively. Multivariate Cox regression analysis confirmed that the radiomics signature (HR: 0.290, P < 0.001), age (HR: 1.023, P = 0.01), and KPS (HR: 0.968, P < 0.001) were independent risk factors for OS in GBM patients before surgery. When the radiomics signature and preoperative clinical risk factors were combined, the radiomics nomogram further improved the performance of OS prediction in individual patients (C-index = 0.764 and 0.758 in the TCIA and test cohorts, respectively). Conclusion. This study developed a radiomics signature that can predict the response of individual GBM patients to radiotherapy and may be a new supplement for precise GBM radiotherapy.
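
The hazard ratios quoted in this abstract can be put in perspective with a short worked calculation. In a Cox model the log-hazard is linear in each covariate, so a per-unit hazard ratio compounds multiplicatively over larger changes. The sketch below assumes, since the abstract does not state the units, that the age HR is per year and the KPS HR is per point; the function name is hypothetical.

```python
import math


def scaled_hazard_ratio(hr_per_unit, delta):
    """Hazard ratio for a change of `delta` units, given the per-unit HR.

    Follows from the Cox model: HR(delta) = exp(beta * delta),
    where beta = log(hr_per_unit).
    """
    return math.exp(math.log(hr_per_unit) * delta)


# Assumed units (not stated in the abstract): per year of age, per KPS point.
age_hr_10y = scaled_hazard_ratio(1.023, 10)   # 10 years older
kps_hr_10pt = scaled_hazard_ratio(0.968, 10)  # 10 KPS points higher
print(round(age_hr_10y, 3), round(kps_hr_10pt, 3))
```

Under those assumed units, a per-unit HR close to 1 still corresponds to a roughly 26% higher hazard for a 10-year age difference and a roughly 28% lower hazard for a 10-point higher KPS.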


2020 ◽  
Vol 11 ◽  
Author(s):  
Siqi Dai ◽  
Shuang Xu ◽  
Yao Ye ◽  
Kefeng Ding

Background: Despite recent advances in immune therapy, great heterogeneity exists in the outcomes of colorectal cancer (CRC) patients. In this study, we aimed to analyze the immune-related gene (IRG) expression profiles from three independent public databases and develop an effective signature to forecast patients' prognosis. Methods: IRGs were collected from the ImmPort database. The CRC dataset from The Cancer Genome Atlas (TCGA) database was used to identify a prognostic gene signature, which was verified in another two CRC datasets from the Gene Expression Omnibus (GEO). Gene function enrichment analysis was conducted. A prognostic nomogram was built incorporating the IRG signature with clinical risk factors. Results: The three datasets had 487, 579, and 224 patients, respectively. A prognostic six-gene signature (CCL22, LIMK1, MAPKAPK3, FLOT1, GPRC5B, and IL20RB) was developed through feature selection that showed good differentiation between the low- and high-risk groups in the training set (p < 0.001), which was later confirmed in the two validation groups (log-rank p < 0.05). The signature outperformed tumor TNM staging for survival prediction. GO and KEGG functional annotation analysis suggested that the signature was significantly enriched in metabolic processes and regulation of immunity (p < 0.05). When combined with clinical risk factors, the model showed robust prediction capability. Conclusion: The immune-related six-gene signature is a reliable prognostic indicator for CRC patients and could provide insight for personalized cancer management.


2010 ◽  
Vol 2 (4) ◽  
pp. 350-355 ◽  
Author(s):  
Sandhya Joshi ◽  
P. Deepa Shenoy ◽  
Vibhudendra Simha G.G. ◽  
Venugopal K. R ◽  
L.M. Patnaik

2021 ◽  
Vol 12 ◽  
Author(s):  
Santu Rana ◽  
Wei Luo ◽  
Truyen Tran ◽  
Svetha Venkatesh ◽  
Paul Talman ◽  
...  

Aim: To use available electronic administrative records to identify data reliability, predict discharge destination, and identify risk factors associated with specific outcomes following hospital admission with stroke, compared to stroke specific clinical factors, using machine learning techniques. Method: The study included 2,531 patients having at least one admission with a confirmed diagnosis of stroke, collected from a regional hospital in Australia within 2009–2013. Using machine learning (penalized regression with Lasso) techniques, patients having their index admission between June 2009 and July 2012 were used to derive predictive models, and patients having their index admission between July 2012 and June 2013 were used for validation. Three different stroke types [intracerebral hemorrhage (ICH), ischemic stroke, transient ischemic attack (TIA)] were considered and five different comparison outcome settings were considered. Our electronic administrative record based predictive model was compared with a predictive model composed of “baseline” clinical features, more specific for stroke, such as age, gender, smoking habits, co-morbidities (high cholesterol, hypertension, atrial fibrillation, and ischemic heart disease), types of imaging done (CT scan, MRI, etc.), and occurrence of in-hospital pneumonia. Risk factors associated with likelihood of negative outcomes were identified. Results: The data was highly reliable at predicting discharge to rehabilitation and all other outcomes vs. death for ICH (AUC 0.85 and 0.825, respectively), all discharge outcomes except home vs. rehabilitation for ischemic stroke, and discharge home vs. others and home vs. rehabilitation for TIA (AUC 0.948 and 0.873, respectively). Electronic health record data appeared to provide improved prediction of outcomes over stroke specific clinical factors from the machine learning models. Common risk factors associated with a negative impact on expected outcomes appeared clinically intuitive, and included older age groups, prior ventilatory support, urinary incontinence, need for imaging, and need for allied health input. Conclusion: Electronic administrative records from this cohort produced reliable outcome prediction and identified clinically appropriate factors negatively impacting most outcome variables following hospital admission with stroke. This presents a means of future identification of modifiable factors associated with patient discharge destination. This may potentially aid in patient selection for certain interventions and aid in better patient and clinician education regarding expected discharge outcomes.

