Improving survival prediction of high-grade glioma via machine learning techniques based on MRI radiomic, genetic and clinical risk factors

Background: Heterogeneity in disease populations complicates the discovery of risk factors. To identify risk factors for disease subpopulations, we need analytical methods that can deal with unidentified disease subgroups.

Objectives: Inspired by successful approaches from the Big Data field, we developed a high-throughput approach to identify subpopulations within patients with heterogeneous, complex diseases using the wealth of information available in Electronic Medical Records (EMRs).

Methods: We extracted longitudinal healthcare-interaction records, coded by 1,853 PheCodes [1], for the 64,819 patients in Boston's Partners Biobank. Using t-SNE [2] for dimensionality reduction, we created a 2D embedding of 32,424 of these patients (set A). We then identified distinct clusters in the embedding using DBSCAN [3] and visualized the relative importance of individual PheCodes within them using specialized spectrographs. We replicated this procedure in the remaining 32,395 records (set B).

Results: Summary statistics of both sets were comparable (Table 1).

Table 1. Summary statistics of the total Partners Biobank dataset and the 2 partitions.

                      Set A         Set B         Total
Entries               12,200,311    12,177,131    24,377,442
Patients              32,424        32,395        64,819
Patient-years         369,546.33    368,597.92    738,144.2
Unique ICD codes      25,056        24,953        26,305
Unique PheCodes       1,851         1,853         1,853

We found 284 clusters in set A and 295 in set B; 63.4% of the clusters from set A could be mapped to a cluster in set B, with a median (range) correlation of 0.24 (0.03 – 0.58). Clusters represented similar yet distinct clinical phenotypes; e.g. patients diagnosed with “other headache syndrome” were separated into four distinct clusters characterized by migraines, neurofibromatosis, epilepsy or brain cancer, all resulting in patients presenting with headaches (Fig. 1 & 2). Though EMR databases tend to be noisy, our method was also able to differentiate misclassification from true cases; SLE patients with RA codes clustered separately from true RA cases.

Figure 1. Two-dimensional representation of set A generated using dimensionality reduction (t-SNE) and clustering (DBSCAN).

Figure 2. Phenotype Spectrographs (PheSpecs) of four clusters characterized by “Other headache syndromes”, driven by codes relating to migraine, epilepsy, neurofibromatosis or brain cancer.

Conclusion: We have shown that EMR data can be used to identify and visualize latent structure in patient categorizations, using an approach based on dimension reduction and clustering machine learning techniques. Our method can identify misclassified patients as well as separate patients with similar problems into subsets with different associated medical problems. Our approach adds a new and powerful tool to aid in the discovery of novel risk factors in complex, heterogeneous diseases.

References:
[1] Denny, J.C. et al. Bioinformatics (2010).
[2] van der Maaten et al. Journal of Machine Learning Research (2008).
[3] Ester, M. et al. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (1996).

Disclosure of Interests: Marc Maurits: None declared; Thomas Huizinga: Grant/research support from: Ablynx, Bristol-Myers Squibb, Roche, Sanofi, Consultant of: Ablynx, Bristol-Myers Squibb, Roche, Sanofi; Marcel Reinders: None declared; Soumya Raychaudhuri: None declared; Elizabeth Karlson: None declared; Erik van den Akker: None declared; Rachel Knevel: None declared.
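
The density-based clustering step named in the Methods can be illustrated with a minimal Python sketch of DBSCAN. It assumes the points have already been embedded in 2D (the t-SNE step is not reproduced here). The toy coordinates, the eps and min_samples values, and the function names are illustrative assumptions, not the settings or code used in the study.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Assign each point a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point
    return labels


# Hypothetical 2D embedding: two tight groups and one isolated point.
embedding = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11),
             (50, 50)]
labels = dbscan(embedding, eps=1.5, min_samples=3)
print(labels)
```

On this toy input the two groups receive separate cluster ids and the isolated point is labelled as noise. Unlike k-means, the number of clusters is not fixed in advance, which fits a setting where the number of subgroups is unknown.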

A machine learning‐based survival prediction model of high grade glioma by integration of clinical and dose‐volume histogram parameters

Cancer Medicine ◽

10.1002/cam4.3838 ◽

2021 ◽

Vol 10 (8) ◽

pp. 2774-2786

Author(s):

Haiyan Chen ◽

Chao Li ◽

Lin Zheng ◽

Wei Lu ◽

Yanlin Li ◽

...

Keyword(s):

Machine Learning ◽

Prediction Model ◽

High Grade Glioma ◽

Survival Prediction ◽

High Grade ◽

Dose Volume Histogram ◽

Dose Volume

Machine learning analysis of multispectral imaging and clinical risk factors to predict amputation wound healing

Journal of Vascular Surgery ◽

10.1016/j.jvs.2021.06.478 ◽

2021 ◽

Author(s):

John J. Squiers ◽

Jeffrey E. Thatcher ◽

David Bastawros ◽

Andrew J. Applewhite ◽

Ronald D. Baxter ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Wound Healing ◽

Multispectral Imaging ◽

Clinical Risk Factors ◽

Clinical Risk ◽

Learning Analysis

Survival prediction models since liver transplantation - comparisons between Cox models and machine learning techniques

10.21203/rs.3.rs-22670/v3 ◽

2020 ◽

Author(s):

Georgios Kantidakis ◽

Hein Putter ◽

Carlo Lancia ◽

Jacob de Boer ◽

Andries E Braat ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Liver Transplantation ◽

Prediction Models ◽

Machine Learning Techniques ◽

Brier Score ◽

Survival Prediction ◽

Cox Models ◽

Learning Techniques ◽

Random Survival Forest

Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. There is currently a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML relates to unsuitable performance measures and a lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to a large dataset of 62294 patients from the United States with 97 predictors, selected on clinical/statistical grounds out of more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results: Well-established predictive measures from the survival field are employed (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than the Cox models based on the C-index. Neural networks show better performance than both the Cox models and the random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as well calibrated as the Cox model with all variables.
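
The two families of measures mentioned in the Results can be sketched in a few lines of Python. This is a simplified illustration, not the evaluation code used in the paper: the C-index below is Harrell's pairwise version, and the Brier score at a fixed horizon simply drops subjects censored before the horizon instead of applying the inverse-probability-of-censoring weights normally used in survival analysis. All data values and function names are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i has an observed event
    strictly before subject j's follow-up time. Ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


def naive_brier_score(times, events, surv_probs, horizon):
    """Mean squared error between predicted survival at `horizon` and the
    observed status. Subjects censored at or before the horizon are
    excluded (a simplification; no censoring weights are applied)."""
    squared_errors = []
    for t, e, p in zip(times, events, surv_probs):
        if t > horizon:
            alive = 1.0          # still event-free at the horizon
        elif e:
            alive = 0.0          # event observed at or before the horizon
        else:
            continue             # censored before the horizon: status unknown
        squared_errors.append((p - alive) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Hypothetical follow-up data for four subjects (time in years).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]              # 1 = event observed, 0 = censored
risk = [0.9, 0.4, 0.5, 0.1]        # higher = predicted to fail sooner
surv_at_5 = [0.2, 0.3, 0.7, 0.9]   # predicted P(survival beyond 5 years)

print(concordance_index(times, events, risk))
print(naive_brier_score(times, events, surv_at_5, horizon=5))
```

The C-index only measures how well subjects are ranked (discrimination), while the Brier score also penalizes poorly calibrated probabilities, which is one reason the two measures can favour different models.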

Survival prediction models since liver transplantation - comparisons between Cox models and machine learning techniques

BMC Medical Research Methodology ◽

10.1186/s12874-020-01153-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Georgios Kantidakis ◽

Hein Putter ◽

Carlo Lancia ◽

Jacob de Boer ◽

Andries E. Braat ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Neural Networks ◽

Liver Transplantation ◽

Prediction Models ◽

Machine Learning Techniques ◽

Brier Score ◽

Cox Models ◽

Learning Techniques ◽

Random Survival Forest

Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. There is currently a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML relates to unsuitable performance measures and a lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to a large dataset of 62294 patients from the United States with 97 predictors, selected on clinical/statistical grounds out of more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results: Well-established predictive measures from the survival field are employed (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than the Cox models based on the C-index. Neural networks show better performance than both the Cox models and the random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as well calibrated as the Cox model with all variables. Trial registration: Retrospective data were provided by the Scientific Registry of Transplant Recipients under Data Use Agreement number 9477 for analysis of risk factors after liver transplantation.

Machine Learning Based on a Multiparametric and Multiregional Radiomics Signature Predicts Radiotherapeutic Response in Patients with Glioblastoma

Behavioural Neurology ◽

10.1155/2020/1712604 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Zi-Qi Pan ◽

Shu-Jun Zhang ◽

Xiang-Lian Wang ◽

Yu-Xin Jiao ◽

Jian-Jian Qiu

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Cox Regression ◽

Clinical Risk Factors ◽

Cox Regression Analysis ◽

Clinical Risk ◽

Independent Test ◽

Independent Test Dataset ◽

Multiple Regions ◽

Radiomics Signature

Background and Objective. Although radiotherapy has become one of the main treatment methods for cancer, there is no noninvasive method to predict the radiotherapeutic response of individual glioblastoma (GBM) patients before surgery. The purpose of this study is to develop and validate a machine learning-based radiomics signature to predict the radiotherapeutic response of GBM patients. Methods. The MRI images, genetic data, and clinical data of 152 patients with GBM were analyzed. 122 patients from the TCIA dataset (training set: n = 82; validation set: n = 40) and 30 patients from local hospitals were used as an independent test dataset. Radiomics features were extracted from multiple regions of multiparameter MRI. Kaplan-Meier survival analysis was used to verify the ability of the imaging signature to predict the response of GBM patients to radiotherapy before an operation. Multivariate Cox regression including the radiomics signature and preoperative clinical risk factors was used to further improve the ability to predict the overall survival (OS) of individual GBM patients, which was presented in the form of a nomogram. Results. The radiomics signature was built from eight selected features. The C-index of the radiomics signature in the TCIA and independent test cohorts was 0.703 (P < 0.001) and 0.757 (P = 0.001), respectively. Multivariate Cox regression analysis confirmed that the radiomics signature (HR: 0.290, P < 0.001), age (HR: 1.023, P = 0.01), and KPS (HR: 0.968, P < 0.001) were independent risk factors for OS in GBM patients before surgery. When the radiomics signature and preoperative clinical risk factors were combined, the radiomics nomogram further improved the performance of OS prediction in individual patients (C-index = 0.764 and 0.758 in the TCIA and test cohorts, respectively). Conclusion. This study developed a radiomics signature that can predict the response of individual GBM patients to radiotherapy and may be a new supplement for precise GBM radiotherapy.
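
As a reading aid for the hazard ratios (HR) reported above, the short Python sketch below rescales a per-unit HR from a Cox model to a larger change in the covariate. It assumes the reported HRs are per one-unit increase (per year of age, per KPS point), which the abstract does not state explicitly; the function name is hypothetical.

```python
import math


def scaled_hazard_ratio(hr_per_unit, delta):
    """Hazard ratio for a change of `delta` units in a covariate.

    In a Cox model the log-hazard is linear in the covariate, so the
    per-unit HR compounds multiplicatively: HR(delta) = HR ** delta.
    """
    return math.exp(math.log(hr_per_unit) * delta)


# Assumed per-unit HRs taken from the abstract (units are an assumption).
hr_age_10_years = scaled_hazard_ratio(1.023, 10)   # 10 years older
hr_kps_10_points = scaled_hazard_ratio(0.968, 10)  # 10 KPS points higher

print(round(hr_age_10_years, 3))
print(round(hr_kps_10_points, 3))
```

Under these assumptions, a patient 10 years older would have roughly a 26% higher hazard, and a patient with a KPS 10 points higher roughly a 28% lower hazard, holding the other covariates fixed.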

Identification of an Immune-Related Gene Signature to Improve Prognosis Prediction in Colorectal Cancer Patients

Frontiers in Genetics ◽

10.3389/fgene.2020.607009 ◽

2020 ◽

Vol 11 ◽

Author(s):

Siqi Dai ◽

Shuang Xu ◽

Yao Ye ◽

Kefeng Ding

Keyword(s):

Colorectal Cancer ◽

Risk Factors ◽

Expression Profiles ◽

Gene Signature ◽

Clinical Risk Factors ◽

Survival Prediction ◽

Related Gene ◽

Cancer Management ◽

Clinical Risk ◽

Immune Related Gene

Background: Despite recent advances in immune therapy, great heterogeneity exists in the outcomes of colorectal cancer (CRC) patients. In this study, we aimed to analyze the immune-related gene (IRG) expression profiles from three independent public databases and develop an effective signature to forecast patients' prognosis. Methods: IRGs were collected from the ImmPort database. The CRC dataset from The Cancer Genome Atlas (TCGA) database was used to identify a prognostic gene signature, which was verified in another two CRC datasets from the Gene Expression Omnibus (GEO). Gene function enrichment analysis was conducted. A prognostic nomogram was built incorporating the IRG signature with clinical risk factors. Results: The three datasets had 487, 579, and 224 patients, respectively. A prognostic six-gene signature (CCL22, LIMK1, MAPKAPK3, FLOT1, GPRC5B, and IL20RB) was developed through feature selection; it showed good differentiation between the low- and high-risk groups in the training set (p < 0.001), which was later confirmed in the two validation groups (log-rank p < 0.05). The signature outperformed tumor TNM staging for survival prediction. GO and KEGG functional annotation analysis suggested that the signature was significantly enriched in metabolic processes and regulation of immunity (p < 0.05). When combined with clinical risk factors, the model showed robust prediction capability. Conclusion: The immune-related six-gene signature is a reliable prognostic indicator for CRC patients and could provide insight for personalized cancer management.

Classification of Neurodegenerative Disorders Based on Major Risk Factors Employing Machine Learning Techniques

International Journal of Engineering and Technology ◽

10.7763/ijet.2010.v2.146 ◽

2010 ◽

Vol 2 (4) ◽

pp. 350-355 ◽

Cited By ~ 5

Author(s):

Sandhya Joshi ◽

P. Deepa Shenoy ◽

Vibhudendra Simha G.G. ◽

Venugopal K. R ◽

L.M. Patnaik

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Neurodegenerative Disorders ◽

Machine Learning Techniques ◽

Learning Techniques

Application of Machine Learning Techniques to Identify Data Reliability and Factors Affecting Outcome After Stroke Using Electronic Administrative Records

Frontiers in Neurology ◽

10.3389/fneur.2021.670379 ◽

2021 ◽

Vol 12 ◽

Author(s):

Santu Rana ◽

Wei Luo ◽

Truyen Tran ◽

Svetha Venkatesh ◽

Paul Talman ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Ischemic Stroke ◽

Machine Learning Techniques ◽

Discharge Destination ◽

Clinical Factors ◽

Administrative Records ◽

Factors Associated ◽

Learning Techniques ◽

Discharge Outcomes

Aim: To use available electronic administrative records to identify data reliability, predict discharge destination, and identify risk factors associated with specific outcomes following hospital admission with stroke, compared to stroke-specific clinical factors, using machine learning techniques. Method: The study included 2,531 patients having at least one admission with a confirmed diagnosis of stroke, collected from a regional hospital in Australia within 2009–2013. Using machine learning (penalized regression with Lasso) techniques, patients having their index admission between June 2009 and July 2012 were used to derive predictive models, and patients having their index admission between July 2012 and June 2013 were used for validation. Three different stroke types [intracerebral hemorrhage (ICH), ischemic stroke, transient ischemic attack (TIA)] and five different comparison outcome settings were considered. Our electronic administrative record-based predictive model was compared with a predictive model composed of “baseline” clinical features more specific to stroke, such as age, gender, smoking habits, co-morbidities (high cholesterol, hypertension, atrial fibrillation, and ischemic heart disease), types of imaging done (CT scan, MRI, etc.), and occurrence of in-hospital pneumonia. Risk factors associated with likelihood of negative outcomes were identified. Results: The data were highly reliable at predicting discharge to rehabilitation and all other outcomes vs. death for ICH (AUC 0.85 and 0.825, respectively), all discharge outcomes except home vs. rehabilitation for ischemic stroke, and discharge home vs. others and home vs. rehabilitation for TIA (AUC 0.948 and 0.873, respectively). Electronic health record data appeared to provide improved prediction of outcomes over stroke-specific clinical factors from the machine learning models. Common risk factors associated with a negative impact on expected outcomes appeared clinically intuitive, and included older age groups, prior ventilatory support, urinary incontinence, need for imaging, and need for allied health input. Conclusion: Electronic administrative records from this cohort produced reliable outcome prediction and identified clinically appropriate factors negatively impacting most outcome variables following hospital admission with stroke. This presents a means of future identification of modifiable factors associated with patient discharge destination. This may potentially aid in patient selection for certain interventions and aid in better patient and clinician education regarding expected discharge outcomes.
