Feeding the machine: challenges to reproducible predictive modeling in network neuroscience

2021 ◽  
Author(s):  
Andrew Cwiek ◽  
Sarah Rajtmajer ◽  
Brad Wyble ◽  
Vasant Honavar ◽  
Frank Hillary

Machine learning offers a promising set of prediction tools that have enjoyed more recent application in network neuroscience. In this NETN Perspectives, we examine the current application of predictive models, e.g., classifiers trained using machine learning (ML), within the clinical network neurosciences. Our review covers 118 studies published using ML and functional MRI (fMRI) to infer various dimensions of the human functional connectome. We identify several important methodological challenges in this literature. For example, more than half of the studies focused almost exclusively on maximizing the accuracy of classifying brain functional connectomes into one of several predetermined categories (e.g., disease versus healthy), with significantly less emphasis on reproducibility and generalizability of the findings. There was also a concerning lack of transparency across many of the key steps in training and evaluating predictive models using machine learning. The summary of this literature underscores the importance of external validation (i.e., lockbox or test-set data) and highlights several methodological pitfalls that can be addressed by the imaging community. We offer recommendations for the principled application of machine learning in the clinical neurosciences to advance imaging biomarkers, understand causative determinants for health risks and track the trajectory of heterogeneous patient outcomes.

2021 ◽  
pp. 1-44
Author(s):  
Andrew Cwiek ◽  
Sarah M. Rajtmajer ◽  
Bradley Wyble ◽  
Vasant Honavar ◽  
Emily Grossner ◽  
...  

Abstract In this critical review, we examine the application of predictive models, e.g., classifiers, trained using Machine Learning (ML) to assist in interpretation of functional neuroimaging data. Our primary goal is to summarize how ML is being applied and critically assess common practices. Our review covers 250 studies published using ML and resting-state functional MRI (fMRI) to infer various dimensions of the human functional connectome. Hold-out (“lockbox”) performance was, on average, ~13% less accurate than performance measured through cross-validation alone, highlighting the importance of lockbox data, which were included in only 16% of the studies. There was also a concerning lack of transparency across the key steps in training and evaluating predictive models. The summary of this literature underscores the importance of the use of a lockbox and highlights several methodological pitfalls that can be addressed by the imaging community. We argue that, ideally, studies are motivated both by the reproducibility and generalizability of findings as well as the potential clinical significance of the insights. We offer recommendations for principled integration of machine learning into the clinical neurosciences with the goal of advancing imaging biomarkers of brain disorders, understanding causative determinants for health risks, and parsing heterogeneous patient outcomes.
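To make the lockbox idea in the abstract above concrete, the following is a minimal sketch, not the authors' procedure. It assumes synthetic one-feature data and a toy threshold classifier; the function names (`split_lockbox`, `k_folds`, `fit_threshold`, `accuracy`) are hypothetical and chosen only for illustration. The point it shows is the order of operations: the lockbox is set aside first, cross-validation touches development data only, and the lockbox is scored once at the end.

```python
import random

def split_lockbox(samples, lockbox_fraction=0.2, seed=0):
    """Shuffle once and set aside a lockbox that is never used for tuning."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_lockbox = int(len(shuffled) * lockbox_fraction)
    return shuffled[n_lockbox:], shuffled[:n_lockbox]  # development, lockbox

def k_folds(samples, k=5):
    """Yield (train, validation) pairs for k-fold cross-validation."""
    for i in range(k):
        train = [s for j, s in enumerate(samples) if j % k != i]
        validation = [s for j, s in enumerate(samples) if j % k == i]
        yield train, validation

def fit_threshold(train):
    """Toy classifier: midpoint between the class means of a single feature."""
    class0 = [x for x, label in train if label == 0]
    class1 = [x for x, label in train if label == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2

def accuracy(threshold, samples):
    """Fraction of samples whose label matches the thresholded prediction."""
    hits = sum(1 for x, label in samples if (x > threshold) == (label == 1))
    return hits / len(samples)

# Synthetic data (an assumption): 100 samples per class, one noisy feature.
rng = random.Random(42)
data = [(rng.gauss(1.5 * label, 1.0), label) for label in (0, 1) for _ in range(100)]

development, lockbox = split_lockbox(data)

# Cross-validation uses development data only.
fold_scores = [accuracy(fit_threshold(tr), va) for tr, va in k_folds(development)]
cv_accuracy = sum(fold_scores) / len(fold_scores)

# The lockbox is scored once, after all modeling decisions are final.
final_threshold = fit_threshold(development)
lockbox_accuracy = accuracy(final_threshold, lockbox)

print(f"cross-validation accuracy: {cv_accuracy:.2f}")
print(f"lockbox accuracy: {lockbox_accuracy:.2f}")
```

In a real study the gap between the two printed numbers is what the review quantifies; this toy model has no tuning step, so the sketch illustrates the data separation rather than the size of the gap.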


Neurosurgery ◽  
2018 ◽  
Vol 85 (3) ◽  
pp. 384-393 ◽  
Author(s):  
Whitney E Muhlestein ◽  
Dallin S Akagi ◽  
Jason M Davies ◽  
Lola B Chambless

Abstract BACKGROUND Current outcomes prediction tools are largely based on and limited by regression methods. Utilization of machine learning (ML) methods that can handle multiple diverse inputs could strengthen predictive abilities and improve patient outcomes. Inpatient length of stay (LOS) is one such outcome that serves as a surrogate for patient disease severity and resource utilization. OBJECTIVE To develop a novel method to systematically rank, select, and combine ML algorithms to build a model that predicts LOS following craniotomy for brain tumor. METHODS A training dataset of 41 222 patients who underwent craniotomy for brain tumor was created from the National Inpatient Sample. Twenty-nine ML algorithms were trained on 26 preoperative variables to predict LOS. Trained algorithms were ranked by calculating the root mean square logarithmic error (RMSLE) and top performing algorithms combined to form an ensemble. The ensemble was externally validated using a dataset of 4592 patients from the National Surgical Quality Improvement Program. Additional analyses identified variables that most strongly influence the ensemble model predictions. RESULTS The ensemble model predicted LOS with RMSLE of .555 (95% confidence interval, .553-.557) on internal validation and .631 on external validation. Nonelective surgery, preoperative pneumonia, sodium abnormality, or weight loss, and non-White race were the strongest predictors of increased LOS. CONCLUSION An ML ensemble model predicts LOS with good performance on internal and external validation, and yields clinical insights that may potentially improve patient outcomes. This systematic ML method can be applied to a broad range of clinical problems to improve patient care.
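For readers unfamiliar with the ranking metric named in the abstract above, here is a small sketch of root mean square logarithmic error. It assumes the common log(1 + x) form of the metric; the abstract does not state which variant the authors used, and the lengths of stay below are made up for illustration.

```python
import math

def rmsle(y_true, y_pred):
    """Root mean square logarithmic error for non-negative values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) = log(1 + x), which keeps zero-valued outcomes well defined.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

# Hypothetical lengths of stay in days (not data from the study).
observed = [3, 5, 10, 2]
predicted = [4, 5, 7, 2]
print(round(rmsle(observed, predicted), 3))
```

Because the error is computed on a log scale, a one-day miss on a short stay counts for more than a one-day miss on a long stay, which suits skewed outcomes such as LOS.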


2021 ◽  
Author(s):  
Quincy A Hathaway ◽  
Naveena Yanamala ◽  
Matthew J Budoff ◽  
Partho P Sengupta ◽  
Irfan Zeb

Background: There is growing interest in utilizing machine learning techniques for routine atherosclerotic cardiovascular disease (ASCVD) risk prediction. We investigated whether novel deep learning survival models can augment ASCVD risk prediction over existing statistical and machine learning approaches. Methods: 6,814 participants from the Multi-Ethnic Study of Atherosclerosis (MESA) were followed over 16 years to assess incidence of all-cause mortality (mortality) or a composite of major adverse events (MAE). Features were evaluated within the categories of traditional risk factors, inflammatory biomarkers, and imaging markers. Data was split into an internal training/testing (four centers) and external validation (two centers). Both machine learning (COXPH, RSF, and lSVM) and deep learning (nMTLR and DeepSurv) models were evaluated. Results: In comparison to the COXPH model, DeepSurv significantly improved ASCVD risk prediction for MAE (AUC: 0.82 vs. 0.79, P≤0.001) and mortality (AUC: 0.86 vs. 0.80, P≤0.001) with traditional risk factors alone. Implementing non-categorical NRI, we noted a 65% increase in correct reclassification compared to the COXPH model for both MAE and mortality (P≤0.05). Assessing the relative risk of participants, DeepSurv was the only learning algorithm to develop a significantly improved risk score criteria, which outcompeted COXPH for both MAE (4.07 vs. 2.66, P≤0.001) and mortality (6.28 vs. 4.67, P=0.014). The addition of inflammatory or imaging biomarkers to traditional risk factors showed minimal/no significant improvement in model prediction. Conclusion: DeepSurv can leverage simple office-based clinical features alone to accurately predict ASCVD risk and cardiovascular outcomes, without the need for additional features, such as inflammatory and imaging biomarkers.


Author(s):  
Adrian Haimovich ◽  
Neal G. Ravindra ◽  
Stoytcho Stoytchev ◽  
H. Patrick Young ◽  
F. Perry Wilson ◽  
...  

Abstract Objective: The goal of this study was to create a predictive model of early hospital respiratory decompensation among patients with COVID-19. Design: Observational, retrospective cohort study. Setting: Nine-hospital health system within the Northeastern United States. Populations: Adult patients (≥ 18 years) admitted from the emergency department who tested positive for SARS-CoV-2 (COVID-19) up to 24 hours after initial presentation. Patients meeting criteria for critical respiratory illness within 4 hours of arrival were excluded. Main outcome and performance measures: We used a composite endpoint of respiratory critical illness as defined by oxygen requirement beyond low-flow nasal cannula (e.g., non-rebreather mask, high-flow nasal cannula, bi-level positive pressure ventilation), intubation, or death within the first 24 hours of hospitalization. We developed predictive models using patient demographic and clinical data collected during those first 4 hours. Eight hospitals were used for development and internal validation (n = 932) and 1 hospital for model external validation (n = 240). Predictive variables were identified using an ensemble approach that included univariate regression, random forest, logistic regression with LASSO, Chi-square testing, gradient boosting information gain, and gradient boosting Shapley additive explanation (SHAP) values prior to manual curation. We generated two predictive models, a quick COVID-19 severity index (qCSI) that uses only exam and vital sign measurements, and a COVID-19 severity index (CSI) machine learning model. Using area under receiver operating characteristic (AU-ROC), precision-recall curves (AU-PRC) and calibration metrics, we compare the qCSI and CSI to three illness scoring systems: Elixhauser mortality score, qSOFA, and CURB-65. We present performance of qCSI and CSI on an external validation cohort. Results: During the study period from March 1, 2020 to April 27, 2020, 1,792 patients were admitted with COVID-19. 
Six-hundred and twenty patients were excluded based on age or critical illness within the first 4 hours, yielding 1172 patients in the final cohort. Of these patients, 144 (12.3%) met the composite endpoint within the first 24 hours. The qCSI (AU-ROC: 0.90 [0.85-0.96]) comprised of nasal cannula flow rate, respiratory rate, and minimum documented pulse oximetry outperformed the baseline models (qSOFA: 0.76 [0.69-0.85]; Elixhauser: 0.70 [0.62-0.80]; CURB-65: AU-ROC 0.66 [0.58-0.77]) and was validated on an external cohort (AU-ROC: 0.82). The machine learning-based CSI had superior performance on the training cohort (AU-ROC: 0.91 [0.86-0.97]), but was unlikely to provide practical improvements in clinical settings. Conclusions: A significant proportion of admitted COVID-19 patients decompensate within 24 hours of hospital presentation and these events are accurately predicted using respiratory exam findings within a simple scoring system.


Author(s):  
Giulia Lorenzoni ◽  
Nicolò Sella ◽  
Annalisa Boscolo ◽  
Danila Azzolina ◽  
Patrizia Bartolotta ◽  
...  

Abstract Background Since the beginning of coronavirus disease 2019 (COVID-19), the development of predictive models has sparked relevant interest due to the initial lack of knowledge about diagnosis, treatment, and prognosis. The present study aimed at developing a model, through a machine learning approach, to predict intensive care unit (ICU) mortality in COVID-19 patients based on predefined clinical parameters. Results Observational multicenter cohort study. All COVID-19 adult patients admitted to 25 ICUs belonging to the VENETO ICU network (February 28th 2020-April 4th 2021) were enrolled. Patients admitted to the ICUs before 4th March 2021 were used for model training (“training set”), while patients admitted after the 5th of March 2021 were used for external validation (“test set 1”). A further group of patients (“test set 2”), admitted to the ICU of IRCCS Ca’ Granda Ospedale Maggiore Policlinico of Milan, was used for external validation. A SuperLearner machine learning algorithm was applied for model development, and both internal and external validation was performed. Clinical variables available for the model were (i) age, gender, sequential organ failure assessment score, Charlson Comorbidity Index score (not adjusted for age), Palliative Performance Score; (ii) need of invasive mechanical ventilation, non-invasive mechanical ventilation, O2 therapy, vasoactive agents, extracorporeal membrane oxygenation, continuous venous-venous hemofiltration, tracheostomy, re-intubation, prone position during ICU stay; and (iii) re-admission in ICU. One thousand two hundred ninety-three (80%) patients were included in the “training set”, while 124 (8%) and 199 (12%) patients were included in the “test set 1” and “test set 2,” respectively. Three different predictive models were developed. Each model included different sets of clinical variables. 
The three models showed similar predictive performances, with a training balanced accuracy that ranged between 0.72 and 0.90, while the cross-validation performance ranged from 0.75 to 0.85. Age was the leading predictor for all the considered models. Conclusions Our study provides a useful and reliable tool, through a machine learning approach, for predicting ICU mortality in COVID-19 patients. In all the estimated models, age was the variable showing the most important impact on mortality.


2021 ◽  
Author(s):  
Chen Zhu ◽  
Zidu Xu ◽  
Yaowen Gu ◽  
Si Zheng ◽  
Xiangyu Sun ◽  
...  

BACKGROUND Poststroke immobility makes patients more vulnerable to stroke-related complications. Urinary tract infection (UTI) is one of the major nosocomial infections significantly affecting the outcomes of immobile stroke patients. Previous studies have identified several risk factors, but it is still challenging to accurately estimate personal UTI risk due to unclear interaction of various factors and variability of individual characteristics. This calls for more precise and trustworthy predictive models to assist with potential UTI identification. OBJECTIVE The aim of this study was to develop predictive models for UTI risk identification for immobile stroke patients. A prospective analysis was conducted to evaluate the effectiveness and clinical interpretability of the models. METHODS The data used in this study were collected from the Common Complications of Bedridden Patients and the Construction of Standardized Nursing Intervention Model (CCBPC). The derivation cohort included data of 3982 immobile stroke patients collected during CCBPC-I, from November 1, 2015 to June 30, 2016; the external validation cohort included data of 3837 immobile stroke patients collected during CCBPC-II, from November 1, 2016 to July 30, 2017. Six machine learning models and an ensemble learning model were derived based on 80% of the derivation cohort, and their effectiveness was evaluated with the remaining 20% of the derivation cohort data. We further compared the effectiveness of the predictive models in the external validation cohort. The performance of logistic regression without regularization was used as a reference. We used Shapley additive explanation values to determine feature importance and examine the clinical significance of the prediction models. Shapley values of the factors were calculated to represent the magnitude, prevalence, and direction of their effects, and were further visualized in a summary plot. 
RESULTS A total of 103 (2.59%) patients were diagnosed with UTI in the derivation cohort (N=3982); the internal validation cohort (N=797) shared the same incidence. The external validation cohort had a UTI incidence of 1.38% (N=53). Evaluation results showed that the ensemble learning model performed best in area under the receiver operating characteristic (ROC) curve in internal validation, up to 82.2%, and second best in external validation, 80.8%. In addition, the ensemble learning model achieved the best sensitivity in both internal and external validation sets (80.9% and 81.1%, respectively). We also identified seven UTI risk factors (pneumonia, glucocorticoid use, female sex, mixed cerebrovascular disease, increased age, prolonged length of stay, and duration of catheterization) contributing most to the predictive model, thus demonstrating the clinical interpretability of the model. CONCLUSIONS Our ensemble learning model demonstrated promising performance. Identifying UTI risk and detecting high risk factors among immobile stroke patients would allow more selective and effective use of preventive interventions, thus improving clinical outcomes. Future work should focus on developing a more concise scoring tool and prospectively examining the model in practical use.


2021 ◽  
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. This data represents a large amount of structure-activity relationships that has not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the herein reported models have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as a freely accessible and easy-to-use web application.


Author(s):  
Laure Fournier ◽  
Lena Costaridou ◽  
Luc Bidaut ◽  
Nicolas Michoux ◽  
Frederic E. Lecouvet ◽  
...  

Abstract Existing quantitative imaging biomarkers (QIBs) are associated with known biological tissue characteristics and follow a well-understood path of technical, biological and clinical validation before incorporation into clinical trials. In radiomics, novel data-driven processes extract numerous visually imperceptible statistical features from the imaging data with no a priori assumptions on their correlation with biological processes. The selection of relevant features (radiomic signature) and incorporation into clinical trials therefore requires additional considerations to ensure meaningful imaging endpoints. Also, the number of radiomic features tested means that power calculations would result in sample sizes impossible to achieve within clinical trials. This article examines how the process of standardising and validating data-driven imaging biomarkers differs from those based on biological associations. Radiomic signatures are best developed initially on datasets that represent diversity of acquisition protocols as well as diversity of disease and of normal findings, rather than within clinical trials with standardised and optimised protocols as this would risk the selection of radiomic features being linked to the imaging process rather than the pathology. Normalisation through discretisation and feature harmonisation are essential pre-processing steps. Biological correlation may be performed after the technical and clinical validity of a radiomic signature is established, but is not mandatory. Feature selection may be part of discovery within a radiomics-specific trial or represent exploratory endpoints within an established trial; a previously validated radiomic signature may even be used as a primary/secondary endpoint, particularly if associations are demonstrated with specific biological processes and pathways being targeted within clinical trials. 
Key Points • Data-driven processes like radiomics risk false discoveries due to high-dimensionality of the dataset compared to sample size, making adequate diversity of the data, cross-validation and external validation essential to mitigate the risks of spurious associations and overfitting. • Use of radiomic signatures within clinical trials requires multistep standardisation of image acquisition, image analysis and data mining processes. • Biological correlation may be established after clinical validation but is not mandatory.


2021 ◽  
Vol 188 ◽  
pp. 105264
Author(s):  
M. Pilar Romero ◽  
Yu-Mei Chang ◽  
Lucy A. Brunton ◽  
Alison Prosser ◽  
Paul Upton ◽  
...  
