Multiple additive regression trees with hybrid loss for classification tasks across heterogeneous clinical data in distributed environments: a case study

Background: Axial spondyloarthritis (axSpA) is a chronic rheumatic disease that encompasses various clinical presentations: chronic inflammatory back pain, peripheral manifestations and extra-articular manifestations. The current nomenclature divides axSpA into radiographic forms (in the presence of radiographic sacroiliitis) and non-radiographic forms (in the absence of radiographic sacroiliitis, with or without MRI sacroiliitis). Given that the functional burden of the disease appears to be greater in patients with radiographic forms, it seems crucial to be able to predict which patients are more likely to develop structural damage over time. Predictive factors for radiographic progression in axSpA have been identified with traditional statistical models such as logistic regression. However, these models present some limitations. Machine learning (ML) methods have been developed to overcome these limitations and to improve predictive performance.

Objectives: To compare ML models with traditional models for predicting radiographic progression in patients with early axSpA.

Methods: Study design: prospective French multicentre cohort study (DESIR cohort) with 5 years of follow-up. Patients: all patients included in the cohort, i.e. 708 patients with inflammatory back pain for >3 months but <3 years, highly suggestive of axSpA. Data from the first 5 years of follow-up were used. Statistical analyses: radiographic progression was defined as progression either at the spine (increase of at least 1 point per 2 years in mSASSS score) or at the sacroiliac joint (worsening of at least one grade of the mNY score between 2 visits). Traditional modelling: we first performed a bivariate analysis between our outcome (radiographic progression) and explanatory variables at baseline to select the variables to be included in our models, and then built a logistic regression model (M1).
Variable selection for traditional models was performed with 2 different methods: stepwise selection based on the Akaike Information Criterion (stepAIC) method (M2), and the Least Absolute Shrinkage and Selection Operator (LASSO) method (M3). We also performed a sensitivity analysis on all patients with a manual backward method (M4) after multiple imputation of missing data. Machine learning modelling: using the "SuperLearner" package in R, we modelled radiographic progression with stepAIC, LASSO, random forest, Discrete Bayesian Additive Regression Trees Samplers (DBARTS), Generalized Additive Models (GAM), multivariate adaptive polynomial spline regression (polymars), Recursive Partitioning And Regression Trees (RPART) and Super Learner. Finally, the accuracy of traditional and ML models was compared based on their 10-fold cross-validated AUC (cv-AUC).

Results: The 10-fold cv-AUCs for the traditional models were 0.79 and 0.78 for M2 and M3, respectively. The 3 best ML models were the GAM, DBARTS and Super Learner models, with 10-fold cv-AUCs of 0.77, 0.76 and 0.74, respectively (Table 1).

Table 1. Comparison of 10-fold cross-validated AUC between best traditional and machine learning models.

Best models                                                          Cross-validated AUC
Traditional models
  M2 (stepAIC method)                                                0.79
  M3 (LASSO method)                                                  0.78
Machine learning approach
  SL Discrete Bayesian Additive Regression Trees Samplers (DBARTS)   0.76
  SL Generalized Additive Models (GAM)                               0.77
  Super Learner                                                      0.74

AUC: Area Under the Curve; AIC: Akaike Information Criterion; LASSO: Least Absolute Shrinkage and Selection Operator; SL: SuperLearner. N = 295.

Conclusion: Traditional models predicted radiographic progression better than ML models in this early axSpA population. Further ML algorithms, either image-based or using other artificial intelligence methods (e.g. deep learning), might perform better than traditional models in this setting.

Acknowledgments: Thanks to the French National Society of Rheumatology and the DESIR cohort.

Disclosure of Interests: Romain Garofoli: None declared, Matthieu Resche-Rigon: None declared, Maxime Dougados Grant/research support from: AbbVie, Eli Lilly, Merck, Novartis, Pfizer and UCB Pharma, Consultant of: AbbVie, Eli Lilly, Merck, Novartis, Pfizer and UCB Pharma, Speakers bureau: AbbVie, Eli Lilly, Merck, Novartis, Pfizer and UCB Pharma, Désirée van der Heijde Consultant of: AbbVie, Amgen, Astellas, AstraZeneca, BMS, Boehringer Ingelheim, Celgene, Cyxone, Daiichi, Eisai, Eli-Lilly, Galapagos, Gilead Sciences, Inc., Glaxo-Smith-Kline, Janssen, Merck, Novartis, Pfizer, Regeneron, Roche, Sanofi, Takeda, UCB Pharma; Director of Imaging Rheumatology BV, Christian Roux: None declared, Anna Moltó Grant/research support from: Pfizer, UCB, Consultant of: Abbvie, BMS, MSD, Novartis, Pfizer, UCB
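
The abstract compares models by 10-fold cross-validated AUC but does not say how the fold results were combined. The sketch below is one common convention, assumed here for illustration only: compute the AUC on the out-of-fold predictions of each fold, then average across folds. The function names `auc` and `cv_auc` and the toy data are hypothetical, and the authors' analysis was run in R, not Python.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(labels, oof_scores, fold_ids):
    """Mean of per-fold AUCs, each computed on out-of-fold predictions.

    Assumption: fold results are averaged; other conventions (for example
    pooling all out-of-fold predictions) give slightly different values.
    """
    per_fold = []
    for fold in sorted(set(fold_ids)):
        idx = [i for i, f in enumerate(fold_ids) if f == fold]
        per_fold.append(auc([labels[i] for i in idx], [oof_scores[i] for i in idx]))
    return mean(per_fold)


# Made-up toy data with 2 folds (not taken from the DESIR cohort).
toy_labels = [1, 0, 1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1, 0.4, 0.3]
toy_folds = [0, 0, 0, 0, 1, 1, 1, 1]
toy_cv_auc = cv_auc(toy_labels, toy_scores, toy_folds)
```

With folds this small the per-fold AUC is very noisy, which is one reason differences of a few hundredths between models, as in Table 1, should be read with caution.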

Incorporating external data into the analysis of clinical trials via Bayesian additive regression trees

Statistics in Medicine ◽

10.1002/sim.9191 ◽

2021 ◽

Author(s):

Tianjian Zhou ◽

Yuan Ji

Keyword(s):

Clinical Trials ◽

Regression Trees ◽

External Data ◽

Additive Regression ◽

Bayesian Additive Regression Trees

BART: Bayesian additive regression trees

The Annals of Applied Statistics ◽

10.1214/09-aoas285 ◽

2010 ◽

Vol 4 (1) ◽

pp. 266-298 ◽

Cited By ~ 440

Author(s):

Hugh A. Chipman ◽

Edward I. George ◽

Robert E. McCulloch

Keyword(s):

Regression Trees ◽

Additive Regression ◽

Bayesian Additive Regression Trees

Information Extraction for Clinical Data Mining: A Mammography Case Study

2009 IEEE International Conference on Data Mining Workshops ◽

10.1109/icdmw.2009.63 ◽

2009 ◽

Cited By ~ 18

Author(s):

Houssam Nassif ◽

Ryan Woods ◽

Elizabeth Burnside ◽

Mehmet Ayvaci ◽

Jude Shavlik ◽

...

Keyword(s):

Data Mining ◽

Information Extraction ◽

Clinical Data ◽

Clinical Data Mining

Fusion of Clinical Data: A Case Study to Predict the Type of Treatment of Bone Fractures

Communications in Computer and Information Science - New Trends in Databases and Information Systems ◽

10.1007/978-3-319-67162-8_29 ◽

2017 ◽

pp. 294-301

Author(s):

Anam Haq ◽

Szymon Wilk

Keyword(s):

Clinical Data ◽

Bone Fractures

Variable Selection and Interaction Detection with Bayesian Additive Regression Trees

10.1201/9781003089018-17 ◽

2021 ◽

pp. 395-414

Author(s):

Carlos M. Carvalho ◽

Edward I. George ◽

P. Richard Hahn ◽

Robert E. McCulloch

Keyword(s):

Variable Selection ◽

Regression Trees ◽

Interaction Detection ◽

Additive Regression ◽

Bayesian Additive Regression Trees

Estimation of causal effects of multiple treatments in observational studies with a binary outcome

Statistical Methods in Medical Research ◽

10.1177/0962280220921909 ◽

2020 ◽

Vol 29 (11) ◽

pp. 3218-3234 ◽

Cited By ~ 7

Author(s):

Liangyuan Hu ◽

Chenyang Gu ◽

Michael Lopez ◽

Jiayi Ji ◽

Juan Wisnivesky

Keyword(s):

Maximum Likelihood ◽

Maximum Likelihood Estimator ◽

Regression Trees ◽

Likelihood Estimator ◽

Inverse Probability ◽

Common Support ◽

Targeted Maximum Likelihood ◽

Additive Regression ◽

Multiple Treatments ◽

Bayesian Additive Regression Trees

There is a dearth of robust methods to estimate the causal effects of multiple treatments when the outcome is binary. This paper uses two unique sets of simulations to propose and evaluate the use of Bayesian additive regression trees in such settings. First, we compare Bayesian additive regression trees to several approaches that have been proposed for continuous outcomes, including inverse probability of treatment weighting, targeted maximum likelihood estimator, vector matching, and regression adjustment. Results suggest that under conditions of non-linearity and non-additivity of both the treatment assignment and outcome generating mechanisms, Bayesian additive regression trees, targeted maximum likelihood estimator, and inverse probability of treatment weighting using generalized boosted models provide better bias reduction and smaller root mean squared error. Bayesian additive regression trees and targeted maximum likelihood estimator provide more consistent 95% confidence interval coverage and better large-sample convergence properties. Second, we supply Bayesian additive regression trees with a strategy to identify a common support region for retaining inferential units and for avoiding extrapolation over areas of the covariate space where common support does not exist. Bayesian additive regression trees retain more inferential units than the generalized propensity score-based strategy, and show lower bias than targeted maximum likelihood estimator or generalized boosted models, in a variety of scenarios differing by the degree of covariate overlap. A case study examining the effects of three surgical approaches for non-small cell lung cancer demonstrates the methods.
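
Inverse probability of treatment weighting, one of the comparison methods named in this abstract, can be sketched for a binary outcome and more than two treatments as follows. This is a minimal illustration that assumes the generalized propensity scores have already been estimated by some model; it uses the normalized (Hajek) form of the weighted mean, and the function name `iptw_risks`, the treatment labels and the toy numbers are hypothetical, not the paper's implementation.

```python
def iptw_risks(treatments, outcomes, propensities):
    """Weighted estimate of P(Y = 1) under each treatment.

    treatments[i]    -- label of the treatment unit i actually received
    outcomes[i]      -- binary outcome (0 or 1) of unit i
    propensities[i]  -- dict mapping each treatment label to the estimated
                        probability that unit i receives it, given covariates
    Each unit is weighted by 1 / P(received treatment | covariates), and the
    weighted outcomes are normalized by the sum of weights per treatment.
    """
    weighted_sum, weight_total = {}, {}
    for t, y, ps in zip(treatments, outcomes, propensities):
        w = 1.0 / ps[t]
        weighted_sum[t] = weighted_sum.get(t, 0.0) + w * y
        weight_total[t] = weight_total.get(t, 0.0) + w
    return {t: weighted_sum[t] / weight_total[t] for t in weight_total}


# Made-up example with three treatments and assumed propensity scores.
toy_treatments = ["A", "A", "B", "B", "C", "C"]
toy_outcomes = [1, 0, 1, 1, 0, 1]
toy_propensities = [
    {"A": 0.50, "B": 0.25, "C": 0.25},
    {"A": 0.25, "B": 0.50, "C": 0.25},
    {"A": 0.25, "B": 0.50, "C": 0.25},
    {"A": 0.25, "B": 0.25, "C": 0.50},
    {"A": 0.25, "B": 0.25, "C": 0.50},
    {"A": 0.50, "B": 0.25, "C": 0.25},
]
toy_risks = iptw_risks(toy_treatments, toy_outcomes, toy_propensities)
# Pairwise causal contrast on the risk-difference scale, e.g. A versus B.
toy_risk_difference_ab = toy_risks["A"] - toy_risks["B"]
```

Units with a very small propensity for the treatment they received get very large weights, which is the practical motivation for the common support restriction discussed in the abstract.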

5106 A pattern-discovery-based outcome predictive tool integrated with clinical data repository: design and a case study on contrast related acute kidney injury

European Heart Journal ◽

10.1093/eurheartj/ehz746.0042 ◽

2019 ◽

Vol 40 (Supplement_1) ◽

Author(s):

Y X Li ◽

J Jiang ◽

Y Zhang ◽

J P Li ◽

Y Huo

Keyword(s):

Acute Kidney Injury ◽

Clinical Data ◽

Visual Analytics ◽

Outcome Prediction ◽

Pattern Discovery ◽

Kidney Injury ◽

Data Driven ◽

Prediction Tool ◽

Pure Data

Abstract

Introduction: Clinical data repositories (CDRs), including electronic health record (EHR) data, have great potential for outcome prediction and risk modeling. However, most CDRs are used only for data display, and using CDR data for outcome prediction often requires careful study design and sophisticated modeling techniques before a hypothesis can be tested.

Purpose: We built a CDR-integrated prediction tool based on pattern discovery to bridge this gap, and demonstrated it in a case study on contrast related acute kidney injury (AKI).

Methods: A cardiovascular CDR integrated with multiple hospital informatics systems was established. For the case study on AKI, we included patients undergoing cardiac catheterization from January 13, 2015 to April 27, 2017, excluding those with dialysis, end-stage renal disease, renal transplant, or missing pre- or post-procedural creatinine. To handle missing data, a prior-history-note composer was designed to fill in structured data on 14 diseases related to cardiovascular problems. Crucial data such as ejection fraction were extracted from the structured reports. AKI was defined according to the Acute Kidney Injury Network criteria as an increase in serum creatinine from the most recent baseline to the post-procedure 7-day peak. For predictive modeling, we selected 17 variables covered in existing AKI models. Pattern discovery was recently developed as an interpretable predictive model that works on incomplete, noisy data. In this study, we developed a pattern-discovery-based visual analytics tool and trained it on 70% of the data (up to August 2016) with three interactive knowledge incorporation modes to develop 3 models: 1) pure data-driven, 2) domain knowledge, and 3) clinician-interactive. In the last two modes, a physician using the visual analytics could change the variables and further refine the model, respectively. We tested it and compared it with other models on the remaining 30% of patients, who were consecutive and dated afterwards, as shown in Figure 1.

Results: Among 2,560 patients in the final dataset with 17 pre-procedure variables derived from CDR data, 169 (7.3%) had AKI. We evaluated 4 existing models, whose areas under the receiver operating characteristic curve (AUCs) on the test set were 0.70 (Mehran's), 0.72 (Chen's), 0.67 (Gao's) and 0.62 (AGEF), respectively. A pure data-driven machine learning method (Easy Ensemble) achieved an AUC of 0.72. The AUCs of our 3 models were 0.77, 0.80 and 0.82, respectively; the last, in which physician knowledge is incorporated, was the highest.

Figure 1. Demo and demonstration

Conclusions: We developed a novel pattern-discovery-based outcome prediction tool integrated with a CDR and using purely EHR data. In the case of predicting contrast related AKI, physicians found the tool user-friendly, and it demonstrated competitive performance in comparison with state-of-the-art models.
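
The evaluation design in this abstract, training on the earliest 70% of patients and testing on the later 30%, can be sketched as a chronological split. The snippet is an illustration only: the record layout, the `procedure_date` key and the function name `chronological_split` are assumptions, and the dates are made up rather than drawn from the study.

```python
from datetime import date, timedelta


def chronological_split(records, train_fraction=0.7):
    """Split records into an earlier training set and a later test set.

    Records are ordered by their (hypothetical) 'procedure_date' field, so
    every training record is dated no later than any test record.
    """
    ordered = sorted(records, key=lambda r: r["procedure_date"])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Ten made-up patients, deliberately listed out of date order.
toy_records = [
    {"patient_id": i, "procedure_date": date(2015, 1, 13) + timedelta(days=30 * k)}
    for i, k in enumerate([5, 2, 9, 0, 7, 1, 8, 3, 6, 4])
]
toy_train, toy_test = chronological_split(toy_records)
```

Splitting by date rather than at random keeps later patients out of training, so the test AUC is closer to how the tool would behave on future admissions.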
