SAT0587 MACHINE-LEARNING DERIVED ALGORITHMS FOR OUTCOMES PREDICTION IN RHEUMATIC DISEASES: APPLICATION TO RADIOGRAPHIC PROGRESSION IN EARLY AXIAL SPONDYLOARTHRITIS

Background: Axial spondyloarthritis (axSpA) is a chronic rheumatic disease that encompasses various clinical presentations: chronic inflammatory back pain, peripheral manifestations and extra-articular manifestations. The current nomenclature divides axSpA into radiographic (in the presence of radiographic sacroiliitis) and non-radiographic (in the absence of radiographic sacroiliitis, with or without MRI sacroiliitis) forms. Given that the functional burden of the disease appears to be greater in patients with radiographic forms, it seems crucial to be able to predict which patients are more likely to develop structural damage over time. Predictive factors for radiographic progression in axSpA have been identified using traditional statistical models such as logistic regression. However, these models have some limitations. Machine learning (ML) methods have been developed to overcome these limitations and to improve predictive performance.

Objectives: To compare ML models with traditional models for predicting radiographic progression in patients with early axSpA.

Methods: Study design: prospective French multicentre cohort study (DESIR cohort) with 5 years of follow-up. Patients: all patients included in the cohort, i.e. 708 patients with inflammatory back pain for >3 months but <3 years, highly suggestive of axSpA. Data from the first 5 years of follow-up were used. Statistical analyses: radiographic progression was defined as progression either at the spine (an increase of at least 1 mSASSS point per 2 years) or at the sacroiliac joints (worsening of at least one mNY grade between 2 visits). Traditional modelling: we first performed a bivariate analysis between the outcome (radiographic progression) and baseline explanatory variables to select the variables to include in the models, and then built a logistic regression model (M1).
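The composite outcome defined under Statistical analyses can be written as a small predicate. This is a minimal sketch, not the study's code, and it rests on three assumptions:

- the spinal criterion is a rate, meaning the mSASSS change between two visits rescaled to a 2-year interval;
- mNY grades are recorded per sacroiliac joint (left and right, graded 0 to 4);
- worsening on either side counts.

The `Visit` record and the function names are hypothetical. When the two visits are exactly 2 years apart, the rate and the raw change coincide.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One radiographic assessment (hypothetical record layout)."""
    years_from_baseline: float
    msasss: float   # modified Stoke Ankylosing Spondylitis Spine Score
    mny_left: int   # mNY sacroiliitis grade, left joint (0-4)
    mny_right: int  # mNY sacroiliitis grade, right joint (0-4)


def spinal_progression(first: Visit, later: Visit) -> bool:
    """Assumed reading: mSASSS rises by at least 1 point per 2 years."""
    interval = later.years_from_baseline - first.years_from_baseline
    if interval <= 0:
        raise ValueError("later visit must come after first visit")
    change_per_two_years = (later.msasss - first.msasss) * 2.0 / interval
    return change_per_two_years >= 1.0


def sij_progression(first: Visit, later: Visit) -> bool:
    """Assumed reading: either sacroiliac joint worsens by at least 1 mNY grade."""
    return (later.mny_left - first.mny_left >= 1
            or later.mny_right - first.mny_right >= 1)


def radiographic_progression(first: Visit, later: Visit) -> bool:
    """Outcome: progression at the spine OR at the sacroiliac joints."""
    return spinal_progression(first, later) or sij_progression(first, later)
```

For example, an mSASSS rise from 2 to 4 over 5 years corresponds to 0.8 points per 2 years. On its own, that does not count as spinal progression.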
Variable selection for the traditional models was performed with 2 different methods: stepwise selection based on the Akaike Information Criterion (stepAIC) (M2) and the Least Absolute Shrinkage and Selection Operator (LASSO) (M3). We also performed a sensitivity analysis on all patients using manual backward selection (M4) after multiple imputation of missing data. Machine learning modelling: using the “SuperLearner” package in R, we modelled radiographic progression with stepAIC, LASSO, random forest, Discrete Bayesian Additive Regression Trees Samplers (DBARTS), Generalized Additive Models (GAM), multivariate adaptive polynomial spline regression (polymars), Recursive Partitioning And Regression Trees (RPART) and the Super Learner. Finally, the accuracy of the traditional and ML models was compared based on their 10-fold cross-validated AUC (cv-AUC).

Results: The 10-fold cv-AUCs of the traditional models were 0.79 for M2 and 0.78 for M3. The 3 best models in the ML approach were the GAM, DBARTS and Super Learner models, with 10-fold cv-AUCs of 0.77, 0.76 and 0.74, respectively (Table 1).

Table 1. Comparison of 10-fold cross-validated AUC between the best traditional and machine learning models.

Best model: cross-validated AUC
Traditional models
  M2 (stepAIC method): 0.79
  M3 (LASSO method): 0.78
Machine learning approach
  SL Discrete Bayesian Additive Regression Trees Samplers (DBARTS): 0.76
  SL Generalized Additive Models (GAM): 0.77
  Super Learner: 0.74

AUC: Area Under the Curve; AIC: Akaike Information Criterion; LASSO: Least Absolute Shrinkage and Selection Operator; SL: SuperLearner. N = 295.

Conclusion: Traditional models predicted radiographic progression better than ML models in this early axSpA population. Further ML algorithms, image-based or using other artificial intelligence methods (e.g. deep learning), might perform better than traditional models in this setting.

Acknowledgments: Thanks to the French National Society of Rheumatology and the DESIR cohort.

Disclosure of Interests:
Romain Garofoli: None declared.
Matthieu Resche-Rigon: None declared.
Maxime Dougados: Grant/research support from AbbVie, Eli Lilly, Merck, Novartis, Pfizer and UCB Pharma; Consultant of AbbVie, Eli Lilly, Merck, Novartis, Pfizer and UCB Pharma; Speakers bureau: AbbVie, Eli Lilly, Merck, Novartis, Pfizer and UCB Pharma.
Désirée van der Heijde: Consultant of AbbVie, Amgen, Astellas, AstraZeneca, BMS, Boehringer Ingelheim, Celgene, Cyxone, Daiichi, Eisai, Eli Lilly, Galapagos, Gilead Sciences, Inc., GlaxoSmithKline, Janssen, Merck, Novartis, Pfizer, Regeneron, Roche, Sanofi, Takeda, UCB Pharma; Director of Imaging Rheumatology BV.
Christian Roux: None declared.
Anna Moltó: Grant/research support from Pfizer, UCB; Consultant of AbbVie, BMS, MSD, Novartis, Pfizer, UCB.
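The 10-fold cross-validated AUC used to rank the models in SAT0587 can be sketched in plain Python. The abstract does not say how its folds or its averaging were set up, so this sketch makes the following assumptions:

- folds are stratified by outcome;
- the cv-AUC is the mean of the fold-wise AUCs (one common convention; pooling out-of-fold predictions is another);
- `centroid_fit` is a hypothetical toy model standing in for logistic regression or a SuperLearner library.

```python
import random
from typing import Callable, List, Sequence

# A fitted model maps one feature vector to a risk score.
Predictor = Callable[[Sequence[float]], float]
# A fitting routine takes training rows and 0/1 labels and returns a Predictor.
Fitter = Callable[[List[Sequence[float]], List[int]], Predictor]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int, seed: int) -> List[List[int]]:
    """Deal the shuffled indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def cv_auc(X: Sequence[Sequence[float]], y: Sequence[int], fit: Fitter,
           k: int = 10, seed: int = 0) -> float:
    """Mean of the k fold-wise AUCs, each computed on held-out data only."""
    fold_aucs = []
    for test in stratified_folds(y, k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        fold_aucs.append(auc([predict(X[i]) for i in test], [y[i] for i in test]))
    return sum(fold_aucs) / len(fold_aucs)


def centroid_fit(X_train: List[Sequence[float]], y_train: List[int]) -> Predictor:
    """Hypothetical toy model: score = dot(mean(positives) - mean(negatives), x)."""
    pos = [x for x, lab in zip(X_train, y_train) if lab == 1]
    neg = [x for x, lab in zip(X_train, y_train) if lab == 0]
    w = [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
         for j in range(len(X_train[0]))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))
```

Consider a single feature x = 0, 1, ..., 39 with outcome `x >= 20`. Because the classes are perfectly separable, `cv_auc(X, y, centroid_fit)` returns 1.0. Any function with the `Fitter` signature can be swapped in for `centroid_fit`, for example a wrapper around a logistic regression fit.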

The Relationship between Mobility and COVID-19 in Germany: Modeling Case Occurrence using Apple's Mobility Trends Data

Methods of Information in Medicine, 2021. DOI: 10.1055/s-0041-1726276

Author(s): Mark David Walker, Mihály Sulyok

Keyword(s): Generalized Additive Models; Information Criterion; Additive Models; Online Data; German Government; Explanatory Variables; Mobility Data; Community Mobility; The Relationship; Potential Use

Abstract

Background: Restrictions on social interaction and movement were implemented by the German government in March 2020 to reduce the transmission of coronavirus disease 2019 (COVID-19). Apple's “Mobility Trends” (AMT) data describe levels of community mobility and are a novel resource of potential use to epidemiologists.

Objective: The aim of the study is to use AMT data to examine the relationship between mobility and COVID-19 case occurrence in Germany. Is a change in mobility apparent following the onset of COVID-19 and the implementation of social restrictions? Is there a relationship between mobility and COVID-19 occurrence in Germany?

Methods: AMT data illustrate mobility levels throughout the epidemic, allowing the relationship between mobility and disease to be examined. Generalized additive models (GAMs) were fitted for Germany, with the mobility categories and date as explanatory variables and case numbers as the response.

Results: Clear reductions in mobility occurred following the implementation of movement restrictions. There was a negative correlation between mobility and confirmed case numbers. The GAM using all three mobility categories also accounted for case occurrence and was favored (Akaike Information Criterion, AIC: 2504) over models using each category separately (AIC with “driving”: 2511; “transit”: 2513; “walking”: 2508).

Conclusion: These results suggest an association between mobility and case occurrence. Further examination of the relationship between movement restrictions and COVID-19 transmission may be pertinent. The study shows how new sources of online data can be used to investigate problems in epidemiology.
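The AIC comparison in the Results can be made concrete by converting the reported values into AIC differences (ΔAIC) and Akaike weights, w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2). This is an illustrative calculation added here, not part of the study. It uses only the four AIC values quoted above. It also assumes that all four GAMs were fitted to the same case series, because AIC values are only comparable across models fitted to identical data.

```python
import math

# AIC values as reported in the abstract (lower is better).
reported_aic = {
    "all three categories": 2504.0,
    "driving": 2511.0,
    "transit": 2513.0,
    "walking": 2508.0,
}


def akaike_weights(aic: dict) -> dict:
    """Convert AIC values into Akaike weights via differences from the best model."""
    best = min(aic.values())
    rel = {name: math.exp(-(value - best) / 2.0) for name, value in aic.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


best_aic = min(reported_aic.values())
deltas = {name: value - best_aic for name, value in reported_aic.items()}
weights = akaike_weights(reported_aic)

# Print models from most to least supported.
for name in sorted(weights, key=weights.get, reverse=True):
    print(f"{name:22s} dAIC={deltas[name]:4.1f}  weight={weights[name]:.3f}")
```

With these inputs, the combined model receives a weight of roughly 0.85 and the walking-only model roughly 0.12. The driving-only and transit-only models share the remaining 0.035 or so.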

A comparison of regression trees, logistic regression, generalized additive models, and multivariate adaptive regression splines for predicting AMI mortality

Statistics in Medicine, 2007, Vol 26 (15), pp. 2937-2957. DOI: 10.1002/sim.2770. Cited By ~ 98

Author(s): Peter C. Austin

Keyword(s): Logistic Regression; Generalized Additive Models; Regression Trees; Additive Models; Multivariate Adaptive Regression Splines; Regression Splines; Adaptive Regression; Adaptive Regression Splines

Nonparametric Machine Learning and Efficient Computation with Bayesian Additive Regression Trees: The BART R Package

Journal of Statistical Software, 2021, Vol 97 (1). DOI: 10.18637/jss.v097.i01

Author(s): Rodney Sparapani, Charles Spanbauer, Robert McCulloch

Keyword(s): Machine Learning; Regression Trees; R Package; Efficient Computation; Additive Regression; Bayesian Additive Regression Trees

Comparative performance of generalized additive models and boosted regression trees for statistical modeling of incidental catch of wahoo (Acanthocybium solandri) in the Mexican tuna purse-seine fishery

Ecological Modelling, 2012, Vol 233, pp. 20-25. DOI: 10.1016/j.ecolmodel.2012.03.006. Cited By ~ 31

Author(s): Raul O. Martínez-Rincón, Sofía Ortega-García, Juan G. Vaca-Rodríguez

Keyword(s): Statistical Modeling; Generalized Additive Models; Regression Trees; Additive Models; Boosted Regression Trees; Purse Seine; Comparative Performance; Incidental Catch; Purse Seine Fishery

Nonparametric machine learning for precision medicine with longitudinal clinical trials and Bayesian additive regression trees with mixed models

Statistics in Medicine, 2021. DOI: 10.1002/sim.8924

Author(s): Charles Spanbauer, Rodney Sparapani

Keyword(s): Machine Learning; Clinical Trials; Precision Medicine; Mixed Models; Regression Trees; Additive Regression; Bayesian Additive Regression Trees

TeleGam: Combining Visualization and Verbalization for Interpretable Machine Learning

2019. DOI: 10.31219/osf.io/p3wnm

Author(s): Fred Hohman, Arjun Srinivasan, Steven M. Drucker

Keyword(s): Machine Learning; Generalized Additive Models; Additive Models; Prototype System; Future Studies; Interpretable Machine Learning; User Expectations; Potential Benefits; Hard Problems; Key Aspects

While machine learning (ML) continues to find success in solving previously-thought hard problems, interpreting and exploring ML models remains challenging. Recent work has shown that visualizations are a powerful tool to aid debugging, analyzing, and interpreting ML models. However, depending on the complexity of the model (e.g., number of features), interpreting these visualizations can be difficult and may require additional expertise. Alternatively, textual descriptions, or verbalizations, can be a simple, yet effective way to communicate or summarize key aspects about a model, such as the overall trend in a model’s predictions or comparisons between pairs of data instances. With the potential benefits of visualizations and verbalizations in mind, we explore how the two can be combined to aid ML interpretability. Specifically, we present a prototype system, TeleGam, that demonstrates how visualizations and verbalizations can collectively support interactive exploration of ML models, for example, generalized additive models (GAMs). We describe TeleGam’s interface and underlying heuristics to generate the verbalizations. We conclude by discussing how TeleGam can serve as a platform to conduct future studies for understanding user expectations and designing novel interfaces for interpretable ML.

Modelling the occurrence of convective hazards using ERA5 reanalysis data

2021. DOI: 10.5194/egusphere-egu21-15165

Author(s): Francesco Battaglioli, Pieter Groenemeijer, Tomas Pucik, Uwe Ulbrich, Henning Rust, ...

Keyword(s): Weather Forecasting; Generalized Additive Models; Reanalysis Data; Severe Weather; Additive Models; Ensemble Forecasts; Probabilistic Forecasts; Medium Range; Additive Regression

Convective hazards such as large hail, severe wind gusts, tornadoes and heavy rainfall cause high economic damage, fatalities and injuries across Europe. There are insufficient observations to determine whether trends in such local phenomena exist; however, recent studies suggest that the conditions supporting such hazards have become more frequent across large parts of Europe in recent decades.

We model the occurrence of these hazards using Generalized Additive Models (GAM) to investigate the existence of such long-term trends and to enable objective probabilistic forecasts of these hazards. The models are trained with storm reports from the European Severe Weather Database (ESWD), lightning observations from the EUCLID network, and predictor parameters derived from the ERA5 reanalysis. Our work is based on the framework AR-CHaMo (Additive Regression Convective Hazard Models), previously developed at ESSL.

Preliminary results include a spatial depiction of the environmental conditions giving rise to convective hazards at a higher resolution than was previously possible. The skill of hail models developed using AR-CHaMo has been shown to be superior to that of composite parameters used by weather forecasters for the occurrence of large hail, such as the Supercell Composite Parameter (SCP) and the Significant Hail Parameter (SHP). Likewise, for tornadoes, more skillful models can be constructed using the AR-CHaMo framework than with predictors such as the Significant Tornado Parameter (STP).

The developed models have uses both in climate studies and in medium-range severe weather forecasting. We will report on initial efforts to detect long-term (1979-2019) trends in convective hazards and present how these models can be used to support severe weather forecasting with medium-range ensemble forecasts.

Machine learning methods for empirical streamflow simulation: a comparison of model accuracy, interpretability, and uncertainty in seasonal watersheds

Hydrology and Earth System Sciences, 2016, Vol 20 (7), pp. 2611-2628. DOI: 10.5194/hess-20-2611-2016. Cited By ~ 70

Author(s): Julie E. Shortridge, Seth D. Guikema, Benjamin F. Zaitchik

Keyword(s): Machine Learning; Predictive Accuracy; Generalized Additive Models; Additive Models; Multivariate Adaptive Regression Splines; Regression Splines; Climate Conditions; Machine Learning Methods; Adaptive Regression; Adaptive Regression Splines

Abstract. In the past decade, machine learning methods for empirical rainfall–runoff modeling have seen extensive development and been proposed as a useful complement to physical hydrologic models, particularly in basins where data to support process-based models are limited. However, the majority of research has focused on a small number of methods, such as artificial neural networks, despite the development of multiple other approaches for non-parametric regression in recent years. Furthermore, this work has often evaluated model performance based on predictive accuracy alone, while not considering broader objectives, such as model interpretability and uncertainty, that are important if such methods are to be used for planning and management decisions. In this paper, we use multiple regression and machine learning approaches (including generalized additive models, multivariate adaptive regression splines, artificial neural networks, random forests, and M5 cubist models) to simulate monthly streamflow in five highly seasonal rivers in the highlands of Ethiopia and compare their performance in terms of predictive accuracy, error structure and bias, model interpretability, and uncertainty when faced with extreme climate conditions. While the relative predictive performance of models differed across basins, data-driven approaches were able to achieve reduced errors when compared to physical models developed for the region. Methods such as random forests and generalized additive models may have advantages in terms of visualization and interpretation of model structure, which can be useful in providing insights into physical watershed function. However, the uncertainty associated with model predictions under extreme climate conditions should be carefully evaluated, since certain models (especially generalized additive models and multivariate adaptive regression splines) become highly variable when faced with high temperatures.
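The abstract evaluates models on predictive accuracy, error structure and bias, but does not name the metrics here. Three common choices in streamflow modelling are root-mean-square error (RMSE), percent bias and the Nash-Sutcliffe efficiency (NSE). The sketch below assumes these three; they are not necessarily the metrics used in the paper.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root-mean-square error between observed and simulated flows."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def percent_bias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Total simulated minus total observed, as a percentage of total observed.

    Positive values mean the model overestimates flow on aggregate.
    """
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

Reporting bias alongside RMSE matters because errors of opposite sign can cancel. A model can have zero percent bias while still missing individual months, which RMSE and NSE would reveal.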

Interpretable Machine Learning with Bitonic Generalized Additive Models and Automatic Feature Construction

Discovery Science - Lecture Notes in Computer Science, 2020, pp. 386-402. DOI: 10.1007/978-3-030-61527-7_26

Author(s): Noëlie Cherrier, Michael Mayo, Jean-Philippe Poli, Maxime Defurne, Franck Sabatié

Keyword(s): Machine Learning; Generalized Additive Models; Additive Models; Feature Construction; Interpretable Machine Learning

Partitioning environment and space in site-by-species matrices: a comparison of methods for community ecology and macroecology

2019. DOI: 10.1101/871251. Cited By ~ 2

Author(s): Duarte S. Viana, Petr Keil, Alienor Jeliazkov

Keyword(s): Machine Learning; Community Ecology; Empirical Test; Variation Partitioning; Regression Trees; Additive Models; Boosted Regression Trees; Spatial Effects; Ecological Data; Constrained Ordination

Abstract. Community ecologists and macroecologists have long sought to evaluate the importance of environmental conditions and other drivers in determining species composition across sites. Different methods have been used to estimate species-environment relationships while accounting for or partitioning the variation attributed to environment and spatial autocorrelation, but their differences and respective reliability remain poorly known. We compared the performance of four families of statistical methods in estimating the contribution of the environment and space to explain variation in multi-species occurrence and abundance. These methods included distance-based regression (MRM), constrained ordination (RDA and CCA), generalised linear and additive models (GLM, GAM), and tree-based machine learning (regression trees, boosted regression trees, and random forests). Depending on the method, the spatial model consisted of either Moran’s Eigenvector Maps (MEM; in constrained ordination and GLM), smooth spatial splines (in GAM), or tree-based non-linear modelling of spatial coordinates (in machine learning). We simulated typical ecological data to assess the methods’ performance in (1) fitting environmental and spatial effects, and (2) partitioning the variation explained by the environmental and spatial effects. Differences in the fitting performance among major model types, namely (G)LM, GAM and machine learning, were reflected in the variation partitioning performance of the different methods. Machine learning methods, namely boosted regression trees, performed better overall. GAM performed similarly well, though likelihood optimisation did not converge for some empirical test data. The remaining methods performed worse under most simulated data variations (depending on the type of species data, sample size and coverage, autocorrelation range, and response shape).
Our results suggest that tree-based machine learning is a robust approach that can be widely used for variation partitioning. Our recommendations apply to single-species niche models, community ecology, and macroecology studies aiming at disentangling the relative contributions of space vs. environment and other drivers of variation in site-by-species matrices.
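The abstract compares methods on how well they partition variation between environment and space. A common way to define the fractions, as used with constrained ordination and regression, combines the R² of three fits: environment only, space only, and both. The function name below is hypothetical. The paper may compute the fractions differently for each method, for example with adjusted R² or with tree-based model performance.

```python
def partition_variation(r2_env: float, r2_space: float, r2_both: float) -> dict:
    """Split total variation into pure-environment, shared, pure-space and residual fractions.

    r2_env:   R^2 of a model with environmental predictors only
    r2_space: R^2 of a model with spatial predictors only
    r2_both:  R^2 of a model with both sets of predictors
    """
    if not 0.0 <= r2_both <= 1.0:
        raise ValueError("r2_both must lie in [0, 1]")
    return {
        "environment_only": r2_both - r2_space,   # fraction [a]
        "shared": r2_env + r2_space - r2_both,     # fraction [b]; can be negative
        "space_only": r2_both - r2_env,            # fraction [c]
        "unexplained": 1.0 - r2_both,              # fraction [d]
    }
```

As an illustrative input, take R² values of 0.40 (environment), 0.30 (space) and 0.55 (both). The fractions are then 0.25 pure environment, 0.15 shared, 0.15 pure space and 0.45 unexplained, and the four always sum to 1. The quality of the fitted R² values drives the quality of the partition. This matches the abstract's observation that fitting performance carried over into partitioning performance.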
