Risk prediction in multicentre studies when there is confounding by cluster or informative cluster size

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Menelaos Pavlou ◽  
Gareth Ambler ◽  
Rumana Z. Omar

Abstract Background Clustered data arise in research when patients are clustered within larger units. Generalised Estimating Equations (GEE) and Generalised Linear Mixed Models (GLMM) can be used to provide marginal and cluster-specific inference and predictions, respectively. Methods Confounding by Cluster (CBC) and Informative Cluster Size (ICS) are two complications that may arise when modelling clustered data. CBC can arise when the distribution of a predictor variable (termed ‘exposure’) varies between clusters, causing confounding of the exposure-outcome relationship. ICS means that the cluster size conditional on covariates is not independent of the outcome. In both situations, standard GEE and GLMM may provide biased or misleading inference, and modifications have been proposed. However, both CBC and ICS are routinely overlooked in the context of risk prediction, and their impact on the predictive ability of the models has been little explored. We study the effect of CBC and ICS on the predictive ability of risk models for binary outcomes when GEE and GLMM are used. We examine whether two simple approaches to handle CBC and ICS, which involve adjusting for the cluster mean of the exposure and the cluster size, respectively, can improve the accuracy of predictions. Results Both CBC and ICS can be viewed as violations of the assumptions in the standard GLMM; the random effects are correlated with exposure for CBC and cluster size for ICS. Based on these principles, we simulated data subject to CBC/ICS. The simulation studies suggested that the predictive ability of models derived from using standard GLMM and GEE ignoring CBC/ICS was affected. Marginal predictions were found to be mis-calibrated. Adjusting for the cluster mean of the exposure or the cluster size improved calibration, discrimination and the overall predictive accuracy of marginal predictions, by explaining part of the between-cluster variability. 
The presence of CBC/ICS did not affect the accuracy of conditional predictions. We illustrate these concepts using real data from a multicentre study with potential CBC. Conclusion Ignoring CBC and ICS when developing prediction models for clustered data can affect the accuracy of marginal predictions. Adjusting for the cluster mean of the exposure or the cluster size can improve the predictive accuracy of marginal predictions.
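As a minimal sketch of the two adjustments described above, the following Python snippet derives the cluster mean of the exposure and the cluster size so that they could be entered as extra covariates in a GEE or GLMM. The record layout and the names `add_cluster_covariates`, `cluster`, `exposure`, `exposure_cluster_mean` and `cluster_size` are assumptions made for illustration and do not come from the paper; the model fitting itself is not shown.

```python
from collections import defaultdict


def add_cluster_covariates(records):
    """Return copies of the records with two cluster-level covariates added.

    Each record is a dict with the (hypothetical) keys 'cluster' and 'exposure'.
    'exposure_cluster_mean' is the adjustment aimed at confounding by cluster;
    'cluster_size' is the adjustment aimed at informative cluster size.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        totals[record["cluster"]] += record["exposure"]
        counts[record["cluster"]] += 1

    augmented = []
    for record in records:
        cluster = record["cluster"]
        augmented.append(
            {
                **record,
                "exposure_cluster_mean": totals[cluster] / counts[cluster],
                "cluster_size": counts[cluster],
            }
        )
    return augmented


# Toy data: two patients in centre A, one patient in centre B.
patients = [
    {"cluster": "A", "exposure": 1.0},
    {"cluster": "A", "exposure": 3.0},
    {"cluster": "B", "exposure": 10.0},
]
augmented = add_cluster_covariates(patients)
```

The augmented records keep the original patient-level exposure, so a model can include both the individual value and its cluster mean.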

2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Michelle Louise Gatt ◽  
Maria Cassar ◽  
Sandra C. Buttigieg

Purpose The purpose of this paper is to identify and analyse the readmission risk prediction tools reported in the literature and their benefits when it comes to healthcare organisations and management. Design/methodology/approach Readmission risk prediction is a growing topic of interest with the aim of identifying patients, in particular those suffering from chronic diseases such as congestive heart failure, chronic obstructive pulmonary disease and diabetes, who are at risk of readmission. Several models have been developed with different levels of predictive ability. A structured and extensive literature search of several databases was conducted using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses strategy, and this yielded a total of 48,984 records. Findings Forty-three articles were selected for full-text and extensive review after following the screening process and according to the eligibility criteria. About 34 unique readmission risk prediction models were identified, whose predictive ability ranged from poor to good (c statistic 0.5–0.86). Readmission rates ranged between 3.1 and 74.1% depending on the risk category. This review shows that readmission risk prediction is a complex process and is still relatively new as a concept and poorly understood. It confirms that readmission prediction models hold significant accuracy at identifying patients at higher risk for such an event within specific contexts. Research limitations/implications Since most prediction models were developed for specific populations, conditions or hospital settings, the generalisability and transferability of the predictions across wider or other contexts may be difficult to achieve. Therefore, the value of prediction models remains limited to hospital management. 
Future research is indicated in this regard. Originality/value This review is the first to cover readmission risk prediction tools that have been published in the literature since 2011, thereby providing an assessment of the relevance of this crucial KPI to health organisations and managers.


Stroke ◽  
2015 ◽  
Vol 46 (suppl_1) ◽  
Author(s):  
Blessing Jaja ◽  
Hester Lingsma ◽  
Ewout Steyerberg ◽  
R. Loch Macdonald ◽  

Background: Aneurysmal subarachnoid hemorrhage (SAH) is a cerebrovascular emergency. Currently, clinicians have limited tools to estimate outcomes early after hospitalization. We aimed to develop novel prognostic scores using large cohorts of patients reflecting experience from different settings. Methods: Logistic regression analysis was used to develop prediction models for mortality and unfavorable outcomes according to 3-month Glasgow outcome score after SAH based on readily obtained parameters at hospital admission. The development cohort was derived from 10 prospective studies involving 10,936 patients in the Subarachnoid Hemorrhage International Trialists (SAHIT) repository. Model performance was assessed by bootstrap internal validation and by cross validation by omission of each of the 10 studies, using the R² statistic, the area under the receiver operating characteristic curve (AUC), and calibration plots. Prognostic scores were developed from the regression coefficients. Results: The predictor variable with the greatest prognostic strength was neurologic status (partial R² = 12.03%), followed by age (1.91%), treatment modality (1.25%), Fisher grade of CT clot burden (0.65%), history of hypertension (0.37%), aneurysm size (0.12%) and aneurysm location (0.06%). These predictors were combined to develop 3 sets of hierarchical scores based on the coefficients of the regression models. The AUC at bootstrap validation was 0.79-0.80, and at cross validation was 0.64-0.85. Calibration plots demonstrated satisfactory agreement between predicted and observed probabilities of the outcomes. Conclusions: The novel prognostic scores have good predictive ability and potential for broad application as they have been developed from prospective cohorts reflecting experience from different centers globally.
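The validation scheme in this abstract (omitting each study in turn and scoring discrimination by AUC) can be sketched as below. This is an illustrative outline only: the helper names `leave_one_study_out` and `auc` and the toy data are assumptions, not the authors' code. The AUC is computed as the proportion of case/non-case pairs in which the case receives the higher predicted score, with ties counted as half.

```python
def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_indices, test_indices) for each study."""
    for held_out in sorted(set(study_ids)):
        train = [i for i, s in enumerate(study_ids) if s != held_out]
        test = [i for i, s in enumerate(study_ids) if s == held_out]
        yield held_out, train, test


def auc(outcomes, scores):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [s for y, s in zip(outcomes, scores) if y == 1]
    non_cases = [s for y, s in zip(outcomes, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("AUC needs at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for non_case_score in non_cases:
            if case_score > non_case_score:
                concordant += 1.0
            elif case_score == non_case_score:
                concordant += 0.5  # ties count as half a concordant pair
    return concordant / (len(cases) * len(non_cases))


# Toy example: four patients, two with the outcome.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Toy example: five patients drawn from three hypothetical studies.
splits = list(leave_one_study_out(["s1", "s1", "s2", "s3", "s3"]))
```

In a full analysis, the model would be refitted on the training indices of each split and `auc` evaluated on the held-out study, giving one AUC per omitted study.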


2020 ◽  
Vol 13 (5) ◽  
pp. 92
Author(s):  
Katarina Valaskova ◽  
Pavol Durana ◽  
Peter Adamko ◽  
Jaroslav Jaros

The risk of corporate financial distress negatively affects the operation of the enterprise itself and can change the financial performance of all other partners that come into close or wider contact. To identify these risks, business entities use early warning systems, prediction models, which help identify the level of corporate financial health. Despite the fact that the relevant financial analyses and financial health predictions are crucial to mitigate or eliminate the potential risks of bankruptcy, the modeling of financial health in emerging countries is mostly based on models which were developed in different economic sectors and countries. However, several prediction models have been introduced in emerging countries (also in Slovakia) in the last few years. Thus, the main purpose of the paper is to verify the predictive ability of the bankruptcy models formed in conditions of the Slovak economy in the sector of agriculture. To compare their predictive accuracy the confusion matrix (cross tables) and the receiver operating characteristic curve are used, which allow more detailed analysis than the mere proportion of correct classifications (predictive accuracy). The results indicate that the models developed in the specific economic sector highly outperform the prediction ability of other models either developed in the same country or abroad, usage of which is then questionable considering the issue of prediction accuracy. The research findings confirm that the highest predictive ability of the bankruptcy prediction models is achieved provided that they are used in the same economic conditions and industrial sector in which they were primarily developed.
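To illustrate why a confusion matrix allows more detailed analysis than the proportion of correct classifications alone, here is a small sketch in Python. The function name `confusion_summary` and the toy labels are assumptions for illustration; they are not taken from the paper, and 1 is assumed to denote a distressed company.

```python
def confusion_summary(actual, predicted):
    """Count the four cells of a 2x2 confusion matrix and derive basic rates.

    Labels are assumed to be 1 (financially distressed) or 0 (healthy).
    """
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return {
        "tp": tp,
        "fn": fn,
        "fp": fp,
        "tn": tn,
        "accuracy": (tp + tn) / len(actual),
        # Share of distressed companies the model flags.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of healthy companies the model leaves unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy data: 3 distressed and 7 healthy companies; the model flags only one.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
summary = confusion_summary(actual, predicted)
```

In this toy case the accuracy is 0.8 even though only one of the three distressed companies is detected, which is the kind of imbalance that accuracy on its own hides.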


2014 ◽  
Vol 33 (30) ◽  
pp. 5371-5387 ◽  
Author(s):  
Shaun Seaman ◽  
Menelaos Pavlou ◽  
Andrew Copas

Author(s):  
Shaan Khurshid ◽  
Uri Kartoun ◽  
Jeffrey M. Ashburner ◽  
Ludovic Trinquart ◽  
Anthony Philippakis ◽  
...  

Background - Atrial fibrillation (AF) is associated with increased risks of stroke and heart failure. Electronic health record (EHR) based AF risk prediction may facilitate efficient deployment of interventions to diagnose or prevent AF altogether. Methods - We externally validated an EHR atrial fibrillation (EHR-AF) score in IBM Explorys Life Sciences, a multi-institutional dataset containing statistically de-identified EHR data for over 21 million individuals ("Explorys Dataset"). We included individuals with complete AF risk data, ≥2 office visits within two years, and no prevalent AF. We compared EHR-AF to existing scores including CHARGE-AF, C₂HEST, and CHA₂DS₂-VASc. We assessed association between AF risk scores and 5-year incident AF, stroke, and heart failure using Cox proportional hazards modeling, 5-year AF discrimination using c-indices, and calibration of predicted AF risk to observed AF incidence. Results - Of 21,825,853 individuals in the Explorys Dataset, 4,508,180 comprised the analysis (age 62.5, 56.3% female). AF risk scores were strongly associated with 5-year incident AF (hazard ratio [HR] per standard deviation [SD] increase 1.85 using CHA₂DS₂-VASc to 2.88 using EHR-AF), stroke (1.61 using C₂HEST to 1.92 using CHARGE-AF), and heart failure (1.91 using CHA₂DS₂-VASc to 2.58 using EHR-AF). EHR-AF (c-index 0.808 [95% CI 0.807-0.809]) demonstrated favorable AF discrimination compared to CHARGE-AF (0.806 [0.805-0.807]), C₂HEST (0.683 [0.682-0.684]), and CHA₂DS₂-VASc (0.720 [0.719-0.722]). Of the scores, EHR-AF demonstrated the best calibration to incident AF (calibration slope 1.002 [0.997-1.007]). In subgroup analyses, AF discrimination using EHR-AF was lower in individuals with stroke (c-index 0.696 [0.692-0.700]) and heart failure (0.621 [0.617-0.625]). Conclusions - EHR-AF demonstrates predictive accuracy for incident AF using readily ascertained EHR data. AF risk is associated with incident stroke and heart failure. 
Use of such risk scores may facilitate decision-support and population health management efforts focused on minimizing AF-related morbidity.


Author(s):  
Oanh K Nguyen ◽  
Anil N Makam ◽  
Christopher Clark ◽  
Song Zhang ◽  
Sandeep R Das ◽  
...  

Background: Readmissions after hospitalization for acute myocardial infarction (AMI) are common, but the few available risk prediction models have poor predictive ability. Including more data from hospitalization may improve risk prediction. Objectives: To assess if an AMI-specific electronic health record (EHR) readmission risk prediction model derived and validated from data through the entire hospital course (‘full stay’ model) outperforms a model using data available only from the first day of hospitalization (‘first day’ model). Methods: EHR data from AMI hospitalizations from 6 diverse hospitals in north Texas from 2009-2010 were used to derive a model predicting all-cause non-elective 30-day readmissions which was then validated using five-fold cross-validation. Results: Of 826 consecutive index AMI admissions, 13% were followed by a 30-day readmission. History of diabetes (AOR 2.41, 95% CI 1.37-4.24), SBP <100 mmHg on admission (AOR 2.18, 95% CI 1.68-2.82), elevated Cr (≥2 mg/dL) on admission (AOR 2.56, 95% CI 2.52-6.08), elevated BNP on admission (AOR 6.36, 95% CI 1.65-24.47) and lack of PCI within 24 hours of admission (AOR 1.31, 95% CI 1.02-1.69) were significant predictors of readmission. Our ‘first-day’ AMI readmissions model based on these predictors had good discrimination ( Table ). Adding three other variables from the hospital course - use of IV diuretics (AOR 1.58, 95% CI 1.07-2.31), anemia (hematocrit ≤ 33%) on discharge (AOR 2.04, 95% CI 1.20-3.46), and discharge to post-acute care (AOR 1.50, 95% CI 0.90-2.50) - improved discrimination of the ‘full stay’ AMI model but only modestly improved net reclassification and calibration. Conclusions: A ‘full-stay’ AMI-specific EHR readmission model modestly outperformed a ‘first-day’ EHR model, a multi-condition EHR model, and the CMS AMI model. 
Surprisingly, incorporating more hospitalization data improved discrimination of the full-stay AMI model but did not meaningfully improve reclassification compared to the first-day model. Readmissions in AMI may be accurately predicted on the first day of hospitalization; waiting until later in hospitalization does not markedly improve risk prediction.


2019 ◽  
Vol 21 (1) ◽  
Author(s):  
Chang Ming ◽  
Valeria Viassolo ◽  
Nicole Probst-Hensch ◽  
Pierre O. Chappuis ◽  
Ivo D. Dinov ◽  
...  

Abstract Background Comprehensive breast cancer risk prediction models enable identifying and targeting women at high-risk, while reducing interventions in those at low-risk. Breast cancer risk prediction models used in clinical practice have low discriminatory accuracy (0.53–0.64). Machine learning (ML) offers an alternative approach to standard prediction modeling that may address current limitations and improve accuracy of those tools. The purpose of this study was to compare the discriminatory accuracy of ML-based estimates against a pair of established methods—the Breast Cancer Risk Assessment Tool (BCRAT) and Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA) models. Methods We quantified and compared the performance of eight different ML methods to the performance of BCRAT and BOADICEA using eight simulated datasets and two retrospective samples: a random population-based sample of U.S. breast cancer patients and their cancer-free female relatives (N = 1143), and a clinical sample of Swiss breast cancer patients and cancer-free women seeking genetic evaluation and/or testing (N = 2481). Results Predictive accuracy (AU-ROC curve) reached 88.28% using ML-Adaptive Boosting and 88.89% using ML-random forest versus 62.40% with BCRAT for the U.S. population-based sample. Predictive accuracy reached 90.17% using ML-adaptive boosting and 89.32% using ML-Markov chain Monte Carlo generalized linear mixed model versus 59.31% with BOADICEA for the Swiss clinic-based sample. Conclusions There was a striking improvement in the accuracy of classification of women with and without breast cancer achieved with ML algorithms compared to the state-of-the-art model-based approaches. High-accuracy prediction techniques are important in personalized medicine because they facilitate stratification of prevention strategies and individualized clinical management.


2019 ◽  
Author(s):  
Gilles Charmet ◽  
Louis Gautier Tran ◽  
Jérôme Auzanneau ◽  
Renaud Rincent ◽  
Sophie Bouchet

Abstract We developed an integrated R library called BWGS to enable easy computation of Genomic Estimates of Breeding Values (GEBV) for genomic selection. BWGS relies on existing R libraries, all freely available from CRAN servers. The two main functions make it possible to run 1) replicated random cross validations within a training set of genotyped and phenotyped lines and 2) GEBV prediction for a set of genotyped-only lines. Options are available for 1) missing data imputation, 2) marker and training set selection and 3) genomic prediction with 15 different methods, either parametric or semi-parametric. The usefulness and efficiency of BWGS are illustrated using a population of wheat lines from a real breeding programme. Adjusted yield data from historical trials (highly unbalanced design) were used for testing the options of BWGS. In total, 760 candidate lines with adjusted phenotypes and genotypes for 47 839 robust SNPs were used. With a simple desktop computer, we obtained results that were comparable with previously published results on wheat genomic selection. As predicted by theory, the factors that most influence predictive ability, for a given trait of moderate heritability, are the size of the training population and a minimum number of markers for capturing every QTL information. Missing data up to 40%, if randomly distributed, do not degrade predictive ability once imputed, and up to 80% randomly distributed missing data are still acceptable once imputed with the Expectation-Maximization method of package rrBLUP. It is worth noting that selecting the markers most associated with the trait does improve predictive ability, compared with the whole set of markers, but only when marker selection is made on the whole population. When marker selection is made only on the sampled training set, this advantage nearly disappears, since it was clearly due to overfitting. Few differences are observed between the 15 prediction models with this dataset. 
Although non-parametric methods that are supposed to capture non-additive effects have slightly better predictive accuracy, differences remain small. Finally, the GEBV from the 15 prediction models are all highly correlated to each other. These results are encouraging for an efficient use of genomic selection in applied breeding programmes and BWGS is a simple and powerful toolbox to apply in breeding programmes or training activities.

