Development and external validation of a logistic regression derived algorithm to estimate a 12-month post open defecation free slippage risk

2019 ◽  
Vol 7 (1) ◽  
Author(s):  
Warren Mukelabai Simangolwa

Appropriate open defecation free (ODF) sustainability interventions are key to further mobilising communities to consume sanitation and hygiene products and services that enhance households' quality of life and embed household behavioural change for healthier communities. This study aims to develop a logistic regression derived risk algorithm to estimate a 12-month ODF slippage risk and externally validate the model in an independent data set. ODF slippage occurs when one or more toilet adequacy parameters are no longer present for one or more toilets in a community. Data in the Zambia district health information software for water sanitation and hygiene management information system for Chungu and Chabula chiefdoms were used for the study. The data were retrieved from the date of the Chungu and Chabula chiefdoms' attainment of ODF status in October 2016 for 12 months until September 2017 for the development and validation data sets respectively. Data were assumed to be missing completely at random and the complete case analysis approach was used. The events per variable were satisfactory for both the development and validation data sets. Multivariable regression with a backwards selection procedure was used to select among candidate predictor variables, with p < 0.05 meriting inclusion. To correct for optimism, the study assessed the amount of heuristic shrinkage by comparing the model’s apparent C-statistic to the C-statistic computed by nonparametric bootstrap resampling. In the resulting model, an increase in the covariates ‘months after ODF attainment’, ‘village population’ and ‘latrine built after CLTS’ were all associated with a higher probability of ODF slippage. Conversely, an increase in the covariate ‘presence of a handwashing station with soap’ was associated with reduced probability of ODF slippage. The predictive performance of the model was improved by the heuristic shrinkage factor of 0.988. 
The external validation confirmed good prediction performance with an area under the receiver operating characteristic curve of 0.85 and no significant lack of fit (Hosmer-Lemeshow test: p = 0.246). The results must be interpreted with caution in regions where the ODF definitions, culture and other factors are different from those asserted in the study.
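The discrimination measure reported in this abstract (area under the ROC curve, equivalently the C-statistic) can be illustrated with a minimal sketch. The function name `c_statistic` and the data are hypothetical and are not taken from the study; the code only shows the pairwise-concordance definition of the statistic.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; ties count as half. This equals the area under the ROC curve."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical data: 1 = village slipped from ODF status, 0 = it did not.
outcomes = [1, 0, 1, 0, 0, 1]
predicted_risk = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = c_statistic(outcomes, predicted_risk)
print(f"C-statistic: {auc:.3f}")
```

In this toy example 8 of the 9 event/non-event pairs are ranked correctly. An optimism correction of the kind described would recompute this statistic on bootstrap resamples and compare it with the apparent value.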

BMJ Open ◽  
2018 ◽  
Vol 8 (11) ◽  
pp. e022802 ◽  
Author(s):  
Michael M Schlussel ◽  
David J Keene ◽  
Gary S Collins ◽  
Jennifer Bostock ◽  
Christopher Byrne ◽  
...  

Objectives: To develop and externally validate a prognostic model for poor recovery after ankle sprain. Setting and participants: Model development used secondary data analysis of 584 participants from a UK multicentre randomised clinical trial. External validation used data from 682 participants recruited in 10 UK emergency departments for a prospective observational cohort. Outcome and analysis: Poor recovery was defined as presence of pain, functional difficulty or lack of confidence in the ankle at 9 months after injury. Twenty-three baseline candidate predictors were included together in a multivariable logistic regression model to identify the best predictors of poor recovery. Relationships between continuous variables and the outcome were modelled using fractional polynomials. Regression parameters were combined over 50 imputed data sets using Rubin’s rule. To minimise overfitting, regression coefficients were multiplied by a heuristic shrinkage factor and the intercept re-estimated. Incremental value of candidate predictors assessed at 4 weeks after injury was explored using decision curve analysis and the baseline model updated. The final models included predictors selected based on the Akaike information criterion (p<0.157). Model performance was assessed by calibration and discrimination. Results: Outcome rate was lower in the development (6.7%) than in the external validation data set (19.9%). Mean age (29.9 and 33.6 years), body mass index (BMI; 26.3 and 27.1 kg/m²), pain when resting (37.8 and 38.5 points) or bearing weight on the ankle (75.4 and 71.3 points) were similar in both data sets. Age, BMI, pain when resting, pain bearing weight, ability to bear weight, days from injury until assessment and injury recurrence were the selected predictors. The baseline model had fair discriminatory ability (C-statistic 0.72; 95% CI 0.66 to 0.79) but poor calibration. The updated model presented better discrimination (C-statistic 0.78; 95% CI 0.72 to 0.84), but equivalent calibration. Conclusions: The models include predictors easy to assess clinically and show benefit when compared with not using any model. Trial registration number: ISRCTN12726986; Results.
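Combining regression parameters over imputed data sets with Rubin's rule can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `pool_rubin` and the three coefficient estimates are hypothetical (the study pooled over 50 imputations), and only the standard pooling of one coefficient and its variance is shown.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets.

    estimates: coefficient estimate from each imputed data set
    variances: squared standard error of that estimate in each data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)          # average of the point estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log-odds coefficient estimated in three imputed data sets.
coef_per_imputation = [0.50, 0.54, 0.46]
var_per_imputation = [0.010, 0.012, 0.011]
pooled_coef, pooled_var = pool_rubin(coef_per_imputation, var_per_imputation)
pooled_se = pooled_var ** 0.5
print(f"pooled coefficient {pooled_coef:.3f}, standard error {pooled_se:.3f}")
```

The total variance exceeds the average within-imputation variance because it also carries the uncertainty due to the missing data.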


2019 ◽  
Vol 4 (6) ◽  
pp. e001801
Author(s):  
Sarah Hanieh ◽  
Sabine Braat ◽  
Julie A Simpson ◽  
Tran Thi Thu Ha ◽  
Thach D Tran ◽  
...  

Introduction: Globally, an estimated 151 million children under 5 years of age still suffer from the adverse effects of stunting. We sought to develop and externally validate an early life predictive model that could be applied in infancy to accurately predict risk of stunting in preschool children. Methods: We conducted two separate prospective cohort studies in Vietnam that intensively monitored children from early pregnancy until 3 years of age. They included 1168 and 475 live-born infants for model development and validation, respectively. Logistic regression on child stunting at 3 years of age was performed for model development, and the predicted probabilities for stunting were used to evaluate the performance of this model in the validation data set. Results: Stunting prevalence was 16.9% (172 of 1015) in the development data set and 16.4% (70 of 426) in the validation data set. Key predictors included in the final model were paternal and maternal height, maternal weekly weight gain during pregnancy, infant sex, gestational age at birth, and infant weight and length at 6 months of age. The area under the receiver operating characteristic curve in the validation data set was 0.85 (95% Confidence Interval, 0.80–0.90). Conclusion: This tool applied to infants at 6 months of age provided valid prediction of risk of stunting at 3 years of age using a readily available set of parental and infant measures. Further research is required to examine the impact of preventive measures introduced at 6 months of age on those identified as being at risk of growth faltering at 3 years of age.


2020 ◽  
Author(s):  
Yong Li ◽  
Shuzheng Lyu

BACKGROUND Prevention of coronary microvascular obstruction/no-reflow phenomenon (CMVO/NR) is a crucial step in improving the prognosis of patients with acute ST segment elevation myocardial infarction (STEMI) during primary percutaneous coronary intervention (PPCI). OBJECTIVE The objective of our study was to develop and externally validate a diagnostic model of CMVO/NR in patients with acute STEMI who underwent PPCI. METHODS Design: Multivariate logistic regression of a cohort of acute STEMI patients. Setting: Emergency department ward of a university hospital. Participants: Diagnostic model development: A total of 1232 acute STEMI patients who were consecutively treated with PPCI from November 2007 to December 2013. External validation: A total of 1301 acute STEMI patients who were treated with PPCI from January 2014 to June 2018. Outcomes: CMVO/NR during PPCI. We used logistic regression analysis to analyze the risk factors of CMVO/NR in the development data set. We developed a diagnostic model of CMVO/NR and constructed a nomogram. We assessed the predictive performance of the diagnostic model in the validation data set by examining measures of discrimination, calibration, and decision curve analysis (DCA). RESULTS A total of 147 out of 1,232 participants (11.9%) presented CMVO/NR in the development data set. The strongest predictors of CMVO/NR were age, periprocedural bradycardia, use of thrombus aspiration devices during the procedure and total occlusion of the culprit vessel. Logistic regression analysis showed that the groups with and without CMVO/NR differed in age (odds ratio (OR) 1.031; 95% confidence interval (CI), 1.015 to 1.048; P<.001), periprocedural bradycardia (OR 2.151; 95% CI, 1.472 to 3.143; P<.001), total occlusion of the culprit vessel (OR 1.842; 95% CI, 1.095 to 3.1; P=.021), and use of thrombus aspiration devices during the procedure (OR 1.631; 95% CI, 1.029 to 2.584; P=.037). We developed a diagnostic model of CMVO/NR. The area under the receiver operating characteristic curve (AUC) was .6833±.023. We constructed a nomogram. CMVO/NR occurred in 120 out of 1,301 participants (9.2%) in the validation data set. The AUC was .6547±.025. Discrimination, calibration, and DCA were satisfactory. Date of approval by ethics committee: 16 May 2019. Date of data collection start: 1 June 2019. Numbers recruited as of submission of the manuscript: 2,533. CONCLUSIONS We developed and externally validated a diagnostic model of CMVO/NR during PPCI. CLINICALTRIAL We registered this study with WHO International Clinical Trials Registry Platform on 16 May 2019. Registration number: ChiCTR1900023213. http://www.chictr.org.cn/edit.aspx?pid=39057&htm=4.
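Decision curve analysis, as mentioned in this abstract, rests on the net benefit of acting on a model at a chosen risk threshold. The sketch below is a minimal illustration, assuming the usual definition (true positives per patient minus false positives per patient weighted by the threshold odds). The function names and the ten patients are hypothetical and are not the study's data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk is at or above
    the threshold: TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = CMVO/NR occurred) and predicted probabilities.
y = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
p = [0.8, 0.1, 0.3, 0.6, 0.22, 0.05, 0.4, 0.5, 0.1, 0.25]
nb_model = net_benefit(y, p, threshold=0.2)
nb_all = net_benefit_treat_all(y, threshold=0.2)
print(f"model: {nb_model:.3f}, treat all: {nb_all:.3f}, treat none: 0.000")
```

A decision curve repeats this calculation over a range of thresholds and plots the model against the treat-all and treat-none strategies.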


2019 ◽  
Vol 115 (3/4) ◽  
Author(s):  
Douw G. Breed ◽  
Tanja Verster

Segmentation of data for the purpose of enhancing predictive modelling is a well-established practice in the banking industry. Unsupervised and supervised approaches are the two main types of segmentation and examples of improved performance of predictive models exist for both approaches. However, both focus on a single aspect – either target separation or independent variable distribution – and combining them may deliver better results. This combination approach is called semi-supervised segmentation. Our objective was to explore four new semi-supervised segmentation techniques that may offer alternative strengths. We applied these techniques to six data sets from different domains, and compared the model performance achieved. The original semi-supervised segmentation technique was the best for two of the data sets (as measured by the improvement in validation set Gini), but other techniques outperformed it on the other four data sets. Significance: We propose four newly developed semi-supervised segmentation techniques that can be used as additional tools for segmenting data before fitting a logistic regression. In all comparisons, using semi-supervised segmentation before fitting a logistic regression improved the modelling performance (as measured by the Gini coefficient on the validation data set) compared to using unsegmented logistic regression.
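The abstract does not define its Gini coefficient. In credit scoring the term usually refers to a rescaling of the area under the ROC curve, Gini = 2 × AUC − 1, and the short sketch below assumes that convention. The function name and the AUC values are hypothetical.

```python
def gini_from_auc(auc):
    """Convert an area under the ROC curve to the Gini coefficient,
    assuming the common convention Gini = 2 * AUC - 1."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    return 2.0 * auc - 1.0


# Hypothetical validation AUCs for an unsegmented and a segmented model.
gini_unsegmented = gini_from_auc(0.75)
gini_segmented = gini_from_auc(0.78)
print(f"Gini improvement: {gini_segmented - gini_unsegmented:.3f}")
```

Under this convention a model that ranks no better than chance (AUC 0.5) has a Gini of 0, and a perfect ranking has a Gini of 1.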


Author(s):  
Zhiyi Wang ◽  
Jie Weng ◽  
Zhongwang Li ◽  
Ruonan Hou ◽  
Lebin Zhou ◽  
...  

Background: The COVID-19 virus is an emerging virus that has rapidly spread worldwide. This study aimed to establish an effective diagnostic nomogram for suspected COVID-19 pneumonia patients. Methods: We used the LASSO regression and multivariable logistic regression methods to explore the predictive factors associated with COVID-19 pneumonia, and established the diagnostic nomogram for COVID-19 pneumonia using multivariable regression. This diagnostic nomogram was assessed in the internal and external validation data sets. Further, we plotted decision curves and a clinical impact curve to evaluate the clinical usefulness of this diagnostic nomogram. Results: The predictive factors contained in the nomogram were epidemiological history, wedge-shaped or fan-shaped lesion parallel to or near the pleura, bilateral lower lobes, ground glass opacities, crazy paving pattern and white blood cell (WBC) count. In the primary cohort, the C-statistic for predicting the probability of COVID-19 pneumonia was 0.967, even higher than the C-statistic (0.961) of the initial viral nucleic acid nomogram, which was established using univariable regression. The C-statistic was 0.848 in the external validation cohort. Good calibration curves were observed for the prediction probability in the internal validation and external validation cohorts. The nomogram performed well in terms of both discrimination and calibration. Moreover, the decision curve and clinical impact curve also indicated benefit for COVID-19 pneumonia patients. Conclusion: Our nomogram can be used to predict COVID-19 pneumonia accurately and favourably.


2020 ◽  
Author(s):  
Zhiyi Wang ◽  
Jie Weng ◽  
Zhongwang Li ◽  
Ruonan Hou ◽  
Lebin Zhou ◽  
...  

Abstract Background The COVID-19 virus is an emerging virus associated with severe respiratory illness that was first detected in December 2019 and rapidly spread worldwide. The aim of this study was to establish an effective diagnostic nomogram for suspected COVID-19 pneumonia patients. Methods We used the LASSO regression and multivariable logistic regression methods to explore the predictive factors associated with COVID-19 pneumonia, and established the diagnostic nomogram for COVID-19 pneumonia using multivariable regression. This diagnostic nomogram was assessed in the internal and external validation data sets. Further, we plotted decision curves and a clinical impact curve to evaluate the clinical usefulness of this diagnostic nomogram. Results The predictive factors contained in the nomogram were epidemiological history, wedge-shaped or fan-shaped lesion parallel to or near the pleura, bilateral lower lobes, ground glass opacities, crazy paving pattern and white blood cell (WBC) count. In the primary cohort, the C-statistic for predicting the probability of COVID-19 pneumonia was 0.967, even higher than the C-statistic (0.961) of the initial viral nucleic acid nomogram, which was established using univariable regression. The C-statistic was 0.848 in the external validation cohort. Good calibration curves were observed for the prediction probability in the internal validation and external validation cohorts. The nomogram performed well in terms of both discrimination and calibration. Moreover, the decision curve and clinical impact curve also indicated benefit for COVID-19 pneumonia patients. Conclusion Our nomogram can be used to predict COVID-19 pneumonia accurately and favourably.


BMJ Open ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. e040778
Author(s):  
Vineet Kumar Kamal ◽  
Ravindra Mohan Pandey ◽  
Deepak Agrawal

Objective: To develop and validate a simple risk score chart to estimate the probability of poor outcomes in patients with severe head injury (HI). Design: Retrospective. Setting: Level-1, government-funded trauma centre, India. Participants: Patients with severe HI admitted to the neurosurgery intensive care unit during 19 May 2010–31 December 2011 (n=946) for the model development and, further, data from the same centre with the same inclusion criteria from 1 January 2012 to 31 July 2012 (n=284) for the external validation of the model. Outcome(s): In-hospital mortality and unfavourable outcome at 6 months. Results: A total of 39.5% and 70.7% had in-hospital mortality and unfavourable outcome, respectively, in the development data set. The multivariable logistic regression analysis of routinely collected admission characteristics revealed the following independent predictors, identified as those for which the 95% confidence interval (CI) of the odds ratio (OR) did not contain one. For in-hospital mortality: age (51–60, >60 years), motor score (1, 2, 4), pupillary reactivity (none), presence of hypotension, basal cistern effaced, and traumatic subarachnoid haemorrhage/intraventricular haematoma. For unfavourable outcome: age (41–50, 51–60, >60 years), motor score (1–4), pupillary reactivity (none, one), unequal limb movement, and presence of hypotension. The discriminative ability (area under the receiver operating characteristic curve (95% CI)) of the score chart for in-hospital mortality and 6 months outcome was excellent in the development data set (0.890 (0.867 to 0.912) and 0.894 (0.869 to 0.918), respectively), internal validation data set using bootstrap resampling method (0.889 (0.867 to 0.909) and 0.893 (0.867 to 0.915), respectively) and external validation data set (0.871 (0.825 to 0.916) and 0.887 (0.842 to 0.932), respectively). Calibration showed good agreement between observed outcome rates and predicted risks in the development and external validation data sets (p>0.05). Conclusion: For clinical decision making, these score charts can be used to predict outcomes in new patients with severe HI in India and similar settings.
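The selection criterion used in this abstract, a 95% CI for the odds ratio that does not contain one, can be illustrated with a short sketch. It assumes the usual Wald interval on the log-odds scale. The function names, coefficients and standard errors are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic regression
    coefficient (log odds ratio) and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def excludes_one(lower, upper):
    """True when the interval lies entirely above or entirely below 1."""
    return lower > 1.0 or upper < 1.0


# Hypothetical coefficients for two candidate predictors.
or_a, lo_a, hi_a = odds_ratio_ci(beta=0.70, se=0.25)
or_b, lo_b, hi_b = odds_ratio_ci(beta=0.20, se=0.25)
print(f"A: OR {or_a:.2f} ({lo_a:.2f} to {hi_a:.2f}), keep: {excludes_one(lo_a, hi_a)}")
print(f"B: OR {or_b:.2f} ({lo_b:.2f} to {hi_b:.2f}), keep: {excludes_one(lo_b, hi_b)}")
```

Predictor A would be retained because its whole interval lies above one, while predictor B would not because its interval spans one.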


Circulation ◽  
2016 ◽  
Vol 133 (suppl_1) ◽  
Author(s):  
Nina P Paynter ◽  
Raji Balasubramanian ◽  
Shuba Gopal ◽  
Franco Giulianini ◽  
Leslie Tinker ◽  
...  

Background: Prior studies of metabolomic profiles and coronary heart disease (CHD) have been limited by relatively small case numbers and scant data in women. Methods: The discovery set examined 371 metabolites in 400 confirmed, incident CHD cases and 400 controls (frequency matched on age, race/ethnicity, hysterectomy status and time of enrollment) in the Women’s Health Initiative Observational Study (WHI-OS). All selected metabolites were validated in a separate set of 394 cases and 397 matched controls drawn from the placebo arms of the WHI Hormone Therapy trials and the WHI-OS. Discovery used 4 methods: false-discovery rate (FDR) adjusted logistic regression for individual metabolites, permutation corrected least absolute shrinkage and selection operator (LASSO) algorithms, sparse partial least squares discriminant analysis (PLS-DA) algorithms, and random forest algorithms. Each method was performed with matching factors only and with matching plus both medication use (aspirin, statins, anti-diabetics and anti-hypertensives) and traditional CHD risk factors (smoking, systolic blood pressure, diabetes, total and HDL cholesterol). Replication in the validation set was defined as a logistic regression coefficient of p<0.05 for the metabolites selected by 3 or 4 methods (tier 1), or a FDR adjusted p<0.05 for metabolites selected by only 1 or 2 methods (tier 2). Results: Sixty-seven metabolites were selected in the discovery data set (30 tier 1 and 37 tier 2). Twenty-six successfully replicated in the validation data set (21 tier 1 and 5 tier 2), with 25 significant when adjusting for matching factors only and 11 significant after additionally adjusting for medications and CHD risk factors. Validated metabolites included amino acids, sugars, nucleosides, eicosanoids, plasmalogens, polyunsaturated phospholipids and highly saturated triglycerides. 
These include novel metabolites as well as metabolites such as glutamate/glutamine, which have been shown in other populations. Conclusions: Multiple metabolites in important physiological pathways with robust associations for risk of CHD in women were identified and replicated. These results may offer insights into biological mechanisms of CHD as well as identify potential markers of risk.
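The FDR adjustment referred to in this abstract can be sketched as below. The abstract does not say which procedure was used, so this example assumes the widely used Benjamini-Hochberg step-up method. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five metabolites.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
replicated = [i for i, p in enumerate(adj_p) if p < 0.05]
print(adj_p, replicated)
```

In this toy example four raw p-values fall below 0.05, but only the first two remain below 0.05 after adjustment.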


2019 ◽  
Author(s):  
Guangzhi Wang ◽  
Huihui Wan ◽  
Xingxing Jian ◽  
Yuyu Li ◽  
Jian Ouyang ◽  
...  

Abstract: In silico T-cell epitope prediction plays an important role in immunization experimental design and vaccine preparation. Currently, most epitope prediction research focuses on peptide processing and presentation, e.g. proteasomal cleavage, transporter associated with antigen processing (TAP) and major histocompatibility complex (MHC) combination. To date, however, the mechanism for immunogenicity of epitopes remains unclear. It is generally agreed upon that T-cell immunogenicity may be influenced by the foreignness, accessibility, molecular weight, molecular structure, molecular conformation, chemical properties and physical properties of target peptides to different degrees. In this work, we tried to combine these factors. Firstly, we collected significant experimental HLA-I T-cell immunogenic peptide data, as well as the potential immunogenic amino acid properties. Several characteristics were extracted, including amino acid physicochemical property of epitope sequence, peptide entropy, eluted ligand likelihood percentile rank (EL rank(%)) score and frequency score for immunogenic peptide. Subsequently, a random forest classifier for T cell immunogenic HLA-I presenting antigen epitopes and neoantigens was constructed. The classification results for the antigen epitopes outperformed the previous research (the optimal AUC=0.81, external validation data set AUC=0.77). As mutational epitopes generated by the coding region contain only the alterations of one or two amino acids, we assume that these characteristics might also be applied to the classification of the endogenic mutational neoepitopes also called ‘neoantigens’. Based on mutation information and sequence related amino acid characteristics, a prediction model of neoantigen was established as well (the optimal AUC=0.78). 
Further, an easy-to-use web-based tool ‘INeo-Epp’ was developed (available at http://www.biostatistics.online/INeo-Epp/neoantigen.php) for the prediction of human immunogenic antigen epitopes and neoantigen epitopes.


Author(s):  
André M. Carrington ◽  
Paul W. Fieguth ◽  
Hammad Qazi ◽  
Andreas Holzinger ◽  
Helen H. Chen ◽  
...  

Abstract Background In classification and diagnostic testing, the receiver-operator characteristic (ROC) plot and the area under the ROC curve (AUC) describe how an adjustable threshold causes changes in two types of error: false positives and false negatives. Only part of the ROC curve and AUC are informative however when they are used with imbalanced data. Hence, alternatives to the AUC have been proposed, such as the partial AUC and the area under the precision-recall curve. However, these alternatives cannot be as fully interpreted as the AUC, in part because they ignore some information about actual negatives. Methods We derive and propose a new concordant partial AUC and a new partial c statistic for ROC data—as foundational measures and methods to help understand and explain parts of the ROC plot and AUC. Our partial measures are continuous and discrete versions of the same measure, are derived from the AUC and c statistic respectively, are validated as equal to each other, and validated as equal in summation to whole measures where expected. Our partial measures are tested for validity on a classic ROC example from Fawcett, a variation thereof, and two real-life benchmark data sets in breast cancer: the Wisconsin and Ljubljana data sets. Interpretation of an example is then provided. Results Results show the expected equalities between our new partial measures and the existing whole measures. The example interpretation illustrates the need for our newly derived partial measures. Conclusions The concordant partial area under the ROC curve was proposed and unlike previous partial measure alternatives, it maintains the characteristics of the AUC. The first partial c statistic for ROC plots was also proposed as an unbiased interpretation for part of an ROC curve. The expected equalities among and between our newly derived partial measures and their existing full measure counterparts are confirmed. 
These measures may be used with any data set but this paper focuses on imbalanced data with low prevalence. Future work Future work with our proposed measures may: demonstrate their value for imbalanced data with high prevalence, compare them to other measures not based on areas; and combine them with other ROC measures and techniques.

