Flexibly Accounting for Exposure Misclassification With External Validation Data

2020 ◽  
Vol 189 (8) ◽  
pp. 850-860
Author(s):  
Jessie K Edwards ◽  
Stephen R Cole ◽  
Matthew P Fox

Abstract Measurement error is common in epidemiology, but few studies use quantitative methods to account for bias due to mismeasurement. One potential barrier is that some intuitive approaches that combine readily with methods for addressing other sources of bias, such as multiple imputation for measurement error (MIME), rely on internal validation data, which are rarely available. Here, we present a reparameterized imputation approach for measurement error (RIME) that can be used with either internal or external validation data. Using a hypothetical example and a series of simulation experiments, we illustrate the advantages of RIME over both MIME and a naive approach that ignores measurement error. In both the example and the simulations, we combine MIME and RIME with inverse probability weighting to account for confounding when estimating hazard ratios and counterfactual risk functions. MIME and RIME performed similarly when rich external validation data were available and exposure prevalence did not differ between the main study and the validation data. However, RIME outperformed MIME when the validation data included only the true and mismeasured versions of the exposure or when exposure prevalence differed between the data sources. RIME allows investigators to leverage external validation data to account for measurement error in a wide range of scenarios.
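Why prevalence differences matter can be seen with a few lines of Bayes' theorem. The sketch below is not the authors' RIME procedure; it only assumes a binary exposure with hypothetical sensitivity 0.8 and specificity 0.9 and shows that the predictive values P(X = 1 | X* = 1) and P(X = 0 | X* = 0), the kind of quantity an imputation model fit on validation data implicitly estimates, shift with exposure prevalence even when sensitivity and specificity stay fixed. The function name `predictive_values` is illustrative.

```python
def predictive_values(se: float, sp: float, prev: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a misclassified binary exposure via Bayes' theorem.

    se   -- sensitivity, P(X* = 1 | X = 1)
    sp   -- specificity, P(X* = 0 | X = 0)
    prev -- true exposure prevalence, P(X = 1)
    """
    true_pos = se * prev
    false_pos = (1 - sp) * (1 - prev)
    true_neg = sp * (1 - prev)
    false_neg = (1 - se) * prev
    ppv = true_pos / (true_pos + false_pos)  # P(X = 1 | X* = 1)
    npv = true_neg / (true_neg + false_neg)  # P(X = 0 | X* = 0)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical values: identical Se/Sp, two different exposure prevalences
    for prev in (0.1, 0.4):
        ppv, npv = predictive_values(se=0.8, sp=0.9, prev=prev)
        print(f"prevalence={prev:.1f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

This prints PPV 0.471 and NPV 0.976 at prevalence 0.1, but PPV 0.842 and NPV 0.871 at prevalence 0.4. Sensitivity and specificity are therefore the more natural quantities to carry over from an external validation study whose prevalence differs from the main study's, which is in line with the abstract's observation that RIME outperformed MIME in that setting.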

2019 ◽  
Vol 27 (3) ◽  
pp. 396-406 ◽  
Author(s):  
Kushan De Silva ◽  
Daniel Jönsson ◽  
Ryan T Demmer

Abstract Objective To identify predictors of prediabetes using feature selection and machine learning on a nationally representative sample of the US population. Materials and Methods We analyzed n = 6346 men and women enrolled in the National Health and Nutrition Examination Survey 2013–2014. Prediabetes was defined using American Diabetes Association guidelines. The sample was randomly partitioned into training (n = 3174) and internal validation (n = 3172) sets. Feature selection algorithms were run on training data containing 156 preselected exposure variables. Four machine learning algorithms were applied to 46 exposure variables in the original training dataset and in resampled training datasets built using 4 resampling methods. Predictive models were tested on internal validation data (n = 3172) and external validation data (n = 3000) prepared from the National Health and Nutrition Examination Survey 2011–2012. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Predictors were assessed by odds ratios in logistic models and by variable importance in the other models. The Centers for Disease Control (CDC) prediabetes screening tool served as the benchmark for comparing model performance. Results Prediabetes prevalence was 23.43%. The CDC prediabetes screening tool achieved an AUROC of 64.40%. Seven optimal (AUROC ≥ 70%) models identified 25 predictors, including 4 potentially novel associations; 20 were identified by both logistic and other nonlinear/ensemble models, and 5 solely by the latter. All optimal models outperformed the CDC prediabetes screening tool (P < 0.05). Discussion Combined use of feature selection and machine learning increased predictive performance, outperforming the recommended screening tool. A range of predictors of prediabetes was identified. Conclusion This work demonstrated the value of combining feature selection with machine learning to identify a wide range of predictors that could enhance prediabetes prediction and clinical decision-making.
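The abstract ranks models by AUROC, reported as a percentage (64.40% for the CDC tool, at least 70% for the optimal models). A minimal sketch, assuming binary labels and one predicted risk per participant: the AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and example data are hypothetical; multiply the result by 100 to match the abstract's percentage scale.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve computed as the Mann-Whitney probability.

    Equals P(score of a random positive > score of a random negative),
    counting ties as one half. O(n_pos * n_neg), which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predicted risks and labels (1 = prediabetes, 0 = not)
    risk = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
    label = [1, 1, 1, 0, 0, 0]
    print(f"AUROC = {auroc(risk, label):.3f}")  # 8 of 9 pairs concordant
```

Here 8 of the 9 positive-negative pairs are correctly ordered, so the AUROC is 0.889 (88.9%).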


2020 ◽  
pp. 112972982095473
Author(s):  
Gunilla Welander ◽  
Birgitta Sigvant

Background: All Swedish dialysis units register data on vascular access in the Swedish Renal Registry (SRR). This study assessed the external and internal validity of vascular access data in the SRR and its use as a tool in clinical practice. Methods: For external validation, all procedures for placed fistulas and open and endovascular reinterventions registered in the SRR from 2011 to 2017 were cross-matched with data from the Swedish National Patient Registry. A two-stage sampling procedure selected 12 of 60 dialysis units for internal validation. Data on current vascular access for 10 randomly selected patients at each unit were compared with medical record data. SRR data on placed fistulas from 2017 were cross-checked with data from local surgical units. Registrations of central venous catheters (CVCs) as temporary or permanent were used as a proxy for clinical utilization of the registry and analyzed separately. Results: External validity increased from 74% to 83% during the observation period. In all, 1037 data points were used in internal validation, with a 95% match between SRR registrations and medical records. Registrations of CVCs, fistulas, and interventions were reliable, with few missing data or mismatches. The vascular access type used to initiate hemodialysis was missing or incorrect in either the SRR or the medical records for 14 of 120 patients. Registrations of placed fistulas in 2017 matched in all but four of 135 cases (the four being in the pre-dialysis stage). Some 35% of the CVCs validated (n = 49) at 7 of 12 units were not categorized as temporary or permanent. Conclusion: The SRR provides a reliable resource on current vascular access care.
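Cross-matching two registries can be sketched as a set lookup. This is a minimal illustration under stated assumptions, not the study's linkage procedure: each procedure is a hypothetical `(patient_id, procedure_date, procedure_type)` tuple, records match only on all three fields, and "external validity" is taken to be the share of procedures in the reference registry that are also found in the SRR, per year. The abstract does not give the exact definition or matching rules.

```python
from collections import defaultdict
from datetime import date

# Hypothetical record format: (patient_id, procedure_date, procedure_type)
Record = tuple[str, date, str]


def match_rate_by_year(registry: set[Record], reference: set[Record]) -> dict[int, float]:
    """Share of reference-source procedures also present in the registry, per year.

    Assumes an exact match on all three fields; real linkage may need a
    tolerance window on dates or harmonized procedure codes.
    """
    found: dict[int, int] = defaultdict(int)
    total: dict[int, int] = defaultdict(int)
    for rec in reference:
        year = rec[1].year
        total[year] += 1
        if rec in registry:
            found[year] += 1
    return {year: found[year] / total[year] for year in sorted(total)}


if __name__ == "__main__":
    reference = {
        ("p1", date(2011, 3, 1), "fistula"),
        ("p2", date(2011, 5, 9), "endovascular"),
        ("p3", date(2017, 2, 4), "fistula"),
        ("p4", date(2017, 8, 20), "open"),
    }
    registry = {
        ("p1", date(2011, 3, 1), "fistula"),
        ("p3", date(2017, 2, 4), "fistula"),
        ("p4", date(2017, 8, 20), "open"),
    }
    print(match_rate_by_year(registry, reference))  # {2011: 0.5, 2017: 1.0}
```

Reporting the rate per year mirrors how the abstract describes validity changing over the observation period; an exact-key match is the simplest choice and would likely understate agreement if dates were recorded slightly differently in the two sources.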


2016 ◽  
Vol 16 (1) ◽  
Author(s):  
George O. Agogo ◽  
Hilko van der Voet ◽  
Pieter van ’t Veer ◽  
Pietro Ferrari ◽  
David C. Muller ◽  
...  

2021 ◽  
Vol 11 ◽  
Author(s):  
Mingbin Hu ◽  
Xiancai Li ◽  
Weiguo Gu ◽  
Jinhong Mei ◽  
Dewu Liu ◽  
...  

Objectives: We aimed to establish and validate a competing risk nomogram for estimating the risk of cancer-specific death (CSD) in patients with maxillary sinus carcinoma (MSC). Methods: Data on patients with MSC were abstracted from the Surveillance, Epidemiology, and End Results (SEER) database and from the First Affiliated Hospital of Nanchang University (China). Risk predictors of CSD were identified using the cumulative incidence function (CIF) and the Fine-Gray proportional hazards model, through univariate and multivariate analyses implemented in R. A nomogram was then constructed and validated to estimate the 3- and 5-year probability of CSD. Results: Overall, 478 patients with MSC were enrolled from the SEER database, with 3- and 5-year cumulative incidences of CSD after diagnosis of 42.1% and 44.3%, respectively. The Fine-Gray analysis showed that age, histological type, N stage, grade, surgery, and T stage were independent predictors of CSD in the SEER training data set (n = 343). These variables were incorporated into the prediction nomogram. The nomogram was well calibrated and showed good discriminative ability in the internal validation data set (n = 135) drawn from SEER and in the external validation data set (n = 200), with concordance indexes (C-indexes) of 0.810, 0.761, and 0.755 for predicting 3- and 5-year MSC-specific mortality in the training, internal validation, and external validation cohorts, respectively. Conclusions: The competing risk nomogram constructed here proved to be an effective tool for estimating CSD in patients with MSC.
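The abstract uses the cumulative incidence function because deaths from other causes compete with cancer-specific death. A minimal sketch, assuming the nonparametric Aalen-Johansen estimator for a single cause and a hypothetical event coding (0 = censored, 1 = CSD, 2 = other-cause death); it does not reproduce the Fine-Gray regression or the nomogram itself, and the function name is illustrative.

```python
from collections import Counter


def cumulative_incidence(times, events, cause=1):
    """Aalen-Johansen cumulative incidence for one cause under competing risks.

    times  -- follow-up times
    events -- 0 = censored, any other integer = type of event observed
    cause  -- the event type whose cumulative incidence is returned
    Returns a list of (time, CIF) steps at each time any event occurs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0  # overall Kaplan-Meier survival (any event) just before time t
    cif = 0.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        counts = Counter()
        j = i
        while j < len(data) and data[j][0] == t:  # group ties at time t
            counts[data[j][1]] += 1
            j += 1
        d_all = sum(c for e, c in counts.items() if e != 0)
        if d_all > 0:
            # Hazard of `cause` at t, weighted by probability of being event-free
            cif += surv * counts[cause] / n_at_risk
            surv *= 1 - d_all / n_at_risk
            steps.append((t, cif))
        n_at_risk -= j - i  # events and censorings at t leave the risk set
        i = j
    return steps


if __name__ == "__main__":
    # Hypothetical data: 1 = cancer-specific death, 2 = other death, 0 = censored
    follow_up = [1, 2, 2, 3, 4, 5]
    status = [1, 2, 0, 1, 0, 2]
    for t, cif in cumulative_incidence(follow_up, status, cause=1):
        print(f"t={t}  CIF={cif:.3f}")
```

The final CSD cumulative incidence here is 7/18 ≈ 0.389. Treating other-cause deaths as censored and taking 1 minus Kaplan-Meier would instead give 4/9 ≈ 0.444, which is the overestimate the competing risk approach avoids.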


Author(s):  
Zhiyi Wang ◽  
Jie Weng ◽  
Zhongwang Li ◽  
Ruonan Hou ◽  
Lebin Zhou ◽  
...  

Background: The COVID-19 virus is an emerging virus that has spread rapidly worldwide. This study aimed to establish an effective diagnostic nomogram for patients with suspected COVID-19 pneumonia. Methods: We used LASSO regression and multivariable logistic regression to explore the predictive factors associated with COVID-19 pneumonia, and established a diagnostic nomogram for COVID-19 pneumonia using multivariable regression. The nomogram was assessed in internal and external validation data sets. We also plotted decision curves and a clinical impact curve to evaluate its clinical usefulness. Results: The nomogram included the following predictive factors: epidemiological history, wedge-shaped or fan-shaped lesions parallel to or near the pleura, bilateral lower lobe involvement, ground-glass opacities, crazy-paving pattern, and white blood cell (WBC) count. In the primary cohort, the C-statistic for predicting the probability of COVID-19 pneumonia was 0.967, higher than the C-statistic (0.961) of the initial viral nucleic acid nomogram, which was established using univariable regression. The C-statistic was 0.848 in the external validation cohort. Calibration curves showed good agreement for the predicted probabilities in both the internal and external validation cohorts. The nomogram thus performed well in terms of both discrimination and calibration. Moreover, the decision curve and clinical impact curve indicated clinical benefit for patients with COVID-19 pneumonia. Conclusion: Our nomogram can be used to predict COVID-19 pneumonia accurately and favourably.
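Decision curves and clinical impact curves are both built from counts at a chosen risk threshold. A minimal sketch, assuming the standard decision-curve definition of net benefit, NB = TP/n - FP/n × p_t / (1 - p_t), and that a clinical impact curve plots, per 1000 patients, how many are classified high risk and how many of those have the outcome. The function name and data are hypothetical; a full curve repeats this calculation over a grid of thresholds.

```python
def decision_curve_point(risks, outcomes, threshold, per=1000):
    """Return (net_benefit, n_high_risk, n_high_risk_with_event) at one threshold.

    Net benefit = TP/n - FP/n * threshold / (1 - threshold), where patients
    with predicted risk >= threshold are treated as positive. The two counts,
    scaled to `per` patients, are what a clinical impact curve plots.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    net_benefit = tp / n - fp / n * threshold / (1 - threshold)
    return net_benefit, (tp + fp) * per / n, tp * per / n


if __name__ == "__main__":
    # Hypothetical predicted probabilities and confirmed diagnoses (1 = pneumonia)
    risks = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
    outcomes = [1, 1, 0, 1, 0, 0, 0, 0]
    nb, high, high_event = decision_curve_point(risks, outcomes, threshold=0.5)
    print(f"net benefit={nb:.3f}  high risk/1000={high:.0f}  with event/1000={high_event:.0f}")
```

At a threshold of 0.5 this gives a net benefit of 0.125, with 375 of every 1000 patients classified high risk and 250 of them having the outcome. For comparison, treating everyone gives 3/8 - 5/8 × 1 = -0.25 at this threshold, so the model adds benefit over the treat-all strategy.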


Agriculture ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 29
Author(s):  
Odile Carisse ◽  
Mamadou Lamine Fall

Powdery mildew (Podosphaera aphanis) is a major disease in day-neutral strawberry. Yield losses of up to 30% have been observed in Eastern Canada. Currently, management of powdery mildew is mostly based on fungicide applications without consideration of risk. The objective of this study is to use P. aphanis inoculum, host ontogenic resistance, and weather predictors to forecast the risk of strawberry powdery mildew using CART models (classification trees). The data used to build the trees were collected in 2006, 2007, and 2008 at one experimental farm and six commercial farms located in two main strawberry-production areas, while external validation data were collected at the same experimental farm in 2015, 2016, and 2018. Data on the proportion of leaf area diseased (PLAD) were grouped into four severity classes (1: PLAD = 0; 2: 0 < PLAD < 5%; 3: 5% < PLAD < 15%; 4: PLAD > 15%), for a total of 681 and 136 cases for training and external validation, respectively. From the initial 92 weather variables, 21 were selected following clustering. The tree with the best balance between the number of predictors and accuracy was built with airborne inoculum concentration and number of susceptible leaves on the day of sampling, and mean relative humidity, mean daily number of hours at temperatures between 18 and 30 °C, and mean daily number of hours at saturation vapor pressure between 10 and 25 mmHg during the previous 6 days. For the training, internal validation, and external validation datasets, the sensitivity, specificity, and accuracy ranged from 0.70 to 0.90, 0.87 to 0.98, and 0.82 to 0.97, respectively. The classification rules to estimate strawberry powdery mildew risk can easily be implemented in disease decision support systems, so that fungicides are applied only when necessary, avoiding both preventable yield losses and unnecessary treatments.
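The severity grouping is a simple threshold rule. A minimal sketch, assuming PLAD is expressed as a percentage: the abstract uses strict inequalities on both sides of 5% and 15%, so a PLAD of exactly 5% or 15% is not assigned to any class there. This sketch assumes such boundary values go to the lower class; that choice, and the function name `plad_class`, are hypothetical.

```python
def plad_class(plad: float) -> int:
    """Map proportion of leaf area diseased (PLAD, in %) to severity class 1-4.

    Class limits follow the abstract: 1 = 0; 2 = >0 and <5; 3 = >5 and <15;
    4 = >15. ASSUMPTION: PLAD of exactly 5 or 15 is placed in the lower
    class, because the abstract leaves these boundary values unassigned.
    """
    if plad < 0 or plad > 100:
        raise ValueError("PLAD must be a percentage between 0 and 100")
    if plad == 0:
        return 1
    if plad <= 5:
        return 2
    if plad <= 15:
        return 3
    return 4


if __name__ == "__main__":
    print([plad_class(x) for x in (0, 2.5, 5, 10, 40)])  # [1, 2, 2, 3, 4]
```

Making the boundary rule explicit in code matters when the classes are used as CART targets, because a few observations sitting exactly at 5% or 15% would otherwise be dropped or assigned inconsistently.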


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Huizhen Ye ◽  
Youyuan Chen ◽  
Peiyi Ye ◽  
Yu Zhang ◽  
Xiaoyi Liu ◽  
...  

Abstract Background Chronic kidney disease (CKD) is a common health challenge. Several risk models predict adverse CKD outcomes, but few focus on East Asian populations. We therefore developed a simple, intuitive nomogram to predict 3-year adverse outcomes in East Asian patients with CKD. Methods The development and internal validation of the prediction models used data from the CKD-ROUTE study in Japan, while the external validation set used data collected at the First People’s Hospital of Foshan in southern China from January 2013 to December 2018. Models were developed using the Cox proportional hazards model and a nomogram in SPSS and R. Model discrimination, calibration, and clinical value were then tested in R. Results The development and internal validation data sets included 797 patients (191 with progression [23.96%]) and 341 patients (89 with progression [26.10%]), respectively, while 297 patients (108 with progression [36.36%]) were included in the external validation data set. The nomogram was developed with age, eGFR, haemoglobin, blood albumin, and dipstick proteinuria to predict the 3-year probability of remaining free of adverse outcomes. The C-statistics of this nomogram were 0.90 (95% CI, 0.89–0.92) for the development data set, 0.91 (95% CI, 0.89–0.94) for the internal validation data set, and 0.83 (95% CI, 0.78–0.88) for the external validation data set. Calibration and decision curve analyses also indicated good performance. Conclusion This visualized predictive nomogram could accurately predict 3-year adverse CKD outcomes in East Asian patients with CKD, providing an easy-to-use and widely applicable tool for clinical practitioners.
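For a Cox-based nomogram, the C-statistic is commonly computed as Harrell's concordance index; the abstract does not say which estimator was used (a time-dependent C at 3 years is another option), so treat this as a minimal sketch under that assumption. It takes follow-up times, an event indicator, and a risk score such as the Cox linear predictor; pairs with tied times are skipped for simplicity, and the function name and data are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- follow-up time for each patient
    events      -- 1 if the adverse outcome was observed, 0 if censored
    risk_scores -- higher score = higher predicted risk (e.g. Cox linear predictor)
    A pair is usable when the patient with the shorter time had an event; it is
    concordant when that patient also has the higher risk score. Tied scores
    count one half; pairs with tied times are skipped for simplicity.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


if __name__ == "__main__":
    # Hypothetical follow-up (years), progression indicator, and risk scores
    print(harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.5, 0.7, 0.1]))
```

In this toy example 4 of the 5 usable pairs are concordant, giving C = 0.8. The double loop is O(n²), which is acceptable for cohorts of a few hundred patients like those described above.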


2020 ◽  
Author(s):  
Zhiyi Wang ◽  
Jie Weng ◽  
Zhongwang Li ◽  
Ruonan Hou ◽  
Lebin Zhou ◽  
...  

Abstract Background The COVID-19 virus is an emerging virus associated with severe respiratory illness; it was first detected in December 2019 and has spread rapidly worldwide. The aim of this study was to establish an effective diagnostic nomogram for patients with suspected COVID-19 pneumonia. Methods We used LASSO regression and multivariable logistic regression to explore the predictive factors associated with COVID-19 pneumonia, and established a diagnostic nomogram for COVID-19 pneumonia using multivariable regression. The nomogram was assessed in internal and external validation data sets. We also plotted decision curves and a clinical impact curve to evaluate its clinical usefulness. Results The nomogram included the following predictive factors: epidemiological history, wedge-shaped or fan-shaped lesions parallel to or near the pleura, bilateral lower lobe involvement, ground-glass opacities, crazy-paving pattern, and white blood cell (WBC) count. In the primary cohort, the C-statistic for predicting the probability of COVID-19 pneumonia was 0.967, higher than the C-statistic (0.961) of the initial viral nucleic acid nomogram, which was established using univariable regression. The C-statistic was 0.848 in the external validation cohort. Calibration curves showed good agreement for the predicted probabilities in both the internal and external validation cohorts. The nomogram thus performed well in terms of both discrimination and calibration. Moreover, the decision curve and clinical impact curve indicated clinical benefit for patients with COVID-19 pneumonia. Conclusion Our nomogram can be used to predict COVID-19 pneumonia accurately and favourably.


Epidemiology ◽  
2007 ◽  
Vol 18 (3) ◽  
pp. 321-328 ◽  
Author(s):  
Robert H. Lyles ◽  
Fan Zhang ◽  
Carolyn Drews-Botsch
