The performance of multiple imputation for missing covariates relative to complete case analysis

Background: With the increasing use of secondary data in epidemiological studies, the question of how to manage missing data has become more relevant. Our study applied imputation techniques to data on traumatic spinal cord injuries, a medical problem for which data are generally sporadic. Traumatic spinal cord injuries due to blunt force cause widespread physiological impairments and both medical and non-medical problems. The effects of spinal cord injuries are a burden not only to the victims but also to their families and to a country's entire health system. This study also evaluated the causes of traumatic spinal cord injuries in patients admitted to the University Teaching Hospital and the factors associated with clinical complications in these patients. Methods: The study used data from the medical records of patients who were admitted to the University Teaching Hospital in Lusaka, Zambia. Patients presenting with traumatic spinal cord injuries between 1st January 2013 and 31st December 2017 were included in the study. The data were first analysed using complete case analysis; multiple imputation techniques were then applied to account for the missing data. Thereafter, both descriptive and inferential analyses were performed on the imputed data. Results: During the study period, a total of 176 patients were identified as having suffered spinal cord injuries. Road traffic accidents accounted for 56% (101) of the injuries. Clinical complications suffered by these patients included paralysis, death, bowel and bladder dysfunction and pressure sores, among others. Eighty-eight (50%) patients had paralysis. Patients with cervical spine injuries had 87% lower odds of suffering clinical complications than patients with thoracic spine injuries (OR=0.13, 95% CI [0.08, 0.22], p<.0001). Being paraplegic at discharge increased the odds of developing a clinical complication about eightfold (OR=8.01, 95% CI [2.74, 23.99], p<.001). Undergoing an operation increased the odds of having a clinical complication (OR=3.71, 95% CI [1.99, 6.88], p<.0001). A patient who presented with Frankel Grade C or E had a 96% reduction in the odds of having a clinical complication (OR=0.04, 95% CI [0.02, 0.09] and [0.02, 0.12] respectively, p<.0001) compared to a patient who presented with Frankel Grade A. Conclusion: A comparison of estimates obtained from complete case analysis and from multiple imputation revealed that when many values are missing, estimates obtained from complete case analysis are unreliable and lack power. Efforts should be made to handle missing values with methods such as multiple imputation. The most common cause of traumatic spinal cord injuries was road traffic accidents. Findings suggest that paralysis had the greatest negative effect on clinical complications. As Frankel Grade increased from A to E, patients became less likely to develop clinical complications. No evidence of an association was found between age or sex and developing a clinical complication.
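
As a reading aid, the percentage statements in the abstract above follow from the reported odds ratios through the standard conversion between an odds ratio and a percentage change in odds. The worked check below uses only figures quoted in the abstract; it is an illustration and not part of the authors' analysis.

```latex
\[
\%\ \text{change in odds} = (\mathrm{OR} - 1) \times 100\%
\]
\[
\mathrm{OR} = 0.13:\ (0.13 - 1) \times 100\% = -87\%, \qquad
\mathrm{OR} = 0.04:\ (0.04 - 1) \times 100\% = -96\%, \qquad
\mathrm{OR} = 3.71:\ (3.71 - 1) \times 100\% = +271\%
\]
```

An odds ratio below 1 therefore reads as a reduction in odds, and an odds ratio above 1 as an increase.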

Missing data reporting in clinical pharmacy research

American Journal of Health-System Pharmacy, 2019, Vol 76 (24), pp. 2048-2052. DOI: 10.1093/ajhp/zxz245

Author(s): Sujita W Narayan, Kar Yu Ho, Jonathan Penm, Barbara Mintzes, Ardalan Mirzaei, ...

Keyword(s): Missing Data, Multiple Imputation, Clinical Pharmacy, Case Analysis, The United States, Complete Case Analysis, Case Control Studies, Complete Case, Cross Sectional, Indicator Method

Purpose: This study aimed to document the ways by which missing data were handled in clinical pharmacy research to provide an insight into the amount of attention paid to the importance of missing data in this field of research. Methods: Our cross-sectional descriptive report evaluated 10 journals affiliated with pharmacy organizations in the United States, Canada, the United Kingdom, and Australia. Randomized controlled trials, cohort studies, case-control studies, and cross-sectional studies published in 2018 were included. The primary outcome measure was the proportion of studies that reported the handling of missing data in their methods or results. Results: A total of 178 studies were included in the analysis. Of these, 19.7% (n = 35) mentioned missing data either in their methods (3.4%, n = 6), results (15.2%, n = 27), or in both sections (1.1%, n = 2). Only 4.5% (n = 8) of the studies mentioned how they handled missing data, the most common method being multiple imputation (n = 3), followed by indicator (n = 2), complete case analysis (n = 2), and simple imputation (n = 1). One study using multiple imputation and both studies using an indicator method also combined other strategies to account for missing data. One study only used complete case analysis for subgroup analysis, and the other study only used this method if a specific baseline variable was missing. Conclusions: Very few studies in clinical pharmacy literature report any handling of missing data. This has the potential to lead to biased results. We advocate that researchers should report how missing data were handled to increase the transparency of findings and minimize bias.

How handling missing data may impact conclusions: A comparison of six different imputation methods for categorical questionnaire data

SAGE Open Medicine, 2019, Vol 7, pp. 205031211882291. DOI: 10.1177/2050312118822912. Cited by ~7

Author(s): Marianne Riksheim Stavseth, Thomas Clausen, Jo Røislien

Keyword(s): Missing Data, Regression Model, Sample Size, Multiple Imputation, Random Forests, Multiple Correspondence Analysis, Case Analysis, Complete Case Analysis, Complete Case, Imputation Methods

Objectives: Missing data are a recurrent issue in many fields of medical research, particularly in questionnaires. The aim of this article is to describe and compare six conceptually different multiple imputation methods, alongside the commonly used complete case analysis, and to explore whether the choice of methodology for handling missing data might affect clinical conclusions drawn from a regression model when data are categorical. Methods: In addition to the commonly used complete case analysis, we tested the following six imputation methods: multiple imputation using expectation–maximization with bootstrapping, multiple imputation using multiple correspondence analysis, multiple imputation using latent class analysis, multiple hot deck imputation, and multivariate imputation by chained equations with two different model specifications: logistic regression and random forests. The methods were tested on real data from a questionnaire-based study in the Norwegian opioid maintenance treatment programme. Results: All methods performed relatively well when the sample size was large (n = 1000). For a smaller sample size (n = 200), the regression estimates depended heavily on the level of missingness. When the amount of missing data was ⩾20%, complete case analysis, hot deck and random forests in particular gave biased estimates with too low coverage. Multiple imputation using multiple correspondence analysis had the best overall performance. Conclusion: The choice of method for handling missing data has a significant impact on the clinical interpretation of the accompanying statistical analyses. With missing data, the choice of whether to impute or not, and the choice of imputation method, can influence the clinical conclusions drawn from a regression model and should therefore be given sufficient consideration.
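
Whichever multiple imputation method produces the completed datasets, the per-dataset regression estimates are conventionally combined with Rubin's rules. The abstract above does not state its pooling procedure, so the following is only a minimal sketch of that standard step. The function name `pool_rubin` and all numbers are hypothetical and chosen purely for illustration.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimated in each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance
    return q_bar, math.sqrt(t)


# Hypothetical results from m = 5 imputed datasets (illustrative values only).
estimates = [0.50, 0.60, 0.40, 0.55, 0.45]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]

pooled_estimate, pooled_se = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled_estimate:.3f}, pooled SE = {pooled_se:.3f}")
```

The between-imputation term is what lets the pooled standard error reflect uncertainty about the missing values. A single imputation has no such term, which is one reason it tends to understate uncertainty.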

Is using multiple imputation better than complete case analysis for estimating a prevalence (risk) difference in randomized controlled trials when binary outcome observations are missing?

Trials, 2016, Vol 17 (1). DOI: 10.1186/s13063-016-1473-3. Cited by ~15

Author(s): Mavuto Mukaka, Sarah A. White, Dianne J. Terlouw, Victor Mwapasa, Linda Kalilani-Phiri, ...

Keyword(s): Randomized Controlled Trials, Multiple Imputation, Case Analysis, Risk Difference, Complete Case Analysis, Controlled Trials, Binary Outcome, Complete Case, Randomized Controlled, Better Than

Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values

Statistics in Medicine, 2010, Vol 29 (28), pp. 2920-2931. DOI: 10.1002/sim.3944. Cited by ~293

Author(s): Ian R. White, John B. Carlin

Keyword(s): Multiple Imputation, Case Analysis, Complete Case Analysis, Complete Case

A Numerical Comparison of Multiple Imputation Method with Complete Case Analysis When Missing on the Response variable Depends on an Intermediate Variable

Japanese Journal of Biometrics, 2002, Vol 23 (2), pp. 67-80. DOI: 10.5691/jjb.23.67

Author(s): Sachio Ogawa, Yutaka Matsuyama, Tosiya Sato

Keyword(s): Multiple Imputation, Case Analysis, Imputation Method, Complete Case Analysis, Numerical Comparison, Complete Case, Response Variable, Intermediate Variable, Multiple Imputation Method

Weighted Optimization with Thresholding for Complete-Case Analysis

Statistical Learning of Complex Data - Studies in Classification, Data Analysis, and Knowledge Organization, 2019, pp. 143-151. DOI: 10.1007/978-3-030-21140-0_15

Author(s): Graziano Vernizzi, Miki Nakai

Keyword(s): Case Analysis, Complete Case Analysis, Complete Case, Weighted Optimization

Likelihood Approach for Bayesian Logistic Weighted Model

Cihan University-Erbil Scientific Journal, 2020, Vol 4 (2), pp. 9-12. DOI: 10.24086/cuesj.v4n2y2020.pp9-12

Author(s): Dler H. Kadir

Keyword(s): Cognitive Development, Case Analysis, Inverse Probability Weighting, Complete Case Analysis, Parameter Estimates, Probability Weighting, Complete Case, Inverse Probability, Very Preterm, Weighted Model

Increasing the response rate and minimizing non-response are the primary challenges facing researchers who conduct longitudinal and cohort research. This is most obvious in the area of paediatric medicine. When data are missing, complete case analysis can give biased findings. Inverse Probability Weighting (IPW) is one of many available approaches for reducing the bias of a complete case analysis. Here, each complete case is weighted by the inverse of its probability of being a complete case. The data for this work were collected from the neonatal intensive care unit at Erbil maternity hospital for the years 2012 to 2017. In total, 570 babies (288 males and 282 females) were born very preterm. The aim of this paper is to apply inverse probability weighting to a Bayesian logistic model of developmental outcome. The Mental Development Index (MDI) is used to assess the cognitive development of those born very preterm. Almost half of the information for the babies was missing, meaning that we do not know whether or not they have cognitive development issues. The posterior weighted model gave greater precision, with smaller standard deviations of the parameter estimates, than the frequentist analysis.
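
The weighting idea in the abstract above can be illustrated with a small sketch. It assumes, for illustration only, that the probability of being a complete case depends on a single observed grouping variable and can be estimated by the observed fraction in each group. It is not the Bayesian logistic model used in the paper. The function names and the toy records are hypothetical.

```python
from collections import defaultdict


def complete_case_mean(records):
    """Unweighted mean of the outcome over records where it was observed."""
    values = [y for _, y in records if y is not None]
    return sum(values) / len(values)


def ipw_mean(records):
    """Inverse-probability-weighted mean of the outcome.

    records: list of (group, outcome) pairs, with outcome None when missing.
    Each complete case is weighted by 1 / P(complete | group), where the
    probability is estimated as the observed fraction within its group.
    """
    totals = defaultdict(int)
    observed = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        if outcome is not None:
            observed[group] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for group, outcome in records:
        if outcome is None:
            continue  # incomplete cases contribute only through the weights
        p_complete = observed[group] / totals[group]
        weight = 1.0 / p_complete
        weighted_sum += weight * outcome
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical toy data: group "a" is fully observed, while half of
# group "b" is missing, so group "b" is under-represented among complete cases.
records = [
    ("a", 1), ("a", 1), ("a", 1), ("a", 0),
    ("b", 0), ("b", 0), ("b", None), ("b", None),
]

print("complete case mean:", complete_case_mean(records))
print("IPW mean:", ipw_mean(records))
```

In this toy example the complete case mean over-represents group "a". The weights restore each group to its share of the full sample, under the stated assumption that missingness depends only on the group.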

Weighted multiple imputation of ethnicity data that are missing not at random in primary care databases

International Journal for Population Data Science, 2017, Vol 1 (1). DOI: 10.23889/ijpds.v1i1.54

Author(s): Tra My Pham, Irene Petersen, James Carpenter, Tim Morris

Keyword(s): Primary Care, Missing Data, Multiple Imputation, Simulation Study, Case Analysis, Missing At Random, Complete Case, Missing Not At Random, Health Records, Ethnicity Data

Background: Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research including ethnicity is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be due to the fact that ethnicity data in primary care are likely to be missing not at random. Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables. Methods: Weighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods including complete case analysis and single imputation. Results: While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe compared to MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity. Conclusions: Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.
