Estimation Bias in Complete-Case Analysis in Crossover Studies with Missing Data

2011 ◽  
Vol 40 (5) ◽  
pp. 812-827 ◽  
Author(s):  
Fang Liu


2020 ◽  
Vol 189 (12) ◽  
pp. 1583-1589
Author(s):  
Rachael K Ross ◽  
Alexander Breskin ◽  
Daniel Westreich

Abstract When estimating causal effects, careful handling of missing data is needed to avoid bias. Complete-case analysis is commonly used in epidemiologic analyses. Previous work has shown that covariate-stratified effect estimates from complete-case analysis are unbiased when missingness is independent of the outcome conditional on the exposure and covariates. Here, we assess the bias of complete-case analysis for adjusted marginal effects when confounding is present under various causal structures of missing data. We show that estimation of the marginal risk difference requires an unbiased estimate of the unconditional joint distribution of confounders and any other covariates required for conditional independence of missingness and outcome. The dependence of missing data on these covariates must be considered to obtain a valid estimate of the covariate distribution. If none of these covariates are effect-measure modifiers on the absolute scale, however, the marginal risk difference will equal the stratified risk differences and the complete-case analysis will be unbiased when the stratified effect estimates are unbiased. Estimation of unbiased marginal effects in complete-case analysis therefore requires close consideration of causal structure and effect-measure modification.
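The role of the covariate distribution can be illustrated with a small exact calculation. The sketch below is not taken from the paper. The stratum-specific risk differences, the covariate prevalence, and the completeness probabilities are all hypothetical, and missingness is assumed to depend only on a single binary covariate Z, so that the stratified risk differences are unaffected while the complete-case distribution of Z is distorted.

```python
# Hypothetical numbers for illustration only; none come from the abstract.
# Z is a binary covariate, and missingness is assumed to depend on Z alone.

def marginal_rd(stratum_rd, z_dist):
    """Standardize stratum-specific risk differences over a covariate distribution."""
    return sum(stratum_rd[z] * z_dist[z] for z in stratum_rd)

def complete_case_z_dist(z_dist, p_complete):
    """Distribution of Z among complete cases when P(complete | Z) varies with Z."""
    total = sum(z_dist[z] * p_complete[z] for z in z_dist)
    return {z: z_dist[z] * p_complete[z] / total for z in z_dist}

z_dist = {0: 0.5, 1: 0.5}        # assumed true P(Z = z)
p_complete = {0: 0.9, 1: 0.3}    # assumed P(record is complete | Z = z)
cc_dist = complete_case_z_dist(z_dist, p_complete)

# Case 1: Z modifies the effect on the absolute scale.
rd_modified = {0: 0.10, 1: 0.30}
true_rd = marginal_rd(rd_modified, z_dist)
cc_rd = marginal_rd(rd_modified, cc_dist)

# Case 2: no effect-measure modification on the absolute scale.
rd_constant = {0: 0.20, 1: 0.20}
true_rd_const = marginal_rd(rd_constant, z_dist)
cc_rd_const = marginal_rd(rd_constant, cc_dist)

print(f"with modification:    true {true_rd:.3f}, complete-case {cc_rd:.3f}")
print(f"without modification: true {true_rd_const:.3f}, complete-case {cc_rd_const:.3f}")
```

Under these assumed numbers, standardizing over the complete-case distribution of Z gives a marginal risk difference of 0.15 against a true value of 0.20. When the stratum-specific risk differences are equal, the two agree regardless of how Z is distributed among complete cases.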


2019 ◽  
Vol 76 (24) ◽  
pp. 2048-2052
Author(s):  
Sujita W Narayan ◽  
Kar Yu Ho ◽  
Jonathan Penm ◽  
Barbara Mintzes ◽  
Ardalan Mirzaei ◽  
...  

Abstract Purpose This study aimed to document the ways by which missing data were handled in clinical pharmacy research to provide an insight into the amount of attention paid to the importance of missing data in this field of research. Methods Our cross-sectional descriptive report evaluated 10 journals affiliated with pharmacy organizations in the United States, Canada, the United Kingdom, and Australia. Randomized controlled trials, cohort studies, case-control studies, and cross-sectional studies published in 2018 were included. The primary outcome measure was the proportion of studies that reported the handling of missing data in their methods or results. Results A total of 178 studies were included in the analysis. Of these, 19.7% (n = 35) mentioned missing data either in their methods (3.4%, n = 6), results (15.2%, n = 27), or in both sections (1.1%, n = 2). Only 4.5% (n = 8) of the studies mentioned how they handled missing data, the most common method being multiple imputation (n = 3), followed by indicator (n = 2), complete case analysis (n = 2), and simple imputation (n = 1). One study using multiple imputation and both studies using an indicator method also combined other strategies to account for missing data. One study only used complete case analysis for subgroup analysis, and the other study only used this method if a specific baseline variable was missing. Conclusions Very few studies in clinical pharmacy literature report any handling of missing data. This has the potential to lead to biased results. We advocate that researchers should report how missing data were handled to increase the transparency of findings and minimize bias.


2019 ◽  
Vol 7 ◽  
pp. 205031211882291 ◽  
Author(s):  
Marianne Riksheim Stavseth ◽  
Thomas Clausen ◽  
Jo Røislien

Objectives: Missing data is a recurrent issue in many fields of medical research, particularly in questionnaires. The aim of this article is to describe and compare six conceptually different multiple imputation methods, alongside the commonly used complete case analysis, and to explore whether the choice of methodology for handling missing data might impact clinical conclusions drawn from a regression model when data are categorical. Methods: In addition to the commonly used complete case analysis, we tested the following six imputation methods: multiple imputation using expectation–maximization with bootstrapping, multiple imputation using multiple correspondence analysis, multiple imputation using latent class analysis, multiple hot deck imputation and multivariate imputation by chained equations with two different model specifications: logistic regression and random forests. The methods are tested on real data from a questionnaire-based study in the Norwegian opioid maintenance treatment programme. Results: All methods performed relatively well when the sample size was large (n = 1000). For a smaller sample size (n = 200), the regression estimates depended heavily on the proportion of missing data. When the proportion of missing data was ⩾20%, complete case analysis, hot deck and random forests in particular had biased estimates with too low coverage. Multiple imputation using multiple correspondence analysis had the best overall performance. Conclusion: The choice of method for handling missing data has a significant impact on the clinical interpretation of the accompanying statistical analyses. With missing data, the choice of whether to impute or not, and the choice of imputation method, can influence the clinical conclusions drawn from a regression model and should therefore be given sufficient consideration.


2021 ◽  
Vol 11 (6) ◽  
pp. 249-262
Author(s):  
Sachit Ganapathy ◽  
Binukumar Bhaskarapillai ◽  
Shailendra Dandge

Background: National Family Health Survey-4 (NFHS-4) revealed a significant improvement in the percentage of complete immunization attained in India. Even though determinants of immunization coverage in India have been addressed by some studies, the impact of missing data in such large-scale surveys has not been accounted for previously. The present study aimed to identify the potential factors associated with immunization coverage in India using complete case analysis (CCA) and multiple imputation by chained equations (MICE). Materials and methods: We created a dichotomous immunization variable based on the status of all the vaccines given to the child. All relevant variables were summarized using appropriate descriptive statistics along with the proportion of missingness. Further, the MICE procedure was performed to impute the missing values after assessing the missing data mechanism. Multiple logistic regression accounting for the sampling weights was used to report odds ratio (OR) estimates and 95% confidence intervals (CI) for both the CCA and MICE analyses, which were then compared. Results: The percentage of children under five years of age who had total immunization was 69%. Further, we observed that female sex and rural habitation were associated with higher odds of being immunized in both CCA and MICE. Moreover, wealth index, number of antenatal visits, checkup after delivery and place of birth played an important role in immunization coverage. Conclusion: MICE provided more precise risk estimates for potential factors associated with vaccination coverage than CCA, although the major findings did not change, owing to the large sample size. Key words: Immunization, Health surveys, missing data, Logistic regression, complete case analysis, MICE.
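After multiple imputation, the estimates from the imputed datasets are commonly combined with Rubin's rules. The abstract does not say which pooling procedure was used, so the sketch below is an assumption rather than the authors' method, and the log odds ratios and variances in it are made up for illustration.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)           # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical log odds ratios and squared standard errors from m = 5 imputed datasets.
log_ors = [0.40, 0.50, 0.45, 0.55, 0.35]
variances = [0.010, 0.012, 0.011, 0.009, 0.008]

q_bar, total_var = pool_rubin(log_ors, variances)
odds_ratio = math.exp(q_bar)
print(f"pooled log OR {q_bar:.3f}, pooled SE {math.sqrt(total_var):.3f}, OR {odds_ratio:.3f}")
```

The pooled variance adds the between-imputation spread to the average within-imputation variance, so the reported standard error reflects the uncertainty caused by the missing values as well as ordinary sampling error.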


2019 ◽  
Vol 80 (4) ◽  
pp. 756-774
Author(s):  
David Goretzko ◽  
Christian Heumann ◽  
Markus Bühner

Exploratory factor analysis is a statistical method commonly used in psychological research to investigate latent variables and to develop questionnaires. Although such self-report questionnaires are prone to missing values, there is not much literature on this topic with regard to exploratory factor analysis—and especially the process of factor retention. Determining the correct number of factors is crucial for the analysis, yet little is known about how to deal with missingness in this process. Therefore, in a simulation study, six missing data methods (an expectation–maximization algorithm, predictive mean matching, Bayesian regression, random forest imputation, complete case analysis, and pairwise complete observations) were compared with respect to the accuracy of the parallel analysis chosen as retention criterion. Data were simulated for correlated and uncorrelated factor structures with two, four, or six factors; 12, 24, or 48 variables; 250, 500, or 1,000 observations and three different missing data mechanisms. Two different procedures combining multiply imputed data sets were tested. The results showed that no missing data method was always superior, yet random forest imputation performed best for the majority of conditions—in particular when parallel analysis was applied to the averaged correlation matrix rather than to each imputed data set separately. Complete case analysis and pairwise complete observations were often inferior to multiple imputation.


2021 ◽  
Author(s):  
Tinashe Mhike ◽  
Jim Todd ◽  
Mark Urassa ◽  
Neema Mosha

Abstract Background Population surveys and demographic studies are the gold standard for estimating HIV prevalence. However, non-response in these surveys is of major concern especially if it is not random and complete case analysis becomes an inappropriate method to analyse the data. Therefore, a comprehensive analysis that will account for the missing data must be used to obtain unbiased HIV prevalence estimates. Methods Serological samples were collected from participants who were resident in a Demographic Surveillance System (DSS) in Kisesa, Tanzania. HIV prevalence was estimated using three methods. Firstly, using the Complete case analysis (CCA), assuming data were Missing Completely at Random (MCAR). The other two methods, multiple imputations (MI) and inverse probability weighting (IPW), assumed that non-response was missing at random (MAR). For MI, a logistic regression model adjusting for age, sex, residence, and marital status was used to impute 20 datasets to re-estimate the HIV prevalence. Propensity for participating in the sero-survey and being tested for HIV given age, sex, residence, and marital status were generated using logistic regression models. Using the propensity scores, inverse probability weights were derived for participants who were tested for HIV. Results The overall CCA HIV prevalence estimate was 6.6% (95% CI: 6.0-7.2), with 5.4% (95% CI: 4.6-6.3) in males and 7.3% (95% CI: 6.6-8.1) in females. Using MI, the overall HIV prevalence was 6.8% (95% CI: 6.2-7.5), 6.2% (95% CI: 5.1-7.3) in males and 7.4% (95% CI: 6.6-8.2) in females. Using IPW the overall HIV prevalence was 6.7% (95% CI: 6.1-7.4), with 5.5% (95% CI: 4.7-6.5) in males and 7.7% (95% CI: 7.0-8.6) in females. HIV prevalence differed significantly between age groups (p<0.001), with the highest estimate in males aged 35-39 and females aged 40-44, and lowest in both males and females aged 15-19 years. Discussion The results showed that both MI and IPW are reliable methods for estimating HIV prevalence in the presence of missing data. MI is superior to CCA and the IPW approaches as it had smaller standard errors and narrower 95% confidence intervals. Therefore, we recommend use of MI in estimating HIV prevalence to address the problem of varied types of missing data. However, further research is needed to determine the bias in estimates from MI and IPW. Conclusions Complete case analysis underestimates HIV prevalence compared to methods that adjust for missing data. The best method to adjust for missing data in population surveys is through the use of multiple imputations.
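The inverse probability weighting step described above can be sketched as follows. This is a simplified illustration, not the authors' code. It replaces their logistic regression propensity model with the observed proportion tested within strata of a single hypothetical covariate, and the records are invented.

```python
from collections import defaultdict

def ipw_prevalence(records):
    """Estimate prevalence among everyone using inverse probability weights.

    records: list of (stratum, tested, positive) tuples, where positive is
    None for people who were not tested. The propensity of being tested is
    estimated as the tested proportion within each stratum, which is an
    assumption of this sketch rather than the model used in the study.
    """
    n_all = defaultdict(int)
    n_tested = defaultdict(int)
    for stratum, tested, _ in records:
        n_all[stratum] += 1
        n_tested[stratum] += int(tested)

    weighted_positive = 0.0
    weight_sum = 0.0
    for stratum, tested, positive in records:
        if not tested:
            continue  # untested people contribute only through the weights
        weight = n_all[stratum] / n_tested[stratum]  # 1 / estimated propensity
        weighted_positive += weight * int(positive)
        weight_sum += weight
    return weighted_positive / weight_sum

def complete_case_prevalence(records):
    """Unweighted prevalence among tested people only."""
    tested = [r for r in records if r[1]]
    return sum(int(r[2]) for r in tested) / len(tested)

# Hypothetical data: everyone in the "young" stratum is tested,
# but only half of the higher-prevalence "older" stratum is.
records = (
    [("young", True, False)] * 9 + [("young", True, True)] * 1
    + [("older", True, False)] * 3 + [("older", True, True)] * 2
    + [("older", False, None)] * 5
)

print(f"complete-case: {complete_case_prevalence(records):.3f}")
print(f"IPW:           {ipw_prevalence(records):.3f}")
```

With these invented records the complete-case estimate is 0.20 and the weighted estimate is 0.25, because each tested person in the under-tested stratum also stands in for one untested person. The correction only works if non-response is unrelated to HIV status within each stratum, which is the missing-at-random assumption stated in the abstract.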


2020 ◽  
Vol 4 (2) ◽  
pp. 9-12
Author(s):  
Dler H. Kadir

Increasing response rates and minimizing non-response are primary challenges for researchers conducting longitudinal and cohort research. This is most obvious in paediatric medicine. When data are missing, complete case analysis can bias the findings. Inverse Probability Weighting (IPW) is one of many available approaches for reducing the bias of a complete case analysis. Here, each complete case is weighted by the inverse of its probability of being a complete case. The data for this work were collected from the neonatal intensive care unit at Erbil maternity hospital for the years 2012 to 2017. In total, 570 babies (288 males and 282 females) were born very preterm. The aim of this paper is to apply inverse probability weighting to a Bayesian logistic model of developmental outcome. The Mental Development Index (MDI) is used to assess the cognitive development of those born very preterm. Almost half of the information for the babies was missing, meaning that we do not know whether or not they have cognitive development issues. The posterior weighted model gave greater precision, with smaller standard deviations of the parameter estimates, than the frequentist analysis.


Author(s):  
Tra My Pham ◽  
Irene Petersen ◽  
James Carpenter ◽  
Tim Morris

ABSTRACT Background Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research including ethnicity is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be due to the fact that ethnicity data in primary care are likely to be missing not at random. Objectives I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables. Methods Weighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods including complete case analysis and single imputation. Results While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe compared to MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity. Conclusions Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.


2012 ◽  
Vol 40 (6) ◽  
pp. 3031-3049 ◽  
Author(s):  
Hira L. Koul ◽  
Ursula U. Müller ◽  
Anton Schick
