Comparison of Methods for Handling Covariate Missingness in Propensity Score Estimation with a Binary Exposure

2020 ◽  
Author(s):  
Donna Coffman ◽  
Jiangxiu Zhou ◽  
Xizhen Cai

Abstract Background: Causal effect estimation with observational data is subject to bias due to confounding, which is often controlled for using propensity scores. One unresolved issue in propensity score estimation is how to handle missing values in covariates. Methods: Several approaches have been proposed for handling covariate missingness, including multiple imputation (MI), multiple imputation with missingness pattern (MIMP), and treatment mean imputation. However, other potentially useful approaches have not been evaluated, including single imputation (SI) + prediction error (PE), SI + PE + parameter uncertainty (PU), and generalized boosted modeling (GBM), a nonparametric approach to estimating propensity scores in which missing values are handled automatically during estimation by a surrogate split method. A simulation study was conducted to evaluate the performance of these approaches. Results: SI + PE, SI + PE + PU, MI, and MIMP performed almost equally well, and all performed better than treatment mean imputation and GBM in terms of bias; however, MI and MIMP account for the additional uncertainty introduced by imputing the missing values. Conclusions: Applying GBM to the incomplete data and relying on the surrogate split approach resulted in substantial bias. Imputation prior to implementing GBM is recommended.
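The imputation-then-weighting pipeline compared in this abstract can be sketched briefly. The following is a minimal numpy illustration, not the authors' simulation: the data-generating model, the regression-based imputation draws (repeated draws with residual noise, a crude stand-in for MI / SI + PE), and the gradient-ascent logistic fit are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, iters=1000, lr=1.0):
    """Logistic regression by plain gradient ascent (illustrative only)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (y - p) / len(y)
    return lambda Xn: 1.0 / (1.0 + np.exp(
        -np.column_stack([np.ones(len(Xn)), Xn]) @ w))

# Simulated data: covariate x confounds a binary exposure t and outcome y;
# the true causal effect of t on y is 1.0, and about 30% of x is missing.
n = 2000
x = rng.normal(size=n)
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
y = 1.0 * t + x + rng.normal(size=n)
miss = rng.random(n) < 0.3
x_obs = np.where(miss, np.nan, x)

# Imputation model: regress x on (1, t, y) among complete cases; each
# imputation draws missing x from the fit plus residual noise, so the
# draws reflect prediction error rather than a single best guess.
Z = np.column_stack([np.ones(n), t, y])
beta, *_ = np.linalg.lstsq(Z[~miss], x_obs[~miss], rcond=None)
resid_sd = np.std(x_obs[~miss] - Z[~miss] @ beta)

effects = []
for _ in range(10):                      # 10 imputed data sets
    xi = np.where(miss, Z @ beta + rng.normal(0, resid_sd, n), x_obs)
    ps = fit_logistic(xi.reshape(-1, 1), t)(xi.reshape(-1, 1))
    w = t / ps + (1 - t) / (1 - ps)      # inverse probability weights
    effects.append(np.sum(w * t * y) / np.sum(w * t)
                   - np.sum(w * (1 - t) * y) / np.sum(w * (1 - t)))

ate_mi = np.mean(effects)                # pooled point estimate
```

Averaging the per-data-set estimates gives the Rubin's-rules point estimate; the advantage of MI noted in the abstract is that it also pools the between-imputation variance.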


2020 ◽  
Vol 10 (1) ◽  
pp. 40
Author(s):  
Tomoshige Nakamura ◽  
Mihoko Minami

In observational studies, the existence of confounding variables must be attended to, and propensity score weighting methods are often used to eliminate their effects. Although many causal estimators based on propensity scores have been proposed, these estimators generally assume that the propensity scores are properly estimated. However, researchers have found that even a slight misspecification of the propensity score model can bias the estimated treatment effects. Model misspecification problems may occur in practice, and hence using a robust estimator for the causal effect is recommended. One such estimator is the subclassification estimator. Wang, Zhang, Richardson, & Zhou (2020) presented the conditions necessary for subclassification estimators to be $\sqrt{N}$-consistent and asymptotically well-defined, and suggested how such subclasses can be constructed.
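A subclassification estimator of the kind discussed can be illustrated in a few lines. This sketch is not Wang, Zhang, Richardson, & Zhou's construction: it simply uses five quantile subclasses of a (here, known) propensity score, on simulated data chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated observational data: x confounds treatment t and outcome y;
# the true treatment effect is 2.0.
n = 5000
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-x))            # known propensity score
t = (rng.random(n) < ps).astype(float)
y = 2.0 * t + x + rng.normal(size=n)

def subclassification_ate(ps, t, y, K=5):
    """Average within-subclass mean differences, weighted by size."""
    edges = np.quantile(ps, np.linspace(0, 1, K + 1))
    idx = np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, K - 1)
    total, acc = 0, 0.0
    for k in range(K):
        s = idx == k
        if t[s].sum() == 0 or (1 - t[s]).sum() == 0:
            continue                      # skip cells with only one group
        acc += s.sum() * (y[s][t[s] == 1].mean() - y[s][t[s] == 0].mean())
        total += s.sum()
    return acc / total

naive = y[t == 1].mean() - y[t == 0].mean()   # confounded comparison
est = subclassification_ate(ps, t, y)         # much closer to 2.0
```

With five quantile subclasses most, though not all, of the confounding bias is removed; the cited conditions concern how subclass construction must refine with $N$ for full $\sqrt{N}$-consistency.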


2011 ◽  
Vol 21 (3) ◽  
pp. 273-293 ◽  
Author(s):  
Elizabeth Williamson ◽  
Ruth Morley ◽  
Alan Lucas ◽  
James Carpenter

Estimation of the effect of a binary exposure on an outcome in the presence of confounding is often carried out via outcome regression modelling. An alternative approach is to use propensity score methodology. The propensity score is the conditional probability of receiving the exposure given the observed covariates and can be used, under the assumption of no unmeasured confounders, to estimate the causal effect of the exposure. In this article, we provide a non-technical and intuitive discussion of propensity score methodology, motivating the use of the propensity score approach by analogy with randomised studies, and describe the four main ways in which this methodology can be implemented. We carefully describe the population parameters being estimated — an issue that is frequently overlooked in the medical literature. We illustrate these four methods using data from a study investigating the association between maternal choice to provide breast milk and the infant's subsequent neurodevelopment. We outline useful extensions of propensity score methodology and discuss directions for future research. Propensity score methods remain controversial and there is no consensus as to when, if ever, they should be used in place of traditional outcome regression models. We therefore end with a discussion of the relative advantages and disadvantages of each.
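The point about which population parameter is being estimated can be made concrete with the weighting implementation. In this sketch on simulated data (not the breast milk study), the same propensity score yields different estimands depending on the weights: ATE weights target the whole population, ATT weights target the treated, and the two differ once the effect is heterogeneous.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data with a heterogeneous effect: treatment helps more for
# units with larger x, so the ATT differs from the ATE.
n = 20000
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-x))            # true propensity score
t = (rng.random(n) < ps).astype(float)
y = (1.0 + x) * t + x + rng.normal(size=n)   # unit-level effect is 1 + x

# ATE weights reweight both groups to the whole population; ATT weights
# leave the treated as-is and reweight controls to resemble the treated.
w_ate = t / ps + (1 - t) / (1 - ps)
w_att = t + (1 - t) * ps / (1 - ps)

def weighted_diff(w, t, y):
    return (np.sum(w * t * y) / np.sum(w * t)
            - np.sum(w * (1 - t) * y) / np.sum(w * (1 - t)))

ate = weighted_diff(w_ate, t, y)   # targets E[1 + x] = 1
att = weighted_diff(w_att, t, y)   # targets E[1 + x | t = 1] > 1
```

Reporting "the" propensity score effect without saying which weights were used leaves the target population, and hence the parameter, ambiguous.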


2021 ◽  
Author(s):  
Shuo Feng ◽  
Celestin Hategeka ◽  
Karen Ann Grépin

Abstract Background: Poor data quality is limiting the greater use of data sourced from routine health information systems (RHIS), especially in low- and middle-income countries. An important part of this issue comes from missing values, which arise when health facilities, for a variety of reasons, fail to submit their reports to the central system. Methods: Using data from the Health Management Information System (HMIS) and the advent of the COVID-19 pandemic in the Democratic Republic of the Congo (DRC) as an illustrative case study, we implemented six commonly used imputation methods on the DRC's HMIS datasets and evaluated their performance through several statistical techniques: simple linear regression; segmented regression, which is widely used in interrupted time series studies; parametric comparisons through t-tests; and non-parametric comparisons through Wilcoxon rank-sum tests. We also examined the performance of these six imputation methods under different missingness mechanisms and tested their stability to changes in the data. Results: For the regression analyses, no substantial difference was found in the results generated by the methods, except for mean imputation and exclusion & interpolation, when the RHIS dataset contained less than 20% missing values. However, as the missing proportion grew, machine learning methods such as missForest and k-NN started to produce biased estimates, and they were also found to lack robustness to minimal changes in the data and to consecutive missingness. Multiple imputation, by contrast, generated the least biased estimates overall and was the most robust to all changes in the data. When comparing group means through t-tests, the results from mean imputation and exclusion & interpolation disagreed with the true inference obtained using the complete data, suggesting that these two methods not only lead to biased regression estimates but also generate unreliable t-test results.
Conclusions: We recommend the use of multiple imputation for addressing missing values in RHIS datasets. Where the computing resources necessary for multiple imputation are unavailable, seasonal decomposition may be considered the next best method. Mean imputation and exclusion & interpolation, however, consistently produced biased and misleading results in the subsequent analyses, and their use in handling missing values should therefore be discouraged. Keywords: Missing Data; Routine Health Information Systems (RHIS); Health Management Information System (HMIS); Health Services Research; Low- and middle-income countries (LMICs); Multiple imputation
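The segmented regression used in the evaluation can be sketched as an interrupted time series fit. The series below is simulated for illustration; the design matrix follows the standard level-change and trend-change parameterization, which is an assumption about the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated monthly facility counts with a level drop of -20 at the
# interruption point t0 (e.g. the start of the pandemic period).
T, t0 = 48, 36
t = np.arange(T)
y = 100.0 + 0.5 * t + (-20.0) * (t >= t0) + rng.normal(0, 2, T)

# Segmented regression: y = b0 + b1*t + b2*post + b3*(t - t0)*post,
# where b2 is the immediate level change and b3 the change in slope.
post = (t >= t0).astype(float)
X = np.column_stack([np.ones(T), t, post, (t - t0) * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
level_change, trend_change = beta[2], beta[3]
```

An imputation method that distorts the pre- or post-interruption segments (as mean imputation does for a trending series) biases exactly these two coefficients, which is why the paper uses this model as one of its evaluation lenses.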


2015 ◽  
Vol 7 (2) ◽  
pp. 90
Author(s):  
Priyantha Wijayatunga

Propensity scores are often used to stratify treatment and control groups of subjects in observational data, removing confounding bias when estimating the causal effect of a treatment on an outcome in the so-called potential outcome causal modeling framework. In this article, we try to gain some insight into the basic behavior of propensity scores in a probabilistic sense. We carry out a simple analysis of their usage, confining ourselves to the case of discrete confounding covariates and outcomes. While clarifying the behavior of the propensity score, our analysis shows how the so-called prognostic score can be derived simultaneously. However, the prognostic score is derived only in a limited sense in the current literature, whereas our derivation is more general and shows all possibilities of obtaining the score; we call it the outcome score. We argue that applying both the propensity score and the outcome score is the most efficient way to reduce the dimension of the confounding covariates, as opposed to the current belief that the propensity score alone is the most efficient way.
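A toy discrete example makes the joint-score stratification argument concrete. The numbers below are purely illustrative, not from the article: two binary confounders produce four covariate patterns, and stratifying on the pair (propensity score, outcome score) reduces those four patterns to three strata while retaining what both scores capture.

```python
# Four covariate patterns from two binary confounders. The propensity
# score e(z) = P(T=1 | z) and the outcome score g(z) = E[Y | z, T=0]
# are illustrative numbers, not taken from the article.
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
e = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.8}
g = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 3.0}

# Stratifying jointly on (e, g) merges only the patterns that agree on
# both scores: here (0, 1) and (1, 0) collapse into a single stratum.
strata = {}
for z in patterns:
    strata.setdefault((e[z], g[z]), []).append(z)
```

Stratifying on e alone would also merge (0, 1) and (1, 0) here, but only because they happen to share g as well; the article's point is that conditioning on the pair is what makes such a merge safe in general.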


2017 ◽  
Vol 28 (1) ◽  
pp. 84-101 ◽  
Author(s):  
Yuying Xie ◽  
Yeying Zhu ◽  
Cecilia A Cotton ◽  
Pan Wu

Many approaches, including traditional parametric modeling and machine learning techniques, have been proposed to estimate propensity scores. This paper describes a new model averaging approach to propensity score estimation in which parametric and nonparametric estimates are combined to achieve covariate balance. Simulation studies are conducted across different scenarios varying in the degree of interactions and nonlinearities in the treatment model. The results show that, based on inverse probability weighting (IPW), the proposed propensity score estimator produces less bias and smaller standard errors than existing approaches. They also show that a model averaging approach with the objective of minimizing the average Kolmogorov–Smirnov statistic leads to the best-performing IPW estimator. The proposed approach is also applied to a real data set to evaluate the causal effect of formula or mixed feeding versus exclusive breastfeeding on a child's body mass index Z-score at age 4. The data analysis shows that formula or mixed feeding is more likely to lead to obesity at age 4, compared to exclusive breastfeeding.
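The balance-targeted model averaging can be sketched as a one-dimensional search over the mixing weight. This is a simplified stand-in for the paper's estimator: the "parametric" score here is just the plain sigmoid of x (as if from a misspecified logistic fit), the "nonparametric" score is a crude binned average of the treatment indicator, and both are assumptions made for the example; the weighted Kolmogorov–Smirnov statistic of the covariate serves as the balance criterion.

```python
import numpy as np

rng = np.random.default_rng(4)

def ks_stat(a, b, w_a, w_b):
    """Weighted two-sample Kolmogorov-Smirnov statistic."""
    grid = np.sort(np.concatenate([a, b]))
    cdf = lambda v, w: np.array([w[v <= g].sum() for g in grid]) / w.sum()
    return np.max(np.abs(cdf(a, w_a) - cdf(b, w_b)))

# Simulated data where the true treatment model is nonlinear in x.
n = 1000
x = rng.normal(size=n)
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-(x + 0.5 * x**2)))).astype(float)

# Candidate scores: misspecified "parametric" fit and binned averages.
ps_par = 1.0 / (1.0 + np.exp(-x))
bins = np.quantile(x, np.linspace(0, 1, 11))
idx = np.clip(np.searchsorted(bins, x, side="right") - 1, 0, 9)
ps_np = np.clip(np.array([t[idx == k].mean() for k in range(10)])[idx],
                0.05, 0.95)

# Model averaging: pick the mixing weight whose combined score gives the
# best weighted covariate balance (smallest KS statistic for x).
alphas = np.linspace(0.0, 1.0, 21)
ks_values = []
for alpha in alphas:
    ps = alpha * ps_par + (1 - alpha) * ps_np
    ks_values.append(ks_stat(x[t == 1], x[t == 0],
                             1.0 / ps[t == 1], 1.0 / (1.0 - ps[t == 0])))
best = int(np.argmin(ks_values))
alpha_star = alphas[best]                # chosen mixing weight
```

With several covariates, the paper's criterion averages the KS statistic over covariates; the one-covariate grid search above is the smallest version of the same idea.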


Author(s):  
Daniele Bottigliengo ◽  
Giulia Lorenzoni ◽  
Honoria Ocagli ◽  
Matteo Martinato ◽  
Paola Berchialla ◽  
...  

(1) Background: Propensity score methods have gained popularity in non-interventional clinical studies. As often occurs in observational datasets, some values of baseline covariates are missing for some patients. The present study aims to compare the performance of popular statistical methods for dealing with missing data in propensity score analysis. (2) Methods: Methods that account for missing data during the estimation process and methods based on the imputation of missing values, such as multiple imputation, were considered. The methods were applied to the dataset of an ongoing prospective registry for the treatment of unprotected left main coronary artery disease. Performance was assessed in terms of the overall balance of baseline covariates. (3) Results: Methods that explicitly deal with missing data were superior to classical complete case analysis. The best balance was observed when propensity scores were estimated with a method that accounts for missing data using a stochastic approximation of the expectation-maximization algorithm. (4) Conclusions: If a missing at random mechanism is plausible, methods that use the incomplete data to estimate the propensity score, or that impute the missing values, should be preferred. Sensitivity analyses are encouraged to evaluate the implications of the methods used to handle missing data and estimate the propensity score.
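Assessing the "overall balance of baseline covariates", the criterion used here, is commonly done with standardized mean differences; the formula below is the usual one, though the abstract does not spell out the paper's exact metric. A minimal numpy sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)

def weighted_smd(x, t, w):
    """Absolute standardized mean difference between weighted groups."""
    m1 = np.sum(w * t * x) / np.sum(w * t)
    m0 = np.sum(w * (1 - t) * x) / np.sum(w * (1 - t))
    s = np.sqrt((x[t == 1].var() + x[t == 0].var()) / 2)
    return abs(m1 - m0) / s

# Simulated data: x drives treatment assignment, so it is imbalanced
# before weighting and balanced after correct propensity weighting.
n = 4000
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-x))
t = (rng.random(n) < ps).astype(float)

smd_raw = weighted_smd(x, t, np.ones(n))                  # unadjusted
smd_ipw = weighted_smd(x, t, t / ps + (1 - t) / (1 - ps)) # weighted
# A common rule of thumb calls covariates with SMD < 0.1 well balanced.
```

Comparing such balance tables across missing-data strategies (complete case, imputation, missingness-aware estimation) is exactly the kind of evaluation the study reports.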


2020 ◽  
Author(s):  
Matthew Sperrin ◽  
Glen P. Martin

Abstract Background: Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias, its use in conjunction with multiple imputation may unlock the potential value of missingness to reduce bias in causal effect estimation, particularly in missing not at random scenarios and where missingness might be associated with unmeasured confounders. Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with multiple imputation, would reduce bias for causal effect estimation, under a range of scenarios including unmeasured variables, missing not at random, and missing at random mechanisms. We use directed acyclic graphs and structural models to elucidate a variety of causal structures of interest. We handled missing data using complete case analysis, and multiple imputation with and without missing indicator terms. Results: We find that multiple imputation combined with a missing indicator gives minimal bias for causal effect estimation in most scenarios. In particular, the approach: 1) does not introduce bias in missing (completely) at random scenarios; 2) reduces bias in missing not at random scenarios where the missing mechanism depends on the missing variable itself; and 3) may reduce or increase bias when unmeasured confounding is present. Conclusion: In the presence of missing data, careful use of missing indicators, combined with multiple imputation, can improve causal effect estimation when missingness is informative, and is not detrimental when missingness is at random.


2020 ◽  
Author(s):  
Matthew Sperrin ◽  
Glen P. Martin

Abstract Background: Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias when used inappropriately, its use in conjunction with other imputation approaches may unlock the potential value of missingness to reduce bias and improve prediction. Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with an imputation approach, such as multiple imputation, would lead to improved model performance, in terms of minimising bias for causal effect estimation and improving predictive accuracy, under a range of scenarios with unmeasured variables. We use directed acyclic graphs and structural models to elucidate causal structures of interest. We consider a variety of missingness mechanisms, then handle these using complete case analysis, unconditional mean imputation, regression imputation and multiple imputation. In each case we evaluate supplementing these approaches with missing indicator terms. Results: For estimating causal effects, we find that multiple imputation combined with a missing indicator gives minimal bias in most scenarios. For prediction, we find that regression imputation combined with a missing indicator minimises mean squared error. Conclusion: In the presence of missing data, careful use of missing indicators, combined with appropriate imputation, can improve both causal estimation and prediction accuracy.
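The prediction-side finding, that an indicator added to an imputed covariate can lower mean squared error when missingness is informative, can be sketched with ordinary least squares. This example uses a simple mean fill (a stand-in for the paper's regression imputation) on simulated missing-not-at-random data; all names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Covariate x predicts y, and x is missing more often when it is large
# (informative missingness). Fill missing x with the observed mean and
# compare prediction models with and without a missing indicator.
n = 5000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
miss = rng.random(n) < 1.0 / (1.0 + np.exp(-x))  # larger x -> more missing
x_obs = np.where(miss, np.nan, x)
x_fill = np.where(miss, np.nanmean(x_obs), x_obs)
ind = miss.astype(float)

def ols_mse(X, y):
    """In-sample mean squared error of a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ beta) ** 2)

mse_plain = ols_mse(np.column_stack([np.ones(n), x_fill]), y)
mse_ind = ols_mse(np.column_stack([np.ones(n), x_fill, ind]), y)
# The indicator tells the model that filled-in units come from the
# upper part of the x distribution, so the fit improves.
```

Under the paper's missing-at-random comparisons the indicator is harmless; under informative missingness like the mechanism above, it carries genuine predictive signal.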

