Imputation of Missing Covariate Data Prior to Propensity Score Analysis: A Tutorial and Evaluation of the Robustness of Practical Approaches

Background: Propensity score analysis (PSA) is a popular method to remove selection bias due to covariates in quasi-experimental designs, but it requires handling of missing data on covariates before propensity scores are estimated. Multiple imputation (MI) and single imputation (SI) are approaches to handle missing data in PSA. Objectives: The objectives of this study are to review MI-within, MI-across, and SI approaches to handle missing data on covariates prior to PSA, investigate the robustness of MI-across and SI with a Monte Carlo simulation study, and demonstrate the analysis of missing data and PSA with a step-by-step illustrative example. Research design: The Monte Carlo simulation study compared strategies to impute missing data in continuous and categorical covariates for estimation of propensity scores. Manipulated conditions included sample size, the number of covariates, the size of the treatment effect, missing data mechanism, and percentage of missing data. Imputation strategies included MI-across and SI by joint modeling or multivariate imputation by chained equations (MICE). Results: The results indicated that the MI-across method performed well, and SI also performed adequately with smaller percentages of missing data. The illustrative example demonstrated MI and SI, propensity score estimation, calculation of propensity score weights, covariate balance evaluation, estimation of the average treatment effect on the treated, and sensitivity analysis using data from the National Longitudinal Survey of Youth.
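The weighting and balance-evaluation steps named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the toy data, and the choice to scale the mean difference by the unweighted standard deviation of the treated group are all assumptions made here. Propensity scores are taken as already estimated.

```python
from math import sqrt

def att_weights(treated, pscores):
    """ATT weights: 1 for treated units, e / (1 - e) for comparison units."""
    return [1.0 if z == 1 else e / (1.0 - e) for z, e in zip(treated, pscores)]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def standardized_mean_difference(x, treated, weights):
    """Weighted treated-minus-comparison mean difference of covariate x,
    scaled by the unweighted sample SD of the treated group (an assumed convention)."""
    xt = [v for v, z in zip(x, treated) if z == 1]
    wt = [w for w, z in zip(weights, treated) if z == 1]
    xc = [v for v, z in zip(x, treated) if z == 0]
    wc = [w for w, z in zip(weights, treated) if z == 0]
    plain_mean_t = sum(xt) / len(xt)
    sd_t = sqrt(sum((v - plain_mean_t) ** 2 for v in xt) / (len(xt) - 1))
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / sd_t

# Hypothetical toy data: treatment indicator, estimated propensity scores, one covariate.
treated = [1, 1, 0, 0]
pscores = [0.8, 0.6, 0.5, 0.2]
covariate = [3.0, 5.0, 4.0, 1.0]

weights = att_weights(treated, pscores)
smd_unweighted = standardized_mean_difference(covariate, treated, [1.0] * len(treated))
smd_weighted = standardized_mean_difference(covariate, treated, weights)
```

In this toy example the weighted standardized mean difference is smaller than the unweighted one, which is the pattern a balance check looks for after weighting.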

A comparison of missing data methods for hypothesis tests of the treatment effect in substance abuse clinical trials: a Monte-Carlo simulation study

Substance Abuse Treatment Prevention and Policy ◽ 10.1186/1747-597x-3-13 ◽ 2008 ◽ Vol 3 (1) ◽ Cited By ~ 5

Author(s): Sarra L Hedden ◽ Robert F Woolson ◽ Robert J Malcolm

Keyword(s): Substance Abuse ◽ Monte Carlo Simulation ◽ Clinical Trials ◽ Monte Carlo ◽ Missing Data ◽ Simulation Study ◽ Treatment Effect ◽ Hypothesis Tests ◽ Monte Carlo Simulation Study

DA4 Finding Treatment Effects within Subgroups when Using the Propensity Score to Control for Selection Bias: A Monte Carlo Simulation Study

Value in Health ◽ 10.1016/j.jval.2011.08.1735 ◽ 2011 ◽ Vol 14 (7) ◽ pp. A235 ◽ Cited By ~ 1

Author(s): H. van Eeren ◽ M.D. Spreeuwenberg ◽ J.G. van Manen ◽ M. de Rooij ◽ T. Stijnen ◽ ...

Keyword(s): Monte Carlo Simulation ◽ Monte Carlo ◽ Propensity Score ◽ Selection Bias ◽ Simulation Study ◽ Treatment Effects ◽ Monte Carlo Simulation Study

Propensity Score Analysis with Partially Observed Baseline Covariates: A Practical Comparison of Methods for Handling Missing Data

International Journal of Environmental Research and Public Health ◽ 10.3390/ijerph18136694 ◽ 2021 ◽ Vol 18 (13) ◽ pp. 6694

Author(s): Daniele Bottigliengo ◽ Giulia Lorenzoni ◽ Honoria Ocagli ◽ Matteo Martinato ◽ Paola Berchialla ◽ ...

Keyword(s): Missing Data ◽ Propensity Score ◽ Missing Values ◽ Propensity Scores ◽ Propensity Score Analysis ◽ Expectation Maximization Algorithm ◽ Missing At Random ◽ Sensitivity Analyses ◽ Score Analysis ◽ Baseline Covariates

(1) Background: Propensity score methods have gained popularity in non-interventional clinical studies. As often occurs in observational datasets, some values of baseline covariates are missing for some patients. The present study aims to compare the performance of popular statistical methods for dealing with missing data in propensity score analysis. (2) Methods: Methods that account for missing data during the estimation process and methods based on the imputation of missing values, such as multiple imputation, were considered. The methods were applied to the dataset of an ongoing prospective registry for the treatment of unprotected left main coronary artery disease. Performance was assessed in terms of the overall balance of baseline covariates. (3) Results: Methods that explicitly deal with missing data were superior to classical complete case analysis. The best balance was observed when propensity scores were estimated with a method that accounts for missing data using a stochastic approximation of the expectation-maximization algorithm. (4) Conclusions: If a missing at random mechanism is plausible, methods that use the missing data to estimate the propensity score, or that impute them, should be preferred. Sensitivity analyses are encouraged to evaluate the implications of the methods used to handle missing data and estimate the propensity score.
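For readers unfamiliar with the baseline that the abstract calls classical complete case analysis, the sketch below shows what it does to a dataset: any patient with a missing covariate value is dropped before the propensity score is estimated. The records, field names, and the use of `None` to mark a missing value are hypothetical and are not taken from the registry described above.

```python
def complete_cases(rows):
    """Keep only rows in which no field is missing (missing values are None)."""
    return [row for row in rows if all(value is not None for value in row.values())]

# Hypothetical patient records with partially observed baseline covariates.
rows = [
    {"treated": 1, "age": 34, "ejection_fraction": 52.0},
    {"treated": 0, "age": None, "ejection_fraction": 48.5},
    {"treated": 1, "age": 29, "ejection_fraction": None},
    {"treated": 0, "age": 41, "ejection_fraction": 39.0},
]

kept = complete_cases(rows)
retained_fraction = len(kept) / len(rows)  # share of the sample left for analysis
```

Here half of the toy sample is discarded, which illustrates why methods that use or impute the partially observed records can be attractive when missingness is common.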

Comparison of the ability of double-robust estimators to correct bias in propensity score matching analysis. A Monte Carlo simulation study

Pharmacoepidemiology and Drug Safety ◽ 10.1002/pds.4325 ◽ 2017 ◽ Vol 26 (12) ◽ pp. 1513-1519 ◽ Cited By ~ 4

Author(s): Tri-Long Nguyen ◽ Gary S. Collins ◽ Jessica Spence ◽ Philip J. Devereaux ◽ Jean-Pierre Daurès ◽ ...

Keyword(s): Monte Carlo Simulation ◽ Monte Carlo ◽ Propensity Score ◽ Propensity Score Matching ◽ Simulation Study ◽ Robust Estimators ◽ Monte Carlo Simulation Study ◽ Propensity Score Matching Analysis ◽ Double Robust

Propensity score analysis with partially observed covariates: How should multiple imputation be used?

Statistical Methods in Medical Research ◽ 10.1177/0962280217713032 ◽ 2017 ◽ Vol 28 (1) ◽ pp. 3-19 ◽ Cited By ~ 39

Author(s): Clémence Leyrat ◽ Shaun R Seaman ◽ Ian R White ◽ Ian Douglas ◽ Liam Smeeth ◽ ...

Keyword(s): Propensity Score ◽ Multiple Imputation ◽ Treatment Effect ◽ Propensity Scores ◽ Propensity Score Analysis ◽ Imputation Model ◽ Complete Case ◽ Inverse Probability ◽ Score Analysis ◽ Partially Observed

Inverse probability of treatment weighting is a popular propensity score-based approach to estimate marginal treatment effects in observational studies at risk of confounding bias. A major issue when estimating the propensity score is the presence of partially observed covariates. Multiple imputation is a natural approach to handle missing data on covariates: covariates are imputed and a propensity score analysis is performed in each imputed dataset to estimate the treatment effect. The treatment effect estimates from each imputed dataset are then combined to obtain an overall estimate. We call this method MIte. However, an alternative approach has been proposed, in which the propensity scores are combined across the imputed datasets (MIps). Therefore, there are remaining uncertainties about how to implement multiple imputation for propensity score analysis: (a) should we apply Rubin’s rules to the inverse probability of treatment weighting treatment effect estimates or to the propensity score estimates themselves? (b) does the outcome have to be included in the imputation model? (c) how should we estimate the variance of the inverse probability of treatment weighting estimator after multiple imputation? We studied the consistency and balancing properties of the MIte and MIps estimators and performed a simulation study to empirically assess their performance for the analysis of a binary outcome. We also compared the performance of these methods to complete case analysis and the missingness pattern approach, which uses a different propensity score model for each pattern of missingness, and a third multiple imputation approach in which the propensity score parameters are combined rather than the propensity scores themselves (MIpar). 
Under a missing at random mechanism, complete case and missingness pattern analyses were biased in most cases for estimating the marginal treatment effect, whereas multiple imputation approaches were approximately unbiased as long as the outcome was included in the imputation model. Only MIte was unbiased in all the studied scenarios and Rubin’s rules provided good variance estimates for MIte. The propensity score estimated in the MIte approach showed good balancing properties. In conclusion, when using multiple imputation in the inverse probability of treatment weighting context, MIte with the outcome included in the imputation model is the preferred approach.
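The difference between combining treatment effect estimates (MIte) and combining propensity scores (MIps) can be made concrete with a short sketch. It assumes the per-dataset treatment effect estimates, their variances, and the per-dataset propensity scores have already been computed; the function names and numbers are illustrative only. The pooling function applies the standard form of Rubin's rules (total variance = within-imputation variance + (1 + 1/m) × between-imputation variance).

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """MIte-style pooling: combine per-dataset estimates with Rubin's rules.
    Returns the pooled estimate and its total variance."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance of the estimates across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total

def average_propensity_scores(ps_by_dataset):
    """MIps-style combination: average each unit's propensity score across
    imputed datasets; one analysis is then run on the averaged scores."""
    m = len(ps_by_dataset)
    return [sum(unit_scores) / m for unit_scores in zip(*ps_by_dataset)]

# Hypothetical results from m = 3 imputed datasets.
effect_estimates = [1.0, 2.0, 3.0]
effect_variances = [0.5, 0.5, 0.5]
pooled_effect, total_variance = pool_rubin(effect_estimates, effect_variances)

# Hypothetical propensity scores for two units in two imputed datasets.
averaged_ps = average_propensity_scores([[0.2, 0.4], [0.4, 0.6]])
```

In the MIte route the full propensity score analysis is repeated in every imputed dataset and only the final estimates are pooled, whereas in the MIps route the pooling happens before the treatment effect is estimated.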
