Imputation of Missing Covariate Data Prior to Propensity Score Analysis: A Tutorial and Evaluation of the Robustness of Practical Approaches

2021 ◽  
pp. 0193841X2110202
Author(s):  
Walter L. Leite ◽  
Burak Aydin ◽  
Dee D. Cetin-Berber

Background: Propensity score analysis (PSA) is a popular method to remove selection bias due to covariates in quasi-experimental designs, but it requires handling of missing data on covariates before propensity scores are estimated. Multiple imputation (MI) and single imputation (SI) are approaches to handle missing data in PSA. Objectives: The objectives of this study are to review MI-within, MI-across, and SI approaches to handle missing data on covariates prior to PSA, investigate the robustness of MI-across and SI with a Monte Carlo simulation study, and demonstrate the analysis of missing data and PSA with a step-by-step illustrative example. Research design: The Monte Carlo simulation study compared strategies to impute missing data in continuous and categorical covariates for estimation of propensity scores. Manipulated conditions included sample size, the number of covariates, the size of the treatment effect, missing data mechanism, and percentage of missing data. Imputation strategies included MI-across and SI by joint modeling or multivariate imputation by chained equations (MICE). Results: The results indicated that the MI-across method performed well, and SI also performed adequately with smaller percentages of missing data. The illustrative example demonstrated MI and SI, propensity score estimation, calculation of propensity score weights, covariate balance evaluation, estimation of the average treatment effect on the treated, and sensitivity analysis using data from the National Longitudinal Survey of Youth.
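The combination and weighting steps named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes that "MI-across" means averaging each unit's estimated propensity score over the imputed datasets, and it uses the usual ATT weighting scheme (1 for treated units, e / (1 - e) for comparison units). The function names and the toy numbers are hypothetical.

```python
from statistics import mean


def mi_across_scores(scores_by_imputation):
    """Average each unit's propensity score over the m imputed datasets.

    scores_by_imputation: list of m lists, one per imputed dataset,
    each holding one estimated propensity score per unit (same unit order).
    """
    return [mean(unit_scores) for unit_scores in zip(*scores_by_imputation)]


def att_weights(treated, scores):
    """ATT weights: 1 for treated units, e / (1 - e) for comparison units."""
    return [1.0 if t == 1 else e / (1.0 - e) for t, e in zip(treated, scores)]


# Hypothetical propensity scores for 4 units from m = 3 imputed datasets.
scores_by_imputation = [
    [0.60, 0.20, 0.50, 0.25],
    [0.70, 0.30, 0.50, 0.15],
    [0.65, 0.25, 0.50, 0.20],
]
treated = [1, 0, 0, 1]  # observed treatment indicator (assumed complete)

pooled_scores = mi_across_scores(scores_by_imputation)
weights = att_weights(treated, pooled_scores)
```

Averaging first yields a single set of weights, so balance checks and the outcome model are run once rather than once per imputed dataset.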

Author(s):  
Daniele Bottigliengo ◽  
Giulia Lorenzoni ◽  
Honoria Ocagli ◽  
Matteo Martinato ◽  
Paola Berchialla ◽  
...  

(1) Background: Propensity score methods have gained popularity in non-interventional clinical studies. As often occurs in observational datasets, some values of baseline covariates are missing for some patients. The present study aims to compare the performance of popular statistical methods for dealing with missing data in propensity score analysis. (2) Methods: Methods that account for missing data during the estimation process and methods based on the imputation of missing values, such as multiple imputation, were considered. The methods were applied to the dataset of an ongoing prospective registry for the treatment of unprotected left main coronary artery disease. Performance was assessed in terms of the overall balance of baseline covariates. (3) Results: Methods that explicitly deal with missing data were superior to classical complete case analysis. The best balance was observed when propensity scores were estimated with a method that accounts for missing data using a stochastic approximation of the expectation-maximization algorithm. (4) Conclusions: If a missing at random mechanism is plausible, methods that accommodate missing data when estimating the propensity score, or that impute the missing values, should be preferred. Sensitivity analyses are encouraged to evaluate the implications of the methods used to handle missing data and estimate the propensity score.
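Covariate balance is commonly summarized with a standardized mean difference (SMD). The abstract does not say which balance metric the authors used, so the sketch below is an assumption: it computes a weighted difference in group means divided by the pooled unweighted standard deviation, which is one common convention among several. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import variance


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def standardized_mean_difference(x, treated, weights):
    """Weighted mean difference (treated minus comparison) in pooled-SD units.

    The denominator uses the unweighted sample variances of the two groups;
    other conventions (e.g. treated-group SD only) are also in use.
    """
    xt = [v for v, t in zip(x, treated) if t == 1]
    xc = [v for v, t in zip(x, treated) if t == 0]
    wt = [w for w, t in zip(weights, treated) if t == 1]
    wc = [w for w, t in zip(weights, treated) if t == 0]
    pooled_sd = sqrt((variance(xt) + variance(xc)) / 2)
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / pooled_sd


# Hypothetical covariate (e.g. age) for 3 treated and 3 comparison patients.
age = [50, 60, 70, 40, 50, 60]
treated = [1, 1, 1, 0, 0, 0]

smd_before = standardized_mean_difference(age, treated, [1, 1, 1, 1, 1, 1])
smd_after = standardized_mean_difference(age, treated, [1, 1, 1, 1, 1, 3])
```

In this toy example the weights shrink the SMD from 1.0 to 0.6. A value closer to zero indicates better balance on that covariate.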


2017 ◽  
Vol 28 (1) ◽  
pp. 3-19 ◽  
Author(s):  
Clémence Leyrat ◽  
Shaun R Seaman ◽  
Ian R White ◽  
Ian Douglas ◽  
Liam Smeeth ◽  
...  

Inverse probability of treatment weighting is a popular propensity score-based approach to estimate marginal treatment effects in observational studies at risk of confounding bias. A major issue when estimating the propensity score is the presence of partially observed covariates. Multiple imputation is a natural approach to handle missing data on covariates: covariates are imputed and a propensity score analysis is performed in each imputed dataset to estimate the treatment effect. The treatment effect estimates from each imputed dataset are then combined to obtain an overall estimate. We call this method MIte. However, an alternative approach has been proposed, in which the propensity scores are combined across the imputed datasets (MIps). Therefore, there are remaining uncertainties about how to implement multiple imputation for propensity score analysis: (a) should we apply Rubin’s rules to the inverse probability of treatment weighting treatment effect estimates or to the propensity score estimates themselves? (b) does the outcome have to be included in the imputation model? (c) how should we estimate the variance of the inverse probability of treatment weighting estimator after multiple imputation? We studied the consistency and balancing properties of the MIte and MIps estimators and performed a simulation study to empirically assess their performance for the analysis of a binary outcome. We also compared the performance of these methods to complete case analysis and the missingness pattern approach, which uses a different propensity score model for each pattern of missingness, and a third multiple imputation approach in which the propensity score parameters are combined rather than the propensity scores themselves (MIpar). 
Under a missing at random mechanism, complete case and missingness pattern analyses were biased in most cases for estimating the marginal treatment effect, whereas multiple imputation approaches were approximately unbiased as long as the outcome was included in the imputation model. Only MIte was unbiased in all the studied scenarios and Rubin’s rules provided good variance estimates for MIte. The propensity score estimated in the MIte approach showed good balancing properties. In conclusion, when using multiple imputation in the inverse probability of treatment weighting context, MIte with the outcome included in the imputation model is the preferred approach.
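The difference between MIte and MIps can be shown with a small sketch. This is an illustration under stated assumptions, not the authors' implementation. The treatment effect is a risk difference estimated with normalized inverse probability of treatment weights, treatment and outcome are assumed fully observed, and all function names, scores, and variances are hypothetical. The pooling function follows Rubin's rules: the pooled estimate is the mean of the per-dataset estimates, and the total variance is W + (1 + 1/m)B.

```python
from statistics import mean, variance


def iptw_risk_difference(outcome, treated, scores):
    """Risk difference with normalized IPTW weights 1/e and 1/(1 - e)."""
    num_t = sum(y / e for y, t, e in zip(outcome, treated, scores) if t == 1)
    den_t = sum(1 / e for t, e in zip(treated, scores) if t == 1)
    num_c = sum(y / (1 - e) for y, t, e in zip(outcome, treated, scores) if t == 0)
    den_c = sum(1 / (1 - e) for t, e in zip(treated, scores) if t == 0)
    return num_t / den_t - num_c / den_c


def rubin_pool(estimates, within_variances):
    """Rubin's rules: pooled estimate and total variance W + (1 + 1/m) * B."""
    m = len(estimates)
    within = mean(within_variances)   # W: average within-imputation variance
    between = variance(estimates)     # B: sample variance across imputations
    return mean(estimates), within + (1 + 1 / m) * between


# Hypothetical data: 4 units, propensity scores from m = 2 imputed datasets.
outcome = [1, 0, 1, 0]
treated = [1, 1, 0, 0]
scores_by_imputation = [
    [0.5, 0.5, 0.5, 0.5],
    [0.8, 0.4, 0.4, 0.2],
]

# MIte: estimate the effect in each imputed dataset, then pool the estimates.
per_dataset = [iptw_risk_difference(outcome, treated, s) for s in scores_by_imputation]
hypothetical_variances = [0.04, 0.05]  # placeholders for per-dataset variances
mi_te, mi_te_variance = rubin_pool(per_dataset, hypothetical_variances)

# MIps: average the propensity scores per unit, then estimate the effect once.
averaged_scores = [mean(unit) for unit in zip(*scores_by_imputation)]
mi_ps = iptw_risk_difference(outcome, treated, averaged_scores)
```

The two routes generally give different point estimates because the weights are nonlinear in the propensity score, so averaging the scores before weighting is not the same as averaging the weighted estimates.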

