How to Apply Multiple Imputation in Propensity Score Matching with Partially Observed Confounders: A Simulation Study and Practical Recommendations

2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Albee Ling ◽  
Maria Montez-Rath ◽  
Maya Mathur ◽  
Kris Kapphahn ◽  
Manisha Desai

Propensity score matching (PSM) has been widely used to mitigate confounding in observational studies, although complications arise when the covariates used to estimate the PS are only partially observed. Multiple imputation (MI) is a potential solution for handling missing covariates in the estimation of the PS. However, it is not clear how best to apply MI strategies in the context of PSM. We conducted a simulation study to compare the performance of popular non-MI missing data methods and various MI-based strategies under different missing data mechanisms. We found that commonly applied missing data methods resulted in biased and inefficient estimates, and we observed large variation in performance across MI-based strategies. Based on our findings, we recommend 1) estimating the PS after applying MI to impute missing confounders; 2) conducting PSM within each imputed dataset followed by averaging the treatment effects to arrive at one summarized finding; 3) using a bootstrap-based variance estimator to account for uncertainty in PS estimation, matching, and imputation; and 4) including key auxiliary variables in the imputation model.
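Recommendation 2 above (match within each imputed dataset, then average the treatment effects) can be illustrated with a minimal sketch. It assumes the PS has already been estimated separately in each imputed dataset and stored under a hypothetical key `ps`; the greedy 1:1 nearest-neighbour matching without replacement, the caliper of 0.1 on the PS scale, and the function names are illustrative assumptions, not the authors' implementation. The bootstrap-based variance from recommendation 3 is not shown.

```python
from statistics import mean


def matched_effect(rows, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching on the PS, without replacement.

    rows: list of dicts with keys 'treated' (0/1), 'ps' (float), 'y' (number).
    The key names and the caliper default are assumptions for illustration.
    Returns the mean outcome difference (treated minus control) over pairs.
    """
    treated = [r for r in rows if r["treated"] == 1]
    controls = [r for r in rows if r["treated"] == 0]
    used = set()   # indices of controls that are already matched
    diffs = []
    for t in treated:
        best, best_dist = None, caliper
        for i, c in enumerate(controls):
            dist = abs(t["ps"] - c["ps"])
            if i not in used and dist <= best_dist:
                best, best_dist = i, dist
        if best is not None:   # treated units with no control in the caliper are dropped
            used.add(best)
            diffs.append(t["y"] - controls[best]["y"])
    if not diffs:
        raise ValueError("no matched pairs within the caliper")
    return mean(diffs)


def mi_within_estimate(imputed_datasets, caliper=0.1):
    """Match within each imputed dataset, then average the effect estimates."""
    return mean(matched_effect(d, caliper) for d in imputed_datasets)
```

Calling `mi_within_estimate([d1, d2, ...])` with one list of rows per imputed dataset returns the single summarized point estimate; each dataset contributes its own matched sample, so the matched pairs may differ across imputations.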

2020 ◽  
Author(s):  
Anna-Carolina Haensch ◽  
Bernd Weiß

Many phenomena in the social or the medical sciences can be described as events, meaning that a qualitative change occurs at some particular point in time. Typical research questions focus on whether, when, and under which circumstances events occur. In the social sciences, discrete-time-to-event models are popular (Discrete-Time Survival Analysis Model, DTSAM). Data analyzed through DTSAMs is in the so-called person-period format. The model is a logistic regression model with the event indicator as the dependent variable. However, like many other statistical applications, the practical analysis of discrete-time survival data is challenged by missing data in one or more covariates. Negative consequences of such missing data range from efficiency losses to bias. A popular approach to circumvent these unwanted effects of missing data is multiple imputation (MI). With multiple imputation, it is crucial to include outcome information in the model for imputing partially observed covariates. Unfortunately, this is not straightforward in the case of DTSAMs, since we (a) usually have a partly observed (left- or right-censored) outcome, (b) do not have only one outcome variable, but two: the event indicator and the time-to-event, and (c) have to decide whether to impute while the data set is still in person format or after transformation into person-period format, especially if we look at time-invariant information. Since there is little guidance on how to incorporate the observed outcome information in the imputation model of missing covariates in discrete-time survival analysis, we explore different approaches using fully conditional specification (FCS) (van Buuren 2006) and the newer substantive model compatible (SMC-) FCS MI (Bartlett et al., 2014). These approaches vary in the complexity with which we incorporate the outcome into the imputation model, the FCS algorithm used, and the data format used during the imputation.
We compare the methods using Monte Carlo simulations and provide a practical example using data from the German Family Panel pairfam. We confirm the results by White and Royston (2009) and Beesley et al. (2016) that imputing conditional on the (partly imputed) uncensored time-to-event yields high bias. A compatible imputation model for SMC-FCS MI with data in person-period format proves to be the key to imputations with good performance results under different simulation conditions.
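The difference between the person format and the person-period format mentioned above can be made concrete with a small sketch. It assumes discrete periods numbered from 1 and hypothetical field names (`id`, `time`, `event`); it only shows the data transformation, not any of the imputation approaches compared in the study.

```python
def to_person_period(persons):
    """Expand person-format records into person-period rows.

    persons: list of dicts with hypothetical keys 'id', 'time' (last observed
    period, an integer >= 1) and 'event' (1 = event occurred in that period,
    0 = right-censored). All other keys are treated as time-invariant
    covariates and copied unchanged to every row of that person.
    """
    rows = []
    for p in persons:
        covariates = {k: v for k, v in p.items() if k not in ("id", "time", "event")}
        for period in range(1, p["time"] + 1):
            rows.append({
                "id": p["id"],
                "period": period,
                # The event indicator is 1 only in the final period of a
                # person who experienced the event; censored persons have all 0s.
                "event": int(p["event"] == 1 and period == p["time"]),
                **covariates,
            })
    return rows
```

A person with `time=3, event=1` yields three rows with event indicators 0, 0, 1. Note that a missing time-invariant covariate is replicated into every row of that person, which is one reason the choice of data format during imputation matters.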


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Sara Javadi ◽  
Abbas Bahrampour ◽  
Mohammad Mehdi Saber ◽  
Behshid Garrusi ◽  
Mohammad Reza Baneshi

Multiple imputation by chained equations (MICE) is the most common method for imputing missing data. In the MICE algorithm, imputation can be performed using a variety of parametric and nonparametric methods. The default setting in the implementation of MICE is for imputation models to include variables as linear terms only with no interactions, but omission of interaction terms may lead to biased results. It is investigated, using simulated and real datasets, whether recursive partitioning creates appropriate variability between imputations and unbiased parameter estimates with appropriate confidence intervals. We compared four multiple imputation (MI) methods on a real and a simulated dataset. MI methods included using predictive mean matching with an interaction term in the imputation model in MICE (MICE-Interaction), classification and regression tree (CART) for specifying the imputation model in MICE (MICE-CART), the implementation of random forest (RF) in MICE (MICE-RF), and the MICE-Stratified method. We first selected secondary data and devised an experimental design that consisted of 40 scenarios (2 × 5 × 4), which differed by the rate of simulated missing data (10%, 20%, 30%, 40%, and 50%), the missing mechanism (MAR and MCAR), and imputation method (MICE-Interaction, MICE-CART, MICE-RF, and MICE-Stratified). First, we randomly drew 700 observations with replacement 300 times, and then the missing data were created. The evaluation was based on raw bias (RB) as well as five other measurements that were averaged over the repetitions. Next, in a simulation study, we generated data 1000 times with a sample size of 700. Then, we created missing data for each dataset once. For all scenarios, the same criteria were used as for real data to evaluate the performance of methods in the simulation study.
It is concluded that, when there is an interaction effect between a dummy and a continuous predictor, substantial gains are possible by using recursive partitioning for imputation compared to parametric methods, and also, the MICE-Interaction method is always more efficient and convenient to preserve interaction effects than the other methods.


2017 ◽  
Vol 28 (1) ◽  
pp. 3-19 ◽  
Author(s):  
Clémence Leyrat ◽  
Shaun R Seaman ◽  
Ian R White ◽  
Ian Douglas ◽  
Liam Smeeth ◽  
...  

Inverse probability of treatment weighting is a popular propensity score-based approach to estimate marginal treatment effects in observational studies at risk of confounding bias. A major issue when estimating the propensity score is the presence of partially observed covariates. Multiple imputation is a natural approach to handle missing data on covariates: covariates are imputed and a propensity score analysis is performed in each imputed dataset to estimate the treatment effect. The treatment effect estimates from each imputed dataset are then combined to obtain an overall estimate. We call this method MIte. However, an alternative approach has been proposed, in which the propensity scores are combined across the imputed datasets (MIps). Therefore, there are remaining uncertainties about how to implement multiple imputation for propensity score analysis: (a) should we apply Rubin’s rules to the inverse probability of treatment weighting treatment effect estimates or to the propensity score estimates themselves? (b) does the outcome have to be included in the imputation model? (c) how should we estimate the variance of the inverse probability of treatment weighting estimator after multiple imputation? We studied the consistency and balancing properties of the MIte and MIps estimators and performed a simulation study to empirically assess their performance for the analysis of a binary outcome. We also compared the performance of these methods to complete case analysis and the missingness pattern approach, which uses a different propensity score model for each pattern of missingness, and a third multiple imputation approach in which the propensity score parameters are combined rather than the propensity scores themselves (MIpar). 
Under a missing at random mechanism, complete case and missingness pattern analyses were biased in most cases for estimating the marginal treatment effect, whereas multiple imputation approaches were approximately unbiased as long as the outcome was included in the imputation model. Only MIte was unbiased in all the studied scenarios and Rubin’s rules provided good variance estimates for MIte. The propensity score estimated in the MIte approach showed good balancing properties. In conclusion, when using multiple imputation in the inverse probability of treatment weighting context, MIte with the outcome included in the imputation model is the preferred approach.
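The pooling step used by MIte (estimate the treatment effect in each imputed dataset, then combine with Rubin's rules) can be sketched as follows. This is a generic illustration of Rubin's rules for a scalar estimate, assuming the per-dataset estimates and their variances have already been computed; the function name is hypothetical and the sketch is not taken from the paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates with Rubin's rules.

    estimates: treatment effect estimate from each of the M imputed datasets.
    variances: estimated variance of each of those estimates.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)      # pooled point estimate
    within = mean(variances)     # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor M - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

For example, `pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` gives a pooled estimate of 2.0 and a total variance of 0.5 + (4/3) × 1.0.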


2021 ◽  
pp. 096228022110370
Author(s):  
Seungbong Han ◽  
Kam-Wah Tsui ◽  
Hui Zhang ◽  
Gi-Ae Kim ◽  
Young-Suk Lim ◽  
...  

Propensity score matching is widely used to determine the effects of treatments in observational studies. Competing risk survival data are common to medical research. However, there is a paucity of propensity score matching studies related to competing risk survival data with missing causes of failure. In this study, we provide guidelines for estimating the treatment effect on the cumulative incidence function when using propensity score matching on competing risk survival data with missing causes of failure. We examined the performances of different methods for imputing the data with missing causes. We then evaluated the gain from the missing cause imputation in an extensive simulation study and applied the proposed data imputation method to the data from a study on the risk of hepatocellular carcinoma in patients with chronic hepatitis B and chronic hepatitis C.


2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Jiaxin Zhang ◽  
S. Ghazaleh Dashti ◽  
John B. Carlin ◽  
Katherine J. Lee ◽  
Margarita Moreno-Betancur

Background: Outcome regression remains widely applied for estimating causal effects in observational studies, in which causal inference is conceptualised as emulating a randomized controlled trial (RCT). Multiple imputation (MI) is a commonly used method for handling missing data, but while in RCTs it has been shown that MI should be conducted by treatment group to reduce bias, whether imputation should be conducted by exposure group in observational studies has not been studied. Methods: We conducted a simulation study to evaluate the performance of seven methods for handling missing data: Complete-case analysis (CCA), MI of main effect, MI with interactions (between exposure and: outcome, a strong confounder, outcome and a strong confounder, all incomplete), and MI conducted by exposure group. We simulated data based on an example from the Victorian Adolescent Health Cohort Study. Three exposure prevalences and seven outcome generation models were considered, the latter ranging from no interaction to strong-positive or negative exposure-confounder interaction. Various missingness scenarios were examined: with incomplete outcome only or also incomplete confounders, and three levels of complexity regarding the missingness mechanism. Results: For all scenarios, MI by exposure led to the least bias, followed by MI approaches that included exposure-confounder interactions. Conclusions: If MI is adopted in outcome regression, we recommend conducting MI by exposure group and, when not feasible, including exposure-confounder interactions in the imputation model. Key messages: Similar to RCTs, MI should be conducted by exposure group when estimating average causal effects using outcome regression in observational studies.


Author(s):  
Tra My Pham ◽  
Irene Petersen ◽  
James Carpenter ◽  
Tim Morris

Background: Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research including ethnicity is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be due to the fact that ethnicity data in primary care are likely to be missing not at random. Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables. Methods: Weighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods including complete case analysis and single imputation. Results: While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe compared to MI assuming missing at random.
Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity. Conclusions: Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.

