Propensity score estimation with missing values using a multiple imputation missingness pattern (MIMP) approach

Abstract Background: Causal effect estimation with observational data is subject to bias due to confounding, which is often controlled for using propensity scores. One unresolved issue in propensity score estimation is how to handle missing values in covariates. Method: Several approaches have been proposed for handling covariate missingness, including multiple imputation (MI), multiple imputation with missingness pattern (MIMP), and treatment mean imputation. However, there are other potentially useful approaches that have not been evaluated, including single imputation (SI) + prediction error (PE), SI+PE + parameter uncertainty (PU), and Generalized Boosted Modeling (GBM), which is a nonparametric approach for estimating propensity scores in which missing values are automatically handled in the estimation using a surrogate split method. To evaluate the performance of these approaches, a simulation study was conducted. Results: Results suggested that SI+PE, SI+PE+PU, MI, and MIMP perform almost equally well and better than treatment mean imputation and GBM in terms of bias; however, MI and MIMP account for the additional uncertainty of imputing the missingness. Conclusions: Applying GBM to the incomplete data and relying on the surrogate split approach resulted in substantial bias. Imputation prior to implementing GBM is recommended.

Comparison of Methods for Handling Covariate Missingness in Propensity Score Estimation with a Binary Exposure

10.21203/rs.2.18726/v2 ◽ 2020

Author(s): Donna Coffman ◽ Jiangxiu Zhou ◽ Xizhen Cai

Keyword(s): Propensity Score ◽ Multiple Imputation ◽ Missing Values ◽ Propensity Scores ◽ Causal Effect ◽ Nonparametric Approach ◽ Split Method ◽ Mean Imputation ◽ Substantial Bias ◽ Effect Estimation

Abstract Background: Causal effect estimation with observational data is subject to bias due to confounding, which is often controlled for using propensity scores. One unresolved issue in propensity score estimation is how to handle missing values in covariates. Method: Several approaches have been proposed for handling covariate missingness, including multiple imputation (MI), multiple imputation with missingness pattern (MIMP), and treatment mean imputation. However, there are other potentially useful approaches that have not been evaluated, including single imputation (SI) + prediction error (PE), SI+PE + parameter uncertainty (PU), and Generalized Boosted Modeling (GBM), which is a nonparametric approach for estimating propensity scores in which missing values are automatically handled in the estimation using a surrogate split method. To evaluate the performance of these approaches, a simulation study was conducted. Results: Results suggested that SI+PE, SI+PE+PU, MI, and MIMP perform almost equally well and better than treatment mean imputation and GBM in terms of bias; however, MI and MIMP account for the additional uncertainty of imputing the missingness. Conclusions: Applying GBM to the incomplete data and relying on the surrogate split approach resulted in substantial bias. Imputation prior to implementing GBM is recommended.
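The MI and MIMP approaches named in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation code: the data-generating model, the bootstrap-based stochastic regression imputation, the normalized inverse-probability-weighted estimator, and all function names (`impute_once`, `mi_propensity_ate`, and so on) are assumptions chosen for illustration. Each imputed dataset gets its own propensity model, and the point estimates are averaged; setting `use_pattern=True` adds the missingness indicator as a covariate, which is the idea behind MIMP.

```python
import numpy as np

def expit(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(X, t, n_iter=25):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (t - p))
    return beta

def impute_once(x, predictors, rng):
    """One stochastic regression imputation of x (illustrative assumption).

    Parameter uncertainty is approximated by refitting on a bootstrap sample
    of the observed rows; prediction error by adding residual noise.
    """
    obs = ~np.isnan(x)
    A = np.column_stack([np.ones(len(x)), predictors])
    idx = rng.choice(np.flatnonzero(obs), size=obs.sum(), replace=True)
    coef, *_ = np.linalg.lstsq(A[idx], x[idx], rcond=None)
    sigma = np.std(x[idx] - A[idx] @ coef)
    x_imp = x.copy()
    x_imp[~obs] = A[~obs] @ coef + rng.normal(0.0, sigma, size=(~obs).sum())
    return x_imp

def ipw_ate(y, t, ps):
    """Normalized inverse-probability-weighted estimate of the average effect."""
    w1, w0 = t / ps, (1 - t) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

def mi_propensity_ate(x, z, t, y, m=10, use_pattern=False, seed=0):
    """Impute m times, estimate propensity scores each time, average the estimates."""
    rng = np.random.default_rng(seed)
    pattern = np.isnan(x).astype(float)  # missingness indicator (MIMP covariate)
    estimates = []
    for _ in range(m):
        x_imp = impute_once(x, np.column_stack([z, t, y]), rng)
        cols = [np.ones(len(x)), x_imp, z]
        if use_pattern:
            cols.append(pattern)
        X = np.column_stack(cols)
        ps = expit(X @ fit_logistic(X, t))
        estimates.append(ipw_ate(y, t, ps))
    return float(np.mean(estimates))

# Simulated data (assumed design): true treatment effect is 1.0 and the
# covariate x is missing at random with probability depending on z.
rng = np.random.default_rng(42)
n = 4000
z = rng.normal(size=n)
x_full = 0.5 * z + rng.normal(size=n)
t = rng.binomial(1, expit(0.5 * x_full + 0.5 * z)).astype(float)
y = 1.0 * t + x_full + z + rng.normal(size=n)
x = np.where(rng.random(n) < expit(-1.0 + z), np.nan, x_full)

naive = float(y[t == 1].mean() - y[t == 0].mean())
ate_mi = mi_propensity_ate(x, z, t, y)
ate_mimp = mi_propensity_ate(x, z, t, y, use_pattern=True)
print(f"naive={naive:.2f}  MI={ate_mi:.2f}  MIMP={ate_mimp:.2f}")
```

The sketch pools only the point estimates. Accounting for the additional uncertainty from imputation, as the abstract notes MI and MIMP do, would also require combining within- and between-imputation variances.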

Improving Outcome Predictions for Patients Receiving Mechanical Circulatory Support by Optimizing Imputation of Missing Values

Circulation: Cardiovascular Quality and Outcomes ◽ 10.1161/circoutcomes.120.007071 ◽ 2021

Author(s): Byron C. Jaeger ◽ Ryan Cantor ◽ Venkata Sthanam ◽ Rongbing Xie ◽ James K. Kirklin ◽ ...

Keyword(s): Multiple Imputation ◽ Risk Prediction ◽ Random Forests ◽ Missing Values ◽ Prediction Models ◽ Model Performance ◽ Circulatory Support ◽ Risk Prediction Models ◽ Prognostic Accuracy ◽ The Mean

Background: Risk prediction models play an important role in clinical decision making. When developing risk prediction models, practitioners often impute missing values to the mean. We evaluated the impact of applying other strategies to impute missing values on the prognostic accuracy of downstream risk prediction models, that is, models fitted to the imputed data. A secondary objective was to compare the accuracy of imputation methods based on artificially induced missing values. To complete these objectives, we used data from the Interagency Registry for Mechanically Assisted Circulatory Support. Methods: We applied 12 imputation strategies in combination with 2 different modeling strategies for mortality and transplant risk prediction following surgery to receive mechanical circulatory support. Model performance was evaluated using Monte-Carlo cross-validation and measured based on outcomes 6 months following surgery using the scaled Brier score, concordance index, and calibration error. We used Bayesian hierarchical models to compare model performance. Results: Multiple imputation with random forests emerged as a robust strategy to impute missing values, increasing model concordance by 0.0030 (25th–75th percentile: 0.0008–0.0052) compared with imputation to the mean for mortality risk prediction using a downstream proportional hazards model. The posterior probability that single and multiple imputation using random forests would improve concordance versus mean imputation was 0.464 and >0.999, respectively. Conclusions: Selecting an optimal strategy to impute missing values such as random forests and applying multiple imputation can improve the prognostic accuracy of downstream risk prediction models.
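One of the performance measures named in this abstract, the scaled Brier score, can be sketched for the simplest case. The version below assumes a binary outcome with no censoring and uses the observed event rate as the reference prediction; the study itself evaluates time-to-event outcomes, so this is an illustration of the idea rather than the authors' metric. The function names are hypothetical.

```python
import numpy as np

def brier_score(y, p):
    """Mean squared difference between predicted risk p and binary outcome y."""
    return float(np.mean((p - y) ** 2))

def scaled_brier_score(y, p):
    """1 - BS(model) / BS(reference), where the reference predicts the event rate.

    A value of 1 means perfect prediction; 0 means no better than the reference.
    """
    reference = np.full(len(y), y.mean())
    return 1.0 - brier_score(y, p) / brier_score(y, reference)

y = np.array([0.0, 0.0, 1.0, 1.0])
p = np.array([0.1, 0.2, 0.8, 0.9])
score = scaled_brier_score(y, p)
print(round(score, 3))  # 0.9
```

Here the model's Brier score is 0.025 and the reference score is 0.25, so the scaled score is 1 - 0.025 / 0.25 = 0.9.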

Comparison of Missing Data Infilling Mechanisms for Recovering a Real-World Single Station Streamflow Observation

International Journal of Environmental Research and Public Health ◽ 10.3390/ijerph18168375 ◽ 2021 ◽ Vol 18 (16) ◽ pp. 8375

Author(s): Thelma Dede Baddoo ◽ Zhijia Li ◽ Samuel Nii Odai ◽ Kenneth Rodolphe Chabi Boni ◽ Isaac Kwesi Nooni ◽ ...

Keyword(s): Missing Data ◽ Multiple Imputation ◽ Real World ◽ Missing Values ◽ Total Error ◽ Extensive Study ◽ Error Measurement ◽ Missing Data Imputation ◽ Single Station ◽ Real World Datasets

Reconstructing missing streamflow data can be challenging when additional data are not available, and studies that impute real-world datasets in order to establish how the accuracy of imputation algorithms can be ascertained for such datasets are lacking. This study investigated how complex a missing data reconstruction scheme needs to be to obtain relevant results for a real-world single-station streamflow observation and so facilitate its further use. The investigation applied different missing data infilling schemes, ranging from univariate algorithms to multiple imputation methods designed for multivariate data, with time taken as an explicit variable. The accuracy of these schemes was assessed using the total error measurement (TEM) and a localized error measurement (LEM) recommended in this study. The results show that univariate missing value algorithms, which are specially developed to handle univariate time series, provide satisfactory results, but those that provide the best results are usually time-consuming and computationally intensive. Also, multiple imputation algorithms that consider the surrounding observed values and/or can capture the characteristics of the data provide results similar to those of the univariate missing data algorithms and, in some cases, perform better without the added time and computational downsides when time is taken as an explicit variable. Furthermore, the LEM would be especially useful when the missing data are in specific portions of the dataset or where very large gaps of ‘missingness’ occur. Finally, proper handling of missing values in real-world hydroclimatic datasets depends on extensive study of the particular dataset to be imputed.

Comparing Multiple Imputation and Propensity-Score Weighting in Unit-Nonresponse Adjustments

Public Opinion Quarterly ◽ 10.1093/poq/nfv029 ◽ 2015 ◽ Vol 79 (3) ◽ pp. 635-661 ◽ Cited By ~ 4

Author(s): Ahu Alanya ◽ Christof Wolf ◽ Cristina Sotto

Keyword(s): Propensity Score ◽ Multiple Imputation ◽ Propensity Score Weighting ◽ Unit Nonresponse

Multiple Imputation of Missing Values: Further Update of Ice, with an Emphasis on Interval Censoring

The Stata Journal: Promoting communications on statistics and Stata ◽ 10.1177/1536867x0800700401 ◽ 2007 ◽ Vol 7 (4) ◽ pp. 445-464 ◽ Cited By ~ 148

Author(s): Patrick Royston

Keyword(s): Multiple Imputation ◽ Missing Values ◽ Interval Censoring

The efficiency of multiple imputation and maximum likelihood methods for estimating missing values

Indian Journal of Science and Technology ◽ 10.17485/ijst/2018/v11i16/118701 ◽ 2018 ◽ Vol 11 (16) ◽ pp. 1-11

Author(s): Tlhalitshi Volition Montshiwa ◽ Ntebo Moroke ◽ Elias Munapo ◽ ...

Keyword(s): Maximum Likelihood ◽ Multiple Imputation ◽ Missing Values ◽ Likelihood Methods ◽ Maximum Likelihood Methods

Long-term outcomes of omentum-preserving versus resecting gastrectomy for locally advanced gastric cancer with propensity score analysis

Scientific Reports ◽ 10.1038/s41598-020-73367-8 ◽ 2020 ◽ Vol 10 (1)

Author(s): Yusuke Sakimura ◽ Noriyuki Inaki ◽ Toshikatsu Tsuji ◽ Shinichi Kadoya ◽ Hiroyuki Bando

Keyword(s): Gastric Cancer ◽ Propensity Score ◽ Advanced Gastric Cancer ◽ Missing Values ◽ Propensity Score Analysis ◽ Locally Advanced ◽ R0 Resection ◽ Significant Difference ◽ First Recurrence ◽ The Impact

Abstract Omentectomy is conducted for advanced gastric cancer (AGC) patients as radical surgery without an adequate discussion of the effect. This study was conducted to reveal the impact of omentum-preserving gastrectomy on postoperative outcomes. AGC patients with cT3 and 4 disease who underwent total or distal gastrectomy with R0 resection were identified retrospectively. They were divided into the omentum-preserved group (OPG) and the omentum-resected group (ORG) and matched with propensity score matching with multiple imputation for missing values. Three-year overall survival (OS) and 3-year relapse-free survival (RFS) were compared, and the first recurrence site and complications were analysed. The numbers of eligible patients were 94 in the OPG and 144 in the ORG, and after matching, the number was 73 in each group. No significant difference was found in the 3-year OS rate (OPG: 78.9 vs. ORG: 78.9, P = 0.54) or the 3-year RFS rate (OPG: 77.8 vs. ORG: 68.2, P = 0.24). The proportions of peritoneal carcinomatosis and peritoneal dissemination as the first recurrence site and the rate and severity of complications were similar in the two groups. Omentectomy is not required for radical gastrectomy for AGC.
