Multiple Imputation of Missing Data: A Simulation Study on a Binary Response

ABSTRACT Background: Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research, including ethnicity, is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be due to the fact that ethnicity data in primary care are likely to be missing not at random. Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables. Methods: Weighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study, which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods including complete case analysis and single imputation. Results: While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe than under MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity. Conclusions: Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.


Multiple Imputation with Missing Indicators as Proxies for Unmeasured Variables: Simulation Study

10.21203/rs.3.rs-24268/v3 ◽

2020 ◽

Author(s):

Matthew Sperrin ◽

Glen P. Martin

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Simulation Study ◽

Causal Effect ◽

Missing At Random ◽

Directed Acyclic Graphs ◽

Missing Not At Random ◽

Routinely Collected Health Data ◽

Effect Estimation ◽

Minimal Bias

Abstract Background: Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias, its use in conjunction with multiple imputation may unlock the potential value of missingness to reduce bias in causal effect estimation, particularly in missing not at random scenarios and where missingness might be associated with unmeasured confounders. Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with multiple imputation, would reduce bias for causal effect estimation, under a range of scenarios including unmeasured variables, missing not at random, and missing at random mechanisms. We use directed acyclic graphs and structural models to elucidate a variety of causal structures of interest. We handled missing data using complete case analysis, and multiple imputation with and without missing indicator terms. Results: We find that multiple imputation combined with a missing indicator gives minimal bias for causal effect estimation in most scenarios. In particular the approach: 1) does not introduce bias in missing (completely) at random scenarios; 2) reduces bias in missing not at random scenarios where the missing mechanism depends on the missing variable itself; and 3) may reduce or increase bias when unmeasured confounding is present. Conclusion: In the presence of missing data, careful use of missing indicators, combined with multiple imputation, can improve causal effect estimation when missingness is informative, and is not detrimental when missingness is at random.
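The combination described in this abstract (multiple imputation plus a missing indicator in the analysis model) can be illustrated with a minimal sketch. Everything below is an assumption for illustration and is not the authors' code: the function names `ols`, `pool_rubin` and `mi_with_indicator` are hypothetical, the analysis model is a plain linear regression, and the imputation step is simple stochastic regression imputation, which ignores uncertainty in the imputation model's parameters and so is cruder than a full MI procedure.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: coefficients and their estimated variances."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, sigma2 * np.diag(xtx_inv)

def pool_rubin(estimates, variances):
    """Pool m estimates with Rubin's rules: total = within + (1 + 1/m) * between."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    return q.mean(), u.mean() + (1 + 1 / m) * q.var(ddof=1)

def mi_with_indicator(y, a, x, m=5, seed=0):
    """Effect of exposure a on outcome y, adjusting for a partially observed
    confounder x (np.nan = missing) and its missing indicator r."""
    rng = np.random.default_rng(seed)
    r = np.isnan(x).astype(float)            # 1 where x is missing
    obs = r == 0
    # Assumed imputation model: x ~ 1 + a + y, fitted on observed rows only.
    Z = np.column_stack([np.ones_like(y), a, y])
    gamma, _ = ols(Z[obs], x[obs])
    resid_sd = np.std(x[obs] - Z[obs] @ gamma, ddof=Z.shape[1])
    n_mis = int((~obs).sum())
    ests, variances = [], []
    for _ in range(m):
        x_imp = x.copy()
        x_imp[~obs] = Z[~obs] @ gamma + rng.normal(0.0, resid_sd, n_mis)
        # Analysis model includes the imputed x AND the missing indicator r.
        X = np.column_stack([np.ones_like(y), a, x_imp, r])
        beta, var = ols(X, y)
        ests.append(beta[1])
        variances.append(var[1])
    return pool_rubin(ests, variances)
```

Dropping the `r` column from the analysis design matrix gives the "multiple imputation without missing indicator" comparator, so the two strategies can be contrasted on the same simulated data.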


Use of Missing Indicators as Proxies for Unmeasured Variables: Simulation Study

10.21203/rs.3.rs-24268/v1 ◽

2020 ◽

Author(s):

Matthew Sperrin ◽

Glen P. Martin

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Simulation Study ◽

Predictive Accuracy ◽

Causal Effect ◽

Model Performance ◽

Directed Acyclic Graphs ◽

Regression Imputation ◽

Effect Estimation ◽

Improved Model

Abstract Background: Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias when used inappropriately, its use in conjunction with other imputation approaches may unlock the potential value of missingness to reduce bias and improve prediction. Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with an imputation approach, such as multiple imputation, would lead to improved model performance, in terms of minimising bias for causal effect estimation, and improving predictive accuracy, under a range of scenarios with unmeasured variables. We use directed acyclic graphs and structural models to elucidate causal structures of interest. We consider a variety of missingness mechanisms, then handle these using complete case analysis, unconditional mean imputation, regression imputation and multiple imputation. In each case we evaluate supplementing these approaches with missing indicator terms. Results: For estimating causal effects, we find that multiple imputation combined with a missing indicator gives minimal bias in most scenarios. For prediction, we find that regression imputation combined with a missing indicator minimises mean squared error. Conclusion: In the presence of missing data, careful use of missing indicators, combined with appropriate imputation, can improve both causal estimation and prediction accuracy.


How to Apply Multiple Imputation in Propensity Score Matching with Partially Observed Confounders: A Simulation Study and Practical Recommendations

Journal of Modern Applied Statistical Methods ◽

10.22237/jmasm/1608552120 ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Albee Ling ◽

Maria Montez-Rath ◽

Maya Mathur ◽

Kris Kapphahn ◽

Manisha Desai

Keyword(s):

Missing Data ◽

Propensity Score ◽

Multiple Imputation ◽

Propensity Score Matching ◽

Simulation Study ◽

Observational Studies ◽

Auxiliary Variables ◽

Imputation Model ◽

Partially Observed ◽

Practical Recommendations

Propensity score matching (PSM) has been widely used to mitigate confounding in observational studies, although complications arise when the covariates used to estimate the PS are only partially observed. Multiple imputation (MI) is a potential solution for handling missing covariates in the estimation of the PS. However, it is not clear how to best apply MI strategies in the context of PSM. We conducted a simulation study to compare the performances of popular non-MI missing data methods and various MI-based strategies under different missing data mechanisms. We found that commonly applied missing data methods resulted in biased and inefficient estimates, and we observed large variation in performance across MI-based strategies. Based on our findings, we recommend 1) estimating the PS after applying MI to impute missing confounders; 2) conducting PSM within each imputed dataset followed by averaging the treatment effects to arrive at one summarized finding; 3) a bootstrapped-based variance to account for uncertainty of PS estimation, matching, and imputation; and 4) inclusion of key auxiliary variables in the imputation model.
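Recommendations 1) and 2) above (estimate the PS after imputation, match within each imputed dataset, then average the treatment effects) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, the PS model is a plain logistic regression fitted by Newton-Raphson, matching is 1:1 nearest-neighbour on the PS with replacement, and the estimand is assumed to be the average treatment effect in the treated. The bootstrap variance from recommendation 3) is not shown.

```python
import numpy as np

def fit_logistic(X, t, iters=25):
    """Logistic regression by Newton-Raphson (tiny ridge term for stability)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hess = (X * w[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, X.T @ (t - p))
    return beta

def att_by_matching(y, t, confounders):
    """Estimate the PS, match each treated unit to the nearest control on the PS
    (with replacement), and return the mean matched outcome difference."""
    X = np.column_stack([np.ones(len(t)), confounders])
    ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))
    treated, control = t == 1, t == 0
    gaps = np.abs(ps[treated][:, None] - ps[control][None, :])
    nearest = gaps.argmin(axis=1)
    return float(np.mean(y[treated] - y[control][nearest]))

def mi_then_psm(y, t, imputed_confounders):
    """Run PS estimation and matching within each imputed dataset, then average
    the resulting treatment effect estimates into one summary value."""
    return float(np.mean([att_by_matching(y, t, c) for c in imputed_confounders]))
```

Here `imputed_confounders` is assumed to be a list of completed confounder arrays, one per imputation, produced by whatever MI routine is in use.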


Accounting for missing data caused by drug cessation in observational comparative effectiveness research: a simulation study

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2021-221477 ◽

2022 ◽

pp. annrheumdis-2021-221477

Author(s):

Denis Mongin ◽

Kim Lauper ◽

Axel Finckh ◽

Thomas Frisell ◽

Delphine Sophie Courvoisier

Keyword(s):

Missing Data ◽

Disease Activity ◽

Multiple Imputation ◽

Simulation Study ◽

Comparative Effectiveness ◽

Real World Data ◽

Data Set ◽

Multiple Imputations ◽

True Value ◽

The Absolute

Objectives: To assess the performance of statistical methods used to compare the effectiveness between drugs in an observational setting in the presence of attrition. Methods: In this simulation study, we compared the estimations of low disease activity (LDA) at 1 year produced by complete case analysis (CC), last observation carried forward (LOCF), LUNDEX, non-responder imputation (NRI), inverse probability weighting (IPW) and multiple imputations of the outcome. All methods were adjusted for confounders. The reasons to stop the treatments were included in the multiple imputation method (confounder-adjusted response rate with attrition correction, CARRAC) and were either included (IPW2) or not (IPW1) in the IPW method. A realistic simulation data set was generated from a real-world data collection. The amount of missing data caused by attrition and its dependence on the ‘true’ value of the data missing were varied to assess the robustness of each method to these changes. Results: LUNDEX and NRI strongly underestimated the absolute LDA difference between two treatments, and their estimates were highly sensitive to the amount of attrition. IPW1 and CC overestimated the absolute LDA difference between the two treatments and the overestimation increased with increasing attrition or when missingness depended on disease activity at 1 year. IPW2 and CARRAC produced unbiased estimations, but IPW2 had a greater sensitivity to the missing pattern of data and the amount of attrition than CARRAC. Conclusions: Only multiple imputation and IPW2, which considered both confounding and treatment cessation reasons, produced accurate comparative effectiveness estimates.
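To make two of the simpler comparators concrete, the sketch below contrasts complete case analysis with non-responder imputation on a toy example. The data are invented purely for illustration, the function names are hypothetical, and no confounder adjustment is applied (the study adjusted all methods for confounders), so this only shows the mechanics of each rule, not the study's estimators.

```python
import numpy as np

def complete_case_rate(lda):
    """CC: response rate among patients with an observed 1-year outcome."""
    return float(np.nanmean(lda))

def non_responder_rate(lda):
    """NRI: every missing outcome is counted as a non-response (0)."""
    return float(np.nan_to_num(lda, nan=0.0).mean())

# Hypothetical LDA indicators at 1 year (1 = LDA reached, nan = dropped out).
arm_a = np.array([1, 1, 1, 0, np.nan, np.nan])
arm_b = np.array([1, 0, 0, np.nan, np.nan, np.nan])

cc_diff = complete_case_rate(arm_a) - complete_case_rate(arm_b)
nri_diff = non_responder_rate(arm_a) - non_responder_rate(arm_b)
print(f"CC difference:  {cc_diff:.3f}")
print(f"NRI difference: {nri_diff:.3f}")
```

Because NRI divides by all patients and adds only zeros, the response rate in both arms shrinks as attrition grows, which in this toy example also narrows the absolute difference between arms.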


On comparative performance of multiple imputation methods for moderate to large proportions of missing data in clinical trials: a simulation study

Journal of Medical Statistics and Informatics ◽

10.7243/2053-7662-2-9 ◽

2014 ◽

Vol 2 (1) ◽

pp. 9 ◽

Cited By ~ 5

Author(s):

Sukhdev Mishra ◽

Diwakar Khare

Keyword(s):

Clinical Trials ◽

Missing Data ◽

Multiple Imputation ◽

Simulation Study ◽

Comparative Performance ◽

Imputation Methods


Evaluation of Four Multiple Imputation Methods for Handling Missing Binary Outcome Data in the Presence of an Interaction between a Dummy and a Continuous Variable

Journal of Probability and Statistics ◽

10.1155/2021/6668822 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Sara Javadi ◽

Abbas Bahrampour ◽

Mohammad Mehdi Saber ◽

Behshid Garrusi ◽

Mohammad Reza Baneshi

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Simulation Study ◽

Recursive Partitioning ◽

Real Data ◽

Continuous Variable ◽

Outcome Data ◽

Classification And Regression Tree ◽

Parameter Estimates ◽

Imputation Model

Multiple imputation by chained equations (MICE) is the most common method for imputing missing data. In the MICE algorithm, imputation can be performed using a variety of parametric and nonparametric methods. The default setting in the implementation of MICE is for imputation models to include variables as linear terms only with no interactions, but omission of interaction terms may lead to biased results. It is investigated, using simulated and real datasets, whether recursive partitioning creates appropriate variability between imputations and unbiased parameter estimates with appropriate confidence intervals. We compared four multiple imputation (MI) methods on a real and a simulated dataset. MI methods included using predictive mean matching with an interaction term in the imputation model in MICE (MICE-Interaction), classification and regression tree (CART) for specifying the imputation model in MICE (MICE-CART), the implementation of random forest (RF) in MICE (MICE-RF), and MICE-Stratified method. We first selected secondary data and devised an experimental design that consisted of 40 scenarios (2 × 5 × 4), which differed by the rate of simulated missing data (10%, 20%, 30%, 40%, and 50%), the missing mechanism (MAR and MCAR), and imputation method (MICE-Interaction, MICE-CART, MICE-RF, and MICE-Stratified). First, we randomly drew 700 observations with replacement 300 times, and then the missing data were created. The evaluation was based on raw bias (RB) as well as five other measurements that were averaged over the repetitions. Next, in a simulation study, we generated data 1000 times with a sample size of 700. Then, we created missing data for each dataset once. For all scenarios, the same criteria were used as for real data to evaluate the performance of methods in the simulation study. It is concluded that, when there is an interaction effect between a dummy and a continuous predictor, substantial gains are possible by using recursive partitioning for imputation compared to parametric methods, and also, the MICE-Interaction method is always more efficient and convenient to preserve interaction effects than the other methods.
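The 2 × 5 × 4 design and the creation of missing values can be sketched as below. The scenario labels come from the abstract, but the amputation rule is an assumption: the abstract does not say how MAR missingness was generated, so the rank-based rule here (missingness probability rising with an observed covariate, with mean equal to the target rate) and the name `missing_mask` are illustrative only.

```python
import itertools
import numpy as np

MECHANISMS = ["MCAR", "MAR"]
RATES = [0.10, 0.20, 0.30, 0.40, 0.50]
METHODS = ["MICE-Interaction", "MICE-CART", "MICE-RF", "MICE-Stratified"]

# Full factorial design: 2 mechanisms x 5 rates x 4 methods = 40 scenarios.
scenarios = list(itertools.product(MECHANISMS, RATES, METHODS))

def missing_mask(rate, mechanism, driver, rng):
    """Boolean mask of values to delete, with expected proportion `rate`.

    MCAR: every value has the same probability of being missing.
    MAR (assumed rule): probability grows linearly with the rank of `driver`,
    a fully observed covariate; valid for rates up to 0.5.
    """
    n = len(driver)
    if mechanism == "MCAR":
        prob = np.full(n, rate)
    elif mechanism == "MAR":
        ranks = np.argsort(np.argsort(driver))      # 0 .. n-1
        prob = 2.0 * rate * (ranks + 0.5) / n       # averages exactly to rate
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return rng.random(n) < prob
```

Applying `missing_mask` to each resampled dataset of 700 observations, then running the chosen imputation method, would give one replicate of one scenario.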


Multiple Imputation with Missing Indicators as Proxies for Unmeasured Variables: Simulation Study

10.21203/rs.3.rs-24268/v2 ◽

2020 ◽

Author(s):

Matthew Sperrin ◽

Glen P. Martin

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Simulation Study ◽

Causal Effect ◽

Missing At Random ◽

Directed Acyclic Graphs ◽

Missing Not At Random ◽

Routinely Collected Health Data ◽

Effect Estimation ◽

Minimal Bias

Abstract Background: Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias, its use in conjunction with multiple imputation may unlock the potential value of missingness to reduce bias in causal effect estimation, particularly in missing not at random scenarios and where missingness might be associated with unmeasured confounders. Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with multiple imputation, would reduce bias for causal effect estimation, under a range of scenarios including unmeasured variables, missing not at random, and missing at random mechanisms. We use directed acyclic graphs and structural models to elucidate a variety of causal structures of interest. We handled missing data using complete case analysis, and multiple imputation with and without missing indicator terms. Results: We find that multiple imputation combined with a missing indicator gives minimal bias for causal effect estimation in most scenarios. It does not introduce bias in missing (completely) at random scenarios, while reducing bias in missing not at random scenarios where the missing mechanism depends on the missing variable itself. The incorporation of a missing indicator can reduce or increase bias when unmeasured confounding is present. Conclusion: In the presence of missing data, careful use of missing indicators, combined with multiple imputation, can improve causal effect estimation when missingness is informative, and is not detrimental when missingness is at random.


The Comparison of Elementary Teachers' Longitudinal Advice Network Missing Data Analysis: Based on Multiple Imputation when Missing Not At Random (MNAR)

Korean Society for Educational Evaluation ◽

10.31158/jeev.2019.32.4.671 ◽

2019 ◽

Vol 32 (4) ◽

pp. 671-703

Author(s):

Chong Min Kim

Keyword(s):

Data Analysis ◽

Missing Data ◽

Multiple Imputation ◽

Missing Data Analysis


P189 Comparison of secukinumab versus adalimumab efficacy on skin outcomes in psoriatic arthritis: 52-week results from the EXCEED study

Rheumatology ◽

10.1093/rheumatology/keab247.184 ◽

2021 ◽

Vol 60 (Supplement_1) ◽

Author(s):

Alice Gottlieb ◽

Frank Behrens ◽

Peter Nash ◽

Joseph F Merola ◽

Pascale Pellet ◽

...

Keyword(s):

Psoriatic Arthritis ◽

Missing Data ◽

Multiple Imputation ◽

Plaque Psoriasis ◽

Stock Ownership ◽

Research Support ◽

Acr20 Response ◽

Number Of Patients ◽

Pasi Score ◽

To Receive

Abstract Background/Aims: Psoriatic arthritis (PsA) is a heterogeneous disease comprising musculoskeletal and dermatological manifestations, especially plaque psoriasis. Secukinumab, an interleukin-17A inhibitor, provided significantly greater PASI75/100 responses in two head-to-head trials versus etanercept or ustekinumab, a tumour necrosis factor inhibitor (TNFi), in patients with moderate-to-severe plaque psoriasis. The EXCEED study (NCT02745080) investigated whether secukinumab was superior to adalimumab, another TNFi, as monotherapy in biologic-naive active PsA patients with active plaque psoriasis (defined as having ≥1 psoriatic plaque of ≥2 cm diameter, nail changes consistent with psoriasis or documented history of plaque psoriasis). Here we report the pre-specified skin outcomes from the EXCEED study in the subset of patients with ≥3% body surface area (BSA) affected with psoriasis at baseline. Methods: In this head-to-head, Phase 3b, randomised, double-blind, active-controlled, multicentre, parallel-group trial, patients were randomised to receive subcutaneous secukinumab 300 mg at baseline and Weeks 1-4, followed by dosing every 4 weeks until Week 48, or subcutaneous adalimumab 40 mg at baseline followed by the same dosing every 2 weeks until Week 50. The primary endpoint was superiority of secukinumab versus adalimumab on ACR20 response at Week 52. Pre-specified outcomes included the proportion of patients achieving a combined ACR50 and PASI100 response, PASI100 response, and absolute PASI score ≤3. Missing data were handled using multiple imputation. Results: Overall, 853 patients were randomised to receive secukinumab (n = 426) or adalimumab (n = 427). At baseline, 215 and 202 patients had at least 3% BSA affected with psoriasis in the secukinumab and adalimumab groups, respectively. At Week 52, more patients achieved simultaneous improvement in ACR50 and PASI100 response with secukinumab versus adalimumab (30.7% versus 19.2%, respectively; P = 0.0087). Greater efficacy was demonstrated for secukinumab versus adalimumab for PASI100 responses and for the proportion of patients achieving absolute PASI score ≤3 (Table 1). Conclusion: In this pre-specified analysis, secukinumab provided higher responses compared with adalimumab in achievement of combined improvement in joint and skin disease (combined ACR50 and PASI100 response) and in skin-specific endpoints (PASI100 and absolute PASI score ≤3) at Week 52.

P189 Table 1: Skin-specific outcomes at Week 52

Endpoints, % response         SEC 300 mg (N = 215)   ADA 40 mg (N = 202)   P value (unadjusted)
PASI100                       46                     30                    0.0007
Combined ACR50 and PASI100    31                     19                    0.0087
Absolute PASI score ≤3        79                     65                    0.0015

P value vs ADA; unadjusted P values are presented. Multiple imputation was used for handling missing data. ADA, adalimumab; ACR, American College of Rheumatology; N, number of patients in the psoriasis subset; PASI, Psoriasis Area and Severity Index; SEC, secukinumab.

Disclosure: A. Gottlieb: Grants/research support; A.G. has received research support, consultation fees or speaker honoraria from Pfizer, AbbVie, BMS, Lilly, MSD, Novartis, Roche, Sanofi, Sandoz, Nordic, Celltrion and UCB. F. Behrens: Consultancies; F.B. is a consultant for Pfizer, AbbVie, Sanofi, Lilly, Novartis, Genzyme, Boehringer Ingelheim, Janssen, MSD, Celgene, Roche and Chugai. Grants/research support; F.B. has received grant/research support from Pfizer, Janssen, Chugai, Celgene, Lilly and Roche. P. Nash: Consultancies; P.N. is a consultant for AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly, Gilead, Janssen, MSD, Novartis, Pfizer Inc., Roche, Sanofi and UCB. Member of speakers’ bureau; for AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly, Gilead, Janssen, MSD, Novartis, Pfizer Inc., Roche, Sanofi and UCB. Grants/research support; P.N. has received research support from AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly and Company, Gilead, Janssen, MSD, Novartis, Pfizer Inc, Roche, Sanofi and UCB. J. Merola: Consultancies; J.F.M. is a consultant for Merck, AbbVie, Dermavant, Eli Lilly, Novartis, Janssen, UCB Pharma, Celgene, Sanofi, Regeneron, Arena, Sun Pharma, Biogen, Pfizer, EMD Serono, Avotres and LEO Pharma. P. Pellet: Corporate appointments; P.P. is an employee of Novartis. Shareholder/stock ownership; P.P. is a shareholder of Novartis. L. Pricop: Corporate appointments; L.P. is an employee of Novartis. Shareholder/stock ownership; L.P. is a shareholder of Novartis. I. McInnes: Consultancies; I.M. is a consultant for AbbVie, Bristol Myers Squibb, Celgene, Eli Lilly and Company, Gilead, Janssen, Novartis, Pfizer and UCB. Grants/research support; I.M. has received grant/research support from Bristol Myers Squibb, Celgene, Eli Lilly and Company, Janssen and UCB.
