Flexible, Free Software for Multilevel Multiple Imputation: A Review of Blimp and jomo

2019 ◽  
Vol 44 (5) ◽  
pp. 625-641
Author(s):  
Timothy Hayes

Multiple imputation is a popular method for addressing data that are presumed to be missing at random. To obtain accurate results, one’s imputation model must be congenial to (appropriate for) one’s intended analysis model. This article reviews two recent software packages, Blimp and jomo, and demonstrates how to use them to multiply impute data in a manner congenial with three prototypical multilevel modeling analyses: (1) a random intercept model, (2) a random slope model, and (3) a cross-level interaction model. Following these analysis examples, I review and discuss both software packages.
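
As a reminder of what the three analysis models look like, one common way to write them is sketched below. Here y_ij is the outcome for level-1 unit i in cluster j, x_ij is a level-1 predictor, and w_j is a level-2 predictor. This notation is an assumption for illustration and is not taken from the article. In general, an imputation model that omits the random slope u_1j or the product term x_ij w_j is not congenial with models (2) and (3).

```latex
\begin{aligned}
\text{(1) random intercept:} \quad & y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + \varepsilon_{ij} \\
\text{(2) random slope:} \quad & y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + \varepsilon_{ij} \\
\text{(3) cross-level interaction:} \quad & y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 w_j + \beta_3 x_{ij} w_j + u_{0j} + u_{1j} x_{ij} + \varepsilon_{ij} \\
& (u_{0j}, u_{1j})^\top \sim N(\mathbf{0}, \Sigma_u), \qquad \varepsilon_{ij} \sim N(0, \sigma^2)
\end{aligned}
```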

2021 ◽  
Vol 22 (17) ◽  
pp. 9650
Author(s):  
Miranda L. Gardner ◽  
Michael A. Freitas

Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. Rates of missing abundance values vary widely when comparing different sample treatments. For example, one would expect a consistent rate of “missing at random” (MAR) across batches of samples, and varying rates of “missing not at random” (MNAR) depending on the inherent differences between sample treatments within the study. The imputation strategy must therefore be chosen to account for MAR and MNAR simultaneously. Several important issues must be considered when deciding on an appropriate missing value imputation strategy: (1) when it is appropriate to impute data; and (2) how to choose a method that reflects the mixture of MAR and MNAR that occurs in an experiment. This paper evaluates missing value imputation strategies used in proteomics and makes the case for hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.
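
The left-censored part of such an approach is often implemented by replacing missing log-intensities with draws from a normal distribution that has been shifted down and narrowed relative to the observed values. The sketch below assumes log-transformed intensities. It uses a down-shift of 1.8 SD and a width of 0.3 SD, the values popularized by the Perseus software, which are assumptions here and not settings from the paper. The function name is hypothetical.

```python
import numpy as np


def impute_left_censored(values, shift=1.8, width=0.3, rng=None):
    """Fill NaNs in one sample's log-intensities with draws from a
    down-shifted, narrowed normal distribution (left-censored imputation).
    Hypothetical helper; shift/width defaults are assumed Perseus-style values."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float).copy()
    missing = np.isnan(values)
    observed = values[~missing]
    mu, sd = observed.mean(), observed.std(ddof=1)
    # Centre the draws `shift` SDs below the observed mean and shrink their
    # spread to `width` SDs, mimicking intensities below the detection limit.
    values[missing] = rng.normal(mu - shift * sd, width * sd, size=missing.sum())
    return values


rng = np.random.default_rng(0)
sample = np.array([22.1, 23.4, np.nan, 21.8, 24.0, np.nan, 22.9])
imputed = impute_left_censored(sample, rng=rng)
```

Because every draw sits well below the observed mean, this rule encodes the assumption that absent values are low-abundance. That assumption is reasonable for MNAR gaps but would bias MAR gaps downward, which is why hybrid approaches are used.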


2019 ◽  
Vol 8 (5) ◽  
pp. 965-989
Author(s):  
M Quartagno ◽  
J R Carpenter ◽  
H Goldstein

Multiple imputation is now well established as a practical and flexible method for analyzing partially observed data, particularly under the missing at random assumption. However, when the substantive model is a weighted analysis, there is concern about the empirical performance of Rubin’s rules and about how to appropriately incorporate possible interactions between the weights and the distribution of the study variables. One approach that has been suggested is to include the weights in the imputation model, potentially also allowing for interactions with the other variables. We show that the theoretical criterion justifying this approach can be approximately satisfied if we stratify the weights to define level-two units in our data set and include random intercepts in the imputation model. Further, if we let the covariance matrix of the variables have a random distribution across the level-two units, we also allow imputation to reflect any interaction between weight strata and the distribution of the variables. We evaluate our proposal in a number of simulation scenarios, showing it has promising performance both in terms of coverage levels of the model parameters and bias of the associated Rubin’s variance estimates. We illustrate its application to a weighted analysis of factors predicting reception-year readiness in children in the UK Millennium Cohort Study.
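
Because the evaluation above centres on Rubin’s rules and their variance estimates, here is a sketch of how those rules combine results from m imputed data sets. It covers only the standard pooling formulas and degrees of freedom. It is not the weighted, stratified imputation procedure proposed in the paper, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m >= 2 point estimates and their squared standard errors
    using Rubin's rules. Hypothetical helper for illustration."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    t = w_bar + (1 + 1 / m) * b      # total variance
    # Rubin's degrees of freedom for the t reference distribution
    r = (1 + 1 / m) * b / w_bar      # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2 if b > 0 else math.inf
    return q_bar, math.sqrt(t), df


estimate, std_error, dof = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

The between-imputation term b is the part that weighting can distort. If the imputation model ignores how the weights relate to the variables, b may misstate the extra uncertainty due to missing data.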


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254112
Author(s):  
Faisal Maqbool Zahid ◽  
Shahla Faisal ◽  
Christian Heumann

Multiple imputation (MI) is challenging in high-dimensional settings. An imputation model with only a selected subset of predictors can be incompatible with the analysis model, leading to inconsistent and biased estimates. Although compatibility may not be achievable in such cases, consistent and unbiased estimates can still be obtained using a semi-compatible imputation model. We propose relaxing the lasso penalty so that it selects a large set of variables (at most n). A substantive model that also uses a formal variable selection procedure in high-dimensional settings is then expected to be nested in this imputation model, so the resulting imputation model will be semi-compatible with high probability. The likelihood estimates can be unstable and can face convergence issues as the number of variables approaches the sample size. To address these issues, we further propose using a ridge penalty to obtain the posterior distribution of the parameters based on the observed data. The proposed technique is compared with standard MI software and with MI techniques available for high-dimensional data in simulation studies and on a real-life dataset. Our results show that the proposed approach outperforms existing MI approaches while addressing the compatibility issue.
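
The role of the ridge penalty can be illustrated with a Bayesian linear-regression imputation step in which a ridge term keeps X'X invertible when the number of predictors nears the sample size. The sketch below skips the relaxed-lasso selection and assumes X already holds the selected predictors. The ridge value and the function name are hypothetical, and it is a generic posterior-draw step in the spirit of standard normal-model imputation, not the authors’ exact method.

```python
import numpy as np


def ridge_posterior_impute(X, y, ridge=1.0, rng=None):
    """Impute NaNs in y with one draw from an approximate posterior of a
    ridge-penalized linear regression fitted to the observed rows.
    Hypothetical sketch; `ridge` is an assumed tuning value."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    obs = ~np.isnan(y)
    Xo, yo = X[obs], y[obs]
    n, p = Xo.shape
    # The ridge term keeps X'X + ridge*I invertible even when p is close to n.
    V = np.linalg.inv(Xo.T @ Xo + ridge * np.eye(p))
    V = (V + V.T) / 2.0                      # enforce exact symmetry for sampling
    beta_hat = V @ Xo.T @ yo
    resid = yo - Xo @ beta_hat
    # Draw sigma^2 from a scaled inverse chi-square, then beta given sigma^2.
    sigma2 = resid @ resid / rng.chisquare(max(n - p, 1))
    beta = rng.multivariate_normal(beta_hat, sigma2 * V)
    # Predict the missing rows and add residual noise.
    Xm = X[~obs]
    y[~obs] = Xm @ beta + rng.normal(0.0, np.sqrt(sigma2), size=Xm.shape[0])
    return y


rng = np.random.default_rng(1)
X_demo = rng.normal(size=(60, 40))           # many predictors relative to n
y_demo = 2.0 * X_demo[:, 0] + rng.normal(size=60)
y_demo[:10] = np.nan
y_imp = ridge_posterior_impute(X_demo, y_demo, ridge=1.0, rng=rng)
```

Repeating the call with fresh random draws produces the multiple imputed data sets. Each call draws new parameters, so the imputations reflect parameter uncertainty as well as residual noise.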


2020 ◽  
Author(s):  
Miranda L. Gardner ◽  
Michael A. Freitas

Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. Rates of missing abundance values vary widely when comparing different sample treatments. For example, one would expect a consistent rate of “missing at random” (MAR) across batches of samples, and varying rates of “missing not at random” (MNAR) depending on inherent differences between sample treatments within the study. The imputation strategy must therefore be chosen to account for MAR and MNAR simultaneously. Several important issues must be considered when deciding on an appropriate missing value imputation strategy: (1) when it is appropriate to impute data, and (2) how to choose a method that reflects the mixture of MAR and MNAR that occurs in an experiment. This paper evaluates missing value imputation strategies used in proteomics and makes the case for hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.
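
A hybrid strategy needs a rule for deciding which gaps to treat as MAR and which as MNAR. The sketch below uses one simple, hypothetical rule applied within each treatment group. If a protein is missing in at least half of a group’s samples, its gaps are treated as left-censored and filled with a low quantile of all observed intensities, similar to a "minimum detected" value. Otherwise its gaps are treated as MAR and filled with the group mean. The threshold, quantile, and function name are assumptions, not the paper’s method.

```python
import numpy as np


def hybrid_impute(data, groups, mnar_fraction=0.5, q=0.01):
    """Hypothetical hybrid MAR/MNAR rule for a proteins x samples matrix
    of log-intensities; `groups` labels the treatment group of each sample."""
    data = np.asarray(data, dtype=float)
    out = data.copy()
    groups = np.asarray(groups)
    floor = np.nanquantile(data, q)          # proxy for the detection limit
    for g in np.unique(groups):
        cols = groups == g
        block = out[:, cols]                  # boolean indexing returns a copy
        for i in range(block.shape[0]):
            row = block[i]                    # view into `block`
            miss = np.isnan(row)
            if not miss.any():
                continue
            if miss.mean() >= mnar_fraction:
                row[miss] = floor             # mostly absent: likely below detection (MNAR)
            else:
                row[miss] = np.nanmean(row)   # sporadic gaps: treat as MAR
        out[:, cols] = block                  # write the filled group back
    return out


matrix = np.array([[20.0, np.nan, 21.0, 25.0, 24.0, 26.0],
                   [np.nan, np.nan, np.nan, 23.0, 22.0, 24.0]])
labels = ["ctrl", "ctrl", "ctrl", "trt", "trt", "trt"]
filled = hybrid_impute(matrix, labels)
```

In this example the first protein has a single sporadic gap, which is filled with the control-group mean. The second protein is absent from every control sample, which is the pattern a treatment-dependent MNAR mechanism would produce, so it receives the low floor value.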


2021 ◽  
pp. 096228022110473
Author(s):  
Lauren J Beesley ◽  
Irina Bondarenko ◽  
Michael R Elliott ◽  
Allison W Kurian ◽  
Steven J Katz ◽  
...  

Multiple imputation is a well-established general technique for analyzing data with missing values. A convenient way to implement multiple imputation is sequential regression multiple imputation, also called chained equations multiple imputation. In this approach, we impute missing values using regression models for each variable, conditional on the other variables in the data. This approach, however, assumes that the missingness mechanism is missing at random, and it is not well justified under not-at-random missingness without additional modification. In this paper, we describe how to generalize the sequential regression multiple imputation procedure to handle missingness not at random in the setting where missingness may depend on other variables that are also missing, but not on the missing variable itself, conditional on fully observed variables. We provide algebraic justification for several generalizations of standard sequential regression multiple imputation using Taylor series and other approximations of the target imputation distribution under missingness not at random. The resulting regression model approximations include indicators for missingness, interactions, or other functions of the missingness-not-at-random model and the observed data. In a simulation study, we demonstrate that the proposed sequential regression multiple imputation modifications reduce bias in the final analysis compared with standard sequential regression multiple imputation, with an approximation strategy that includes an offset in the imputation model performing best overall. The method is illustrated in a breast cancer study, where the goal is to estimate the prevalence of a specific genetic pathogenic variant.
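
One of the simpler modifications listed above, adding missingness indicators of the other incomplete variables to each imputation regression, can be sketched as a two-variable chained-equations loop. The sketch makes several assumptions. There are two incomplete continuous variables X1 and X2 and one fully observed Z. Imputation uses stochastic regression with residual noise only, with no draw of regression parameters, so it is a simplification of proper multiple imputation. The offset strategy that performed best in the paper is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def srmi_with_indicators(Z, X1, X2, n_iter=10, rng=None):
    """Chained-equations sketch for two incomplete variables X1 and X2 and a
    fully observed Z. Each imputation regression also includes the OTHER
    variable's missingness indicator. Hypothetical helper; stochastic
    regression only (no parameter draws)."""
    rng = np.random.default_rng() if rng is None else rng
    Z, X1, X2 = (np.asarray(a, dtype=float) for a in (Z, X1, X2))
    M1, M2 = np.isnan(X1), np.isnan(X2)
    x1, x2 = X1.copy(), X2.copy()
    # Start from observed-mean fills.
    x1[M1] = np.nanmean(X1)
    x2[M2] = np.nanmean(X2)

    def impute(target, miss, predictors):
        # Fit OLS on rows where `target` is observed, then fill the rest in place.
        D = np.column_stack([np.ones_like(target)] + predictors)
        coef, *_ = np.linalg.lstsq(D[~miss], target[~miss], rcond=None)
        resid = target[~miss] - D[~miss] @ coef
        sd = resid.std(ddof=D.shape[1])
        target[miss] = D[miss] @ coef + rng.normal(0.0, sd, size=miss.sum())

    for _ in range(n_iter):
        impute(x1, M1, [Z, x2, M2.astype(float)])
        impute(x2, M2, [Z, x1, M1.astype(float)])
    return x1, x2


rng = np.random.default_rng(2)
n = 500
Z = rng.normal(size=n)
X1_full = Z + rng.normal(size=n)
X2_full = 0.5 * X1_full + rng.normal(size=n)
X1_obs, X2_obs = X1_full.copy(), X2_full.copy()
X1_obs[X2_full > 1] = np.nan                  # X1's missingness depends on X2, not on X1
X2_obs[rng.random(n) < 0.2] = np.nan
x1_imp, x2_imp = srmi_with_indicators(Z, X1_obs, X2_obs, rng=rng)
```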


2018 ◽  
Vol 36 (6_suppl) ◽  
pp. 234-234
Author(s):  
Raoul Concepcion ◽  
Andrew J. Armstrong ◽  
Lawrence Ivan Karsh ◽  
Stefan Holmstrom ◽  
Cristina Ivanescu ◽  
...  

Background: In STRIVE patients with castration-resistant prostate cancer (CRPC; M0 n = 139; M1 n = 257), median time to a 10-point decrease from baseline in FACT-P total score for enzalutamide (ENZA) vs. bicalutamide (BIC) was 8.4 vs. 8.3 months (hazard ratio [HR] 0.91; 95% confidence interval [CI] 0.70, 1.19; p = 0.49). That analysis assumed that missing data were missing at random (MAR) and censored patients with no deterioration in FACT-P at their last assessment. Because health-related quality of life (HRQoL) may worsen after progression or adverse events, for all STRIVE patients we replaced the MAR assumption with assumptions more likely to reflect clinically plausible HRQoL decline. Methods: Analyses of HRQoL decline (a decrease in FACT-P from baseline of at least the minimum clinically important difference) used a missing not at random (MNAR) assumption, implemented as a pattern mixture model (PMM) via sequential modeling with multiple imputation in which the imputation varied by reason for treatment discontinuation. Analysis of time to first clinically meaningful deterioration from baseline used a piecewise exponential survival multiple imputation model with reason-specific ∆ adjustment patterns similar to those of the PMM analysis. Results: The PMM analysis showed differences at week 61 in mean HRQoL change from baseline favoring ENZA over BIC for 7 of 10 scores: physical (PWB), functional, emotional (EWB), and social (SWB) well-being; FACT-P trial outcome index; FACT-G total; and FACT-P total (all clinically meaningful except PWB). In the piecewise exponential survival imputation model, ENZA had a significantly lower risk of first deterioration than BIC in FACT-P total (HR 0.76 [95% CI 0.60, 0.95]), FACT-G total (0.66 [0.52, 0.83]), Prostate Cancer Subscale (PCS) pain-related score (0.78 [0.62, 0.97]), SWB (0.49 [0.38, 0.64]), and EWB (0.58 [0.45, 0.75]). For the remaining domain scores, the HR for first deterioration with ENZA was below 1, but the 95% CI included 1, so the difference was not statistically significant; a sensitivity analysis showed similar results. Conclusions: In STRIVE patients, declines in all FACT-P scores were smaller for ENZA than for BIC up to week 61. Comparison of change from baseline at week 61 favored ENZA for 7 of 10 scores (6 clinically meaningful). ENZA had a significantly lower risk of first deterioration in FACT-P or FACT-G total, PCS pain-related score, EWB, and SWB. Clinical trial information: NCT01664923.
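
The reason-specific ∆ adjustment idea can be shown with a minimal sketch. Values imputed under MAR are shifted by a penalty that depends on why the patient discontinued, so patients who stopped for progression or adverse events are assumed to fare worse than MAR predicts. The function name, reason labels, and delta values below are hypothetical. The sketch also works on continuous change scores rather than on the piecewise exponential survival model described in the abstract.

```python
import numpy as np


def delta_adjust(mar_imputations, reasons, deltas):
    """Shift MAR-imputed scores by a reason-specific delta (hypothetical helper).

    mar_imputations: array of shape (m, n_missing), imputed change-from-baseline scores
    reasons: length-n_missing sequence with each patient's discontinuation reason
    deltas: mapping reason -> shift (negative = worse HRQoL than MAR predicts);
            reasons not in the mapping keep the MAR value (shift 0).
    """
    shifts = np.array([deltas.get(r, 0.0) for r in reasons])
    # Broadcasting applies the same patient-level shift in every imputed data set.
    return np.asarray(mar_imputations, dtype=float) + shifts


mar = np.zeros((3, 3))  # placeholder MAR imputations: 3 data sets x 3 patients
adjusted = delta_adjust(mar, ["progression", "adverse_event", "other"],
                        {"progression": -5.0, "adverse_event": -3.0})
```

Each adjusted data set is then analysed and the results are pooled as usual. Repeating the analysis over a range of deltas shows how far the conclusions depend on the chosen MNAR assumption.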


2016 ◽  
Vol 27 (9) ◽  
pp. 2610-2626 ◽  
Author(s):  
Thomas R Sullivan ◽  
Ian R White ◽  
Amy B Salter ◽  
Philip Ryan ◽  
Katherine J Lee

The use of multiple imputation has increased markedly in recent years, and journal reviewers may expect to see multiple imputation used to handle missing data. However, in randomized trials, where treatment group is always observed and independent of baseline covariates, other approaches may be preferable. Using simulated data, we evaluated multiple imputation, performed both overall and separately by randomized group, across a range of commonly encountered scenarios. We considered both missing outcome and missing baseline data, with missing outcome data induced under missing at random mechanisms. Provided the analysis model was correctly specified, multiple imputation produced unbiased treatment effect estimates, but alternative unbiased approaches were often more efficient. When the analysis model overlooked an interaction effect involving randomized group, multiple imputation produced biased estimates of the average treatment effect when applied to missing outcome data, unless imputation was performed separately by randomized group. Based on these results, we conclude that multiple imputation should not be seen as the only acceptable way to handle missing data in randomized trials. In settings where multiple imputation is adopted, we recommend that imputation be carried out separately by randomized group.
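
Imputing separately by randomized group can be sketched as follows. The sketch assumes a single baseline covariate and uses simple stochastic regression imputation (one draw, no parameter uncertainty), which is a simplification of proper multiple imputation. The function name is hypothetical. Because the imputation model is fitted within each arm, the imputed outcomes keep an arm-specific slope even when a later analysis model omits the interaction.

```python
import numpy as np


def impute_by_arm(y, x, arm, rng=None):
    """Stochastic regression imputation of a missing outcome y from a baseline
    covariate x, fitted separately within each randomized arm.
    Hypothetical helper; single draw without parameter uncertainty."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float).copy()
    x = np.asarray(x, dtype=float)
    arm = np.asarray(arm)
    for a in np.unique(arm):
        in_arm = arm == a
        obs = in_arm & ~np.isnan(y)
        mis = in_arm & np.isnan(y)
        slope, intercept = np.polyfit(x[obs], y[obs], 1)   # arm-specific fit
        resid = y[obs] - (intercept + slope * x[obs])
        sd = resid.std(ddof=2)
        y[mis] = intercept + slope * x[mis] + rng.normal(0.0, sd, size=mis.sum())
    return y


rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
arm = np.repeat([0, 1], n // 2)
y_full = np.where(arm == 1, 3 * x, x) + rng.normal(0, 0.1, n)  # arm-by-covariate interaction
y_missing = y_full.copy()
y_missing[rng.random(n) < 0.3] = np.nan
y_filled = impute_by_arm(y_missing, x, arm, rng=rng)
```

A pooled imputation model with only main effects of x and arm would pull both arms toward a common slope, which is the source of the bias described above.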


Author(s):  
Karthika Mohan ◽  
Felix Thoemmes ◽  
Judea Pearl

Traditional methods for handling incomplete data, including Multiple Imputation and Maximum Likelihood, require that the data be Missing At Random (MAR). In most cases, however, missingness in a variable depends on the underlying value of that variable. In this work, we devise model-based methods to consistently estimate mean, variance and covariance given data that are Missing Not At Random (MNAR). While previous work on MNAR data required variables to be discrete, we extend the analysis to continuous variables drawn from Gaussian distributions. We demonstrate the merits of our techniques by comparing them empirically with state-of-the-art software packages.
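
Why MAR-based methods fail when missingness depends on the variable’s own value can be seen in a small simulation. In the sketch, X is standard normal, and the probability that X is observed rises logistically with X itself (a self-masking MNAR mechanism). The complete-case mean then drifts well above the true mean of 0. The mechanism, its strength, and the function name are assumptions for illustration. This is not the estimator proposed in the paper.

```python
import numpy as np


def complete_case_mean_under_self_masking(n=100_000, strength=2.0, seed=0):
    """Simulate X ~ N(0, 1) whose probability of being observed increases
    with X itself, and return the mean of the observed values.
    Hypothetical helper for illustration."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    p_observed = 1.0 / (1.0 + np.exp(-strength * x))  # depends on the unobserved value
    observed = rng.random(n) < p_observed
    return x[observed].mean()


biased_mean = complete_case_mean_under_self_masking()          # strength 2.0
unbiased_mean = complete_case_mean_under_self_masking(strength=0.0)  # MCAR baseline
```

Setting `strength=0.0` makes missingness completely at random, and the complete-case mean returns to about 0. Any positive strength produces upward bias that no amount of conditioning on other observed variables can remove in this simple setup.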


2020 ◽  
Vol 29 (10) ◽  
pp. 3076-3092 ◽  
Author(s):  
Susan Gachau ◽  
Matteo Quartagno ◽  
Edmund Njeru Njagi ◽  
Nelson Owuor ◽  
Mike English ◽  
...  

Missing information is a major obstacle when analyzing data collected in many routine health care settings. Multiple imputation assuming a missing at random mechanism is a popular method to handle missing data. The missing at random assumption cannot be confirmed from the observed data alone, hence the need for sensitivity analysis to assess the robustness of inference. However, sensitivity analysis is rarely conducted and reported in practice. We analyzed routine paediatric data collected during a cluster randomized trial conducted in Kenyan hospitals. We imputed missing patient- and clinician-level variables assuming a missing at random mechanism. We also imputed missing clinician-level variables assuming a missing not at random mechanism. We incorporated opinions from 15 clinical experts in the form of prior distributions and shift parameters in the delta adjustment method. An interaction between trial intervention arm and follow-up time, together with hospital-, clinician- and patient-level factors, was included in a proportional odds random-effects analysis model. We performed these analyses using R functions derived from the jomo package. Parameter estimates from multiple imputation under the missing at random mechanism were similar to multiple imputation estimates assuming the missing not at random mechanism. Our inferences were insensitive to departures from the missing at random assumption under either the prior-distribution or the shift-parameter approach to sensitivity analysis.
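
One way to feed elicited expert opinion into a delta-adjustment analysis is to treat the shift as uncertain and draw a new delta for each imputed data set. The sketch below does this for a binary variable. The expert shifts on the log-odds scale are summarised as a normal prior using their mean and SD, and each draw shifts the MAR-model log-odds before a 0/1 value is sampled. The summary-by-normal-prior step, the function name, and the example numbers are assumptions. This is not the jomo-derived R code used in the study.

```python
import numpy as np


def impute_binary_with_expert_delta(p_mar, expert_deltas, m=20, rng=None):
    """Hypothetical delta-adjusted imputation of a binary variable.

    p_mar: MAR-model probabilities (one per missing value) that the variable is 1.
    expert_deltas: elicited log-odds shifts, one per expert; summarised here
        as a normal prior with their mean and sample SD (an assumption).
    Returns an (m, n_missing) array of 0/1 imputations.
    """
    rng = np.random.default_rng() if rng is None else rng
    p_mar = np.asarray(p_mar, dtype=float)
    prior_mean = np.mean(expert_deltas)
    prior_sd = np.std(expert_deltas, ddof=1)
    logit = np.log(p_mar / (1.0 - p_mar))
    draws = np.empty((m, p_mar.size), dtype=int)
    for k in range(m):
        delta = rng.normal(prior_mean, prior_sd)          # one shift per imputed data set
        p_mnar = 1.0 / (1.0 + np.exp(-(logit + delta)))  # shifted probability
        draws[k] = rng.random(p_mar.size) < p_mnar        # stored as 0/1
    return draws


rng = np.random.default_rng(4)
imputations = impute_binary_with_expert_delta([0.5] * 1000, [-1.0, -0.8, -1.2], m=5, rng=rng)
```

Drawing delta per data set, rather than fixing it, lets disagreement among experts widen the pooled standard errors through the between-imputation variance.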


2021 ◽  
Author(s):  
Melissa Middleton ◽  
Cattram Nguyen ◽  
Margarita Moreno-Betancur ◽  
John B Carlin ◽  
Katherine J Lee

Background: In case-cohort studies, a random subcohort is selected from the inception cohort and acts as the sample of controls for several outcome investigations. Analysis is conducted using only the cases and the subcohort, with inverse probability weighting (IPW) used to account for the unequal sampling probabilities resulting from the study design. Like all epidemiological studies, case-cohort studies are susceptible to missing data. Multiple imputation (MI) has become increasingly popular for addressing missing data in epidemiological studies. It is currently unclear how best to incorporate the weights from a case-cohort analysis in MI procedures used to address missing covariate data.
Methods: A simulation study was conducted with missingness in two covariates, motivated by a case study within the Barwon Infant Study. The MI methods considered were: using the outcome (a proxy for the weights in the simple case-cohort design considered) as a predictor in the imputation model, with and without exposure and covariate interactions; imputing separately within each weight category; and using a weighted imputation model. These methods were compared with a complete case analysis (CCA) within the context of a standard IPW analysis model estimating either the risk ratio or the odds ratio. The strength of associations, missing data mechanism, proportion of observations with incomplete covariate data, and subcohort selection probability varied across the simulation scenarios. The methods were also applied to the case study.
Results: All MI methods performed similarly in terms of relative bias and precision across the scenarios considered, with the expected improvements compared with the CCA. Slight underestimation of the standard error was seen throughout, but the nominal level of coverage (95%) was generally achieved. All MI methods showed a similar increase in precision as the subcohort selection probability increased, irrespective of the scenario. A similar pattern of results was seen in the case study.
Conclusions: How the weights were incorporated into the imputation model had minimal effect on the performance of MI; this may be because case-cohort studies have only two weight categories. In this context, including the outcome in the imputation model was sufficient to account for the unequal sampling probabilities in the analysis model.
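
The two weight categories mentioned in the conclusions come from the sampling design. Every case is included with probability 1, and a non-case enters only through the subcohort, with the subcohort selection probability. The sketch below computes those IPW weights and a weighted risk ratio for one complete (or already imputed) data set. The function names and the toy data are hypothetical, and under MI this analysis would be repeated on each imputed data set before pooling.

```python
import numpy as np


def case_cohort_weights(is_case, in_subcohort, subcohort_prob):
    """IPW weights for a simple case-cohort design (hypothetical helper):
    cases get weight 1, non-case subcohort members get 1 / subcohort_prob,
    and non-sampled non-cases get weight 0 (they are not analysed)."""
    is_case = np.asarray(is_case, dtype=bool)
    in_subcohort = np.asarray(in_subcohort, dtype=bool)
    w = np.zeros(is_case.shape, dtype=float)
    w[is_case] = 1.0
    w[~is_case & in_subcohort] = 1.0 / subcohort_prob
    return w


def weighted_risk_ratio(is_case, exposed, w):
    """Risk ratio (exposed vs unexposed) from weighted case counts."""
    is_case = np.asarray(is_case, dtype=bool)
    exposed = np.asarray(exposed, dtype=bool)
    w = np.asarray(w, dtype=float)
    risk1 = w[exposed & is_case].sum() / w[exposed].sum()
    risk0 = w[~exposed & is_case].sum() / w[~exposed].sum()
    return risk1 / risk0


case_flag = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
exposure  = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
subcohort = [0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0]
weights = case_cohort_weights(case_flag, subcohort, subcohort_prob=0.5)
rr = weighted_risk_ratio(case_flag, exposure, weights)
```

In this toy example, the weighted risks are 2/6 among the exposed and 1/9 among the unexposed, giving a risk ratio of 3. Because the weight is fully determined by case status, the outcome carries all of the weight information, which is consistent with the finding that including the outcome in the imputation model was sufficient.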

