Quantile regression-based multiple imputation of missing values — An evaluation and application to corporal punishment data

Methodology ◽  
2021 ◽  
Vol 17 (3) ◽  
pp. 205-230
Author(s):  
Kristian Kleinke ◽  
Markus Fritsch ◽  
Mark Stemmler ◽  
Jost Reinecke ◽  
Friedrich Lösel

Quantile regression (QR) is a valuable tool for data analysis and multiple imputation (MI) of missing values – especially when standard parametric modelling assumptions are violated. Yet, Monte Carlo simulations that systematically evaluate QR-based MI in a variety of different practically relevant settings are still scarce. In this paper, we evaluate the method regarding the imputation of ordinal data and compare the results with other standard and robust imputation methods. We then apply QR-based MI to an empirical dataset, where we seek to identify risk factors for corporal punishment of children by their fathers. We compare the modelling results with previously published findings based on complete cases. Our Monte Carlo results highlight the advantages of QR-based MI over fully parametric imputation models: QR-based MI yields unbiased statistical inferences across large parts of the conditional distribution, when parametric modelling assumptions, such as normal and homoscedastic error terms, are violated. Regarding risk factors for corporal punishment, our MI results support previously published findings based on complete cases. Our empirical results indicate that the identified “missing at random” processes in the investigated dataset are negligible.

Author(s):  
Patrick Royston ◽  
John B. Carlin ◽  
Ian R. White

We present an update of mim, a program for managing multiply imputed datasets and performing inference (estimating parameters) using Rubin's rules for combining estimates from imputed datasets. The new features of particular importance are an option for estimating the Monte Carlo error (due to the sampling variability of the imputation process) in parameter estimates and in related quantities, and a general routine for combining any scalar estimate across imputations.
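
As a rough illustration of the pooling step this abstract refers to, the sketch below applies Rubin's rules to a set of per-imputation estimates. It is not mim's code (mim is a Stata program); the function name `pool_rubin` and the returned keys are assumptions made here for illustration. The Monte Carlo error uses the textbook formula sqrt(B/m) for the pooled point estimate, which may differ from how mim computes it.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the matching squared standard errors
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)         # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between

    return {
        "estimate": q_bar,
        "se": math.sqrt(total),
        "within": within,
        "between": between,
        # Monte Carlo standard error of the pooled estimate: it shrinks as m grows.
        "mc_error": math.sqrt(between / m),
    }


# Hypothetical numbers, for illustration only.
pooled = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled["estimate"], pooled["se"], pooled["mc_error"])
```

Because the Monte Carlo error depends on the between-imputation variance divided by m, increasing the number of imputations is the usual way to make pooled results more reproducible.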


2018 ◽  
Vol 26 (4) ◽  
pp. 480-488 ◽  
Author(s):  
Thomas B. Pepinsky

This letter compares the performance of multiple imputation and listwise deletion using a simulation approach. The focus is on data that are “missing not at random” (MNAR), in which case both multiple imputation and listwise deletion are known to be biased. In these simulations, multiple imputation yields results that are frequently more biased, less efficient, and with worse coverage than listwise deletion when data are MNAR. This is the case even with very strong correlations between fully observed variables and variables with missing values, such that the data are very nearly “missing at random.” These results recommend caution when comparing the results from multiple imputation and listwise deletion, when the true data generating process is unknown.


2020 ◽  
Vol 8 (1) ◽  
pp. 249-271
Author(s):  
Nathan Corder ◽  
Shu Yang

The problem of missingness in observational data is ubiquitous. When the confounders are missing at random, multiple imputation is commonly used; however, the method requires congeniality conditions for valid inferences, which may not be satisfied when estimating average causal treatment effects. Alternatively, fractional imputation, proposed by Kim (2011), has been used to handle missing values in regression contexts. In this article, we develop fractional imputation methods for estimating the average treatment effects with confounders missing at random. We show that the fractional imputation estimator of the average treatment effect is asymptotically normal, which permits a consistent variance estimate. In a simulation study, we compare the accuracy and precision of fractional imputation with those of multiple imputation.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Giulia Carreras ◽  
Guido Miccinesi ◽  
Andrew Wilcock ◽  
Nancy Preston ◽  
...  

Background: Missing data are common in end-of-life care studies, but there has been relatively little exploration of the best method for dealing with them and, in particular, of whether the missing at random (MAR) assumption is valid or missing not at random (MNAR) mechanisms should be assumed. In this paper we investigated this issue through a sensitivity analysis within the ACTION study, a multicenter cluster randomized controlled trial testing advance care planning in patients with advanced lung or colorectal cancer. Methods: Multiple imputation procedures under MAR and MNAR assumptions were implemented. Possible violation of the MAR assumption was addressed with reference to variables measuring quality of life and symptoms. The MNAR model assumed that patients with worse health were more likely to have missing questionnaires. It distinguished between single missing items, which were assumed to satisfy the MAR assumption, and missing values due to a completely missing questionnaire, for which an MNAR mechanism was hypothesized. We explored the sensitivity to possible departures from MAR on gender differences between key indicators and on simple correlations. Results: Up to 39% of follow-up data were missing. Results under MAR reflected that missingness was related to poorer health status. Correlations between variables, although very small, changed according to the imputation method, as did the differences in scores by gender, indicating a certain sensitivity of the results to the violation of the MAR assumption. Conclusions: The findings confirmed the importance of undertaking this kind of analysis in end-of-life care studies.


2011 ◽  
Vol 26 (S2) ◽  
pp. 572-572
Author(s):  
N. Resseguier ◽  
H. Verdoux ◽  
F. Clavel-Chapelon ◽  
X. Paoletti

Introduction: The CES-D scale is commonly used to assess depressive symptoms (DS) in large population-based studies. Missing values in items of the scale may create biases. Objectives: To explore reasons for not completing items of the CES-D scale and to perform a sensitivity analysis of the prevalence of DS to assess the impact of different missing data hypotheses. Methods: In 2005, 71,412 women included in the French E3N cohort returned a questionnaire containing the CES-D scale. Of these, 45% had at least one missing value in the scale. An interview study was carried out on a random sample of 204 participants to examine the different hypotheses for the missing value mechanism. The prevalence of DS was estimated according to different methods for handling missing values: complete case analysis, single imputation, and multiple imputation under MAR (missing at random) and MNAR (missing not at random) assumptions. Results: The interviews showed that participants were not embarrassed to fill in questions about DS. Potential reasons for nonresponse were identified. MAR and MNAR hypotheses remained plausible and were explored. Among complete responders, the prevalence of DS was 26.1%. After multiple imputation under the MAR assumption, it was 28.6%, 29.8% and 31.7% among women with up to 4, up to 10 and up to 20 missing values, respectively. The estimates were robust after applying various scenarios of MNAR data for the sensitivity analysis. Conclusions: The CES-D scale can easily be used to assess DS in large cohorts. Multiple imputation under the MAR assumption allows missing values to be handled reliably.


2014 ◽  
Vol 26 (2) ◽  
pp. 707-723 ◽  
Author(s):  
Kyoji Furukawa ◽  
Dale L. Preston ◽  
Munechika Misumi ◽  
Harry M. Cullings

While data are unavoidably missing or incomplete in most observational studies, consequences of mishandling such incompleteness in analysis are often overlooked. When time-varying information is collected irregularly and infrequently over a long period, even precisely obtained data may implicitly involve substantial incompleteness. Motivated by an analysis to quantitatively evaluate the effects of smoking and radiation on lung cancer risks among Japanese atomic-bomb survivors, we provide a unique application of multiple imputation to incompletely observed smoking histories under the assumption of missing at random. Predicting missing values for the age of smoking initiation and, given initiation, smoking intensity and cessation age, analyses can be based on complete, though partially imputed, smoking histories. A simulation study shows that multiple imputation appropriately conditioned on the outcome and other relevant variables can produce consistent estimates when data are missing at random. Our approach is particularly appealing in large cohort studies where a considerable amount of time-varying information is incomplete under a mechanism depending in a complex manner on other variables. In application to the motivating example, this approach is expected to reduce estimation bias that might be unavoidable in naive analyses, while keeping efficiency by retaining known information.


2021 ◽  
pp. 096228022110473
Author(s):  
Lauren J Beesley ◽  
Irina Bondarenko ◽  
Michael R Elliot ◽  
Allison W Kurian ◽  
Steven J Katz ◽  
...  

Multiple imputation is a well-established general technique for analyzing data with missing values. A convenient way to implement multiple imputation is sequential regression multiple imputation, also called chained equations multiple imputation. In this approach, we impute missing values using regression models for each variable, conditional on the other variables in the data. This approach, however, assumes that the missingness mechanism is missing at random, and it is not well-justified under not-at-random missingness without additional modification. In this paper, we describe how we can generalize the sequential regression multiple imputation procedure to handle missingness not at random in the setting where missingness may depend on other variables that are also missing but not on the missing variable itself, conditioning on fully observed variables. We provide algebraic justification for several generalizations of standard sequential regression multiple imputation using Taylor series and other approximations of the target imputation distribution under missingness not at random. Resulting regression model approximations include indicators for missingness, interactions, or other functions of the not-at-random missingness model and observed data. In a simulation study, we demonstrate that the proposed sequential regression multiple imputation modifications result in reduced bias in the final analysis compared to standard sequential regression multiple imputation, with an approximation strategy involving inclusion of an offset in the imputation model performing the best overall. The method is illustrated in a breast cancer study, where the goal is to estimate the prevalence of a specific genetic pathogenic variant.


2016 ◽  
Vol 32 (2) ◽  
pp. 596 ◽  
Author(s):  
Urbano Lorenzo-Seva ◽  
Joost R. Van Ginkel

Researchers frequently have to analyze scales in which some participants have failed to respond to some items. In this paper we focus on the exploratory factor analysis of multidimensional scales (i.e., scales that consist of a number of subscales) where each subscale is made up of a number of Likert-type items, and the aim of the analysis is to estimate participants’ scores on the corresponding latent traits. Our approach uses the following steps: (1) multiple imputation creates several copies of the data, in which the missing values are imputed; (2) each copy of the data is subject to independent factor analysis, and the same number of factors is extracted from all copies; (3) all factor solutions are simultaneously orthogonally (or obliquely) rotated so that they are both (a) factorially simple, and (b) as similar to one another as possible; (4) latent trait scores are estimated for ordinal data in each copy; and (5) participants’ scores on the latent traits are estimated as the average of the estimates of the latent traits obtained in the copies. We applied the approach in a real dataset where missing responses were artificially introduced following a real pattern of non-responses and a simulation study based on artificial datasets. The results show that our approach was able to compute factor score estimates even for participants that have missing data.


2014 ◽  
Vol 22 (4) ◽  
pp. 497-519 ◽  
Author(s):  
Jonathan Kropko ◽  
Ben Goodrich ◽  
Andrew Gelman ◽  
Jennifer Hill

We consider the relative performance of two common approaches to multiple imputation (MI): joint multivariate normal (MVN) MI, in which the data are modeled as a sample from a joint MVN distribution; and conditional MI, in which each variable is modeled conditionally on all the others. In order to use the multivariate normal distribution, implementations of joint MVN MI typically assume that categories of discrete variables are probabilistically constructed from continuous values. We use simulations to examine the implications of these assumptions. For each approach, we assess (1) the accuracy of the imputed values; and (2) the accuracy of coefficients and fitted values from a model fit to completed data sets. These simulations consider continuous, binary, ordinal, and unordered-categorical variables. One set of simulations uses multivariate normal data, and one set uses data from the 2008 American National Election Studies. We implement a less restrictive approach than is typical when evaluating methods using simulations in the missing data literature: in each case, missing values are generated by carefully following the conditions necessary for missingness to be “missing at random” (MAR). We find that in these situations conditional MI is more accurate than joint MVN MI whenever the data include categorical variables.
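
The MAR condition mentioned in this abstract can be made concrete with a small sketch. It is not the authors' simulation design: the function name `make_mar`, the logistic form, and the `intercept` and `slope` values are all assumptions chosen for illustration. The point it shows is that the probability of `y` being missing depends only on `x`, which stays fully observed.

```python
import math
import random


def make_mar(rows, rng, intercept=-1.0, slope=1.5):
    """Return copies of (x, y) rows with some y values replaced by None.

    The missingness probability is a logistic function of x alone, so the
    mechanism is MAR given x: it never looks at the value of y itself.
    """
    incomplete = []
    for x, y in rows:
        p_missing = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
        incomplete.append((x, None if rng.random() < p_missing else y))
    return incomplete


# Hypothetical data: y depends linearly on x plus noise.
rng = random.Random(0)
rows = []
for _ in range(5000):
    x = rng.gauss(0, 1)
    rows.append((x, 2.0 * x + rng.gauss(0, 1)))

incomplete = make_mar(rows, rng)
n_missing = sum(1 for _, y in incomplete if y is None)
print(f"{n_missing} of {len(incomplete)} y values set to missing")
```

Replacing `x` with `y` inside the logistic term would turn this into an MNAR mechanism, because missingness would then depend on the unobserved value itself.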

