Statistical inference with missing data

2008 ◽  
Vol 3 (2) ◽  
pp. 87-87 ◽  
Author(s):  
Ma del Mar Rueda
Author(s):  
Roderick J. Little

I review assumptions about the missing-data mechanism that underlie methods for the statistical analysis of data with missing values. I describe Rubin's original definition of missing at random (MAR), its motivation and criticisms, and his sufficient conditions for ignoring the missingness mechanism for likelihood-based, Bayesian, and frequentist inference. Related definitions, including missing completely at random, always MAR, always missing completely at random, and partially MAR, are also covered. I present a formal argument for weakening Rubin's sufficient conditions for frequentist maximum likelihood inference with precision based on the observed information. Some simple examples of MAR are described, together with an example where the missingness mechanism can be ignored even though MAR does not hold. Alternative approaches to statistical inference based on the likelihood function are reviewed, along with non-likelihood frequentist approaches, including weighted generalized estimating equations. Connections with the causal inference literature are also discussed. Finally, alternatives to Rubin's MAR definition are discussed, including informative missingness, informative censoring, and coarsening at random. The intent is to provide a relatively nontechnical discussion, although some of the underlying issues are challenging and touch on fundamental questions of statistical inference. Expected final online publication date for the Annual Review of Statistics, Volume 8 is March 7, 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
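For orientation, the standard textbook statement of the definitions named in this abstract can be written as follows. The notation is an assumption and is not taken from the review itself: Y = (Y_obs, Y_mis) is the complete data, M is the missingness indicator, θ indexes the data model, and φ indexes the missingness mechanism.

```latex
% MCAR: missingness does not depend on the data at all
f(M \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi) = f(M \mid \phi)

% MAR: missingness depends on the data only through the observed part
f(M \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi) = f(M \mid Y_{\mathrm{obs}}, \phi)
\quad \text{for all } Y_{\mathrm{mis}}

% Under MAR the full likelihood factorizes, so the mechanism drops out
% of likelihood inference about \theta when \theta and \phi are distinct
L(\theta, \phi \mid Y_{\mathrm{obs}}, M)
  = \int f(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\,
         f(M \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi)\, dY_{\mathrm{mis}}
  = f(M \mid Y_{\mathrm{obs}}, \phi)\, f(Y_{\mathrm{obs}} \mid \theta)
```

The last line is the usual sense in which the mechanism is "ignorable" for likelihood-based inference: the factor involving φ does not affect the shape of the likelihood in θ.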


2019 ◽  
Author(s):  
Anni Hämäläinen ◽  
Paul Mick

Missing data can be a significant problem for statistical inference in many disciplines when information is not missing completely at random. In the worst case, it can lead to biased results when participants or subjects with certain characteristics contribute more data than other participants. Multiple imputation methods can be used to alleviate the loss of sample size and correct for this potential bias. Multiple imputation entails filling in the missing data using information from the same and other participants on the variables of interest and potentially other available data that correlate with the variables of interest. The missing data estimates and the uncertainty associated with their estimation may then be taken into account in statistical inference from those variables. A complication may arise when using compound variables, such as principal component (PC) loadings, which draw on a number of raw variables that themselves have non-overlapping missing data. Here, we propose a sequential multiple imputation approach to facilitate the use of all available data in the raw variables contained in compound variables in a way that conforms to the specifications of the multiple imputation framework. We first use multiple imputation to impute missing data for the subset of raw variables used in a principal component analysis (PCA) and perform the PCA with the imputed data; we then use the factor loadings to calculate PC scores for each individual with complete raw data. Finally, we include these PC scores as part of a global multiple imputation approach to estimate a final statistical model. We demonstrate the use of this approach, with annotated Stata code, by examining which sensory, health, social and cognitive factors explain self-reported sensory difficulties in the Canadian Longitudinal Study of Aging (CLSA) Comprehensive Cohort.
The proposed sequential multiple imputation approach allows us to deal with the issue of having a large cumulative amount of data that is missing (not completely at random) among a large number of variables, including composite cognitive scores derived from a battery of cognitive tests. We examine the resulting parameter estimates using a range of recommended diagnostic tools to highlight the potential and the consequences of the approach for the statistical results.
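As a small illustration of how imputation uncertainty can be carried into the final inference, the sketch below pools a parameter estimate across imputed data sets with Rubin's rules. It is not the authors' annotated Stata code: the function name `pool_rubin` and the example numbers are hypothetical, and the sketch assumes each imputed data set has already been analysed to give a point estimate and its squared standard error.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    (Hypothetical helper for illustration only.)
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between  # total variance of the pooled estimate

    return {"estimate": pooled, "within": within,
            "between": between, "total": total}


if __name__ == "__main__":
    # Made-up estimates from three imputed data sets, for illustration only.
    result = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(result)
```

The between-imputation term is what distinguishes this from analysing a single filled-in data set: when the imputations disagree, the total variance grows accordingly.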


2020 ◽  
Vol 39 (28) ◽  
pp. 4325-4333
Author(s):  
Yang Zhao

2019 ◽  
Vol 29 (3) ◽  
pp. 478-490 ◽  
Author(s):  
Guogen Shan ◽  
Alan Hutson ◽  
Gregory E Wilding ◽  
Changxing Ma ◽  
Guo-Liang Tian
