Auxiliary Variables in Multiple Imputation When Data Are Missing Not at Random

ABSTRACT Background Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research, including ethnicity, is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be due to the fact that ethnicity data in primary care are likely to be missing not at random. Objectives I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables. Methods Weighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study, which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods including complete case analysis and single imputation. Results While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe compared to MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity. Conclusions Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.
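The idea of combining imputation probabilities with weights derived from census margins can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the three categories, the counts, and the simple ratio form of the weights are all assumptions made for illustration. The sketch rescales MAR-based category probabilities so that observed plus imputed values would, on average, reproduce an external marginal distribution.

```python
import random

def census_weights(observed_counts, n_missing, census_props, mean_mar_probs):
    """Hypothetical helper: one weight per category.

    For each category, work out what share of the missing records would
    have to fall in it for the full sample to match the census margin,
    then divide by the share the MAR imputation model predicts on average.
    """
    n_total = sum(observed_counts) + n_missing
    weights = []
    for n_obs, target, p_mar in zip(observed_counts, census_props, mean_mar_probs):
        needed_share = max(n_total * target - n_obs, 0.0) / n_missing
        weights.append(needed_share / p_mar)
    return weights

def weighted_probs(mar_probs, weights):
    """Rescale MAR-based probabilities by the weights and renormalise."""
    scaled = [p * w for p, w in zip(mar_probs, weights)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Illustrative numbers only (assumed, not taken from THIN or any census).
categories = ["A", "B", "C"]
observed_counts = [540, 36, 24]      # recorded categories among 600 people
n_missing = 400                      # people with no category recorded
census_props = [0.86, 0.08, 0.06]    # external marginal distribution
mean_mar_probs = [0.90, 0.06, 0.04]  # average MAR-predicted probabilities

weights = census_weights(observed_counts, n_missing, census_props, mean_mar_probs)
adjusted = weighted_probs(mean_mar_probs, weights)

# One imputed dataset; a real analysis would use each person's own
# covariate-dependent probabilities and repeat the draw several times.
rng = random.Random(0)
imputed = rng.choices(categories, weights=adjusted, k=n_missing)
print([round(p, 3) for p in adjusted])
```

In this sketch the under-recorded categories receive weights above 1, so they are drawn more often than the MAR model alone would suggest. With person-specific probabilities the match to the census margin would only be approximate.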

Survival Analysis Using Auxiliary Variables Via Multiple Imputation, with Application to AIDS Clinical Trial Data

Biometrics ◽ 10.1111/j.0006-341x.2002.00037.x ◽ 2002 ◽ Vol 58 (1) ◽ pp. 37-47 ◽ Cited By ~ 41

Author(s): Cheryl L. Faucett ◽ Nathaniel Schenker ◽ Jeremy M. G. Taylor

Keyword(s): Clinical Trial ◽ Survival Analysis ◽ Multiple Imputation ◽ Clinical Trial Data ◽ Trial Data ◽ Auxiliary Variables

The Effects of Auxiliary Variables on Coefficient Bias and Efficiency in Multiple Imputation

Sociological Methods & Research ◽ 10.1177/0049124112452392 ◽ 2012 ◽ Vol 41 (2) ◽ pp. 335-361 ◽ Cited By ~ 10

Author(s): Sarah Mustillo

Keyword(s): Multiple Imputation ◽ Auxiliary Variables

Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics

International Journal of Molecular Sciences ◽ 10.3390/ijms22179650 ◽ 2021 ◽ Vol 22 (17) ◽ pp. 9650

Author(s): Miranda L. Gardner ◽ Michael A. Freitas

Keyword(s): Multiple Imputation ◽ Missing At Random ◽ Data Sets ◽ Proteomics Data ◽ Missing Not At Random ◽ Differential Abundance ◽ Missing Value ◽ Bottom Up ◽ Missing Value Imputation ◽ Impute Data

Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. Missing abundance values widely vary when performing comparisons across different sample treatments. For example, one would expect a consistent rate of “missing at random” (MAR) across batches of samples and varying rates of “missing not at random” (MNAR) depending on the inherent difference in sample treatments within the study. The missing value imputation strategy must thus be selected that best accounts for both MAR and MNAR simultaneously. Several important issues must be considered when deciding the appropriate missing value imputation strategy: (1) when it is appropriate to impute data; (2) how to choose a method that reflects the combinatorial manner of MAR and MNAR that occurs in an experiment. This paper provides an evaluation of missing value imputation strategies used in proteomics and presents a case for the use of hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.

Data Missing Not at Random in Mobile Health Research: Assessment of the Problem and a Case for Sensitivity Analyses

Journal of Medical Internet Research ◽ 10.2196/26749 ◽ 2021 ◽ Vol 23 (6) ◽ pp. e26749

Author(s): Simon B Goldberg ◽ Daniel M Bolt ◽ Richard J Davidson

Keyword(s): Sensitivity Analysis ◽ Missing Data ◽ Maximum Likelihood ◽ Multiple Imputation ◽ Mixture Model ◽ Mobile Health ◽ Sensitivity Analyses ◽ Research Assessment ◽ Missing Not At Random ◽ Pattern Mixture Model

Background Missing data are common in mobile health (mHealth) research. There has been little systematic investigation of how missingness is handled statistically in mHealth randomized controlled trials (RCTs). Although some missing data patterns (ie, missing at random [MAR]) may be adequately addressed using modern missing data methods such as multiple imputation and maximum likelihood techniques, these methods do not address bias when data are missing not at random (MNAR). It is typically not possible to determine whether the missing data are MAR. However, higher attrition in active (ie, intervention) versus passive (ie, waitlist or no treatment) conditions in mHealth RCTs raises a strong likelihood of MNAR, such as if active participants who benefit less from the intervention are more likely to drop out. Objective This study aims to systematically evaluate differential attrition and methods used for handling missingness in a sample of mHealth RCTs comparing active and passive control conditions. We also aim to illustrate a modern model-based sensitivity analysis and a simpler fixed-value replacement approach that can be used to evaluate the influence of MNAR. Methods We reanalyzed attrition rates and predictors of differential attrition in a sample of 36 mHealth RCTs drawn from a recent meta-analysis of smartphone-based mental health interventions. We systematically evaluated the design features related to missingness and its handling. Data from a recent mHealth RCT were used to illustrate 2 sensitivity analysis approaches (pattern-mixture model and fixed-value replacement approach). Results Attrition in active conditions was, on average, roughly twice that of passive controls. Differential attrition was higher in larger studies and was associated with the use of MAR-based multiple imputation or maximum likelihood methods. Half of the studies (18/36, 50%) used these modern missing data techniques. None of the 36 mHealth RCTs reviewed conducted a sensitivity analysis to evaluate the possible consequences of data MNAR. A pattern-mixture model and fixed-value replacement sensitivity analysis approaches were introduced. Results from a recent mHealth RCT were shown to be robust to missing data, reflecting worse outcomes in missing versus nonmissing scores in some but not all scenarios. A review of such scenarios helps to qualify the observations of significant treatment effects. Conclusions MNAR data because of differential attrition are likely in mHealth RCTs using passive controls. Sensitivity analyses are recommended to allow researchers to assess the potential impact of MNAR on trial results.
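A fixed-value replacement sensitivity analysis of the kind mentioned above can be sketched in a few lines. This is only one plausible reading of the approach and not the procedure used in the paper: the function name, the toy scores, the fill values, and the convention that higher scores mean better outcomes are all assumptions. Missing outcomes in each arm are replaced by a chosen constant, and the treatment effect is recomputed as the constant for the active arm is made progressively worse.

```python
from statistics import mean

def fixed_value_effect(active, control, fill_active, fill_control):
    """Hypothetical helper: difference in arm means after replacing each
    missing outcome (None) with a fixed value chosen for that arm."""
    a = [fill_active if v is None else v for v in active]
    c = [fill_control if v is None else v for v in control]
    return mean(a) - mean(c)

# Toy outcome scores (assumed); None marks a participant who dropped out.
active = [10.0, 12.0, None, None]
control = [8.0, 10.0, None]

# Start from the observed arm means (11.0 and 9.0) and assume dropouts
# in the active arm did worse by an increasing penalty.
for penalty in (0.0, 1.0, 2.0):
    effect = fixed_value_effect(active, control, 11.0 - penalty, 9.0)
    print(f"penalty={penalty:.1f}  estimated effect={effect:.2f}")
```

Reading the estimated effect across the range of penalties shows how pessimistic the assumption about dropouts must be before the apparent treatment benefit shrinks or disappears.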

A multiple imputation‐based sensitivity analysis approach for data subject to missing not at random

Statistics in Medicine ◽ 10.1002/sim.8691 ◽ 2020 ◽ Vol 39 (26) ◽ pp. 3756-3771

Author(s): Chiu‐Hsieh Hsu ◽ Yulei He ◽ Chengcheng Hu ◽ Wei Zhou

Keyword(s): Sensitivity Analysis ◽ Multiple Imputation ◽ Analysis Approach ◽ Missing Not At Random ◽ Data Subject

Multiple imputation of binary multilevel missing not at random data

Journal of the Royal Statistical Society Series C (Applied Statistics) ◽ 10.1111/rssc.12401 ◽ 2020 ◽ Vol 69 (3) ◽ pp. 547-564 ◽ Cited By ~ 2

Author(s): Angelina Hammon ◽ Sabine Zinn

Keyword(s): Multiple Imputation ◽ Random Data ◽ Missing Not At Random

A Note on Listwise Deletion versus Multiple Imputation

Political Analysis ◽ 10.1017/pan.2018.18 ◽ 2018 ◽ Vol 26 (4) ◽ pp. 480-488 ◽ Cited By ~ 14

Author(s): Thomas B. Pepinsky

Keyword(s): Multiple Imputation ◽ Missing Values ◽ Missing At Random ◽ Strong Correlations ◽ Simulation Approach ◽ Missing Not At Random ◽ Listwise Deletion ◽ Data Generating Process

This letter compares the performance of multiple imputation and listwise deletion using a simulation approach. The focus is on data that are “missing not at random” (MNAR), in which case both multiple imputation and listwise deletion are known to be biased. In these simulations, multiple imputation yields results that are frequently more biased, less efficient, and with worse coverage than listwise deletion when data are MNAR. This is the case even with very strong correlations between fully observed variables and variables with missing values, such that the data are very nearly “missing at random.” These results recommend caution when comparing the results from multiple imputation and listwise deletion, when the true data generating process is unknown.

Handling missing data in an FFQ: multiple imputation and nutrient intake estimates

Public Health Nutrition ◽ 10.1017/s1368980019000168 ◽ 2019 ◽ Vol 22 (8) ◽ pp. 1351-1360 ◽ Cited By ~ 1

Author(s): Mari Ichikawa ◽ Akihiro Hosono ◽ Yuya Tamai ◽ Miki Watanabe ◽ Kiyoshi Shibata ◽ ...

Keyword(s): Missing Data ◽ Multiple Imputation ◽ Nutrient Intake ◽ Missing Values ◽ Personal Characteristics ◽ Missing Not At Random ◽ Food Items ◽ Self Administered Questionnaire ◽ Better Than

Abstract Objective We aimed to examine missing data in FFQ and to assess the effects on estimating dietary intake by comparing between multiple imputation and zero imputation. Design We used data from the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study. A self-administered questionnaire including an FFQ was implemented at baseline (FFQ1) and 5-year follow-up (FFQ2). Missing values in FFQ2 were replaced by corresponding FFQ1 values, multiple imputation and zero imputation. Setting A methodological sub-study of the Okazaki J-MICC study. Participants Of a total of 7585 men and women aged 35–79 years at baseline, we analysed data for 5120 participants who answered all items in FFQ1 and at least 50% of items in FFQ2. Results Among 5120 participants, the proportion of missing data was 3·7%. The increasing number of missing food items in FFQ2 varied with personal characteristics. Missing food items not eaten often in FFQ2 were likely to represent zero intake in FFQ1. Most food items showed that the observed proportion of zero intake was likely to be similar to the probability that the missing value is zero intake. Compared with FFQ1 values, multiple imputation had smaller differences of total energy and nutrient estimates, except for alcohol, than zero imputation. Conclusions Our results indicate that missing values due to zero intake, namely missing not at random, in FFQ can be predicted reasonably well from observed data. Multiple imputation performed better than zero imputation for most nutrients and may be applied to FFQ data when missing is low.

Missing not at random in end of life care studies: multiple imputation and sensitivity analysis on data from the ACTION study

BMC Medical Research Methodology ◽ 10.1186/s12874-020-01180-y ◽ 2021 ◽ Vol 21 (1)

Author(s): Giulia Carreras ◽ Guido Miccinesi ◽ Andrew Wilcock ◽ Nancy Preston ◽ ...

Keyword(s): Sensitivity Analysis ◽ Multiple Imputation ◽ End Of Life ◽ End Of Life Care ◽ Missing Values ◽ Controlled Trial ◽ Missing At Random ◽ Life Care ◽ Missing Not At Random ◽ Cluster Randomized

Abstract Background Missing data are common in end-of-life care studies, but there is still relatively little exploration of which is the best method to deal with them, and, in particular, if the missing at random (MAR) assumption is valid or missing not at random (MNAR) mechanisms should be assumed. In this paper we investigated this issue through a sensitivity analysis within the ACTION study, a multicenter cluster randomized controlled trial testing advance care planning in patients with advanced lung or colorectal cancer. Methods Multiple imputation procedures under MAR and MNAR assumptions were implemented. Possible violation of the MAR assumption was addressed with reference to variables measuring quality of life and symptoms. The MNAR model assumed that patients with worse health were more likely to have missing questionnaires, making a distinction between single missing items, which were assumed to satisfy the MAR assumption, and missing values due to completely missing questionnaire for which a MNAR mechanism was hypothesized. We explored the sensitivity to possible departures from MAR on gender differences between key indicators and on simple correlations. Results Up to 39% of follow-up data were missing. Results under MAR reflected that missingness was related to poorer health status. Correlations between variables, although very small, changed according to the imputation method, as well as the differences in scores by gender, indicating a certain sensitivity of the results to the violation of the MAR assumption. Conclusions The findings confirmed the importance of undertaking this kind of analysis in end-of-life care studies.
