Missing Not at Random

2021 ◽  
pp. 3243-3243
Author(s):  
Tra My Pham ◽  
Irene Petersen ◽  
James Carpenter ◽  
Tim Morris

ABSTRACT
Background: Ethnicity is an important factor to consider in health research because of its association with inequality in disease prevalence and in the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and is therefore available in large UK primary care databases such as The Health Improvement Network (THIN). However, because primary care data are routinely collected for clinical purposes, a large amount of data relevant for research, including ethnicity, is often missing. A popular approach for handling missing data is multiple imputation (MI). However, conventional MI, which assumes data are missing at random, does not give plausible estimates of the ethnicity distribution in THIN compared with the general UK population. This may be because ethnicity data in primary care are missing not at random.
Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables.
Methods: Weighted MI combines MI with probability weights calculated from external data sources. Census summary statistics for ethnicity can be used to form the weights so that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study, which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared with conventional MI and with traditional missing data methods, including complete case analysis and single imputation.
Results: While a small bias remained in the ethnicity coefficient estimates under weighted MI, it was less severe than under MI assuming missing at random. Complete case analysis and single imputation were inadequate for handling ethnicity data that are missing not at random.
Conclusions: Although not a total cure, weighted MI represents a pragmatic approach with potential applications not only to ethnicity but also to other incomplete categorical health indicators in electronic health records.
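A minimal Python sketch of the weighted-imputation idea described above, assuming a pandas Series holding a partially observed categorical ethnicity variable and a dictionary of external census proportions. The weighting scheme shown (solving for the imputation distribution that recovers the census marginal, then drawing imputed values from it) illustrates the general idea only, not the authors' published algorithm; all function and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

def weighted_impute(ethnicity, census_props, n_imputations=5, seed=None):
    """Impute a partially observed categorical variable so that each
    completed dataset's marginal distribution matches external (census)
    proportions.  Illustrative sketch only, not the published method."""
    rng = np.random.default_rng(seed)
    categories = list(census_props.keys())
    target = np.array([census_props[c] for c in categories])

    missing_idx = ethnicity.index[ethnicity.isna()]
    f_obs = 1 - len(missing_idx) / len(ethnicity)            # fraction observed
    p_obs = (ethnicity.dropna().value_counts(normalize=True)
             .reindex(categories, fill_value=0).to_numpy())

    # Choose the imputation distribution p_miss so that
    # f_obs * p_obs + (1 - f_obs) * p_miss = target,
    # clipping any negative solutions and renormalising.
    p_miss = (target - f_obs * p_obs) / max(1 - f_obs, 1e-12)
    p_miss = np.clip(p_miss, 0, None)
    p_miss = p_miss / p_miss.sum()

    completed_datasets = []
    for _ in range(n_imputations):
        completed = ethnicity.copy()
        completed.loc[missing_idx] = rng.choice(categories, size=len(missing_idx), p=p_miss)
        completed_datasets.append(completed)
    return completed_datasets

# Hypothetical census proportions for four broad ethnic groups:
# census = {"White": 0.86, "Asian": 0.07, "Black": 0.03, "Other": 0.04}
# imputed = weighted_impute(df["ethnicity"], census, n_imputations=10)
```

Each completed dataset would then be analysed, for example in a survival model, and the estimates combined across imputations with Rubin's rules, as in ordinary MI.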


2019 ◽  
Vol 46 (4) ◽  
pp. 1315-1346 ◽  
Author(s):  
Li Liu ◽  
Yanyan Liu ◽  
Yi Xiong ◽  
X. Joan Hu

Epidemiology ◽  
2018 ◽  
Vol 29 (3) ◽  
pp. 364-368 ◽  
Author(s):  
Jessica R. Marden ◽  
Linbo Wang ◽  
Eric J. Tchetgen Tchetgen ◽  
Stefan Walter ◽  
M. Maria Glymour ◽  
...  

2017 ◽  
Vol 27 (10) ◽  
pp. 3062-3076 ◽  
Author(s):  
Kazem Khalagi ◽  
Mohammad Ali Mansournia ◽  
Seyed-Abbas Motevalian ◽  
Keramat Nourijelyani ◽  
Afarin Rahimi-Movaghar ◽  
...  

Purpose: The prevalence estimates of binary variables in sample surveys are often subject to two systematic errors: measurement error and nonresponse bias. A multiple-bias analysis is essential to adjust for both.
Methods: In this paper, we linked the latent class log-linear and proxy pattern-mixture models to adjust jointly for measurement error and nonresponse bias under a missing-not-at-random mechanism. These methods were employed to estimate the prevalence of any illicit drug use based on data from the Iranian Mental Health Survey.
Results: After jointly adjusting for measurement error and nonresponse bias in these data, the prevalence (95% confidence interval) estimate of any illicit drug use changed from 3.41 (3.00, 3.81)% to 27.03 (9.02, 38.76)%, 27.42 (9.04, 38.91)%, and 27.18 (9.03, 38.82)% under “missing at random,” “missing not at random,” and an intermediate assumption, respectively.
Conclusions: Under certain assumptions, a combination of the latent class log-linear and binary-outcome proxy pattern-mixture models can be used to adjust jointly for both measurement error and nonresponse bias when estimating the prevalence of binary variables in surveys.
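The toy Python function below illustrates, in deliberately simplified form, the two adjustments this abstract combines: a misclassification correction of the respondents' reported prevalence (a Rogan–Gladen-type correction standing in for the latent class log-linear model) and a pattern-mixture extrapolation to nonrespondents governed by a single sensitivity parameter. It is a sketch under those simplifying assumptions, not the authors' models; all names and numbers are hypothetical.

```python
def adjust_prevalence(p_reported, sens, spec, response_rate, odds_ratio_nonresp=1.0):
    """Toy two-step bias adjustment (not the latent class log-linear or
    proxy pattern-mixture models of the paper).

    1. Correct the respondents' reported prevalence for misclassification
       using assumed sensitivity/specificity (Rogan-Gladen correction).
    2. Extrapolate to nonrespondents under a pattern-mixture assumption:
       their odds of drug use are odds_ratio_nonresp times the respondents'
       odds (1.0 corresponds to missing at random)."""
    # Step 1: misclassification correction among respondents.
    p_resp = (p_reported + spec - 1) / (sens + spec - 1)
    p_resp = min(max(p_resp, 1e-12), 1 - 1e-12)

    # Step 2: pattern-mixture extrapolation to nonrespondents.
    odds_resp = p_resp / (1 - p_resp)
    odds_nonresp = odds_ratio_nonresp * odds_resp
    p_nonresp = odds_nonresp / (1 + odds_nonresp)

    # Overall prevalence is a mixture of the two response patterns.
    return response_rate * p_resp + (1 - response_rate) * p_nonresp

# Hypothetical inputs: 3.4% reported use, 90% response, imperfect self-report,
# and nonrespondents assumed to have 5 times the respondents' odds of use.
# adjust_prevalence(0.034, sens=0.6, spec=0.99, response_rate=0.9,
#                   odds_ratio_nonresp=5.0)
```

Setting the odds ratio to 1 plays the role of the missing-at-random adjustment; larger values stand in for the missing-not-at-random and intermediate assumptions reported above.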


2017 ◽  
Vol 28 (1) ◽  
pp. 134-150 ◽  
Author(s):  
Chi-hong Tseng ◽  
Yi-Hau Chen

It is common in longitudinal studies for missing data to arise from nonresponse, missed visits, dropout, death, or other reasons during the course of the study. To perform valid analyses in this setting, data missing not at random (MNAR) have to be considered. However, models for MNAR data often suffer from identifiability issues, which lead to difficulties in estimation and computational convergence. To ameliorate this issue, we propose LASSO- and ridge-regularized selection models that regularize the missing data mechanism model, with the regularization parameter selected via a cross-validation procedure. The proposed models can also be employed for sensitivity analysis, to examine how inferences are affected by different assumptions about the missing data mechanism. We illustrate the performance of the proposed models via simulation studies and the analysis of data from a randomized clinical trial.
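A rough Python sketch of a regularized selection model in the spirit described above, for a toy setting with a normal outcome, a single covariate, and a logistic missingness model that depends on the possibly missing outcome. The MNAR parameter phi is penalized (ridge or LASSO) because it is only weakly identified. This is an illustrative stand-in, not the authors' implementation; the tuning value lam is fixed here, whereas the paper selects it by cross-validation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm

def penalized_neg_loglik(params, y, x, r, lam, penalty="ridge"):
    """Negative penalized observed-data log-likelihood of a toy selection model.

    Outcome model:      y | x  ~  Normal(b0 + b1*x, sigma^2)
    Missingness model:  logit P(observed | y, x) = a0 + a1*x + phi*y
    The MNAR parameter phi is regularized because it is weakly identified."""
    b0, b1, log_sigma, a0, a1, phi = params
    sigma = np.exp(log_sigma)
    mu = b0 + b1 * x
    obs = r == 1

    # Observed records: outcome density times selection probability.
    ll_obs = (norm.logpdf(y[obs], mu[obs], sigma)
              + np.log(expit(a0 + a1 * x[obs] + phi * y[obs])))

    # Missing records: integrate the non-selection probability over y | x
    # on a fixed grid (crude quadrature, adequate for a sketch).
    grid = np.linspace(-6, 6, 201)                      # standard-normal grid
    y_grid = mu[~obs, None] + sigma * grid[None, :]     # candidate y values
    w = norm.pdf(grid) * (grid[1] - grid[0])            # quadrature weights
    p_miss = 1 - expit(a0 + a1 * x[~obs, None] + phi * y_grid)
    ll_mis = np.log((p_miss * w).sum(axis=1))

    pen = lam * (phi ** 2 if penalty == "ridge" else abs(phi))
    return -(ll_obs.sum() + ll_mis.sum()) + pen

# Hypothetical usage, with y, x, r as NumPy arrays (y arbitrary where r == 0):
# fit = minimize(penalized_neg_loglik, x0=np.zeros(6),
#                args=(y, x, r, 1.0), method="Nelder-Mead")
```

Refitting over a grid of lam values, or over a range of fixed phi values, gives the kind of sensitivity analysis the abstract mentions.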


2010 ◽  
Vol 3 (2) ◽  
pp. 208-222 ◽  
Author(s):  
Victoria J. Cook ◽  
X. Joan Hu ◽  
Tim B. Swartz
