Missing Not at Random

2021 ◽  
pp. 3243-3243
Author(s):  
Tra My Pham ◽  
Irene Petersen ◽  
James Carpenter ◽  
Tim Morris

ABSTRACT
Background: Ethnicity is an important factor to consider in health research because of its association with inequality in disease prevalence and in the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and is therefore available in large UK primary care databases such as The Health Improvement Network (THIN). However, because primary care data are routinely collected for clinical purposes, a large amount of data relevant for research, including ethnicity, is often missing. A popular approach for handling missing data is multiple imputation (MI). However, conventional MI, which assumes data are missing at random, does not give plausible estimates of the ethnicity distribution in THIN compared with the general UK population. This may be because ethnicity data in primary care are missing not at random.
Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables.
Methods: Weighted MI combines MI with probability weights calculated from external data sources. Census summary statistics for ethnicity can be used to form the weights so that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study, which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared with conventional MI and with traditional missing data methods, including complete case analysis and single imputation.
Results: While a small bias remained in the ethnicity coefficient estimates under weighted MI, it was less severe than under MI assuming missing at random. Complete case analysis and single imputation were inadequate for handling ethnicity data that are missing not at random.
Conclusions: Although not a total cure, weighted MI represents a pragmatic approach with potential applications not only to ethnicity but also to other incomplete categorical health indicators in electronic health records.
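A minimal Python sketch of the weighted-imputation idea described above, assuming a pandas Series holding a partially observed categorical ethnicity variable and a dictionary of external census proportions. The weighting scheme shown (solving for the imputation distribution that recovers the census marginal, then drawing imputed values from it) illustrates the general idea only, not the authors' published algorithm; all function and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

def weighted_impute(ethnicity, census_props, n_imputations=5, seed=None):
    """Impute a partially observed categorical variable so that each
    completed dataset's marginal distribution matches external (census)
    proportions.  Illustrative sketch only, not the published method."""
    rng = np.random.default_rng(seed)
    categories = list(census_props.keys())
    target = np.array([census_props[c] for c in categories])

    missing_idx = ethnicity.index[ethnicity.isna()]
    f_obs = 1 - len(missing_idx) / len(ethnicity)            # fraction observed
    p_obs = (ethnicity.dropna().value_counts(normalize=True)
             .reindex(categories, fill_value=0).to_numpy())

    # Choose the imputation distribution p_miss so that
    # f_obs * p_obs + (1 - f_obs) * p_miss = target,
    # clipping any negative solutions and renormalising.
    p_miss = (target - f_obs * p_obs) / max(1 - f_obs, 1e-12)
    p_miss = np.clip(p_miss, 0, None)
    p_miss = p_miss / p_miss.sum()

    completed_datasets = []
    for _ in range(n_imputations):
        completed = ethnicity.copy()
        completed.loc[missing_idx] = rng.choice(categories, size=len(missing_idx), p=p_miss)
        completed_datasets.append(completed)
    return completed_datasets

# Hypothetical census proportions for four broad ethnic groups:
# census = {"White": 0.86, "Asian": 0.07, "Black": 0.03, "Other": 0.04}
# imputed = weighted_impute(df["ethnicity"], census, n_imputations=10)
```

Each completed dataset would then be analysed, for example in a survival model, and the estimates combined across imputations with Rubin's rules, as in ordinary MI.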


2019 ◽  
Vol 46 (4) ◽  
pp. 1315-1346 ◽  
Author(s):  
Li Liu ◽  
Yanyan Liu ◽  
Yi Xiong ◽  
X. Joan Hu

Epidemiology ◽  
2018 ◽  
Vol 29 (3) ◽  
pp. 364-368 ◽  
Author(s):  
Jessica R. Marden ◽  
Linbo Wang ◽  
Eric J. Tchetgen Tchetgen ◽  
Stefan Walter ◽  
M. Maria Glymour ◽  
...  

2017 ◽  
Vol 27 (10) ◽  
pp. 3062-3076 ◽  
Author(s):  
Kazem Khalagi ◽  
Mohammad Ali Mansournia ◽  
Seyed-Abbas Motevalian ◽  
Keramat Nourijelyani ◽  
Afarin Rahimi-Movaghar ◽  
...  

Purpose: The prevalence estimates of binary variables in sample surveys are often subject to two systematic errors: measurement error and nonresponse bias. A multiple-bias analysis is essential to adjust for both.
Methods: In this paper, we linked the latent class log-linear and proxy pattern-mixture models to adjust jointly for measurement error and nonresponse bias under a missing-not-at-random mechanism. These methods were employed to estimate the prevalence of any illicit drug use based on data from the Iranian Mental Health Survey.
Results: After jointly adjusting for measurement error and nonresponse bias in these data, the prevalence (95% confidence interval) estimate of any illicit drug use changed from 3.41 (3.00, 3.81)% to 27.03 (9.02, 38.76)%, 27.42 (9.04, 38.91)%, and 27.18 (9.03, 38.82)% under “missing at random,” “missing not at random,” and an intermediate assumption, respectively.
Conclusions: Under certain assumptions, a combination of the latent class log-linear and binary-outcome proxy pattern-mixture models can be used to adjust jointly for both measurement error and nonresponse bias when estimating the prevalence of binary variables in surveys.
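The toy Python function below illustrates, in deliberately simplified form, the two adjustments this abstract combines: a misclassification correction of the respondents' reported prevalence (a Rogan–Gladen-type correction standing in for the latent class log-linear model) and a pattern-mixture extrapolation to nonrespondents governed by a single sensitivity parameter. It is a sketch under those simplifying assumptions, not the authors' models; all names and numbers are hypothetical.

```python
def adjust_prevalence(p_reported, sens, spec, response_rate, odds_ratio_nonresp=1.0):
    """Toy two-step bias adjustment (not the latent class log-linear or
    proxy pattern-mixture models of the paper).

    1. Correct the respondents' reported prevalence for misclassification
       using assumed sensitivity/specificity (Rogan-Gladen correction).
    2. Extrapolate to nonrespondents under a pattern-mixture assumption:
       their odds of drug use are odds_ratio_nonresp times the respondents'
       odds (1.0 corresponds to missing at random)."""
    # Step 1: misclassification correction among respondents.
    p_resp = (p_reported + spec - 1) / (sens + spec - 1)
    p_resp = min(max(p_resp, 1e-12), 1 - 1e-12)

    # Step 2: pattern-mixture extrapolation to nonrespondents.
    odds_resp = p_resp / (1 - p_resp)
    odds_nonresp = odds_ratio_nonresp * odds_resp
    p_nonresp = odds_nonresp / (1 + odds_nonresp)

    # Overall prevalence is a mixture of the two response patterns.
    return response_rate * p_resp + (1 - response_rate) * p_nonresp

# Hypothetical inputs: 3.4% reported use, 90% response, imperfect self-report,
# and nonrespondents assumed to have 5 times the respondents' odds of use.
# adjust_prevalence(0.034, sens=0.6, spec=0.99, response_rate=0.9,
#                   odds_ratio_nonresp=5.0)
```

Setting the odds ratio to 1 plays the role of the missing-at-random adjustment; larger values stand in for the missing-not-at-random and intermediate assumptions reported above.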


2017 ◽  
Vol 28 (1) ◽  
pp. 134-150 ◽  
Author(s):  
Chi-hong Tseng ◽  
Yi-Hau Chen

It is common in longitudinal studies for missing data to arise from nonresponse, missed visits, dropout, death, or other reasons during the course of the study. To perform valid analyses in this setting, data missing not at random (MNAR) have to be considered. However, models for MNAR data often suffer from identifiability issues, which lead to difficulties in estimation and computational convergence. To ameliorate this issue, we propose LASSO- and ridge-regularized selection models that regularize the missing data mechanism model, with the regularization parameter selected via a cross-validation procedure. The proposed models can also be employed for sensitivity analysis, to examine how inferences are affected by different assumptions about the missing data mechanism. We illustrate the performance of the proposed models via simulation studies and the analysis of data from a randomized clinical trial.
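A rough Python sketch of a regularized selection model in the spirit described above, for a toy setting with a normal outcome, a single covariate, and a logistic missingness model that depends on the possibly missing outcome. The MNAR parameter phi is penalized (ridge or LASSO) because it is only weakly identified. This is an illustrative stand-in, not the authors' implementation; the tuning value lam is fixed here, whereas the paper selects it by cross-validation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm

def penalized_neg_loglik(params, y, x, r, lam, penalty="ridge"):
    """Negative penalized observed-data log-likelihood of a toy selection model.

    Outcome model:      y | x  ~  Normal(b0 + b1*x, sigma^2)
    Missingness model:  logit P(observed | y, x) = a0 + a1*x + phi*y
    The MNAR parameter phi is regularized because it is weakly identified."""
    b0, b1, log_sigma, a0, a1, phi = params
    sigma = np.exp(log_sigma)
    mu = b0 + b1 * x
    obs = r == 1

    # Observed records: outcome density times selection probability.
    ll_obs = (norm.logpdf(y[obs], mu[obs], sigma)
              + np.log(expit(a0 + a1 * x[obs] + phi * y[obs])))

    # Missing records: integrate the non-selection probability over y | x
    # on a fixed grid (crude quadrature, adequate for a sketch).
    grid = np.linspace(-6, 6, 201)                      # standard-normal grid
    y_grid = mu[~obs, None] + sigma * grid[None, :]     # candidate y values
    w = norm.pdf(grid) * (grid[1] - grid[0])            # quadrature weights
    p_miss = 1 - expit(a0 + a1 * x[~obs, None] + phi * y_grid)
    ll_mis = np.log((p_miss * w).sum(axis=1))

    pen = lam * (phi ** 2 if penalty == "ridge" else abs(phi))
    return -(ll_obs.sum() + ll_mis.sum()) + pen

# Hypothetical usage, with y, x, r as NumPy arrays (y arbitrary where r == 0):
# fit = minimize(penalized_neg_loglik, x0=np.zeros(6),
#                args=(y, x, r, 1.0), method="Nelder-Mead")
```

Refitting over a grid of lam values, or over a range of fixed phi values, gives the kind of sensitivity analysis the abstract mentions.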


2010 ◽  
Vol 3 (2) ◽  
pp. 208-222 ◽  
Author(s):  
Victoria J. Cook ◽  
X. Joan Hu ◽  
Tim B. Swartz
