Supplemental Material for Subtypes of the Missing Not at Random Missing Data Mechanism

Abstract. Background: Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research, including ethnicity, is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be due to the fact that ethnicity data in primary care are likely to be missing not at random. Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables. Methods: Weighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study, which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods including complete case analysis and single imputation. Results: While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe compared to MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity. Conclusions: Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.
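
The abstract does not give the weight formula, so the following is only a minimal sketch of the underlying arithmetic, not the author's weighted MI algorithm. It computes the category distribution that the missing records would need to follow for the completed dataset to match an external (for example, census) marginal. The function name, the category labels and all counts are hypothetical.

```python
def required_missing_distribution(observed_counts, n_missing, target_props):
    """Category proportions needed among the missing records so that the
    completed data reproduce target_props overall (illustrative only)."""
    n_total = sum(observed_counts.values()) + n_missing
    needed = {}
    for category, target in target_props.items():
        # Count expected in the full data minus what is already observed.
        shortfall = target * n_total - observed_counts.get(category, 0)
        needed[category] = max(shortfall, 0.0)  # a negative count cannot be imputed
    total = sum(needed.values())
    return {category: value / total for category, value in needed.items()}


# Hypothetical example: 600 records with a recorded category, 400 without.
observed = {"A": 540, "B": 40, "C": 20}
census = {"A": 0.80, "B": 0.12, "C": 0.08}
dist = required_missing_distribution(observed, n_missing=400, target_props=census)
print(dist)
```

In this made-up example the missing records would have to contain a much larger share of categories B and C than the observed records do, which is the kind of departure from missing at random that imputation weights would need to encode.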

Regularized approach for data missing not at random

Statistical Methods in Medical Research ◽

10.1177/0962280217717760 ◽

2017 ◽

Vol 28 (1) ◽

pp. 134-150 ◽

Cited By ~ 2

Author(s):

Chi-hong Tseng ◽

Yi-Hau Chen

Keyword(s):

Clinical Trial ◽

Missing Data ◽

Cross Validation ◽

Regularization Parameter ◽

Simulation Studies ◽

Missing Not At Random ◽

Mechanism Model ◽

Missed Visits ◽

Validation Procedure ◽

Data Missing

It is common in longitudinal studies that missing data occur due to subjects’ no response, missed visits, dropout, death or other reasons during the course of study. To perform valid analysis in this setting, data missing not at random (MNAR) have to be considered. However, models for data MNAR often suffer from the identifiability issue and hence result in difficulty in estimation and computational convergence. To ameliorate this issue, we propose the LASSO and ridge-regularized selection models that regularize the missing data mechanism model to handle data MNAR, with the regularization parameter selected via a cross-validation procedure. The proposed models can be also employed for sensitivity analysis to examine the effects on inference of different assumptions about the missing data mechanism. We illustrate the performance of the proposed models via simulation studies and the analysis of data from a randomized clinical trial.
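
As a rough illustration only, a penalized selection model of the general kind described above can be written as follows. The notation is an assumption introduced here, not taken from the paper: $y_i$ is the outcome, $r_i$ the missingness indicator, $x_i$ the covariates, $\theta$ and $\psi$ the outcome-model and mechanism-model parameters, $L_i$ the observed-data likelihood contribution of subject $i$, and $\lambda$ the regularization parameter. The abstract does not say which components of $\psi$ are penalized.

```latex
% Selection-model factorization of outcome and missingness indicator
f(y_i, r_i \mid x_i; \theta, \psi)
  = f(y_i \mid x_i; \theta)\, \Pr(r_i \mid y_i, x_i; \psi)

% Penalized estimation, with the penalty acting on the mechanism model
(\hat\theta, \hat\psi)
  = \arg\max_{\theta, \psi}
    \Big\{ \sum_{i=1}^{n} \log L_i(\theta, \psi) - \lambda\, P(\psi) \Big\},
\qquad
P(\psi) =
\begin{cases}
  \sum_j |\psi_j|  & \text{(LASSO)} \\
  \sum_j \psi_j^2  & \text{(ridge)}
\end{cases}
```

Under this reading, a large $\lambda$ shrinks the mechanism-model coefficients towards zero, and varying $\lambda$ gives one way to run the sensitivity analysis mentioned in the abstract.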

Data Missing Not at Random in Mobile Health Research: Assessment of the Problem and a Case for Sensitivity Analyses

Journal of Medical Internet Research ◽

10.2196/26749 ◽

2021 ◽

Vol 23 (6) ◽

pp. e26749

Author(s):

Simon B Goldberg ◽

Daniel M Bolt ◽

Richard J Davidson

Keyword(s):

Sensitivity Analysis ◽

Missing Data ◽

Maximum Likelihood ◽

Multiple Imputation ◽

Mixture Model ◽

Mobile Health ◽

Sensitivity Analyses ◽

Research Assessment ◽

Missing Not At Random ◽

Pattern Mixture Model

Background Missing data are common in mobile health (mHealth) research. There has been little systematic investigation of how missingness is handled statistically in mHealth randomized controlled trials (RCTs). Although some missing data patterns (ie, missing at random [MAR]) may be adequately addressed using modern missing data methods such as multiple imputation and maximum likelihood techniques, these methods do not address bias when data are missing not at random (MNAR). It is typically not possible to determine whether the missing data are MAR. However, higher attrition in active (ie, intervention) versus passive (ie, waitlist or no treatment) conditions in mHealth RCTs raises a strong likelihood of MNAR, such as if active participants who benefit less from the intervention are more likely to drop out. Objective This study aims to systematically evaluate differential attrition and methods used for handling missingness in a sample of mHealth RCTs comparing active and passive control conditions. We also aim to illustrate a modern model-based sensitivity analysis and a simpler fixed-value replacement approach that can be used to evaluate the influence of MNAR. Methods We reanalyzed attrition rates and predictors of differential attrition in a sample of 36 mHealth RCTs drawn from a recent meta-analysis of smartphone-based mental health interventions. We systematically evaluated the design features related to missingness and its handling. Data from a recent mHealth RCT were used to illustrate 2 sensitivity analysis approaches (pattern-mixture model and fixed-value replacement approach). Results Attrition in active conditions was, on average, roughly twice that of passive controls. Differential attrition was higher in larger studies and was associated with the use of MAR-based multiple imputation or maximum likelihood methods. Half of the studies (18/36, 50%) used these modern missing data techniques. 
None of the 36 mHealth RCTs reviewed conducted a sensitivity analysis to evaluate the possible consequences of data MNAR. A pattern-mixture model and fixed-value replacement sensitivity analysis approaches were introduced. Results from a recent mHealth RCT were shown to be robust to missing data, reflecting worse outcomes in missing versus nonmissing scores in some but not all scenarios. A review of such scenarios helps to qualify the observations of significant treatment effects. Conclusions MNAR data because of differential attrition are likely in mHealth RCTs using passive controls. Sensitivity analyses are recommended to allow researchers to assess the potential impact of MNAR on trial results.
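
The fixed-value replacement idea can be sketched as below. This is an illustration under stated assumptions, not the authors' analysis: the data are made up, higher scores are assumed to be better, `None` marks a missing outcome, and the choice to shift only the active arm is hypothetical.

```python
from statistics import mean


def effect_after_replacement(active, passive, fill_active, fill_passive):
    """Difference in arm means after replacing each missing outcome (None)
    with an assumed fixed value for that arm."""
    filled_active = [fill_active if y is None else y for y in active]
    filled_passive = [fill_passive if y is None else y for y in passive]
    return mean(filled_active) - mean(filled_passive)


# Hypothetical post-treatment scores; None marks a dropout.
active = [12.0, 15.0, None, 14.0, None, 13.0]
passive = [10.0, 11.0, 9.0, None, 10.0, 10.0]
observed_active_mean = mean(y for y in active if y is not None)
observed_passive_mean = mean(y for y in passive if y is not None)

# Scenarios: active-arm dropouts score `shift` points below the observed mean.
effects = {}
for shift in (0.0, -1.5, -3.0):
    effects[shift] = effect_after_replacement(
        active,
        passive,
        fill_active=observed_active_mean + shift,
        fill_passive=observed_passive_mean,
    )
print(effects)
```

Reading the estimated effect across the scenarios shows how pessimistic the assumption about dropouts must become before the apparent benefit shrinks to a level that would change the conclusion.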

A four-step strategy for handling missing outcome data in randomised trials affected by a pandemic

10.21203/rs.3.rs-32455/v2 ◽

2020 ◽

Author(s):

Suzie Cro ◽

Tim P Morris ◽

Brennan C Kahan ◽

Victoria R Cornelius ◽

James R Carpenter

Keyword(s):

Sensitivity Analysis ◽

Missing Data ◽

Treatment Effect ◽

Missing At Random ◽

Outcome Data ◽

Sensitivity Analyses ◽

Free World ◽

Randomised Trials ◽

Primary Analysis ◽

Missing Not At Random

Abstract Background: The coronavirus pandemic (Covid-19) presents a variety of challenges for ongoing clinical trials, including an inevitably higher rate of missing outcome data, with new and non-standard reasons for missingness. International drug trial guidelines recommend trialists review plans for handling missing data in the conduct and statistical analysis, but clear recommendations are lacking. Methods: We present a four-step strategy for handling missing outcome data in the analysis of randomised trials that are ongoing during a pandemic. We consider handling missing data arising due to (i) participant infection, (ii) treatment disruptions and (iii) loss to follow-up. We consider both settings where treatment effects for a ‘pandemic-free world’ and ‘world including a pandemic’ are of interest. Results: In any trial, investigators should: (1) clarify the treatment estimand of interest with respect to the occurrence of the pandemic; (2) establish what data are missing for the chosen estimand; (3) perform primary analysis under the most plausible missing data assumptions; followed by (4) sensitivity analysis under alternative plausible assumptions. To obtain an estimate of the treatment effect in a ‘pandemic-free world’, participant data that are clinically affected by the pandemic (directly due to infection or indirectly via treatment disruptions) are not relevant and can be set to missing. For primary analysis, a missing-at-random assumption that conditions on all observed data that are expected to be associated with both the outcome and missingness may be most plausible. For the treatment effect in the ‘world including a pandemic’, all participant data are relevant and should be included in the analysis. For primary analysis, a missing-at-random assumption – potentially incorporating a pandemic time-period indicator and participant infection status – or a missing-not-at-random assumption with a poorer response may be most relevant, depending on the setting. In all scenarios, sensitivity analysis under credible missing-not-at-random assumptions should be used to evaluate the robustness of results. We highlight controlled multiple imputation as an accessible tool for conducting sensitivity analyses. Conclusions: Missing data problems will be exacerbated for trials active during the Covid-19 pandemic. This four-step strategy will facilitate clear thinking about the appropriate analysis for relevant questions of interest.
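
One common form of controlled multiple imputation is delta adjustment: values imputed under missing at random are shifted by a chosen offset, each completed dataset is analysed, and the results are pooled with Rubin's rules. The sketch below shows only the shift and the pooling step. The function names and numbers are hypothetical, and the abstract does not say which controlled-imputation variant the authors have in mind.

```python
from statistics import mean, variance


def delta_adjust(imputed_values, delta):
    """Shift values imputed under MAR by a fixed offset to encode an
    assumed missing-not-at-random departure."""
    return [value + delta for value in imputed_values]


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical treatment-effect estimates from m = 3 completed datasets.
pooled_estimate, pooled_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
shifted = delta_adjust([1.0, 2.0], delta=-0.5)
print(pooled_estimate, pooled_variance, shifted)
```

Repeating the whole procedure over a grid of `delta` values gives a sensitivity analysis in the sense of step (4).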

Evaluating the Performance of CART-Based Missing Data Methods Under a Missing Not at Random Mechanism

Multivariate Behavioral Research ◽

10.1080/00273171.2016.1264287 ◽

2017 ◽

Vol 52 (1) ◽

pp. 113-114

Author(s):

Timothy Hayes ◽

John J. McArdle

Keyword(s):

Missing Data ◽

Missing Not At Random ◽

Random Mechanism

Handling missing data in an FFQ: multiple imputation and nutrient intake estimates

Public Health Nutrition ◽

10.1017/s1368980019000168 ◽

2019 ◽

Vol 22 (8) ◽

pp. 1351-1360 ◽

Cited By ~ 1

Author(s):

Mari Ichikawa ◽

Akihiro Hosono ◽

Yuya Tamai ◽

Miki Watanabe ◽

Kiyoshi Shibata ◽

...

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Nutrient Intake ◽

Missing Values ◽

Personal Characteristics ◽

Missing Not At Random ◽

Food Items ◽

Self Administered Questionnaire ◽

Better Than

Abstract. Objective: We aimed to examine missing data in FFQ and to assess the effects on estimating dietary intake by comparing between multiple imputation and zero imputation. Design: We used data from the Okazaki Japan Multi-Institutional Collaborative Cohort (J-MICC) study. A self-administered questionnaire including an FFQ was implemented at baseline (FFQ1) and 5-year follow-up (FFQ2). Missing values in FFQ2 were replaced by corresponding FFQ1 values, multiple imputation and zero imputation. Setting: A methodological sub-study of the Okazaki J-MICC study. Participants: Of a total of 7585 men and women aged 35–79 years at baseline, we analysed data for 5120 participants who answered all items in FFQ1 and at least 50% of items in FFQ2. Results: Among 5120 participants, the proportion of missing data was 3.7%. The increasing number of missing food items in FFQ2 varied with personal characteristics. Missing food items not eaten often in FFQ2 were likely to represent zero intake in FFQ1. Most food items showed that the observed proportion of zero intake was likely to be similar to the probability that the missing value is zero intake. Compared with FFQ1 values, multiple imputation had smaller differences of total energy and nutrient estimates, except for alcohol, than zero imputation. Conclusions: Our results indicate that missing values due to zero intake, namely missing not at random, in FFQ can be predicted reasonably well from observed data. Multiple imputation performed better than zero imputation for most nutrients and may be applied to FFQ data when missing is low.

Multiple Imputation with Missing Indicators as Proxies for Unmeasured Variables: Simulation Study

10.21203/rs.3.rs-24268/v3 ◽

2020 ◽

Author(s):

Matthew Sperrin ◽

Glen P. Martin

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Simulation Study ◽

Causal Effect ◽

Missing At Random ◽

Directed Acyclic Graphs ◽

Missing Not At Random ◽

Routinely Collected Health Data ◽

Effect Estimation ◽

Minimal Bias

Abstract Background: Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias, its use in conjunction with multiple imputation may unlock the potential value of missingness to reduce bias in causal effect estimation, particularly in missing not at random scenarios and where missingness might be associated with unmeasured confounders. Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with multiple imputation, would reduce bias for causal effect estimation, under a range of scenarios including unmeasured variables, missing not at random, and missing at random mechanisms. We use directed acyclic graphs and structural models to elucidate a variety of causal structures of interest. We handled missing data using complete case analysis, and multiple imputation with and without missing indicator terms. Results: We find that multiple imputation combined with a missing indicator gives minimal bias for causal effect estimation in most scenarios. In particular, the approach: 1) does not introduce bias in missing (completely) at random scenarios; 2) reduces bias in missing not at random scenarios where the missing mechanism depends on the missing variable itself; and 3) may reduce or increase bias when unmeasured confounding is present. Conclusion: In the presence of missing data, careful use of missing indicators, combined with multiple imputation, can improve causal effect estimation when missingness is informative, and is not detrimental when missingness is at random.
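
A missing indicator is simply a 0/1 column recording whether a value was absent before imputation. The sketch below shows only that bookkeeping step, with hypothetical column names and records. The imputation itself, and the analysis model that includes the indicator as a covariate, are outside its scope.

```python
def add_missing_indicator(rows, column):
    """Return copies of the rows with an extra '<column>_missing' flag
    (1 if the value is None, else 0). The original value is left untouched
    so that it can still be imputed afterwards."""
    flagged = []
    for row in rows:
        new_row = dict(row)
        new_row[f"{column}_missing"] = 1 if row.get(column) is None else 0
        flagged.append(new_row)
    return flagged


# Hypothetical records: a biomarker that is not always measured.
records = [
    {"id": 1, "biomarker": 4.2},
    {"id": 2, "biomarker": None},
    {"id": 3, "biomarker": 5.1},
]
flagged_records = add_missing_indicator(records, "biomarker")
print(flagged_records)
```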

JMASM 54: A Comparison of Four Different Estimation Approaches for Prognostic Survival Oral Cancer Model

Journal of Modern Applied Statistical Methods ◽

10.22237/jmasm/1594045320 ◽

2020 ◽

Vol 18 (2) ◽

pp. 2-6

Author(s):

Thomas R. Knapp

Keyword(s):

Missing Data ◽

Oral Cancer ◽

Missing At Random ◽

Cancer Model ◽

Missing Not At Random ◽

Opposing View ◽

Missing Completely At Random ◽

Almost All

Rubin (1976, and elsewhere) claimed that there are three kinds of “missingness”: missing completely at random; missing at random; and missing not at random. He gave examples of each. The article that now follows takes an opposing view by arguing that almost all missing data are missing not at random.
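
The three mechanisms can be made concrete with a small simulation. This is an illustrative sketch, not taken from the article: the data-generating model, the missingness probabilities and the function name are all assumptions. Here `x` is always observed, `y` may be missing, and the probability of missingness is constant (MCAR), depends on `x` (MAR), or depends on `y` itself (MNAR).

```python
import random
from statistics import mean


def simulate_mechanism(mechanism, n=20000, seed=0):
    """Return (mean of all y, mean of observed y) under one mechanism."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            p_missing = 0.3                      # unrelated to the data
        elif mechanism == "MAR":
            p_missing = 0.6 if x > 0 else 0.1    # depends on observed x only
        elif mechanism == "MNAR":
            p_missing = 0.6 if y > 0 else 0.1    # depends on the value that goes missing
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        full.append(y)
        if rng.random() >= p_missing:
            observed.append(y)
    return mean(full), mean(observed)


results = {m: simulate_mechanism(m) for m in ("MCAR", "MAR", "MNAR")}
for name, (full_mean, observed_mean) in results.items():
    print(name, round(full_mean, 3), round(observed_mean, 3))
```

In this setup the observed mean of `y` stays close to the full-data mean only under MCAR. Under MAR it is shifted too, but the shift can be corrected by conditioning on the observed `x`; under MNAR no adjustment based on observed data alone removes it without further assumptions.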

Data Missing Not at Random in Mobile Health Research: Assessment of the Problem and a Case for Sensitivity Analyses (Preprint)

10.2196/preprints.26749 ◽

2020 ◽

Author(s):

Simon B Goldberg ◽

Daniel M Bolt ◽

Richard J Davidson

Keyword(s):

Sensitivity Analysis ◽

Missing Data ◽

Maximum Likelihood ◽

Multiple Imputation ◽

Mixture Model ◽

Mobile Health ◽

Sensitivity Analyses ◽

Research Assessment ◽

Missing Not At Random ◽

Pattern Mixture Model

BACKGROUND Missing data are common in mobile health (mHealth) research. There has been little systematic investigation of how missingness is handled statistically in mHealth randomized controlled trials (RCTs). Although some missing data patterns (ie, missing at random [MAR]) may be adequately addressed using modern missing data methods such as multiple imputation and maximum likelihood techniques, these methods do not address bias when data are missing not at random (MNAR). It is typically not possible to determine whether the missing data are MAR. However, higher attrition in active (ie, intervention) versus passive (ie, waitlist or no treatment) conditions in mHealth RCTs raises a strong likelihood of MNAR, such as if active participants who benefit less from the intervention are more likely to drop out. OBJECTIVE This study aims to systematically evaluate differential attrition and methods used for handling missingness in a sample of mHealth RCTs comparing active and passive control conditions. We also aim to illustrate a modern model-based sensitivity analysis and a simpler fixed-value replacement approach that can be used to evaluate the influence of MNAR. METHODS We reanalyzed attrition rates and predictors of differential attrition in a sample of 36 mHealth RCTs drawn from a recent meta-analysis of smartphone-based mental health interventions. We systematically evaluated the design features related to missingness and its handling. Data from a recent mHealth RCT were used to illustrate 2 sensitivity analysis approaches (pattern-mixture model and fixed-value replacement approach). RESULTS Attrition in active conditions was, on average, roughly twice that of passive controls. Differential attrition was higher in larger studies and was associated with the use of MAR-based multiple imputation or maximum likelihood methods. Half of the studies (18/36, 50%) used these modern missing data techniques. 
None of the 36 mHealth RCTs reviewed conducted a sensitivity analysis to evaluate the possible consequences of data MNAR. A pattern-mixture model and fixed-value replacement sensitivity analysis approaches were introduced. Results from a recent mHealth RCT were shown to be robust to missing data, reflecting worse outcomes in missing versus nonmissing scores in some but not all scenarios. A review of such scenarios helps to qualify the observations of significant treatment effects. CONCLUSIONS MNAR data because of differential attrition are likely in mHealth RCTs using passive controls. Sensitivity analyses are recommended to allow researchers to assess the potential impact of MNAR on trial results.

Improving Top-N Recommendation Performance Using Missing Data

Mathematical Problems in Engineering ◽

10.1155/2015/380472 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 4

Author(s):

Xiangyu Zhao ◽

Zhendong Niu ◽

Kaiyi Wang ◽

Ke Niu ◽

Zhongqiang Liu

Keyword(s):

Missing Data ◽

Recommender Systems ◽

Matrix Factorization ◽

State Of The Art ◽

User Preferences ◽

Missing Not At Random ◽

Main Challenge ◽

Recommendation Algorithms ◽

Random Part ◽

Problem Data

Recommender systems have become increasingly significant in solving the information explosion problem. Data sparsity is a main challenge in this area. Massive numbers of unrated items constitute missing data, with only a few observed ratings. Most studies consider missing data as unknown information and only use observed data to learn models and generate recommendations. However, data are missing not at random. Part of the missing data is due to the fact that users choose not to rate the items. This part of the missing data consists of negative examples of user preferences. Utilizing this information is expected to improve the performance of recommendation algorithms. Unfortunately, negative examples are mixed with unlabeled positive examples in missing data, and they are hard to distinguish. In this paper, we propose three schemes to utilize the negative examples in missing data. The schemes are then adapted with SVD++, which is a state-of-the-art matrix factorization recommendation approach, to generate recommendations. Experimental results on two real datasets show that our proposed approaches gain better top-N performance than the baseline ones on both accuracy and diversity.
