Data Missing Not at Random in Mobile Health Research: Assessment of the Problem and a Case for Sensitivity Analyses

Background: Missing data are common in mobile health (mHealth) research, yet there has been little systematic investigation of how missingness is handled statistically in mHealth randomized controlled trials (RCTs). Although some missing data patterns (ie, missing at random [MAR]) may be adequately addressed using modern missing data methods such as multiple imputation and maximum likelihood techniques, these methods do not address bias when data are missing not at random (MNAR). It is typically not possible to determine whether missing data are MAR. However, higher attrition in active (ie, intervention) than in passive (ie, waitlist or no treatment) conditions in mHealth RCTs raises a strong likelihood of MNAR, for example, if active participants who benefit less from the intervention are more likely to drop out.

Objective: This study aims to systematically evaluate differential attrition and the methods used for handling missingness in a sample of mHealth RCTs comparing active and passive control conditions. We also aim to illustrate a modern model-based sensitivity analysis and a simpler fixed-value replacement approach that can be used to evaluate the influence of MNAR.

Methods: We reanalyzed attrition rates and predictors of differential attrition in a sample of 36 mHealth RCTs drawn from a recent meta-analysis of smartphone-based mental health interventions. We systematically evaluated design features related to missingness and its handling. Data from a recent mHealth RCT were used to illustrate 2 sensitivity analysis approaches (a pattern-mixture model and a fixed-value replacement approach).

Results: Attrition in active conditions was, on average, roughly twice that in passive controls. Differential attrition was higher in larger studies and was associated with the use of MAR-based multiple imputation or maximum likelihood methods. Half of the studies (18/36, 50%) used these modern missing data techniques. None of the 36 mHealth RCTs reviewed conducted a sensitivity analysis to evaluate the possible consequences of MNAR data. Pattern-mixture model and fixed-value replacement sensitivity analyses were introduced. Results from a recent mHealth RCT were robust to missing data that reflected worse outcomes in missing than in nonmissing scores in some, but not all, scenarios. Reviewing such scenarios helps to qualify observed significant treatment effects.

Conclusions: MNAR data due to differential attrition are likely in mHealth RCTs that use passive controls. Sensitivity analyses are recommended so that researchers can assess the potential impact of MNAR on trial results.
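The two sensitivity analysis approaches named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a single continuous outcome where lower scores mean fewer symptoms, uses single rather than multiple imputation, and the data, function names, and the choice to fill control-arm gaps with the observed control mean are all hypothetical. The `delta` parameter encodes how much worse missing active-arm scores are assumed to be than observed ones (a simple pattern-mixture assumption); scanning `delta` yields a tipping point at which the estimated benefit disappears.

```python
from statistics import mean


def fill_missing(scores, value):
    """Replace every missing score (None) with one fixed value."""
    return [value if s is None else s for s in scores]


def mean_difference(active, control):
    """Treatment effect as active mean minus control mean (negative = benefit)."""
    return mean(active) - mean(control)


def fixed_value_effect(active, control, fill_active, fill_control):
    """Fixed-value replacement: impute one chosen value per arm, then re-estimate."""
    return mean_difference(fill_missing(active, fill_active),
                           fill_missing(control, fill_control))


def delta_adjusted_effect(active, control, delta):
    """Simplified pattern-mixture analysis (hypothetical, single imputation).

    Missing active-arm scores are set to the observed active-arm mean shifted
    by `delta` (delta > 0 means dropouts did worse). Missing control-arm scores
    are set to the observed control mean (delta = 0 in that arm).
    """
    obs_active = [s for s in active if s is not None]
    obs_control = [s for s in control if s is not None]
    return fixed_value_effect(active, control,
                              mean(obs_active) + delta, mean(obs_control))


def tipping_point(active, control, deltas):
    """First delta (in scan order) at which the estimated benefit vanishes."""
    for d in deltas:
        if delta_adjusted_effect(active, control, d) >= 0:
            return d
    return None


if __name__ == "__main__":
    # Hypothetical symptom scores at follow-up; None = dropped out.
    active = [10, 12, None, 9, None, 11]
    control = [14, 15, None, 13, 16, 12]
    print(delta_adjusted_effect(active, control, 0))    # MAR-like reference
    print(fixed_value_effect(active, control, 16, 14))  # worst observed score for active dropouts
    print(tipping_point(active, control, range(0, 21)))
```

With these hypothetical scores, the MAR-like estimate is -3.5 points, replacing active-arm dropouts with the worst observed score (16) shrinks it to about -1.67, and the benefit disappears once dropouts are assumed to score 11 points worse than observed active participants.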


Data Missing Not at Random in Mobile Health Research: Assessment of the Problem and a Case for Sensitivity Analyses (Preprint)

10.2196/preprints.26749, 2020

Author(s): Simon B Goldberg, Daniel M Bolt, Richard J Davidson

Keyword(s): Sensitivity Analysis; Missing Data; Maximum Likelihood; Multiple Imputation; Mixture Model; Mobile Health; Sensitivity Analyses; Research Assessment; Missing Not At Random; Pattern Mixture Model



A four-step strategy for handling missing outcome data in randomised trials affected by a pandemic

10.21203/rs.3.rs-32455/v2, 2020

Author(s): Suzie Cro, Tim P Morris, Brennan C Kahan, Victoria R Cornelius, James R Carpenter

Keyword(s): Sensitivity Analysis; Missing Data; Treatment Effect; Missing At Random; Outcome Data; Sensitivity Analyses; Free World; Randomised Trials; Primary Analysis; Missing Not At Random

Abstract Background: The coronavirus pandemic (Covid-19) presents a variety of challenges for ongoing clinical trials, including an inevitably higher rate of missing outcome data, with new and non-standard reasons for missingness. International drug trial guidelines recommend that trialists review their plans for handling missing data in trial conduct and statistical analysis, but clear recommendations are lacking.

Methods: We present a four-step strategy for handling missing outcome data in the analysis of randomised trials that are ongoing during a pandemic. We consider missing data arising from (i) participant infection, (ii) treatment disruptions and (iii) loss to follow-up. We consider both settings in which the treatment effect of interest is that in a ‘pandemic-free world’ and settings in which it is that in a ‘world including a pandemic’.

Results: In any trial, investigators should (1) clarify the treatment estimand of interest with respect to the occurrence of the pandemic; (2) establish what data are missing for the chosen estimand; (3) perform the primary analysis under the most plausible missing data assumptions; and (4) perform sensitivity analyses under alternative plausible assumptions. To estimate the treatment effect in a ‘pandemic-free world’, participant data that are clinically affected by the pandemic (directly via infection or indirectly via treatment disruptions) are not relevant and can be set to missing. For the primary analysis, a missing-at-random assumption that conditions on all observed data expected to be associated with both the outcome and missingness may be most plausible. For the treatment effect in the ‘world including a pandemic’, all participant data are relevant and should be included in the analysis. For the primary analysis, either a missing-at-random assumption (potentially incorporating a pandemic time-period indicator and participant infection status) or a missing-not-at-random assumption with a poorer response may be most relevant, depending on the setting. In all scenarios, sensitivity analyses under credible missing-not-at-random assumptions should be used to evaluate the robustness of results. We highlight controlled multiple imputation as an accessible tool for conducting sensitivity analyses.

Conclusions: Missing data problems will be exacerbated for trials active during the Covid-19 pandemic. This four-step strategy will facilitate clear thinking about the appropriate analysis for relevant questions of interest.
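Steps 2 to 4 hinge on deciding, for each estimand, which outcomes count as missing. A minimal Python sketch of step 2 follows, under stated assumptions: the record layout (`Participant`), its flags, and the estimand labels `"pandemic-free"` and `"with-pandemic"` are hypothetical illustrations, not part of the published strategy. The outcomes it flags as missing would then be handled by a missing-at-random primary analysis (step 3) and missing-not-at-random sensitivity analyses (step 4), which the sketch does not implement.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Participant:
    pid: str
    arm: str
    outcome: Optional[float]           # None = outcome not observed
    infected: bool = False             # clinically affected via infection
    treatment_disrupted: bool = False  # clinically affected via disrupted treatment


def data_for_estimand(participants: List[Participant], estimand: str) -> List[Participant]:
    """Step 2: establish which outcomes are missing for the chosen estimand.

    "pandemic-free": outcomes of participants clinically affected by the
                     pandemic are not relevant and are set to missing.
    "with-pandemic": all observed outcomes are relevant and are kept.
    """
    if estimand == "with-pandemic":
        return list(participants)
    if estimand == "pandemic-free":
        return [replace(p, outcome=None) if (p.infected or p.treatment_disrupted) else p
                for p in participants]
    raise ValueError(f"unknown estimand: {estimand!r}")


def missing_ids(participants: List[Participant]) -> List[str]:
    """IDs whose outcome must be handled by the step 3 and step 4 analyses."""
    return [p.pid for p in participants if p.outcome is None]


if __name__ == "__main__":
    trial = [
        Participant("p1", "active", 5.0),
        Participant("p2", "active", 7.0, infected=True),
        Participant("p3", "control", None),  # lost to follow-up
        Participant("p4", "control", 6.0, treatment_disrupted=True),
    ]
    for estimand in ("with-pandemic", "pandemic-free"):
        print(estimand, missing_ids(data_for_estimand(trial, estimand)))
```

The same trial therefore has different missing data depending on the estimand: only the participant lost to follow-up under "with-pandemic", but also the infected and treatment-disrupted participants under "pandemic-free".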


A review of the use of controlled multiple imputation in randomised controlled trials with missing outcome data

BMC Medical Research Methodology

10.1186/s12874-021-01261-6, 2021, Vol 21 (1)

Author(s): Ping-Tee Tan, Suzie Cro, Eleanor Van Vogt, Matyas Szigeti, Victoria R. Cornelius

Keyword(s): Sensitivity Analysis; Missing Data; Multiple Imputation; Randomised Controlled Trials; Missing At Random; Sensitivity Analyses; Controlled Trials; Primary Analysis; Randomised Controlled; The Impact

Abstract Background: Missing data are common in randomised controlled trials (RCTs) and can bias results if not handled appropriately. A statistically valid analysis under the primary missing-data assumptions should be conducted, followed by sensitivity analyses under alternative justified assumptions to assess the robustness of results. Controlled multiple imputation (MI) procedures, including delta-based and reference-based approaches, have been developed for analysis under missing-not-at-random assumptions. However, it is unclear how often these methods are used, how they are reported, and what their impact is on trial results. This review evaluates the current use and reporting of MI and controlled MI in RCTs.

Methods: We conducted a targeted review of phase II-IV RCTs (non-cluster randomised) that used MI and were published in two leading general medical journals (The Lancet and New England Journal of Medicine) between January 2014 and December 2019. Data were extracted on imputation methods, analysis status, and reporting of results. For trials using controlled MI, the results of primary and sensitivity analyses were compared.

Results: A total of 118 RCTs (9% of published RCTs) used some form of MI. MI under missing-at-random was used in 110 trials: for primary analysis in 43/118 (36%) and for sensitivity analysis in 70/118 (59%), with 3 trials using it for both. Sixteen trials performed controlled MI (1.3% of published RCTs), using either a delta-based (n = 9) or a reference-based approach (n = 7). Controlled MI was mostly used in sensitivity analysis (n = 14/16). Two trials used controlled MI for primary analysis; one of these reported no sensitivity analysis, whilst the other reported similar results without imputation. Of the 14 trials using controlled MI in sensitivity analysis, 12 yielded results comparable to the primary analysis, whereas 2 yielded contradictory results. Only 5/110 (5%) trials using missing-at-random MI and 5/16 (31%) trials using controlled MI reported complete details on MI methods.

Conclusions: Controlled MI enabled the impact of accessible, contextually relevant missing data assumptions on trial results to be examined. The use of controlled MI is increasing but is still infrequent and poorly reported where used. There is a need for improved reporting on the implementation of MI analyses and the choice of controlled MI parameters.
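To make the reference-based variant of controlled MI concrete, the sketch below implements a simplified "jump to reference" imputation for a single normally distributed outcome and pools the results with Rubin's rules. It is a teaching sketch under stated assumptions, not a validated implementation: there is one follow-up visit, the reference standard deviation is held fixed, and the function names and seed are hypothetical. A delta-based variant would instead draw missing values from each arm's own observed distribution and add a fixed shift `delta` to the active-arm draws.

```python
import math
import random
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool M completed-data analyses with Rubin's rules; returns (estimate, total variance)."""
    m = len(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance
    return mean(estimates), within + (1 + 1 / m) * between


def diff_and_var(active, control):
    """Difference in means and its (Welch-type) sampling variance."""
    est = mean(active) - mean(control)
    var = variance(active) / len(active) + variance(control) / len(control)
    return est, var


def jump_to_reference_mi(active, control, m=20, seed=2021):
    """Simplified reference-based (jump-to-reference) controlled MI.

    Missing outcomes in BOTH arms are drawn from a normal model fitted to the
    observed control (reference) arm, so active-arm dropouts are assumed to
    behave like control participants. The reference mean is redrawn for each
    imputation to reflect its uncertainty; the SD is held fixed for brevity.
    """
    if m < 2:
        raise ValueError("need at least 2 imputations for the between-imputation variance")
    rng = random.Random(seed)
    ref = [y for y in control if y is not None]
    mu_hat, sd_hat = mean(ref), math.sqrt(variance(ref))
    estimates, variances = [], []
    for _ in range(m):
        mu = rng.gauss(mu_hat, sd_hat / math.sqrt(len(ref)))
        a = [y if y is not None else rng.gauss(mu, sd_hat) for y in active]
        c = [y if y is not None else rng.gauss(mu, sd_hat) for y in control]
        est, var = diff_and_var(a, c)
        estimates.append(est)
        variances.append(var)
    return rubin_pool(estimates, variances)
```

Because active-arm dropouts are pulled toward the control distribution, a jump-to-reference estimate is typically closer to the null than a missing-at-random analysis, which is what makes it a useful conservative sensitivity analysis.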


A four-step strategy for handling missing outcome data in randomised trials affected by a pandemic

10.21203/rs.3.rs-32455/v1, 2020

Author(s): Suzie Cro, Tim P Morris, Brennan C Kahan, Victoria R Cornelius, James R Carpenter

Keyword(s): Sensitivity Analysis; Missing Data; Treatment Effect; Missing At Random; Outcome Data; Sensitivity Analyses; Free World; Randomised Trials; Primary Analysis; Missing Not At Random

Abstract Background: The coronavirus pandemic (Covid-19) presents a variety of challenges for ongoing clinical trials, including an inevitably higher rate of missing outcome data, with new and non-standard reasons for missingness. International drug trial guidelines recommend that trialists review their plans for handling missing data in trial conduct and statistical analysis, but clear recommendations are lacking.

Methods: We present a four-step strategy for handling missing outcome data in the analysis of randomised trials that are ongoing during a pandemic. We consider missing data arising from (i) participant infection, (ii) treatment disruptions and (iii) loss to follow-up. We consider both settings in which the treatment effect of interest is that in a ‘pandemic-free world’ and settings in which it is that in a ‘world including a pandemic’.

Results: In any trial, investigators should (1) clarify the treatment estimand of interest; (2) establish what data are missing for the estimand at hand; (3) perform the primary analysis under the most plausible missing data assumptions; and (4) perform sensitivity analyses under alternative plausible assumptions. To estimate the treatment effect in a ‘pandemic-free world’, data from participants clinically affected by the pandemic (directly via infection or indirectly via treatment disruptions) are not relevant and can be set to missing. For the primary analysis, a missing-at-random assumption that conditions on all observed data expected to be associated with both the outcome and missingness may be most plausible. For the treatment effect in the ‘world including a pandemic’, all participant data are relevant and should be included in the analysis. For the primary analysis, either a missing-at-random assumption (potentially incorporating a pandemic time-period indicator and participant infection status) or a missing-not-at-random assumption with a poorer response may be most relevant, depending on the setting. In all scenarios, sensitivity analyses under credible missing-not-at-random assumptions should be used to evaluate the robustness of results. We highlight controlled multiple imputation as an accessible tool for conducting sensitivity analyses.

Conclusions: Missing data problems will be exacerbated for trials active during the Covid-19 pandemic. This four-step strategy will facilitate clear thinking about the appropriate analysis for relevant questions of interest.


Handling missing data in modelling quality of clinician-prescribed routine care: Sensitivity analysis of departure from missing at random assumption

Statistical Methods in Medical Research

10.1177/0962280220918279, 2020, Vol 29 (10), pp. 3076-3092


Author(s): Susan Gachau, Matteo Quartagno, Edmund Njeru Njagi, Nelson Owuor, Mike English, ...

Keyword(s): Sensitivity Analysis; Missing Data; Multiple Imputation; Missing At Random; Parameter Estimates; Analysis Model; Major Drawback; Missing Not At Random; Prior Distributions; Random Mechanism

Missing information is a major drawback when analyzing data collected in many routine health care settings. Multiple imputation assuming a missing at random mechanism is a popular method for handling missing data. The missing at random assumption cannot be confirmed from the observed data alone, hence the need for sensitivity analysis to assess the robustness of inference. However, sensitivity analysis is rarely conducted and reported in practice. We analyzed routine paediatric data collected during a cluster randomized trial conducted in Kenyan hospitals. We imputed missing patient- and clinician-level variables assuming a missing at random mechanism. We also imputed missing clinician-level variables assuming a missing not at random mechanism. We incorporated opinions from 15 clinical experts in the form of prior distributions and shift parameters in the delta adjustment method. The proportional odds random-effects analysis model included an interaction between trial intervention arm and follow-up time, together with hospital-, clinician- and patient-level factors. We performed these analyses using R functions derived from the jomo package. Parameter estimates from multiple imputation under the missing at random mechanism were similar to multiple imputation estimates assuming a missing not at random mechanism. Our inferences were insensitive to departures from the missing at random assumption under either the prior distribution or the shift parameter sensitivity analysis approach.
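The delta adjustment idea for a binary clinician-level variable can be sketched as a shift on the log-odds scale of the missing-at-random imputation probability. The study used R functions derived from the jomo package; the Python below is not that code. It is an illustration under assumptions: each expert's opinion is represented as a normal prior on the shift `delta`, one expert is sampled per imputation, and the function names and prior values are hypothetical. Ordinal variables, such as the proportional odds outcome, would need the shift applied to each cumulative logit instead.

```python
import math
import random


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def mnar_probability(p_mar, delta):
    """Delta adjustment on the log-odds scale: shift a MAR imputation probability."""
    return expit(logit(p_mar) + delta)


def impute_binary_mnar(p_mar_list, expert_priors, rng):
    """One delta-adjusted imputation of a missing binary variable.

    `p_mar_list` holds each missing case's MAR predictive probability.
    `expert_priors` is a list of (mean, sd) normal priors for delta, one per
    expert; an expert is picked at random and delta is drawn from that prior,
    so repeated imputations reflect the spread of expert opinion.
    """
    mu, sd = rng.choice(expert_priors)
    delta = rng.gauss(mu, sd)
    return [1 if rng.random() < mnar_probability(p, delta) else 0 for p in p_mar_list]


if __name__ == "__main__":
    rng = random.Random(7)
    priors = [(0.0, 0.2), (0.5, 0.3), (-0.3, 0.25)]  # hypothetical elicited priors
    for _ in range(3):
        print(impute_binary_mnar([0.2, 0.6, 0.9], priors, rng))
```

A shift of `delta = log(3)` triples the odds, so a MAR probability of 0.5 becomes 0.75; setting every prior to `(0, 0)` recovers the missing at random imputation.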


Weighted multiple imputation of ethnicity data that are missing not at random in primary care databases

International Journal for Population Data Science

10.23889/ijpds.v1i1.54, 2017, Vol 1 (1)

Author(s): Tra My Pham, Irene Petersen, James Carpenter, Tim Morris

Keyword(s): Primary Care; Missing Data; Multiple Imputation; Simulation Study; Case Analysis; Missing At Random; Complete Case; Missing Not At Random; Health Records; Ethnicity Data

ABSTRACT Background: Ethnicity is an important factor to consider in health research because of its association with inequalities in disease prevalence and in the utilisation of healthcare. Ethnicity recording has been incorporated into primary care electronic health records, and ethnicity is therefore available in large UK primary care databases such as The Health Improvement Network (THIN). However, because primary care data are routinely collected for clinical purposes, much of the data relevant to research, including ethnicity, is often missing. A popular approach to missing data is multiple imputation (MI). However, the conventional MI method, which assumes that data are missing at random, does not give plausible estimates of the ethnicity distribution in THIN compared with the general UK population. This may be because ethnicity data in primary care are likely to be missing not at random.

Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables.

Methods: Weighted MI combines MI with probability weights calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study, which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared with conventional MI and with other traditional missing data methods, including complete case analysis and single imputation.

Results: Although a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe than under MI assuming missing at random. Complete case analysis and single imputation were inadequate for handling ethnicity data that are missing not at random.

Conclusions: Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only to ethnicity but also to other incomplete categorical health indicators in electronic health records.
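The core idea of weighted MI, using an external marginal distribution to reweight imputation probabilities, can be sketched in Python. This is a deliberately simplified illustration, not the estimator described in the abstract: it derives one weight per category from the gap between the observed counts and a target (census-like) marginal distribution, then multiplies each record's missing-at-random predictive probabilities by those weights and renormalises. Category labels, counts, proportions, and function names are hypothetical.

```python
def required_shares(observed_counts, n_missing, target_props):
    """Share of the imputed values each category needs so that the completed
    data reproduce the target (e.g. census) marginal distribution."""
    n_total = sum(observed_counts.values()) + n_missing
    needed = {k: target_props[k] * n_total - observed_counts.get(k, 0) for k in target_props}
    if any(v < 0 for v in needed.values()):
        raise ValueError("target unreachable: a category is already over-represented")
    return {k: v / n_missing for k, v in needed.items()}


def category_weights(observed_counts, n_missing, target_props):
    """Weight per category = required imputed share / share among observed records.
    Assumes every category appears at least once among the observed records."""
    n_obs = sum(observed_counts.values())
    shares = required_shares(observed_counts, n_missing, target_props)
    return {k: shares[k] / (observed_counts[k] / n_obs) for k in shares}


def weighted_probs(mar_probs, weights):
    """Reweight one record's MAR predictive probabilities and renormalise."""
    raw = {k: p * weights[k] for k, p in mar_probs.items()}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


if __name__ == "__main__":
    observed = {"A": 700, "B": 100}  # hypothetical observed ethnicity counts
    target = {"A": 0.8, "B": 0.2}    # hypothetical census proportions
    w = category_weights(observed, n_missing=200, target_props=target)
    print(w)
    print(weighted_probs({"A": 0.875, "B": 0.125}, w))
```

Here the under-recorded category B receives a weight of 4, so a record whose MAR probabilities simply mirror the observed shares (0.875 and 0.125) is imputed as A or B with equal probability, which is what the target margin requires. When MAR probabilities depend on covariates, this per-category reweighting recovers the target margin only approximately.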


Cost-effectiveness of mobile health-based integrated care for atrial fibrillation: Model development and data analysis (Preprint)

10.2196/preprints.29408, 2021

Author(s): Xueyan Luo, Wei Xu, Quan Yuan, Han Lai, Chunji Huang

Keyword(s): Atrial Fibrillation; Sensitivity Analysis; Cost Effectiveness; Integrated Care; Mobile Health; Cost Effective; Sensitivity Analyses; Base Case; Life Years; Mhealth Technology

BACKGROUND Mobile health (mHealth) technology is increasingly used in disease management. Using mHealth tools to integrate and streamline care has been found to improve clinical outcomes in patients with atrial fibrillation (AF).

OBJECTIVE This study aimed to investigate the potential clinical and health economic outcomes of mHealth-based integrated care for AF from the perspective of a public healthcare provider in China.

METHODS A Markov model was designed to compare the outcomes of mHealth-based care and usual care in a hypothetical cohort of patients with AF in China. The time horizon was 30 years, with monthly cycles. Model outcomes were direct medical costs, quality-adjusted life-years (QALYs), and incremental cost-effectiveness ratios (ICERs). Sensitivity analyses were conducted to examine the robustness of the base-case results.

RESULTS In the base-case analysis, mHealth-based care gained an additional 0.0818 QALYs at an incremental cost of USD 1,778. Using USD 33,438 per QALY (three times the per capita gross domestic product) as the willingness-to-pay threshold, mHealth-based care was cost-effective, with an ICER of USD 21,739 per QALY. The one-way sensitivity analysis found that compliance with mHealth-based care had the greatest impact on the ICER. In the probabilistic sensitivity analysis, mHealth-based care was cost-effective in 80.91% of 10,000 iterations.

CONCLUSIONS This study suggests that using mHealth technology to streamline and integrate care for patients with AF is cost-effective in China.
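The two calculations behind such a model, propagating a cohort through a Markov chain with monthly cycles and comparing the strategies by ICER, can be sketched in Python. This is not the authors' model: the two-state example, transition probabilities, and utilities in the test are hypothetical, discounting and cost accrual are omitted for brevity, and only the ICER inputs come from the abstract.

```python
def markov_trace(transition, start, cycles):
    """Cohort state distribution after each cycle (rows of `transition` sum to 1)."""
    n = len(start)
    dist = list(start)
    trace = [dist]
    for _ in range(cycles):
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
        trace.append(dist)
    return trace


def total_qalys(trace, utilities, cycle_years=1 / 12):
    """Undiscounted QALYs accrued over cycles 1..T (monthly cycles by default)."""
    return sum(sum(p * u for p, u in zip(dist, utilities)) * cycle_years
               for dist in trace[1:])


def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    return delta_cost / delta_qaly


def net_monetary_benefit(delta_cost, delta_qaly, wtp):
    """Positive when the intervention is cost-effective at willingness-to-pay `wtp`."""
    return wtp * delta_qaly - delta_cost


if __name__ == "__main__":
    # Rounded base-case figures reported in the abstract (not recomputed from a model).
    print(round(icer(1778, 0.0818)))
    print(net_monetary_benefit(1778, 0.0818, 33438) > 0)
```

Dividing the rounded base-case figures gives about USD 21,736 per QALY, consistent with the reported ICER of USD 21,739 up to rounding, and the positive net monetary benefit at USD 33,438 per QALY matches the cost-effectiveness conclusion.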


Best (but oft-forgotten) practices: missing data methods in randomized controlled nutrition trials

American Journal of Clinical Nutrition

10.1093/ajcn/nqy271, 2019, Vol 109 (3), pp. 504-508


Author(s): Peng Li, Elizabeth A Stuart

Keyword(s): Missing Data; Maximum Likelihood; Causal Inference; Randomized Controlled Trials; Multiple Imputation; Controlled Trials; Full Information; Complete Case; Full Information Maximum Likelihood; Randomized Controlled

ABSTRACT Missing data occur ubiquitously in randomized controlled trials and may compromise causal inference if handled inappropriately. Some problematic missing data methods, such as complete case (CC) analysis and last observation carried forward (LOCF), are unfortunately still common in nutrition trials. This situation is partly caused by investigator confusion about the missing data assumptions underlying different methods. In this statistical guidance, we provide a brief introduction to missing data mechanisms and to the unreasonable assumptions that underlie CC and LOCF, and we recommend 2 appropriate missing data methods: multiple imputation and full information maximum likelihood.
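The assumptions behind complete case analysis and LOCF can be made concrete with a small Python sketch on hypothetical longitudinal data. The complete-case mean uses only participants observed at the visit, so it is generally unbiased only if those completers are representative of everyone randomized (for example, under missing completely at random). LOCF assumes each dropout's outcome stays frozen at its last observed value. The function names and data are illustrative, and neither function implements the recommended methods.

```python
def locf(series):
    """Last observation carried forward within one participant's visit series.
    Assumes the first (baseline) value is observed."""
    filled, last = [], None
    for y in series:
        if y is not None:
            last = y
        filled.append(last)
    return filled


def complete_case_mean(data, visit):
    """Mean at `visit` among participants observed at that visit only."""
    values = [s[visit] for s in data if s[visit] is not None]
    return sum(values) / len(values)


def locf_mean(data, visit):
    """Mean at `visit` after carrying each participant's last value forward."""
    values = [locf(s)[visit] for s in data]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical symptom scores at baseline, week 4, week 8 (lower = better).
    data = [
        [20, 15, 10],
        [22, 18, None],
        [24, None, None],
    ]
    print(complete_case_mean(data, 2))  # uses the single completer only
    print(locf_mean(data, 2))           # freezes dropouts at their last score
```

Here the complete-case mean at week 8 is 10 and the LOCF mean is about 17.3: the two methods disagree sharply because each encodes a different, untested assumption about what the dropouts would have scored.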
