P427 A hybrid approach of handling missing data in inflammatory bowel disease (IBD) trials: results from VISIBLE 1 and VARSITY

2020 ◽  
Vol 14 (Supplement_1) ◽  
pp. S388-S389
Author(s):  
J Chen ◽  
S Hunter ◽  
K Kisfalvi ◽  
R A Lirio

Abstract Background Missing data are common in IBD trials. Depending on the volume and nature of the missing data, they can reduce statistical power for detecting treatment differences, introduce potential bias and invalidate conclusions. Non-responder imputation (NRI), where patients with missing data are considered treatment failures, is widely used to handle missing data for dichotomous efficacy endpoints in IBD trials. However, it does not consider the mechanisms leading to missing data and can potentially underestimate the treatment effect. We proposed a hybrid (HI) approach combining NRI and multiple imputation (MI) as an alternative to NRI in the analyses of two phase 3 trials of vedolizumab (VDZ) in patients with moderate-to-severe UC – VISIBLE 1 [1] and VARSITY [2]. Methods VISIBLE 1 and VARSITY assessed efficacy using dichotomous endpoints based on the complete Mayo score. Full methodologies have been reported previously [1,2]. Our proposed HI approach aims to impute the missing Mayo scores rather than the missing dichotomous efficacy endpoint. To assess the impact of dropouts under different missing data mechanisms (categorised as 'missing not at random' [MNAR] and 'missing at random' [MAR]), HI was implemented as a potential sensitivity analysis, where dropouts owing to safety or lack of efficacy were imputed using NRI (assuming MNAR) and other missing data were imputed using MI (assuming MAR). For MI, each component of the Mayo score was imputed via a multivariate stepwise approach using a fully conditional specification ordinal logistic method. Missing baseline scores were imputed using baseline characteristics data. Missing scores from each subsequent visit were imputed using all previous visits in a stepwise fashion. Fifty imputation datasets were computed for each component of the Mayo score. The complete Mayo score and the relevant efficacy endpoints were derived subsequently. The analysis was performed within each imputed dataset to determine the treatment difference, 95% CI and p-value, which were then combined via Rubin's rules [3]. Results Tables 1 and 2 show a comparison of efficacy in the two studies using the primary NRI analysis vs. the alternative HI approach for handling missing data. Conclusion The HI and NRI approaches can provide consistent efficacy analyses in IBD trials. The HI approach can serve as a useful sensitivity analysis to assess the impact of dropouts under different missing data mechanisms and evaluate the robustness of efficacy conclusions.
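The final pooling step described above is standard: analyse each of the 50 imputed datasets and combine the treatment difference, 95% CI and p-value with Rubin's rules. The Python sketch below is a minimal, generic illustration of that step only; the function name, the simulated treatment differences and their variances are illustrative assumptions, not the trials' actual code or data.

```python
import numpy as np
from scipy import stats

def pool_rubins_rules(estimates, variances):
    """Pool point estimates and their variances from m multiply imputed
    datasets using Rubin's rules; returns the pooled estimate, a 95% CI
    and a two-sided p-value for the null of no treatment difference."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)

    q_bar = estimates.mean()              # pooled point estimate
    u_bar = variances.mean()              # within-imputation variance
    b = estimates.var(ddof=1)             # between-imputation variance
    t = u_bar + (1 + 1 / m) * b           # total variance

    r = (1 + 1 / m) * b / u_bar           # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2       # Rubin's degrees of freedom

    se = np.sqrt(t)
    t_crit = stats.t.ppf(0.975, df)
    ci = (q_bar - t_crit * se, q_bar + t_crit * se)
    p_value = 2 * stats.t.sf(abs(q_bar) / se, df)
    return q_bar, ci, p_value

# Illustrative inputs only: treatment differences and their variances from
# 50 imputed datasets (in the trials these come from each imputed analysis).
rng = np.random.default_rng(0)
diffs = rng.normal(0.12, 0.02, size=50)
variances = np.full(50, 0.03 ** 2)
print(pool_rubins_rules(diffs, variances))
```

In the trials themselves, the per-imputation estimates would come from the efficacy analysis of each imputed dataset rather than from simulated values.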

2011 ◽  
Vol 26 (S2) ◽  
pp. 572-572
Author(s):  
N. Resseguier ◽  
H. Verdoux ◽  
F. Clavel-Chapelon ◽  
X. Paoletti

Introduction The CES-D scale is commonly used to assess depressive symptoms (DS) in large population-based studies. Missing values in items of the scale may create biases. Objectives To explore the reasons for not completing items of the CES-D scale and to perform a sensitivity analysis of the prevalence of DS to assess the impact of different missing data hypotheses. Methods 71,412 women included in the French E3N cohort returned a questionnaire containing the CES-D scale in 2005; 45% presented at least one missing value in the scale. An interview study was carried out on a random sample of 204 participants to examine the different hypotheses for the missing value mechanism. The prevalence of DS was estimated according to different methods for handling missing values: complete cases analysis, single imputation, and multiple imputation under MAR (missing at random) and MNAR (missing not at random) assumptions. Results The interviews showed that participants were not embarrassed to fill in questions about DS. Potential reasons for nonresponse were identified. The MAR and MNAR hypotheses remained plausible and were explored. Among complete responders, the prevalence of DS was 26.1%. After multiple imputation under the MAR assumption, it was 28.6%, 29.8% and 31.7% among women presenting up to 4, up to 10 and up to 20 missing values, respectively. The estimates were robust after applying various MNAR scenarios in the sensitivity analysis. Conclusions The CES-D scale can easily be used to assess DS in large cohorts. Multiple imputation under the MAR assumption allows missing values to be handled reliably.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Ping-Tee Tan ◽  
Suzie Cro ◽  
Eleanor Van Vogt ◽  
Matyas Szigeti ◽  
Victoria R. Cornelius

Abstract Background Missing data are common in randomised controlled trials (RCTs) and can bias results if not handled appropriately. A statistically valid analysis under the primary missing-data assumptions should be conducted, followed by sensitivity analysis under alternative justified assumptions to assess the robustness of results. Controlled multiple imputation (MI) procedures, including delta-based and reference-based approaches, have been developed for analysis under missing-not-at-random assumptions. However, it is unclear how often these methods are used, how they are reported, and what their impact is on trial results. This review evaluates the current use and reporting of MI and controlled MI in RCTs. Methods A targeted review of phase II–IV RCTs (non-cluster randomised) published in two leading general medical journals (The Lancet and New England Journal of Medicine) between January 2014 and December 2019 using MI. Data were extracted on imputation methods, analysis status, and reporting of results. Results of primary and sensitivity analyses for trials using controlled MI analyses were compared. Results A total of 118 RCTs (9% of published RCTs) used some form of MI. MI under missing-at-random was used in 110 trials; it was used for the primary analysis in 43/118 (36%) and for sensitivity analysis in 70/118 (59%) (3 used it for both). Sixteen studies performed controlled MI (1.3% of published RCTs), either with a delta-based (n = 9) or reference-based approach (n = 7). Controlled MI was mostly used in sensitivity analysis (n = 14/16). Two trials used controlled MI for the primary analysis, including one reporting no sensitivity analysis, whilst the other reported similar results without imputation. Of the 14 trials using controlled MI in sensitivity analysis, 12 yielded comparable results to the primary analysis whereas 2 demonstrated contradicting results. Only 5/110 (5%) trials using missing-at-random MI and 5/16 (31%) trials using controlled MI reported complete details on the MI methods. Conclusions Controlled MI enables the impact of accessible, contextually relevant missing data assumptions on trial results to be examined. The use of controlled MI is increasing but is still infrequent and poorly reported where used. There is a need for improved reporting on the implementation of MI analyses and the choice of controlled MI parameters.
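For readers unfamiliar with the delta-based flavour of controlled MI, the sketch below illustrates the core idea under stated assumptions: impute under missing-at-random, then shift the imputed (not the observed) outcomes in the active arm by a penalty delta before analysing each dataset. The data layout, column names and the use of scikit-learn's IterativeImputer are our own illustrative choices, not a reconstruction of any reviewed trial's analysis.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def delta_adjusted_imputations(df, outcome, arm_col, delta, m=20, seed=0):
    """Create m imputed copies of `df` (numeric columns only); imputed values
    of `outcome` in the active arm (arm_col == 1) are shifted by `delta`,
    encoding a delta-based missing-not-at-random scenario."""
    missing = df[outcome].isna().to_numpy()
    active = (df[arm_col] == 1).to_numpy()
    imputed_sets = []
    for i in range(m):
        imputer = IterativeImputer(sample_posterior=True, random_state=seed + i)
        filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
        filled.loc[missing & active, outcome] += delta  # penalise imputed values
        imputed_sets.append(filled)
    return imputed_sets
```

Each adjusted copy is then analysed as in the primary analysis and the results pooled with Rubin's rules; repeating this over a grid of delta values shows how strong a departure from missing-at-random would be needed to change a trial's conclusion.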


2020 ◽  
Author(s):  
Suzie Cro ◽  
Tim P Morris ◽  
Brennan C Kahan ◽  
Victoria R Cornelius ◽  
James R Carpenter

Abstract Background: The coronavirus pandemic (Covid-19) presents a variety of challenges for ongoing clinical trials, including an inevitably higher rate of missing outcome data, with new and non-standard reasons for missingness. International drug trial guidelines recommend trialists review plans for handling missing data in the conduct and statistical analysis, but clear recommendations are lacking. Methods: We present a four-step strategy for handling missing outcome data in the analysis of randomised trials that are ongoing during a pandemic. We consider handling missing data arising due to (i) participant infection, (ii) treatment disruptions and (iii) loss to follow-up. We consider both settings where treatment effects for a 'pandemic-free world' and a 'world including a pandemic' are of interest. Results: In any trial, investigators should: (1) clarify the treatment estimand of interest with respect to the occurrence of the pandemic; (2) establish what data are missing for the chosen estimand; (3) perform the primary analysis under the most plausible missing data assumptions; followed by (4) sensitivity analysis under alternative plausible assumptions. To obtain an estimate of the treatment effect in a 'pandemic-free world', participant data that are clinically affected by the pandemic (directly due to infection or indirectly via treatment disruptions) are not relevant and can be set to missing. For the primary analysis, a missing-at-random assumption that conditions on all observed data that are expected to be associated with both the outcome and missingness may be most plausible. For the treatment effect in the 'world including a pandemic', all participant data are relevant and should be included in the analysis. For the primary analysis, a missing-at-random assumption – potentially incorporating a pandemic time-period indicator and participant infection status – or a missing-not-at-random assumption with a poorer response may be most relevant, depending on the setting. In all scenarios, sensitivity analysis under credible missing-not-at-random assumptions should be used to evaluate the robustness of results. We highlight controlled multiple imputation as an accessible tool for conducting sensitivity analyses. Conclusions: Missing data problems will be exacerbated for trials active during the Covid-19 pandemic. This four-step strategy will facilitate clear thinking about the appropriate analysis for relevant questions of interest.
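As a small illustration of step (2) for the 'pandemic-free world' estimand, the sketch below sets outcomes that are clinically affected by the pandemic to missing before the missing-at-random primary analysis; the column names and flags are hypothetical, not from the paper.

```python
import numpy as np
import pandas as pd

def mask_pandemic_affected(df, outcome="outcome",
                           infected="covid_infection",
                           disrupted="treatment_disrupted"):
    """Return a copy of the trial data in which outcomes clinically affected
    by the pandemic (infection, or treatment disruption) are set to missing,
    so that subsequent imputation targets a 'pandemic-free world' estimand."""
    df = df.copy()
    affected = df[infected].astype(bool) | df[disrupted].astype(bool)
    df.loc[affected, outcome] = np.nan
    return df
```

The masked data would then be imputed under missing-at-random for the primary analysis, with controlled MI (for example, a delta-based adjustment as sketched earlier) used for the missing-not-at-random sensitivity analyses.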


2020 ◽  
pp. 1471082X2092711
Author(s):  
Grigorios Papageorgiou ◽  
Dimitris Rizopoulos

Dropout is a common complication in longitudinal studies, especially since the distinction between missing not at random (MNAR) and missing at random (MAR) dropout is intractable. Consequently, one starts with an analysis that is valid under MAR and then performs a sensitivity analysis by considering MNAR departures from it. To this end, specific classes of joint models, such as pattern-mixture models (PMMs) and selection models (SeMs), have been proposed. In contrast, shared-parameter models (SPMs) have received less attention, possibly because they do not embody a characterization of MAR. A few approaches to achieving MAR in SPMs exist, but they are difficult to implement in existing software. In this article, we focus on SPMs for incomplete longitudinal and time-to-dropout data and propose an alternative characterization of MAR by exploiting the conditional independence assumption, under which the outcome and missingness are independent given a set of random effects. By doing so, the censoring distribution can be utilized to cover a wide range of assumptions for the missing data mechanism at the subject-specific level. This approach offers substantial advantages over its counterparts and can be easily implemented in existing software. More specifically, it offers flexibility over the assumption for the missing data generating mechanism that governs dropout by allowing subject-specific perturbations of the censoring distribution, whereas in PMMs and SeMs dropout is strictly considered MNAR.
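To make the conditional-independence assumption concrete (the notation below is ours, not the article's): given subject-specific random effects b_i, the longitudinal outcome y_i and the dropout time T_i are assumed independent, which both defines the shared-parameter joint model and shows how the dropout distribution enters the predictive distribution of the missing part of the outcome.

```latex
% Shared-parameter model: y_i and T_i independent given random effects b_i
p(y_i, T_i) = \int p(y_i \mid b_i)\, p(T_i \mid b_i)\, p(b_i)\, \mathrm{d}b_i

% Under the same assumption, dropout informs the missing outcomes only
% through the random effects:
p\!\left(y_i^{\mathrm{mis}} \mid y_i^{\mathrm{obs}}, T_i\right)
  = \int p\!\left(y_i^{\mathrm{mis}} \mid y_i^{\mathrm{obs}}, b_i\right)
         p\!\left(b_i \mid y_i^{\mathrm{obs}}, T_i\right)\, \mathrm{d}b_i
```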


2021 ◽  
Author(s):  
Trenton J. Davis ◽  
Tarek R. Firzli ◽  
Emily A. Higgins Keppler ◽  
Matt Richardson ◽  
Heather D. Bean

Missing data is a significant issue in metabolomics that is often neglected during data pre-processing, particularly when it comes to imputation. This can have serious implications for downstream statistical analyses and lead to misleading or uninterpretable inferences. In this study, we aim to identify the primary types of missingness that affect untargeted metabolomics data and compare strategies for imputation using two real-world comprehensive two-dimensional gas chromatography (GC×GC) data sets. We also present these goals in the context of experimental replication, whereby imputation is conducted in a within-replicate-based fashion (the first description and evaluation of this strategy), and introduce an R package, MetabImpute, to carry out these analyses. Our results indicate that, in these two data sets, missingness was most likely of the missing at random (MAR) and missing not at random (MNAR) types, as opposed to missing completely at random (MCAR). Gibbs sampler imputation and Random Forest gave the best results when imputing MAR and MNAR data, compared against single-value imputation (zero, minimum, mean, median, and half-minimum) and other more sophisticated approaches (Bayesian principal components analysis and quantile regression imputation for left-censored data). When samples are replicated, within-replicate imputation approaches led to an increase in the reproducibility of peak quantification compared to imputation that ignores replication, suggesting that imputing with respect to replication may preserve potentially important features in downstream analyses for biomarker discovery.
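To illustrate the within-replicate idea, here is a rough Python sketch using a simple half-minimum rule as the stand-in imputer; MetabImpute itself is an R package, and the data layout, function names and choice of half-minimum here are illustrative assumptions only.

```python
import pandas as pd

def half_min_impute(block: pd.DataFrame) -> pd.DataFrame:
    """Replace missing peak areas with half of the observed minimum, computed
    separately for each metabolite column (columns with no observed values
    within the block are left missing)."""
    return block.apply(lambda col: col.fillna(col.min(skipna=True) / 2.0))

def within_replicate_impute(df: pd.DataFrame, replicate_col: str) -> pd.DataFrame:
    """Impute within each group of experimental replicates rather than across
    the whole data set, so replicate-specific signal is preserved."""
    feature_cols = df.columns.drop(replicate_col)
    imputed = (df.groupby(replicate_col, group_keys=False)[feature_cols]
                 .apply(half_min_impute))
    return pd.concat([df[[replicate_col]], imputed], axis=1)
```

Grouping by replicate means a missing peak is filled only from values observed within the same set of replicates, which is one plausible reason the within-replicate strategy improves reproducibility of peak quantification, as reported above.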


Author(s):  
Seçil Ömür Sünbül

In this study, we aimed to investigate the impact of different missing data handling methods on DINA model parameter estimation and classification accuracy. Simulated data were used, generated by manipulating the number of items and the sample size. In the generated data, two different missing data mechanisms (missing completely at random and missing at random) were created according to three different amounts of missing data. The missing data were then completed using four methods: treating missing responses as incorrect, person mean imputation, two-way imputation, and expectation-maximization algorithm imputation. As a result, it was observed that both the s and g parameter estimates and the classification accuracies were affected by the missing data rates, the missing data handling methods and the missing data mechanisms.
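For reference, two-way imputation (one of the four methods compared) replaces a missing response with person mean + item mean - overall mean, rounded to 0/1 for dichotomous item scores. A rough sketch of that rule, under our own naming assumptions:

```python
import numpy as np

def two_way_impute(responses):
    """Two-way imputation for a persons-by-items 0/1 response matrix in which
    omitted responses are coded as np.nan: PM + IM - OM, rounded to 0 or 1."""
    X = np.asarray(responses, dtype=float)
    person_mean = np.nanmean(X, axis=1, keepdims=True)   # PM, per examinee
    item_mean = np.nanmean(X, axis=0, keepdims=True)     # IM, per item
    overall_mean = np.nanmean(X)                         # OM, grand mean
    filled = person_mean + item_mean - overall_mean      # broadcasts to (n, m)
    out = X.copy()
    mask = np.isnan(X)
    out[mask] = np.clip(np.rint(filled[mask]), 0, 1)     # dichotomise
    return out

# Tiny illustration: 3 examinees x 4 items, one omitted response
X = [[1, 0, np.nan, 1],
     [0, 0, 1,      1],
     [1, 1, 1,      0]]
print(two_way_impute(X))
```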


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e15011-e15011
Author(s):  
Qiu Li ◽  
Mengxi Zhang

e15011 Background: The survival benefits of regorafenib and fruquintinib as third-line agents have each been demonstrated in patients with treatment-refractory metastatic colorectal cancer. This study explores the cost-effectiveness of the two agents. Methods: A Markov model was constructed based on two phase 3 trials, FRESCO and CONCUR. Health outcomes were measured in quality-adjusted life-years (QALYs). The key outcome was the incremental cost-effectiveness ratio (ICER). Probabilistic sensitivity and one-way sensitivity analyses were performed to estimate the impact of essential variables on the results of the analysis. Results: No statistical differences were observed in the baseline patient characteristics, except that the CONCUR trial enrolled older patients and a higher proportion with prior use of VEGF or EGFR antibodies in comparison with the FRESCO trial. Treatment with fruquintinib was estimated to cost $25,550.15 with an effectiveness gain of 0.54 QALYs, whereas regorafenib resulted in 0.53 QALYs at a mean cost of $29,681.52, yielding an ICER of -$413,137.00 per QALY. Using three times the Chinese gross domestic product per capita as the willingness-to-pay threshold, the probability of fruquintinib being cost-effective was higher than that of regorafenib in the probabilistic sensitivity analysis. Conclusions: Fruquintinib provides a more cost-effective option than regorafenib for patients with metastatic colorectal cancer in the third-line treatment setting.
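As a quick check, the reported ICER follows directly from the incremental cost and incremental QALYs given above (the negative sign reflects that regorafenib is both more costly and slightly less effective):

```python
# Reproducing the ICER from the figures reported in the abstract
cost_regorafenib, qalys_regorafenib = 29_681.52, 0.53
cost_fruquintinib, qalys_fruquintinib = 25_550.15, 0.54

icer = ((cost_regorafenib - cost_fruquintinib)
        / (qalys_regorafenib - qalys_fruquintinib))
print(round(icer))  # approx. -413137 USD per QALY (regorafenib vs. fruquintinib)
```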


2021 ◽  
Vol 10 (21) ◽  
pp. 4897
Author(s):  
Lisa Goudman ◽  
Geert Molenberghs ◽  
Rui V. Duarte ◽  
Maarten Moens

New waveforms, among which is High-Dose SCS (HD-SCS), have changed the field of Spinal Cord Stimulation (SCS) in the drive to optimize therapy outcomes. Missing observations are often encountered when conducting clinical trials in this field. In this study, different approaches with varying assumptions were constructed to evaluate how conclusions may be influenced by these assumptions. The aim is to perform a tipping point sensitivity analysis to evaluate the influence of missing data on the overall conclusion regarding the effectiveness of HD-SCS on disability. Data from the Discover study were used, in which 185 patients with Failed Back Surgery Syndrome were included. Disability was evaluated before SCS and after 1, 3 and 12 months of HD-SCS. At the second, third and fourth visits, data from 130, 114 and 90 patients were available, respectively. HD-SCS resulted in a significant decrease in disability scores based on the analysis of observed data and with multiple imputation. The tipping point sensitivity analysis revealed that the shift parameter was 17. Thus, the conclusion concerning the time effect under a "missing at random" mechanism remains robust until the shift parameter for the disability score reaches 17. From a clinical point of view, a shift of 17 points on disability is not very plausible. Therefore, we tend to consider the conclusions drawn under "missing at random" as being robust.
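The tipping point logic can be sketched in a few lines: after a missing-at-random imputation, the imputed disability changes are shifted by increasingly unfavourable amounts (delta) and the analysis is repeated until the conclusion about the time effect is overturned; the smallest such delta is the tipping point (17 here). The sketch below uses a deliberately simplified one-sample test and simulated numbers in place of the study's longitudinal model, so everything in it is illustrative.

```python
import numpy as np
from scipy import stats

def tipping_point(observed_change, imputed_change, alpha=0.05, max_shift=40):
    """Shift the imputed (dropout) disability changes upwards by delta points
    and return the smallest delta at which a significant improvement is no
    longer detected; a one-sample t-test stands in for the study's model."""
    for delta in range(max_shift + 1):
        scores = np.concatenate([observed_change, imputed_change + delta])
        _, p_value = stats.ttest_1samp(scores, popmean=0.0)
        if p_value >= alpha or scores.mean() >= 0:  # improvement no longer shown
            return delta
    return None

# Illustrative values only: negative change = improvement in disability score
rng = np.random.default_rng(1)
observed = rng.normal(-15, 10, size=90)   # completers at 12 months
imputed = rng.normal(-15, 10, size=95)    # MAR-imputed dropouts
print(tipping_point(observed, imputed))
```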


Blood ◽  
2017 ◽  
Vol 130 (Suppl_1) ◽  
pp. 707-707
Author(s):  
Suzanne F Fustolo-Gunnink ◽  
K Fijnvandraat ◽  
I M Ree ◽  
C Caram-Deelder ◽  
P Andriessen ◽  
...  

Abstract Introduction Limited evidence supports the widely used practice of administering platelet transfusions to prevent major bleeding in preterm thrombocytopenic neonates. Only 1 randomized controlled trial addressed this issue, but it used thresholds higher than those currently used in clinical practice. In order to assess the impact of platelet transfusions on bleeding risk, the primary objective of this study was to develop a prediction model for bleeding. Platelet transfusion was included as a variable in this model. In these secondary analyses, we further explored the impact of platelet transfusions on bleeding risk. Materials and methods In this multicenter cohort study, neonates with a gestational age (GA) <34 weeks at birth, admitted to a neonatal intensive care unit (NICU), who developed a platelet count <50×10⁹/L were included. The main study endpoint was major bleeding, defined as intraventricular hemorrhage (IVH) grade 3, IVH with parenchymal involvement, other types of intracranial hemorrhage visible on ultrasound scans, pulmonary hemorrhage or any other type of bleeding requiring immediate interventions. The prediction model was developed using landmarking, in which multiple Cox models at regular time-points were combined into 1 supermodel. To further explore the impact of platelet transfusions on bleeding risk, we performed 3 sensitivity analyses by selecting specific transfusions (instead of all transfusions). Sensitivity analysis 1: transfusions according to protocol, defined as transfusions for platelet counts >20×10⁹/L only allowed in case of GA <32 weeks and <1500 grams and presence of NEC, sepsis, or treatment with mechanical ventilation, or in case of invasive procedures. Sensitivity analysis 2: transfusions with fair increments, defined as a platelet count ≥50×10⁹/L within 24 hours. Sensitivity analysis 3: transfusion dose 11 ml/kg or higher. Results A total of 640 neonates were included with a median gestational age of 28 weeks. 70 neonates developed a major bleed. IUGR, postnatal age, platelet count and mechanical ventilation were independent predictors of bleeding. The model allowed calculation of two bleeding risks for individual neonates: one in case of platelet transfusion and one in case of no platelet transfusion. 1361 platelet transfusions were administered to 449 of 640 (70%) neonates, of which 87 were hyperconcentrates. The hazard ratio for transfusion in the original model was 1.0, indicating no predictive power. Sensitivity analysis 1: 704 (52%) transfusions were given according to protocol. When selecting these transfusions, the hazard ratio for transfusion changed from 1.0 to 0.5, but the p-value remained >0.05. Sensitivity analysis 2: 764 (56%) of transfusions resulted in a count >50×10⁹/L within 24 hours. When selecting these transfusions, the hazard ratio for transfusion changed from 1.0 to 0.25, but the p-value remained >0.05. 115 (8%) transfusions did not have a follow-up platelet count within 24 hours. Sensitivity analysis 3: of the non-hyperconcentrated platelet transfusions, 517 of 1274 (41%) were ≥11 ml/kg. When selecting these transfusions, the hazard ratio for transfusion changed from 1.0 to 0.1, with a p-value of 0.05. Conclusion With this tool, the absolute risk of bleeding in individual preterm thrombocytopenic neonates can be calculated. Additionally, the risk of bleeding can be assessed for 2 scenarios: with and without platelet transfusion. This can help clinicians in deciding whether or not to transfuse a patient. In the primary model, platelet transfusion was not a predictor of bleeding risk. However, the findings of the sensitivity analyses suggest that transfusions with a dose >11 ml/kg may have a more profound effect on bleeding risk. Disclosures No relevant conflicts of interest to declare.


2019 ◽  
Vol 9 (19) ◽  
pp. 4103 ◽  
Author(s):  
Hema Sekhar Reddy Rajula ◽  
Veronika Odintsova ◽  
Mirko Manchia ◽  
Vassilios Fanos

Cohorts are instrumental for epidemiologically oriented observational studies. Cohort studies usually observe large groups of individuals for a specific period of time to identify the factors contributing to a specific outcome (for instance an illness) and establish associations between risk factors and the outcome under study. In collaborative projects, federated data facilities are meta-database systems distributed across multiple locations that allow data from different sources to be analyzed, combined, or harmonized, making them suitable for mega- and meta-analyses. The harmonization of data can increase the statistical power of studies through maximization of sample size, allowing for additional refined statistical analyses, which ultimately makes it possible to answer research questions that could not be addressed using a single study. Indeed, harmonized data can be analyzed through mega-analysis of raw data or fixed-effects meta-analysis. Other types of data might be analyzed by, for example, random-effects meta-analysis or Bayesian evidence synthesis. In this article, we describe some methodological aspects related to the construction of a federated facility to optimize analyses of multiple datasets, the impact of missing data, and some methods for handling missing data in cohort studies.
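As a concrete example of the fixed-effects route mentioned above, harmonized study-level estimates can be pooled with inverse-variance weights; the function and the numbers below are illustrative assumptions, not results from any specific federated analysis.

```python
import numpy as np

def fixed_effects_meta(estimates, standard_errors):
    """Inverse-variance (fixed-effects) pooling of harmonized study estimates;
    returns the pooled estimate and an approximate 95% confidence interval."""
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(standard_errors, dtype=float) ** 2
    pooled = np.sum(weights * estimates) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

# Hypothetical harmonized effect estimates (log odds ratios) from three cohorts
print(fixed_effects_meta([0.20, 0.35, 0.28], [0.10, 0.12, 0.08]))
```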

