JMASM 54: A Comparison of Four Different Estimation Approaches for Prognostic Survival Oral Cancer Model

2020, Vol 18 (2), pp. 2-6
Author(s):  
Thomas R. Knapp

Rubin (1976, and elsewhere) claimed that there are three kinds of “missingness”: missing completely at random; missing at random; and missing not at random. He gave examples of each. The article that now follows takes an opposing view by arguing that almost all missing data are missing not at random.

Author(s):
Tamara Jorquiera, Hang Lee, Felipe Fregni, Andre Brunoni

This chapter discusses the problem of incomplete or missing data. The three types of missing data mechanisms are examined: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The chapter discusses how to reduce the occurrence of missing data through trial design and improved data collection, and provides methods to control for it at the analysis stage, such as not replacing the missing data (complete case analysis), replacing each missing value with a single value (single imputation), or replacing the missing data with multiple values for each missing observation (multiple imputation). It then discusses sensitivity analysis, which measures the impact of different methods of handling missing data on the results and helps to justify the choice of the method applied. Finally, it reviews covariate adjustment as a related statistical topic.
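The three analysis-stage strategies above can be sketched in a few lines of Python. This is a toy illustration, not an example from the chapter: the normally distributed variable, the 30% MCAR missingness rate, and the crude resampling used to mimic multiple imputation are all assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy variable with true mean 50, and 30% of values missing completely at random.
x = rng.normal(loc=50.0, scale=10.0, size=1000)
x_obs = np.where(rng.random(1000) < 0.3, np.nan, x)

# 1) Complete case analysis: simply drop the missing values.
cc_mean = np.nanmean(x_obs)

# 2) Single imputation: replace every missing value with one number
#    (here, the observed mean).
x_single = np.where(np.isnan(x_obs), np.nanmean(x_obs), x_obs)

# 3) Multiple imputation (crude sketch): build several completed datasets by
#    drawing plausible values, estimate on each, then pool the estimates.
observed = x_obs[~np.isnan(x_obs)]
pooled = np.mean([
    np.where(np.isnan(x_obs), rng.choice(observed, size=x_obs.size), x_obs).mean()
    for _ in range(5)
])
```

Under MCAR all three recover the mean; the approaches differ in how they propagate uncertainty, which single imputation understates.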


2011, Vol 21 (3), pp. 243-256
Author(s):
Rhian M Daniel, Michael G Kenward, Simon N Cousens, Bianca L De Stavola

Estimating causal effects from incomplete data requires additional and inherently untestable assumptions regarding the mechanism giving rise to the missing data. We show that using causal diagrams to represent these additional assumptions both complements and clarifies some of the central issues in missing data theory, such as Rubin's classification of missingness mechanisms (as missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR)) and the circumstances in which causal effects can be estimated without bias by analysing only the subjects with complete data. In doing so, we formally extend the back-door criterion of Pearl and others for use in incomplete data examples. These ideas are illustrated with an example drawn from an occupational cohort study of the effect of cosmic radiation on skin cancer incidence.
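The point about when complete-case analysis is unbiased can be seen in a minimal simulation (my own sketch, unrelated to the occupational cohort example in the paper): under MCAR the complete cases estimate the mean without bias, while under MNAR, where missingness depends on the value itself, they do not.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, 100_000)          # true mean is 0

# MCAR: missingness is independent of everything.
m_mcar = rng.random(y.size) < 0.4
mean_mcar = y[~m_mcar].mean()              # complete cases remain unbiased

# MNAR: larger values are more likely to be missing.
p_miss = 1 / (1 + np.exp(-y))              # P(missing) increases with y itself
m_mnar = rng.random(y.size) < p_miss
mean_mnar = y[~m_mnar].mean()              # complete cases are biased downwards
```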


Missing data are unavoidable in clinical research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest, so analyses that ignore them can be misleading. Missing data are of three types: missing at random (MAR), missing completely at random (MCAR), and missing not at random (MNAR); in a medical study, missing data are seldom MCAR. Missing data can create deep difficulties in the estimation and interpretation of results and undermine the validity of conclusions. Various strategies have been developed for managing missing data. These include complete case analyses, the missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these techniques can give unbiased, though often less precise, estimates. Multiple imputation is an alternative technique for managing missing data that accounts for the uncertainty associated with the missing values. Multiple imputation is implemented in most statistical software under the MAR assumption and gives unbiased and valid estimates of associations based on information from the available data. The technique influences not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data.


Author(s):
Tra My Pham, Irene Petersen, James Carpenter, Tim Morris

ABSTRACT Background: Ethnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research, including ethnicity, is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be due to the fact that ethnicity data in primary care are likely to be missing not at random. Objectives: I propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables. Methods: Weighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study, which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods, including complete case analysis and single imputation. Results: While a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe compared to MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity. Conclusions: Although not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.
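The core idea of steering imputation towards an external marginal can be sketched as follows. This is a deliberate simplification with made-up categories and proportions: the author's weighted MI combines proper multiple imputation with probability weights, whereas here a single weighted draw is used just to show how census-style totals pin down the imputation probabilities.

```python
import numpy as np

rng = np.random.default_rng(2)

categories = np.array(["A", "B", "C"])     # hypothetical ethnic groups
census = np.array([0.80, 0.12, 0.08])      # assumed external (census) marginal

# Records: 60% have the category recorded, with group A over-represented there.
n = 10_000
recorded = rng.random(n) < 0.6
obs = rng.choice(categories, size=n, p=[0.90, 0.06, 0.04])
obs = np.where(recorded, obs, None)

# Choose imputation probabilities q so the completed marginal hits the target:
#   frac_obs * p_obs + (1 - frac_obs) * q = census
frac_obs = recorded.mean()
p_obs = np.array([(obs == c).sum() for c in categories]) / recorded.sum()
q = (census - frac_obs * p_obs) / (1 - frac_obs)
q = np.clip(q, 0, None)
q = q / q.sum()                            # guard against rounding/negatives

filled = rng.choice(categories, size=n, p=q)
completed = np.where(recorded, obs, filled)
marginal = np.array([(completed == c).mean() for c in categories])
```

The completed marginal now matches the census breakdown even though the recorded data alone would not.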


2020, Vol 28 (108), pp. 599-621
Author(s):
Maria Eugénia Ferrão, Paula Prata, Maria Teresa Gonzaga Alves

Abstract: Almost all quantitative studies in educational assessment, evaluation, and educational research are based on incomplete data sets, a problem that has persisted for years without a single solution. The use of big identifiable data poses new challenges in dealing with missing values. In the first part of this paper, we present the state of the art of the topic in the Brazilian education scientific literature, and how researchers have dealt with missing data since the turn of the century. Next, we use open access software to analyze real-world data, the 2017 Prova Brasil, for several federation units to document how the naïve assumption of missing completely at random may substantially affect statistical conclusions, researcher interpretations, and subsequent implications for policy and practice. We conclude with straightforward suggestions for any education researcher on applying R routines to conduct the hypothesis test of missing completely at random and, if the null hypothesis is rejected, how to implement multiple imputation, which appears to be one of the most appropriate methods for handling missing data.
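A rough flavour of testing the MCAR hypothesis, shown here in Python with a simple two-sample diagnostic rather than the R routines (e.g. Little's test) the authors recommend; the score/covariate model is invented for illustration. Under MCAR, fully observed variables should look the same in rows with and without missing values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 5_000
score = rng.normal(250, 50, n)             # e.g. an assessment score
ses = score / 100 + rng.normal(0, 1, n)    # a correlated, fully observed covariate

# Missingness in `score` depends on `ses`: missing at random, but not MCAR.
missing = rng.random(n) < 1 / (1 + np.exp(2 - ses))

# Diagnostic: compare the covariate between rows with and without missing scores.
t_stat, p_value = stats.ttest_ind(ses[missing], ses[~missing])
mcar_rejected = p_value < 0.05
```

Here the test rejects MCAR, which in the paper's workflow would be the cue to move on to multiple imputation.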


2020
Author(s):
Suzie Cro, Tim P Morris, Brennan C Kahan, Victoria R Cornelius, James R Carpenter

Abstract Background: The coronavirus pandemic (Covid-19) presents a variety of challenges for ongoing clinical trials, including an inevitably higher rate of missing outcome data, with new and non-standard reasons for missingness. International drug trial guidelines recommend trialists review plans for handling missing data in the conduct and statistical analysis, but clear recommendations are lacking. Methods: We present a four-step strategy for handling missing outcome data in the analysis of randomised trials that are ongoing during a pandemic. We consider handling missing data arising due to (i) participant infection, (ii) treatment disruptions and (iii) loss to follow-up. We consider both settings where treatment effects for a ‘pandemic-free world’ and a ‘world including a pandemic’ are of interest. Results: In any trial, investigators should: (1) clarify the treatment estimand of interest with respect to the occurrence of the pandemic; (2) establish what data are missing for the chosen estimand; (3) perform primary analysis under the most plausible missing data assumptions; and (4) perform sensitivity analysis under alternative plausible assumptions. To obtain an estimate of the treatment effect in a ‘pandemic-free world’, participant data that are clinically affected by the pandemic (directly due to infection or indirectly via treatment disruptions) are not relevant and can be set to missing. For primary analysis, a missing-at-random assumption that conditions on all observed data expected to be associated with both the outcome and missingness may be most plausible. For the treatment effect in the ‘world including a pandemic’, all participant data are relevant and should be included in the analysis. For primary analysis, a missing-at-random assumption – potentially incorporating a pandemic time-period indicator and participant infection status – or a missing-not-at-random assumption with a poorer response may be most relevant, depending on the setting. In all scenarios, sensitivity analysis under credible missing-not-at-random assumptions should be used to evaluate the robustness of results. We highlight controlled multiple imputation as an accessible tool for conducting sensitivity analyses. Conclusions: Missing data problems will be exacerbated for trials active during the Covid-19 pandemic. This four-step strategy will facilitate clear thinking about the appropriate analysis for relevant questions of interest.
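Controlled multiple imputation, highlighted above as a sensitivity-analysis tool, is often implemented as a delta adjustment: impute under MAR, then shift the imputed values in one arm by a fixed amount delta and track how the treatment-effect estimate moves. The sketch below uses an invented two-arm trial with a true effect of 1 and a deliberately crude within-arm hot-deck imputation; it illustrates the idea, not the authors' procedure.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000
treat = rng.integers(0, 2, n)              # 1:1 randomised arms
y = 1.0 * treat + rng.normal(0, 2, n)      # true treatment effect = 1
missing = rng.random(n) < 0.2              # 20% of outcomes unobserved

def delta_adjusted_effect(delta, n_imp=10):
    """Impute within arm from observed outcomes (a crude MAR-style hot deck),
    then shift treated-arm imputations down by `delta` (a controlled MNAR scenario)."""
    effects = []
    for _ in range(n_imp):
        y_imp = y.copy()
        for g in (0, 1):
            donors = y[(treat == g) & ~missing]
            idx = (treat == g) & missing
            y_imp[idx] = rng.choice(donors, idx.sum()) - (delta if g == 1 else 0.0)
        effects.append(y_imp[treat == 1].mean() - y_imp[treat == 0].mean())
    return float(np.mean(effects))

# Sensitivity analysis: how far does the estimate drift as delta grows?
estimates = {d: delta_adjusted_effect(d) for d in (0.0, 0.5, 1.0)}
```

Reporting the estimate across a range of deltas shows how strong a missing-not-at-random departure would have to be to change the trial's conclusion.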


2019, Vol 2019, pp. 1-10
Author(s):
Amal Almohisen, Robin Henderson, Arwa M. Alshingiti

In any longitudinal study, a dropout before the final timepoint can rarely be avoided. The chosen dropout model is commonly one of these types: Missing Completely at Random (MCAR), Missing at Random (MAR), Missing Not at Random (MNAR), and Shared Parameter (SP). In this paper we estimate the parameters of the longitudinal model for simulated data and real data using the Linear Mixed Effect (LME) method. We investigate the consequences of misspecifying the missingness mechanism by deriving the so-called least false values. These are the values to which the parameter estimates converge when the assumed dropout model is wrong. Knowledge of the least false values allows us to conduct a sensitivity analysis, which is illustrated. This method provides an alternative to a local misspecification sensitivity procedure, which has been developed for likelihood-based analysis. We compare the results obtained by the proposed method with those found using the local misspecification method. We apply the local misspecification and least false methods to estimate the bias and sensitivity of parameter estimates for a clinical trial example.


2021
Author(s):
Trenton J. Davis, Tarek R. Firzli, Emily A. Higgins Keppler, Matt Richardson, Heather D. Bean

Missing data is a significant issue in metabolomics that is often neglected during data pre-processing, particularly when it comes to imputation. This can have serious implications for downstream statistical analyses and lead to misleading or uninterpretable inferences. In this study, we aim to identify the primary types of missingness that affect untargeted metabolomics data and compare strategies for imputation using two real-world comprehensive two-dimensional gas chromatography (GC×GC) data sets. We also present these goals in the context of experimental replication, whereby imputation is conducted in a within-replicate-based fashion (the first description and evaluation of this strategy), and introduce an R package, MetabImpute, to carry out these analyses. Our results indicate that, in these two data sets, missingness was most likely of the missing at random (MAR) and missing not at random (MNAR) types, as opposed to missing completely at random (MCAR). Gibbs sampler imputation and Random Forest gave the best results when imputing MAR and MNAR data, compared against single-value imputation (zero, minimum, mean, median, and half-minimum) and other more sophisticated approaches (Bayesian principal components analysis and quantile regression imputation for left-censored data). When samples are replicated, within-replicate imputation approaches led to an increase in the reproducibility of peak quantification compared to imputation that ignores replication, suggesting that imputing with respect to replication may preserve potentially important features in downstream analyses for biomarker discovery.
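The within-replicate idea can be sketched with half-minimum imputation, one of the single-value methods compared in the study, applied per sample rather than across the whole table. The toy peak table below is invented; it is not the GC×GC data, and MetabImpute itself offers far more than this.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy peak table: 3 biological samples x 3 replicates x 4 features,
# with higher-abundance samples later in the array.
n_samples, n_reps, n_feat = 3, 3, 4
x = rng.lognormal(mean=np.arange(n_samples).reshape(-1, 1, 1) + 2.0,
                  sigma=0.2, size=(n_samples, n_reps, n_feat))
x[rng.random(x.shape) < 0.2] = np.nan      # left-censored-style gaps

def impute_half_min(block):
    """Replace NaNs with half the smallest observed value in `block`."""
    out = block.copy()
    out[np.isnan(out)] = 0.5 * np.nanmin(block)
    return out

# Within-replicate imputation: each sample's gaps are filled from its own
# replicates, so high-abundance samples get a correspondingly higher fill value.
within = np.stack([impute_half_min(x[s]) for s in range(n_samples)])

# Global imputation ignores replication: one fill value for the whole table.
globally = impute_half_min(x)
```

Filling per sample keeps the imputed values on the scale of that sample's own replicates, which is the intuition behind the reproducibility gain reported above.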


2020
Author(s):
Matthew Sperrin, Glen P. Martin

Abstract Background: Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias, its use in conjunction with multiple imputation may unlock the potential value of missingness to reduce bias in causal effect estimation, particularly in missing not at random scenarios and where missingness might be associated with unmeasured confounders. Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with multiple imputation, would reduce bias for causal effect estimation, under a range of scenarios including unmeasured variables, missing not at random, and missing at random mechanisms. We use directed acyclic graphs and structural models to elucidate a variety of causal structures of interest. We handled missing data using complete case analysis, and multiple imputation with and without missing indicator terms. Results: We find that multiple imputation combined with a missing indicator gives minimal bias for causal effect estimation in most scenarios. In particular, the approach: (1) does not introduce bias in missing (completely) at random scenarios; (2) reduces bias in missing not at random scenarios where the missing mechanism depends on the missing variable itself; and (3) may reduce or increase bias when unmeasured confounding is present. Conclusion: In the presence of missing data, careful use of missing indicators, combined with multiple imputation, can improve causal effect estimation when missingness is informative, and is not detrimental when missingness is at random.
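A minimal sketch of the missing-indicator idea, with a crude single fill standing in for the authors' multiple imputation and an invented data-generating model in which missingness in the confounder depends on the confounder itself: adding the indicator lets the model absorb the mean shift among rows whose confounder value was filled in, pulling the exposure coefficient closer to the truth.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 20_000
u = rng.normal(size=n)                                    # confounder
a = (rng.random(n) < 1 / (1 + np.exp(-u))).astype(float)  # exposure depends on u
y = 2.0 * a + u + rng.normal(size=n)                      # true effect of a is 2

# Informative missingness: u is recorded less often when it is large.
observed = rng.random(n) < 1 / (1 + np.exp(u))
u_filled = np.where(observed, u, 0.0)                     # crude single fill
indicator = (~observed).astype(float)                     # the missing indicator

def exposure_coef(X):
    """OLS of y on an intercept plus the columns of X; coefficient on a."""
    X1 = np.column_stack([np.ones(n), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0][1]

naive = exposure_coef(np.column_stack([a, u_filled]))
with_ind = exposure_coef(np.column_stack([a, u_filled, indicator]))
```

Neither estimate is unbiased in this sketch, but the indicator version is the less biased of the two, consistent in direction with result (2) above.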


2007, Vol 7 (3), pp. 325-338
Author(s):  
J. Scott Granberg-Rademacker

This article compares three approaches to handling missing data at the state level under three distinct conditions. Using Monte Carlo simulation experiments, I compare the results from a linear model using listwise deletion (LD), Markov Chain Monte Carlo with the Gibbs sampler algorithm (MCMC), and multiple imputation by chained equations (MICE) as approaches to dealing with different severity levels of missing data: missing completely at random (MCAR), missing at random (MAR), and nonignorable missingness (NI). I compare the results from each of these approaches under each condition for missing data to the results from the fully observed dataset. I conclude that the MICE algorithm performs best under most missing data conditions, MCMC provides the most stable parameter estimates across the missing data conditions (but often produced estimates that were moderately biased), and LD performs worst under most missing data conditions. I conclude with recommendations for handling missing data in state-level analysis.
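The three missingness conditions compared in the article can be generated in a few lines (an illustrative sketch with an invented bivariate model; "NI" is simulated here as missingness depending on the unobserved value itself). Listwise deletion of rows with missing y recovers the mean only under MCAR:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
x = rng.normal(size=n)                     # fully observed covariate
y = x + rng.normal(size=n)                 # variable subject to missingness
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Three missingness mechanisms for y:
m_mcar = rng.random(n) < 0.3               # MCAR: pure chance
m_mar = rng.random(n) < sigmoid(x)         # MAR: depends only on observed x
m_ni = rng.random(n) < sigmoid(y)          # NI/MNAR: depends on y itself

# Listwise deletion: the mean of y among retained (complete) cases.
means = {name: y[~m].mean()
         for name, m in [("MCAR", m_mcar), ("MAR", m_mar), ("NI", m_ni)]}
```

In this sketch only the MCAR mean stays near the true value of 0; under MAR and NI the retained cases are systematically shifted, which is why benchmarking LD, MCMC, and MICE against the fully observed data matters.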

