A Comparison of Three Approaches to Handling Incomplete State-Level Data

2007 ◽  
Vol 7 (3) ◽  
pp. 325-338 ◽  
Author(s):  
J. Scott Granberg-Rademacker

This article compares three approaches to handling missing data at the state level under three distinct conditions. Using Monte Carlo simulation experiments, I compare the results from a linear model using listwise deletion (LD), Markov Chain Monte Carlo with the Gibbs sampler algorithm (MCMC), and multiple imputation by chained equations (MICE) as approaches to dealing with different severity levels of missing data: missing completely at random (MCAR), missing at random (MAR), and nonignorable missingness (NI). I compare the results from each of these approaches under each condition for missing data to the results from the fully observed dataset. I conclude that the MICE algorithm performs best under most missing data conditions, MCMC provides the most stable parameter estimates across the missing data conditions (but often produced estimates that were moderately biased), and LD performs worst under most missing data conditions. I conclude with recommendations for handling missing data in state-level analysis.
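The three missingness conditions named in this abstract can be illustrated with a small Monte Carlo sketch. This is not the article's simulation design: the data-generating model, the 30% MCAR rate, the logistic missingness functions, and the function name `mean_ld_slope` are all assumptions chosen for illustration, and only listwise deletion is shown (not MCMC or MICE).

```python
import numpy as np

def mean_ld_slope(mechanism, n=500, reps=200, seed=0):
    """Average OLS slope after listwise deletion under one missingness mechanism.

    Hypothetical setup: y = 1 + 2x + noise, and only y can be missing.
    """
    rng = np.random.default_rng(seed)
    slopes = []
    for _ in range(reps):
        x = rng.normal(size=n)
        y = 1.0 + 2.0 * x + rng.normal(size=n)  # true slope is 2
        if mechanism == "MCAR":
            p_miss = np.full(n, 0.3)                     # unrelated to the data
        elif mechanism == "MAR":
            p_miss = 1.0 / (1.0 + np.exp(-x))            # depends on observed x only
        elif mechanism == "NI":
            p_miss = 1.0 / (1.0 + np.exp(-(y - 1.0)))    # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        keep = rng.random(n) >= p_miss                   # listwise deletion: drop rows with missing y
        slope = np.polyfit(x[keep], y[keep], 1)[0]       # polyfit returns [slope, intercept]
        slopes.append(slope)
    return float(np.mean(slopes))

if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "NI"):
        print(mech, round(mean_ld_slope(mech), 3))
```

In this particular setup, deletion leaves the slope close to its true value under MCAR and under MAR (because missingness depends only on the regressor), while under NI the slope is pulled toward zero. Other designs, such as missingness in the regressors, can behave differently.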

1989 ◽  
Vol 26 (2) ◽  
pp. 214-221 ◽  
Author(s):  
Subhash Sharma ◽  
Srinivas Durvasula ◽  
William R. Dillon

The authors report some results on the behavior of alternative covariance structure estimation procedures in the presence of non-normal data. They conducted Monte Carlo simulation experiments with a factorial design involving three levels of skewness, three levels of kurtosis, and three different sample sizes. For normal data, among all the elliptical estimation techniques, elliptical reweighted least squares (ERLS) was equivalent in performance to ML. However, as expected, for non-normal data parameter estimates were unbiased for ML and the elliptical estimation techniques, whereas the bias in standard errors was substantial for GLS and ML. Among elliptical estimation techniques, ERLS was superior in performance. On the basis of the simulation results, the authors recommend that researchers use ERLS for both normal and non-normal data.


2021 ◽  
Author(s):  
Adrienne D. Woods ◽  
Pamela Davis-Kean ◽  
Max Andrew Halvorson ◽  
Kevin Michael King ◽  
Jessica A. R. Logan ◽  
...  

A common challenge in developmental research is the amount of incomplete and missing data that results from respondents failing to complete tasks or questionnaires, or from disengaging from the study altogether (i.e., attrition). This missingness can lead to biases in parameter estimates and, hence, in the interpretation of findings. These biases can be addressed through statistical techniques that adjust for missing data, such as multiple imputation. Although this technique is highly effective, it has not been widely adopted by developmental scientists because of barriers such as lack of training or misconceptions about imputation methods; researchers instead rely on software defaults such as listwise deletion. This manuscript is intended to provide practical guidelines for developmental researchers to follow when examining their data for missingness, making decisions about how to handle that missingness, and reporting the extent of missing data biases and specific multiple imputation procedures in publications.


2019 ◽  
Vol 80 (1) ◽  
pp. 41-66 ◽  
Author(s):  
Dexin Shi ◽  
Taehun Lee ◽  
Amanda J. Fairchild ◽  
Alberto Maydeu-Olivares

This study compares two missing data procedures in the context of ordinal factor analysis models: pairwise deletion (PD; the default setting in Mplus) and multiple imputation (MI). We examine which procedure yields parameter estimates and model fit indices closer to those of complete data. The performance of PD and MI is compared under a wide range of conditions, including number of response categories, sample size, percent of missingness, and degree of model misfit. Results indicate that both PD and MI yield parameter estimates similar to those from analysis of complete data under conditions where the data are missing completely at random (MCAR). When the data are missing at random (MAR), PD parameter estimates are shown to be severely biased across parameter combinations in the study. When the percentage of missingness is less than 50%, MI yields parameter estimates that are similar to results from complete data. However, the fit indices (i.e., χ², RMSEA, and WRMR) yield estimates that suggest a worse fit than results observed in complete data. We recommend that applied researchers use MI when fitting ordinal factor models with missing data. We further recommend interpreting model fit based on the TLI and CFI incremental fit indices.


2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Amal Almohisen ◽  
Robin Henderson ◽  
Arwa M. Alshingiti

In any longitudinal study, a dropout before the final timepoint can rarely be avoided. The chosen dropout model is commonly one of these types: Missing Completely at Random (MCAR), Missing at Random (MAR), Missing Not at Random (MNAR), and Shared Parameter (SP). In this paper we estimate the parameters of the longitudinal model for simulated data and real data using the Linear Mixed Effect (LME) method. We investigate the consequences of misspecifying the missingness mechanism by deriving the so-called least false values. These are the values the parameter estimates converge to, when the assumptions may be wrong. The knowledge of the least false values allows us to conduct a sensitivity analysis, which is illustrated. This method provides an alternative to a local misspecification sensitivity procedure, which has been developed for likelihood-based analysis. We compare the results obtained by the method proposed with the results found by using the local misspecification method. We apply the local misspecification and least false methods to estimate the bias and sensitivity of parameter estimates for a clinical trial example.


2021 ◽  
Vol 4 (4) ◽  
pp. 155-165
Author(s):  
Aminu Suleiman Mohammed ◽  
Badamasi Abba ◽  
Abubakar G. Musa

For proper actualization of the phenomenon contained in some lifetime data sets, a generalization, extension, or modification of classical distributions is required. In this paper, we introduce a new generalization of the exponential distribution, called the generalized odd generalized exponential-exponential distribution. The proposed distribution can model lifetime data with different failure rates, including increasing, decreasing, unimodal, bathtub, and decreasing-increasing-decreasing failure rates. Various properties of the model, such as the quantile function, moments, mean deviations, Rényi entropy, and order statistics, are derived. We provide an approximation for the values of the mean, variance, skewness, kurtosis, and mean deviations using Monte Carlo simulation experiments. Estimation of the distribution parameters is performed using the maximum likelihood method, and Monte Carlo simulation experiments are used to assess the estimation method. The method of maximum likelihood is shown to provide promising parameter estimates, and hence can be adopted in practice for estimating the parameters of the distribution. An application to real and simulated datasets indicated that the new model provides a better fit than the other compared distributions.
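The idea of using Monte Carlo experiments to assess a maximum likelihood estimator can be sketched as follows. The abstract does not give the density of the proposed distribution, so this sketch substitutes the plain exponential distribution, whose rate MLE is the reciprocal of the sample mean. The function name `mle_monte_carlo` and all parameter values are assumptions for illustration.

```python
import numpy as np

def mle_monte_carlo(rate=2.0, n=50, reps=5000, seed=1):
    """Estimate the bias and MSE of the exponential-rate MLE by simulation.

    Stand-in for the paper's distribution: a plain exponential with the given rate.
    """
    rng = np.random.default_rng(seed)
    # Each row is one simulated dataset of size n.
    samples = rng.exponential(scale=1.0 / rate, size=(reps, n))
    estimates = 1.0 / samples.mean(axis=1)           # MLE of the rate for each dataset
    bias = float(estimates.mean() - rate)
    mse = float(np.mean((estimates - rate) ** 2))
    return bias, mse

if __name__ == "__main__":
    for n in (20, 200):
        bias, mse = mle_monte_carlo(n=n)
        print(f"n={n}: bias={bias:.4f}, mse={mse:.4f}")
```

Repeating the experiment at several sample sizes shows whether bias and mean squared error shrink as n grows, which is the usual evidence offered for an estimator being usable in practice.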


2021 ◽  
Author(s):  
Trenton J. Davis ◽  
Tarek R. Firzli ◽  
Emily A. Higgins Keppler ◽  
Matt Richardson ◽  
Heather D. Bean

Missing data is a significant issue in metabolomics that is often neglected when conducting data pre-processing, particularly when it comes to imputation. This can have serious implications for downstream statistical analyses and lead to misleading or uninterpretable inferences. In this study, we aim to identify the primary types of missingness that affect untargeted metabolomics data and compare strategies for imputation using two real-world comprehensive two-dimensional gas chromatography (GC×GC) data sets. We also present these goals in the context of experimental replication whereby imputation is conducted in a within-replicate-based fashion (the first description and evaluation of this strategy) and introduce an R package MetabImpute to carry out these analyses. Our results conclude that, in these two data sets, missingness was most likely of the missing at-random (MAR) and missing not-at-random (MNAR) types as opposed to missing completely at-random (MCAR). Gibbs sampler imputation and Random Forest gave the best results when imputing MAR and MNAR compared against single-value imputation (zero, minimum, mean, median, and half-minimum) and other more sophisticated approaches (Bayesian principal components analysis and quantile regression imputation for left-censored data). When samples are replicated, within-replicate imputation approaches led to an increase in the reproducibility of peak quantification compared to imputation that ignores replication, suggesting that imputing with respect to replication may preserve potentially important features in downstream analyses for biomarker discovery.
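The single-value imputation baselines listed in this abstract (zero, minimum, mean, median, and half-minimum) are simple enough to sketch directly. The code below is an illustration in Python with NumPy, not the MetabImpute R package; the function name `impute_single_value` and the layout (rows as samples, columns as features, NaN marking a missing value) are assumptions.

```python
import numpy as np

def impute_single_value(data, strategy):
    """Fill NaNs column by column with one value derived from the observed entries.

    Assumed layout: rows are samples, columns are features.
    """
    fillers = {
        "zero": lambda observed: 0.0,
        "minimum": np.min,
        "mean": np.mean,
        "median": np.median,
        "half-minimum": lambda observed: np.min(observed) / 2.0,
    }
    fill = fillers[strategy]
    filled = np.array(data, dtype=float)  # work on a copy of the input
    for j in range(filled.shape[1]):
        column = filled[:, j]             # a view, so assignments update `filled`
        missing = np.isnan(column)
        if missing.any() and not missing.all():
            column[missing] = fill(column[~missing])
    return filled

if __name__ == "__main__":
    peaks = [[4.0, np.nan],
             [2.0, 10.0],
             [np.nan, 30.0]]
    print(impute_single_value(peaks, "half-minimum"))
```

Columns with no observed values are left untouched in this sketch, since none of the strategies other than zero has anything to compute from.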


2018 ◽  
Vol 19 (2) ◽  
pp. 174-193 ◽  
Author(s):  
José LP da Silva ◽  
Enrico A Colosimo ◽  
Fábio N Demarqui

Generalized estimating equations (GEEs) are a well-known method for the analysis of categorical longitudinal data. This method presents computational simplicity and provides consistent parameter estimates that have a population-averaged interpretation. However, with missing data, the resulting parameter estimates are consistent only under the strong assumption of missing completely at random (MCAR). Some corrections can be done when the missing data mechanism is missing at random (MAR): inverse probability weighting GEE (WGEE) and multiple imputation GEE (MIGEE). A recent method combining ideas of these two approaches has a doubly robust property in the sense that one only needs to correctly specify the weight or the imputation model in order to obtain consistent estimates for the parameters. In this work, a proportional odds model is assumed and a doubly robust estimator is proposed for the analysis of ordinal longitudinal data with intermittently missing responses and covariates under the MAR mechanism. In addition, the association structure is modelled by means of either the correlation coefficient or local odds ratio. The performance of the proposed method is compared to both WGEE and MIGEE through a simulation study. The method is applied to a dataset related to rheumatic mitral stenosis.
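The weighting idea behind WGEE can be illustrated on a much simpler problem than the one this abstract addresses: estimating a mean when the outcome is missing at random given a fully observed covariate. The sketch below is not a GEE, a proportional odds model, or the doubly robust estimator proposed in the paper; the data-generating values and the function name `ipw_demo` are assumptions for illustration.

```python
import numpy as np

def ipw_demo(n=20000, seed=2):
    """Compare a complete-case mean with an inverse-probability-weighted mean under MAR."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)                    # fully observed binary covariate
    y = rng.normal(loc=1.0 + 2.0 * x, scale=1.0)      # outcome; population mean is 2.0
    p_obs = np.where(x == 1, 0.4, 0.9)                # MAR: response depends on x only
    observed = rng.random(n) < p_obs

    # Estimate the response probability within each level of x.
    p_by_group = np.array([observed[x == g].mean() for g in (0, 1)])
    p_hat = p_by_group[x]

    complete_case_mean = float(y[observed].mean())
    weights = 1.0 / p_hat[observed]                   # inverse probability of being observed
    ipw_mean = float(np.sum(weights * y[observed]) / np.sum(weights))
    return complete_case_mean, ipw_mean

if __name__ == "__main__":
    cc, ipw = ipw_demo()
    print(f"complete-case mean: {cc:.3f}, IPW mean: {ipw:.3f}")
```

Because respondents with x = 1 are under-represented among the observed outcomes, the complete-case mean falls below the population value, while reweighting by the estimated response probabilities restores the balance. This holds only if the response model is correctly specified, which is the dependence the doubly robust approach is meant to relax.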


2020 ◽  
Vol 18 (2) ◽  
pp. 2-6
Author(s):  
Thomas R. Knapp

Rubin (1976, and elsewhere) claimed that there are three kinds of “missingness”: missing completely at random; missing at random; and missing not at random. He gave examples of each. The article that now follows takes an opposing view by arguing that almost all missing data are missing not at random.


2022 ◽  
pp. 1-20
Author(s):  
Gidong Kim

I examine the relationship between labor unions and voter turnout in the American states. Though it is well known that unions increase turnout directly, we know less about their indirect effects. Moreover, the indirect effects may consist of nonmember mobilization and aggregate strength. To examine the direct and indirect mechanisms, I analyze both state-level panel data and individual-level data with a multilevel approach. First, my panel analysis shows that unions are positively associated with turnout as expected. Yet, the association is observed only in midterm elections, but not in presidential elections. Second, more importantly, my individual-level analysis suggests that indirect nonmember mobilization and indirect aggregate strength are positively related to turnout, while direct member mobilization is not. The findings imply that the direct effects are limited and, thus, that decreasing levels of voter turnout due to recently declining union membership come primarily from indirect mobilization rather than direct mobilization.


Social Change ◽  
2020 ◽  
Vol 50 (1) ◽  
pp. 160-168
Author(s):  
Surajit Deb

The Social Change Indicators series in this special issue presents state-level data on labour force participation rate, unemployment rate, status of employment and sectoral distribution.

