Examining Nonnormal Latent Variable Distributions for Non-Ignorable Missing Data

2021 ◽  
Vol 45 (3) ◽  
pp. 159-177
Author(s):  
Chen-Wei Liu

Missing not at random (MNAR) modeling for non-ignorable missing responses usually assumes that the latent variable distribution is bivariate normal. This assumption is rarely verified and is often adopted as a default in practice. Recent studies of “complete” item responses (i.e., no missing data) have shown that ignoring a nonnormal distribution of a unidimensional latent variable, especially a skewed or bimodal one, can yield biased estimates and misleading conclusions. However, handling a bivariate nonnormal latent variable distribution in the presence of MNAR data has not yet been investigated. This article proposes extending the unidimensional empirical histogram and Davidian curve methods to deal simultaneously with a nonnormal latent variable distribution and MNAR data. A simulation study is carried out to demonstrate the consequences of ignoring a bivariate nonnormal distribution for parameter estimates, followed by an empirical analysis of “don’t know” item responses. The results presented in this article show that examining the bivariate latent variable distribution should be a routine step in MNAR analyses to minimize the impact of nonnormality on parameter estimates.
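To make the role of the latent density concrete, the following minimal Python sketch (not the article's implementation) shows how the marginal likelihood of a unidimensional IRT model changes when normal quadrature weights are swapped for arbitrary, empirical-histogram-style weights; the article's bivariate MNAR extension adds a second latent dimension for the missingness indicators. All function names and parameter values here are illustrative.

```python
import numpy as np

def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function P(X = 1 | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def marginal_loglik(responses, a, b, nodes, weights):
    """Marginal log-likelihood of binary response patterns, integrating the
    latent trait over quadrature nodes with arbitrary density weights."""
    p = irf_2pl(nodes[:, None], a[None, :], b[None, :])                    # (Q, J)
    lik = np.prod(np.where(responses[:, None, :] == 1, p, 1 - p), axis=2)  # (N, Q)
    return float(np.sum(np.log(lik @ weights)))

# Same quadrature grid, two candidate latent densities:
nodes = np.linspace(-4, 4, 41)
normal_w = np.exp(-0.5 * nodes**2)
normal_w /= normal_w.sum()
skewed_w = np.exp(-0.5 * ((nodes + 1.0) / 0.7) ** 2) + 0.3 * np.exp(-0.5 * (nodes - 1.5) ** 2)
skewed_w /= skewed_w.sum()

rng = np.random.default_rng(1)
a = rng.uniform(0.8, 2.0, size=5)
b = rng.normal(size=5)
resp = rng.integers(0, 2, size=(10, 5))
print(marginal_loglik(resp, a, b, nodes, normal_w))
print(marginal_loglik(resp, a, b, nodes, skewed_w))
```

Because the latent density enters only through the quadrature weights, empirical histogram or Davidian curve densities can replace the normal density without changing the rest of the marginal maximum likelihood machinery.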

2020 ◽  
pp. 001316442093963
Author(s):  
Stefanie A. Wind ◽  
Randall E. Schumacker

Researchers frequently use Rasch models to analyze survey responses because these models provide accurate parameter estimates for items and examinees when there are missing data. However, researchers have not fully considered how missing data affect the accuracy of dimensionality assessment in Rasch analyses such as principal components analysis (PCA) of standardized residuals. Because adherence to unidimensionality is a prerequisite for the appropriate interpretation and use of Rasch model results, insight into the impact of missing data on the accuracy of this approach is critical. We used a simulation study to examine the accuracy of standardized residual PCA with various proportions of missing data and multidimensionality. We also explored an adaptation of modified parallel analysis in combination with standardized residual PCA as a source of additional information about dimensionality when missing data are present. Our results suggested that missing data impact the accuracy of PCA on standardized residuals, and that the adaptation of modified parallel analysis provides useful supplementary information about dimensionality when there are missing data.
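As a rough companion to the abstract, here is a minimal Python sketch of a standardized residual PCA for a Rasch analysis, together with a crude analogue of the modified parallel analysis idea: reference eigenvalues are obtained from data simulated to fit the Rasch model with the same person and item parameters. The complete-data setting and the 95th-percentile cutoff are assumptions made for illustration; the study itself focuses on how missing responses disturb these eigenvalues.

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch model probability of a correct response for each person-item pair."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def residual_pca_eigenvalues(x, theta, b):
    """Eigenvalues (largest first) of the correlation matrix of standardized
    Rasch residuals for a complete binary response matrix x."""
    p = rasch_prob(theta, b)
    z = (x - p) / np.sqrt(p * (1 - p))            # standardized residuals
    eig = np.linalg.eigvalsh(np.corrcoef(z, rowvar=False))
    return eig[::-1]

def modified_parallel_analysis(theta, b, n_sim=100, seed=None):
    """Reference eigenvalues from data simulated to fit the Rasch model with
    the same parameters (a rough analogue of modified parallel analysis)."""
    rng = np.random.default_rng(seed)
    p = rasch_prob(theta, b)
    sims = [residual_pca_eigenvalues(rng.binomial(1, p), theta, b)
            for _ in range(n_sim)]
    return np.percentile(np.array(sims), 95, axis=0)
```

An observed first eigenvalue well above its simulated reference value would then flag multidimensionality in the residuals.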


2020 ◽  
pp. 107699862095376
Author(s):  
Scott Monroe

This research proposes a new statistic for testing latent variable distribution fit for unidimensional item response theory (IRT) models. If the typical assumption of normality is violated, then item parameter estimates will be biased, and dependent quantities such as IRT score estimates will be adversely affected. The proposed statistic compares the specified latent variable distribution to the sample average of latent variable posterior distributions commonly used in IRT scoring. Formally, the statistic is an instantiation of a generalized residual and is thus asymptotically distributed as standard normal. Also, the statistic naturally complements residual-based item-fit statistics, as both are conditional on the latent trait, and can be presented with graphical plots. In addition, a corresponding unconditional statistic, which controls for multiple comparisons, is proposed. The statistics are evaluated using a simulation study, and empirical analyses are provided.
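The core quantity behind the proposed statistic, the discrepancy between the specified latent density and the sample average of individual posterior distributions, can be sketched in a few lines of Python. The standardization that makes the generalized residual asymptotically standard normal is omitted here, and a 2PL model with fixed item parameters is assumed purely for illustration.

```python
import numpy as np

def avg_posterior_minus_prior(responses, a, b, nodes, prior_w):
    """Raw discrepancy, at each quadrature node, between the sample average of
    per-person posterior distributions (as used in EAP scoring) and the
    specified prior latent density."""
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (nodes[:, None] - b[None, :])))  # (Q, J)
    lik = np.prod(np.where(responses[:, None, :] == 1, p, 1 - p), axis=2)  # (N, Q)
    post = lik * prior_w                          # unnormalized posterior per person
    post /= post.sum(axis=1, keepdims=True)       # normalize each row
    return post.mean(axis=0) - prior_w            # residual at each node
```

Large systematic residuals, for example the average posterior piling up mass in a tail where the normal prior has little, indicate latent distribution misfit.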


2008 ◽  
Vol 68 (6) ◽  
pp. 907-922 ◽  
Author(s):  
Cees A. W. Glas ◽  
Jonald L. Pimentel

In tests with time limits, items at the end are often not reached. Usually, the pattern of missing responses depends on the ability level of the respondents; therefore, missing data are not ignorable in statistical inference. This study models data using a combination of two item response theory (IRT) models: one for the observed response data and one for the missing data indicator. The missing data indicator is modeled using a sequential model with linear restrictions on the item parameters. The models are connected by the assumption that the respondents' latent proficiency parameters have a joint multivariate normal distribution. Model parameters are estimated by maximum marginal likelihood. Simulations show that treating missing data as ignorable can lead to considerable bias in parameter estimates. Including an IRT model for the missing data indicator removes this bias. The method is illustrated with data from an intelligence test with a time limit.
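A minimal sketch of the joint modeling idea is given below, assuming Rasch-type models for both the responses and the missing-data indicators (the article itself uses a sequential model with linear restrictions for the indicators); the two latent variables are integrated out over a bivariate normal grid with correlation rho. The parameters b_x, b_d, and rho are hypothetical inputs.

```python
import numpy as np
from scipy.stats import multivariate_normal

def joint_marginal_loglik(x, d, b_x, b_d, rho, grid=np.linspace(-4, 4, 21)):
    """Marginal log-likelihood of one respondent's response vector x
    (np.nan where the item was not reached) and missing-data indicator
    vector d (1 = not reached), integrating proficiency and missingness
    proneness over a bivariate normal distribution with correlation rho."""
    t, s = np.meshgrid(grid, grid, indexing="ij")
    w = multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]]).pdf(np.dstack([t, s]))
    w /= w.sum()                                  # discrete bivariate density on the grid
    p_x = 1.0 / (1.0 + np.exp(-(t[..., None] - b_x[None, None, :])))  # response model
    p_d = 1.0 / (1.0 + np.exp(-(s[..., None] - b_d[None, None, :])))  # indicator model
    lik_x = np.prod(np.where(np.isnan(x), 1.0, np.where(x == 1, p_x, 1 - p_x)), axis=-1)
    lik_d = np.prod(np.where(d == 1, p_d, 1 - p_d), axis=-1)
    return float(np.log(np.sum(lik_x * lik_d * w)))
```

When rho is nonzero, dropping the indicator part of the likelihood distorts inference for the response part, which is the mechanism behind the bias the simulations document.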


2018 ◽  
Vol 19 (2) ◽  
pp. 174-193 ◽  
Author(s):  
José LP da Silva ◽  
Enrico A Colosimo ◽  
Fábio N Demarqui

Generalized estimating equations (GEEs) are a well-known method for the analysis of categorical longitudinal data. The method is computationally simple and provides consistent parameter estimates that have a population-averaged interpretation. However, with missing data, the resulting parameter estimates are consistent only under the strong assumption of missing completely at random (MCAR). Corrections are available when the missing data mechanism is missing at random (MAR): inverse probability weighted GEE (WGEE) and multiple imputation GEE (MIGEE). A recent method combining ideas from these two approaches has a doubly robust property, in the sense that only the weight model or the imputation model needs to be correctly specified to obtain consistent parameter estimates. In this work, a proportional odds model is assumed and a doubly robust estimator is proposed for the analysis of ordinal longitudinal data with intermittently missing responses and covariates under the MAR mechanism. In addition, the association structure is modelled by means of either the correlation coefficient or local odds ratios. The performance of the proposed method is compared to both WGEE and MIGEE through a simulation study. The method is applied to a dataset on rheumatic mitral stenosis.
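The double robustness property is easiest to see in a simplified, cross-sectional analogue: the augmented inverse probability weighted (AIPW) estimator of a mean under MAR, sketched below in Python with scikit-learn working models. This is only an illustration of the property, not the authors' ordinal longitudinal estimator; the estimate remains consistent if either the missingness (weight) model or the outcome (imputation) model is correctly specified.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_mean(y, r, x):
    """Doubly robust (AIPW) estimate of E[Y] when y is missing at random given
    covariates x; r = 1 where y is observed, 0 where it is missing (y may be NaN)."""
    pi = LogisticRegression().fit(x, r).predict_proba(x)[:, 1]    # weight model P(R = 1 | x)
    m = LinearRegression().fit(x[r == 1], y[r == 1]).predict(x)   # imputation model E[Y | x]
    y0 = np.where(r == 1, y, 0.0)                                 # avoid NaN propagation
    return float(np.mean(r * y0 / pi - (r - pi) / pi * m))
```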


2021 ◽  
Vol 12 ◽  
Author(s):  
Lihan Chen ◽  
Victoria Savalei

In missing data analysis, reporting missing rates alone is insufficient for readers to determine the impact of missing data on the efficiency of parameter estimates. A more diagnostic measure, the fraction of missing information (FMI), shows how much the standard errors of parameter estimates increase because of the information lost to ignorable missing data. FMI is well known in the multiple imputation literature (Rubin, 1987), but it has only more recently been developed for full information maximum likelihood (FIML; Savalei and Rhemtulla, 2012). Sample FMI estimates using this approach have since been made accessible as part of the lavaan package (Rosseel, 2012) in the R statistical programming language. However, the properties of FMI estimates at finite sample sizes have not been the subject of comprehensive investigation. In this paper, we present a simulation study on the properties of three sample FMI estimates from FIML in two models common in psychology: regression and two-factor analysis. We summarize the performance of these FMI estimates and make recommendations for their application.
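For readers more familiar with the multiple imputation side, the classical Rubin (1987) large-sample FMI can be computed from m completed-data estimates and their squared standard errors, as in the short Python sketch below. The article itself studies the FIML-based sample FMI available through lavaan, which this sketch does not reproduce; the input values are hypothetical.

```python
import numpy as np

def fmi_rubin(estimates, variances):
    """Large-sample fraction of missing information for one parameter, from m
    multiply imputed point estimates and their squared standard errors."""
    estimates = np.asarray(estimates, dtype=float)
    m = len(estimates)
    w = float(np.mean(variances))            # within-imputation variance
    b = float(np.var(estimates, ddof=1))     # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b              # total variance of the pooled estimate
    return (1.0 + 1.0 / m) * b / t           # large-sample FMI (df correction omitted)

# Hypothetical pooled results from m = 5 imputations:
print(fmi_rubin([0.52, 0.47, 0.55, 0.50, 0.49],
                [0.010, 0.011, 0.009, 0.010, 0.012]))
```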


2017 ◽  
Vol 78 (5) ◽  
pp. 857-886 ◽  
Author(s):  
Zhen Li ◽  
Li Cai

In standard item response theory (IRT) applications, the latent variable is typically assumed to be normally distributed. If the normality assumption is violated, the item parameter estimates can become biased. Summed score likelihood–based statistics may be useful for testing latent variable distribution fit. We develop Satorra–Bentler type moment adjustments to approximate the test statistics’ tail-area probabilities. A simulation study was conducted to examine the calibration and power of the unadjusted and adjusted statistics under various simulation conditions. Results show that the proposed indices have tail-area probabilities that can be closely approximated by central chi-squared random variables under the null hypothesis. Furthermore, the test statistics are focused: they are powerful for detecting violations of the latent variable distributional assumption and, correctly, are not sensitive to other forms of model misspecification such as multidimensionality. As a comparison, the goodness-of-fit statistic M2 has considerably lower power against latent variable nonnormality than the proposed indices. Empirical data from a patient-reported health outcomes study are used as an illustration.
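Summed score likelihood–based statistics compare the observed summed score distribution with the one implied by the fitted IRT model and the assumed latent density. The model-implied distribution can be built with the Lord–Wingersky recursion, as in the minimal Python sketch below (2PL items and normalized quadrature weights are assumed for illustration; the Satorra–Bentler type adjustments studied in the article are not shown).

```python
import numpy as np

def summed_score_distribution(p_correct):
    """Lord-Wingersky recursion: distribution of the summed score given the
    item-level probabilities of a correct response at a fixed theta."""
    dist = np.array([1.0])
    for p in p_correct:
        new = np.zeros(len(dist) + 1)
        new[:-1] += dist * (1 - p)   # item answered incorrectly
        new[1:] += dist * p          # item answered correctly
        dist = new
    return dist

def model_implied_score_dist(a, b, nodes, weights):
    """Marginal model-implied summed score distribution, averaging the
    conditional distributions over the assumed latent density."""
    out = np.zeros(len(b) + 1)
    for theta, w in zip(nodes, weights):
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        out += w * summed_score_distribution(p)
    return out
```

Discrepancies between the observed score frequencies and this model-implied distribution are the raw ingredient whose reference distribution the moment adjustments are designed to calibrate.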


Methodology ◽  
2020 ◽  
Vol 16 (2) ◽  
pp. 147-165 ◽  
Author(s):  
Steffi Pohl ◽  
Benjamin Becker

Approaches for dealing with item omission include scoring omitted responses as incorrect, ignoring the missing values, and model-based approaches for nonignorable missing values; these approaches have so far been evaluated only for certain forms of nonignorability. In this paper we investigate the performance of these approaches under various conditions of nonignorability, that is, when the missing response depends on i) the item response, ii) a latent missing propensity, or iii) both. No approach yields unbiased parameter estimates of the Rasch model under all missing data mechanisms. Incorrect scoring yields unbiased estimates only under very specific data constellations of mechanisms i) and iii). The approach for nonignorable missing values yields unbiased estimates only under condition ii). Ignoring results in slightly more biased estimates than the approach for nonignorable missing values, while the latter also indicates the presence of nonignorability under all simulated conditions. We illustrate the results with an empirical example based on PISA data.
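In data-preparation terms, the competing treatments of omitted responses amount to different recodings of the response matrix before (or alongside) fitting the Rasch model. A minimal Python sketch with hypothetical function and argument names is given below; the model-based option simply derives the missingness indicators that a latent-missing-propensity model would use as a second set of "items".

```python
import numpy as np

def prepare_omissions(x, treatment):
    """Recode a response matrix x (np.nan = omitted) according to one of the
    three approaches compared in the abstract. Returns the recoded responses
    and, for the model-based approach, a matrix of missingness indicators."""
    if treatment == "incorrect":        # score every omission as a wrong response
        return np.nan_to_num(x, nan=0.0), None
    if treatment == "ignore":           # leave omissions missing; MML skips them
        return x, None
    if treatment == "model":            # keep responses and add indicators for a
        d = np.isnan(x).astype(float)   # nonignorable (latent propensity) model
        return x, d
    raise ValueError(f"unknown treatment: {treatment}")
```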


Methodology ◽  
2018 ◽  
Vol 14 (1) ◽  
pp. 30-44 ◽  
Author(s):  
Audrey J. Leroux ◽  
S. Natasha Beretvas

Abstract. The current study proposed a new model, termed the cross-classified multiple membership latent variable regression (CCMM-LVR) model, which extends the three-level latent variable regression (HM3-LVR) model to cross-classified multiple membership data, for example, in the presence of student mobility across schools. The HM3-LVR model is beneficial for testing more flexible hypotheses about growth trajectory parameters and handles pure clustering of participants within higher-level (level-3) units. However, it assumes that students remain in the same cluster (school) throughout the time period of interest. The CCMM-LVR model appropriately accounts for participants’ changing clusters over time. The impact of ignoring mobility in the real data was investigated by comparing parameter estimates, standard error estimates, and model fit indices from the model that appropriately represents the cross-classified multiple membership structure (CCMM-LVR) with those obtained when this structure is ignored (HM3-LVR).


Methodology ◽  
2015 ◽  
Vol 11 (3) ◽  
pp. 89-99 ◽  
Author(s):  
Leslie Rutkowski ◽  
Yan Zhou

Abstract. Given the consistent interest in comparing achievement across sub-populations in international assessments such as TIMSS, PIRLS, and PISA, it is critical that sub-population achievement is estimated reliably and with sufficient precision. We therefore systematically examine the limitations of the estimation methods currently used by these programs. Using a simulation study along with empirical results from the 2007 cycle of TIMSS, we show that a combination of missing and misclassified data in the conditioning model induces biases in sub-population achievement estimates whose magnitude can be readily explained by data quality. Importantly, the estimated biases are limited to sub-population estimates based on the conditioning variable with poor-quality data, while other sub-population achievement estimates are unaffected. The findings are generally in line with theory on missing and error-prone covariates. The current research adds to a small body of literature that has noted some of the limitations of sub-population estimation.


Marketing ZFP ◽  
2019 ◽  
Vol 41 (4) ◽  
pp. 21-32
Author(s):  
Dirk Temme ◽  
Sarah Jensen

Missing values are ubiquitous in empirical marketing research. If missing data are not dealt with properly, the result can be a loss of statistical power and distorted parameter estimates. While traditional approaches to handling missing data (e.g., listwise deletion) are still widely used, researchers can nowadays choose among various advanced techniques such as multiple imputation or full-information maximum likelihood estimation. Thanks to the available software, using these modern missing data methods does not pose a major obstacle. Still, their application requires a sound understanding of their prerequisites and limitations, as well as a deeper understanding of the processes that led to the missing values in an empirical study. This article is Part 1: it first introduces Rubin’s classical definition of missing data mechanisms and an alternative, variable-based taxonomy that provides a graphical representation. Second, it presents a selection of visualization tools, available in different R packages, for describing and exploring missing data structures.
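Rubin’s mechanisms are easiest to grasp by simulating them. The short Python sketch below (function and variable names are illustrative; the article itself works with R packages) makes a variable missing completely at random, at random given an observed covariate, or not at random given its own value.

```python
import numpy as np

def make_missing(y, x, mechanism, seed=None):
    """Impose missingness on a float vector y: the probability of being missing
    depends on nothing (MCAR), on the observed covariate x (MAR), or on the
    unobserved value of y itself (MNAR)."""
    rng = np.random.default_rng(seed)
    if mechanism == "MCAR":
        p = np.full_like(y, 0.3)
    elif mechanism == "MAR":
        p = 1.0 / (1.0 + np.exp(-(x - x.mean())))   # driven by observed x only
    elif mechanism == "MNAR":
        p = 1.0 / (1.0 + np.exp(-(y - y.mean())))   # driven by y itself
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    y_miss = y.copy()
    y_miss[rng.uniform(size=y.shape[0]) < p] = np.nan
    return y_miss
```

Only under MCAR do the remaining observed values form a random subsample of y, which is why the advanced methods discussed in the article matter under MAR and MNAR.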

