Modeling Nonignorable Missing Data in Speeded Tests

2008 ◽  
Vol 68 (6) ◽  
pp. 907-922 ◽  
Author(s):  
Cees A. W. Glas ◽  
Jonald L. Pimentel

In tests with time limits, items at the end are often not reached. Usually, the pattern of missing responses depends on the ability level of the respondents; therefore, missing data are not ignorable in statistical inference. This study models data using a combination of two item response theory (IRT) models: one for the observed response data and one for the missing data indicator. The missing data indicator is modeled using a sequential model with linear restrictions on the item parameters. The models are connected by the assumption that the respondents' latent proficiency parameters have a joint multivariate normal distribution. Model parameters are estimated by maximum marginal likelihood. Simulations show that treating missing data as ignorable can lead to considerable bias in parameter estimates. Including an IRT model for the missing data indicator removes this bias. The method is illustrated with data from an intelligence test with a time limit.
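As a rough illustration of the mechanism described above (not the authors' sequential model), the following sketch simulates a speeded Rasch-type test in which the number of items reached increases with ability. Because only high-ability respondents reach the end of the test, the observed proportion correct on the final item is inflated relative to its population value. All specifics (40 items, the `30 + 5*theta` speededness rule) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 2000, 40
theta = rng.normal(size=n_persons)           # latent proficiency
b = np.linspace(-2, 2, n_items)              # item difficulties

# Rasch responses for all items
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
resp = (rng.random((n_persons, n_items)) < p).astype(float)

# Speededness: the number of items reached increases with ability,
# so missingness on end-of-test items is nonignorable.
reached = np.clip(np.round(30 + 5 * theta).astype(int), 10, n_items)
for i in range(n_persons):
    resp[i, reached[i]:] = np.nan            # not-reached items are missing

# Observed p-value of the last item is biased upward: only
# high-ability respondents ever reach it.
obs_last = np.nanmean(resp[:, -1])
true_last = (1 / (1 + np.exp(-(theta - b[-1])))).mean()
```

Treating the `NaN` entries as ignorable here would amount to estimating the last item's difficulty from a heavily ability-selected subsample, which is exactly the bias the joint model is designed to remove.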

2021 ◽  
Vol 45 (3) ◽  
pp. 159-177
Author(s):  
Chen-Wei Liu

Missing not at random (MNAR) modeling for nonignorable missing responses usually assumes that the latent variables follow a bivariate normal distribution. This assumption is rarely verified and is often adopted as standard practice. Recent studies of “complete” item responses (i.e., no missing data) have shown that ignoring a nonnormal distribution of a unidimensional latent variable, especially a skewed or bimodal one, can yield biased estimates and misleading conclusions. However, handling a nonnormal bivariate latent variable distribution in the presence of MNAR data has not yet been investigated. This article proposes extending the unidimensional empirical histogram and Davidian curve methods to deal simultaneously with a nonnormal latent variable distribution and MNAR data. A simulation study demonstrates the consequences of ignoring a nonnormal bivariate distribution for parameter estimates, followed by an empirical analysis of “don’t know” item responses. The results show that examining the normality assumption for the bivariate latent variable distribution should be routine for MNAR data, in order to minimize the impact of nonnormality on parameter estimates.


2014 ◽  
Vol 926-930 ◽  
pp. 3830-3833
Author(s):  
Zhi Hui Fu ◽  
Cui Xin Peng ◽  
Bin Li

Missing data are a common problem in statistical modeling. How to estimate item parameters with missing data in item response theory (IRT) is therefore an important issue. The Bayesian paradigm offers a natural model-based solution: missing values are treated as random variables and their posterior distributions are estimated. In this article, based on a data augmentation scheme using the Gibbs sampler, we propose a Bayesian procedure for estimating the multidimensional two-parameter logistic model with missing responses.
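The data augmentation idea can be sketched as follows: given the current parameter values, each missing response is drawn from its posterior predictive distribution, yielding a completed data matrix for the next Gibbs step. This minimal sketch uses a unidimensional 2PL for readability (the article's model is multidimensional), and all values below are toy data, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def impute_missing(resp, theta, a, b, rng):
    """One data-augmentation step: draw missing responses from their
    posterior predictive under the current 2PL parameter values."""
    p = 1 / (1 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    draws = (rng.random(resp.shape) < p).astype(float)
    missing = np.isnan(resp)
    # keep observed entries, fill missing ones with the draws
    return np.where(missing, draws, resp)

# toy data: 5 persons x 4 items with some missing entries
resp = np.array([[1, 0, np.nan, 1],
                 [0, np.nan, 0, 0],
                 [1, 1, 1, np.nan],
                 [np.nan, 0, 1, 1],
                 [1, 0, 0, 1]], dtype=float)
theta = rng.normal(size=5)        # current person-parameter draws
a = np.ones(4)                    # current discriminations
b = np.zeros(4)                   # current difficulties

completed = impute_missing(resp, theta, a, b, rng)
```

In a full sampler this step would alternate with draws of the person and item parameters given the completed matrix; averaging over iterations then propagates the uncertainty about the missing entries into the parameter posteriors.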


Author(s):  
HUA FANG ◽  
KIMBERLY ANDREWS ESPY ◽  
MARIA L. RIZZO ◽  
CHRISTIAN STOPP ◽  
SANDRA A. WIEBE ◽  
...  

Methods for identifying meaningful growth patterns of longitudinal trial data with both nonignorable intermittent and drop-out missingness are rare. In this study, a combined approach with statistical and data mining techniques is utilized to address the nonignorable missing data issue in growth pattern recognition. First, a parallel mixture model is proposed to model the nonignorable missing information from a real-world patient-oriented study and concurrently to estimate the growth trajectories of participants. Then, based on individual growth parameter estimates and their auxiliary feature attributes, a fuzzy clustering method is incorporated to identify the growth patterns. This case study demonstrates that the combined multi-step approach can achieve both statistical generality and computational efficiency for growth pattern recognition in longitudinal studies with nonignorable missing data.


Methodology ◽  
2017 ◽  
Vol 13 (4) ◽  
pp. 135-143 ◽  
Author(s):  
Alejandro Hernandez-Camacho ◽  
Julio Olea ◽  
Francisco J. Abad

Abstract. The bifactor model (BM) and the testlet response model (TRM) are the most common multidimensional models applied to testlet-based tests. The usual practice is to estimate these models with different estimation methods (see, e.g., DeMars, 2006). A possible consequence is that previous findings about the implications of fitting the wrong model to the data may be confounded with the estimation procedures employed. With this in mind, the present study uses the same method (maximum marginal likelihood [MML] with dimension reduction) to compare uni- and multidimensional strategies for testlet-based tests and to assess the performance of various relative fit indices. Data were simulated under three models: the BM, the TRM, and the unidimensional model. Recovery of item parameters, reliability estimates, and selection rates of the relative fit indices were documented. The results were essentially consistent with those obtained with different methods (DeMars, 2006), indicating that the effect of the estimation method is negligible. Regarding the fit indices, the Akaike information criterion (AIC) showed the best selection rates, whereas the Bayesian information criterion (BIC) tended to select a model simpler than the true one. The work concludes with recommendations for practitioners and proposals for future research.


2018 ◽  
Vol 19 (2) ◽  
pp. 174-193 ◽  
Author(s):  
José LP da Silva ◽  
Enrico A Colosimo ◽  
Fábio N Demarqui

Generalized estimating equations (GEEs) are a well-known method for the analysis of categorical longitudinal data. This method presents computational simplicity and provides consistent parameter estimates that have a population-averaged interpretation. However, with missing data, the resulting parameter estimates are consistent only under the strong assumption of missing completely at random (MCAR). Some corrections can be done when the missing data mechanism is missing at random (MAR): inverse probability weighting GEE (WGEE) and multiple imputation GEE (MIGEE). A recent method combining ideas of these two approaches has a doubly robust property in the sense that one only needs to correctly specify the weight or the imputation model in order to obtain consistent estimates for the parameters. In this work, a proportional odds model is assumed and a doubly robust estimator is proposed for the analysis of ordinal longitudinal data with intermittently missing responses and covariates under the MAR mechanism. In addition, the association structure is modelled by means of either the correlation coefficient or local odds ratio. The performance of the proposed method is compared to both WGEE and MIGEE through a simulation study. The method is applied to a dataset related to rheumatic mitral stenosis.


2019 ◽  
Vol 79 (4) ◽  
pp. 699-726 ◽  
Author(s):  
Karoline A. Sachse ◽  
Nicole Mahler ◽  
Steffi Pohl

Mechanisms causing item nonresponses in large-scale assessments are often said to be nonignorable. Parameter estimates can be biased if nonignorable missing data mechanisms are not adequately modeled. In trend analyses, it is plausible for the missing data mechanism and the percentage of missing values to change over time. In this article, we investigated (a) the extent to which the missing data mechanism and the percentage of missing values changed over time in real large-scale assessment data, (b) how different approaches for dealing with missing data performed under such conditions, and (c) the practical implications for trend estimates. These issues are highly relevant because the conclusions hold for all kinds of group mean differences in large-scale assessments. In a reanalysis of PISA (Programme for International Student Assessment) data from 35 OECD countries, we found that missing data mechanisms and numbers of missing values varied considerably across time points, countries, and domains. In a simulation study, we generated data in which we allowed the missing data mechanism and the amount of missing data to change over time. We showed that the trend estimates were biased if differences in the missing-data mechanisms were not taken into account, in our case, when omissions were scored as wrong, when omissions were ignored, or when model-based approaches assuming a constant missing data mechanism over time were used. The results suggest that the most accurate estimates can be obtained from the application of multiple group models for nonignorable missing values when the amounts of missing data and the missing data mechanisms changed over time. In an empirical example, we furthermore showed that the large decline in PISA reading literacy in Ireland in 2009 was reduced when we estimated trends using missing data treatments that accounted for changes in missing data mechanisms.


Methodology ◽  
2020 ◽  
Vol 16 (2) ◽  
pp. 147-165 ◽  
Author(s):  
Steffi Pohl ◽  
Benjamin Becker

Approaches for dealing with item omissions include scoring omitted items as incorrect, ignoring missing values, and model-based approaches for nonignorable missing values; these have so far been evaluated only for certain forms of nonignorability. In this paper we investigate the performance of these approaches under various conditions of nonignorability, that is, when the missing response depends on i) the item response, ii) a latent missing propensity, or iii) both. No approach yields unbiased parameter estimates of the Rasch model under all missing data mechanisms. Incorrect scoring yields unbiased estimates only under very specific data constellations of mechanisms i) and iii). The approach for nonignorable missing values yields unbiased estimates only under condition ii). Ignoring omissions results in slightly more biased estimates than the approach for nonignorable missing values, although the latter also indicates the presence of nonignorability under all simulated conditions. We illustrate the results in an empirical example using PISA data.
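A minimal simulation, assuming a single Rasch item and a hypothetical omission rate that depends on the (unobserved) response — mechanism i) above — shows how the two simple treatments pull the item p-value in opposite directions: scoring omissions as incorrect deflates it, while ignoring them inflates it. The omission probabilities (5% for correct, 40% for incorrect answers) are illustrative, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
theta = rng.normal(size=n)                 # person parameters
b = 0.0                                    # single Rasch item
p = 1 / (1 + np.exp(-(theta - b)))
y = (rng.random(n) < p).astype(float)      # latent complete responses

# Mechanism i): omission depends on the (unobserved) item response --
# would-be-wrong answers are omitted far more often than correct ones.
p_omit = np.where(y == 1, 0.05, 0.40)
omit = rng.random(n) < p_omit

true_p = y.mean()                          # p-value without missingness
score_wrong = np.where(omit, 0.0, y).mean()  # omissions scored incorrect
ignore = y[~omit].mean()                     # omissions ignored
```

Because wrong answers are the ones that go missing, the complete-case ("ignore") estimate overstates how easy the item is, while scoring every omission as wrong overcorrects; neither recovers the complete-data p-value under this mechanism.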


1999 ◽  
Vol 15 (2) ◽  
pp. 91-98 ◽  
Author(s):  
Lutz F. Hornke

Summary: Item parameters for several hundred items were estimated from empirical data on several thousand subjects. Estimates for the one-parameter (1PL) and two-parameter (2PL) logistic models were evaluated. Model fit showed that only a subset of items complied sufficiently, so the remaining items were assembled into well-fitting item banks. In several simulation studies, 5,000 simulated response patterns were generated in accordance with a computerized adaptive testing (CAT) procedure, along with person parameters. A general reliability of .80, equivalent to a standard error of measurement of .44, was used as the stopping rule to end CAT testing. We also recorded how often each item was used across simulees. Person-parameter estimates based on CAT correlated higher than .90 with the simulated true values. With the 1PL-fitting item banks, most simulees needed more than 20 but fewer than 30 items to reach the preset level of measurement error. Testing based on item banks fitting the 2PL, however, revealed that on average only 10 items were sufficient to end testing at the same measurement error level. Both results clearly demonstrate the precision and economy of computerized adaptive testing. Empirical evaluations from everyday use will show whether these trends hold up in practice. If so, CAT will become feasible and reasonable with some 150 well-calibrated 2PL items.
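The CAT loop described above (maximum-information item selection with an SE-based stopping rule) can be sketched roughly as follows. The 2PL item bank, the grid-based EAP estimator, and all parameter values are illustrative assumptions, not Hornke's implementation; only the .44 stopping criterion is taken from the summary.

```python
import numpy as np

rng = np.random.default_rng(3)
grid = np.linspace(-4, 4, 161)
prior = np.exp(-0.5 * grid**2)             # standard-normal prior on theta

# hypothetical 2PL item bank
n_items = 300
a = rng.uniform(0.8, 2.0, n_items)         # discriminations
b = rng.normal(0.0, 1.0, n_items)          # difficulties

def p2pl(theta, a_i, b_i):
    return 1 / (1 + np.exp(-a_i * (theta - b_i)))

theta_true = 1.0                           # one simulee
post = prior.copy()
used = []
theta_hat, se = 0.0, np.inf

while se > 0.44 and len(used) < n_items:   # stop at SEM <= .44
    # pick the unused item with maximum Fisher information at theta_hat
    info = a**2 * p2pl(theta_hat, a, b) * (1 - p2pl(theta_hat, a, b))
    info[used] = -np.inf
    j = int(np.argmax(info))
    used.append(j)
    # administer the item and update the posterior on the grid
    y = rng.random() < p2pl(theta_true, a[j], b[j])
    like = p2pl(grid, a[j], b[j]) if y else 1 - p2pl(grid, a[j], b[j])
    post = post * like
    theta_hat = (grid * post).sum() / post.sum()           # EAP estimate
    var = ((grid - theta_hat) ** 2 * post).sum() / post.sum()
    se = np.sqrt(var)
```

With discriminating 2PL items the SE criterion is typically reached after only a handful of administrations, which is the economy argument made in the summary.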


Marketing ZFP ◽  
2019 ◽  
Vol 41 (4) ◽  
pp. 21-32
Author(s):  
Dirk Temme ◽  
Sarah Jensen

Missing values are ubiquitous in empirical marketing research. If missing data are not dealt with properly, the result can be a loss of statistical power and distorted parameter estimates. While traditional approaches for handling missing data (e.g., listwise deletion) are still widely used, researchers can nowadays choose among various advanced techniques such as multiple imputation or full-information maximum likelihood estimation. Thanks to the available software, applying these modern missing data methods poses no major obstacle. Still, their application requires a sound understanding of their prerequisites and limitations, as well as a deeper understanding of the processes that led to the missing values in an empirical study. This article is Part 1: it first introduces Rubin’s classical definition of missing data mechanisms and an alternative, variable-based taxonomy that provides a graphical representation. Second, it presents a selection of visualization tools, available in different R packages, for describing and exploring missing data structures.
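Rubin's three mechanisms can be made concrete with a small simulation: under MCAR the complete-case (listwise-deletion) mean is unbiased, while under MAR and MNAR it is distorted. The covariate `x`, outcome `y`, and logistic missingness rules below are illustrative choices, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
x = rng.normal(size=n)                     # fully observed covariate
y = x + rng.normal(size=n)                 # outcome subject to missingness

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# MCAR: missingness is independent of everything
m_mcar = rng.random(n) < 0.3
# MAR: missingness depends only on the observed covariate x
m_mar = rng.random(n) < sigmoid(x - 1)
# MNAR: missingness depends on the (possibly unobserved) y itself
m_mnar = rng.random(n) < sigmoid(y - 1)

# Complete-case means: close to y.mean() under MCAR,
# systematically too low under MAR and MNAR (high values go missing)
means = (y.mean(), y[~m_mcar].mean(), y[~m_mar].mean(), y[~m_mnar].mean())
```

The MAR bias here is removable by conditioning on `x` (e.g., via multiple imputation with `x` in the imputation model); the MNAR bias is not, which is why diagnosing the mechanism matters before choosing a method.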


Energies ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 388
Author(s):  
Riccardo De Blasis ◽  
Giovanni Batista Masala ◽  
Filippo Petroni

The energy produced by a wind farm in a given location, and its associated income, depends both on the wind characteristics in that location (i.e., speed and direction) and on the dynamics of the electricity spot price. Given the evidence of cross-correlations between the wind speed, direction, and price series and their lagged series, we aim to assess the income of a hypothetical wind farm located in central Italy when all of these interactions are taken into account. To model these cross- and auto-correlations efficiently, we apply a high-order multivariate Markov model that includes dependencies on each time series and on a certain number of past values. In addition, we use Raftery's Mixture Transition Distribution (MTD) model to reduce the number of parameters and obtain a more parsimonious model. Using data from the MERRA-2 project and from the Italian electricity market, we estimate the model parameters and validate them through a Monte Carlo simulation. The results show that the simulated income faithfully reproduces the empirical income and that the multivariate model also closely reproduces the cross-correlations between the variables. The model can therefore be used to predict the income generated by a wind farm.
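The MTD idea can be sketched in a few lines: instead of a full L-th-order transition tensor, the conditional distribution of the next state is a lag-weighted mixture built from a single transition matrix, which drastically reduces the parameter count. The matrix `Q` and weights `lam` below are toy values, not the estimates from the paper.

```python
import numpy as np

# Raftery's MTD sketch: the conditional distribution of X_t given the
# last L states is a lag-weighted mixture of one shared transition matrix Q.
Q = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])             # shared m x m transition matrix
lam = np.array([0.6, 0.3, 0.1])             # lag weights, nonnegative, sum to 1

def mtd_probs(past_states, Q, lam):
    """P(X_t = j | X_{t-1}, ..., X_{t-L}) = sum_g lam[g] * Q[x_{t-g}, j];
    past_states is ordered most recent first."""
    return sum(w * Q[s] for w, s in zip(lam, past_states))

p = mtd_probs([0, 2, 1], Q, lam)            # distribution of the next state
```

With m states and L lags, a full high-order chain needs on the order of m^L (m-1) free parameters, whereas this MTD form needs only one m x m matrix plus L lag weights, which is the parsimony gain the abstract refers to.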

