Assessing the Accuracy of Parameter Estimates in the Presence of Rapid Guessing Misclassifications

2021
pp. 001316442110036
Author(s): Joseph A. Rios

The presence of rapid guessing (RG) presents a challenge to practitioners in obtaining accurate estimates of measurement properties and examinee ability. In response to this concern, researchers have utilized response times as a proxy for RG and have attempted to improve parameter estimation accuracy by filtering RG responses using popular scoring approaches, such as the effort-moderated item response theory (EM-IRT) model. However, such an approach assumes that RG can be correctly identified based on an indirect proxy of examinee behavior. A failure to meet this assumption leads to the inclusion of distortive and psychometrically uninformative information in parameter estimates. To address this issue, a simulation study was conducted to examine how violations of the assumption of correct RG classification influence EM-IRT item and ability parameter estimation accuracy, and to compare these results with parameter estimates from the three-parameter logistic (3PL) model, which includes RG responses in scoring. Two RG misclassification factors were manipulated: type (underclassification vs. overclassification) and rate (10%, 30%, and 50%). Results indicated that the EM-IRT model provided improved item parameter estimation over the 3PL model regardless of misclassification type and rate. Furthermore, under most conditions, increased rates of RG underclassification were associated with the greatest bias in ability parameter estimates from the EM-IRT model. In spite of this, the EM-IRT model with RG misclassifications demonstrated more accurate ability parameter estimation than the 3PL model when the mean ability of RG subgroups did not differ. This suggests that in certain situations it may be better for practitioners to (a) imperfectly identify RG than to ignore the presence of such invalid responses and (b) select liberal over conservative response time thresholds to mitigate bias from underclassified RG.
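The effort-moderated scoring idea referenced above conditions each item's response probability on a response-time-based effort classification: effortful responses follow the 3PL item response function, while flagged rapid guesses are scored at chance level. A minimal sketch of this likelihood, assuming per-item response-time thresholds and four-option items (function and variable names are illustrative, not from the article):

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def em_irt_loglik(theta, x, rt, a, b, c, thresholds, n_options=4):
    """Effort-moderated log-likelihood for one examinee.

    x          : 0/1 response vector
    rt         : response times (same length as x)
    thresholds : per-item thresholds; rt below the threshold flags
                 the response as a rapid guess
    """
    rapid = rt < thresholds
    # Effortful responses follow the 3PL; rapid guesses contribute a
    # constant chance-level probability that does not depend on theta.
    p = np.where(rapid, 1.0 / n_options, p_3pl(theta, a, b, c))
    return np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))
```

Because the chance term is constant in theta, flagged responses carry no information about ability, which is why fitting this model amounts to filtering rapid guesses out of scoring.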


2020
pp. 001316442094989
Author(s): Joseph A. Rios, James Soland

As low-stakes testing contexts increase, low test-taking effort may serve as a serious validity threat. One common solution to this problem is to identify noneffortful responses and treat them as missing during parameter estimation via the effort-moderated item response theory (EM-IRT) model. Although this model has been shown to outperform traditional IRT models (e.g., two-parameter logistic [2PL]) in parameter estimation under simulated conditions, prior research has failed to examine its performance under violations of the model’s assumptions. Therefore, the objective of this simulation study was to examine item and mean ability parameter recovery when violating the assumptions that noneffortful responding occurs randomly (Assumption 1) and is unrelated to the underlying ability of examinees (Assumption 2). Results demonstrated that, across conditions, the EM-IRT model provided item parameter estimates that were robust to violations of Assumption 1. However, bias values greater than 0.20 SDs were observed for the EM-IRT model when violating Assumption 2; nonetheless, these values were still lower than those from the 2PL model. In terms of mean ability estimates, model results indicated equal performance between the EM-IRT and 2PL models across conditions. Across both models, mean ability estimates were found to be biased by more than 0.25 SDs when violating Assumption 2. However, our accompanying empirical study suggested that this biasing occurred under extreme conditions that may not be present in some operational settings. Overall, these results suggest that, compared with the 2PL model under realistic conditions, the EM-IRT model provides superior item parameter estimates and equally accurate mean ability estimates in the presence of model violations.
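The filtering step the model depends on can be stated compactly: responses with times below an item's threshold are recoded as missing before calibration, so they contribute nothing to item or ability estimation. A minimal sketch, assuming matched response and response-time matrices (names are illustrative):

```python
import numpy as np

def filter_noneffortful(responses, resp_times, thresholds):
    """Recode responses flagged as noneffortful to NaN (missing).

    responses  : (n_examinees, n_items) 0/1 matrix
    resp_times : (n_examinees, n_items) response times in seconds
    thresholds : (n_items,) per-item response-time thresholds
    """
    flagged = resp_times < thresholds            # True = noneffortful
    cleaned = responses.astype(float)            # float copy so NaN fits
    cleaned[flagged] = np.nan                    # treated as missing downstream
    return cleaned, flagged

# Example: flag responses faster than 3 seconds on every item
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(5, 4))
T = rng.exponential(10.0, size=(5, 4))
cleaned, flagged = filter_noneffortful(X, T, np.full(4, 3.0))
```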


2020
Vol 80 (4), pp. 775-807
Author(s): Yue Liu, Ying Cheng, Hongyun Liu

The responses of non-effortful test-takers may have serious consequences, as non-effortful responses can impair model calibration and latent trait inferences. This article introduces a mixture model, using both response accuracy and response time information, to help differentiate non-effortful from effortful individuals and to improve item parameter estimation based on the effortful group. Two mixture approaches are compared with the traditional response time mixture model (TMM) method and with the normative threshold 10 (NT10) method, which uses response-behavior effort criteria, in four simulation scenarios with regard to item parameter recovery and classification accuracy. The results demonstrate that the mixture methods and the TMM method can reduce the bias of item parameter estimates caused by non-effortful individuals, with the mixture methods showing more advantages when non-effort severity is high or the response times are not lognormally distributed. An illustrative example is also provided.
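The NT10 method mentioned above is commonly described as flagging a response as non-effortful when its time falls below 10% of the item's mean response time, often with a cap. A minimal sketch under that reading (the 10-second cap is an assumption, not taken from the article):

```python
import numpy as np

def nt10_thresholds(resp_times, cap=10.0):
    """Normative threshold 10 (NT10): 10% of each item's mean
    response time, capped (the 10-second cap is an assumption)."""
    return np.minimum(0.10 * resp_times.mean(axis=0), cap)

def flag_non_effortful(resp_times, thresholds):
    """True where a response is faster than its item's threshold."""
    return resp_times < thresholds
```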


2021
Author(s): Jan Steinfeld, Alexander Robitzsch

This article describes conditional maximum likelihood-based item parameter estimation in probabilistic multistage designs. In probabilistic multistage designs, routing into a module is not based solely on a raw score j, a cut score c, and a deterministic rule such as j < c or j ≤ c; instead, each raw score j is routed with a probability p(j). It can be shown that the use of a conventional conditional maximum likelihood parameter estimate in multistage designs leads to severely biased item parameter estimates. Zwitser and Maris (2013) were able to show that with deterministic routing, the integration of the design into the item parameter estimation leads to unbiased estimates. This article extends this approach to probabilistic routing and, at the same time, represents a generalization. In a simulation study, it is shown that the item parameter estimation in probabilistic designs leads to unbiased item parameter estimates.
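Probabilistic routing is easy to picture as a lookup from raw score to routing probability. A minimal sketch for a two-module second stage, with a made-up probability table (deterministic routing with cut score c = 3 would correspond to p(j) jumping from 0 to 1 at j = 3):

```python
import numpy as np

rng = np.random.default_rng(7)

def route(raw_score, p_table):
    """Route to the harder module with probability p_table[raw_score],
    otherwise to the easier module."""
    return "hard" if rng.random() < p_table[raw_score] else "easy"

# Illustrative p(j) for a 5-item routing module; values are invented.
p_table = {0: 0.0, 1: 0.1, 2: 0.3, 3: 0.7, 4: 0.9, 5: 1.0}
modules = [route(j, p_table) for j in (2, 3, 3, 5)]
```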


2019
Vol 44 (3), pp. 309-341
Author(s): Jeffrey M. Patton, Ying Cheng, Maxwell Hong, Qi Diao

In psychological and survey research, the prevalence and serious consequences of careless responses from unmotivated participants are well known. In this study, we propose to iteratively detect careless responders and cleanse the data by removing their responses. The careless responders are detected using person-fit statistics. In two simulation studies, the iterative procedure leads to nearly perfect power in detecting extremely careless responders and much higher power than the noniterative procedure in detecting moderately careless responders. Meanwhile, the false-positive error rate is close to the nominal level. In addition, item parameter estimation is much improved by iteratively cleansing the calibration sample. The bias in item discrimination and location parameter estimates is substantially reduced. The standard error estimates, which are spuriously small in the presence of careless responses, are corrected by the iterative cleansing procedure. An empirical example is also presented to illustrate the proposed procedure. These results suggest that the proposed procedure is a promising way to improve item parameter estimation for tests of 20 items or longer when data are contaminated by careless responses.
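The iterative procedure can be sketched as a loop: estimate ability, compute a person-fit statistic per examinee, drop those flagged as misfitting, and repeat until no one new is flagged. The sketch below uses the standardized log-likelihood statistic lz under a 2PL with known item parameters; in the full procedure the item parameters would be re-estimated on the cleansed sample each round (names and the critical value are illustrative):

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def mle_theta(x, a, b, iters=15):
    """Newton-Raphson MLE of theta for one response vector (2PL).
    Diverges for all-correct/all-wrong vectors; fine for a sketch."""
    theta = 0.0
    for _ in range(iters):
        p = p_2pl(theta, a, b)
        theta -= np.sum(a * (x - p)) / -np.sum(a**2 * p * (1 - p))
    return theta

def lz(x, theta, a, b):
    """Standardized log-likelihood person-fit statistic (lz)."""
    p = p_2pl(theta, a, b)
    l = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    mean = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    var = np.sum(p * (1 - p) * np.log(p / (1 - p))**2)
    return (l - mean) / np.sqrt(var)

def iterative_cleanse(X, a, b, crit=-1.645, max_rounds=10):
    """Drop examinees with lz below crit, re-estimate, repeat."""
    keep = np.ones(X.shape[0], dtype=bool)
    for _ in range(max_rounds):
        idx = np.where(keep)[0]
        flags = np.array([lz(x, mle_theta(x, a, b), a, b) < crit
                          for x in X[idx]])
        if not flags.any():
            break
        keep[idx[flags]] = False
    return keep
```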


Psych, 2021
Vol 3 (3), pp. 279-307
Author(s): Jan Steinfeld, Alexander Robitzsch

There is some debate in the psychometric literature about item parameter estimation in multistage designs. It is occasionally argued that the conditional maximum likelihood (CML) method is superior to the marginal maximum likelihood (MML) method because no assumptions have to be made about the trait distribution. However, CML estimation in its original formulation leads to biased item parameter estimates. Zwitser and Maris (2015, Psychometrika) proposed a modified conditional maximum likelihood estimation method for multistage designs that provides practically unbiased item parameter estimates. In this article, the differences between estimation approaches for multistage designs were investigated in a simulation study. Four estimation conditions (CML, CML estimation with the respective MST design taken into account, MML with the assumption of a normal distribution, and MML with log-linear smoothing) were examined, considering different multistage designs, numbers of items, sample sizes, and trait distributions. The results showed that in the case of a substantial violation of the normal distribution, the CML method seemed preferable to MML estimation employing a misspecified normal trait distribution, especially as the number of items and the sample size increased. However, MML estimation using log-linear smoothing led to results that were very similar to those of the CML method with the respective MST design taken into account.
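Log-linear smoothing replaces fixed normal quadrature weights with weights w_q ∝ exp(β₁θ_q + β₂θ_q² + β₃θ_q³) fitted to the expected counts from each EM E-step, so the lower moments of the trait distribution are left free. A minimal sketch of that fitting step, assuming expected counts at the quadrature nodes are already available (the degree-3 polynomial is one common choice, not taken from the article):

```python
import numpy as np
from scipy.optimize import minimize

def loglinear_smooth(nodes, counts, degree=3):
    """Smoothed weights w_q proportional to exp(sum_k beta_k * node^k),
    fitted to expected counts by multinomial maximum likelihood."""
    # Design matrix with columns theta, theta^2, ..., theta^degree
    Z = np.vander(nodes, degree + 1, increasing=True)[:, 1:]

    def negll(beta):
        logw = Z @ beta
        logw -= logw.max()                     # numerical stability
        w = np.exp(logw) / np.exp(logw).sum()
        return -np.sum(counts * np.log(w))

    beta = minimize(negll, np.zeros(degree), method="BFGS").x
    w = np.exp(Z @ beta - (Z @ beta).max())
    return w / w.sum()

# Example: smooth skewed expected counts on a 21-point grid
nodes = np.linspace(-4.0, 4.0, 21)
counts = np.exp(-0.5 * (nodes - 0.8)**2) * (1.0 + 0.5 * np.tanh(nodes))
weights = loglinear_smooth(nodes, counts)
```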


1988
Vol 13 (3), pp. 243-271
Author(s): Michael R. Harwell, Frank B. Baker, Michael Zwarts

The Bock and Aitkin (1981) Marginal Maximum Likelihood/EM approach to item parameter estimation is an alternative to the classical joint maximum likelihood procedure of item response theory. Unfortunately, the complexity of the underlying mathematics and the terse nature of the existing literature have made the approach difficult to understand. To make the approach accessible to a wider audience, the present didactic paper provides the essential mathematical details of a marginal maximum likelihood/EM solution and shows how it can be used to obtain consistent item parameter estimates. For pedagogical purposes, a short BASIC computer program is used to illustrate the underlying simplicity of the method.
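For readers who want the same walk-through in a modern language, the core of a Bock-Aitkin MML/EM cycle fits in a few dozen lines. A minimal sketch for the Rasch model with a fixed N(0,1) quadrature prior (the paper works through the general case and uses BASIC; this Python sketch is an illustration, not the authors' program):

```python
import numpy as np

def rasch_mml_em(X, n_quad=21, n_iter=50):
    """Bock-Aitkin MML/EM item difficulty estimation for the Rasch
    model, with a fixed N(0,1) latent distribution on a quadrature grid.

    X : (n_persons, n_items) 0/1 response matrix
    """
    nodes = np.linspace(-4.0, 4.0, n_quad)
    prior = np.exp(-0.5 * nodes**2)
    prior /= prior.sum()                         # discrete N(0,1) weights
    b = np.zeros(X.shape[1])                     # item difficulties

    for _ in range(n_iter):
        # E-step: posterior weight of each quadrature node per person
        p = 1.0 / (1.0 + np.exp(-(nodes[:, None] - b[None, :])))  # (Q, I)
        logL = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T        # (N, Q)
        post = np.exp(logL - logL.max(axis=1, keepdims=True)) * prior
        post /= post.sum(axis=1, keepdims=True)

        # Expected counts: examinees at node q, and correct answers
        # to item i at node q
        n_q = post.sum(axis=0)                   # (Q,)
        r_qi = post.T @ X                        # (Q, I)

        # M-step: one Newton step per item difficulty
        resid = (r_qi - n_q[:, None] * p).sum(axis=0)
        info = (n_q[:, None] * p * (1 - p)).sum(axis=0)
        b -= resid / info
    return b

# Example: recover difficulties from simulated Rasch data
rng = np.random.default_rng(3)
true_b = np.linspace(-1.5, 1.5, 10)
theta = rng.normal(size=500)
prob = 1.0 / (1.0 + np.exp(-(theta[:, None] - true_b[None, :])))
X = (rng.random((500, 10)) < prob).astype(int)
b_hat = rasch_mml_em(X)
```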

