CUSUM-Based Person-Fit Statistics for Adaptive Testing

Snijders developed a family of person fit indices that asymptotically follow the standard normal distribution, when the ability parameter is estimated. So far, [Formula: see text], U*, W*, [Formula: see text], and [Formula: see text] from this family have been proposed in previous literature. One common property shared by [Formula: see text], U*, and W* (also [Formula: see text] and [Formula: see text] in some specific conditions) is that they employ symmetric weight functions and thus identify spurious scores on both easy and difficult items in the same manner. However, when the purpose is to detect only the spuriously high scores on difficult items, such as cheating, guessing, and having item preknowledge, using symmetric weight functions may jeopardize the detection rates of the target aberrant response patterns. By specifying two types of asymmetric weight functions, this study proposes SHa(λ)* (λ = 1/2 or 1) and SHb(β)* (β = 2 or 3) based on Snijders’s framework to specifically detect spuriously high scores on difficult items. Two simulation studies were carried out to investigate the Type I error rates and empirical power of SHa(λ)* and SHb(β)*, compared with [Formula: see text], U*, W*, [Formula: see text], and [Formula: see text]. The empirical results demonstrated satisfactory performance of the proposed indices. Recommendations were also made on the choice of different person fit indices based on specific purposes.

Download Full-text

Evaluating the Fit of Sequential G-DINA Model Using Limited-Information Measures

Applied Psychological Measurement ◽

10.1177/0146621619843829 ◽

2019 ◽

Vol 44 (3) ◽

pp. 167-181 ◽

Cited By ~ 2

Author(s):

Wenchao Ma

Keyword(s):

Goodness Of Fit ◽

Type I Error ◽

Model Simulation ◽

Real Data ◽

Error Rates ◽

Type I ◽

Limited Information ◽

Detection Rates ◽

Root Mean Square Residual ◽

Information Measures

Limited-information fit measures appear to be promising in assessing the goodness-of-fit of dichotomous response cognitive diagnosis models (CDMs), but their performance has not been examined for polytomous response CDMs. This study investigates the performance of the Mord statistic and standardized root mean square residual (SRMSR) for an ordinal response CDM—the sequential generalized deterministic inputs, noisy “and” gate model. Simulation studies showed that the Mord statistic had well-calibrated Type I error rates, but the correct detection rates were influenced by various factors such as item quality, sample size, and the number of response categories. In addition, the SRMSR was also influenced by many factors and the common practice of comparing the SRMSR against a prespecified cut-off (e.g., .05) may not be appropriate. A set of real data was analyzed as well to illustrate the use of Mord statistic and SRMSR in practice.

Download Full-text

Power and Type I error rates of goodness-of-fit statistics for binomial generalized estimating equations (GEE) models

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2005.07.017 ◽

2006 ◽

Vol 50 (12) ◽

pp. 3432-3448 ◽

Cited By ~ 7

Author(s):

Hui-Yi Lin ◽

Leann Myers

Keyword(s):

Generalized Estimating Equations ◽

Goodness Of Fit ◽

Type I Error ◽

Estimating Equations ◽

Error Rates ◽

Type I ◽

Type I Error Rates ◽

Fit Statistics ◽

Generalized Estimating

Download Full-text

Performance of the S − χ2 Statistic for Full-Information Bifactor Models

Educational and Psychological Measurement ◽

10.1177/0013164410392031 ◽

2011 ◽

Vol 71 (6) ◽

pp. 986-1005 ◽

Cited By ~ 10

Author(s):

Ying Li ◽

André A. Rupp

Keyword(s):

Latent Variable ◽

Type I Error ◽

Latent Trait ◽

Structural Similarity ◽

Error Rates ◽

Type I ◽

Full Information ◽

Multidimensional Item Response ◽

Type I Error Rates ◽

Bifactor Models

This study investigated the Type I error rate and power of the multivariate extension of the S − χ2 statistic using unidimensional and multidimensional item response theory (UIRT and MIRT, respectively) models as well as full-information bifactor (FI-bifactor) models through simulation. Manipulated factors included test length, sample size, latent trait characteristics such as discrimination pattern and intertrait correlations, and model type misspecification. The nominal Type I error rates were observed under all conditions. The power of the S − χ2 statistic for UIRT models was high for MIRT and FI-bifactor models that were structurally most distinct from the UIRT models but was low otherwise. The power of the S − χ2 statistic to detect misfitting between MIRT and FI-bifactor models was low across all conditions because of the structural similarity of these two models. Finally, information-based indices of relative model–data fit and latent variable correlations were obtained, and these showed expected patterns across conditions.

Download Full-text

Effects of Test Level Discrimination and Difficulty on Answer-Copying Indices

International Journal of Evaluation and Research in Education (IJERE) ◽

10.11591/ijere.v7i1.11488 ◽

2018 ◽

Vol 7 (1) ◽

pp. 32

Author(s):

Onder Sunbul ◽

Seha Yormaz

Keyword(s):

Type I Error ◽

Error Rates ◽

Type I ◽

Test Length ◽

Power Performance ◽

Detection Rates ◽

Power Rate ◽

Test Level ◽

Answer Copying ◽

Better Than

In this study Type I Error and the power rates of ω and GBT (generalized binomial test) indices were investigated for several nominal alpha levels and for 40 and 80-item test lengths with 10,000-examinee sample size under several test level restrictions. As a result, Type I error rates of both indices were found to be below the acceptable nominal alpha levels. The power study showed that average test difficulty was very effective for power (true detection) rates of indices. Clear patterns were observed for the increase of test difficulty in favor of both ω and GBT power rate. Contrary to expectations; average test discrimination was not as effective as average test difficulty. The results of the interaction effects of item discrimination and difficulty showed that for the cases whose b parameters were lower than 0 with weak discrimination, indices had weak power for both ω and GBT. In addition, for the cases whose b parameter levels were below zero with high discrimination indices, the power performance of both answer-copying indices were very weak. Results for test length showed that with the increase of test length the power rate of both ω and GBT tended to increase. Also, ω performed slightly better than GBT or very close to GBT for 80-item test length however, ω performed better than GBT in terms of power rate for the cases with 40-item test length

Download Full-text

Effects of Test Level Discrimination and Difficulty on Answer-Copying Indices

International Journal of Evaluation and Research in Education (IJERE) ◽

10.11591/ijere.v1i1.11488 ◽

2018 ◽

Vol 7 (1) ◽

pp. 32

Author(s):

Önder Sünbül ◽

Seha Yormaz

Keyword(s):

Type I Error ◽

Error Rates ◽

Type I ◽

Test Length ◽

Power Performance ◽

Detection Rates ◽

Power Rate ◽

Test Level ◽

Answer Copying ◽

Better Than

In this study Type I Error and the power rates of ω and GBT (generalized binomial test) indices were investigated for several nominal alpha levels and for 40 and 80-item test lengths with 10,000-examinee sample size under several test level restrictions. As a result, Type I error rates of both indices were found to be below the acceptable nominal alpha levels. The power study showed that average test difficulty was very effective for power (true detection) rates of indices. Clear patterns were observed for the increase of test difficulty in favor of both ω and GBT power rate. Contrary to expectations; average test discrimination was not as effective as average test difficulty. The results of the interaction effects of item discrimination and difficulty showed that for the cases whose b parameters were lower than 0 with weak discrimination, indices had weak power for both ω and GBT. In addition, for the cases whose b parameter levels were below zero with high discrimination indices, the power performance of both answer-copying indices were very weak. Results for test length showed that with the increase of test length the power rate of both ω and GBT tended to increase. Also, ω performed slightly better than GBT or very close to GBT for 80-item test length however, ω performed better than GBT in terms of power rate for the cases with 40-item test length

Download Full-text

Assessing Person Fit With the Information Matrix Test

Methodology ◽

10.1027/1614-2241/a000085 ◽

2015 ◽

Vol 11 (1) ◽

pp. 3-12 ◽

Cited By ~ 2

Author(s):

Jochen Ranger ◽

Jörg-Tobias Kuhn

Keyword(s):

Simulation Study ◽

Type I Error ◽

Information Matrix ◽

Small Samples ◽

Type I ◽

Person Fit ◽

Power Of The Test ◽

Order Expansion ◽

Trait Stability ◽

Information Matrix Test

In this manuscript, a new approach to the analysis of person fit is presented that is based on the information matrix test of White (1982) . This test can be interpreted as a test of trait stability during the measurement situation. The test follows approximately a χ2-distribution. In small samples, the approximation can be improved by a higher-order expansion. The performance of the test is explored in a simulation study. This simulation study suggests that the test adheres to the nominal Type-I error rate well, although it tends to be conservative in very short scales. The power of the test is compared to the power of four alternative tests of person fit. This comparison corroborates that the power of the information matrix test is similar to the power of the alternative tests. Advantages and areas of application of the information matrix test are discussed.

Download Full-text

Type I error rates and power of several versions of scaled chi-square difference tests in investigations of measurement invariance.

Psychological Methods ◽

10.1037/met0000097 ◽

2017 ◽

Vol 22 (3) ◽

pp. 467-485 ◽

Cited By ~ 4

Author(s):

Jordan Campbell Brace ◽

Victoria Savalei

Keyword(s):

Measurement Invariance ◽

Type I Error ◽

Error Rates ◽

Type I ◽

Chi Square ◽

Type I Error Rates

Download Full-text

Correction: “Influence of Selection Bias on the Test Decision – A Simulation Study”

Methods of Information in Medicine ◽

10.3414/me11-01-0043e ◽

2014 ◽

Vol 53 (05) ◽

pp. 343-343

Keyword(s):

Selection Bias ◽

Simulation Study ◽

Error Rate ◽

Type I Error ◽

Block Size ◽

Error Rates ◽

Type I ◽

Type I Error Rate ◽

Representation Error ◽

Numeric Representation

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Table 4, Table 5 and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138 –143). In a small number of cases the kind of representation of numeric values in SAS has resulted in wrong categorization due to a numeric representation error of differences. We corrected the simulation by using the round function of SAS in the calculation process with the same seeds as before. For Table 4 the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5 the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6 the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. There were only minor changes smaller than 0.03. These changes do not affect the interpretation of the results or our recommendations.

Download Full-text

The Use of Theory of Linear Mixed-Effects Models to Detect Fraudulent Erasures at an Aggregate Level

Educational and Psychological Measurement ◽

10.1177/0013164421994893 ◽

2021 ◽

pp. 001316442199489

Author(s):

Luyao Peng ◽

Sandip Sinharay

Keyword(s):

Type I Error ◽

Real Data ◽

Mixed Effects ◽

Error Rates ◽

Mixed Effects Models ◽

Type I ◽

Aggregate Level ◽

Linear Mixed Effects Models ◽

Linear Mixed Effects ◽

Best Linear Unbiased

Wollack et al. (2015) suggested the erasure detection index (EDI) for detecting fraudulent erasures for individual examinees. Wollack and Eckerly (2017) and Sinharay (2018) extended the index of Wollack et al. (2015) to suggest three EDIs for detecting fraudulent erasures at the aggregate or group level. This article follows up on the research of Wollack and Eckerly (2017) and Sinharay (2018) and suggests a new aggregate-level EDI by incorporating the empirical best linear unbiased predictor from the literature of linear mixed-effects models (e.g., McCulloch et al., 2008). A simulation study shows that the new EDI has larger power than the indices of Wollack and Eckerly (2017) and Sinharay (2018). In addition, the new index has satisfactory Type I error rates. A real data example is also included.

Download Full-text