Detecting DIF in Multidimensional Forced Choice Measures Using the Thurstonian Item Response Theory Model

2020
pp. 109442812095982
Author(s): Philseok Lee, Seang-Hwane Joo, Stephen Stark

Although modern item response theory (IRT) methods of test construction and scoring have overcome the ipsativity problems historically associated with multidimensional forced choice (MFC) formats, there has been little research on MFC differential item functioning (DIF) detection, where item refers to a block, or group, of statements presented for an examinee’s consideration. This research investigated DIF detection with three-alternative MFC items based on the Thurstonian IRT (TIRT) model, using omnibus Wald tests on loadings and thresholds. We examined constrained and free baseline model comparison strategies with different types and magnitudes of DIF, latent trait correlations, sample sizes, and levels of impact in an extensive Monte Carlo study. Results indicated the free baseline strategy was highly effective in detecting DIF, with power approaching 1.0 in the large sample size and large magnitude of DIF conditions, and similar effectiveness in the impact and no-impact conditions. This research also included an empirical example to demonstrate the viability of the best performing method with real examinees and showed how DIF and differential test functioning (DTF) effect size measures can be used to assess the practical significance of MFC DIF findings.
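The omnibus Wald test at the heart of the free baseline strategy can be sketched in a few lines: estimate the block’s loadings and thresholds separately in the reference and focal groups, then test whether the full parameter vector differs across groups. The sketch below is illustrative only (hypothetical function names and simulated inputs), not the authors’ implementation.

    import numpy as np
    from scipy.stats import chi2

    def omnibus_wald_dif(est_ref, est_foc, cov_ref, cov_foc):
        """Wald chi-square testing whether all block parameters are equal across groups."""
        diff = np.asarray(est_ref) - np.asarray(est_foc)
        pooled = np.asarray(cov_ref) + np.asarray(cov_foc)  # groups estimated independently
        w = float(diff @ np.linalg.solve(pooled, diff))
        df = diff.size
        return w, df, chi2.sf(w, df)

    # Hypothetical three-alternative block: 3 loadings + 3 pairwise thresholds per group
    rng = np.random.default_rng(1)
    ref = rng.normal(size=6)
    foc = ref + np.array([0.0, 0.0, 0.4, 0.0, 0.0, 0.3])  # DIF on one loading and one threshold
    cov = np.eye(6) * 0.01
    w, df, p = omnibus_wald_dif(ref, foc, cov, cov)
    print(f"Wald = {w:.2f}, df = {df}, p = {p:.4f}")       # flag the block as DIF if p < alpha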

Psychometrika
2021
Author(s): Susanne Frick

The multidimensional forced-choice (MFC) format has been proposed to reduce faking because items within blocks can be matched on desirability. However, the desirability of individual items might not transfer to the item blocks. The aim of this paper is to propose a mixture item response theory model for faking in the MFC format, termed the Faking Mixture model, that allows the fakability of MFC blocks to be estimated. Given current computing capabilities, within-subject data from both high- and low-stakes contexts are needed to estimate the model. A simulation showed good parameter recovery under various conditions. An empirical validation showed that matching was necessary but not sufficient to create an MFC questionnaire that can reduce faking. The Faking Mixture model can be used to reduce fakability during test construction.
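As a rough illustration of the general mixture-IRT idea (a weighted combination of a faking response process and an honest response process), the snippet below mixes two probit response probabilities for a within-block preference. This is a generic sketch with assumed names and parameters, not Frick’s Faking Mixture model specification.

    from scipy.stats import norm

    def mixture_block_prob(theta, loading, threshold, faking_shift, pi_fake):
        """P(item A preferred over item B in a block), mixing a faking and an honest process."""
        p_honest = norm.cdf(loading * theta - threshold)
        p_faking = norm.cdf(loading * theta - threshold + faking_shift)  # shift toward the more desirable item
        return pi_fake * p_faking + (1.0 - pi_fake) * p_honest

    # A larger faking_shift or mixing weight marks the block as more fakable
    print(mixture_block_prob(theta=0.5, loading=0.8, threshold=0.2, faking_shift=1.0, pi_fake=0.4))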


Assessment
2019
Vol 27 (4)
pp. 706-718
Author(s): Kate E. Walton, Lina Cherkasova, Richard D. Roberts

Forced choice (FC) measures may be a desirable alternative to single stimulus (SS) Likert items, which are easier to fake and can carry associated response biases. However, classical methods of scoring FC measures lead to ipsative data, which have a number of psychometric problems. A Thurstonian item response theory (TIRT) model has been introduced as a way to overcome these issues, but few empirical validity studies have been conducted to ensure its effectiveness. This was the goal of the current three studies, which used FC measures of domains from popular personality frameworks, including the Big Five and HEXACO, with both statement and adjective item stems. We computed TIRT and ipsative scores and compared their validity estimates. Convergent and discriminant validity of the scores were evaluated by correlating them with SS scores, and test-criterion validity evidence was evaluated by examining their relationships with meaningful outcomes. In all three studies, there was evidence for the convergent and test-criterion validity of the TIRT scores, though at times this was on par with the validity of the ipsative scores. The discriminant validity of the TIRT scores was problematic and was often worse than that of the ipsative scores.
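The psychometric problem with classically scored FC data can be seen directly: because every respondent distributes the same total number of rank points across domains, scale scores sum to a constant per person, which forces negative average scale intercorrelations and distorts discriminant validity estimates. A small simulated illustration (hypothetical block structure, not data from these studies):

    import numpy as np

    rng = np.random.default_rng(0)
    n_persons, n_domains, n_blocks = 500, 4, 10
    # In each block a respondent rank-orders one item per domain (ranks 1-4, 4 = most like me)
    ranks = np.stack([rng.permuted(np.tile(np.arange(1, n_domains + 1), (n_persons, 1)), axis=1)
                      for _ in range(n_blocks)])
    ipsative_scores = ranks.sum(axis=0)              # persons x domains

    print(np.unique(ipsative_scores.sum(axis=1)))    # every person has the same total score
    print(np.corrcoef(ipsative_scores.T).round(2))   # off-diagonal correlations pushed negative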


2022
Vol 12
Author(s): Feifei Huang, Zhe Li, Ying Liu, Jingan Su, Li Yin, ...

Educational assessment tests are often constructed using testlets because of the flexibility they offer to test various aspects of cognitive activities and to sample content broadly. However, violation of the local item independence assumption is inevitable when tests are built from testlet items. In this study, simulations are conducted to evaluate the performance of item response theory models and testlet response theory models for both dichotomous and polytomous items in the context of equating tests composed of testlets. We also examine the impact of the testlet effect, testlet length, and sample size on the estimation of item and person parameters. The results show that testlet response theory models consistently outperformed item response theory models in accuracy across the studies, which supports the benefits of using testlet response theory models when equating tests composed of testlets. Further, the results indicate that when the sample size is large, item response theory models perform similarly to testlet response theory models across all studies.
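To make the local-dependence issue concrete, a testlet model adds a person-by-testlet random effect to the usual IRT response function, so items within the same testlet share extra variance. The sketch below simulates dichotomous responses from a 2PL-style testlet model with arbitrary, illustrative parameter values (not the study’s simulation design):

    import numpy as np

    rng = np.random.default_rng(42)
    n_persons, n_testlets, items_per_testlet = 1000, 5, 4
    theta = rng.normal(size=n_persons)                               # person ability
    gamma = rng.normal(scale=0.7, size=(n_persons, n_testlets))      # person-by-testlet effects
    a = rng.uniform(0.8, 2.0, size=(n_testlets, items_per_testlet))  # discriminations
    b = rng.normal(size=(n_testlets, items_per_testlet))             # difficulties

    # 2PL testlet model: logit P(X = 1) = a * (theta - b - gamma), gamma shared within a testlet
    logits = a * (theta[:, None, None] - b - gamma[:, :, None])
    responses = (rng.uniform(size=logits.shape) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
    print(responses.shape)   # persons x testlets x items; local dependence is induced by gamma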


2019
Vol 80 (3)
pp. 578-603
Author(s): HyeSun Lee, Weldon Z. Smith

Based on the framework of testlet models, the current study suggests the Bayesian random block item response theory (BRB IRT) model to fit forced-choice formats where an item block is composed of three or more items. To account for local dependence among items within a block, the BRB IRT model incorporates a random block effect into the response function and uses a Markov chain Monte Carlo procedure for simultaneous estimation of item and trait parameters. The simulation results demonstrated that the BRB IRT model performed well for the estimation of item and trait parameters and for screening examinees with relatively low scores on target traits. As found in the literature, the composition of item blocks was crucial for model performance; negatively keyed items were required within item blocks. The empirical application showed that the performance of the BRB IRT model was equivalent to that of the Thurstonian IRT model. The potential advantage of the BRB IRT model as a basis for more complex measurement models was also demonstrated by incorporating gender as a covariate to explain response probabilities. Recommendations for the adoption of forced-choice formats were provided, along with a discussion of using negatively keyed items.
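The core idea, adding a block-level random effect to the response function so that within-block preferences share variance beyond the latent traits, can be illustrated with a generic probit preference probability. This is an assumed, simplified form for illustration, not the authors’ exact BRB IRT parameterization:

    import numpy as np
    from scipy.stats import norm

    def block_preference_prob(theta_i, theta_k, lam_i, lam_k, tau, u_block):
        """P(statement i preferred over statement k within a block); u_block carries
        shared block variance (local dependence) beyond the latent traits."""
        return norm.cdf(lam_i * theta_i - lam_k * theta_k - tau + u_block)

    u = np.random.default_rng(7).normal(scale=0.5)   # one draw of the random block effect
    print(block_preference_prob(theta_i=0.3, theta_k=-0.2, lam_i=0.9, lam_k=0.7, tau=0.1, u_block=u))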


2021
pp. 073428292110037
Author(s): Carlos Calderón Carvajal, Carmen Ximénez Gómez, Siu Lay-Lisboa, Mauricio Briceño

Kolb’s Learning Style Inventory (LSI) continues to generate considerable debate among researchers, given the contradictory evidence concerning its psychometric properties. One primary criticism focuses on the artificiality of the results derived from its internal structure because of the ipsative nature of the forced-choice format. This study seeks to contribute to the resolution of this debate. A short version of Kolb’s LSI with a forced-choice format, along with an additional inventory scored on a Likert scale, was completed by a sample of students at the Universidad Católica del Norte in Antofagasta, Chile. The data obtained from the two forms of the reduced version of the LSI were compared using principal component analysis, confirmatory factor analysis, and the Thurstonian item response theory model. The results support the hypothesis of the existence of four learning mode dimensions. However, they do not support the existence of the learning styles as proposed by Kolb, indicating that such results are an artifact of the structure generated by the ipsative forced-choice format.

