Assessment of Differential Statement Functioning in Ipsative Tests With Multidimensional Forced-Choice Items

2020
pp. 014662162096573
Author(s):
Xue-Lan Qiu
Wen-Chung Wang

Ipsative tests with multidimensional forced-choice (MFC) items have been widely used to assess career interests, values, and personality because they help prevent response biases. Recently, there has been a surge of interest in developing item response theory models for MFC items. In reality, a statement in an MFC item may have different utilities for different groups, which is referred to as differential statement functioning (DSF). However, few studies have investigated methods for detecting DSF, owing to the challenges posed by the features of ipsative tests. In this study, three methods were adapted for DSF assessment in MFC items: equal-mean-utility (EMU), all-other-statement (AOS), and constant-statement (CS). Simulation studies were conducted to evaluate parameter recovery and the performance of the proposed methods. Results showed that statement parameters and DSF parameters were well recovered by all three methods when the test did not contain any DSF statement. When the test contained one or more DSF statements, only the CS method yielded accurate estimates. With respect to DSF assessment, both the EMU method using the bootstrap standard error and the AOS method performed appropriately as long as the test did not contain any DSF statement. The CS method performed well when one or more DSF-free statements were chosen as the anchor, and the more statements in the anchor, the higher the power of DSF detection.
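To make the DSF idea concrete, the following is a minimal sketch of a Rasch-type choice rule for an MFC pair with group-specific statement utilities; the notation is illustrative and not necessarily the authors' exact parameterization.

```latex
% Person n prefers statement i over statement k with probability
\[
P(i \succ k \mid \boldsymbol{\theta}_n)
  = \frac{\exp(u_{ni})}{\exp(u_{ni}) + \exp(u_{nk})},
\qquad
u_{ni} = \theta_{n,d(i)} + \delta_i^{(g)},
\]
% where d(i) indexes the trait measured by statement i and \delta_i^{(g)} is
% the utility of statement i in group g. Statement i exhibits DSF when
\[
\delta_i^{(\mathrm{reference})} \neq \delta_i^{(\mathrm{focal})}.
\]
```

Under this reading, the anchor-based CS method fixes the utilities of one or more DSF-free statements to put the groups on a common scale before testing the remaining statements.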

Assessment
2019
Vol 27 (4)
pp. 706-718
Author(s):
Kate E. Walton
Lina Cherkasova
Richard D. Roberts

Forced-choice (FC) measures may be a desirable alternative to single-stimulus (SS) Likert items, which are easier to fake and can carry associated response biases. However, classical methods of scoring FC measures lead to ipsative data, which have a number of psychometric problems. A Thurstonian item response theory (TIRT) model has been introduced as a way to overcome these issues, but few empirical validity studies have been conducted to ensure its effectiveness. This was the goal of the current three studies, which used FC measures of domains from popular personality frameworks, including the Big Five and HEXACO, with both statement and adjective item stems. We computed TIRT and ipsative scores and compared their validity estimates. Convergent and discriminant validity of the scores were evaluated by correlating them with SS scores, and test-criterion validity evidence was evaluated by examining their relationships with meaningful outcomes. In all three studies, there was evidence for the convergent and test-criterion validity of the TIRT scores, though at times it was merely on par with the validity of the ipsative scores. The discriminant validity of the TIRT scores was problematic and often worse than that of the ipsative scores.
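As a rough illustration of the convergent/discriminant logic (not the authors' analysis), the Python sketch below builds the cross-method correlation matrix on synthetic trait scores: convergent validity sits on the diagonal (same trait, different method) and discriminant validity in the off-diagonal cells.

```python
# Synthetic convergent/discriminant check: correlate forced-choice (TIRT)
# trait scores with single-stimulus (SS) scores on the same five traits.
# All scores here are simulated; variable names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_traits = 500, 5

true_theta = rng.normal(size=(n_persons, n_traits))
tirt_scores = true_theta + rng.normal(scale=0.6, size=true_theta.shape)
ss_scores = true_theta + rng.normal(scale=0.6, size=true_theta.shape)

# Cross-method block of the correlation matrix: rows = TIRT, columns = SS.
r = np.corrcoef(tirt_scores.T, ss_scores.T)[:n_traits, n_traits:]

convergent = np.diag(r)                          # same trait across methods
discriminant = r[~np.eye(n_traits, dtype=bool)]  # different traits

print("mean convergent r:    ", convergent.mean().round(2))
print("mean |discriminant| r:", np.abs(discriminant).mean().round(2))
```

High diagonal values alongside low off-diagonal values would support both forms of validity; the studies above found the first pattern more consistently than the second.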


2017
Vol 41 (8)
pp. 600-613
Author(s):
Wen-Chung Wang
Xue-Lan Qiu
Chia-Wen Chen
Sage Ro
Kuan-Yu Jin

There is re-emerging interest in adopting forced-choice items to address response bias in Likert-type items measuring noncognitive latent traits. Multidimensional pairwise comparison (MPC) items are commonly used forced-choice items. However, few studies have aimed to develop item response theory models for MPC items, owing to the challenges associated with ipsativity. Acknowledging that the absolute scales of latent traits are not identifiable in ipsative tests, this study developed a Rasch ipsative model for MPC items that has desirable measurement properties, yields a single utility value for each statement, and allows for comparing psychological differentiation between and within individuals. Simulation results showed good parameter recovery for the new model with existing computer programs. This article also provides an empirical example of an ipsative test on work styles and behaviors.
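The choice rule behind such a model can be simulated in a few lines. The sketch below uses an assumed logistic parameterization in the utility difference; it is not the authors' estimation code.

```python
# Simulate responses to one MPC item pairing a statement on trait 0 with a
# statement on trait 1 under a Rasch-type choice rule. Parameter values are
# arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(1)
n_persons = 1000

theta = rng.normal(size=(n_persons, 2))  # latent traits for two dimensions
delta = np.array([0.5, -0.2])            # statement utilities (intercepts)

u_i = theta[:, 0] + delta[0]             # utility of statement i per person
u_k = theta[:, 1] + delta[1]             # utility of statement k per person
p_prefer_i = 1.0 / (1.0 + np.exp(-(u_i - u_k)))  # logistic in the difference

responses = rng.binomial(1, p_prefer_i)  # 1 = statement i endorsed over k
print("observed preference rate for i:", responses.mean().round(3))
```

Because only the difference u_i - u_k enters the probability, a constant added to every person's trait levels cancels out, which is exactly why absolute trait scales are not identifiable in ipsative tests.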


2014
Vol 22 (2)
pp. 323-341
Author(s):
Dheeraj Raju
Xiaogang Su
Patricia A. Patrician

Background and Purpose: The purpose of this article is to introduce different types of item response theory models and to demonstrate their usefulness by evaluating the Practice Environment Scale. Methods: Item response theory models such as the constrained and unconstrained graded response models, the partial credit model, the Rasch model, and the one-parameter logistic model are demonstrated. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are used as model selection criteria. Results: The unconstrained graded response and partial credit models showed the best fit to the data. Almost all items in the instrument performed well. Conclusions: Although most of the items strongly measure the construct, a few items could be eliminated without substantially altering the instrument. The analysis revealed that the instrument may function differently when administered to different unit types.
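The AIC/BIC comparison itself is mechanical once each model's maximized log-likelihood and parameter count are known. The sketch below uses placeholder numbers, not results from the Practice Environment Scale analysis.

```python
# Compare candidate IRT models by AIC and BIC; smaller is better for both.
# Log-likelihoods and parameter counts below are placeholders.
import math

def aic(log_lik: float, k: int) -> float:
    return 2 * k - 2 * log_lik

def bic(log_lik: float, k: int, n: int) -> float:
    return k * math.log(n) - 2 * log_lik

n = 1000  # number of respondents
models = {  # name: (maximized log-likelihood, free parameters)
    "constrained GRM":   (-14520.0, 125),
    "unconstrained GRM": (-14380.0, 155),
    "partial credit":    (-14410.0, 124),
}

for name, (ll, k) in models.items():
    print(f"{name:18s} AIC = {aic(ll, k):9.1f}  BIC = {bic(ll, k, n):9.1f}")
```

Because BIC penalizes the unconstrained model's extra discrimination parameters more heavily than AIC does, the two criteria can disagree, which is why reporting both is common practice.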


2017
Vol 6 (4)
pp. 113
Author(s):
Esin Yilmaz Kogar
Hülya Kelecioglu

The purpose of this research is first to estimate the item and ability parameters, and the standard errors of those parameters, obtained from unidimensional item response theory (UIRT), bifactor (BIF), and testlet response theory (TRT) models in tests containing testlets, as the number of testlets, the number of independent items, and the sample size change, and then to compare the results. The PISA 2012 mathematics test served as the data source, and its 36 items were used to build six data sets containing different numbers of testlets and independent items. From these data sets, three sample sizes of 250, 500, and 1,000 persons were drawn at random. The findings showed that the lowest mean error values were generally those obtained from UIRT, and that TRT yielded lower mean estimation error than BIF. Under all conditions, models that account for local dependence fit the data better than UIRT; there was generally no meaningful difference between BIF and TRT, so both models can be used for these data sets. When a meaningful difference between the two models did appear, BIF generally yielded the better result. In addition, for each sample size and data set, the correlations among the item and ability parameter estimates, and among their standard errors, were generally high.
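The relationship between the two multidimensional models compared here can be written out directly. In illustrative 2PL form (notation assumed, not the authors'), TRT is the bifactor model with the testlet-specific slope constrained to equal the general slope, which helps explain why the two often fit comparably.

```latex
% Bifactor (BIF): item j loads freely on the general trait and on its
% testlet-specific factor:
\[
\operatorname{logit} P(X_{nj} = 1)
  = a_j \theta_n + a_j^{*} \gamma_{n, t(j)} - b_j.
\]
% Testlet response theory (TRT): the testlet effect \gamma enters with the
% same slope as the general trait, i.e., a constrained bifactor model:
\[
\operatorname{logit} P(X_{nj} = 1)
  = a_j \bigl( \theta_n + \gamma_{n, t(j)} \bigr) - b_j,
\]
% where t(j) is the testlet containing item j.
```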

