A Bayesian Random Block Item Response Theory Model for Forced-Choice Formats

2019
Vol 80 (3)
pp. 578-603
Author(s):
HyeSun Lee
Weldon Z. Smith

Based on the framework of testlet models, the current study proposes the Bayesian random block item response theory (BRB IRT) model for forced-choice formats in which an item block is composed of three or more items. To account for local dependence among items within a block, the BRB IRT model incorporates a random block effect into the response function and uses a Markov chain Monte Carlo procedure for simultaneous estimation of item and trait parameters. The simulation results demonstrated that the BRB IRT model performed well in estimating item and trait parameters and in screening respondents with relatively low scores on target traits. As found in the literature, the composition of item blocks was crucial for model performance; negatively keyed items were required within item blocks. The empirical application showed that the performance of the BRB IRT model was equivalent to that of the Thurstonian IRT model. The potential advantage of the BRB IRT model as a base for more complex measurement models was also demonstrated by incorporating gender as a covariate to explain response probabilities. Recommendations for the adoption of forced-choice formats are provided along with a discussion of using negatively keyed items.
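
For intuition, a testlet-style response function with a random block effect can be sketched as follows, where item i sits in block d(i) and the block effect γ absorbs the local dependence among items in that block (a schematic under standard testlet-model assumptions, not necessarily the authors' exact parameterization):

\[ P(y_{ij} = 1 \mid \theta_j, \gamma_{j d(i)}) = \frac{\exp\{a_i(\theta_j - b_i + \gamma_{j d(i)})\}}{1 + \exp\{a_i(\theta_j - b_i + \gamma_{j d(i)})\}}, \qquad \gamma_{j d(i)} \sim N(0, \sigma^2_{d(i)}). \]

Sampling γ alongside the item and trait parameters in the MCMC run is what lets the within-block dependence be respected during simultaneous estimation.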

Assessment
2019
Vol 27 (4)
pp. 706-718
Author(s):
Kate E. Walton
Lina Cherkasova
Richard D. Roberts

Forced-choice (FC) measures may be a desirable alternative to single-stimulus (SS) Likert items, which are easier to fake and can carry associated response biases. However, classical methods of scoring FC measures yield ipsative data, which have a number of psychometric problems. The Thurstonian item response theory (TIRT) model has been introduced as a way to overcome these issues, but few empirical validity studies have been conducted to establish its effectiveness. That was the goal of the current three studies, which used FC measures of domains from popular personality frameworks, including the Big Five and HEXACO, and both statement and adjective item stems. We computed TIRT and ipsative scores and compared their validity estimates. Convergent and discriminant validity of the scores were evaluated by correlating them with SS scores, and test-criterion validity evidence was evaluated by examining their relationships with meaningful outcomes. In all three studies, there was evidence for the convergent and test-criterion validity of the TIRT scores, though at times this was only on par with the validity of the ipsative scores. The discriminant validity of the TIRT scores was problematic and often worse than that of the ipsative scores.
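
As background on what the TIRT scores estimate: the Thurstonian IRT model treats each forced choice as a comparison of latent item utilities, so for items i and k measuring traits a(i) and a(k) the preference probability has the schematic form below (the standard formulation in outline, not tied to the specific instruments in these studies):

\[ t_i = \mu_i + \lambda_i \eta_{a(i)} + \varepsilon_i, \qquad P(i \succ k \mid \eta_{a(i)}, \eta_{a(k)}) = \Phi\!\left(\frac{(\mu_i - \mu_k) + \lambda_i \eta_{a(i)} - \lambda_k \eta_{a(k)}}{\sqrt{\psi_i^2 + \psi_k^2}}\right). \]

Scoring the latent traits η from these binary comparisons is what frees TIRT scores from the fixed-sum constraint that makes classically scored FC data ipsative.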


2020
Vol 7 (1)
pp. 61-70
Author(s):
Dinar Pratama

The main aim of this study was to analyze and describe the specific characteristics of a teacher-made Akidah Akhlak test using the Rasch model of item response theory (IRT). This is a descriptive quantitative study. The subjects were 67 student response patterns to a test with five answer alternatives. The teacher-made test instrument was obtained, via documentation techniques, from the end-of-semester examination of the 2018/2019 school year. Quantitative data analysis was conducted with the Rasch IRT model using the QUEST software. Based on the analysis, 28 of the 30 items fit the Rasch model with OUTFIT t values ≤ 2.00. In terms of item difficulty, 7 items (25%) fell into the very difficult category, 6 items (21.4%) were difficult, 2 items (7.14%) were moderate, 13 items (46.4%) were easy, and 0% were very easy. Item difficulty values ranged from -2.94 to 4.18. The reliability of item estimates was 0.94 (very good), and the reliability of case estimates was 0.38 (weak). Based on the reliability of case estimates, the test needs to be revised to better match the ability of the test takers. Keywords: Test, Item Response Theory, Rasch Model
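
For reference, the dichotomous Rasch model used here gives the probability of a correct response as a function of the difference between person ability θ and item difficulty b; items were retained when the QUEST OUTFIT t statistic did not exceed 2.00:

\[ P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}. \]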


2018
Vol 79 (3)
pp. 462-494
Author(s):
Ken A. Fujimoto

Advancements in item response theory (IRT) have led to models for dual dependence, which control for cluster and method effects during a psychometric analysis. Currently, however, this class of models does not include one for situations in which the method effects stem from two method sources and one source functions differently across the levels of the other (i.e., a nested method-source interaction). This study therefore proposes a Bayesian IRT model that accounts for such interaction among method sources while controlling for the clustering of individuals within the sample. The proposed model accomplishes these tasks by specifying a multilevel trifactor structure for the latent trait space. Reported simulations demonstrate that the model can identify when item response data represent a multilevel trifactor structure, and it does so with samples as small as 250 cases nested within 50 clusters. The simulations also show that misleading estimates of the item discriminations can arise when the trifactor structure reflected in the data is not correctly accounted for. The utility of the model is further illustrated through the analysis of empirical data.
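
As a rough schematic only (not the author's exact specification), a multilevel trifactor decomposition of the latent response of person p to item i might separate a general trait, factors for the two method sources, a factor for their nested interaction, and a cluster-level term:

\[ y^{*}_{pi} = a^{G}_{i}\theta_{p} + a^{S}_{i}\zeta_{p,s(i)} + a^{T}_{i}\xi_{p,t(i)} + a^{ST}_{i}\nu_{p,s(i)t(i)} + u_{c(p)} - b_i + \varepsilon_{pi}, \]

where s(i) and t(i) index the two method sources touching item i and c(p) is the cluster containing person p. Dropping the interaction term ν collapses this sketch to a conventional dual-dependence structure, illustrating the kind of misspecification that the abstract reports can distort the discrimination estimates.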


2021
pp. 073428292110037
Author(s):
Carlos Calderón Carvajal
Carmen Ximénez Gómez
Siu Lay-Lisboa
Mauricio Briceño

Kolb’s Learning Style Inventory (LSI) continues to generate considerable debate among researchers, given the contradictory evidence on its psychometric properties. One primary criticism concerns the artificiality of the results derived from its internal structure, a consequence of the ipsative nature of the forced-choice format. This study seeks to contribute to resolving this debate. A short version of Kolb’s LSI in a forced-choice format, along with an additional inventory scored on a Likert scale, was completed by a sample of students at the Universidad Católica del Norte in Antofagasta, Chile. The data obtained from the two forms of the reduced LSI were compared using principal component analysis, confirmatory factor analysis, and the Thurstonian item response theory model. The results support the hypothesis of four learning-mode dimensions. However, they do not support the existence of the learning styles proposed by Kolb, indicating that such styles are an artifact of the structure generated by the ipsative forced-choice format.


Psychometrika
2021
Author(s):
Susanne Frick

The multidimensional forced-choice (MFC) format has been proposed to reduce faking because items within blocks can be matched on desirability. However, the desirability of individual items might not transfer to the item blocks. The aim of this paper is to propose a mixture item response theory model for faking in the MFC format, termed the Faking Mixture model, that allows estimation of the fakability of MFC blocks. Given current computing capabilities, within-subject data from both high- and low-stakes contexts are needed to estimate the model. A simulation showed good parameter recovery under various conditions. An empirical validation showed that matching was necessary but not sufficient to create an MFC questionnaire that can reduce faking. The Faking Mixture model can be used to reduce fakability during test construction.
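
For intuition only, a mixture of this general kind marginalizes each block response over a faking class and an honest-responding class (a generic two-class sketch, not the specific Faking Mixture parameterization):

\[ P(y_{jb}) = \pi_{jb}\,P(y_{jb} \mid \text{faking}) + (1 - \pi_{jb})\,P(y_{jb} \mid \text{honest}), \]

where π_{jb} is the probability that respondent j fakes on block b; block-level estimates of this kind are what make it possible to flag fakable blocks during test construction.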


2012
Vol 40 (10)
pp. 1679-1694
Author(s):
Wen-Wei Liao
Rong-Guey Ho
Yung-Chin Yen
Hsu-Chen Cheng

In computerized adaptive testing (CAT), aberrant responses such as careless errors and lucky guesses may cause significant ability estimation biases in the dynamic administration of test items. We investigated the robustness of the 4-parameter logistic item response theory (4PL IRT; Barton & Lord, 1981) model in comparison with the 3-parameter logistic (3PL) IRT model (Birnbaum, 1968). We applied additional precision and efficiency measures to evaluate the 4PL IRT model. We measured the precision of CAT in terms of the estimation bias and the mean absolute difference (MAD) between estimated and actual abilities. An improvement in administrative efficiency is reflected in fewer items being required to satisfy the stopping rule. Our results indicate that the 4PL IRT model provides more efficient and robust ability estimation than the 3PL model.
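
For reference, the 4PL model extends the 3PL by adding an item-specific upper asymptote d_i < 1, so that even high-ability examinees retain a nonzero probability of an incorrect response, which dampens the influence of careless errors on the ability estimate:

\[ P_i(\theta) = c_i + (d_i - c_i)\,\frac{1}{1 + \exp\{-a_i(\theta - b_i)\}}, \]

with a_i, b_i, c_i, and d_i the discrimination, difficulty, lower asymptote (guessing), and upper asymptote; setting d_i = 1 recovers the 3PL model.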


2020
pp. 109442812095982
Author(s):
Philseok Lee
Seang-Hwane Joo
Stephen Stark

Although modern item response theory (IRT) methods of test construction and scoring have overcome the ipsativity problems historically associated with multidimensional forced-choice (MFC) formats, there has been little research on MFC differential item functioning (DIF) detection, where item refers to a block, or group, of statements presented for an examinee’s consideration. This research investigated DIF detection with three-alternative MFC items based on the Thurstonian IRT (TIRT) model, using omnibus Wald tests on loadings and thresholds. We examined constrained and free baseline model comparison strategies with different types and magnitudes of DIF, latent trait correlations, sample sizes, and levels of impact in an extensive Monte Carlo study. Results indicated the free baseline strategy was highly effective in detecting DIF, with power approaching 1.0 in the large-sample, large-DIF conditions, and similar effectiveness in the impact and no-impact conditions. This research also included an empirical example demonstrating the viability of the best-performing method with real examinees and showed how DIF and DTF effect size measures can be used to assess the practical significance of MFC DIF findings.
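
As an illustration of the omnibus Wald test behind this kind of comparison, the statistic contrasts reference- and focal-group estimates of a block's loadings and thresholds against the covariance of their difference and refers the result to a chi-square distribution. The sketch below uses hypothetical inputs and a hypothetical helper name (omnibus_wald_dif); it is not the authors' estimation code:

import numpy as np
from scipy import stats

def omnibus_wald_dif(est_ref, est_focal, cov_diff):
    # est_ref, est_focal: loading/threshold estimates for the reference and
    # focal groups; cov_diff: covariance matrix of the estimate difference.
    d = np.asarray(est_ref, dtype=float) - np.asarray(est_focal, dtype=float)
    w = float(d @ np.linalg.solve(np.asarray(cov_diff, dtype=float), d))
    df = d.size
    return w, stats.chi2.sf(w, df)  # Wald statistic and chi-square p-value

# Hypothetical estimates for one three-alternative MFC block
w, p = omnibus_wald_dif([0.9, 0.7, 1.1, -0.2],
                        [0.8, 0.6, 1.0, 0.1],
                        np.diag([0.01, 0.01, 0.01, 0.02]))

In the free-baseline strategy, only a designated anchor block is constrained equal across groups, the remaining blocks are freely estimated, and a significant Wald statistic on a studied block's loadings and thresholds flags DIF.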


2021
pp. 109442812110506
Author(s):
Seang-Hwane Joo
Philseok Lee
Jung Yeon Park
Stephen Stark

Although the use of ideal point item response theory (IRT) models for organizational research has increased over the last decade, the assessment of construct dimensionality of ideal point scales has been overlooked in previous research. In this study, we developed and evaluated dimensionality assessment methods for an ideal point IRT model under the Bayesian framework. We applied the posterior predictive model checking (PPMC) approach to the most widely used ideal point IRT model, the generalized graded unfolding model (GGUM). We conducted a Monte Carlo simulation to compare the performance of item pair discrepancy statistics and to evaluate the Type I error and power rates of the methods. The simulation results indicated that the Bayesian dimensionality detection method controlled Type I errors reasonably well across the conditions. In addition, the proposed method showed better performance than existing methods, yielding acceptable power when 20% of the items were generated from the secondary dimension. Organizational implications and limitations of the study are further discussed.
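
As a schematic of how posterior predictive model checking (PPMC) operates in this setting: each posterior draw is used to simulate a replicated data set, a discrepancy statistic (a stand-in here for the item-pair statistics compared in the study) is computed for the observed and replicated data, and the posterior predictive p-value is the share of draws in which the replicated discrepancy meets or exceeds the observed one. The helper functions below are hypothetical placeholders, not the authors' implementation:

import numpy as np

def ppmc_pvalue(observed, posterior_draws, simulate_data, discrepancy):
    # observed: observed response matrix
    # posterior_draws: list of GGUM parameter draws from the posterior
    # simulate_data(params): returns a replicated response matrix (placeholder)
    # discrepancy(data, params): scalar item-pair discrepancy (placeholder)
    exceed = 0
    for params in posterior_draws:
        replicated = simulate_data(params)
        if discrepancy(replicated, params) >= discrepancy(observed, params):
            exceed += 1
    # p-values near 0 or 1 indicate the fitted unidimensional GGUM misfits the data
    return exceed / len(posterior_draws)

Extreme p-values of this kind are how such a check gains power when, as in the reported simulations, a share of the items is generated from a secondary dimension.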

