The Four-Parameter Logistic Item Response Theory Model As a Robust Method of Estimating Ability Despite Aberrant Responses

2012 ◽  
Vol 40 (10) ◽  
pp. 1679-1694 ◽  
Author(s):  
Wen-Wei Liao ◽  
Rong-Guey Ho ◽  
Yung-Chin Yen ◽  
Hsu-Chen Cheng

In computerized adaptive testing (CAT), aberrant responses such as careless errors and lucky guesses can introduce substantial bias into ability estimates during the dynamic administration of test items. We investigated the robustness of the 4-parameter logistic item response theory (4PL IRT; Barton & Lord, 1981) model in comparison with the 3-parameter logistic (3PL) IRT model (Birnbaum, 1968), evaluating the 4PL IRT model with additional precision and efficiency measures. Precision was measured by the estimation bias and the mean absolute difference (MAD) between estimated and actual abilities; improved administrative efficiency is reflected in fewer items being required to satisfy the stopping rule. Our results indicate that the 4PL IRT model provides more efficient and more robust ability estimation than the 3PL model.
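For reference, the 4PL model extends the 3PL model with an upper asymptote below 1, so that even highly able examinees retain a nonzero chance of missing an item (a careless error). In conventional notation (not reproduced from the article itself), the item response function is

```latex
P(X_{ij} = 1 \mid \theta_j) \;=\; c_i + (d_i - c_i)\,\frac{1}{1 + \exp\!\left[-a_i(\theta_j - b_i)\right]},
```

where a_i, b_i, and c_i are the discrimination, difficulty, and guessing (lower-asymptote) parameters and d_i ≤ 1 is the upper asymptote; the 3PL model is the special case d_i = 1.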

2020 ◽  
Vol 7 (1) ◽  
pp. 61-70
Author(s):  
Dinar Pratama

The main purpose of this study was to analyze and describe the specific characteristics of a teacher-made Akidah Akhlak (Islamic creed and morals) test using the Rasch model within the Item Response Theory (IRT) framework. The study used a descriptive quantitative design. The subjects were 67 student response patterns to a test with five answer alternatives. The teacher-made test was obtained, through documentation, from the end-of-semester examination for the 2018/2019 school year. The quantitative data were analyzed with the Rasch IRT model using the QUEST software. Based on the analysis, 28 of the 30 items fit the Rasch model, with OUTFIT t values ≤ 2.00. In terms of item difficulty, 7 items (25%) were very difficult, 6 items (21.4%) were difficult, 2 items (7.14%) were of medium difficulty, 13 items (46.4%) were easy, and none (0%) were very easy. Item difficulty values ranged from -2.94 to 4.18. The reliability of item estimates was 0.94 (very good), while the reliability of case estimates was 0.38 (weak). Given the low reliability of the case estimates, the test needs to be revised to better match the ability of the test takers. Keywords: Test, Item Response Theory, Rasch Model
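For context, the dichotomous Rasch model used in this analysis expresses the probability of a correct response solely through the difference between person ability and item difficulty; in standard notation (not taken from the article),

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) \;=\; \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
```

where θ_n is the ability of person n and b_i the difficulty of item i. Item fit is then judged against this model, here with the criterion OUTFIT t ≤ 2.00.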


2019 ◽  
Vol 80 (3) ◽  
pp. 578-603
Author(s):  
HyeSun Lee ◽  
Weldon Z. Smith

Based on the framework of testlet models, the current study suggests the Bayesian random block item response theory (BRB IRT) model to fit forced-choice formats in which an item block is composed of three or more items. To account for local dependence among items within a block, the BRB IRT model incorporates a random block effect into the response function and uses a Markov chain Monte Carlo procedure for simultaneous estimation of item and trait parameters. The simulation results demonstrated that the BRB IRT model performed well in estimating item and trait parameters and in screening respondents with relatively low scores on the target traits. As found in the literature, the composition of item blocks was crucial for model performance; negatively keyed items were required within item blocks. The empirical application showed that the performance of the BRB IRT model was equivalent to that of the Thurstonian IRT model. The potential advantage of the BRB IRT model as a base for more complex measurement models was also demonstrated by incorporating gender as a covariate to explain response probabilities. Recommendations for adopting forced-choice formats are provided, along with a discussion of the use of negatively keyed items.
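The abstract does not give the exact response function, but since the BRB IRT model is built on the testlet-model framework, one plausible reading of "incorporating a random block effect" is a person-by-block term added inside the logit, for example

```latex
\operatorname{logit}\, P(X_{ijb} = 1) \;=\; a_i\,(\theta_j - b_i) + \gamma_{jb}, \qquad \gamma_{jb} \sim N(0, \sigma_b^2),
```

where γ_jb captures the local dependence shared by person j's responses within block b. This is only an illustration of the general idea; the published parameterization for forced-choice blocks of three or more items should be consulted for the actual model.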


2018 ◽  
Vol 79 (3) ◽  
pp. 462-494 ◽  
Author(s):  
Ken A. Fujimoto

Advancements in item response theory (IRT) have led to models for dual dependence, which control for cluster and method effects during a psychometric analysis. Currently, however, this class of models does not include one that accounts for method effects stemming from two method sources in which one source functions differently across the levels of the other (i.e., a nested method–source interaction). This study therefore proposes a Bayesian IRT model that accounts for such interaction among method sources while controlling for the clustering of individuals within the sample. The proposed model accomplishes these tasks by specifying a multilevel trifactor structure for the latent trait space. Reported simulations demonstrate that the model can identify when item response data reflect a multilevel trifactor structure, and that it does so in data from samples as small as 250 cases nested within 50 clusters. The simulations also show that misleading estimates of the item discriminations can arise when the trifactor structure reflected in the data is not correctly accounted for. The utility of the model is further illustrated through the analysis of empirical data.


2004 ◽  
Vol 29 (4) ◽  
pp. 439-460 ◽  
Author(s):  
Daniel O. Segall

A new sharing item response theory (SIRT) model is presented that explicitly models the effects of sharing item content between informants and test takers. This model is used to construct adaptive item selection and scoring rules that provide increased precision and reduced score gains in instances where sharing occurs. The adaptive item selection rules are expressed as functions of the item’s exposure rate in addition to other commonly used properties (characterized by difficulty, discrimination, and guessing parameters). Based on the results of simulated item responses, the new item selection and scoring algorithms compare favorably to the Sympson–Hetter exposure control method. The new SIRT approach provides higher reliability and lower score gains in instances where sharing occurs.
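The published SIRT selection rule is not reproduced in the abstract; the Python sketch below only illustrates the general idea of making item selection a function of the exposure rate alongside the usual 3PL information criterion. The item pool, parameter values, and the particular exposure discount are hypothetical, not Segall's formula.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def fisher_info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return (a ** 2) * (q / p) * ((p - c) / (1.0 - c)) ** 2

def select_item(theta_hat, a, b, c, exposure_rate, administered, penalty=2.0):
    """Pick the unadministered item maximizing information discounted by exposure.

    The (1 - exposure_rate) ** penalty discount is a hypothetical stand-in for
    the exposure-dependent selection rule described in the abstract.
    """
    info = fisher_info_3pl(theta_hat, a, b, c)
    score = info * (1.0 - exposure_rate) ** penalty  # down-weight heavily exposed items
    score = np.where(administered, -np.inf, score)   # never readminister an item
    return int(np.argmax(score))

# Usage with a small hypothetical five-item pool
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])
b = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
c = np.array([0.20, 0.20, 0.25, 0.20, 0.15])
exposure = np.array([0.90, 0.10, 0.40, 0.05, 0.20])
administered = np.zeros(5, dtype=bool)
print(select_item(0.3, a, b, c, exposure, administered))
```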


2016 ◽  
Vol 59 (2) ◽  
pp. 281-289 ◽  
Author(s):  
Guido Makransky ◽  
Philip S. Dale ◽  
Philip Havmose ◽  
Dorthe Bleses

Purpose: This study investigated the feasibility and potential validity of an item response theory (IRT)-based computerized adaptive testing (CAT) version of the MacArthur–Bates Communicative Development Inventory: Words & Sentences (CDI:WS; Fenson et al., 2007) vocabulary checklist, with the objective of reducing length while maintaining measurement precision.
Method: Parent-reported vocabulary for the American CDI:WS norming sample, consisting of 1,461 children between the ages of 16 and 30 months, was used to investigate the fit of the items to the 2-parameter logistic IRT model and to simulate CDI-CAT versions with 400, 200, 100, 50, 25, 10, and 5 items.
Results: All but 14 items fit the 2-parameter logistic IRT model, and real-data simulations of CDI-CATs with at least 50 items recovered full CDI scores with correlations over .95. Furthermore, the CDI-CATs with at least 50 items showed correlations with age and socioeconomic status similar to those of the full CDI:WS.
Conclusion: These results provide strong evidence that a CAT version of the CDI:WS has the potential to reduce length while maintaining the accuracy and precision of the full instrument.
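As a rough illustration of the kind of real-data simulation described above, the sketch below re-administers one respondent's recorded item responses adaptively under a 2-parameter logistic model, using maximum-information item selection and EAP scoring. It is a minimal sketch with hypothetical item parameters, not the authors' simulation code.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a child knowing/producing a word."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap_estimate(responses, items, a, b, grid=np.linspace(-4.0, 4.0, 81)):
    """Expected a posteriori ability estimate with a standard normal prior."""
    posterior = np.exp(-0.5 * grid ** 2)              # prior weights on the grid
    for item, x in zip(items, responses):
        p = p_2pl(grid, a[item], b[item])
        posterior *= p if x == 1 else (1.0 - p)       # multiply in the likelihood
    return float(np.sum(grid * posterior) / np.sum(posterior))

def cat_simulation(full_responses, a, b, test_length=50):
    """Re-administer one respondent's recorded answers adaptively (real-data simulation)."""
    administered, answers, theta = [], [], 0.0
    for _ in range(test_length):
        p = p_2pl(theta, a, b)
        info = a ** 2 * p * (1.0 - p)                 # 2PL item information
        info = np.where(np.isin(np.arange(len(a)), administered), -np.inf, info)
        item = int(np.argmax(info))
        administered.append(item)
        answers.append(full_responses[item])
        theta = eap_estimate(answers, administered, a, b)
    return theta

# Hypothetical 200-item pool and one simulated respondent with true ability 1.0
rng = np.random.default_rng(0)
a_par = rng.uniform(0.8, 2.0, 200)
b_par = rng.normal(0.0, 1.0, 200)
responses = (rng.random(200) < p_2pl(1.0, a_par, b_par)).astype(int)
print(cat_simulation(responses, a_par, b_par, test_length=50))
```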


2021 ◽  
pp. 109442812110506
Author(s):  
Seang-Hwane Joo ◽  
Philseok Lee ◽  
Jung Yeon Park ◽  
Stephen Stark

Although the use of ideal point item response theory (IRT) models for organizational research has increased over the last decade, the assessment of construct dimensionality of ideal point scales has been overlooked in previous research. In this study, we developed and evaluated dimensionality assessment methods for an ideal point IRT model under the Bayesian framework. We applied the posterior predictive model checking (PPMC) approach to the most widely used ideal point IRT model, the generalized graded unfolding model (GGUM). We conducted a Monte Carlo simulation to compare the performance of item pair discrepancy statistics and to evaluate the Type I error and power rates of the methods. The simulation results indicated that the Bayesian dimensionality detection method controlled Type I errors reasonably well across the conditions. In addition, the proposed method showed better performance than existing methods, yielding acceptable power when 20% of the items were generated from the secondary dimension. Organizational implications and limitations of the study are further discussed.
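The specific item pair discrepancy statistics compared in the study are not listed in the abstract; the Python sketch below only illustrates the generic PPMC logic, using a sample odds ratio for a dichotomized item pair as a stand-in discrepancy measure. Generating the replicated data sets from the fitted model's posterior (here, the GGUM) is assumed to happen elsewhere.

```python
import numpy as np

def item_pair_odds_ratio(data, i, j):
    """Sample odds ratio for (dichotomized) responses to items i and j."""
    xi, xj = data[:, i], data[:, j]
    n11 = np.sum((xi == 1) & (xj == 1)) + 0.5   # 0.5 continuity correction
    n10 = np.sum((xi == 1) & (xj == 0)) + 0.5
    n01 = np.sum((xi == 0) & (xj == 1)) + 0.5
    n00 = np.sum((xi == 0) & (xj == 0)) + 0.5
    return (n11 * n00) / (n10 * n01)

def ppmc_pvalue(observed, replicated, i, j):
    """Posterior predictive p-value for one item pair.

    `replicated` holds one simulated data set per retained posterior draw,
    generated from the fitted model; producing those draws is outside this sketch.
    """
    d_obs = item_pair_odds_ratio(observed, i, j)
    d_rep = np.array([item_pair_odds_ratio(r, i, j) for r in replicated])
    return float(np.mean(d_rep >= d_obs))
```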


2021 ◽  
pp. 001316442199841
Author(s):  
Pere J. Ferrando ◽  
David Navarro-González

Item response theory “dual” models (DMs), in which both items and individuals are viewed as sources of differential measurement error, have so far been proposed only for unidimensional measures. This article proposes two multidimensional extensions of existing DMs: the M-DTCRM (dual Thurstonian continuous response model), intended for (approximately) continuous responses, and the M-DTGRM (dual Thurstonian graded response model), intended for ordered-categorical responses (including binary). A rationale for the extension to the multiple-content-dimensions case, based on the concept of the multidimensional location index, is first proposed and discussed. The models are then described using both the factor-analytic and the item response theory parameterizations. Procedures for (a) calibrating the items, (b) scoring individuals, (c) assessing model appropriateness, and (d) assessing measurement precision are finally discussed. Simulation results suggest that the proposal is quite feasible, and an illustrative example based on personality data is also provided. The proposed models are expected to be of particular interest for multidimensional questionnaires in which the number of items per scale is not large enough to yield stable estimates when existing unidimensional DMs are fitted on a separate-scale basis.


2013 ◽  
Vol 30 (4) ◽  
pp. 479-486
Author(s):  
Odoisa Antunes de Queiroz ◽  
Ricardo Primi ◽  
Lucas de Francisco Carvalho ◽  
Sônia Regina Fiorim Enumo

Dynamic testing, with an intermediate phase of assistance, measures change between pretest and post-test on the assumption that the two share a common metric. To test this assumption, we applied item response theory (IRT) to the responses of 69 children to an adapted version of the Children's Analogical Thinking Modifiability Test, a dynamic cognitive test with 12 items (828 responses in total), in order to verify whether the original scale quantifies change in the same way as the equated scale obtained through IRT. The procedure comprised four steps: 1) anchoring the pretest and post-test items through a cognitive analysis, which identified 3 common items; 2) estimating and comparing the item difficulty parameters; 3) equating the items and estimating the ability (theta) values; and 4) comparing the scales. The original Children's Analogical Thinking Modifiability Test metric was similar to the one estimated through IRT, but it is still necessary to differentiate the difficulty of the pretest and post-test items, adjusting the instrument to samples with high and low performance.
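The abstract does not state how the pretest and post-test difficulty estimates were placed on a common metric; one standard anchor-item linking method, given here purely as an illustration, is the mean–sigma transformation

```latex
\theta^{*} = A\,\theta + B, \qquad
A = \frac{\sigma\!\left(b^{\text{post}}_{\text{anchor}}\right)}{\sigma\!\left(b^{\text{pre}}_{\text{anchor}}\right)}, \qquad
B = \mu\!\left(b^{\text{post}}_{\text{anchor}}\right) - A\,\mu\!\left(b^{\text{pre}}_{\text{anchor}}\right),
```

where b^pre_anchor and b^post_anchor are the difficulty estimates of the common (anchor) items on the two occasions.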

