The Four-Parameter Logistic Item Response Theory Model As a Robust Method of Estimating Ability Despite Aberrant Responses

2012 ◽  
Vol 40 (10) ◽  
pp. 1679-1694 ◽  
Author(s):  
Wen-Wei Liao ◽  
Rong-Guey Ho ◽  
Yung-Chin Yen ◽  
Hsu-Chen Cheng

In computerized adaptive testing (CAT), aberrant responses such as careless errors and lucky guesses can introduce substantial bias into ability estimates during the dynamic administration of test items. We investigated the robustness of the 4-parameter logistic item response theory (4PL IRT; Barton & Lord, 1981) model in comparison with the 3-parameter logistic (3PL) IRT model (Birnbaum, 1968), evaluating the 4PL IRT model with additional precision and efficiency measures. Precision was measured by the estimation bias and the mean absolute difference (MAD) between estimated and actual abilities; improved administrative efficiency is reflected in fewer items being required to satisfy the stopping rule. Our results indicate that the 4PL IRT model provides more efficient and more robust ability estimation than the 3PL model.
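For reference, the 4PL model extends the 3PL model with an upper asymptote below 1, so that even highly able examinees retain a nonzero chance of missing an item (a careless error). In conventional notation (not reproduced from the article itself), the item response function is

```latex
P(X_{ij} = 1 \mid \theta_j) \;=\; c_i + (d_i - c_i)\,\frac{1}{1 + \exp\!\left[-a_i(\theta_j - b_i)\right]},
```

where a_i, b_i, and c_i are the discrimination, difficulty, and guessing (lower-asymptote) parameters and d_i ≤ 1 is the upper asymptote; the 3PL model is the special case d_i = 1.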

2020 ◽  
Vol 7 (1) ◽  
pp. 61-70
Author(s):  
Dinar Pratama

The main purpose of this study was to analyze and describe the specific characteristics of a teacher-made Akidah Akhlak (Islamic creed and morals) test using the Rasch model within the Item Response Theory (IRT) framework. The study used a descriptive quantitative design. The subjects were 67 student response patterns to a test with five answer alternatives. The teacher-made test was obtained, through documentation, from the end-of-semester examination for the 2018/2019 school year. The quantitative data were analyzed with the Rasch IRT model using the QUEST software. Based on the analysis, 28 of the 30 items fit the Rasch model, with OUTFIT t values ≤ 2.00. In terms of item difficulty, 7 items (25%) were very difficult, 6 items (21.4%) were difficult, 2 items (7.14%) were of medium difficulty, 13 items (46.4%) were easy, and none (0%) were very easy. Item difficulty values ranged from -2.94 to 4.18. The reliability of item estimates was 0.94 (very good), while the reliability of case estimates was 0.38 (weak). Given the low reliability of the case estimates, the test needs to be revised to better match the ability of the test takers. Keywords: Test, Item Response Theory, Rasch Model
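For context, the dichotomous Rasch model used in this analysis expresses the probability of a correct response solely through the difference between person ability and item difficulty; in standard notation (not taken from the article),

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) \;=\; \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
```

where θ_n is the ability of person n and b_i the difficulty of item i. Item fit is then judged against this model, here with the criterion OUTFIT t ≤ 2.00.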


2019 ◽  
Vol 80 (3) ◽  
pp. 578-603
Author(s):  
HyeSun Lee ◽  
Weldon Z. Smith

Based on the framework of testlet models, the current study suggests the Bayesian random block item response theory (BRB IRT) model to fit forced-choice formats in which an item block is composed of three or more items. To account for local dependence among items within a block, the BRB IRT model incorporates a random block effect into the response function and uses a Markov chain Monte Carlo procedure for simultaneous estimation of item and trait parameters. The simulation results demonstrated that the BRB IRT model performed well in estimating item and trait parameters and in screening respondents with relatively low scores on the target traits. As found in the literature, the composition of item blocks was crucial for model performance; negatively keyed items were required within item blocks. The empirical application showed that the performance of the BRB IRT model was equivalent to that of the Thurstonian IRT model. The potential advantage of the BRB IRT model as a base for more complex measurement models was also demonstrated by incorporating gender as a covariate to explain response probabilities. Recommendations for adopting forced-choice formats are provided, along with a discussion of the use of negatively keyed items.
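The abstract does not give the exact response function, but since the BRB IRT model is built on the testlet-model framework, one plausible reading of "incorporating a random block effect" is a person-by-block term added inside the logit, for example

```latex
\operatorname{logit}\, P(X_{ijb} = 1) \;=\; a_i\,(\theta_j - b_i) + \gamma_{jb}, \qquad \gamma_{jb} \sim N(0, \sigma_b^2),
```

where γ_jb captures the local dependence shared by person j's responses within block b. This is only an illustration of the general idea; the published parameterization for forced-choice blocks of three or more items should be consulted for the actual model.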


2018 ◽  
Vol 79 (3) ◽  
pp. 462-494 ◽  
Author(s):  
Ken A. Fujimoto

Advancements in item response theory (IRT) have led to models for dual dependence, which control for cluster and method effects during a psychometric analysis. Currently, however, this class of models does not include one that accounts for method effects stemming from two method sources in which one source functions differently across the levels of the other (i.e., a nested method–source interaction). This study therefore proposes a Bayesian IRT model that accounts for such interaction among method sources while controlling for the clustering of individuals within the sample. The proposed model accomplishes these tasks by specifying a multilevel trifactor structure for the latent trait space. Reported simulations demonstrate that the model can identify when item response data reflect a multilevel trifactor structure, and that it does so in data from samples as small as 250 cases nested within 50 clusters. The simulations also show that misleading estimates of the item discriminations can arise when the trifactor structure reflected in the data is not correctly accounted for. The utility of the model is further illustrated through the analysis of empirical data.


2004 ◽  
Vol 29 (4) ◽  
pp. 439-460 ◽  
Author(s):  
Daniel O. Segall

A new sharing item response theory (SIRT) model is presented that explicitly models the effects of sharing item content between informants and test takers. This model is used to construct adaptive item selection and scoring rules that provide increased precision and reduced score gains in instances where sharing occurs. The adaptive item selection rules are expressed as functions of the item’s exposure rate in addition to other commonly used properties (characterized by difficulty, discrimination, and guessing parameters). Based on the results of simulated item responses, the new item selection and scoring algorithms compare favorably to the Sympson–Hetter exposure control method. The new SIRT approach provides higher reliability and lower score gains in instances where sharing occurs.
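The published SIRT selection rule is not reproduced in the abstract; the Python sketch below only illustrates the general idea of making item selection a function of the exposure rate alongside the usual 3PL information criterion. The item pool, parameter values, and the particular exposure discount are hypothetical, not Segall's formula.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def fisher_info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return (a ** 2) * (q / p) * ((p - c) / (1.0 - c)) ** 2

def select_item(theta_hat, a, b, c, exposure_rate, administered, penalty=2.0):
    """Pick the unadministered item maximizing information discounted by exposure.

    The (1 - exposure_rate) ** penalty discount is a hypothetical stand-in for
    the exposure-dependent selection rule described in the abstract.
    """
    info = fisher_info_3pl(theta_hat, a, b, c)
    score = info * (1.0 - exposure_rate) ** penalty  # down-weight heavily exposed items
    score = np.where(administered, -np.inf, score)   # never readminister an item
    return int(np.argmax(score))

# Usage with a small hypothetical five-item pool
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])
b = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
c = np.array([0.20, 0.20, 0.25, 0.20, 0.15])
exposure = np.array([0.90, 0.10, 0.40, 0.05, 0.20])
administered = np.zeros(5, dtype=bool)
print(select_item(0.3, a, b, c, exposure, administered))
```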


2016 ◽  
Vol 59 (2) ◽  
pp. 281-289 ◽  
Author(s):  
Guido Makransky ◽  
Philip S. Dale ◽  
Philip Havmose ◽  
Dorthe Bleses

Purpose: This study investigated the feasibility and potential validity of an item response theory (IRT)-based computerized adaptive testing (CAT) version of the MacArthur–Bates Communicative Development Inventory: Words & Sentences (CDI:WS; Fenson et al., 2007) vocabulary checklist, with the objective of reducing length while maintaining measurement precision.
Method: Parent-reported vocabulary for the American CDI:WS norming sample, consisting of 1,461 children between the ages of 16 and 30 months, was used to investigate the fit of the items to the 2-parameter logistic IRT model and to simulate CDI-CAT versions with 400, 200, 100, 50, 25, 10, and 5 items.
Results: All but 14 items fit the 2-parameter logistic IRT model, and real-data simulations of CDI-CATs with at least 50 items recovered full CDI scores with correlations over .95. Furthermore, the CDI-CATs with at least 50 items showed correlations with age and socioeconomic status similar to those of the full CDI:WS.
Conclusion: These results provide strong evidence that a CAT version of the CDI:WS has the potential to reduce length while maintaining the accuracy and precision of the full instrument.
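As a rough illustration of the kind of real-data simulation described above, the sketch below re-administers one respondent's recorded item responses adaptively under a 2-parameter logistic model, using maximum-information item selection and EAP scoring. It is a minimal sketch with hypothetical item parameters, not the authors' simulation code.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a child knowing/producing a word."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap_estimate(responses, items, a, b, grid=np.linspace(-4.0, 4.0, 81)):
    """Expected a posteriori ability estimate with a standard normal prior."""
    posterior = np.exp(-0.5 * grid ** 2)              # prior weights on the grid
    for item, x in zip(items, responses):
        p = p_2pl(grid, a[item], b[item])
        posterior *= p if x == 1 else (1.0 - p)       # multiply in the likelihood
    return float(np.sum(grid * posterior) / np.sum(posterior))

def cat_simulation(full_responses, a, b, test_length=50):
    """Re-administer one respondent's recorded answers adaptively (real-data simulation)."""
    administered, answers, theta = [], [], 0.0
    for _ in range(test_length):
        p = p_2pl(theta, a, b)
        info = a ** 2 * p * (1.0 - p)                 # 2PL item information
        info = np.where(np.isin(np.arange(len(a)), administered), -np.inf, info)
        item = int(np.argmax(info))
        administered.append(item)
        answers.append(full_responses[item])
        theta = eap_estimate(answers, administered, a, b)
    return theta

# Hypothetical 200-item pool and one simulated respondent with true ability 1.0
rng = np.random.default_rng(0)
a_par = rng.uniform(0.8, 2.0, 200)
b_par = rng.normal(0.0, 1.0, 200)
responses = (rng.random(200) < p_2pl(1.0, a_par, b_par)).astype(int)
print(cat_simulation(responses, a_par, b_par, test_length=50))
```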


2021 ◽  
pp. 109442812110506
Author(s):  
Seang-Hwane Joo ◽  
Philseok Lee ◽  
Jung Yeon Park ◽  
Stephen Stark

Although the use of ideal point item response theory (IRT) models for organizational research has increased over the last decade, the assessment of construct dimensionality of ideal point scales has been overlooked in previous research. In this study, we developed and evaluated dimensionality assessment methods for an ideal point IRT model under the Bayesian framework. We applied the posterior predictive model checking (PPMC) approach to the most widely used ideal point IRT model, the generalized graded unfolding model (GGUM). We conducted a Monte Carlo simulation to compare the performance of item pair discrepancy statistics and to evaluate the Type I error and power rates of the methods. The simulation results indicated that the Bayesian dimensionality detection method controlled Type I errors reasonably well across the conditions. In addition, the proposed method showed better performance than existing methods, yielding acceptable power when 20% of the items were generated from the secondary dimension. Organizational implications and limitations of the study are further discussed.
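The specific item pair discrepancy statistics compared in the study are not listed in the abstract; the Python sketch below only illustrates the generic PPMC logic, using a sample odds ratio for a dichotomized item pair as a stand-in discrepancy measure. Generating the replicated data sets from the fitted model's posterior (here, the GGUM) is assumed to happen elsewhere.

```python
import numpy as np

def item_pair_odds_ratio(data, i, j):
    """Sample odds ratio for (dichotomized) responses to items i and j."""
    xi, xj = data[:, i], data[:, j]
    n11 = np.sum((xi == 1) & (xj == 1)) + 0.5   # 0.5 continuity correction
    n10 = np.sum((xi == 1) & (xj == 0)) + 0.5
    n01 = np.sum((xi == 0) & (xj == 1)) + 0.5
    n00 = np.sum((xi == 0) & (xj == 0)) + 0.5
    return (n11 * n00) / (n10 * n01)

def ppmc_pvalue(observed, replicated, i, j):
    """Posterior predictive p-value for one item pair.

    `replicated` holds one simulated data set per retained posterior draw,
    generated from the fitted model; producing those draws is outside this sketch.
    """
    d_obs = item_pair_odds_ratio(observed, i, j)
    d_rep = np.array([item_pair_odds_ratio(r, i, j) for r in replicated])
    return float(np.mean(d_rep >= d_obs))
```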


2021 ◽  
pp. 001316442199841
Author(s):  
Pere J. Ferrando ◽  
David Navarro-González

Item response theory “dual” models (DMs), in which both items and individuals are viewed as sources of differential measurement error, have so far been proposed only for unidimensional measures. This article proposes two multidimensional extensions of existing DMs: the M-DTCRM (dual Thurstonian continuous response model), intended for (approximately) continuous responses, and the M-DTGRM (dual Thurstonian graded response model), intended for ordered-categorical responses (including binary). A rationale for the extension to the multiple-content-dimensions case, based on the concept of the multidimensional location index, is first proposed and discussed. The models are then described using both the factor-analytic and the item response theory parameterizations. Procedures for (a) calibrating the items, (b) scoring individuals, (c) assessing model appropriateness, and (d) assessing measurement precision are finally discussed. Simulation results suggest that the proposal is quite feasible, and an illustrative example based on personality data is also provided. The proposed models are expected to be of particular interest for multidimensional questionnaires in which the number of items per scale is not large enough to yield stable estimates when existing unidimensional DMs are fitted on a separate-scale basis.


2013 ◽  
Vol 30 (4) ◽  
pp. 479-486
Author(s):  
Odoisa Antunes de Queiroz ◽  
Ricardo Primi ◽  
Lucas de Francisco Carvalho ◽  
Sônia Regina Fiorim Enumo

Dynamic testing, with an intermediate phase of assistance, measures change between pretest and post-test on the assumption that the two share a common metric. To test this assumption, we applied item response theory (IRT) to the responses of 69 children to an adapted version of the Children's Analogical Thinking Modifiability Test, a dynamic cognitive test with 12 items (828 responses in total), in order to verify whether the original scale quantifies change in the same way as the equated scale obtained through IRT. The procedure comprised four steps: 1) anchoring the pretest and post-test items through a cognitive analysis, which identified 3 common items; 2) estimating and comparing the item difficulty parameters; 3) equating the items and estimating the ability (theta) values; and 4) comparing the scales. The original Children's Analogical Thinking Modifiability Test metric was similar to the one estimated through IRT, but it is still necessary to differentiate the difficulty of the pretest and post-test items, adjusting the instrument to samples with high and low performance.
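The abstract does not state how the pretest and post-test difficulty estimates were placed on a common metric; one standard anchor-item linking method, given here purely as an illustration, is the mean–sigma transformation

```latex
\theta^{*} = A\,\theta + B, \qquad
A = \frac{\sigma\!\left(b^{\text{post}}_{\text{anchor}}\right)}{\sigma\!\left(b^{\text{pre}}_{\text{anchor}}\right)}, \qquad
B = \mu\!\left(b^{\text{post}}_{\text{anchor}}\right) - A\,\mu\!\left(b^{\text{pre}}_{\text{anchor}}\right),
```

where b^pre_anchor and b^post_anchor are the difficulty estimates of the common (anchor) items on the two occasions.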

