Consequences of Ignoring Guessing Effects on Measurement Invariance Analysis

2021 ◽  
pp. 014662162110139
Author(s):  
Ismail Cuhadar ◽  
Yanyun Yang ◽  
Insu Paek

Pseudo-guessing parameters are present in item response theory applications for many educational assessments. When the sample size is not sufficiently large, the guessing parameters may be omitted from the analysis. This study examines the impact of ignoring pseudo-guessing parameters on measurement invariance analysis, specifically on item difficulty, item discrimination, and the mean and variance of the ability distribution. Results show that when non-zero guessing parameters are omitted from the measurement invariance analysis, item discrimination estimates tend to decrease, particularly for more difficult items, and item difficulty estimates decrease unless the items are highly discriminating and difficult. As the guessing parameter increases, the size of the decrease in item discrimination and difficulty tends to increase, and the estimated mean and variance of the ability distribution tend to be inaccurate. When two groups have heterogeneous ability distributions, ignoring the guessing parameter affects the reference group and the focal group differently. Implications of these findings are discussed.
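The contrast at issue here can be sketched as follows: the three-parameter logistic (3PL) model includes a pseudo-guessing parameter c, and dropping it reduces the model to the 2PL. The item and ability values below are illustrative only, not drawn from the study.

```python
import math

def irt_3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) item response function:
    P(correct) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irt_2pl(theta, a, b):
    """Two-parameter logistic model: the 3PL with guessing fixed at zero."""
    return irt_3pl(theta, a, b, 0.0)

# Illustrative parameters: a difficult item (b = 1.5) with guessing c = 0.2.
# For a low-ability examinee, the 2PL predicts a much lower probability of a
# correct response than the 3PL, because it ignores the guessing floor.
p3 = irt_3pl(theta=-1.0, a=1.2, b=1.5, c=0.2)
p2 = irt_2pl(theta=-1.0, a=1.2, b=1.5)
print(round(p3, 3), round(p2, 3))
```

Fitting a 2PL to data generated with c > 0 forces the discrimination and difficulty estimates to absorb the guessing floor, which is the distortion the abstract describes.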

2022 ◽  
Vol 12 ◽  
Author(s):  
Feifei Huang ◽  
Zhe Li ◽  
Ying Liu ◽  
Jingan Su ◽  
Li Yin ◽  
...  

Educational assessment tests are often constructed using testlets because of the flexibility they offer to test various aspects of cognitive activity and to sample content broadly. However, a violation of the local item independence assumption is inevitable when tests are built from testlet items. In this study, simulations are conducted to evaluate the performance of item response theory models and testlet response theory models for both dichotomous and polytomous items in the context of equating tests composed of testlets. We also examine the impact of testlet effect, length of testlet items, and sample size on the estimation of item and person parameters. The results show that more accurate performance of testlet response theory models over item response theory models was consistently observed across the studies, which supports the benefits of using testlet response theory models in equating for tests composed of testlets. Further, the results indicate that when the sample size is large, item response theory models performed similarly to testlet response theory models across all studies.


Author(s):  
Yousef A. Al Mahrouq

This study explored the effect of item difficulty and sample size on the accuracy of equating using item response theory, based on simulated data. The equating method was evaluated against a criterion equating using two measures: the standard error of equating (SEE) between the criterion scores and the equated scores, and the root mean square error of equating (RMSE). The results indicated that a large sample size reduces the standard error of equating and reduces residuals. The results also showed that conditions with different difficulty tend to produce smaller standard error and RMSE values, while conditions with similar difficulty tend to produce decreasing standard error and RMSE values.
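The RMSE criterion used above can be sketched as follows; the equated and criterion score values are hypothetical, chosen only to illustrate the computation.

```python
import math

def rmse(equated, criterion):
    """Root mean square error of equating: square root of the mean
    squared deviation of equated scores from the criterion equating."""
    n = len(equated)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(equated, criterion)) / n)

# Hypothetical equated vs. criterion score conversions at four score points.
equated = [10.2, 15.1, 20.4, 25.0]
criterion = [10.0, 15.0, 20.0, 25.2]
print(round(rmse(equated, criterion), 3))
```

A smaller RMSE means the studied equating tracks the criterion equating more closely across the score scale, which is how the comparisons in the abstract are made.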


2021 ◽  
pp. 019459982110183
Author(s):  
David T. Liu ◽  
Katie M. Philips ◽  
Marlene M. Speth ◽  
Gerold Besser ◽  
Christian A. Mueller ◽  
...  

Objective The SNOT-22 (22-item Sinonasal Outcome Test) is a high-quality outcome measure that assesses chronic rhinosinusitis–specific quality of life. The aim of this study was to gain greater insight into the information provided by the SNOT-22 by determining its item-based psychometric properties. Study Design Retrospective cohort study. Setting Tertiary care academic centers. Methods This study used a previously described data set of the SNOT-22 completed by 800 patients with chronic rhinosinusitis. Item response theory graded response models were used to determine parameters reflecting item discrimination, difficulty, and the information provided by each item toward the SNOT-22 subdomain to which it belonged. Results The unconstrained graded response model fitted the SNOT-22 data best. Item discrimination parameters and total information provided showed the greatest variability within the nasal subdomain, and the item related to sense of smell/taste demonstrated the lowest discrimination and provided the least amount of information overall. The dizziness item provided disparately lower total information and discrimination in the otologic/facial pain subdomain. Items in the sleep and emotional subdomains generally provided high discrimination. While items in the nasal, sleep, and otologic/facial pain subdomains spanned all levels of difficulty, emotional subdomain items covered higher levels of difficulty, indicating greater information provided at higher levels of disease severity. Conclusion The item-specific psychometric properties of the SNOT-22 support it as a high-quality instrument. Our results suggest the need and possibility for revision of the smell/taste dysfunction item, for example, its wording, to improve its ability to discriminate among the different levels of disease burden.
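The graded response model used in this study assigns each polytomous item a discrimination parameter and ordered category thresholds; a category's probability is the difference of adjacent cumulative 2PL curves. A minimal sketch, with illustrative parameters rather than the fitted SNOT-22 values:

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Graded response model: the cumulative probability of responding
    in category k or higher is a 2PL curve with threshold b_k; category
    probabilities are differences of adjacent cumulative curves."""
    cum = ([1.0]
           + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
           + [0.0])
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical 4-category item (scored 0-3) with ordered thresholds.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.5, 2.0])
```

Higher discrimination (a) concentrates information near the thresholds, which is why low-discrimination items such as the smell/taste item contribute little information to their subdomain.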


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yunsoo Lee ◽  
Ji Hoon Song ◽  
Soo Jung Kim

Purpose This paper aims to validate the Korean version of the decent work scale and examine the relationship between decent work and work engagement. Design/methodology/approach After completing translation and back translation, the authors surveyed 266 Korean employees from various organizations via network sampling. They applied the Rasch model, which is based on item response theory. In addition, they used classical test theory to evaluate the decent work scale’s validity and reliability. Findings The authors found that the current version of the decent work scale has good validity, reliability and item difficulty, and that decent work has a positive relationship with work engagement. However, the item response theory assessment showed that three of the items are extremely similar to another item within the same dimension, implying that these items are unable to discriminate among individual traits. Originality/value This study validated the decent work scale in a Korean work environment using Rasch’s (1960) model from the perspective of item response theory.


2017 ◽  
Vol 6 (4) ◽  
pp. 113
Author(s):  
Esin Yilmaz Kogar ◽  
Hülya Kelecioglu

The purpose of this research is first to estimate the item and ability parameters, and the standard errors of those parameters, obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF), and Testlet Response Theory (TRT) models in tests that include testlets, as the number of testlets, the number of independent items, and the sample size change, and then to compare the results. The mathematics test in PISA 2012 was employed as the data collection tool, and 36 items were used to constitute six different data sets containing different numbers of testlets and independent items. Subsequently, from these data sets, three sample sizes of 250, 500, and 1000 persons were selected randomly. The findings showed that, generally, the lowest mean error values were those obtained from UIRT, and that TRT yielded a lower mean error of estimation than BIF. Under all conditions, the models that take local dependency into account provided better model-data fit than UIRT; generally, there is no meaningful difference between BIF and TRT, and both models can be used for these data sets. When there is a meaningful difference between the two models, BIF generally yields the better result. In addition, in each sample size and data set, the correlations of the item and ability parameters, and of the errors of those parameters, are generally high.


2009 ◽  
Vol 15 (5) ◽  
pp. 758-768 ◽  
Author(s):  
OTTO PEDRAZA ◽  
NEILL R. GRAFF-RADFORD ◽  
GLENN E. SMITH ◽  
ROBERT J. IVNIK ◽  
FLOYD B. WILLIS ◽  
...  

Scores on the Boston Naming Test (BNT) are frequently lower for African American adults than for Caucasian adults. Although demographically based norms can mitigate the impact of this discrepancy on the likelihood of erroneous diagnostic impressions, a growing consensus suggests that group norms do not sufficiently address or advance our understanding of the underlying psychometric and sociocultural factors that lead to between-group score discrepancies. Using item response theory and methods to detect differential item functioning (DIF), the current investigation moves beyond comparisons of the summed total score to examine whether the conditional probability of responding correctly to individual BNT items differs between African American and Caucasian adults. Participants included 670 adults age 52 and older who took part in Mayo’s Older Americans and Older African Americans Normative Studies. Under a two-parameter logistic item response theory framework, and after correction for the false discovery rate, 12 items were shown to demonstrate DIF. Of these 12 items, 6 (“dominoes,” “escalator,” “muzzle,” “latch,” “tripod,” and “palette”) were also identified in additional analyses using hierarchical logistic regression models and represent the strongest evidence for race/ethnicity-based DIF. These findings afford a finer characterization of the psychometric properties of the BNT and expand our understanding of between-group performance. (JINS, 2009, 15, 758–768.)
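The logistic regression approach to DIF mentioned above models the probability of a correct item response from a matching score (typically the total score), group membership, and their interaction. A minimal sketch of that model specification, with illustrative coefficients rather than values estimated in the study:

```python
import math

def dif_logistic(total_score, group, b0, b1, b2, b3):
    """Logistic regression DIF model: the log-odds of a correct response
    depend on the matching score, group membership (0 = reference,
    1 = focal), and their interaction. A nonzero b2 signals uniform DIF;
    a nonzero b3 signals non-uniform DIF."""
    logit = b0 + b1 * total_score + b2 * group + b3 * total_score * group
    return 1.0 / (1.0 + math.exp(-logit))

# With b2 = b3 = 0, the item functions identically across groups when
# examinees are matched on total score: no DIF.
p_ref = dif_logistic(total_score=30, group=0, b0=-3.0, b1=0.1, b2=0.0, b3=0.0)
p_foc = dif_logistic(total_score=30, group=1, b0=-3.0, b1=0.1, b2=0.0, b3=0.0)
assert p_ref == p_foc
```

In practice the coefficients are estimated per item and DIF is flagged by comparing nested models (with and without the group terms), which is the hierarchical procedure the abstract refers to.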

