Consequences of Ignoring Guessing Effects on Measurement Invariance Analysis

2021 ◽  
pp. 014662162110139
Author(s):  
Ismail Cuhadar ◽  
Yanyun Yang ◽  
Insu Paek

Pseudo-guessing parameters are present in item response theory applications for many educational assessments. When the sample size is not sufficiently large, the guessing parameters may be omitted from the analysis. This study examines the impact of ignoring pseudo-guessing parameters on measurement invariance analysis, specifically on item difficulty, item discrimination, and the mean and variance of the ability distribution. Results show that when non-zero guessing parameters are omitted from the measurement invariance analysis, item discrimination estimates tend to decrease, particularly for more difficult items, and item difficulty estimates decrease unless the items are highly discriminating and difficult. As the guessing parameter increases, the size of the decrease in item discrimination and difficulty tends to increase, and the estimated mean and variance of the ability distribution tend to be inaccurate. When two groups have heterogeneous ability distributions, ignoring the guessing parameter affects the reference group and the focal group differently. Implications of these findings are discussed.
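The contrast at issue here can be sketched as follows: the three-parameter logistic (3PL) model includes a pseudo-guessing parameter c, and dropping it reduces the model to the 2PL. The item and ability values below are illustrative only, not drawn from the study.

```python
import math

def irt_3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) item response function:
    P(correct) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irt_2pl(theta, a, b):
    """Two-parameter logistic model: the 3PL with guessing fixed at zero."""
    return irt_3pl(theta, a, b, 0.0)

# Illustrative parameters: a difficult item (b = 1.5) with guessing c = 0.2.
# For a low-ability examinee, the 2PL predicts a much lower probability of a
# correct response than the 3PL, because it ignores the guessing floor.
p3 = irt_3pl(theta=-1.0, a=1.2, b=1.5, c=0.2)
p2 = irt_2pl(theta=-1.0, a=1.2, b=1.5)
print(round(p3, 3), round(p2, 3))
```

Fitting a 2PL to data generated with c > 0 forces the discrimination and difficulty estimates to absorb the guessing floor, which is the distortion the abstract describes.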

2022 ◽  
Vol 12 ◽  
Author(s):  
Feifei Huang ◽  
Zhe Li ◽  
Ying Liu ◽  
Jingan Su ◽  
Li Yin ◽  
...  

Educational assessment tests are often constructed using testlets because of the flexibility they offer to test various aspects of cognitive activity and to sample content broadly. However, a violation of the local item independence assumption is inevitable when tests are built from testlet items. In this study, simulations are conducted to evaluate the performance of item response theory models and testlet response theory models for both dichotomous and polytomous items in the context of equating tests composed of testlets. We also examine the impact of testlet effect, length of testlet items, and sample size on the estimation of item and person parameters. The results show that more accurate performance of testlet response theory models over item response theory models was consistently observed across the studies, which supports the benefits of using testlet response theory models in equating for tests composed of testlets. Further, the results indicate that when the sample size is large, item response theory models performed similarly to testlet response theory models across all studies.


Author(s):  
Yousef A. Al Mahrouq

This study explored the effect of item difficulty and sample size on the accuracy of equating using item response theory, based on simulated data. The equating method was evaluated against a criterion equating using two measures: the standard error of equating (SEE) between the criterion scores and the equated scores, and the root mean square error of equating (RMSE). The results indicated that a large sample size reduces the standard error of equating and reduces residuals. The results also showed that conditions with different difficulty tend to produce smaller standard error and RMSE values, while conditions with similar difficulty tend to produce decreasing standard error and RMSE values.
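The RMSE criterion used above can be sketched as follows; the equated and criterion score values are hypothetical, chosen only to illustrate the computation.

```python
import math

def rmse(equated, criterion):
    """Root mean square error of equating: square root of the mean
    squared deviation of equated scores from the criterion equating."""
    n = len(equated)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(equated, criterion)) / n)

# Hypothetical equated vs. criterion score conversions at four score points.
equated = [10.2, 15.1, 20.4, 25.0]
criterion = [10.0, 15.0, 20.0, 25.2]
print(round(rmse(equated, criterion), 3))
```

A smaller RMSE means the studied equating tracks the criterion equating more closely across the score scale, which is how the comparisons in the abstract are made.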


2021 ◽  
pp. 019459982110183
Author(s):  
David T. Liu ◽  
Katie M. Philips ◽  
Marlene M. Speth ◽  
Gerold Besser ◽  
Christian A. Mueller ◽  
...  

Objective The SNOT-22 (22-item Sinonasal Outcome Test) is a high-quality outcome measure that assesses chronic rhinosinusitis–specific quality of life. The aim of this study was to gain greater insight into the information provided by the SNOT-22 by determining its item-based psychometric properties. Study Design Retrospective cohort study. Setting Tertiary care academic centers. Methods This study used a previously described data set of the SNOT-22 completed by 800 patients with chronic rhinosinusitis. Item response theory graded response models were used to determine parameters reflecting item discrimination, difficulty, and the information provided by each item toward the SNOT-22 subdomain to which it belonged. Results The unconstrained graded response model fitted the SNOT-22 data best. Item discrimination parameters and total information provided showed the greatest variability within the nasal subdomain, and the item related to sense of smell/taste demonstrated the lowest discrimination and provided the least amount of information overall. The dizziness item provided disparately lower total information and discrimination in the otologic/facial pain subdomain. Items in the sleep and emotional subdomains generally provided high discrimination. While items in the nasal, sleep, and otologic/facial pain subdomains spanned all levels of difficulty, emotional subdomain items covered higher levels of difficulty, indicating greater information provided at higher levels of disease severity. Conclusion The item-specific psychometric properties of the SNOT-22 support it as a high-quality instrument. Our results suggest the need and possibility for revision of the smell/taste dysfunction item, for example, its wording, to improve its ability to discriminate among the different levels of disease burden.
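The graded response model used in this study assigns each polytomous item a discrimination parameter and ordered category thresholds; a category's probability is the difference of adjacent cumulative 2PL curves. A minimal sketch, with illustrative parameters rather than the fitted SNOT-22 values:

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Graded response model: the cumulative probability of responding
    in category k or higher is a 2PL curve with threshold b_k; category
    probabilities are differences of adjacent cumulative curves."""
    cum = ([1.0]
           + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
           + [0.0])
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical 4-category item (scored 0-3) with ordered thresholds.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.5, 2.0])
```

Higher discrimination (a) concentrates information near the thresholds, which is why low-discrimination items such as the smell/taste item contribute little information to their subdomain.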


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yunsoo Lee ◽  
Ji Hoon Song ◽  
Soo Jung Kim

Purpose This paper aims to validate the Korean version of the decent work scale and examine the relationship between decent work and work engagement. Design/methodology/approach After completing translation and back translation, the authors surveyed 266 Korean employees from various organizations via network sampling. They applied the Rasch model, which is based on item response theory. In addition, they used classical test theory to evaluate the decent work scale’s validity and reliability. Findings The authors found that the current version of the decent work scale has good validity, reliability and item difficulty, and that decent work has a positive relationship with work engagement. However, the item response theory assessment showed that three of the items are extremely similar to another item within the same dimension, implying that these items are unable to discriminate among individual traits. Originality/value This study validated the decent work scale in a Korean work environment using Rasch’s (1960) model from the perspective of item response theory.


2017 ◽  
Vol 6 (4) ◽  
pp. 113
Author(s):  
Esin Yilmaz Kogar ◽  
Hülya Kelecioglu

The purpose of this research is first to estimate the item and ability parameters, and the standard errors of those parameters, obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF), and Testlet Response Theory (TRT) models in tests that include testlets, as the number of testlets, the number of independent items, and the sample size change, and then to compare the results. The mathematics test in PISA 2012 was employed as the data collection tool, and 36 items were used to constitute six different data sets containing different numbers of testlets and independent items. Subsequently, from these data sets, three sample sizes of 250, 500, and 1000 persons were selected randomly. The findings showed that, generally, the lowest mean error values were those obtained from UIRT, and that TRT yielded a lower mean error of estimation than BIF. Under all conditions, the models that take local dependency into account provided better model-data fit than UIRT; generally, there is no meaningful difference between BIF and TRT, and both models can be used for these data sets. When there is a meaningful difference between the two models, BIF generally yields the better result. In addition, in each sample size and data set, the correlations of the item and ability parameters, and of the errors of those parameters, are generally high.


2009 ◽  
Vol 15 (5) ◽  
pp. 758-768 ◽  
Author(s):  
OTTO PEDRAZA ◽  
NEILL R. GRAFF-RADFORD ◽  
GLENN E. SMITH ◽  
ROBERT J. IVNIK ◽  
FLOYD B. WILLIS ◽  
...  

Scores on the Boston Naming Test (BNT) are frequently lower for African American adults than for Caucasian adults. Although demographically based norms can mitigate the impact of this discrepancy on the likelihood of erroneous diagnostic impressions, a growing consensus suggests that group norms do not sufficiently address or advance our understanding of the underlying psychometric and sociocultural factors that lead to between-group score discrepancies. Using item response theory and methods to detect differential item functioning (DIF), the current investigation moves beyond comparisons of the summed total score to examine whether the conditional probability of responding correctly to individual BNT items differs between African American and Caucasian adults. Participants included 670 adults age 52 and older who took part in Mayo’s Older Americans and Older African Americans Normative Studies. Under a two-parameter logistic item response theory framework, and after correction for the false discovery rate, 12 items were shown to demonstrate DIF. Of these 12 items, 6 (“dominoes,” “escalator,” “muzzle,” “latch,” “tripod,” and “palette”) were also identified in additional analyses using hierarchical logistic regression models and represent the strongest evidence for race/ethnicity-based DIF. These findings afford a finer characterization of the psychometric properties of the BNT and expand our understanding of between-group performance. (JINS, 2009, 15, 758–768.)
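The logistic regression approach to DIF mentioned above models the probability of a correct item response from a matching score (typically the total score), group membership, and their interaction. A minimal sketch of that model specification, with illustrative coefficients rather than values estimated in the study:

```python
import math

def dif_logistic(total_score, group, b0, b1, b2, b3):
    """Logistic regression DIF model: the log-odds of a correct response
    depend on the matching score, group membership (0 = reference,
    1 = focal), and their interaction. A nonzero b2 signals uniform DIF;
    a nonzero b3 signals non-uniform DIF."""
    logit = b0 + b1 * total_score + b2 * group + b3 * total_score * group
    return 1.0 / (1.0 + math.exp(-logit))

# With b2 = b3 = 0, the item functions identically across groups when
# examinees are matched on total score: no DIF.
p_ref = dif_logistic(total_score=30, group=0, b0=-3.0, b1=0.1, b2=0.0, b3=0.0)
p_foc = dif_logistic(total_score=30, group=1, b0=-3.0, b1=0.1, b2=0.0, b3=0.0)
assert p_ref == p_foc
```

In practice the coefficients are estimated per item and DIF is flagged by comparing nested models (with and without the group terms), which is the hierarchical procedure the abstract refers to.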

