Analyzing Item Difficulty and Discrimination in a Dichotomously Scored Writing Test: Focus on Classical Test Theory and Item Response Theory

2016, Vol 21 (3), pp. 235-235
Author(s): Ho Lee

2021, Vol ahead-of-print (ahead-of-print)
Author(s): Yunsoo Lee, Ji Hoon Song, Soo Jung Kim

Purpose: This paper aims to validate the Korean version of the decent work scale and examine the relationship between decent work and work engagement.
Design/methodology/approach: After completing translation and back translation, the authors surveyed 266 Korean employees from various organizations via network sampling. They applied the Rasch model, which is based on item response theory. In addition, they used classical test theory to evaluate the decent work scale's validity and reliability.
Findings: The authors found that the current version of the decent work scale has good validity, reliability, and item difficulty, and that decent work has a positive relationship with work engagement. However, the item-response-theory assessment showed that three of the items are extremely similar to another item within the same dimension, implying that these items are unable to discriminate among individual traits.
Originality/value: This study validated the decent work scale in a Korean work environment using Rasch's (1960) model from the perspective of item response theory.
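
For reference, the Rasch model invoked above (shown here in its dichotomous form; the decent work scale is Likert-type, so the study likely used a polytomous extension) gives the probability that respondent j endorses item i as a function of the latent trait θ_j and the item difficulty b_i:

```latex
P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{e^{\theta_j - b_i}}{1 + e^{\theta_j - b_i}}
```

Because the model fixes every item's slope at 1, items differ only in difficulty, which is why nearly duplicated items add little independent discriminating information about individual traits.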


2001, Vol 9 (1), pp. 5-22
Author(s): Cheryl T. Beck, Robert K. Gable

The benefits of item response theory (IRT) analysis in obtaining empirical support for construct validity make it an essential step in the instrument development process. IRT analysis can result in finer construct interpretations that lead to more thorough descriptions of low- and high-scoring respondents. A critical function of IRT is its ability to determine the adequacy with which the attitude continuum underlying each dimension is assessed by the respective items in an instrument. Many nurse researchers, however, are not reaping the benefits of IRT in the development of affective instruments. The purpose of this article is to familiarize nurse researchers with this valuable approach through a description of the Facets computer program. Facets uses a one-parameter (i.e., item difficulty) Rasch measurement model. Data from a survey of 525 new mothers that assessed the psychometric properties of the Postpartum Depression Screening Scale are used to illustrate the Facets program. It is hoped that IRT will gain increased prominence in affective instrument development as more nurse researchers become aware of computer programs such as Facets to assist in analysis.
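
As a rough illustration of where Rasch item difficulties come from (this is a generic PROX-style initialization, not the estimation that Facets actually performs), each item's starting difficulty can be taken as the centered log-odds of non-endorsement:

```python
import numpy as np

def rasch_difficulty_start_values(responses: np.ndarray) -> np.ndarray:
    """Rough Rasch item-difficulty start values from a 0/1 response matrix
    (persons x items), using the centered log-odds of item endorsement.
    This mirrors a classic PROX-style initialization, not the full
    estimation performed by programs such as Facets."""
    p = responses.mean(axis=0)              # proportion endorsing each item
    p = np.clip(p, 1e-6, 1 - 1e-6)          # guard against 0/1 proportions
    difficulty = np.log((1 - p) / p)        # rarely endorsed items -> higher logit
    return difficulty - difficulty.mean()   # center the difficulty scale at 0

# Example with simulated data: 6 respondents, 4 dichotomous items
rng = np.random.default_rng(0)
data = (rng.random((6, 4)) > 0.4).astype(int)
print(rasch_difficulty_start_values(data))
```

Items with higher centered logits sit higher on the attitude continuum, which is exactly the item-to-continuum mapping the article describes.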


2015, Vol 58 (3), pp. 865-877
Author(s): Gerasimos Fergadiotis, Stacey Kellough, William D. Hula

Purpose: In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015).
Method: Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity.
Results: The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty.
Conclusions: Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and the interpretation of anomia severity scores in the context of current word-finding models.
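
To make the model comparison concrete: the 2-parameter logistic (2PL) model estimates a separate discrimination a_i for each item, while the 1PL constrains all slopes to be equal; relaxing that constraint is what produced the marginal fit improvement reported. The regression of difficulty on the lexical predictors plausibly takes the form on the second line, though the exact specification is an assumption here, not taken from the article:

```latex
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}
\qquad \text{(2PL; the 1PL fixes } a_i = a \text{ for all items)}

b_i = \beta_0 + \beta_1\,\mathrm{length}_i + \beta_2\,\mathrm{AoA}_i + \beta_3\,\mathrm{diversity}_i + \varepsilon_i
```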


2019, Vol 9 (2), pp. 133-146
Author(s): Yance Manoppo, Djemari Mardapi

This study aimed to reveal: (1) the characteristics of the items of the Chemistry Test in the National Examination, using both classical test theory and item response theory; (2) the amount of cheating detected by Angoff's B-index Method, the Pair 1 Method, the Pair 2 Method, the Modified Error Similarity Analysis (MESA) Method, and the G2 Method; and (3) which of these methods detects the most cheating in the Chemistry Test of the 2011/2012 National Examination for high schools in Maluku Province. The analysis with the classical test theory approach shows that 77.5% of the items have a well-functioning difficulty index, 55% of the items have acceptable discrimination, and 70% of the items have distractors that work well, with a test reliability index of 0.772. The analysis with the item response theory approach shows that 14 items (35%) fit the model, that the test information function reaches its maximum of 11.4069 at θ = -1.6, and that the corresponding standard error of measurement is 0.296. The numbers of examinee pairs suspected of cheating are: 13 pairs according to Angoff's B-index Method, 212 pairs according to the Pair 1 Method, 444 pairs according to the Pair 2 Method, 7 pairs according to the MESA Method, and 102 pairs according to the G2 Method. Ranked from most to fewest cheating pairs detected, the methods are: Pair 2, Pair 1, G2, Angoff's B-index, and MESA.
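
As a generic sketch of the classical indices summarized above (the study's own computations are not published with the abstract): item difficulty is conventionally the proportion answering correctly, and discrimination the corrected item-total point-biserial correlation.

```python
import numpy as np

def ctt_item_stats(scores: np.ndarray):
    """Classical item statistics for a 0/1 scored matrix (persons x items).

    difficulty: proportion of examinees answering each item correctly.
    discrimination: corrected item-total correlation (each item against the
    total score with that item removed), a common point-biserial index.
    """
    difficulty = scores.mean(axis=0)
    total = scores.sum(axis=1)
    discrimination = np.array([
        np.corrcoef(scores[:, i], total - scores[:, i])[0, 1]
        for i in range(scores.shape[1])
    ])
    return difficulty, discrimination

# Example with simulated data: 20 examinees, 5 items
rng = np.random.default_rng(1)
data = (rng.random((20, 5)) > 0.5).astype(int)
diff, disc = ctt_item_stats(data)
print(np.round(diff, 2), np.round(disc, 2))
```

Note that the 0.296 standard error of measurement quoted above follows from the IRT identity SEM(θ) = 1/√I(θ), since 1/√11.4069 ≈ 0.296.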


2019, Vol 23 (4), pp. 275-283
Author(s): Ling Wang, John W. Nelson

The aim of this study was to evaluate the psychometric properties of the Chinese version of the Caring Factor Survey-Caring of Manager (CFS-CM) using both classical test theory (CTT) and item response theory (IRT). The CTT analyses evaluated internal consistency reliability, test-retest reliability, and construct validity. The IRT analyses tested unidimensionality, item fit, item difficulty, reliability, and rating scale functioning. CTT showed good psychometric properties for the CFS-CM. However, IRT revealed some problems at the response-category level. Taking these issues into consideration, the CFS-CM could benefit from further refinement.
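
The abstract does not name the internal-consistency coefficient; assuming it is Cronbach's alpha, as is conventional in CTT work, a minimal computation looks like this:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a persons x items matrix of scale scores:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total score
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Example: 10 respondents answering 4 Likert-type items (1-5)
rng = np.random.default_rng(2)
likert = rng.integers(1, 6, size=(10, 4)).astype(float)
print(round(cronbach_alpha(likert), 3))
```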


2021
Author(s): Paul Silvia, Rebekah Rodriguez, James C. Kaufman, Roni Reiter-Palmon, Jeb S. Puryear

The original 90-item Creative Behavior Inventory (CBI) was a landmark self-report scale in creativity research, and the 28-item brief form developed nearly 20 years ago is a popular measure of everyday creativity. Relatively little is known, however, about the psychometric properties of this widely used scale. In the current research, we conducted a detailed psychometric investigation into the 28-item CBI by applying methods from item response theory using a sample of 2,082 adults. Our investigation revealed several strengths of the current scale: excellent reliability, suitable dimensionality, appropriate item difficulty, and reasonably good item discrimination. Several areas for improvement were highlighted as well: (1) the four-point response scale should probably have fewer options; (2) a handful of items showed gender-based differential item functioning, indicating some item bias; and (3) local dependence statistics revealed clusters of items that are probably redundant. These analyses support the continued use of the CBI for assessing engagement in everyday creative behaviors and suggest that the CBI could benefit from thoughtful revision.
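
Local dependence findings like this are typically based on a statistic such as Yen's Q3, the correlation between item residuals once the modeled trait is removed; whether this study used Q3 specifically is an assumption here. A minimal sketch, given expected scores from an already-fitted IRT model:

```python
import numpy as np

def yen_q3(observed: np.ndarray, expected: np.ndarray) -> np.ndarray:
    """Yen's Q3 local-dependence statistics.

    observed: persons x items matrix of responses.
    expected: persons x items matrix of model-implied expected scores
              from an already-fitted IRT model.
    Returns the items x items correlation matrix of residuals; large
    off-diagonal values flag item pairs that may be locally dependent
    (e.g., redundant items that are candidates for removal).
    """
    residuals = observed - expected
    return np.corrcoef(residuals, rowvar=False)
```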


Author(s): Yousef A. Al Mahrouq

This study explored the effect of item difficulty and sample size on the accuracy of equating under item response theory, using simulated data. The equating method was evaluated against two criteria: the standard error of equating (SEE) between the criterion scores and the equated scores, and the root mean square error of equating (RMSE). The results indicated that large sample sizes reduce the standard error of equating and reduce residuals. The results also showed that forms of similar difficulty tend to produce smaller standard errors and RMSE values than forms of differing difficulty.
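
A rough sketch of the two criteria named above, assuming the usual simulation-based definitions (the study's exact computation is not given): across replications, the SEE at each score point is the standard deviation of the equated scores, and the RMSE is the root mean squared deviation from the criterion equating.

```python
import numpy as np

def equating_criteria(equated: np.ndarray, criterion: np.ndarray):
    """SEE and RMSE for simulated equating results.

    equated: replications x score-points matrix of equated scores.
    criterion: score-points vector from the criterion equating.
    """
    see = equated.std(axis=0)                                  # variability over replications
    rmse = np.sqrt(((equated - criterion) ** 2).mean(axis=0))  # folds bias in with variability
    return see, rmse
```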


2021, pp. 019394592110159
Author(s): Wen Liu, Lilian Dindo, Katherine Hadlandsmyth, George Jay Unick, M. Bridget Zimmerman, et al.

Little research has compared the item functioning of the Patient-Reported Outcomes Measurement Information System (PROMIS®) Anxiety Short Form 6a and the Generalized Anxiety Disorder 7-item scale using item response theory models. This was a secondary analysis of self-reported assessments from 67 at-risk U.S. military veterans. The two measures performed comparably well: data fit the models adequately, item discriminations were acceptable, and item and test information curves were unimodal and symmetric. The PROMIS® Anxiety Short Form 6a performed better in that its item difficulty estimates had a wider range and were distributed more evenly, and all of its response categories showed less of a floor effect, while the third category of most items of the Generalized Anxiety Disorder 7-item scale was rarely used. While both measures may be appropriate, the findings provide preliminary support for the PROMIS® Anxiety Short Form 6a as potentially preferable, especially for veterans with low-to-moderate anxiety. Further testing is needed in larger, more diverse samples.
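
The item and test information curves referred to above come from the information function; for the dichotomous 2PL case it reduces to the expression below (both measures here are polytomous, so the graded-response analogue actually needed is more involved):

```latex
I_i(\theta) = a_i^2 \, P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr),
\qquad
I(\theta) = \sum_i I_i(\theta)
```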

