A Note on the D-Scoring Method Adapted for Polytomous Test Items

2018, Vol 79 (3), pp. 545-557
Author(s):
Dimiter M. Dimitrov
Yong Luo

An approach to scoring tests with binary items, referred to as D-scoring method, was previously developed as a classical analog to basic models in item response theory (IRT) for binary items. As some tests include polytomous items, this study offers an approach to D-scoring of such items and parallels the results with those obtained under the graded response model (GRM) for ordered polytomous items in the framework of IRT. The proposed design of using D-scoring with “virtual” binary items generated from polytomous items provides (a) ability scores that are consistent with their GRM counterparts and (b) item category response functions analogous to those obtained under the GRM. This approach provides a unified framework for D-scoring and psychometric analysis of tests with binary and/or polytomous items that can be efficient in different scenarios of educational and psychological assessment.
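The "virtual" binary item construction at the heart of this design is easy to illustrate. Below is a minimal Python sketch, assuming the usual cumulative split in which a polytomous item scored 0..m is expanded into m binary indicators, the k-th scored 1 when the response reaches category k; the function name and data layout are illustrative, not the authors' implementation.

```python
import numpy as np

def to_virtual_binary(responses, n_categories):
    """Expand polytomous responses (coded 0..n_categories-1) into
    n_categories-1 'virtual' binary items using cumulative splits:
    virtual item k is scored 1 iff the response is >= k."""
    responses = np.asarray(responses)
    thresholds = np.arange(1, n_categories)          # k = 1..m
    # result shape: (n_persons, n_categories - 1)
    return (responses[:, None] >= thresholds[None, :]).astype(int)

# Example: one 4-category item (scores 0-3) for five examinees
x = [0, 1, 2, 3, 2]
print(to_virtual_binary(x, 4))
# [[0 0 0]
#  [1 0 0]
#  [1 1 0]
#  [1 1 1]
#  [1 1 0]]
```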

Author(s):
Rabeeah M. Alsaqri
Mohsen N. Al Salmi

The study aimed to calibrate the Oman data of the PIRLS test using the graded response model, to examine the test's psychometric properties, and to identify its fitting and misfitting items. The PIRLS 2011 test booklets were used, consisting of 146 test items (74 dichotomous and 72 polytomous). Items were divided into 13 booklets, each with two blocks (one literary and one informational). The booklets were administered to 13 groups of fourth-grade students in the Sultanate of Oman, with a total sample of 10,394 students. The IRT assumptions of unidimensionality and local independence were examined and supported, and item fit was examined under Samejima's graded response model. The data were analyzed with the Multilog 7.03 program to estimate both item and ability parameters. IRT analysis revealed that 8 items showed misfit, representing only about 5% of the test items, confirming that the test has good psychometric properties under IRT.
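For reference, Samejima's graded response model used here can be stated compactly. With discrimination $a_i$ and ordered category boundaries $b_{i1} < \dots < b_{im_i}$, the boundary and category response functions are

$$P^{*}_{ik}(\theta) = \frac{1}{1+\exp[-a_i(\theta - b_{ik})]}, \qquad P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta),$$

with the conventions $P^{*}_{i0}(\theta)=1$ and $P^{*}_{i,m_i+1}(\theta)=0$, so the $P_{ik}$ sum to one across categories $k = 0, \dots, m_i$.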


Author(s):
Amal K. Al-zaabi
Abdulhameed Hassan
Rashid S. Al-mehrzi



2020, Vol 44 (6), pp. 465-481
Author(s):
Carl F. Falk

We present a monotonic polynomial graded response (GRMP) model that subsumes the unidimensional graded response model for ordered categorical responses and results in flexible category response functions. We suggest improvements in the parameterization of the polynomial underlying similar models, expand upon an underlying response variable derivation of the model, and in lieu of an overall discrimination parameter we propose an index to aid in interpreting the strength of relationship between the latent variable and underlying item responses. In applications, the GRMP is compared to two approaches: (a) a previously developed monotonic polynomial generalized partial credit (GPCMP) model; and (b) logistic and probit variants of the heteroscedastic graded response (HGR) model that we estimate using maximum marginal likelihood with the expectation–maximization algorithm. Results suggest that the GRMP can fit real data better than the GPCMP and the probit variant of the HGR, but is slightly outperformed by the logistic HGR. Two simulation studies compared the ability of the GRMP and logistic HGR to recover category response functions. While the GRMP showed some ability to recover HGR response functions and those based on kernel smoothing, the HGR was more specific in the types of response functions it could recover. In general, the GRMP and HGR make different assumptions regarding the underlying response variables, and can result in different category response function shapes.
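The key ingredient of the GRMP is a monotonic polynomial in place of the linear term $a(\theta - b)$. As a hedged sketch of the general idea (one generic way to guarantee monotonicity, not the paper's actual parameterization): integrate a squared polynomial plus a small positive constant, so the derivative is strictly positive everywhere.

```python
import numpy as np

def monotonic_poly(theta, c0, deriv_coefs, eps=1e-3):
    """Evaluate a monotonically increasing polynomial m(theta).
    Monotonicity holds by construction: the derivative is
    q(theta)**2 + eps, where q has coefficients `deriv_coefs`,
    so m'(theta) > 0 for all theta.  This is a generic construction
    for illustration, not the GRMP's exact parameterization."""
    q = np.polynomial.Polynomial(deriv_coefs)
    m_prime = q ** 2 + eps          # non-negative polynomial + eps > 0
    m = m_prime.integ()             # antiderivative: a monotone polynomial
    return c0 + m(theta)

theta = np.linspace(-3, 3, 7)
print(monotonic_poly(theta, c0=0.0, deriv_coefs=[1.0, 0.5]))
```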


2020, Vol 10
Author(s):
Jianhua Xiong
Shuliang Ding
Fen Luo
Zhaosheng Luo

2008, Vol 24 (1), pp. 49-56
Author(s):
Wolfgang A. Rauch
Karl Schweizer
Helfried Moosbrugger

Abstract. In this study the psychometric properties of the Personal Optimism scale of the POSO-E questionnaire ( Schweizer & Koch, 2001 ) for the assessment of dispositional optimism are evaluated by applying Samejima's (1969) graded response model, a parametric item response theory (IRT) model for polytomous data. Model fit is extensively evaluated via fit checks on the lower-order margins of the contingency table of observed and expected responses and visual checks of fit plots comparing observed and expected category response functions. The model proves appropriate for the data; a small amount of misfit is interpreted in terms of previous research using other measures for optimism. Item parameters and information functions show that optimism can be measured accurately, especially at moderately low to middle levels of the latent trait scale, and particularly by the negatively worded items.
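Item parameters and information functions of this kind follow directly from the fitted GRM. Below is a small numpy sketch of the standard GRM item information, $I_i(\theta) = \sum_k P'_{ik}(\theta)^2 / P_{ik}(\theta)$; the parameter values are invented for illustration, not the POSO-E estimates.

```python
import numpy as np

def grm_item_information(theta, a, b):
    """Fisher information of a single GRM item at abilities `theta`.
    `a` is the discrimination, `b` an increasing array of category
    boundary locations.  Boundary curves are augmented with the
    constants P*_0 = 1 and P*_{m+1} = 0."""
    theta = np.atleast_1d(theta).astype(float)
    b = np.asarray(b, dtype=float)
    # Cumulative boundary probabilities, shape (len(theta), len(b))
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    ps = np.hstack([np.ones((len(theta), 1)), p_star,
                    np.zeros((len(theta), 1))])
    # Category probabilities and their derivatives w.r.t. theta
    p_cat = ps[:, :-1] - ps[:, 1:]
    d_ps = a * ps * (1.0 - ps)      # d/dtheta of each boundary curve
    d_cat = d_ps[:, :-1] - d_ps[:, 1:]
    return np.sum(d_cat ** 2 / np.clip(p_cat, 1e-12, None), axis=1)

theta = np.linspace(-4, 4, 9)
print(grm_item_information(theta, a=1.5, b=[-1.0, 0.0, 1.2]))
```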


2016, Vol 59 (2), pp. 373-383
Author(s):
J. Mirjam Boeschen Hospers
Niels Smits
Cas Smits
Mariska Stam
Caroline B. Terwee
...

Purpose We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18–70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study “Netherlands Longitudinal Study on Hearing.” A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. Results The graded response model showed a good fit. Item information curves showed that measurement was most precise for adults who reported hearing disability and less precise for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild to moderate hearing disability. Conclusions This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
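The standard error plot described here is just $SE(\theta) = 1/\sqrt{I(\theta)}$, with $I$ the summed item information, and maximum information is also the usual item-selection rule in the computerized adaptive test the authors envision. A hedged Python sketch (toy numbers, not the AIADH item bank):

```python
import numpy as np

def select_next_item(info_at_theta, administered):
    """Maximum-information CAT item selection: among items not yet
    administered, pick the one with the largest Fisher information
    at the current ability estimate."""
    info = np.where(administered, -np.inf, info_at_theta)
    return int(np.argmax(info))

def standard_error(item_infos):
    """SE of the ability estimate from summed item information."""
    return 1.0 / np.sqrt(np.sum(item_infos))

# Toy example: information of 5 items at the current theta estimate
info = np.array([0.20, 0.55, 0.10, 0.43, 0.31])
administered = np.array([False, True, False, False, False])
print(select_next_item(info, administered))   # -> 3
print(standard_error(info[[1]]))              # SE after one item
```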


2020, Vol 9 (11), pp. 3754
Author(s):
Yoshiaki Nomura
Toshiya Morozumi
Mitsuo Fukuda
Nobuhiro Hanada
Erika Kakuta
...

Periodontal examination data have a complex structure. For epidemiological studies, mass screenings, and public health use, a simple index that represents the periodontal condition is necessary. Periodontal indices for partial examination of selected teeth have been developed. However, the selected teeth vary between indices, and a justification for the selection of examination teeth has not been presented. We applied a graded response model based on item response theory to select optimal examination teeth and sites that represent periodontal conditions. Data were obtained from 254 patients who participated in a multicenter follow-up study. Baseline data were obtained from the initial follow-up. Optimal examination sites were selected using item information calculated by graded response modeling. Twelve sites—maxillary 2nd premolar (palatal-medial), 1st premolar (palatal-distal), canine (palatal-medial), lateral incisor (palatal-central), central incisor (palatal-distal), and mandibular 1st premolar (lingual-medial)—were selected. Mean values for clinical attachment level, probing pocket depth, and bleeding on probing from full-mouth examinations were used as objective variables. Measuring the clinical parameters of these sites can predict the results of a full-mouth examination. For calculating a periodontal index by partial oral examination, a justification for the selection of examination sites is essential. This study presents an evidence-based partial examination methodology and its modeling.
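Selecting a partial-examination subset by item information amounts to ranking candidate sites by how much information each contributes across the latent trait range and keeping the top k. A simplified sketch of that ranking step (random toy numbers stand in for the study's graded-response-model information values):

```python
import numpy as np

def top_k_by_information(info_matrix, k):
    """Rank candidate sites by total item information.
    `info_matrix` has shape (n_sites, n_theta_points): information of
    each site evaluated on a grid of latent-trait values.  Returns the
    indices of the k sites with the largest area under the information
    curve (approximated by a plain sum over the grid)."""
    totals = info_matrix.sum(axis=1)
    return np.argsort(totals)[::-1][:k]

rng = np.random.default_rng(0)
info = rng.random((30, 21))      # 30 candidate sites, 21 grid points
print(top_k_by_information(info, k=12))
```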


2020, pp. 001316442095806
Author(s):
Shiyang Su
Chun Wang
David J. Weiss

S-X² is a popular item fit index that is available in commercial software packages such as flexMIRT. However, no research has systematically examined the performance of S-X² for detecting item misfit within the context of the multidimensional graded response model (MGRM). The primary goal of this study was to evaluate the performance of S-X² under two practical misfit scenarios: first, all items are misfitting due to model misspecification, and second, a small subset of items violate the underlying assumptions of the MGRM. Simulation studies showed that caution should be exercised when reporting item fit results of polytomous items using S-X² within the context of the MGRM, because of its inflated false positive rates (FPRs), especially with a small sample size and a long test. S-X² performed well when detecting overall model misfit as well as item misfit for a small subset of items when the ordinality assumption was violated. However, under a number of conditions of model misspecification or items violating the homogeneous discrimination assumption, even though true positive rates (TPRs) of S-X² were high when a small sample size was coupled with a long test, the inflated FPRs were generally directly related to increasing TPRs. There was also a suggestion that the performance of S-X² was affected by the magnitude of misfit within an item. There was no evidence that FPRs for fitting items were exacerbated by the presence of a small percentage of misfitting items among them.
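For orientation, in its dichotomous form the Orlando–Thissen statistic behind this index compares observed and expected proportions correct within summed-score groups (the polytomous generalization used with the MGRM additionally sums over response categories):

$$S\text{-}X^2_i \;=\; \sum_{k=1}^{n-1} N_k \,\frac{\bigl(O_{ik} - E_{ik}\bigr)^2}{E_{ik}\,(1 - E_{ik})},$$

where $N_k$ is the number of examinees with summed score $k$, and $O_{ik}$, $E_{ik}$ are the observed and model-implied proportions correct on item $i$ in that group.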


2021, Vol 9, pp. 205031212110122
Author(s):
Samuel W Terman
James F Burke

Objectives: Accurately measuring disability is critical for policy development, economic analyses, and determining individual-level effects of health interventions. Nationally representative population surveys such as the National Health and Nutrition Examination Survey provide key opportunities to measure disability constructs such as activity limitations. However, only very limited work has previously evaluated the item response properties of questions pertaining to limitations in the National Health and Nutrition Examination Survey. Methods: This was a cross-sectional study. We included participants ⩾20 years old for the 2013–2018 National Health and Nutrition Examination Survey cycles. Activity limitations, and a smaller number of body function impairments or participation restrictions, were determined from interview questions. We fit item response theory models (a two-parameter logistic model and a graded response model) to characterize discriminating information along the latent continuum of activity limitation. Results: We included 17,057 participants. Although each particular limitation was somewhat rare (maximally 13%), 7,214 (38%) reported having at least one limitation. We found a high amount of discriminating information at 1–2 standard deviations above average limitation, though essentially zero discrimination below that range. Items had substantial overlap in the range at which they provided information distinguishing individuals. The ordinal graded response model including 20 limitations provided greater information than the dichotomous two-parameter logistic model, though further omitting items from the graded response model led to loss of information. Conclusion: National Health and Nutrition Examination Survey disability-related questions, most specifically activity limitations, provided a high degree of information distinguishing individuals with higher than average limitations on the latent continuum, but essentially zero resolution to distinguish individuals with low or average limitations. Future work may focus on developing items which better distinguish individuals at the “lower” end of the limitation spectrum.
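The two models compared here are closely related: the two-parameter logistic model specifies

$$P(x_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp[-a_i(\theta_j - b_i)]},$$

and the graded response model reduces to exactly this form when an item has only two categories, so the information gain reported for the 20-item graded response model reflects what the extra ordinal categories add over dichotomous scoring.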

