A Note on the D-Scoring Method Adapted for Polytomous Test Items

2018, Vol 79 (3), pp. 545-557
Author(s):
Dimiter M. Dimitrov
Yong Luo

An approach to scoring tests with binary items, referred to as D-scoring method, was previously developed as a classical analog to basic models in item response theory (IRT) for binary items. As some tests include polytomous items, this study offers an approach to D-scoring of such items and parallels the results with those obtained under the graded response model (GRM) for ordered polytomous items in the framework of IRT. The proposed design of using D-scoring with “virtual” binary items generated from polytomous items provides (a) ability scores that are consistent with their GRM counterparts and (b) item category response functions analogous to those obtained under the GRM. This approach provides a unified framework for D-scoring and psychometric analysis of tests with binary and/or polytomous items that can be efficient in different scenarios of educational and psychological assessment.
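The "virtual" binary item construction at the heart of this design is easy to illustrate. Below is a minimal Python sketch, assuming the usual cumulative split in which a polytomous item scored 0..m is expanded into m binary indicators, the k-th scored 1 when the response reaches category k; the function name and data layout are illustrative, not the authors' implementation.

```python
import numpy as np

def to_virtual_binary(responses, n_categories):
    """Expand polytomous responses (coded 0..n_categories-1) into
    n_categories-1 'virtual' binary items using cumulative splits:
    virtual item k is scored 1 iff the response is >= k."""
    responses = np.asarray(responses)
    thresholds = np.arange(1, n_categories)          # k = 1..m
    # result shape: (n_persons, n_categories - 1)
    return (responses[:, None] >= thresholds[None, :]).astype(int)

# Example: one 4-category item (scores 0-3) for five examinees
x = [0, 1, 2, 3, 2]
print(to_virtual_binary(x, 4))
# [[0 0 0]
#  [1 0 0]
#  [1 1 0]
#  [1 1 1]
#  [1 1 0]]
```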

Author(s):
Rabeeah M. Alsaqri
Mohsen N. Al Salmi

The study aimed to calibrate the Oman data of the PIRLS test using the graded response model, to examine the test's psychometric properties, and to identify its fitting and misfitting items. The PIRLS 2011 test booklets were used, consisting of 146 test items (74 dichotomous and 72 polytomous). Items were divided into 13 booklets, each with two blocks (one literary and one informational). The booklets were administered to 13 groups of fourth-grade students in the Sultanate of Oman, with a total sample of 10,394 students. The IRT assumptions of unidimensionality and local independence were examined and supported, and item fit was examined under Samejima's graded response model. The data were analyzed with the Multilog 7.03 program to estimate both item and ability parameters. IRT analysis revealed that 8 items showed misfit, representing only about 5% of the test items, confirming that the test has good psychometric properties under IRT.
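For reference, Samejima's graded response model used here can be stated compactly. With discrimination $a_i$ and ordered category boundaries $b_{i1} < \dots < b_{im_i}$, the boundary and category response functions are

$$P^{*}_{ik}(\theta) = \frac{1}{1+\exp[-a_i(\theta - b_{ik})]}, \qquad P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta),$$

with the conventions $P^{*}_{i0}(\theta)=1$ and $P^{*}_{i,m_i+1}(\theta)=0$, so the $P_{ik}$ sum to one across categories $k = 0, \dots, m_i$.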


Author(s):
Amal K. Al-zaabi
Abdulhameed Hassan
Rashid S. Al-mehrzi



2020, Vol 44 (6), pp. 465-481
Author(s):
Carl F. Falk

We present a monotonic polynomial graded response (GRMP) model that subsumes the unidimensional graded response model for ordered categorical responses and results in flexible category response functions. We suggest improvements in the parameterization of the polynomial underlying similar models, expand upon an underlying response variable derivation of the model, and in lieu of an overall discrimination parameter we propose an index to aid in interpreting the strength of relationship between the latent variable and underlying item responses. In applications, the GRMP is compared to two approaches: (a) a previously developed monotonic polynomial generalized partial credit (GPCMP) model; and (b) logistic and probit variants of the heteroscedastic graded response (HGR) model that we estimate using maximum marginal likelihood with the expectation–maximization algorithm. Results suggest that the GRMP can fit real data better than the GPCMP and the probit variant of the HGR, but is slightly outperformed by the logistic HGR. Two simulation studies compared the ability of the GRMP and logistic HGR to recover category response functions. While the GRMP showed some ability to recover HGR response functions and those based on kernel smoothing, the HGR was more specific in the types of response functions it could recover. In general, the GRMP and HGR make different assumptions regarding the underlying response variables, and can result in different category response function shapes.
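The key ingredient of the GRMP is a monotonic polynomial in place of the linear term $a(\theta - b)$. As a hedged sketch of the general idea (one generic way to guarantee monotonicity, not the paper's actual parameterization): integrate a squared polynomial plus a small positive constant, so the derivative is strictly positive everywhere.

```python
import numpy as np

def monotonic_poly(theta, c0, deriv_coefs, eps=1e-3):
    """Evaluate a monotonically increasing polynomial m(theta).
    Monotonicity holds by construction: the derivative is
    q(theta)**2 + eps, where q has coefficients `deriv_coefs`,
    so m'(theta) > 0 for all theta.  This is a generic construction
    for illustration, not the GRMP's exact parameterization."""
    q = np.polynomial.Polynomial(deriv_coefs)
    m_prime = q ** 2 + eps          # non-negative polynomial + eps > 0
    m = m_prime.integ()             # antiderivative: a monotone polynomial
    return c0 + m(theta)

theta = np.linspace(-3, 3, 7)
print(monotonic_poly(theta, c0=0.0, deriv_coefs=[1.0, 0.5]))
```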


2020, Vol 10
Author(s):
Jianhua Xiong
Shuliang Ding
Fen Luo
Zhaosheng Luo

2008, Vol 24 (1), pp. 49-56
Author(s):
Wolfgang A. Rauch
Karl Schweizer
Helfried Moosbrugger

Abstract. In this study the psychometric properties of the Personal Optimism scale of the POSO-E questionnaire ( Schweizer & Koch, 2001 ) for the assessment of dispositional optimism are evaluated by applying Samejima's (1969) graded response model, a parametric item response theory (IRT) model for polytomous data. Model fit is extensively evaluated via fit checks on the lower-order margins of the contingency table of observed and expected responses and visual checks of fit plots comparing observed and expected category response functions. The model proves appropriate for the data; a small amount of misfit is interpreted in terms of previous research using other measures for optimism. Item parameters and information functions show that optimism can be measured accurately, especially at moderately low to middle levels of the latent trait scale, and particularly by the negatively worded items.
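Item parameters and information functions of this kind follow directly from the fitted GRM. Below is a small numpy sketch of the standard GRM item information, $I_i(\theta) = \sum_k P'_{ik}(\theta)^2 / P_{ik}(\theta)$; the parameter values are invented for illustration, not the POSO-E estimates.

```python
import numpy as np

def grm_item_information(theta, a, b):
    """Fisher information of a single GRM item at abilities `theta`.
    `a` is the discrimination, `b` an increasing array of category
    boundary locations.  Boundary curves are augmented with the
    constants P*_0 = 1 and P*_{m+1} = 0."""
    theta = np.atleast_1d(theta).astype(float)
    b = np.asarray(b, dtype=float)
    # Cumulative boundary probabilities, shape (len(theta), len(b))
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    ps = np.hstack([np.ones((len(theta), 1)), p_star,
                    np.zeros((len(theta), 1))])
    # Category probabilities and their derivatives w.r.t. theta
    p_cat = ps[:, :-1] - ps[:, 1:]
    d_ps = a * ps * (1.0 - ps)      # d/dtheta of each boundary curve
    d_cat = d_ps[:, :-1] - d_ps[:, 1:]
    return np.sum(d_cat ** 2 / np.clip(p_cat, 1e-12, None), axis=1)

theta = np.linspace(-4, 4, 9)
print(grm_item_information(theta, a=1.5, b=[-1.0, 0.0, 1.2]))
```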


2016, Vol 59 (2), pp. 373-383
Author(s):
J. Mirjam Boeschen Hospers
Niels Smits
Cas Smits
Mariska Stam
Caroline B. Terwee
...

Purpose We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18–70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study “Netherlands Longitudinal Study on Hearing.” A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. Results The graded response model showed a good fit. Item information curves showed that measurement was most precise for adults who reported hearing disability and less precise for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild to moderate hearing disability. Conclusions This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
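The standard error plot described here is just $SE(\theta) = 1/\sqrt{I(\theta)}$, with $I$ the summed item information, and maximum information is also the usual item-selection rule in the computerized adaptive test the authors envision. A hedged Python sketch (toy numbers, not the AIADH item bank):

```python
import numpy as np

def select_next_item(info_at_theta, administered):
    """Maximum-information CAT item selection: among items not yet
    administered, pick the one with the largest Fisher information
    at the current ability estimate."""
    info = np.where(administered, -np.inf, info_at_theta)
    return int(np.argmax(info))

def standard_error(item_infos):
    """SE of the ability estimate from summed item information."""
    return 1.0 / np.sqrt(np.sum(item_infos))

# Toy example: information of 5 items at the current theta estimate
info = np.array([0.20, 0.55, 0.10, 0.43, 0.31])
administered = np.array([False, True, False, False, False])
print(select_next_item(info, administered))   # -> 3
print(standard_error(info[[1]]))              # SE after one item
```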


2020, Vol 9 (11), pp. 3754
Author(s):
Yoshiaki Nomura
Toshiya Morozumi
Mitsuo Fukuda
Nobuhiro Hanada
Erika Kakuta
...

Periodontal examination data have a complex structure. For epidemiological studies, mass screenings, and public health use, a simple index that represents the periodontal condition is necessary. Periodontal indices for partial examination of selected teeth have been developed. However, the selected teeth vary between indices, and a justification for the selection of examination teeth has not been presented. We applied a graded response model based on item response theory to select optimal examination teeth and sites that represent periodontal conditions. Data were obtained from 254 patients who participated in a multicenter follow-up study. Baseline data were obtained from the initial follow-up. Optimal examination sites were selected using item information calculated by graded response modeling. Twelve sites—maxillary 2nd premolar (palatal-medial), 1st premolar (palatal-distal), canine (palatal-medial), lateral incisor (palatal-central), central incisor (palatal-distal), and mandibular 1st premolar (lingual-medial)—were selected. Mean values for clinical attachment level, probing pocket depth, and bleeding on probing from full-mouth examinations were used as objective variables. Measuring the clinical parameters of these sites can predict the results of a full-mouth examination. For calculating a periodontal index by partial oral examination, a justification for the selection of examination sites is essential. This study presents an evidence-based partial examination methodology and its modeling.
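Selecting a partial-examination subset by item information amounts to ranking candidate sites by how much information each contributes across the latent trait range and keeping the top k. A simplified sketch of that ranking step (random toy numbers stand in for the study's graded-response-model information values):

```python
import numpy as np

def top_k_by_information(info_matrix, k):
    """Rank candidate sites by total item information.
    `info_matrix` has shape (n_sites, n_theta_points): information of
    each site evaluated on a grid of latent-trait values.  Returns the
    indices of the k sites with the largest area under the information
    curve (approximated by a plain sum over the grid)."""
    totals = info_matrix.sum(axis=1)
    return np.argsort(totals)[::-1][:k]

rng = np.random.default_rng(0)
info = rng.random((30, 21))      # 30 candidate sites, 21 grid points
print(top_k_by_information(info, k=12))
```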


2020, pp. 001316442095806
Author(s):
Shiyang Su
Chun Wang
David J. Weiss

S-X² is a popular item fit index that is available in commercial software packages such as flexMIRT. However, no research has systematically examined the performance of S-X² for detecting item misfit within the context of the multidimensional graded response model (MGRM). The primary goal of this study was to evaluate the performance of S-X² under two practical misfit scenarios: first, all items are misfitting due to model misspecification, and second, a small subset of items violate the underlying assumptions of the MGRM. Simulation studies showed that caution should be exercised when reporting item fit results of polytomous items using S-X² within the context of the MGRM, because of its inflated false positive rates (FPRs), especially with a small sample size and a long test. S-X² performed well when detecting overall model misfit as well as item misfit for a small subset of items when the ordinality assumption was violated. However, under a number of conditions of model misspecification or items violating the homogeneous discrimination assumption, even though true positive rates (TPRs) of S-X² were high when a small sample size was coupled with a long test, the inflated FPRs were generally directly related to increasing TPRs. There was also a suggestion that the performance of S-X² was affected by the magnitude of misfit within an item. There was no evidence that FPRs for fitting items were exacerbated by the presence of a small percentage of misfitting items among them.
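For orientation, in its dichotomous form the Orlando–Thissen statistic behind this index compares observed and expected proportions correct within summed-score groups (the polytomous generalization used with the MGRM additionally sums over response categories):

$$S\text{-}X^2_i \;=\; \sum_{k=1}^{n-1} N_k \,\frac{\bigl(O_{ik} - E_{ik}\bigr)^2}{E_{ik}\,(1 - E_{ik})},$$

where $N_k$ is the number of examinees with summed score $k$, and $O_{ik}$, $E_{ik}$ are the observed and model-implied proportions correct on item $i$ in that group.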


2021, Vol 9, pp. 205031212110122
Author(s):
Samuel W Terman
James F Burke

Objectives: Accurately measuring disability is critical for policy development, economic analyses, and determining individual-level effects of health interventions. Nationally representative population surveys such as the National Health and Nutrition Examination Survey provide key opportunities to measure disability constructs such as activity limitations. However, only very limited work has previously evaluated the item response properties of questions pertaining to limitations in the National Health and Nutrition Examination Survey. Methods: This was a cross-sectional study. We included participants ⩾20 years old for the 2013–2018 National Health and Nutrition Examination Survey cycles. Activity limitations, and a smaller number of body function impairments or participation restrictions, were determined from interview questions. We fit item response theory models (a two-parameter logistic model and a graded response model) to characterize discriminating information along the latent continuum of activity limitation. Results: We included 17,057 participants. Although each particular limitation was somewhat rare (maximally 13%), 7,214 (38%) reported having at least one limitation. We found a high amount of discriminating information at 1–2 standard deviations above average limitation, though essentially zero discrimination below that range. Items had substantial overlap in the range at which they provided information distinguishing individuals. The ordinal graded response model including 20 limitations provided greater information than the dichotomous two-parameter logistic model, though further omitting items from the graded response model led to loss of information. Conclusion: National Health and Nutrition Examination Survey disability-related questions, most specifically activity limitations, provided a high degree of information distinguishing individuals with higher than average limitations on the latent continuum, but essentially zero resolution to distinguish individuals with low or average limitations. Future work may focus on developing items which better distinguish individuals at the “lower” end of the limitation spectrum.
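The two models compared here are closely related: the two-parameter logistic model specifies

$$P(x_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp[-a_i(\theta_j - b_i)]},$$

and the graded response model reduces to exactly this form when an item has only two categories, so the information gain reported for the 20-item graded response model reflects what the extra ordinal categories add over dichotomous scoring.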

