Matching IRT Models to Patient-Reported Outcomes Constructs: The Graded Response and Log-Logistic Models for Scaling Depression

Psychometrika
2021
Author(s):  
Steven P. Reise
Han Du
Emily F. Wong
Anne S. Hubbard
Mark G. Haviland

Abstract
Item response theory (IRT) model applications extend well beyond cognitive ability testing, and various patient-reported outcomes (PRO) measures are among the more prominent examples. PRO (and like) constructs differ from cognitive ability constructs in many ways, and these differences have model fitting implications. With a few notable exceptions, however, most IRT applications to PRO constructs rely on traditional IRT models, such as the graded response model. We review some notable differences between cognitive and PRO constructs and how these differences can present challenges for traditional IRT model applications. We then apply two models (the traditional graded response model and an alternative log-logistic model) to depression measure data drawn from the Patient-Reported Outcomes Measurement Information System project. We do not claim that one model is “a better fit” or more “valid” than the other; rather, we show that the log-logistic model may be more consistent with the construct of depression as a unipolar phenomenon. Clearly, the graded response and log-logistic models can lead to different conclusions about the psychometrics of an instrument and the scaling of individual differences. We underscore, too, that the question of which model is more appropriate generally cannot be decided by fit index comparisons alone; these decisions may require integrating psychometrics with theory and research findings on the construct of interest.
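For orientation, the two item response functions being contrasted can be written side by side. The log-logistic form shown is Lucke's unipolar parameterization, which we take as representative of the model class the authors apply; the notation is ours, not the article's.

\[
\text{GRM:} \qquad P(X_i \ge k \mid \theta) = \frac{1}{1 + e^{-a_i(\theta - b_{ik})}}, \qquad \theta \in (-\infty, \infty)
\]

\[
\text{Log-logistic:} \qquad P(X_i \ge k \mid \theta) = \frac{\lambda_{ik}\,\theta^{\eta_i}}{1 + \lambda_{ik}\,\theta^{\eta_i}}, \qquad \theta \ge 0
\]

The substantive contrast is the support of the latent trait: the GRM scales respondents on an unbounded bipolar continuum, whereas the log-logistic form places them on the half-line, with θ = 0 corresponding to the complete absence of the attribute, a natural reading for a unipolar construct such as depression.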

2017
Vol 78 (3)
pp. 384-408
Author(s):  
Yong Luo
Hong Jiao

Stan is a new Bayesian statistical software program that implements the powerful and efficient Hamiltonian Monte Carlo (HMC) algorithm. To date, there is no source that systematically provides Stan code for various item response theory (IRT) models. This article provides Stan code for three representative IRT models: the three-parameter logistic model, the graded response model, and the nominal response model. We demonstrate how IRT model comparison can be conducted with Stan and how the provided Stan code for simple IRT models can be easily extended to their multidimensional and multilevel cases.
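To give a flavor of such code, here is a minimal Stan sketch of the graded response model using the built-in ordered_logistic distribution. This is our own illustration with assumed priors, not the code provided in the article:

```stan
data {
  int<lower=1> N;                           // respondents
  int<lower=1> J;                           // items
  int<lower=2> K;                           // response categories per item
  array[N, J] int<lower=1, upper=K> y;      // ordinal responses
}
parameters {
  vector[N] theta;                          // latent trait
  vector<lower=0>[J] alpha;                 // item discriminations
  array[J] ordered[K - 1] kappa;            // ordered category thresholds
}
model {
  theta ~ std_normal();                     // fixes the latent metric
  alpha ~ lognormal(0, 1);                  // positive discriminations
  for (j in 1:J)
    kappa[j] ~ normal(0, 2);                // weakly informative thresholds
  for (n in 1:N)
    for (j in 1:J)
      y[n, j] ~ ordered_logistic(alpha[j] * theta[n], kappa[j]);
}
```

Model comparison of the kind the authors describe is commonly done by computing pointwise log-likelihoods in a generated quantities block and passing them to cross-validation tooling such as the loo package.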


Author(s):  
Cai Xu
Mark V. Schaverien
Joani M. Christensen
Chris J. Sidey-Gibbons

Abstract
Purpose: This study aimed to evaluate and improve the accuracy and efficiency of the QuickDASH for assessing limb function in patients with upper extremity lymphedema, using modern psychometric techniques.
Method: We conducted confirmatory factor analysis (CFA) and Mokken analysis to examine the unidimensionality assumption of the IRT model on data from 285 patients who completed the QuickDASH. We then fit Samejima's graded response model (GRM), assessed the assumption of local independence of items, and calibrated the item responses for a computerized adaptive testing (CAT) simulation.
Results: Initial CFA and Mokken analyses demonstrated good scalability of items and unidimensionality. However, the local independence assumption was violated between items 9 (severity of pain) and 11 (sleeping difficulty due to pain) (Yen's Q3 = 0.46), and disordered thresholds were evident for item 5 (cutting food). After addressing these breaches of assumptions, the re-analyzed GRM with the remaining 10 items achieved an improved fit. A simulated CAT administration demonstrated a high correlation between scores on the CAT and the QuickDASH (r = 0.98). Items 2 (doing heavy chores) and 8 (limiting work or daily activities) were the most frequently used. The correlation between factor scores derived from the 11-item QuickDASH and the Ultra-QuickDASH version with only items 2 and 8 was as high as 0.91.
Conclusion: By administering just these two best-performing QuickDASH items, we can obtain estimates very similar to those obtained from the full-length QuickDASH, without the need for CAT technology.
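For context, Yen's Q3 statistic used above to flag local dependence is, in its standard form, the correlation between model residuals for a pair of items (the notation here is ours):

\[
d_{ni} = y_{ni} - \mathbb{E}\!\left[\,Y_{ni} \mid \hat{\theta}_n\right], \qquad
Q_{3,ij} = \operatorname{corr}\!\left(d_{\cdot i},\, d_{\cdot j}\right)
\]

where d_{·i} collects item i's residuals across respondents n. Under local independence the residual correlations should hover near zero (slightly negative in expectation), so a value of 0.46 between the two pain items signals that they share variance beyond the single latent trait.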


2022
001316442110634
Author(s):  
Patrick D. Manapat
Michael C. Edwards

When fitting unidimensional item response theory (IRT) models, the population distribution of the latent trait (θ) is often assumed to be normally distributed. However, some psychological theories would suggest a nonnormal θ. For example, some clinical traits (e.g., alcoholism, depression) are believed to follow a positively skewed distribution where the construct is low for most people, medium for some, and high for few. Failure to account for nonnormality may compromise the validity of inferences and conclusions. Although corrections have been developed to account for nonnormality, these methods can be computationally intensive and have not yet been widely adopted. Previous research has recommended implementing nonnormality corrections when θ is not “approximately normal.” This research focused on examining how far θ can deviate from normal before the normality assumption becomes untenable. Specifically, our goal was to identify the type(s) and degree(s) of nonnormality that result in unacceptable parameter recovery for the graded response model (GRM) and 2-parameter logistic model (2PLM).
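To make the scenario concrete, here is a hedged Stan sketch of how the graded response model shown earlier might be extended to a positively skewed latent trait; the skew_normal prior and its shape hyperprior are our assumptions for illustration, not the correction methods studied in the article:

```stan
data {
  int<lower=1> N;                           // respondents
  int<lower=1> J;                           // items
  int<lower=2> K;                           // response categories per item
  array[N, J] int<lower=1, upper=K> y;      // ordinal responses
}
parameters {
  vector[N] theta;                          // latent trait, possibly skewed
  vector<lower=0>[J] alpha;                 // item discriminations
  array[J] ordered[K - 1] kappa;            // category thresholds
  real delta;                               // shape: delta > 0 gives positive skew
}
model {
  theta ~ skew_normal(0, 1, delta);         // nonnormal trait distribution
  delta ~ normal(0, 5);                     // assumed hyperprior on skewness
  alpha ~ lognormal(0, 1);
  for (j in 1:J)
    kappa[j] ~ normal(0, 2);
  for (n in 1:N)
    for (j in 1:J)
      y[n, j] ~ ordered_logistic(alpha[j] * theta[n], kappa[j]);
}
```

Fixing the skew normal's location and scale at 0 and 1 keeps the latent metric identified; whether the shape parameter itself is well recovered depends on test length and sample size, which is in the spirit of the parameter-recovery questions the study above examines.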


2013
Vol 22 (2)
pp. 252-262
Author(s):  
Michelene Chenault
Martijn Berger
Bernd Kremer
Lucien Anteunis

Purpose: The purpose of this study was to improve the effectiveness of adult hearing screening and to demonstrate that intervention assessment methods are needed that address the individual's experienced hearing. Item response theory, which provides a methodology for assessing patient-reported outcomes, is examined here to demonstrate its usefulness in hearing screening and intervention.
Method: The graded response model is applied to a scale of 11 items assessing perceived hearing functioning and 10 items assessing experienced social limitations, completed by a sample of 212 persons aged 55+ years. Fixed- and variable-slope models are compared. Discrimination and threshold parameters are estimated, and information functions are evaluated.
Results: Variable-slope models provided the best fit for both scales. The estimated discrimination parameters were good to excellent (1.5–3.4) for all items except one in each scale. Threshold values varied, demonstrating the complementary and supplementary value of items within a scale. The information provided by each item varies across trait values, so each scale as a whole provides information over a wider range of the trait.
Conclusion: Item response theory methodology facilitates comparing items on their discriminative ability and the information they provide, and thus offers a basis for selecting items for use in a screening setting.
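As background for the information functions mentioned above, in the usual logistic parameterization of the graded response model the item information at trait level θ takes the standard form (our notation):

\[
I_i(\theta) = a_i^2 \sum_{k=1}^{m}
\frac{\Bigl[P^{*}_{ik}(\theta)\bigl(1 - P^{*}_{ik}(\theta)\bigr) - P^{*}_{i,k+1}(\theta)\bigl(1 - P^{*}_{i,k+1}(\theta)\bigr)\Bigr]^{2}}
{P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta)}
\]

where P*_{ik}(θ) is the probability of responding in category k or higher, with the conventions P*_{i1}(θ) = 1 and P*_{i,m+1}(θ) = 0 for an item with m categories. The squared discrimination term a_i² is why estimates in the 1.5–3.4 range translate into highly informative items, while the threshold locations spread each item's information across different regions of the trait.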


2009
Vol 40 (11)
pp. 1212-1220
Author(s):  
Zhao-Sheng LUO
Xue-Lian OUYANG
Shu-Qing QI
Hai-Qi DAI
Shu-Liang DING
