Matching IRT Models to Patient-Reported Outcomes Constructs: The Graded Response and Log-Logistic Models for Scaling Depression

Psychometrika
2021
Author(s):  
Steven P. Reise
Han Du
Emily F. Wong
Anne S. Hubbard
Mark G. Haviland

Abstract
Item response theory (IRT) model applications extend well beyond cognitive ability testing, and various patient-reported outcomes (PRO) measures are among the more prominent examples. PRO (and like) constructs differ from cognitive ability constructs in many ways, and these differences have model fitting implications. With a few notable exceptions, however, most IRT applications to PRO constructs rely on traditional IRT models, such as the graded response model. We review some notable differences between cognitive and PRO constructs and how these differences can present challenges for traditional IRT model applications. We then apply two models (the traditional graded response model and an alternative log-logistic model) to depression measure data drawn from the Patient-Reported Outcomes Measurement Information System project. We do not claim that one model is “a better fit” or more “valid” than the other; rather, we show that the log-logistic model may be more consistent with the construct of depression as a unipolar phenomenon. Clearly, the graded response and log-logistic models can lead to different conclusions about the psychometrics of an instrument and the scaling of individual differences. We underscore, too, that the question of which model is more appropriate generally cannot be decided by fit index comparisons alone; these decisions may require integrating psychometrics with theory and research findings on the construct of interest.
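For orientation, the two item response functions being contrasted can be written side by side. The log-logistic form shown is Lucke's unipolar parameterization, which we take as representative of the model class the authors apply; the notation is ours, not the article's.

\[
\text{GRM:} \qquad P(X_i \ge k \mid \theta) = \frac{1}{1 + e^{-a_i(\theta - b_{ik})}}, \qquad \theta \in (-\infty, \infty)
\]

\[
\text{Log-logistic:} \qquad P(X_i \ge k \mid \theta) = \frac{\lambda_{ik}\,\theta^{\eta_i}}{1 + \lambda_{ik}\,\theta^{\eta_i}}, \qquad \theta \ge 0
\]

The substantive contrast is the support of the latent trait: the GRM scales respondents on an unbounded bipolar continuum, whereas the log-logistic form places them on the half-line, with θ = 0 corresponding to the complete absence of the attribute, a natural reading for a unipolar construct such as depression.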

2017
Vol 78 (3)
pp. 384-408
Author(s):  
Yong Luo
Hong Jiao

Stan is a new Bayesian statistical software program that implements the powerful and efficient Hamiltonian Monte Carlo (HMC) algorithm. To date, there is no source that systematically provides Stan code for various item response theory (IRT) models. This article provides Stan code for three representative IRT models: the three-parameter logistic model, the graded response model, and the nominal response model. We demonstrate how IRT model comparison can be conducted with Stan and how the provided Stan code for simple IRT models can be easily extended to their multidimensional and multilevel cases.
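To give a flavor of such code, here is a minimal Stan sketch of the graded response model using the built-in ordered_logistic distribution. This is our own illustration with assumed priors, not the code provided in the article:

```stan
data {
  int<lower=1> N;                           // respondents
  int<lower=1> J;                           // items
  int<lower=2> K;                           // response categories per item
  array[N, J] int<lower=1, upper=K> y;      // ordinal responses
}
parameters {
  vector[N] theta;                          // latent trait
  vector<lower=0>[J] alpha;                 // item discriminations
  array[J] ordered[K - 1] kappa;            // ordered category thresholds
}
model {
  theta ~ std_normal();                     // fixes the latent metric
  alpha ~ lognormal(0, 1);                  // positive discriminations
  for (j in 1:J)
    kappa[j] ~ normal(0, 2);                // weakly informative thresholds
  for (n in 1:N)
    for (j in 1:J)
      y[n, j] ~ ordered_logistic(alpha[j] * theta[n], kappa[j]);
}
```

Model comparison of the kind the authors describe is commonly done by computing pointwise log-likelihoods in a generated quantities block and passing them to cross-validation tooling such as the loo package.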


Author(s):  
Cai Xu
Mark V. Schaverien
Joani M. Christensen
Chris J. Sidey-Gibbons

Abstract
Purpose: This study aimed to evaluate and improve the accuracy and efficiency of the QuickDASH for assessing limb function in patients with upper extremity lymphedema, using modern psychometric techniques.
Method: We conducted confirmatory factor analysis (CFA) and Mokken analysis to examine the unidimensionality assumption of the IRT model on data from 285 patients who completed the QuickDASH. We then fit Samejima's graded response model (GRM), assessed the assumption of local independence of items, and calibrated the item responses for a computerized adaptive testing (CAT) simulation.
Results: Initial CFA and Mokken analyses demonstrated good scalability of items and unidimensionality. However, the local independence assumption was violated between items 9 (severity of pain) and 11 (sleeping difficulty due to pain) (Yen's Q3 = 0.46), and disordered thresholds were evident for item 5 (cutting food). After addressing these breaches of assumptions, the re-analyzed GRM with the remaining 10 items achieved an improved fit. A simulated CAT administration demonstrated a high correlation between scores on the CAT and the QuickDASH (r = 0.98). Items 2 (doing heavy chores) and 8 (limiting work or daily activities) were the most frequently used. The correlation between factor scores derived from the 11-item QuickDASH and the Ultra-QuickDASH version with only items 2 and 8 was as high as 0.91.
Conclusion: By administering just these two best-performing QuickDASH items, we can obtain estimates very similar to those obtained from the full-length QuickDASH, without the need for CAT technology.
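For context, Yen's Q3 statistic used above to flag local dependence is, in its standard form, the correlation between model residuals for a pair of items (the notation here is ours):

\[
d_{ni} = y_{ni} - \mathbb{E}\!\left[\,Y_{ni} \mid \hat{\theta}_n\right], \qquad
Q_{3,ij} = \operatorname{corr}\!\left(d_{\cdot i},\, d_{\cdot j}\right)
\]

where d_{·i} collects item i's residuals across respondents n. Under local independence the residual correlations should hover near zero (slightly negative in expectation), so a value of 0.46 between the two pain items signals that they share variance beyond the single latent trait.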


2022
001316442110634
Author(s):  
Patrick D. Manapat
Michael C. Edwards

When fitting unidimensional item response theory (IRT) models, the population distribution of the latent trait (θ) is often assumed to be normally distributed. However, some psychological theories would suggest a nonnormal θ. For example, some clinical traits (e.g., alcoholism, depression) are believed to follow a positively skewed distribution where the construct is low for most people, medium for some, and high for few. Failure to account for nonnormality may compromise the validity of inferences and conclusions. Although corrections have been developed to account for nonnormality, these methods can be computationally intensive and have not yet been widely adopted. Previous research has recommended implementing nonnormality corrections when θ is not “approximately normal.” This research focused on examining how far θ can deviate from normal before the normality assumption becomes untenable. Specifically, our goal was to identify the type(s) and degree(s) of nonnormality that result in unacceptable parameter recovery for the graded response model (GRM) and 2-parameter logistic model (2PLM).
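To make the scenario concrete, here is a hedged Stan sketch of how the graded response model shown earlier might be extended to a positively skewed latent trait; the skew_normal prior and its shape hyperprior are our assumptions for illustration, not the correction methods studied in the article:

```stan
data {
  int<lower=1> N;                           // respondents
  int<lower=1> J;                           // items
  int<lower=2> K;                           // response categories per item
  array[N, J] int<lower=1, upper=K> y;      // ordinal responses
}
parameters {
  vector[N] theta;                          // latent trait, possibly skewed
  vector<lower=0>[J] alpha;                 // item discriminations
  array[J] ordered[K - 1] kappa;            // category thresholds
  real delta;                               // shape: delta > 0 gives positive skew
}
model {
  theta ~ skew_normal(0, 1, delta);         // nonnormal trait distribution
  delta ~ normal(0, 5);                     // assumed hyperprior on skewness
  alpha ~ lognormal(0, 1);
  for (j in 1:J)
    kappa[j] ~ normal(0, 2);
  for (n in 1:N)
    for (j in 1:J)
      y[n, j] ~ ordered_logistic(alpha[j] * theta[n], kappa[j]);
}
```

Fixing the skew normal's location and scale at 0 and 1 keeps the latent metric identified; whether the shape parameter itself is well recovered depends on test length and sample size, which is in the spirit of the parameter-recovery questions the study above examines.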


2013
Vol 22 (2)
pp. 252-262
Author(s):  
Michelene Chenault
Martijn Berger
Bernd Kremer
Lucien Anteunis

Purpose: The purpose of this study was to improve the effectiveness of adult hearing screening and to demonstrate that intervention assessment methods are needed that address the individual's experienced hearing. Item response theory, which provides a methodology for assessing patient-reported outcomes, is examined here to demonstrate its usefulness in hearing screening and intervention.
Method: The graded response model is applied to a scale of 11 items assessing perceived hearing functioning and 10 items assessing experienced social limitations, completed by a sample of 212 persons aged 55+ years. Fixed- and variable-slope models are compared. Discrimination and threshold parameters are estimated, and information functions are evaluated.
Results: Variable-slope models provided the best fit for both scales. The estimated discrimination parameters were good to excellent (1.5–3.4) for all items except one in each scale. Threshold values varied, demonstrating the complementary and supplementary value of items within a scale. The information provided by each item varies across trait values, so each scale as a whole provides information over a wider range of the trait.
Conclusion: Item response theory methodology facilitates comparing items on their discriminative ability and the information they provide, and thus offers a basis for selecting items for use in a screening setting.
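As background for the information functions mentioned above, in the usual logistic parameterization of the graded response model the item information at trait level θ takes the standard form (our notation):

\[
I_i(\theta) = a_i^2 \sum_{k=1}^{m}
\frac{\Bigl[P^{*}_{ik}(\theta)\bigl(1 - P^{*}_{ik}(\theta)\bigr) - P^{*}_{i,k+1}(\theta)\bigl(1 - P^{*}_{i,k+1}(\theta)\bigr)\Bigr]^{2}}
{P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta)}
\]

where P*_{ik}(θ) is the probability of responding in category k or higher, with the conventions P*_{i1}(θ) = 1 and P*_{i,m+1}(θ) = 0 for an item with m categories. The squared discrimination term a_i² is why estimates in the 1.5–3.4 range translate into highly informative items, while the threshold locations spread each item's information across different regions of the trait.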


2009
Vol 40 (11)
pp. 1212-1220
Author(s):  
Zhao-Sheng LUO
Xue-Lian OUYANG
Shu-Qing QI
Hai-Qi DAI
Shu-Liang DING
