Solving the measurement invariance anchor item problem in item response theory.

2012 ◽  
Vol 97 (5) ◽  
pp. 1016-1031 ◽  
Author(s):  
Adam W. Meade ◽  
Natalie A. Wright


2021 ◽
Vol 9 ◽  
Author(s):  
Ron D. Hays ◽  
David Hubble ◽  
Frank Jenkins ◽  
Alexa Fraser ◽  
Beryl Carew

The National Children's Study (NCS) statistics and item response theory group was tasked with promoting the quality of the study's measures and analyses. This paper provides an overview of six measurement and statistical considerations for the NCS: (1) Conceptual and Measurement Model; (2) Reliability; (3) Validity; (4) Measurement Invariance; (5) Interpretability of Scores; and (6) Burden of Administration. The guidance was based primarily on recommendations of the International Society for Quality of Life Research.
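Of the six considerations, reliability lends itself most directly to a worked example. The sketch below is hypothetical and not drawn from the NCS paper; it computes Cronbach's alpha, a common internal-consistency estimate, on an invented respondents-by-items score matrix.

```python
# Hypothetical sketch (not from the NCS paper): Cronbach's alpha as a
# minimal illustration of the "Reliability" consideration listed above.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items score matrix."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Invented data: 100 respondents answering 4 five-point Likert items.
# (Independent random responses, so alpha will be near zero here.)
rng = np.random.default_rng(0)
scores = rng.integers(1, 6, size=(100, 4)).astype(float)
print(f"alpha = {cronbach_alpha(scores):.3f}")
```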


2020 ◽  
Author(s):  
E. Damiano D'Urso ◽  
Kim De Roover ◽  
Jeroen K. Vermunt ◽  
Jesper Tijmstra

In the social sciences, the study of group differences in latent constructs is ubiquitous. These constructs are generally measured by means of scales composed of ordinal items. Comparing such constructs across groups requires that they are measured equivalently or, in technical jargon, that measurement invariance (MI) holds across the groups. This study compared the performance of multiple group categorical confirmatory factor analysis (MG-CCFA) and multiple group item response theory (MG-IRT) in testing MI with ordinal data. A simulation study compared the true positive rate (TPR) and false positive rate (FPR), both at the scale and at the item level, for the two approaches under an invariance and a non-invariance scenario. The results showed that the TPR of the MG-CCFA- and MG-IRT-based approaches depends mostly on scale length: for long scales, the likelihood ratio test (LRT) approach for MG-IRT outperformed the alternatives, while for short scales MG-CCFA was generally preferable. The performance of MG-CCFA's fit measures, such as RMSEA and CFI, also depended largely on scale length, so caution is recommended when using these measures, especially when MI is tested for each item individually. A decision flowchart based on the simulation results is provided to summarize the findings and indicate which approach performed best in each setting.
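As a minimal sketch of the chi-square machinery behind the LRT approach mentioned above: the log-likelihoods and parameter counts below are invented, and the two multiple-group models are assumed to have been fit elsewhere.

```python
# Minimal sketch of the likelihood ratio test (LRT) logic for invariance
# testing, assuming the models have already been estimated elsewhere.
from scipy.stats import chi2

def invariance_lrt(loglik_free: float, loglik_constrained: float,
                   n_params_free: int, n_params_constrained: int):
    """Compare a freely estimated multiple-group model to one with
    invariance constraints. A significant p-value suggests the
    constraints (i.e. measurement invariance) do not hold."""
    stat = 2.0 * (loglik_free - loglik_constrained)  # LRT statistic
    ddf = n_params_free - n_params_constrained       # df = extra free parameters
    return stat, chi2.sf(stat, ddf)

# Invented log-likelihoods and parameter counts:
stat, p = invariance_lrt(-4021.3, -4035.8,
                         n_params_free=48, n_params_constrained=40)
print(f"LRT = {stat:.2f}, p = {p:.4f}")
```

The same comparison underlies both MG-CCFA- and MG-IRT-based tests: the constrained model equates item parameters across groups, and a significant statistic flags non-invariance.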


2021 ◽  
Author(s):  
Joshua Marmara ◽  
Daniel Zarate ◽  
Jeremy Vassallo ◽  
Rhiannon Patten ◽  
Vasileios Stavropoulos

Background: The Warwick-Edinburgh Mental Well-Being Scale (WEMWBS) is a measure of subjective well-being (SWB) that assesses its eudemonic and hedonic aspects. However, differential scoring of the WEMWBS across gender and its precision of measurement have not been examined. The present study assesses the psychometric properties of the WEMWBS using measurement invariance (MI) analyses between males and females and item response theory (IRT) analyses. Method: A community sample of 386 adults from the United States of America (USA), United Kingdom, Ireland, Australia, New Zealand, and Canada was assessed online (N = 394, 54.8% men, 43.1% women, Mage = 27.48, SD = 5.57). Results: MI analyses supported invariance across males and females at the configural and metric levels but indicated non-invariance at the scalar level. A graded response model fitted to examine item properties indicated that all items demonstrated sufficient, although variable, discrimination capacity. Conclusions: Gender comparisons based on WEMWBS scores should be interpreted cautiously for the items showing scalar non-invariance, since similar scores may indicate different severity of well-being. The items showed increased reliability for latent levels within ± 2 SD of the mean level of SWB. The WEMWBS may therefore not perform well for clinically low and high levels of SWB; including assessments designed for clinical cases may optimise its use.
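As a hypothetical illustration of the graded response model referenced in the Results (the discrimination and threshold values below are invented, not the WEMWBS estimates), a minimal sketch of category probabilities and item information for one polytomous item:

```python
# Hypothetical graded response model (GRM) sketch: category probabilities
# and Fisher information for one five-category item. Parameters are
# illustrative only, not the WEMWBS estimates.
import numpy as np

def grm_item(theta: np.ndarray, a: float, b: np.ndarray):
    """Return category probabilities and item information under the GRM."""
    # Cumulative probabilities P*(X >= k), padded with 1 and 0 at the ends.
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    p_star = np.hstack([np.ones((len(theta), 1)), p_star,
                        np.zeros((len(theta), 1))])
    probs = p_star[:, :-1] - p_star[:, 1:]   # per-category probabilities
    d_star = a * p_star * (1.0 - p_star)     # derivative of each cumulative P*
    info = ((d_star[:, :-1] - d_star[:, 1:]) ** 2 / probs).sum(axis=1)
    return probs, info

theta = np.linspace(-4, 4, 9)                # latent well-being levels
probs, info = grm_item(theta, a=1.8, b=np.array([-1.5, -0.5, 0.5, 1.5]))
print(info.round(3))  # information peaks between the thresholds
```

Plotting the information against theta shows where measurement is most precise; the pattern reported above corresponds to information that is high within ± 2 SD of the mean level of SWB but drops at the clinical extremes.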


2021 ◽  
Author(s):  
Emily Lasko ◽  
David Chester

The Taylor Aggression Paradigm (TAP) is a widely used laboratory aggression task, yet item response theory (IRT) analyses of this task are nonexistent. To estimate the TAP's item-level psychometric properties, we combined data from nine laboratory studies that employed the 25-trial version of the task (combined N = 1,856). One-factor and four-factor solutions for the TAP data exhibited evidence of measurement invariance across gender (men versus women) and experimental provocation (negative versus positive social feedback), as well as negligible instances of differential item functioning. As such, the psychometric properties of the TAP were invariant across binary representations of gender and experimental provocation. Further, trials following low and high provocation were the least informative, and those following moderate provocation were the most informative. Scoring approaches to the TAP may therefore benefit from giving greater weight to trials following moderate provocation. Overall, we find great utility in applying IRT approaches to behavioral laboratory tasks.
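A hypothetical sketch of such information-weighted scoring, using a two-parameter logistic (2PL) model with invented trial parameters (not the estimates from the combined dataset):

```python
# Hypothetical sketch: weight TAP trials by their Fisher information under
# a 2PL model. All parameter values below are invented for illustration.
import numpy as np

def info_2pl(theta: float, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Fisher information of each 2PL item (trial) at ability level theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # probability of aggressive response
    return a ** 2 * p * (1.0 - p)

# Invented discrimination (a) and difficulty (b) parameters for 25 trials.
rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.5, size=25)
b = rng.normal(0.0, 1.0, size=25)

weights = info_2pl(theta=0.0, a=a, b=b)
weights /= weights.sum()                     # normalize weights to sum to 1
responses = rng.integers(0, 2, size=25)      # hypothetical 0/1 trial responses
weighted_score = float(weights @ responses)  # information-weighted TAP score
print(f"weighted score = {weighted_score:.3f}")
```

Trials carrying more information at the latent level of interest contribute more to the score, which is the weighting logic the abstract suggests for moderate-provocation trials.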

