What makes experts reliable? Expert reliability and the estimation of latent traits

2019 · Vol 6 (4) · pp. 205316801987956
Author(s): Kyle L. Marquardt, Daniel Pemstein, Brigitte Seim, Yi-ting Wang

Experts code latent quantities for many influential political science datasets. Although scholars are aware of the importance of accounting for variation in expert reliability when aggregating such data, they have not systematically explored either the factors affecting expert reliability or the degree to which these factors influence estimates of latent concepts. Here we provide a template for examining potential correlates of expert reliability, using coder-level data for six randomly selected variables from a cross-national panel dataset. We aggregate these data with an ordinal item response theory model that parameterizes expert reliability, and regress the resulting reliability estimates on both expert demographic characteristics and measures of their coding behavior. We find little evidence of a consistent, substantial relationship between most expert characteristics and reliability, and these null results extend to potentially problematic sources of bias in estimates, such as gender. The exceptions to these results are intuitive, and provide baseline guidance for expert recruitment and retention in future expert-coding projects: attentive and confident experts who have contextual knowledge tend to be more reliable. Taken as a whole, these findings reinforce arguments that item response theory models are a relatively safe method for aggregating expert-coded data.
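As a concrete illustration of the kind of aggregation model described above, the Python sketch below simulates expert-coded ordinal data in which each expert carries a reliability (discrimination) parameter, so more reliable experts produce ratings that track the latent trait more tightly. The cumulative-logit form, parameter values, and variable names are illustrative assumptions, not the authors' actual measurement model.

```python
# Hedged sketch: ordinal IRT-style ratings with expert-specific reliability.
# The functional form and all parameter values are assumptions for illustration.
import numpy as np
from scipy.special import expit  # logistic CDF

rng = np.random.default_rng(0)

n_cases, n_experts, n_cats = 50, 10, 4
z = rng.normal(size=n_cases)                  # latent trait of each case
beta = rng.normal(size=n_experts)             # log-reliability of each expert
tau = np.sort(rng.normal(size=(n_experts, n_cats - 1)), axis=1)  # expert thresholds

def rating_probs(z_i, beta_j, tau_j):
    """Category probabilities under a rater-specific cumulative-logit (graded) model."""
    cum = expit(np.exp(beta_j) * z_i - tau_j)   # P(rating > k) for k = 0..K-2
    p = np.empty(len(tau_j) + 1)
    p[0] = 1.0 - cum[0]
    p[1:-1] = cum[:-1] - cum[1:]
    p[-1] = cum[-1]
    return p

# Higher exp(beta_j) means expert j's ratings track z more closely (more reliable).
ratings = np.array([[rng.choice(n_cats, p=rating_probs(z[i], beta[j], tau[j]))
                     for j in range(n_experts)] for i in range(n_cases)])
print(ratings[:3])
```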

2019 · Vol 45 (3) · pp. 274-296
Author(s): Yang Liu, Xiaojing Wang

Parametric methods, such as autoregressive models or latent growth modeling, are usually too inflexible to model the dependence and nonlinear effects among changes in latent traits when time gaps are irregular and the recorded time points vary across individuals. In practice, the growth trend of latent traits is often subject to monotonicity and smoothness conditions. To incorporate such conditions and to relax the strong parametric assumptions placed on latent trajectories, a flexible nonparametric prior is introduced to model the dynamic changes of latent traits in item response theory models over the study period. Suitable Bayesian computation schemes are developed for the analysis of longitudinal, dichotomous item responses. Simulation studies and a real data example from educational testing illustrate the proposed methods.
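One simple way to encode a monotone, smooth prior on a latent trajectory observed at irregular, person-specific time points is to write the trajectory as a nonnegative-weighted sum of monotone basis functions, as in the Python sketch below. The sigmoid basis, the prior on the weights, and the 2PL link are illustrative assumptions, not the authors' exact construction.

```python
# Hedged sketch: a monotone, smooth latent trajectory at irregular time points,
# built from nonnegative-weighted sigmoid basis functions (assumed construction).
import numpy as np

rng = np.random.default_rng(1)

def monotone_trajectory(t, knots, log_w, intercept):
    """theta(t) = intercept + sum_k exp(log_w[k]) * sigmoid(t - knot_k); nondecreasing in t."""
    basis = 1.0 / (1.0 + np.exp(-(t[:, None] - knots[None, :])))  # each column increases in t
    return intercept + basis @ np.exp(log_w)  # positive weights preserve monotonicity

# Irregular, individually varying measurement occasions for one examinee.
t_obs = np.sort(rng.uniform(0.0, 5.0, size=6))
knots = np.linspace(0.0, 5.0, 8)
log_w = rng.normal(-1.0, 0.5, size=8)  # normal prior on log-weights => weights > 0
theta = monotone_trajectory(t_obs, knots, log_w, intercept=-0.5)

# The trajectory feeds a dichotomous 2PL response model at each occasion.
a, b = 1.2, 0.3
p_correct = 1.0 / (1.0 + np.exp(-a * (theta - b)))
print(np.round(theta, 2), np.round(p_correct, 2))
```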


2022 · Vol 12
Author(s): Feifei Huang, Zhe Li, Ying Liu, Jingan Su, Li Yin, ...

Educational assessments are often constructed using testlets because of the flexibility they offer to test various aspects of cognitive activity and to sample content broadly. However, violation of the local item independence assumption is inevitable when tests are built from testlet items. In this study, simulations are conducted to evaluate the performance of item response theory models and testlet response theory models, for both dichotomous and polytomous items, when equating tests composed of testlets. We also examine the impact of testlet effect size, testlet length, and sample size on the estimation of item and person parameters. The results show that testlet response theory models consistently outperformed item response theory models across the studies, which supports the benefits of using testlet response theory models when equating tests composed of testlets. The results further indicate that when the sample size is large, item response theory models perform similarly to testlet response theory models across all studies.
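The local dependence at issue can be made concrete with a small simulation: a dichotomous testlet response model is a 2PL with an extra person-by-testlet random effect, which standard IRT omits. In the Python sketch below, all sample sizes, variances, and parameter distributions are assumed values for illustration, not the study's simulation design.

```python
# Hedged sketch: dichotomous testlet response model = 2PL plus a person-by-testlet
# random effect gamma. Parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

n_persons, n_testlets, items_per_testlet = 1000, 5, 4
theta = rng.normal(size=n_persons)                          # person abilities
gamma = rng.normal(0.0, 0.8, size=(n_persons, n_testlets))  # testlet effects
a = rng.lognormal(0.0, 0.3, size=n_testlets * items_per_testlet)  # discriminations
b = rng.normal(size=n_testlets * items_per_testlet)               # difficulties
testlet_of = np.repeat(np.arange(n_testlets), items_per_testlet)  # item -> testlet map

# P(X_ij = 1) = sigmoid(a_j * (theta_i + gamma_{i, d(j)} - b_j)); the shared gamma
# term induces the within-testlet local dependence that plain IRT models ignore.
eta = a * (theta[:, None] + gamma[:, testlet_of] - b)
responses = (rng.random(eta.shape) < 1.0 / (1.0 + np.exp(-eta))).astype(int)
print(responses.shape, responses.mean())
```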


2017 · Vol 41 (8) · pp. 600-613
Author(s): Wen-Chung Wang, Xue-Lan Qiu, Chia-Wen Chen, Sage Ro, Kuan-Yu Jin

There is re-emerging interest in adopting forced-choice items to address response bias in Likert-type items for noncognitive latent traits. Multidimensional pairwise comparison (MPC) items are commonly used forced-choice items. However, few studies have aimed to develop item response theory models for MPC items, owing to the challenges associated with ipsativity. Acknowledging that the absolute scales of latent traits are not identifiable in ipsative tests, this study developed a Rasch ipsative model for MPC items that has desirable measurement properties, yields a single utility value for each statement, and allows psychological differentiation to be compared between and within individuals. Simulation results showed good parameter recovery for the new model with existing computer programs. The article also provides an empirical example of an ipsative test on work styles and behaviors.
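The core of a Rasch-type model for MPC items can be sketched in a few lines: the probability of endorsing statement s over statement t depends only on the difference between their utilities, each a trait value plus a statement parameter. The Python below is a generic Rasch-form illustration with assumed parameter values, not the paper's exact parameterization.

```python
# Hedged sketch: Rasch-form choice probability for a multidimensional pairwise
# comparison (MPC) item. Dimension indices and parameter values are assumed.
import numpy as np

rng = np.random.default_rng(3)

theta = rng.normal(size=3)       # one person's traits (e.g., work-style dimensions)
delta_s, delta_t = 0.4, -0.2     # statement utility parameters (assumed values)

def p_choose_s(theta, dim_s, dim_t, delta_s, delta_t):
    """P(endorse statement s over t) for an item pairing dimensions dim_s and dim_t."""
    u_s = theta[dim_s] + delta_s  # utility of statement s
    u_t = theta[dim_t] + delta_t  # utility of statement t
    return 1.0 / (1.0 + np.exp(-(u_s - u_t)))

# An MPC item pitting a statement on trait 0 against a statement on trait 2.
print(round(p_choose_s(theta, 0, 2, delta_s, delta_t), 3))
# Only utility differences enter the probability, which is why absolute trait
# scales are not identifiable in a purely ipsative test.
```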


2021 · pp. 001316442199841
Author(s): Pere J. Ferrando, David Navarro-González

Item response theory “dual” models (DMs), in which both items and individuals are viewed as sources of differential measurement error, have so far been proposed only for unidimensional measures. This article proposes two multidimensional extensions of existing DMs: the M-DTCRM (multidimensional dual Thurstonian continuous response model), intended for (approximately) continuous responses, and the M-DTGRM (multidimensional dual Thurstonian graded response model), intended for ordered-categorical responses (including binary). A rationale for the extension to the multiple-content-dimensions case, based on the concept of the multidimensional location index, is first proposed and discussed. The models are then described using both the factor-analytic and the item response theory parameterizations. Procedures for (a) calibrating the items, (b) scoring individuals, (c) assessing model appropriateness, and (d) assessing measurement precision are finally discussed. Simulation results suggest that the proposal is quite feasible, and an illustrative example based on personality data is also provided. The proposals should be of particular interest for multidimensional questionnaires in which the number of items per scale would not be enough to arrive at stable estimates if the existing unidimensional DMs were fitted on a separate-scale basis.
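The “dual” idea, that measurement error has both an item source and a person source, can be caricatured with a toy linear factor-analytic version in which the residual scale for person i on item j is the product of an item term and a person term. The sketch below illustrates only that idea under assumed distributions; the actual M-DTCRM parameterization is more elaborate.

```python
# Hedged sketch: "dual" measurement error, with item-specific and person-specific
# error components multiplying into the residual SD. Toy linear version only.
import numpy as np

rng = np.random.default_rng(4)

n_persons, n_items, n_dims = 200, 12, 2
theta = rng.normal(size=(n_persons, n_dims))                   # latent traits
Lambda = np.abs(rng.normal(0.7, 0.2, size=(n_items, n_dims)))  # factor loadings
sigma_item = rng.lognormal(-1.0, 0.3, size=n_items)            # item error SDs
sigma_person = rng.lognormal(0.0, 0.3, size=n_persons)         # person error factors

# Residual SD for person i on item j is sigma_item[j] * sigma_person[i].
noise = rng.normal(size=(n_persons, n_items)) * sigma_item * sigma_person[:, None]
X = theta @ Lambda.T + noise   # (approximately) continuous responses
print(X.shape)
```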


2014 · Vol 22 (2) · pp. 323-341
Author(s): Dheeraj Raju, Xiaogang Su, Patricia A. Patrician

Background and Purpose: The purpose of this article is to introduce different types of item response theory models and to demonstrate their usefulness by evaluating the Practice Environment Scale. Methods: Item response theory models including the constrained and unconstrained graded response models, the partial credit model, the Rasch model, and the one-parameter logistic model are demonstrated. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are used as model selection criteria. Results: The unconstrained graded response and partial credit models showed the best fit to the data. Almost all items in the instrument performed well. Conclusions: Although most of the items strongly measure the construct, a few items could be eliminated without substantially altering the instrument. The analysis also revealed that the instrument may function differently when administered to different unit types.
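The model-selection step described in the Methods reduces to computing two penalized fit indices from each fitted model's maximized log-likelihood. The Python sketch below shows the arithmetic; the log-likelihood values, parameter counts, and sample size are placeholders, not results from the article.

```python
# Hedged sketch of AIC/BIC model comparison; all numbers below are placeholders.
import math

def aic(loglik, k):
    """AIC = -2*logL + 2k, where k is the number of free parameters."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """BIC = -2*logL + k*log(n), where n is the sample size."""
    return -2.0 * loglik + k * math.log(n)

models = {  # model -> (maximized log-likelihood, free parameters); assumed values
    "Rasch": (-5210.4, 31),
    "partial credit": (-5105.8, 124),
    "unconstrained graded response": (-5090.1, 155),
}
n = 500  # assumed sample size
for name, (ll, k) in models.items():
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, n):.1f}")
# The model with the smallest criterion value is preferred.
```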

