Evaluating Validity Properties of 25 Race-Related Scales

2022 ◽  
Author(s):  
Neil Hester ◽  
Jordan Axt ◽  
Eric Hehman

Racial attitudes, beliefs, and motivations lie at the center of many of the most influential theories of prejudice and discrimination. The extent to which such theories can meaningfully explain behavior hinges on accurate measurement of these latent constructs. We evaluated the validity properties of 25 race-related scales in a sample of 1,031,207 respondents using modern approaches such as dynamic fit indices, Item Response Theory, and nomological nets. Despite showing adequate internal reliability, many scales demonstrated poor model fit and had latent score distributions showing clear floor or ceiling effects, results that illustrate deficiencies in measures’ ability to capture their intended construct. Nomological nets further suggested that the theoretical space of “racial prejudice” is crowded with scales that may not actually capture meaningfully distinct latent constructs. We provide concrete recommendations for scale selection and renovation and outline implications for overlooking measurement issues in the study of prejudice and discrimination.

2020 ◽  
pp. 073428292093092 ◽  
Author(s):  
Patrícia Silva Lúcio ◽  
Joachim Vandekerckhove ◽  
Guilherme V. Polanczyk ◽  
Hugo Cogo-Moreira

The present study compares the fit of two- and three-parameter logistic (2PL and 3PL) item response theory models to the performance of preschool children on Raven’s Colored Progressive Matrices. The Raven test is widely used to evaluate nonverbal intelligence (factor g). Studies comparing models with real data are scarce in the literature, and this is the first to compare two- and three-parameter models for the Raven test, evaluating the informational gain of considering guessing probability. Participants were 582 Brazilian preschool children (Mage = 57 months; SD = 7 months; 46% female) who responded individually to the instrument. The model fit indices suggested that the 2PL model fit the data better. The difficulty and ability parameters were similar between the models, with almost perfect correlations. Differences were observed in terms of discrimination and test information. The principle of parsimony should be invoked when comparing models.
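To make the 2PL/3PL distinction concrete, the following is a minimal Python sketch of the standard item response functions and item information. The parameter names (a for discrimination, b for difficulty, c for the guessing asymptote) and the example values are illustrative assumptions, not estimates from the study.

```python
import math


def irf_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: person ability; a: discrimination; b: difficulty;
    c: lower asymptote (guessing). Setting c = 0 gives the 2PL model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b, c=0.0):
    """Item information for the 3PL model; reduces to a^2 * P * Q when c = 0."""
    p = irf_3pl(theta, a, b, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


if __name__ == "__main__":
    # Hypothetical item: a = 1.0, b = 0.0, evaluated at theta = 0.0.
    for c in (0.0, 0.2):
        p = irf_3pl(0.0, 1.0, 0.0, c)
        info = item_information(0.0, 1.0, 0.0, c)
        print(f"c={c:.1f}  P(correct)={p:.3f}  information={info:.3f}")
```

With the other parameters held fixed, a nonzero guessing parameter raises the probability of a correct response at low ability and lowers the item's information. This is one way the two models can agree on difficulty and still differ in test information.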


Author(s):  
Brian Wesolowski

This chapter presents an introductory overview of concepts that underlie the general framework of item response theory. “Item response theory” is a broad umbrella term used to describe a family of mathematical measurement models that consider observed test scores to be a function of latent, unobservable constructs. Most musical constructs cannot be directly measured and are therefore unobservable. Musical constructs can therefore only be inferred based on secondary, observable behaviors. Item response theory treats observable behaviors as probabilistic responses, modeled as a logistic function of person and item parameters, in order to define latent constructs. This chapter describes philosophical, theoretical, and applied perspectives of item response theory in the context of measuring musical behaviors.


2017 ◽  
Vol 41 (7) ◽  
pp. 512-529 ◽  
Author(s):  
William R. Dardick ◽  
Brandi A. Weiss

This article introduces three new variants of entropy to detect person misfit (Ei, EMi, and EMRi), and provides preliminary evidence that these measures are worthy of further investigation. Previously, entropy has been used as a measure of approximate data–model fit to quantify how well individuals are classified into latent classes, and to quantify the quality of classification and separation between groups in logistic regression models. In the current study, entropy is explored through conceptual examples and Monte Carlo simulation comparing entropy with established measures of person fit in item response theory (IRT) such as lz, lz*, U, and W. Simulation results indicated that EMi and EMRi were successfully able to detect aberrant response patterns when comparing contaminated and uncontaminated subgroups of persons. In addition, EMi and EMRi performed similarly in showing separation between the contaminated and uncontaminated subgroups. However, EMRi may be advantageous over other measures when subtests include a small number of items. EMi and EMRi are recommended for use as approximate person-fit measures for IRT models. These measures of approximate person fit may be useful in making relative judgments about potential persons whose response patterns do not fit the theoretical model.
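As an illustration of what an established person-fit measure computes, here is a minimal Python sketch of the standardized log-likelihood statistic lz for dichotomous items. It is not the article's entropy measures, whose definitions are not given in the abstract. The response vectors and model-implied probabilities below are hypothetical.

```python
import math


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic (lz).

    responses: 0/1 item scores for one person.
    probs: model-implied probabilities of a correct response for that
           person on each item (each strictly between 0 and 1, and not
           all exactly 0.5, otherwise the variance term is zero).
    Large negative values suggest a response pattern that is unlikely
    under the model.
    """
    log_lik = sum(
        u * math.log(p) + (1 - u) * math.log(1 - p)
        for u, p in zip(responses, probs)
    )
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (log_lik - expected) / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical probabilities for six items, ordered from easy to hard.
    probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    consistent = [1, 1, 1, 0, 0, 0]  # passes easy items, fails hard ones
    aberrant = [0, 0, 0, 1, 1, 1]    # fails easy items, passes hard ones
    print("consistent:", round(lz_statistic(consistent, probs), 3))
    print("aberrant:  ", round(lz_statistic(aberrant, probs), 3))
```

In practice the probabilities would come from a fitted IRT model evaluated at the person's estimated ability. The sketch takes them as given so that the statistic itself stays visible.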


2014 ◽  
Vol 22 (1) ◽  
pp. 115-129 ◽  
Author(s):  
Anthony J. McGann

This article provides an algorithm to produce a time-series estimate of the political center (or median voter) from aggregate survey data, even when the same questions are not asked in most years. This is compared to the existing Stimson dyad ratios approach, which has been applied to various questions in political science. Unlike the dyad ratios approach, the model developed here is derived from an explicit model of individual behavior: the widely used item response theory model. I compare the results of both techniques using the data on public opinion from the United Kingdom from 1947 to 2005 from Bartle, Dellepiane-Avellaneda, and Stimson. Measures of overall model fit are provided, as well as techniques for testing the model's assumptions and the fit of individual items. Full code is provided for estimation with the free software WinBUGS and JAGS.


2021 ◽  
Vol 21 (2) ◽  
pp. 133-140
Author(s):  
Mohamad Masykurin Mafauzy ◽  
Tuan Hairulnizam Tuan Kamauzaman ◽  
Wan Nor Arifin ◽  
Hadi Fadhil Mat Said ◽  
Fatimah Ismail ◽  
...  

Flood disaster is the most common natural disaster, with a huge impact on healthcare services in Malaysia. The FloodDMQ-BM© questionnaire was developed as a tool to assess the knowledge, attitude, and practice of healthcare providers regarding patient management during a flood disaster. We aim to further validate the FloodDMQ-BM© questionnaire by using Confirmatory Factor Analysis (CFA) and Item Response Theory (IRT). This cross-sectional study involved doctors, nurses and paramedics working in the Emergency Department of Hospital Universiti Sains Malaysia, Hospital Raja Perempuan Zainab II and Hospital Kuala Krai. Respondents were required to complete the FloodDMQ-BM© questionnaire. The responses were analysed by using CFA and IRT to establish its validity and reliability. A total of 215 respondents participated in this study. CFA with Maximum Likelihood Robust as the estimation method, on the attitude and practice components, resulted in good factor loadings (>0.5) in nearly all items and excellent model fit index values (CFI = 0.96-0.98, TLI = 0.95-0.96, SRMR = 0.04-0.05, RMSEA = 0.07). Meanwhile, IRT analysis on the knowledge section showed a good two-way marginal fit based on S-X², and a good model fit with RMSEA of 0.08. Based on the 2PL model used in the IRT assessment of the knowledge section, one item (K3) was removed (chi-squared residual >4), resulting in improved model fit. The included items had well-standardized loadings (>0.3) and marginal reliability of 0.651. Our results confirmed that the FloodDMQ-BM© questionnaire displayed valid and reliable psychometric properties.
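For readers unfamiliar with the fit indices quoted above, the sketch below shows how CFI, TLI, and RMSEA are conventionally derived from model and baseline chi-square statistics. The chi-square values, degrees of freedom, and sample size are hypothetical and are not taken from the study. The RMSEA here divides by N - 1, whereas some software divides by N. SRMR is omitted because it requires the residual correlation matrix.

```python
import math


def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """Return (CFI, TLI, RMSEA) from chi-square statistics.

    chi2_model, df_model: chi-square and degrees of freedom of the tested model.
    chi2_base, df_base: chi-square and degrees of freedom of the baseline
                        (independence) model.
    n: sample size.
    """
    # Noncentrality estimates, floored at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)

    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom

    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)

    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return cfi, tli, rmsea


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the formulas.
    cfi, tli, rmsea = fit_indices(
        chi2_model=85.0, df_model=40, chi2_base=900.0, df_base=55, n=215
    )
    print(f"CFI={cfi:.3f}  TLI={tli:.3f}  RMSEA={rmsea:.3f}")
```

Robust estimators such as the one named in the abstract apply scaling corrections to the chi-square before these indices are computed, so values reported by SEM software may differ from this uncorrected sketch.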


2021 ◽  
Vol 13 (2) ◽  
Author(s):  
Alena Kašparová ◽  
Kateřina Doležalová ◽  
Viléma Novotná

Optimal movement rhythmisation is considered one of the basic prerequisites for improvements in the quality of movement performance using a particular technique. Well-developed rhythm-movement patterns play a role in successful learning of various physical activities as well as in athletic performance. University students – future PE and sports teachers – should improve their rhythmic feel skills during their studies so that they can use them later in their work and develop them in their future students. This requires the creation of a test battery for the evaluation of rhythmic feel skills through a series of music tests. This paper presents the results of tests taken by 121 university students at UK FTVS in Prague, the Czech Republic, and AWFIS in Gdańsk, Poland. The test battery focused on three types of music-motor skills: perception skills and activities (items 1-18), reproduction skills and activities (items 19-27) and production skills and activities (item 28). The data were statistically processed using the classical test theory (factor analysis) and the item response theory (two-parameter model). Statistical methods also included reliability calculation and test validity. The expected rejection of the proposed hypothesis was confirmed both for the classical test theory and for the item response theory. The only exception was model 4 where, however, fit indices (especially TLI = 0.537) pointed more at a lack of evidence for hypothesis rejection than a perfect conformity of the model and data. The intention was to create and test models with the best data compliance. The best data compliance was found in models no. 1 and 5. Model 1 [CFI = 0.927, TLI = 0.916, SRMR = 0.09, RMSEA (5 %) = 0.03, RMSEA (95 %) = 0.059] had a structure that corresponded to the proposed test battery and showed a relatively good compliance with data although IRT identified several problematic items. Model 5 [CFI = 0.956, TLI = 0.942, SRMR = 0.073, RMSEA (5 %) = 0.03, RMSEA (95 %) = 0.111] was unidimensional (reproduction factor feeding items 19 through 27) and its fit indices showed better compliance of model and data. An optimised test battery should be developed based on these models followed by another validation of the test battery using statistical analyses.


2018 ◽  
Vol 43 (2) ◽  
pp. 172-173 ◽  
Author(s):  
Jorge N. Tendeiro ◽  
Sebastian Castro-Alvarez

In this article, the newly created GGUM R package is presented. This package finally brings the generalized graded unfolding model (GGUM) to the front stage for practitioners and researchers. It expands the possibilities of fitting this type of item response theory (IRT) model to settings that, up to now, were not possible (thus, beyond the limitations imposed by the widespread GGUM2004 software). The outcome is therefore a unique piece of software, not limited by the dimensions of the data matrix or the operating system used. It includes various routines that allow fitting the model, checking model fit, plotting the results, and also interacting with GGUM2004 for those interested. The software should be of interest to all those who are interested in IRT in general or in ideal point models in particular.


2021 ◽  
Vol 46 (1) ◽  
pp. 53-67
Author(s):  
James Soland ◽  
Megan Kuhfeld

Researchers in the social sciences often obtain ratings of a construct of interest provided by multiple raters. While using multiple raters provides a way to help avoid the subjectivity of any given person’s responses, rater disagreement can be a problem. A variety of models exist to address rater disagreement in both structural equation modeling and item response theory frameworks. Recently, a model was developed by Bauer et al. (2013) and referred to as the “trifactor model” to provide applied researchers with a straightforward way of estimating scores that are purged of variance that is idiosyncratic by rater. Although the intent of the model is to be usable and interpretable, little is known about the circumstances under which it performs well and those under which it does not. We conduct simulation studies to examine the performance of the trifactor model under a range of sample sizes and model specifications and then compare model fit, bias, and convergence rates.

