Is It Worthy to Take Account of the “Guessing” in the Performance of the Raven Test? Calling for the Principle of Parsimony for Test Validation

2020 ◽  
pp. 073428292093092 ◽  
Author(s):  
Patrícia Silva Lúcio ◽  
Joachim Vandekerckhove ◽  
Guilherme V. Polanczyk ◽  
Hugo Cogo-Moreira

The present study compares the fit of two- and three-parameter logistic (2PL and 3PL) models of item response theory in the performance of preschool children on Raven's Colored Progressive Matrices. Raven's test is widely used for evaluating nonverbal intelligence (factor g). Studies comparing models with real data are scarce in the literature, and this is the first to compare two- and three-parameter models for Raven's test, evaluating the informational gain from modeling guessing probability. Participants were 582 Brazilian preschool children (Mage = 57 months; SD = 7 months; 46% female) who responded individually to the instrument. The model fit indices suggested that the 2PL model fit the data better. The difficulty and ability parameters were similar between the models, with almost perfect correlations. Differences were observed in terms of discrimination and test information. The principle of parsimony should be invoked when comparing models.
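The difference between the two models under comparison can be made concrete. As a minimal sketch (the parameter values below are illustrative, not the study's estimates), the 3PL item response function adds only a lower asymptote c, the "guessing" parameter, to the 2PL curve:

```python
import numpy as np

def p_2pl(theta, a, b):
    # 2PL: probability of a correct response given ability theta,
    # discrimination a, and difficulty b
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    # 3PL adds a lower asymptote c (the "guessing" parameter)
    return c + (1.0 - c) * p_2pl(theta, a, b)

theta = np.linspace(-3, 3, 7)
two = p_2pl(theta, a=1.2, b=0.0)
three = p_3pl(theta, a=1.2, b=0.0, c=0.2)
# With c > 0, the 3PL curve never drops below c, so low-ability
# examinees retain a nonzero success probability.
```

Parsimony favors the 2PL whenever the extra asymptote parameter does not buy a meaningful improvement in fit.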

2022 ◽  
Author(s):  
Neil Hester ◽  
Jordan Axt ◽  
Eric Hehman

Racial attitudes, beliefs, and motivations lie at the center of many of the most influential theories of prejudice and discrimination. The extent to which such theories can meaningfully explain behavior hinges on accurate measurement of these latent constructs. We evaluated the validity properties of 25 race-related scales in a sample of 1,031,207 respondents using modern approaches such as dynamic fit indices, Item Response Theory, and nomological nets. Despite showing adequate internal reliability, many scales demonstrated poor model fit and had latent score distributions showing clear floor or ceiling effects, results that illustrate deficiencies in measures’ ability to capture their intended construct. Nomological nets further suggested that the theoretical space of “racial prejudice” is crowded with scales that may not actually capture meaningfully distinct latent constructs. We provide concrete recommendations for scale selection and renovation and outline implications for overlooking measurement issues in the study of prejudice and discrimination.
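One of the deficiencies reported, floor or ceiling effects in latent score distributions, is simple to screen for. A hedged sketch (the data, scale bounds, and the 15% rule of thumb are illustrative assumptions, not values from the study):

```python
import numpy as np

def floor_ceiling_rates(scores, low, high):
    # Fraction of respondents sitting exactly at the scale minimum
    # and at the scale maximum.
    scores = np.asarray(scores)
    return (scores == low).mean(), (scores == high).mean()

# Hypothetical 1-7 Likert scores piled up at the floor:
scores = [1, 1, 1, 1, 2, 3, 1, 1, 4, 1]
floor_rate, ceil_rate = floor_ceiling_rates(scores, low=1, high=7)

# A common rule of thumb flags a measure when more than ~15% of
# respondents sit at either extreme of the score distribution.
flagged = floor_rate > 0.15 or ceil_rate > 0.15
```

A scale whose scores bunch at one extreme cannot discriminate among the respondents it was designed to distinguish, regardless of its internal reliability.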


2018 ◽  
Vol 15 (4) ◽  
pp. 2407
Author(s):  
Yeşim Bayrakdaroglu ◽  
Dursun Katkat

The purpose of this study is to investigate how the marketing activities of international sports organizations are performed and to develop a scale measuring the effects of image management on the public. Spectators of the interuniversity World Winter Olympics held in Erzurum in 2011 participated in the research. Exploratory and confirmatory factor analyses and a reliability analysis were performed on the data obtained. All model fit indices for the 25-item, four-factor structure of the perceived quality-image scale for sports organizations were found to be at a good level. In line with the findings from the exploratory and confirmatory factor analyses and the reliability analysis, it can be stated that the scale is a valid and reliable measurement tool for use in field research.
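The reliability analysis mentioned above typically reports Cronbach's alpha. A self-contained sketch of the standard formula (the toy data are hypothetical, not the study's responses):

```python
import numpy as np

def cronbach_alpha(items):
    # items: respondents x items matrix of scores
    # alpha = k/(k-1) * (1 - sum of item variances / total-score variance)
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

# Toy data: three items that roughly track one another
data = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 3, 2], [1, 2, 1]]
alpha = cronbach_alpha(data)
```

Values around .70 or higher are conventionally read as acceptable internal consistency, though alpha alone does not establish the factor structure that the EFA/CFA steps address.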


2018 ◽  
Vol 18 (3) ◽  
Author(s):  
Pablo Ezequiel Flores-Kanter ◽  
Sergio Dominguez-Lara ◽  
Mario Alberto Trógolo ◽  
Leonardo Adrián Medrano

Bifactor models have gained increasing popularity in the literature concerned with personality, psychopathology, and assessment. Empirical studies using bifactor analysis generally judge the estimated model using SEM model fit indices, which may lead to erroneous interpretations and conclusions. To address this problem, several researchers have proposed multiple criteria to assess bifactor models, such as (a) conceptual grounds, (b) overall model fit indices, and (c) specific bifactor model indicators. In this article, we provide a brief summary of these criteria. An example using data gathered from a recently published research article is also provided to show how taking into account all criteria, rather than solely SEM model fit indices, may prevent researchers from drawing wrong conclusions.
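Two widely used specific bifactor indicators of the kind referred to above are the explained common variance (ECV) and omega hierarchical. A sketch from standardized loadings (the loading values are hypothetical, and the omega formula assumes uncorrelated factors and unit-variance items):

```python
import numpy as np

# Hypothetical standardized loadings for 6 items on a general factor
# and two specific factors (zeros where an item does not load).
general = np.array([0.6, 0.7, 0.5, 0.6, 0.7, 0.5])
spec1   = np.array([0.4, 0.3, 0.5, 0.0, 0.0, 0.0])
spec2   = np.array([0.0, 0.0, 0.0, 0.3, 0.4, 0.2])

# ECV: share of common variance attributable to the general factor
common = (general**2).sum() + (spec1**2).sum() + (spec2**2).sum()
ecv = (general**2).sum() / common

# Omega hierarchical: general-factor variance over total score variance
uniqueness = 1 - general**2 - spec1**2 - spec2**2
total_var = general.sum()**2 + spec1.sum()**2 + spec2.sum()**2 + uniqueness.sum()
omega_h = general.sum()**2 / total_var
```

High ECV and omega hierarchical support interpreting the total score as reflecting mainly the general factor, which is exactly the question that overall SEM fit indices cannot answer on their own.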


2020 ◽  
Vol 44 (5) ◽  
pp. 362-375
Author(s):  
Tyler Strachan ◽  
Edward Ip ◽  
Yanyan Fu ◽  
Terry Ackerman ◽  
Shyh-Huei Chen ◽  
...  

As a method to derive a “purified” measure along a dimension of interest from response data that are potentially multidimensional in nature, the projective item response theory (PIRT) approach requires first fitting a multidimensional item response theory (MIRT) model to the data before projecting onto a dimension of interest. This study aims to explore how accurate the PIRT results are when the estimated MIRT model is misspecified. Specifically, we focus on using a (potentially misspecified) two-dimensional (2D)-MIRT for projection because of its advantages, including interpretability, identifiability, and computational stability, over higher dimensional models. Two large simulation studies (I and II) were conducted. Both studies examined whether the fitting of a 2D-MIRT is sufficient to recover the PIRT parameters when multiple nuisance dimensions exist in the test items, which were generated, respectively, under compensatory MIRT and bifactor models. Various factors were manipulated, including sample size, test length, latent factor correlation, and number of nuisance dimensions. The results from simulation studies I and II showed that the PIRT was overall robust to a misspecified 2D-MIRT. Smaller third and fourth simulation studies were done to evaluate recovery of the PIRT model parameters when the correctly specified higher dimensional MIRT or bifactor model was fitted with the response data. In addition, a real data set was used to illustrate the robustness of PIRT.
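The compensatory 2D-MIRT model used as the projection basis has a simple response function. A sketch with illustrative parameters (the projection step itself, which integrates out the nuisance dimension, is omitted here):

```python
import numpy as np

def p_mirt2d(theta, a, d):
    # Compensatory 2D-MIRT item response function:
    # P(correct) = logistic(a1*theta1 + a2*theta2 + d)
    z = np.dot(a, theta) + d
    return 1.0 / (1.0 + np.exp(-z))

a = np.array([1.5, 0.4])        # strong primary, weak nuisance loading
theta_hi = np.array([1.0, 0.0]) # high on the dimension of interest
theta_lo = np.array([-1.0, 0.0])
p_hi = p_mirt2d(theta_hi, a, d=0.0)
p_lo = p_mirt2d(theta_lo, a, d=0.0)
```

When nuisance loadings are modest relative to the primary loading, the 2D fit captures most of the response behavior, which is consistent with the robustness the simulations report.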


2019 ◽  
Vol 45 (3) ◽  
pp. 274-296
Author(s):  
Yang Liu ◽  
Xiaojing Wang

Parametric methods, such as autoregressive models or latent growth modeling, are often too inflexible to model the dependence and nonlinear effects in the changes of latent traits when time gaps are irregular and recorded time points vary across individuals. Often in practice, the growth trend of latent traits is subject to certain monotonicity and smoothness conditions. To incorporate such conditions and to alleviate the strong parametric assumptions placed on latent trajectories, a flexible nonparametric prior is introduced to model the dynamic changes of latent traits in item response theory models over the study period. Suitable Bayesian computation schemes are developed for the analysis of longitudinal, dichotomous item responses. Simulation studies and a real data example from educational testing illustrate the proposed methods.
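The monotonicity condition on growth can be illustrated without the article's nonparametric prior. A toy sketch (not the authors' model): build a trajectory from a baseline plus cumulative nonnegative increments, so the latent trait can never decrease between time points:

```python
import numpy as np

def monotone_trajectory(baseline, raw_increments):
    # Softplus maps unconstrained values to strictly positive steps,
    # so the cumulative sum yields a monotonically increasing path.
    steps = np.log1p(np.exp(np.asarray(raw_increments)))
    return baseline + np.concatenate([[0.0], np.cumsum(steps)])

traj = monotone_trajectory(-1.0, [-2.0, 0.5, -1.0])
# np.diff(traj) is strictly positive: the trait never decreases,
# regardless of the sign of the raw increments.
```

A Bayesian treatment like the one described would place a prior over such constrained paths rather than fixing them, but the constraint itself is the same.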


2017 ◽  
Vol 41 (7) ◽  
pp. 512-529 ◽  
Author(s):  
William R. Dardick ◽  
Brandi A. Weiss

This article introduces three new variants of entropy to detect person misfit (Ei, EMi, and EMRi), and provides preliminary evidence that these measures are worthy of further investigation. Previously, entropy has been used as a measure of approximate data–model fit to quantify how well individuals are classified into latent classes, and to quantify the quality of classification and separation between groups in logistic regression models. In the current study, entropy is explored through conceptual examples and a Monte Carlo simulation comparing entropy with established measures of person fit in item response theory (IRT) such as lz, lz*, U, and W. Simulation results indicated that EMi and EMRi successfully detected aberrant response patterns when comparing contaminated and uncontaminated subgroups of persons. In addition, EMi and EMRi performed similarly in showing separation between the contaminated and uncontaminated subgroups. However, EMRi may be advantageous over other measures when subtests include a small number of items. EMi and EMRi are recommended for use as approximate person-fit measures for IRT models. These measures of approximate person fit may be useful in making relative judgments about persons whose response patterns do not fit the theoretical model.
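The abstract does not define Ei, EMi, or EMRi, but the underlying idea of entropy over model-implied response probabilities can be sketched generically (this is an illustrative Shannon entropy under a 2PL, not the article's statistics):

```python
import numpy as np

def p_2pl(theta, a, b):
    # 2PL probability of a correct response
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def response_entropy(theta, a, b):
    # Shannon entropy of the model-implied response probabilities,
    # summed over items: -sum[p*log(p) + (1-p)*log(1-p)]
    p = p_2pl(theta, a, b)
    return -np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))

a = np.ones(5)
b = np.linspace(-2, 2, 5)
# Entropy is largest where responses are maximally uncertain
# (p near .5), i.e., where item difficulty matches ability.
h_matched = response_entropy(0.0, a, b)
h_extreme = response_entropy(3.0, a, b)
```

A person-fit variant would compare an examinee's observed response pattern against such model-implied uncertainty, flagging patterns that are improbably inconsistent with it.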


2014 ◽  
Vol 22 (1) ◽  
pp. 115-129 ◽  
Author(s):  
Anthony J. McGann

This article provides an algorithm to produce a time-series estimate of the political center (or median voter) from aggregate survey data, even when the same questions are not asked in most years. This is compared to the existing Stimson dyad ratios approach, which has been applied to various questions in political science. Unlike the dyad ratios approach, the model developed here is derived from an explicit model of individual behavior: the widely used item response theory model. I compare the results of both techniques using data on public opinion in the United Kingdom from 1947 to 2005 from Bartle, Dellepiane-Avellaneda, and Stimson. Measures of overall model fit are provided, as well as techniques for testing the model's assumptions and the fit of individual items. Full code is provided for estimation with the free software WinBUGS and JAGS.
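The core idea, treating yearly agreement proportions as outcomes of an IRT model with a latent "mood" parameter, can be sketched for a single year (a toy grid-search MLE with hypothetical proportions and item difficulties, not the article's dynamic WinBUGS/JAGS model):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def estimate_mood(prop_yes, difficulty):
    # Grid-search MLE for one year's latent "mood" theta, given
    # observed proportions agreeing with items of known difficulty.
    # Model: P(agree with item j) = logistic(theta - b_j)
    grid = np.linspace(-3, 3, 601)
    prop_yes = np.asarray(prop_yes)
    ll = [np.sum(prop_yes * np.log(logistic(t - difficulty)) +
                 (1 - prop_yes) * np.log(1 - logistic(t - difficulty)))
          for t in grid]
    return grid[int(np.argmax(ll))]

difficulty = np.array([-1.0, 0.0, 1.0])
theta_hat = estimate_mood([0.88, 0.73, 0.50], difficulty)
```

Because item difficulties are shared across years, years that ask different questions remain on a common scale, which is what allows a continuous time series even with irregular question coverage.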

