Educational and Psychological Measurement
Latest Publications

Total documents: 8525 (five years: 170)
H-index: 121 (five years: 4)
Published by: SAGE Publications
ISSN: 0013-1644

2022, pp. 001316442110694
Author(s): Chet Robie, Adam W. Meade, Stephen D. Risavy, Sabah Rasheed

The effects of different response option orders on survey responses have been studied extensively. The typical research design examines differences in response characteristics between conditions with the same item stems but response option orders that differ in valence, arranged either incrementally (e.g., strongly disagree to strongly agree) or decrementally (e.g., strongly agree to strongly disagree). The present study added two experimental conditions: randomly incremental or decremental, and completely randomized. All items were presented in an item-by-item format. We also extended previous studies by examining response option order effects on careless responding, correlations between focal predictors and criteria, and participant reactions, while controlling the false discovery rate and focusing on the size of effects. In a sample of 1,198 university students, we found little to no response option order effects on a recognized personality assessment with respect to measurement equivalence, scale mean differences, item-level distributions, or participant reactions. However, the completely randomized response option order condition differed on several careless responding indices, suggesting avenues for future research.
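The analyses described above control the false discovery rate across many simultaneous comparisons. As a minimal, hypothetical sketch of that kind of control (not the authors' code), the snippet below applies a Benjamini-Hochberg correction to a set of made-up p-values:

```python
# Illustrative sketch: Benjamini-Hochberg FDR correction applied to p-values
# from many condition comparisons. The p-values below are hypothetical.
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking which hypotheses are rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, smallest first
    ranked = p[order]
    thresholds = (np.arange(1, m + 1) / m) * alpha   # BH critical values: (rank / m) * alpha
    below = ranked <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()         # largest rank meeting its threshold
        rejected[order[: k + 1]] = True        # reject the k+1 smallest p-values
    return rejected

# Hypothetical p-values from, e.g., scale-mean comparisons across order conditions
pvals = [0.001, 0.012, 0.034, 0.20, 0.41, 0.67]
print(benjamini_hochberg(pvals, alpha=0.05))
```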


2022, pp. 001316442110699
Author(s): Hung-Yu Huang

Forced-choice (FC) item formats used in noncognitive tests typically present a set of response options that measure different traits and ask respondents to judge among these options according to their preferences, in order to control the response biases commonly observed in normative tests. Diagnostic classification models (DCMs) provide information about test takers' mastery status on discrete latent variables and are more commonly used for cognitive tests in educational settings than for noncognitive tests. The purpose of this study is to develop a new class of DCMs for FC items under the higher-order DCM framework to meet the practical demands of simultaneously controlling for response biases and providing diagnostic classification information. A series of simulations in which the model parameters were calibrated with Bayesian estimation shows that, in general, the parameters can be recovered satisfactorily with long tests and large samples. More attributes improve the precision of the second-order latent trait estimates in a long test but decrease the classification accuracy and the estimation quality of the structural parameters. When statements are allowed to load on two distinct attributes in paired-comparison items, the specific-attribute condition produces better parameter estimation than the overlap-attribute condition. Finally, an empirical analysis of work-motivation measures is presented to demonstrate the applications and implications of the new model.
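For context, higher-order DCMs typically link each binary attribute to a continuous second-order latent trait through a logistic function. The form below is a generic illustration of that structure, not necessarily the exact specification used in the article; the location and slope parameters are the usual higher-order parameters for attribute k, not values from this study.

```latex
% Generic higher-order attribute structure: binary attribute alpha_{jk} of person j
% is governed by a continuous second-order trait theta_j
P(\alpha_{jk} = 1 \mid \theta_j)
  = \frac{\exp\{\lambda_{1k}(\theta_j - \lambda_{0k})\}}
         {1 + \exp\{\lambda_{1k}(\theta_j - \lambda_{0k})\}}
```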


2022, pp. 001316442110634
Author(s): Patrick D. Manapat, Michael C. Edwards

When fitting unidimensional item response theory (IRT) models, the population distribution of the latent trait (θ) is often assumed to be normally distributed. However, some psychological theories would suggest a nonnormal θ. For example, some clinical traits (e.g., alcoholism, depression) are believed to follow a positively skewed distribution where the construct is low for most people, medium for some, and high for few. Failure to account for nonnormality may compromise the validity of inferences and conclusions. Although corrections have been developed to account for nonnormality, these methods can be computationally intensive and have not yet been widely adopted. Previous research has recommended implementing nonnormality corrections when θ is not “approximately normal.” This research focused on examining how far θ can deviate from normal before the normality assumption becomes untenable. Specifically, our goal was to identify the type(s) and degree(s) of nonnormality that result in unacceptable parameter recovery for the graded response model (GRM) and 2-parameter logistic model (2PLM).
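As a minimal illustration of the kind of nonnormality at issue (an assumption-laden sketch, not the authors' simulation design), the snippet below generates 2PL responses under a positively skewed latent trait; all parameter values are arbitrary:

```python
# Illustrative sketch: simulate 2PL item responses when the latent trait is
# positively skewed (most people low, few high). Values are arbitrary assumptions.
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(1)
n_persons, n_items = 1000, 20

# Positively skewed theta, standardized to mean 0 and SD 1
theta = skewnorm.rvs(a=5, size=n_persons, random_state=rng)
theta = (theta - theta.mean()) / theta.std()

# Hypothetical 2PL item parameters
a = rng.uniform(0.8, 2.0, n_items)        # discriminations
b = rng.normal(0.0, 1.0, n_items)         # difficulties

# 2PL response probabilities and simulated binary responses
p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))
responses = rng.binomial(1, p)
print(responses.shape, responses.mean())
```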


2022, pp. 001316442110688
Author(s): Yasuo Miyazaki, Akihito Kamata, Kazuaki Uekawa, Yizhi Sun

This paper investigated the consequences of measurement error in the pretest for the estimate of the treatment effect in a pretest–posttest design analyzed with an analysis of covariance (ANCOVA) model, focusing on both the direction and the magnitude of the resulting bias. Some prior studies have examined the magnitude of the bias due to measurement error and suggested ways to correct it, but none clarified how the direction of the bias is affected by measurement error. This study analytically derived a formula for the asymptotic bias in the treatment effect. The derived formula is a function of the reliability of the pretest, the standardized population group mean difference on the pretest, and the correlation between pretest and posttest true scores. It revealed a concerning consequence of ignoring measurement error in pretest scores: treatment effects can be overestimated or underestimated, and under certain conditions positive treatment effects can even be estimated as negative. A simulation study was also conducted to verify the derived bias formula.
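A small simulation makes the mechanism concrete. The sketch below (illustrative assumptions only, not the authors' derivation or their bias formula) fits an ANCOVA with a fallible pretest covariate when the groups differ on the pretest; with a true treatment effect of zero, the estimate comes out nonzero:

```python
# Illustrative sketch: bias in the ANCOVA treatment estimate when the pretest
# covariate contains measurement error and groups differ on the pretest.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                      # large n so bias, not sampling noise, dominates
reliability = 0.6                # assumed within-group pretest reliability
true_effect = 0.0                # no true treatment effect

group = rng.binomial(1, 0.5, n)                  # nonequivalent groups:
pre_true = rng.normal(0.5 * group, 1.0)          # treated group higher on the pretest
post = 0.7 * pre_true + true_effect * group + rng.normal(0, 1, n)

# Observed pretest = true score + error; error variance chosen so that the
# within-group reliability of the observed pretest equals `reliability`
err_var = (1 - reliability) / reliability
pre_obs = pre_true + rng.normal(0, np.sqrt(err_var), n)

# ANCOVA via ordinary least squares: post ~ 1 + pre_obs + group
X = np.column_stack([np.ones(n), pre_obs, group])
beta = np.linalg.lstsq(X, post, rcond=None)[0]
print("estimated treatment effect:", round(beta[2], 3))   # nonzero despite a true effect of 0
```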


2022, pp. 001316442110684
Author(s): Natalie A. Koziol, J. Marc Goodrich, HyeonJin Yoon

Differential item functioning (DIF) analysis is often used to examine validity evidence for alternate-form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A simulation study compared the new framework with traditional logistic regression with respect to the Type I error and power rates of the uniform DIF test statistics, as well as the bias and root mean square error of the corresponding effect size estimators. The new framework better controlled the Type I error rate and demonstrated minimal bias but suffered from low power and a lack of precision. Implications for practice are discussed.
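For reference, the traditional logistic-regression uniform DIF test that serves as the comparison method can be sketched as follows. The data, variable names, and the `uniform_dif_test` helper are hypothetical, and this is not the proposed regression-discontinuity framework:

```python
# Illustrative sketch: likelihood-ratio test for uniform DIF via logistic regression.
# Does group membership predict the item response beyond the matching total score?
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2

def uniform_dif_test(item, total, group):
    """Return the LR statistic and p-value for uniform DIF on one binary item."""
    base = sm.Logit(item, sm.add_constant(pd.DataFrame({"total": total}))).fit(disp=0)
    full = sm.Logit(item, sm.add_constant(pd.DataFrame({"total": total, "group": group}))).fit(disp=0)
    lr = 2 * (full.llf - base.llf)
    return lr, chi2.sf(lr, df=1)

# Hypothetical data: 500 examinees, one binary item, matching score, accommodation group
rng = np.random.default_rng(7)
total = rng.normal(0, 1, 500)
group = rng.binomial(1, 0.5, 500)
item = rng.binomial(1, 1 / (1 + np.exp(-(1.2 * total + 0.4 * group))))
print(uniform_dif_test(item, total, group))
```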


2022, pp. 001316442110669
Author(s): Bitna Lee, Wonsook Sohn

A Monte Carlo study was conducted to compare the performance of a level-specific (LS) fit evaluation with that of a simultaneous (SI) fit evaluation in multilevel confirmatory factor analysis (MCFA) models. We extended previous studies by examining their performance in MCFA models with different factor structures across levels. In addition, various design factors and the interaction between intraclass correlation (ICC) and misspecification type (MT) were considered. The simulation results demonstrate that the LS approach outperformed the SI approach in detecting model misspecification at the between-group level, even in MCFA models with different factor structures across levels. In particular, the performance of the LS fit indices depended on the ICC, group size (GS), and MT. More specifically, the results are as follows. First, the root mean square error of approximation (RMSEA) was more effective at detecting misspecified between-level models as GS or ICC increased. Second, the effect of ICC on the performance of the comparative fit index (CFI) and the Tucker–Lewis index (TLI) depended on the MT. Third, the performance of the standardized root mean squared residual (SRMR) improved as ICC increased, and this pattern was clearer for structural misspecification than for measurement misspecification. Finally, the summary and implications of the results are discussed.
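Because the findings above are keyed to the ICC, the standard between-group intraclass correlation for an indicator in multilevel CFA is worth recalling (textbook definition, with the between- and within-group variances of the indicator):

```latex
% Between-group intraclass correlation for an observed indicator
\mathrm{ICC} = \frac{\sigma^2_{B}}{\sigma^2_{B} + \sigma^2_{W}}
```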


2021, pp. 001316442110590
Author(s): Tim Cosemans, Yves Rosseel, Sarah Gelper

Exploratory graph analysis (EGA) is a commonly applied technique intended to help social scientists discover latent variables. Yet the results can be influenced by the methodological decisions the researcher makes along the way. In this article, we focus on the choice of the number of factors to retain: we compare the performance of the recently developed EGA with various traditional factor retention criteria. We use both continuous and binary data, as evidence regarding the accuracy of such criteria in the latter case is scarce. Simulation results, based on scenarios that vary sample size, communalities from major factors, interfactor correlations, skewness, and correlation measure, show that EGA outperforms the traditional factor retention criteria considered in most cases in terms of bias and accuracy. In addition, we show that factor retention decisions for binary data are preferably made using Pearson, rather than tetrachoric, correlations, which runs contrary to popular belief.
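One of the traditional factor retention criteria that EGA is usually benchmarked against is Horn's parallel analysis. The sketch below shows a simple PCA-eigenvalue variant on hypothetical continuous data; it illustrates the criterion only and is not the authors' simulation code:

```python
# Illustrative sketch: parallel analysis (PCA-eigenvalue variant) for choosing
# the number of dimensions to retain. Data below are hypothetical two-factor data.
import numpy as np

def parallel_analysis(data, n_sims=200, quantile=0.95, seed=0):
    """Retain dimensions whose observed eigenvalues exceed the reference quantile."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    sim_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        sim = rng.normal(size=(n, p))          # uncorrelated reference data of the same size
        sim_eigs[s] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    thresholds = np.quantile(sim_eigs, quantile, axis=0)
    retained = 0
    for obs, thr in zip(obs_eigs, thresholds): # stop at the first eigenvalue below its threshold
        if obs > thr:
            retained += 1
        else:
            break
    return retained

rng = np.random.default_rng(3)
f = rng.normal(size=(400, 2))                  # two latent factors
loadings = rng.uniform(0.5, 0.8, size=(2, 10))
x = f @ loadings + rng.normal(scale=0.6, size=(400, 10))
print(parallel_analysis(x))                    # expected to retain 2 dimensions
```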


2021, pp. 001316442110618
Author(s): Brooke E. Magnus, Yang Liu

Questionnaires inquiring about psychopathology symptoms often produce data with excess zeros or the equivalent (e.g., none, never, and not at all). This type of zero inflation is especially common in nonclinical samples in which many people do not exhibit psychopathology, and, if unaccounted for, it can result in biased parameter estimates when fitting latent variable models. In the present research, we adopt a maximum likelihood approach in fitting multidimensional zero-inflated and hurdle graded response models to data from a psychological distress measure. These models include two latent variables: susceptibility, which relates to the probability of endorsing the symptom at all, and severity, which relates to the frequency of the symptom, given its presence. After estimating model parameters, we compute susceptibility and severity scale scores and include them as explanatory variables in modeling health-related criterion measures (e.g., suicide attempts, diagnosis of major depressive disorder). Results indicate that susceptibility and severity uniquely and differentially predict other health outcomes, which suggests that symptom presence and symptom severity are distinct indicators of psychopathology and both may be clinically useful. Psychometric and clinical implications are discussed, including scale score reliability.
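A generic hurdle-type formulation (shown for illustration; the notation and the article's exact multidimensional specification may differ) separates susceptibility, which governs whether a symptom is endorsed at all, from severity, which governs the response category given endorsement:

```latex
% Susceptibility (eta_1): probability that person j endorses item i at all (2PL form)
P(Y_{ij} > 0 \mid \eta_{1j}) = \pi_{ij}
  = \frac{\exp\{a_{1i}(\eta_{1j} - b_{1i})\}}{1 + \exp\{a_{1i}(\eta_{1j} - b_{1i})\}}

% Severity (eta_2): category probabilities given endorsement (graded-response form),
% with cumulative probabilities P*_{i1} = 1 and P*_{i,K_i+1} = 0 over the nonzero categories
P(Y_{ij} = k \mid Y_{ij} > 0, \eta_{2j})
  = P^{*}_{ik}(\eta_{2j}) - P^{*}_{i,k+1}(\eta_{2j}), \qquad k = 1, \dots, K_i
```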


2021, pp. 001316442110497
Author(s): Robert L. Brennan, Stella Y. Kim, Won-Chan Lee

This article extends multivariate generalizability theory (MGT) to tests with different random-effects designs for each level of a fixed facet. There are numerous situations in which the design of a test and the resulting data structure are not definable by a single design. One example is mixed-format tests that are composed of multiple-choice and free-response items, with the latter involving variability attributable to both items and raters. In this case, two distinct designs are needed to fully characterize the design and capture potential sources of error associated with each item format. Another example involves tests containing both testlets and one or more stand-alone sets of items. Testlet effects need to be taken into account for the testlet-based items, but not the stand-alone sets of items. This article presents an extension of MGT that faithfully models such complex test designs, along with two real-data examples. Among other things, these examples illustrate that estimates of error variance, error–tolerance ratios, and reliability-like coefficients can be biased if there is a mismatch between the user-specified universe of generalization and the complex nature of the test.
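For readers less familiar with the composite quantities named at the end of the abstract, the standard textbook forms from multivariate generalizability theory are sketched below, with a priori weights w_v over the levels v of the fixed facet. These are general definitions, not the article's extended results; in designs where errors are uncorrelated across levels, the error covariance terms vanish.

```latex
% Composite universe score variance and composite absolute error variance
\sigma^2_C(\tau) = \sum_{v} \sum_{v'} w_v \, w_{v'} \, \sigma_{vv'}(\tau), \qquad
\sigma^2_C(\Delta) = \sum_{v} \sum_{v'} w_v \, w_{v'} \, \sigma_{vv'}(\Delta)

% Error-tolerance ratio and a reliability-like (dependability) coefficient
E/T = \frac{\sigma_C(\Delta)}{\sigma_C(\tau)}, \qquad
\Phi = \frac{\sigma^2_C(\tau)}{\sigma^2_C(\tau) + \sigma^2_C(\Delta)}
```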


2021, pp. 001316442110462
Author(s): Mark Elliott, Paula Buttery

We investigate two non-iterative estimation procedures for Rasch models, the pairwise estimation procedure (PAIR) and the eigenvector method (EVM), and identify theoretical issues with EVM for rating scale model (RSM) threshold estimation. We develop a new procedure to resolve these issues, the conditional pairwise adjacent thresholds procedure (CPAT), and test the methods on a large number of simulated datasets, comparing the estimates against known generating parameters. We find support for our hypotheses, in particular that EVM threshold estimates suffer from theoretical issues that lead to biased estimates and that CPAT resolves these issues. These findings are both statistically significant (p < .001) and of a large effect size. We conclude that CPAT deserves serious consideration as a conditional, computationally efficient approach to Rasch parameter estimation for the RSM. CPAT has particular potential for use in contexts where computational load may be an issue, such as systems with multiple online algorithms and large test banks with sparse data designs.
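To illustrate the basic pairwise idea behind PAIR (for dichotomous items only; the article itself concerns RSM threshold estimation and the CPAT procedure), the sketch below estimates Rasch item difficulties from pairwise log-odds on simulated data. The averaging used here is a simplified variant and all data are hypothetical:

```python
# Illustrative sketch: pairwise (non-iterative) estimation of Rasch item
# difficulties for dichotomous items, using counts of discordant item pairs.
import numpy as np

def pairwise_rasch_difficulties(responses):
    """responses: persons-by-items binary matrix; returns centered difficulty estimates."""
    m = responses.shape[1]
    b = np.zeros(m)
    for i in range(m):
        diffs = []
        for j in range(m):
            if i == j:
                continue
            n_ij = np.sum((responses[:, i] == 1) & (responses[:, j] == 0))
            n_ji = np.sum((responses[:, j] == 1) & (responses[:, i] == 0))
            if n_ij > 0 and n_ji > 0:
                # Under the Rasch model, log(n_ji / n_ij) estimates b_i - b_j
                diffs.append(np.log(n_ji / n_ij))
        b[i] = np.mean(diffs)
    # Averaging over the other m-1 items overshoots by m/(m-1); rescale, then center
    b *= (m - 1) / m
    return b - b.mean()

# Hypothetical simulated Rasch data to exercise the function
rng = np.random.default_rng(11)
theta = rng.normal(0, 1, 2000)
b_true = np.linspace(-1.5, 1.5, 10)
p = 1 / (1 + np.exp(-(theta[:, None] - b_true)))
x = rng.binomial(1, p)
print(np.round(pairwise_rasch_difficulties(x), 2))   # should approximate b_true
```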

