A Motivational-Developmental Free Response Assessment Through a Bifactor Lens

2021 ◽  
Vol 12 ◽  
Author(s):  
David Alpizar ◽  
Brian F. French

The Motivational-Developmental Assessment (MDA) measures a university student’s motivational and developmental attributes by utilizing overlapping constructs measured across four writing prompts. The MDA’s format may lead to the violation of the local item independence (LII) assumption for unidimensional item response theory (IRT) scoring models, or the uncorrelated errors assumption for scoring models in classical test theory (CTT) due to the measurement of overlapping constructs within a prompt. This assumption violation is known as a testlet effect, which can be viewed as a method effect. The application of a unidimensional IRT or CTT model to score the MDA can result in imprecise parameter estimates when this effect is ignored. To control for this effect in the MDA responses, we first examined the presence of local dependence via a restricted bifactor model and Yen’s Q3 statistic. Second, we applied bifactor models to account for the testlet effect in the responses, as this effect is modeled as an additional latent variable in a factor model. Results support the presence of local dependence in two of the four MDA prompts, and the use of the restricted bifactor model to account for the testlet effect in the responses. Modeling the testlet effect through the restricted bifactor model supports a scoring inference in a validation argument framework. Implications are discussed.
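As a concrete illustration of the first step, Yen's Q3 is the correlation of item residuals after fitting a unidimensional model; large positive values for item pairs within a prompt signal a testlet effect. The sketch below assumes model-expected scores are already available, and all names and the 0.2 cutoff are illustrative heuristics rather than the study's exact procedure.

```python
# Minimal sketch of Yen's Q3: correlate person-by-item residuals from a
# fitted unidimensional model. `observed` and `expected` are assumed inputs.
import numpy as np

def yens_q3(observed: np.ndarray, expected: np.ndarray) -> np.ndarray:
    """Pairwise residual correlations; inputs are persons x items arrays."""
    residuals = observed - expected            # d_ij = x_ij - E[x_ij | theta_i]
    q3 = np.corrcoef(residuals, rowvar=False)  # items x items Q3 matrix
    np.fill_diagonal(q3, np.nan)               # self-correlations are uninformative
    return q3

# Flag pairs above a common heuristic cutoff (~0.2): candidates for a
# testlet (method) effect to be absorbed by a bifactor specific factor.
rng = np.random.default_rng(0)
observed = rng.integers(0, 2, size=(500, 8)).astype(float)
expected = np.full_like(observed, observed.mean())
flagged_pairs = np.argwhere(yens_q3(observed, expected) > 0.2)
```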

2020 ◽  
Author(s):  
Stephen Ross Martin ◽  
Philippe Rast

Reliability is a crucial concept in psychometrics. Although it is typically estimated as a single fixed quantity, previous work suggests that reliability can vary across persons, groups, and covariates. We propose a novel method for estimating and modeling case-specific reliability without repeated measurements or parallel tests. The proposed method employs a “Reliability Factor” that models the error variance of each case across multiple indicators, thereby producing case-specific reliability estimates. Additionally, we use Gaussian process modeling to estimate a non-linear, non-monotonic function between the latent factor itself and the reliability of the measure, providing an analogue to test information functions in item response theory. The reliability factor model is a new tool for examining latent regions with poor conditional reliability, and correlates thereof, in a classical test theory framework.
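To make the idea of case-specific reliability concrete: if the common factor has variance ψ and case i has error variance σ²ᵢ, a CTT-style reliability for that case is ψ/(ψ + σ²ᵢ). The sketch below is a simplified illustration of that ratio, not the authors' reliability factor or the Gaussian process component; all inputs are simulated.

```python
# Simplified illustration (not the authors' estimator): case-specific
# reliability as true-score variance over total variance per case.
import numpy as np

def case_reliability(psi: float, sigma2: np.ndarray) -> np.ndarray:
    """psi: common factor variance; sigma2: per-case error variances."""
    return psi / (psi + sigma2)

rng = np.random.default_rng(1)
sigma2_i = rng.gamma(shape=2.0, scale=0.5, size=1000)  # heterogeneous error variances
rel_i = case_reliability(psi=1.0, sigma2=sigma2_i)     # low values flag poorly measured cases
```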


2021 ◽  
Author(s):  
Gaopei Zhu ◽  
Zhongli Wang ◽  
Yuhang Zhu ◽  
Jiaojiao Li ◽  
Peixia Guan ◽  
...  

Abstract. Background: During the COVID-19 epidemic in China, emergency medical teams faced serious stress on the front line. As far as we know, no studies have tested the applicability and measurement properties of the 10-item Chinese Perceived Stress Scale (CPSS-10) in emergency medical teams. Methods: From March 17 to 27, 2020, an online survey was conducted among the emergency medical teams from Liaoning Province supporting Wuhan. The CPSS-10 was used to measure the stress of medical workers. Classical test theory (CTT), a bifactor model, and a multidimensional graded response model (MGRM) were used to analyze the measurement characteristics and differential item functioning (DIF) of the CPSS-10. Results: The Cronbach's alpha coefficient of the CPSS-10 was 0.86. The bifactor model confirmed that the CPSS-10 has a two-factor structure. The MGRM showed ordered response categories for the CPSS-10. Item 8 could distinguish individual stress, but its slope was very large (7.97, above the threshold of 4), indicating local dependence. There was significant DIF by age, but none by gender. After removing items 2, 5, and 8, the resulting CPSS-7 showed high reliability, no DIF by age or gender, and no local dependence. Conclusions: The MGRM provides useful measurement information about the CPSS-10 and CPSS-7. The MGRM found that the CPSS-10 did not fully conform to item response theory (IRT) assumptions. The CPSS-7 proved to be a more effective and reliable tool for assessing the perceived stress of emergency medical teams.
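For reference, the alpha coefficient reported above follows the standard formula α = k/(k−1) · (1 − Σσ²ⱼ/σ²ₓ). The sketch below computes it on simulated stand-in data, not the study's sample.

```python
# Cronbach's alpha from a persons x items score matrix (simulated data).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the sum score
    return k / (k - 1) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(2)
theta = rng.normal(size=(400, 1))                 # common stress factor
noise = rng.normal(scale=0.8, size=(400, 10))
scores = np.clip(np.round(theta + noise + 2), 0, 4)  # 0-4 Likert-style items
print(cronbach_alpha(scores))
```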


2021 ◽  
Author(s):  
Matthias von Davier ◽  
Ummugul Bezirhan

Viable methods for the identification of item misfit or Differential Item Functioning (DIF) are central to scale construction and sound measurement. Many approaches rely on the derivation of a limiting distribution under the assumption that a certain model fits the data perfectly. Typical assumptions such as the monotonicity and population independence of item functions are present even in classical test theory but are more explicitly stated when using item response theory or other latent variable models for the assessment of item fit. The work presented here provides an alternative approach that does not assume perfect model data fit, but rather uses Tukey’s concept of contaminated distributions and proposes an application of robust outlier detection in order to flag items for which adequate model data fit cannot be established.
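The core move can be sketched briefly: treat a collection of per-item fit statistics as a possibly contaminated sample, locate its bulk robustly (median and MAD), and flag items that sit far outside it. The code below illustrates that idea with invented RMSD-like values; it is a generic robust-flagging sketch, not the paper's exact statistic or cutoff.

```python
# Robust outlier flagging of item fit statistics via median/MAD z-scores,
# avoiding any assumption that the model fits all items perfectly.
import numpy as np

def flag_misfit(fit_stats: np.ndarray, cutoff: float = 3.0) -> np.ndarray:
    """Indices of items whose robust z-score exceeds `cutoff`."""
    med = np.median(fit_stats)
    mad = 1.4826 * np.median(np.abs(fit_stats - med))  # scaled to match sd under normality
    return np.flatnonzero(np.abs((fit_stats - med) / mad) > cutoff)

rng = np.random.default_rng(3)
stats = np.concatenate([rng.normal(0.02, 0.005, 48), [0.12, 0.15]])  # two contaminated items
print(flag_misfit(stats))  # flags the last two indices
```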


2020 ◽  
Author(s):  
Kazuhiro Yamaguchi ◽  
Jonathan Templin

Quantifying the reliability of latent variable estimates in diagnostic classification models has been a difficult topic, complicated by the classification-based nature of these models. In this study, we derive observed score reliability indices based on diagnostic classification models as an extension of classical test theory-based reliability. Additionally, we derive conditional observed sum- and sub-score distributions. In this manner, various conditional expectations and conditional standard error of measurement estimates can be calculated for both total- and sub-scores of a test. The proposed methods provide a variety of expectations and standard errors for attribute estimates, which we demonstrate in an analysis of an empirical test.
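One piece of this machinery is easy to state: under local independence given an attribute profile α, with pⱼ = P(Xⱼ = 1 | α), the conditional expected sum score is Σpⱼ and the conditional standard error of measurement is √(Σpⱼ(1−pⱼ)). The sketch below illustrates only that step, with invented item probabilities; the paper's full derivation covers sub-scores and attribute-level indices as well.

```python
# Conditional sum-score moments for one attribute profile, assuming local
# independence given the profile. Probabilities here are illustrative.
import numpy as np

def conditional_score_moments(p: np.ndarray) -> tuple[float, float]:
    """p: item success probabilities given an attribute profile."""
    expected = p.sum()                      # E[sum score | alpha]
    csem = np.sqrt((p * (1 - p)).sum())     # conditional SEM
    return expected, csem

p_master = np.array([0.90, 0.85, 0.80, 0.90, 0.75])
p_nonmaster = np.array([0.20, 0.25, 0.30, 0.20, 0.35])
print(conditional_score_moments(p_master))     # high expected score, its own SEM
print(conditional_score_moments(p_nonmaster))
```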


2017 ◽  
Vol 79 (4) ◽  
pp. 796-807 ◽  
Author(s):  
Tenko Raykov ◽  
Dimiter M. Dimitrov ◽  
George A. Marcoulides ◽  
Michael Harrison

Building on prior research on the relationships between key concepts in item response theory and classical test theory, this note contributes to highlighting their important and useful links. A readily and widely applicable latent variable modeling procedure is discussed that can be used for point and interval estimation of the individual person true score on any item in a unidimensional multicomponent measuring instrument or item set under consideration. The method adds to the body of research on the connections between classical test theory and item response theory. The outlined estimation approach is illustrated on empirical data.
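The CTT side of this link has a familiar closed form: Kelley's estimator regresses the observed score toward the group mean in proportion to reliability, with a standard error of estimation σₓ√(ρ(1−ρ)) for interval estimates. The sketch below shows that classical analogue; it is not the note's latent variable modeling procedure.

```python
# Kelley's true-score point and interval estimates (classical analogue,
# not the latent variable procedure discussed in the note).
import numpy as np

def kelley_true_score(x: np.ndarray, reliability: float, z: float = 1.96):
    """Regressed true-score estimates with a CTT standard error of estimation."""
    t_hat = reliability * x + (1 - reliability) * x.mean()
    see = x.std(ddof=1) * np.sqrt(reliability * (1 - reliability))
    return t_hat, t_hat - z * see, t_hat + z * see

scores = np.array([12.0, 18.0, 25.0, 30.0])
point, lower, upper = kelley_true_score(scores, reliability=0.85)
```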


2014 ◽  
Vol 35 (4) ◽  
pp. 201-211 ◽  
Author(s):  
André Beauducel ◽  
Anja Leue

It is shown that a minimal assumption should be added to the assumptions of Classical Test Theory (CTT) in order to have positive inter-item correlations, which are regarded as a basis for the aggregation of items. Moreover, it is shown that the assumption of zero correlations between the error score estimates is substantially violated in the population of individuals when the number of items is small. Instead, a negative correlation between error score estimates occurs. The reason for the negative correlation is that the error score estimates for different items of a scale are based on insufficient true score estimates when the number of items is small. A test of the assumption of uncorrelated error score estimates by means of structural equation modeling (SEM) is proposed that takes this effect into account. The SEM-based procedure is demonstrated by means of empirical examples based on the Edinburgh Handedness Inventory and the Eysenck Personality Questionnaire-Revised.
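The negative correlation is easy to reproduce: estimating each person's true score as the mean of k parallel items forces the error score estimates to sum to zero, so any two of them correlate at about −1/(k−1) under equal error variances. The simulation below (simulated data, not the questionnaire examples) shows the effect shrinking as k grows.

```python
# Simulation: error score estimates e_j = x_j - t_hat correlate negatively
# when t_hat is the mean of few items (~ -1/(k-1) for parallel items).
import numpy as np

rng = np.random.default_rng(4)
for k in (2, 4, 10):
    t = rng.normal(size=100_000)                     # true scores
    x = t[:, None] + rng.normal(size=(100_000, k))   # k parallel items
    e_hat = x - x.mean(axis=1, keepdims=True)        # error score estimates
    r = np.corrcoef(e_hat[:, 0], e_hat[:, 1])[0, 1]
    print(k, round(r, 3))                            # ~ -1.0, -0.33, -0.11
```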


2019 ◽  
Vol 35 (1) ◽  
pp. 55-62 ◽  
Author(s):  
Noboru Iwata ◽  
Akizumi Tsutsumi ◽  
Takafumi Wakita ◽  
Ryuichi Kumagai ◽  
Hiroyuki Noguchi ◽  
...  

Abstract. To investigate the effect of response alternatives/scoring procedures on the measurement properties of the Center for Epidemiologic Studies Depression Scale (CES-D), which has four response alternatives, a polytomous item response theory (IRT) model was applied to the responses of 2,061 workers and university students (1,640 males, 421 females). Test information functions derived from the polytomous IRT analyses of the CES-D data under various scoring procedures indicated that: (1) the CES-D with its standard (0-1-2-3) scoring procedure should be useful for screening to detect subjects at high risk of depression, provided the θ point of highest information corresponds to the cut-off point, because of its extremely high peak information; (2) the CES-D with the 0-1-1-2 scoring procedure covers a wider range of depressive severity, suggesting that this procedure might be useful where a more exhaustive discrimination of symptomatology is of interest; and (3) the revised version of the CES-D, in which the original positively worded items are replaced with negatively worded ones, outperformed the original version. These findings could not have been demonstrated by classical test theory analyses, and the utility of this kind of psychometric testing for standard measures of psychological assessment therefore warrants further investigation.
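The comparison of scoring procedures rests on test information: under a graded response model, item information is Σₖ(dPₖ/dθ)²/Pₖ, with category probabilities Pₖ taken as differences of cumulative logistic curves. The sketch below computes that quantity for invented parameters; they are not CES-D estimates, and rescoring (e.g., 0-1-1-2) would enter by collapsing categories and shifting the thresholds.

```python
# GRM item information from slope `a` and ordered thresholds `b`
# (illustrative parameters, not CES-D estimates).
import numpy as np

def grm_item_info(theta: np.ndarray, a: float, b: np.ndarray) -> np.ndarray:
    p_star = 1 / (1 + np.exp(-a * (theta[:, None] - b[None, :])))   # cumulative curves
    p_star = np.hstack([np.ones((len(theta), 1)), p_star,
                        np.zeros((len(theta), 1))])                 # pad P*_0=1, P*_{m+1}=0
    p_cat = p_star[:, :-1] - p_star[:, 1:]                          # category probabilities
    dp = a * (p_star[:, :-1] * (1 - p_star[:, :-1])
              - p_star[:, 1:] * (1 - p_star[:, 1:]))                # dP_k/dtheta
    return (dp**2 / np.clip(p_cat, 1e-12, None)).sum(axis=1)

theta = np.linspace(-4, 4, 161)
info = grm_item_info(theta, a=1.8, b=np.array([-1.0, 0.2, 1.3]))
# Test information is the sum of item informations; where its peak falls
# relative to the cut-off drives the screening argument in point (1) above.
```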

