Using Guttman errors to explore rater fit in rater-mediated performance assessments

2018 · Vol 11 (3) · pp. 205979911881439
Author(s): Stefanie A. Wind

Model-data fit indices for raters provide insight into the degree to which raters demonstrate psychometric properties defined as useful within a measurement framework. Fit statistics for raters are particularly relevant within frameworks based on invariant measurement, such as Rasch measurement theory and Mokken scale analysis. A simple approach to examining invariance is to inspect assessment data for evidence of Guttman errors. I used real and simulated data to illustrate and explore a nonparametric procedure for evaluating rater fit based on Guttman errors and to examine the alignment between Guttman errors and other indices of rater fit. The results suggested that researchers and practitioners can use summaries of Guttman errors to identify raters who exhibit misfit. Furthermore, comparisons between summaries of Guttman errors and parametric fit statistics suggested that both approaches detect similar problematic measurement characteristics. Specifically, raters who exhibited many Guttman errors tended to have higher-than-expected Outfit MSE statistics and lower-than-expected estimated slope statistics. I discuss implications of these results as they relate to research and practice for rater-mediated assessments.
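
A minimal base-R sketch of the counting idea behind this procedure, run on simulated dichotomized ratings rather than the paper's data: once rated performances are ordered from easiest to hardest, a Guttman error is any pair in which the easier one receives a 0 while the harder one receives a 1, and raters with unusually large counts are candidates for misfit.

```r
# Count Guttman errors per rater in a dichotomized rating matrix
# (raters in rows, rated performances in columns). Illustrative only.
count_guttman_errors <- function(X) {
  X <- X[, order(colMeans(X), decreasing = TRUE)]  # easiest column first
  apply(X, 1, function(r) {
    M <- outer(r == 0, r == 1, "&")  # M[i, j]: 0 on easier i, 1 on harder j
    sum(M[upper.tri(M)])             # keep only pairs with i before j
  })
}

set.seed(1)
X <- matrix(rbinom(50 * 10, 1, 0.6), nrow = 50)  # 50 raters, 10 performances
head(count_guttman_errors(X))
```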

2008 · Vol 39 (2) · pp. 277-285
Author(s): E. Ferguson

Background: A long-standing issue in the health anxiety literature is the extent to which health anxiety is a dimensional or a categorical construct. This study explores this question directly using taxometric procedures. Method: Seven hundred and eleven working adults completed an index of health anxiety [the Whiteley Index (WI)] and indicated their current health status. Data from those who were currently healthy (n=501) and receiving no medical treatment were examined using three taxometric procedures: mean above minus below a cut (MAMBAC), maximum eigenvalue (MAXEIGEN), and L-mode factor analysis (L-MODE). Results: Graphical representations (comparing actual to simulated data) and fit indices indicate that health anxiety is more accurately represented as a dimensional rather than a categorical construct. Conclusions: Health anxiety is better represented as a dimensional construct. Implications for theory development and clinical practice are examined.
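
As a hedged illustration of the first of these procedures, the core MAMBAC computation can be sketched in base R (a bare-bones version, not the full taxometric implementation): sort cases on one indicator and, at each candidate cut, take the mean of a second indicator above the cut minus the mean below it. Dimensional data typically yield a dish-shaped curve, whereas taxonic data yield a peaked one.

```r
# Bare-bones MAMBAC curve: mean-above minus mean-below along sliding cuts.
mambac_curve <- function(x, y, n_cuts = 50, trim = 25) {
  y <- y[order(x)]  # sort cases along the input indicator
  cuts <- round(seq(trim, length(y) - trim, length.out = n_cuts))
  sapply(cuts, function(k) mean(y[(k + 1):length(y)]) - mean(y[1:k]))
}

set.seed(1)
x <- rnorm(500); y <- 0.5 * x + rnorm(500)  # dimensional (one latent factor)
plot(mambac_curve(x, y), type = "l",
     xlab = "cut position", ylab = "mean above - mean below")
```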


2021
Author(s): Conrad J. Harrison, Bao Sheng Loe, Inge Apon, Chris J. Sidey-Gibbons, Marc C. Swan, ...

BACKGROUND There are two philosophical approaches to contemporary psychometrics: Rasch measurement theory (RMT) and item response theory (IRT). Either measurement strategy can be applied to computerized adaptive testing (CAT). IRT offers potential benefits over RMT in measurement precision, but also potential risks to measurement generalizability. RMT CAT assessments have demonstrated good performance with the CLEFT-Q, a patient-reported outcome measure for use in orofacial clefting. OBJECTIVE To test whether the post hoc application of IRT (unidimensional graded response models, GRMs, and multidimensional GRMs) to the RMT-validated CLEFT-Q appearance scales could improve CAT accuracy at given assessment lengths. METHODS Partial credit Rasch models, unidimensional GRMs, and a multidimensional GRM were calibrated for each of the 7 CLEFT-Q appearance scales (which measure the appearance of the face, jaw, teeth, nose, nostrils, cleft lip scar, and lips) using data from the CLEFT-Q field test. A second, simulated dataset was generated with 1000 plausible response sets to each scale. Rasch and GRM scores were calculated for each simulated response set, scaled to 0-100, and compared by Pearson's correlation coefficient, root mean square error (RMSE), mean absolute error (MAE), and 95% limits of agreement. For the face, teeth, and jaw scales, we repeated this in an independent, real patient dataset. We then used the simulated data to compare the performance of a range of fixed-length CAT assessments generated with partial credit Rasch models, unidimensional GRMs, and the multidimensional GRM. Median standard error of measurement (SEM) was recorded for each assessment. CAT scores were scaled to 0-100 and compared with linear assessment Rasch scores using RMSE, MAE, and 95% limits of agreement. This was repeated in the independent, real patient dataset with the RMT and unidimensional GRM CAT assessments for the face, teeth, and jaw scales to test the generalizability of our simulated data analysis. RESULTS Linear assessment scores generated by Rasch models and unidimensional GRMs showed close agreement, with RMSE ranging from 2.2 to 6.1 and MAE ranging from 1.5 to 4.9 in the simulated dataset. These findings were closely reproduced in the real patient dataset. Unidimensional GRM CAT algorithms achieved lower median SEM than their Rasch counterparts but reproduced linear assessment scores with very similar accuracy (RMSE, MAE, and 95% limits of agreement). The multidimensional GRM had poorer accuracy than the unidimensional models at comparable assessment lengths. CONCLUSIONS Partial credit Rasch models and GRMs produce very similar CAT scores. GRM CAT assessments achieve a lower SEM, but this does not translate into better accuracy. Commonly used SEM heuristics for target measurement reliability should not be generalized across CAT assessments built with different psychometric models. In this study, a relatively parsimonious multidimensional GRM CAT algorithm performed more poorly than its unidimensional GRM comparators.
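
A small R sketch of the agreement summaries used throughout this study (RMSE, MAE, and Bland-Altman 95% limits of agreement), applied to simulated stand-in score vectors rather than CLEFT-Q data:

```r
# Agreement between two sets of 0-100 scaled scores.
agreement <- function(a, b) {
  d <- a - b
  c(rmse      = sqrt(mean(d^2)),
    mae       = mean(abs(d)),
    loa_lower = mean(d) - 1.96 * sd(d),  # Bland-Altman 95% limits
    loa_upper = mean(d) + 1.96 * sd(d))
}

set.seed(1)
rasch_scores <- runif(1000, 0, 100)  # stand-in linear-assessment scores
cat_scores   <- pmin(100, pmax(0, rasch_scores + rnorm(1000, 0, 3)))
round(agreement(cat_scores, rasch_scores), 2)
```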


2019 · Vol 36 (4) · pp. 595-616
Author(s): Stefanie A. Wind

Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups. Nonparametric procedures for exploring these differences are promising because they allow researchers and practitioners to examine important characteristics of ratings without potentially inappropriate parametric transformations or assumptions. This study illustrates a nonparametric method based on Mokken scale analysis (MSA) that researchers and practitioners can use to identify and explore differences in the quality of rater judgments between subgroups of test-takers. Overall, the results suggest that MSA provides insight into differences in rating quality across test-taker subgroups based on demographic characteristics. Differences in the degree to which raters adhere to basic measurement properties suggest that the interpretation of ratings may vary across subgroups. The implications of this study for research and practice are discussed.
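
As a hedged sketch of the general workflow (not the study's own analysis), the mokken package's coefH function can compare scalability coefficients across subgroups; the ratings and grouping variable below are simulated placeholders.

```r
library(mokken)

set.seed(1)
theta   <- rnorm(200)  # latent writing ability
ratings <- sapply(1:6, function(j)
  findInterval(theta + rnorm(200, 0, 0.8), c(-1, 0, 1)))  # six 0-3 ratings
colnames(ratings) <- paste0("rater", 1:6)
group <- rep(c("A", "B"), each = 100)  # test-taker subgroup labels

coefH(ratings[group == "A", ])$H  # scale-level scalability, subgroup A
coefH(ratings[group == "B", ])$H  # scale-level scalability, subgroup B
```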


2015 · Vol 13 · pp. 23-28
Author(s): Luca Dana Motoc, Tibor Bedő

The paper provides insight into the thermo-physical behavior of in-situ tailored hybrid polymer-based composite materials incorporating two different synthetic reinforcements, with the aim of quantifying the influence of both the stacking sequence and the reinforcement type on the effective linear coefficient of thermal expansion (CTE). The samples were subjected to a step-ramp temperature increase up to 250 °C, and the resulting variation of their CTE was monitored and assessed. The processed and compared data further expand the knowledge base and the authors' material database, supporting material design, the minimization of manufacturing errors, and cost reduction.
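
The effective linear CTE reduces to the slope of strain (ΔL/L0) against temperature; a minimal R sketch on synthetic dilatometer readings (illustrative values, not the paper's measurements):

```r
set.seed(1)
temp   <- seq(30, 250, by = 10)  # ramp temperature, degrees C
strain <- 25e-6 * (temp - 30) + rnorm(length(temp), 0, 5e-6)  # dL / L0
cte    <- coef(lm(strain ~ temp))[["temp"]]  # slope = effective CTE, 1/K
sprintf("effective CTE ~ %.1f ppm/K", cte * 1e6)
```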


2021
Author(s): Daniel Lüdecke, Mattan S. Ben-Shachar, Indrajeet Patil, Philip Waggoner, Dominique Makowski

A crucial part of statistical analysis is evaluating a model's quality and fit, or performance. During analysis, especially with regression models, investigating model fit often also involves selecting the best-fitting model among many competing candidates. Fit indices should then be reported both visually and numerically so that readers can follow the investigation. Functions to produce diagnostic plots or compute fit statistics exist, but they are scattered across many packages, so there is no single, consistent approach to assessing the performance of many types of models. The result is a difficult-to-navigate, unorganized ecosystem of individual packages with differing syntax, making it onerous for researchers to locate and use the fit indices relevant to their purposes. The performance package in R fills this gap by offering researchers a suite of intuitive functions with consistent syntax for computing, building, and presenting regression model fit statistics and visualizations.
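
A brief usage sketch on a built-in dataset, showing the package's core entry points:

```r
library(performance)

m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + cyl, data = mtcars)

model_performance(m1)        # numeric indices: R2, AIC, BIC, RMSE, sigma
compare_performance(m1, m2)  # competing models side by side
check_model(m2)              # visual diagnostics in a single panel
```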


2012 · Vol 1 (33) · pp. 21
Author(s): Franck Mazas, Luc Hamm, Philippe Garat

Within the framework of the GPD-Poisson model for determining extreme values of environmental variables, we examine the sensitivity of this methodology to the lowest and largest data in the sample. We show the need for a clear distinction between the threshold that selects the data to be fitted and a location parameter that sets the origin of the distribution, and that introducing this latter parameter yields stable results consistent with the physics. We also show that the likelihood maximum of the classical model is reached at an open upper bound of the parameter space with non-null derivatives; hence, the asymptotic properties of the MLE are not proven, and one should be quite cautious when using it. Applications are presented with real and simulated data.
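
As a hedged companion, the standard threshold-sensitivity diagnostic for the GPD-Poisson (peaks-over-threshold) model can be sketched with the ismev package; the series below is synthetic, and this reproduces only the generic check, not the authors' modified model with a separate location parameter.

```r
library(ismev)

set.seed(1)
x <- rexp(2000)                        # stand-in environmental series
u <- quantile(x, c(0.90, 0.95, 0.98))  # candidate thresholds
t(sapply(u, function(th) gpd.fit(x, th, show = FALSE)$mle))
# One row per threshold: MLEs of the GPD scale and shape. Stability of the
# estimates across thresholds is the usual sign of a defensible fit.
```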


Author(s): Matteo Bottai, Nicola Orsini

In this article, we introduce the qmodel command, which fits parametric models for the conditional quantile function of an outcome variable given covariates. Ordinary quantile regression, implemented in the qreg command, is a popular, simple type of parametric quantile model. It is widely used but known to yield erratic estimates that often lead to uncertain inferences. Parametric quantile models overcome these limitations and extend the modeling of conditional quantile functions beyond ordinary quantile regression. These models are flexible and efficient. qmodel can estimate virtually any possible linear or nonlinear parametric model because it allows the user to specify any combination of qmodel-specific built-in functions, standard mathematical and statistical functions, and substitutable expressions. We illustrate the potential of parametric quantile models and the use of the qmodel command and its postestimation commands through real- and simulated-data examples that commonly arise in epidemiological and pharmacological research. In addition, this article may give insight into the close connection between quantile functions and the true mathematical laws that generate data.
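
qmodel itself is Stata software; as a rough R analogue of its simplest special case, ordinary quantile regression can be fit with the quantreg package (the data below are simulated):

```r
library(quantreg)

set.seed(1)
d <- data.frame(x = runif(200, 0, 10))
d$y <- 1 + 0.5 * d$x + rnorm(200, sd = 1 + 0.2 * d$x)  # heteroscedastic outcome

fit <- rq(y ~ x, tau = c(0.1, 0.5, 0.9), data = d)  # three conditional quantiles
coef(fit)
```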


2017 · Vol 78 (5) · pp. 887-904
Author(s): Stefanie A. Wind, Randall E. Schumacker

The interpretation of ratings from educational performance assessments assumes that rating scale categories are ordered as expected (i.e., higher ratings correspond to higher levels of judged student achievement). However, this assumption must be verified empirically using measurement models that do not impose ordering constraints on the rating scale category thresholds, such as item response theory models based on adjacent-categories probabilities. This study considers the application of an adjacent-categories formulation of polytomous Mokken scale analysis (ac-MSA) models as a method for evaluating the degree to which rating scale categories are ordered as expected for individual raters in performance assessments. Using simulated data, this study builds on the preliminary application of ac-MSA models to rater-mediated performance assessments, in which a real data analysis suggested that these models can be used to identify disordered rating scale categories. The results suggested that ac-MSA models are sensitive to disordered categories within individual raters. Implications are discussed as they relate to research, theory, and practice for rater-mediated educational performance assessments.
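
One quick empirical heuristic in the spirit of adjacent-categories analysis (a simplified illustration, not the ac-MSA models themselves): for a single rater, the mean restscore of examinees in each rating category should increase strictly with the category, and flat or reversed means flag potentially disordered categories. Placeholder data below.

```r
set.seed(1)
rest   <- rowSums(matrix(rbinom(500 * 10, 3, 0.5), ncol = 10))  # other items
rating <- cut(rest + rnorm(500, 0, 3),               # one rater's 0-3 ratings
              breaks = c(-Inf, 11, 15, 19, Inf), labels = 0:3)
tapply(rest, rating, mean)  # strictly increasing means suggest ordered categories
```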


2018 · Vol 44 (2) · pp. 104-119
Author(s): Linda A. Reddy, Todd Glover, Alexander Kurz, Stephen N. Elliott

The conceptual foundation and initial psychometric evidence are provided for the Instructional Coaching Rating Scales and Interaction Style Scales–Teacher Forms. These forms are part of a multicomponent online assessment system designed to evaluate the effectiveness of coaching skills and interactions that support the needs of teachers and students. Specifically, the article presents the theory, evidence, and measurement framework for the system. Findings indicate that the Rating Scales and Interaction Style Scales–Teacher Forms have very good internal structures based on multiple fit statistics for confirmatory factor analyses, high internal consistency, good item-to-scale total correlations, and freedom from item bias. Collectively, this promising statistical evidence is supportive of valid score inferences. Study limitations and directions for research are discussed.
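
Two of the reported checks, internal consistency and corrected item-to-scale total correlations, can be sketched in base R on placeholder Likert responses (illustrative only, not the instrument's items):

```r
set.seed(1)
theta <- rnorm(300)  # latent coaching-quality score
items <- sapply(1:8, function(j)
  findInterval(theta + rnorm(300), c(-1.5, -0.5, 0.5, 1.5)) + 1)  # 1-5 Likert

k     <- ncol(items)
alpha <- k / (k - 1) * (1 - sum(apply(items, 2, var)) / var(rowSums(items)))
item_total <- sapply(seq_len(k), function(j)
  cor(items[, j], rowSums(items[, -j])))  # corrected item-total r

round(alpha, 2)
round(item_total, 2)
```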


2019 · Vol 44 (2) · pp. 166-174
Author(s): Ying Jin

This research examines the performance of previously proposed cutoff values of alternative fit indices (i.e., change in comparative fit index [ΔCFI], change in Tucker–Lewis index [ΔTLI], and change in root mean squared error of approximation [ΔRMSEA]) for evaluating measurement invariance in exploratory structural equation modeling (ESEM) models with simulated data. It is important to revisit these cutoff values because they have been widely used in validity studies that apply ESEM models to evaluate measurement invariance for ordinal indicators, yet they were developed under confirmatory factor analysis models with continuous indicators. Results of this study show that different cutoff values of ΔCFI, ΔTLI, and ΔRMSEA should be used for ESEM models with ordinal indicators. Evaluation of partial invariance for ESEM models is also discussed.
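
A hedged lavaan sketch of the ΔCFI/ΔRMSEA comparison being revisited, using the package's built-in HolzingerSwineford1939 data; this shows the generic nested-model workflow under CFA, not the ESEM-with-ordinal-indicators setup the study examines.

```r
library(lavaan)

model <- ' visual  =~ x1 + x2 + x3
           textual =~ x4 + x5 + x6
           speed   =~ x7 + x8 + x9 '

configural <- cfa(model, data = HolzingerSwineford1939, group = "school")
metric     <- cfa(model, data = HolzingerSwineford1939, group = "school",
                  group.equal = "loadings")

fitMeasures(metric, c("cfi", "rmsea")) -
  fitMeasures(configural, c("cfi", "rmsea"))  # delta-CFI and delta-RMSEA
```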

