Representation of Competencies in Multidimensional IRT Models with Within-Item and Between-Item Multidimensionality

2008, Vol. 216(2), pp. 89-101
Author(s): Johannes Hartig, Jana Höhler

Multidimensional item response theory (MIRT) holds considerable promise for the development of psychometric models of competence. It provides an ideal foundation for modeling performance in complex domains, simultaneously taking into account multiple basic abilities. The aim of this paper is to illustrate the relations between a two-dimensional IRT model with between-item multidimensionality and a nested-factor model with within-item multidimensionality, and the different substantive meanings of the ability dimensions in the two models. Both models are applied to empirical data from a large-scale assessment of reading and listening comprehension in a foreign language. In the between-item model, performance in the reading and listening items is modeled by two separate dimensions. In the within-item model, one dimension represents the abilities common to both tests, and a second dimension represents abilities specific to listening comprehension. Distinct relations of external variables, such as gender and cognitive abilities, with ability scores demonstrate that the alternative models have substantively different implications.
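To make the contrast concrete, the following minimal sketch (illustrative values only, not the authors' data or estimation code) computes multidimensional 2PL response probabilities under the two loading structures: in the between-item model a listening item loads on the listening dimension only, whereas in the within-item (nested-factor) model the same item loads on the general dimension and additionally on a listening-specific dimension.

```python
import numpy as np

def irt_prob(theta, a, b):
    """Multidimensional 2PL: P(correct) = logistic(a'theta - b)."""
    return 1.0 / (1.0 + np.exp(-(theta @ a - b)))

theta = np.array([0.5, -0.3])   # a person's scores on the two ability dimensions
b = 0.2                         # item difficulty (hypothetical value)

# Between-item multidimensionality: a listening item loads on dimension 2 only.
a_between = np.array([0.0, 1.2])

# Within-item (nested-factor) multidimensionality: the same item loads on the
# general dimension (theta_1) and on the listening-specific dimension (theta_2).
a_within = np.array([1.0, 0.8])

print(irt_prob(theta, a_between, b))
print(irt_prob(theta, a_within, b))
```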

2018, Vol. 34(6), pp. 376-385
Author(s): Marlit A. Lindner, Jan M. Ihme, Steffani Saß, Olaf Köller

Abstract. Pictures are often used in standardized educational large-scale assessment (LSA), but their impact on test parameters has received little attention until now. Even less is known about pictures' affective effects on students in testing (i.e., test-taking pleasure and motivation). However, such knowledge is crucial for a focused application of multiple representations in LSA. Therefore, this study investigated how adding representational pictures (RPs) to text-based item stems affects (1) item difficulty and (2) students' test-taking pleasure. An experimental study with N = 305 schoolchildren was conducted, using 48 manipulated parallel science items (text-only vs. text-picture) in a rotated multimatrix design to realize within-subject measures. Students' general cognitive abilities, reading abilities, and background variables were assessed to consider potential interactions between RPs' effects and students' performance. Students also rated their item-solving pleasure for each item. Results from item response theory (IRT) model comparisons showed that RPs reduced item difficulty only when the pictures visualized information mandatory for solving the task, whereas RPs substantially enhanced students' test-taking pleasure even when they visualized optional context information. Overall, our findings suggest that RPs have a positive cognitive and affective influence on students' performance in LSA (i.e., a multimedia effect in testing) and should be considered more frequently.


2020, pp. 073428292097138
Author(s): Chao Xu, Candace Schau

Numerous studies have been conducted using the Survey of Attitudes Toward Statistics-36 (SATS-36). Recently, large-scale assessment studies have begun to examine the extent to which students vary in their statistics attitudes across instructors. Yet, empirical evidence linking student responses to the SATS items to instructor-level constructs is still lacking. Using multilevel confirmatory factor analysis, we investigated the factor structure underlying the measure of students’ statistics attitudes at both the student and instructor levels. Results from 13,507 college students taught by 160 introductory statistics instructors support a correlated six-factor model at each level. Additionally, there is evidence for the structural validity of a shared teacher–student attitude impacts construct that may capture meaningful patterns of teaching characteristics and competencies tied to student development of statistics attitudes. These findings provide empirical support for the use of the SATS-36 in studying contextual variables in relation to statistics instructors. Implications for educational practice are discussed.
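For orientation, a two-level CFA of this kind decomposes student i's item responses in instructor j's class into within- and between-level components, each with its own factor structure; a generic formulation (not the fitted SATS-36 solution) is:

```latex
\begin{aligned}
\mathbf{y}_{ij} &= \boldsymbol{\mu} + \mathbf{y}^{B}_{j} + \mathbf{y}^{W}_{ij},\\
\mathbf{y}^{W}_{ij} &= \boldsymbol{\Lambda}_{W}\,\boldsymbol{\eta}^{W}_{ij} + \boldsymbol{\varepsilon}^{W}_{ij},
\qquad
\mathbf{y}^{B}_{j} = \boldsymbol{\Lambda}_{B}\,\boldsymbol{\eta}^{B}_{j} + \boldsymbol{\varepsilon}^{B}_{j},
\end{aligned}
```

where the six correlated attitude factors enter the latent vectors at both the student (within) and instructor (between) levels.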


Author(s): Leonidas A. Zampetakis

Abstract. Job crafting is a multidimensional construct that can be conceptualized both at the general level and at the daily level. Several researchers have used scores aggregated across the dimensions of job crafting to represent an overall job crafting construct. The purpose of the research presented herein is to investigate the factor structure of the general and daily versions of the job crafting scale developed by Petrou et al. (2012; PJCS), using parametric multidimensional item response theory (IRT) models. A sample of 675 employees working in different occupational sectors completed the Greek version of the scales. Results are in line with theoretical underpinnings and suggest that, although a bifactor IRT model offers an adequate fit, a correlated-factors IRT model is more appropriate for both versions of the PJCS. The results caution against using scores aggregated across the dimensions of the PJCS for both the general and daily versions.
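For a single binary item (a simplification, since the PJCS uses Likert-type items), the two competing structures can be sketched as:

```latex
% Correlated-factors model: item i loads only on its own crafting dimension d(i);
% the dimensions are allowed to correlate.
\operatorname{logit} P(X_i = 1 \mid \boldsymbol{\theta}) = a_i\,\theta_{d(i)} - b_i

% Bifactor model: item i loads on a general crafting factor and on one
% orthogonal specific factor s(i).
\operatorname{logit} P(X_i = 1 \mid \boldsymbol{\theta}) = a_{iG}\,\theta_G + a_{iS}\,\theta_{s(i)} - b_i
```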


2005
Author(s): Yanyan Sheng

As item response theory models gain increased popularity in large-scale educational and measurement testing situations, many studies have been conducted on the development and application of unidimensional and multidimensional models. However, to date, no study has looked at models in the IRT framework with an overall ability dimension underlying all test items and additional ability dimensions specific to each subtest. This study proposes such a model and compares it with conventional IRT models using Bayesian methodology. The results suggest that the proposed model offers a better way to represent test situations not captured by existing models. The specification of the proposed model also has implications for test developers regarding test design. In addition, the proposed IRT model can be applied in other areas, such as intelligence or psychology.
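As a toy illustration of the Bayesian machinery such comparisons rest on (a unidimensional 2PL with made-up item parameters, not the proposed multidimensional model or the author's sampler), the sketch below draws an examinee's ability from its posterior with a random-walk Metropolis algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical item parameters for a short unidimensional 2PL test.
a = np.array([1.0, 1.4, 0.8, 1.2])
b = np.array([-0.5, 0.0, 0.5, 1.0])
y = np.array([1, 1, 0, 0])            # one examinee's item responses

def log_posterior(theta):
    """2PL log-likelihood plus a standard normal prior on theta."""
    p = 1.0 / (1.0 + np.exp(-(a * theta - b)))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loglik - 0.5 * theta**2

theta, draws = 0.0, []
for _ in range(5000):
    proposal = theta + rng.normal(scale=0.5)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    draws.append(theta)

print(np.mean(draws[1000:]))          # posterior mean ability after burn-in
```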


2021, pp. 001316442110453
Author(s): Gabriel Nagy, Esther Ulitzsch

Disengaged item responses pose a threat to the validity of the results provided by large-scale assessments. Several procedures for identifying disengaged responses on the basis of observed response times have been suggested, and item response theory (IRT) models for response engagement have been proposed. We outline that response time-based procedures for classifying response engagement and IRT models for response engagement are based on common ideas, and we propose the distinction between independent and dependent latent class IRT models. In all IRT models considered, response engagement is represented by an item-level latent class variable, but the models assume that response times either reflect or predict engagement. We summarize existing IRT models that belong to each group and extend them to increase their flexibility. Furthermore, we propose a flexible multilevel mixture IRT framework in which all of these IRT models can be estimated by means of marginal maximum likelihood. The framework is based on the widely used Mplus software, thereby making the procedure accessible to a broad audience. The procedures are illustrated on the basis of publicly available large-scale data. Our results show that the different IRT models for response engagement provided slightly different adjustments of item parameters and individuals' proficiency estimates relative to a conventional IRT model.
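A stylized version of such an item-level mixture (generic notation, not the exact parameterizations compared in the paper) lets person i respond to item j in a disengaged way with probability pi_ij, in which case the response is essentially random, and otherwise follow a 2PL model; in the "dependent" variants the log response time predicts class membership:

```latex
P(X_{ij}=1 \mid \theta_i, t_{ij})
  = \pi_{ij}\, g_j + (1-\pi_{ij})\,\frac{1}{1+\exp\{-(a_j \theta_i - b_j)\}},
\qquad
\operatorname{logit}(\pi_{ij}) = \beta_{0j} + \beta_{1j} \log t_{ij},
```

where g_j is the probability of a correct response under disengaged responding.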


2018
Author(s): Julia M. Haaf, Edgar C. Merkle, Jeffrey N. Rouder

Invariant item ordering refers to the statement that if one item is harder than another for one person, then it is harder for all people. Whether item ordering holds is a psychological statement because it describes how people may qualitatively vary. Yet modern item response theory (IRT) makes an a priori commitment to item ordering. The Rasch model, for example, posits that items must order; conversely, the 2PL model posits that items never order. What is needed is an IRT model in which item ordering or its violation is a function of the data rather than an a priori commitment. We develop two-parameter shift-scale models for this purpose and find that the two-parameter uniform model offers many advantages. We show how item ordering may be assessed using Bayes factor model comparison, and we discuss computational issues with shift-scale IRT models.
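The point can be illustrated with two hypothetical 2PL items: when discriminations differ, the item characteristic curves cross, so which item is harder depends on ability; under a Rasch model (equal discriminations) the curves never cross and the ordering is invariant. A minimal sketch:

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Two hypothetical items with unequal discriminations.
a1, b1 = 0.8, 0.0
a2, b2 = 2.0, 0.5

# The two curves intersect where a1*(theta - b1) = a2*(theta - b2).
theta_cross = (a1 * b1 - a2 * b2) / (a1 - a2)

for theta in (theta_cross - 1.0, theta_cross + 1.0):
    print(theta, p_2pl(theta, a1, b1), p_2pl(theta, a2, b2))
# Below the crossing point item 2 is harder; above it, item 1 is harder,
# so no invariant item ordering holds for these parameters.
```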


2014, Vol. 35(4), pp. 190-200
Author(s): Stefan Schipolowski, Ulrich Schroeders, Oliver Wilhelm

Especially in survey research and large-scale assessment there is a growing interest in short scales for the cost-efficient measurement of psychological constructs. However, only relatively few standardized short forms are available for the measurement of cognitive abilities. In this article we point out pitfalls and challenges typically encountered in the construction of cognitive short forms. First we discuss item selection strategies, the analysis of binary response data, the problem of floor and ceiling effects, and issues related to measurement precision and validity. We subsequently illustrate these challenges and how to deal with them based on an empirical example, the development of short forms for the measurement of crystallized intelligence. Scale shortening had only small effects on associations with covariates. Even for an ultra-short six-item scale, a unidimensional measurement model showed excellent fit and yielded acceptable reliability. However, measurement precision on the individual level was very low and the short forms were more likely to produce skewed score distributions in ability-restricted subpopulations. We conclude that short scales may serve as proxies for cognitive abilities in typical research settings, but their use for decisions on the individual level should be discouraged in most cases.
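Why individual-level precision suffers in an ultra-short scale can be sketched with made-up 2PL item parameters: the test information of six items is modest, so the standard error of an individual ability estimate stays large (the numbers below are hypothetical, not those of the reported short forms).

```python
import numpy as np

def item_info_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

theta = 0.0                              # an examinee of average ability
a = np.full(6, 1.2)                      # hypothetical discriminations
b = np.linspace(-1.5, 1.5, 6)            # hypothetical difficulties

test_info = item_info_2pl(theta, a, b).sum()
print(test_info, 1.0 / np.sqrt(test_info))   # total information and SE(theta)
```

With these values the conditional standard error is roughly 0.8 logits, which illustrates why such short forms are better suited to group-level research than to individual decisions.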


2018, Vol. 43(7), pp. 562-576
Author(s): Yue Liu, Zhen Li, Hongyun Liu

Recently, large-scale testing programs have shown increasing interest in providing examinees with more accurate diagnostic information by reporting overall and domain scores simultaneously. However, few studies have focused on how to report and interpret reliable total and domain scores based on bi-factor models. In this study, the authors introduced six methods of reporting overall and domain scores as weighted composites of the general and specific factors in a bi-factor model, and compared their performance with Yao's MIRT (multidimensional item response theory) method using both simulated and empirical data. In the simulation study, four factors were considered: test length, number of dimensions, correlation between dimensions, and sample size. The major findings are as follows. Bifactor-M4 and Bifactor-M6, the methods that use the discrimination parameters of the specific dimensions to compute the weights, provided the most accurate and reliable overall and domain scores in most conditions, especially when the test was long, the correlation between dimensions was high, and the number of dimensions was large. In addition, Bifactor-M4 recovered the relationships among the true ability parameters best of all the proposed methods. In contrast, Bifactor-M2, the method with equal weights, performed poorly on overall score estimation; Bifactor-M3 and Bifactor-M5, the methods whose weights were computed from the discrimination parameters of all dimensions, performed poorly on domain score estimation; and Bifactor-M1, the original factor method, yielded the worst estimates.
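One generic way to form such composites (purely illustrative; the exact weights used in Bifactor-M1 through Bifactor-M6 differ and are defined in the article) is to weight the general and specific ability estimates by the corresponding discrimination parameters, for example for domain k:

```latex
\hat{\theta}^{\text{domain}}_{k}
  = \frac{\bar{a}_{Gk}\,\hat{\theta}_{G} + \bar{a}_{Sk}\,\hat{\theta}_{S_k}}
         {\bar{a}_{Gk} + \bar{a}_{Sk}},
```

where \bar{a}_{Gk} and \bar{a}_{Sk} are average discriminations of domain k's items on the general and specific factors.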


2013
Author(s): Laura S. Hamilton, Stephen P. Klein, William Lorie
