Testing and Test Theory Development

Author(s):  
Xiaogeng Sun

This chapter highlights the importance of testing and refining the behavior theory in individual-based models (IBMs). Establishing a model's credibility is not the only reason to test theory for behavior. Doing so also offers a new and productive approach to theoretical ecology: a way to develop a toolbox of across-level theory useful for modeling populations of adaptive individuals. One can refer to testing and refining behavior sub-models as theory development, and one can do it by following the classic inductive reasoning cycle of posing, testing, and falsifying alternative hypotheses. The chapter provides a brief introduction to the pattern-oriented theory development process and presents several examples.


2008 ◽  
Vol 103 (2) ◽  
pp. 545-565
Author(s):  
Gilbert Becker

This article addresses deficiencies in the most widely used estimators of reliability and draws attention to the reason that this issue is important. Accurate calibration of relationships between constructs is critical to theory development. Unless workers have accurate estimates of scale reliability, accurate estimates of those relationships will not be forthcoming because the classical disattenuation formula requires them. This article shows that classical test theory can easily accommodate the delineation of its error component E in test scores into two sources, inconsistency across content (E1) and inconsistency across time (E2). Viewed from this extended model, the alternate forms approach to reliability estimation is complete in that it gauges simultaneously both sources of error. Because that approach is rarely used today for that purpose, the integrity of estimation has been lost. In its place arose estimators of partial reliability, that is, those for estimating generalizability over one medium or the other, but not both, thereby precluding the additivity of error components. Recent developments promise to restore the integrity of the alternate forms approach without the need for alternate forms and suggest an additive alternative to the current nonadditive coefficient of stability.
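For readers unfamiliar with it, the classical disattenuation formula referred to above is Spearman's correction: the observed correlation between two scales divided by the square root of the product of their reliabilities. The following is a minimal Python sketch of that formula only; the function name and the numeric inputs are hypothetical illustrations and do not come from the article.

```python
import math


def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction for attenuation.

    r_xy : observed correlation between scales X and Y
    r_xx : reliability estimate for scale X
    r_yy : reliability estimate for scale Y
    Returns the estimated correlation between the underlying true scores.
    """
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the arithmetic.
    print(round(disattenuate(0.30, 0.80, 0.70), 3))
```

Because the reliabilities sit in the denominator, a reliability estimate that is too low inflates the corrected correlation and one that is too high deflates it, which is why the abstract ties accurate reliability estimation to accurate estimates of construct relationships.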


2014 ◽  
Vol 35 (4) ◽  
pp. 201-211 ◽  
Author(s):  
André Beauducel ◽  
Anja Leue

It is shown that a minimal assumption should be added to the assumptions of Classical Test Theory (CTT) in order to have positive inter-item correlations, which are regarded as a basis for the aggregation of items. Moreover, it is shown that the assumption of zero correlations between the error score estimates is substantially violated in the population of individuals when the number of items is small. Instead, a negative correlation between error score estimates occurs. The reason for the negative correlation is that the error score estimates for different items of a scale are based on insufficient true score estimates when the number of items is small. A test of the assumption of uncorrelated error score estimates by means of structural equation modeling (SEM) is proposed that takes this effect into account. The SEM-based procedure is demonstrated by means of empirical examples based on the Edinburgh Handedness Inventory and the Eysenck Personality Questionnaire-Revised.
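The negative correlation described above can be illustrated with a small simulation. The sketch below makes strong simplifying assumptions that are not taken from the article: items are parallel (X_i = T + E_i) with independent normal errors of equal variance, and the true score is estimated by each person's unweighted item mean. Under those assumptions the correlation between the error score estimates of any two items works out to -1/(k - 1) for k items, so it is strongly negative for short scales and approaches zero as k grows. All function names are hypothetical, and this is not the authors' SEM-based procedure.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def expected_error_corr(k):
    """Analytic value under the stated assumptions: -1 / (k - 1), k >= 2."""
    return -1.0 / (k - 1)


def simulated_error_corr(k, n_persons=5000, seed=0):
    """Correlate the error score estimates of items 1 and 2 across persons.

    Assumption: the true score estimate is the person's mean over k items.
    """
    rng = random.Random(seed)
    e1, e2 = [], []
    for _ in range(n_persons):
        true_score = rng.gauss(0.0, 1.0)
        items = [true_score + rng.gauss(0.0, 1.0) for _ in range(k)]
        t_hat = sum(items) / k          # true score estimate (item mean)
        e1.append(items[0] - t_hat)     # error score estimate, item 1
        e2.append(items[1] - t_hat)     # error score estimate, item 2
    return pearson(e1, e2)


if __name__ == "__main__":
    for k in (3, 5, 20):
        print(k, round(simulated_error_corr(k), 3), round(expected_error_corr(k), 3))
```

The simulated errors themselves are uncorrelated by construction; the negative correlation appears only in the estimates, because every item's estimate subtracts the same item mean.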


2019 ◽  
Vol 35 (1) ◽  
pp. 55-62 ◽  
Author(s):  
Noboru Iwata ◽  
Akizumi Tsutsumi ◽  
Takafumi Wakita ◽  
Ryuichi Kumagai ◽  
Hiroyuki Noguchi ◽  
...  

Abstract. To investigate the effect of response alternatives and scoring procedures on the measurement properties of the Center for Epidemiologic Studies Depression Scale (CES-D), which has four response alternatives, a polytomous item response theory (IRT) model was applied to the responses of 2,061 workers and university students (1,640 males, 421 females). Test information functions derived from the polytomous IRT analyses of the CES-D data under various scoring procedures indicated that: (1) the CES-D with its standard (0-1-2-3) scoring procedure should be useful for screening to detect subjects at high risk of depression if the θ point showing the highest information corresponds to the cut-off point, because of its extremely high information at that point; (2) the CES-D with the 0-1-1-2 scoring procedure could cover a wider range of depressive severity, suggesting that this scoring procedure might be useful in cases where more exhaustive discrimination in symptomatology is of interest; and (3) the revised version of the CES-D, in which the original positive items were replaced with negatively worded revisions, outperformed the original version. These findings have not been demonstrated by classical test theory analyses, and thus the utility of this kind of psychometric testing warrants further investigation for standard measures of psychological assessment.


2011 ◽  
Vol 27 (3) ◽  
pp. 164-170 ◽  
Author(s):  
Anna Sundström

This study evaluated the psychometric properties of a self-report scale for assessing perceived driver competence, labeled the Self-Efficacy Scale for Driver Competence (SSDC), using item response theory analyses. Two samples of Swedish driving-license examinees (n = 795; n = 714) completed two versions of the SSDC that were parallel in content. Prior work, using classical test theory analyses, has provided support for the validity and reliability of scores from the SSDC. This study investigated the measurement precision, item hierarchy, and differential functioning for males and females of the items in the SSDC, as well as how the rating scale functions. The results confirmed the previous findings that the SSDC demonstrates sound psychometric properties. In addition, the findings showed that measurement precision could be increased by adding items that tap higher self-efficacy levels. Moreover, the rating scale can be improved by reducing the number of categories or by providing each category with a label.


1987 ◽  
Vol 32 (4) ◽  
pp. 336-338
Author(s):  
David Thissen

1994 ◽  
Vol 39 (4) ◽  
pp. 425-426
Author(s):  
Elizabeth Bull Danielson
