Bayesian Estimation of the True Score Multitrait–Multimethod Model With a Split-Ballot Design

Summary: In the United States' normative population for the WAIS-R, differences (Ds) between persons' verbal and performance IQs (VIQs and PIQs) tend to increase as full scale IQs (FSIQs) increase. This suggests that norm-referenced interpretations of Ds should take FSIQs into account. Two new graphs are presented to facilitate this type of interpretation. One of these graphs estimates the mean of the absolute values of D (called typical D) at each FSIQ level of the US normative population. The other graph estimates the absolute value of D that is exceeded only 5% of the time (called abnormal D) at each FSIQ level of this population. A graph for the identification of conventional “statistically significant Ds” (also called “reliable Ds”) is also presented. A reliable D is defined in the context of classical true score theory as an absolute D that is unlikely (p < .05) to be exceeded by a person whose true VIQ and PIQ are equal. As conventionally defined, reliable Ds do not depend on the FSIQ. The graphs of typical and abnormal Ds are based on quadratic models of the relation of the sizes of Ds to FSIQs. These models are generalizations of models described in Hsu (1996). The new graphical method of identifying abnormal Ds is compared to the conventional Payne-Jones method of identifying these Ds. Implications of the three juxtaposed graphs for the interpretation of VIQ-PIQ differences are discussed.
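The definition of a reliable D given in the summary can be illustrated with a minimal sketch. It assumes the usual classical true score result that the standard error of a difference is the root of the summed squared standard errors of measurement, and that the measurement errors of VIQ and PIQ are uncorrelated. The function name and the reliability values in the usage note are hypothetical and are not taken from the WAIS-R manual or from the article.

```python
import math


def reliable_d_threshold(sd, rel_viq, rel_piq, z=1.96):
    """Smallest absolute VIQ-PIQ difference that counts as 'reliable'.

    Assumes classical true score theory with uncorrelated errors:
    SEM = sd * sqrt(1 - reliability) for each scale, and the standard
    error of the difference is sqrt(SEM_V**2 + SEM_P**2).
    z = 1.96 corresponds to a two-tailed p < .05.
    """
    sem_viq = sd * math.sqrt(1.0 - rel_viq)
    sem_piq = sd * math.sqrt(1.0 - rel_piq)
    se_diff = math.sqrt(sem_viq ** 2 + sem_piq ** 2)
    return z * se_diff


if __name__ == "__main__":
    # Hypothetical reliabilities chosen only for illustration.
    print(round(reliable_d_threshold(sd=15, rel_viq=0.97, rel_piq=0.93), 2))
```

With these illustrative inputs the threshold is about 9.3 IQ points. No FSIQ term appears in the computation, which matches the statement that conventionally defined reliable Ds do not depend on the FSIQ.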

Testing the Assumption of Uncorrelated Errors for Short Scales by Means of Structural Equation Modeling

Journal of Individual Differences, 2014, Vol 35 (4), pp. 201-211

DOI: 10.1027/1614-0001/a000135

Cited By ~ 3

Author(s): André Beauducel, Anja Leue

Keyword(s): Structural Equation Modeling, Negative Correlation, Structural Equation, Error Score, Classical Test Theory, Test Theory, Equation Modeling, Eysenck Personality Questionnaire, Personality Questionnaire, True Score

It is shown that a minimal assumption should be added to the assumptions of Classical Test Theory (CTT) in order to have positive inter-item correlations, which are regarded as a basis for the aggregation of items. Moreover, it is shown that the assumption of zero correlations between the error score estimates is substantially violated in the population of individuals when the number of items is small. Instead, a negative correlation between error score estimates occurs. The reason for the negative correlation is that the error score estimates for different items of a scale are based on insufficient true score estimates when the number of items is small. A test of the assumption of uncorrelated error score estimates by means of structural equation modeling (SEM) is proposed that takes this effect into account. The SEM-based procedure is demonstrated by means of empirical examples based on the Edinburgh Handedness Inventory and the Eysenck Personality Questionnaire-Revised.
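The negative correlation between error score estimates described above can be reproduced in a small simulation. This is a sketch under stated assumptions, not the authors' SEM-based test: it assumes parallel items (a common true score plus independent errors of equal variance) and it uses each person's item mean as the true score estimate, which may differ from the estimator used in the article. The function name and all parameter values are hypothetical.

```python
import numpy as np


def error_estimate_correlation(n_items, n_persons=50_000, seed=0):
    """Mean correlation between error score estimates of different items.

    Assumption: item score = true score + independent error, and the
    true score estimate is the person's mean over the n_items items.
    """
    rng = np.random.default_rng(seed)
    true_scores = rng.normal(size=(n_persons, 1))
    errors = rng.normal(size=(n_persons, n_items))
    items = true_scores + errors

    # Error score estimate = observed item score minus true score estimate.
    true_hat = items.mean(axis=1, keepdims=True)
    error_hat = items - true_hat

    # Average the off-diagonal entries of the item-by-item correlation matrix.
    corr = np.corrcoef(error_hat, rowvar=False)
    off_diagonal = corr[~np.eye(n_items, dtype=bool)]
    return float(off_diagonal.mean())


if __name__ == "__main__":
    for k in (2, 3, 5, 10, 20):
        print(k, round(error_estimate_correlation(k), 3))
```

Under these assumptions the expected correlation is -1/(k - 1) for k items, so it is strongly negative for short scales and approaches zero as the number of items grows, even though the simulated true errors are uncorrelated.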

A Modification of the Payne-Jones Method of Identifying Abnormal Differences in WISC-R Performance and Verbal IQ's

European Journal of Psychological Assessment, 1996, Vol 12 (1), pp. 27-32

DOI: 10.1027/1015-5759.12.1.27

Cited By ~ 1

Author(s): Louis M. Hsu

Keyword(s): Full Scale, True Score, Verbal Iq, Score Difference, The Difference, And Performance

The difference (D) between a person's Verbal IQ (VIQ) and Performance IQ (PIQ) has for some time been considered clinically meaningful (Kaufman, 1976, 1979; Matarazzo, 1990, 1991; Matarazzo & Herman, 1985; Sattler, 1982; Wechsler, 1984). Particularly useful is information about the degree to which a D is “abnormal” (i.e., deviant in a standardization group) as opposed to simply “reliable” (i.e., indicative of a true score difference) (Mittenberg, Thompson, & Schwartz, 1991; Silverstein, 1981; Payne & Jones, 1957). Payne and Jones (1957) proposed a formula to identify “abnormal” differences, which has been used extensively in the literature and which has generally yielded good approximations to empirically determined “abnormal” differences (Silverstein, 1985; Matarazzo & Herman, 1985). However, applications of this formula have not taken into account the dependence of Ds on Full Scale IQs (FSIQs), demonstrated by Kaufman (1976, 1979) and Matarazzo and Herman (1985). This has led to overestimation of the “abnormality” of Ds of high-FSIQ children and underestimation of the “abnormality” of Ds of low-FSIQ children. This article presents a formula for the identification of abnormal WISC-R Ds that overcomes these problems by explicitly taking into account the dependence of Ds on FSIQs.
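For orientation, the conventional approach that the article modifies can be sketched as follows. The sketch assumes the form in which the Payne-Jones formula is commonly stated: both scales share one standard deviation, D is treated as normally distributed in the standardization group, and its standard deviation is SD * sqrt(2 - 2r), where r is the VIQ-PIQ correlation. The function name and the correlation in the usage note are hypothetical, and the FSIQ-dependent modification proposed in the article is not reproduced here because the abstract does not give it.

```python
import math


def payne_jones_abnormal_d(sd, r_viq_piq, z=1.96):
    """Absolute VIQ-PIQ difference exceeded by about 5% of a norm group.

    Assumption: the commonly stated Payne-Jones form, in which the
    standard deviation of D is sd * sqrt(2 - 2 * r) and D is normal.
    The same cutoff is applied at every FSIQ level.
    """
    sd_of_difference = sd * math.sqrt(2.0 - 2.0 * r_viq_piq)
    return z * sd_of_difference


if __name__ == "__main__":
    # Hypothetical correlation chosen only for illustration.
    print(round(payne_jones_abnormal_d(sd=15, r_viq_piq=0.67), 1))
```

With these illustrative inputs the cutoff is about 23.9 IQ points. Because the cutoff is a single constant, it cannot track Ds that grow with FSIQ, which is the limitation the abstract describes.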

Positive First or Negative First?

Methodology, 2010, Vol 6 (3), pp. 118-127

DOI: 10.1027/1614-2241/a000013

Cited By ~ 17

Author(s): Dagmar Krebs, Juergen H.P. Hoffmeyer-Zlotnik

Keyword(s): Job Characteristics, Two Dimensions, Response Behavior, Point Scale, Response Scale, Job Motivation, Ballot Design, Subjective Importance, Scale Format, Scale Types

To examine whether starting a response scale with the positive or the negative categories affects response behavior, a split-ballot design using reverse forms of an 8-point scale assessing the subjective importance of job characteristics was used. Response behavior varied according to the scale format employed. Responses were more positive on the scale starting with the category “very important” (split 2). By contrast, the scale starting with the category “not at all important” (split 1) did not elicit more negative responses, but rather less positive ones. However, differences in response behavior did not systematically reflect the direction of the respective scales. Starting with the differences between the two split versions, the factorial structure of indicators assessing two dimensions of job motivation was tested for each scale type separately and then for both scale types simultaneously. Finally, models placing increasingly severe equality constraints on both scale types were tested. The paper concludes with a discussion of the results and desiderata for further research.
