Standard Errors of Measurement
Recently Published Documents

TOTAL DOCUMENTS: 36 (five years: 2)
H-INDEX: 12 (five years: 0)

2020 · Vol 45 (1) · pp. 11–18
Author(s): Kevin J. Grimm, Kimberly Fine, Gabriela Stegmann

Modeling within-person change over time and between-person differences in change over time is a primary goal in prevention science. When modeling change in an observed score over time with multilevel or structural equation modeling approaches, each observed score contributes equally to the estimation of model parameters. However, observed scores can differ in their precision, both within and across participants. We propose an approach that weights each observed score by its level of precision, estimated as the inverse of its standard error of measurement in the context of item response modeling. Thus, scores with lower standard errors of measurement receive greater weight, and scores with higher standard errors of measurement are down-weighted. We discuss the weighting approach, illustrate how to apply it with commonly available software, and compare it to modeling change without weighting based on standard errors of measurement.
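A minimal sketch of the precision-weighting idea, assuming per-score standard errors of measurement are available (e.g., from an IRT scoring run). This is not the authors' implementation: it substitutes ordinary weighted least squares from statsmodels for a full multilevel or structural equation growth model, and all variable names and simulated values are hypothetical.

    # Precision-weighted growth trend: scores with smaller SEMs get larger weights.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)

    # Hypothetical long-format data: 100 persons observed on 4 occasions.
    n_persons, n_waves = 100, 4
    time = np.tile(np.arange(n_waves), n_persons)
    true_score = 50 + 2.0 * time + rng.normal(0, 3, time.size)
    sem = rng.uniform(1.0, 5.0, time.size)      # per-score SEM, e.g. from IRT scoring
    observed = true_score + rng.normal(0, sem)  # measurement error scales with the SEM

    # Weight each score by the inverse of its SEM, as in the abstract above.
    # (WLS conventionally uses inverse variance, 1 / sem**2; either choice
    # down-weights the less precise scores.)
    weights = 1.0 / sem
    X = sm.add_constant(time)                   # intercept + linear time trend
    fit = sm.WLS(observed, X, weights=weights).fit()
    print(fit.params)                           # estimated intercept and slope

In a full multilevel analysis the same weights would enter the level-1 residual variance; the sketch isolates only the weighting step.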


Education · 2019
Author(s): M. David Miller

Professional standards identify three foundations for examining the psychometric quality of assessments: validity, reliability, and fairness. Reliability is therefore a primary concern for all assessments. Reliability is defined as the consistency of scores across replications. In education, the sources of measurement error, and hence the basis for replications, include items, forms, raters, and occasions. The source of the measurement error determines the type of reliability and ultimately the generalizations that can be made about the measurement: inconsistency in scores may be due to multiple sources of random error, and the definition applies to whichever type of replication matches the generalization to be made.

There are also multiple indices for reporting reliability, including reliability coefficients, generalizability coefficients, standard errors of measurement, and information functions, to name a few. The indices are defined differently under different test theories. For example, classical test theory emphasizes reliability coefficients and standard errors of measurement; item response theory emphasizes information functions; generalizability theory emphasizes generalizability coefficients, dependability indices, and relative and absolute standard errors; and classification consistency emphasizes proportion agreement, unadjusted or adjusted for chance agreement.

The importance of reliability varies with the uses made of the assessment: the higher the stakes of test use, the more rigorous the reliability standards expected. Thus, reliability standards are most demanding when tests are used to make high-stakes decisions about individuals, such as employment or certification decisions and decisions about clinical placement. While validity, or the interpretations and uses of test scores, is considered the most important characteristic of a test, reliability provides a strong foundation for validity and is a necessary condition for most test uses and interpretations. When scores are not consistent within a testing procedure, they are instead influenced by random errors of measurement: they will not relate strongly to other variables, will not show strong internal structure, and will not support the score uses and interpretations that validity requires. Consequently, reliability is often considered necessary for the valid use and interpretation of scores. On the other hand, a test can have high reliability and still not be valid for a particular use or interpretation, since validity requires not only measuring consistently but also measuring the right construct.
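As a concrete anchor for the classical test theory indices named above, the standard error of measurement follows directly from the reliability coefficient; this is the standard textbook relation, not a formula from the entry itself:

    \mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}}

where \sigma_X is the standard deviation of observed scores and \rho_{XX'} is the reliability coefficient. For example, a scale with SD = 10 and reliability .90 has SEM = 10 * sqrt(0.10) ≈ 3.2, so roughly two-thirds of observed scores fall within about 3 points of the corresponding true scores.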


2008 · Vol 14 (6) · pp. 1069–1073
Author(s): John R. Crawford, David Sutherland, Paul H. Garthwaite

A formula for the reliability of difference scores was used to estimate the reliability of Delis-Kaplan Executive Function System (D-KEFS; Delis et al., 2001) contrast measures from the reliabilities and correlations of their components. In turn, these reliabilities were used to calculate standard errors of measurement. The majority of contrast measures had low reliabilities: of the 51 reliability coefficients calculated in the present study, none exceeded 0.7, and hence all failed to meet any of the criteria for acceptable reliability proposed by various experts in psychological measurement. The mean reliability of the contrast scores was 0.27 and the median was 0.30. The standard errors of measurement were large and, in many cases, equaled or were only marginally smaller than the contrast scores' standard deviations. The results suggest that, at present, D-KEFS contrast measures should not be used in neuropsychological decision making.
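To make the computation concrete, here is a hedged sketch of the standard difference-score reliability formula the study applies, followed by the classical test theory conversion to a standard error of measurement. The input values are hypothetical illustrations, not D-KEFS norms.

    # Reliability of a difference (contrast) score from its components'
    # reliabilities and intercorrelation, plus the classical test theory SEM.
    import math

    def difference_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
        """Reliability of D = X - Y given component reliabilities and correlation."""
        num = r_xx * sd_x**2 + r_yy * sd_y**2 - 2 * r_xy * sd_x * sd_y
        den = sd_x**2 + sd_y**2 - 2 * r_xy * sd_x * sd_y
        return num / den

    def sem(sd, reliability):
        """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
        return sd * math.sqrt(1.0 - reliability)

    # Hypothetical components: reliabilities .80 and .75, intercorrelation .60.
    r_d = difference_score_reliability(0.80, 0.75, 0.60)
    print(round(r_d, 2))           # 0.44: correlated components yield a
    print(round(sem(10, r_d), 1))  # low-reliability contrast with a large SEM (7.5)

The arithmetic shows why the study's contrast reliabilities are so low: the more highly correlated the two component scores, the smaller the numerator relative to the denominator, and the closer the contrast's standard error of measurement gets to its standard deviation.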

