A Simplified Probability Model of Error of Measurement

1969 ◽  
Vol 25 (1) ◽  
pp. 175-186 ◽  
Author(s):  
Donald W. Zimmerman

A model of variability in measurement which does not employ the concepts of “true score” and “error score” is presented. Reference to an observed score random variable, X, together with the usual axioms of probability, is shown to be a satisfactory basis for derivation of results of the classical test theory which relate observable quantities. In addition, reliability formulas such as the KR-20 and KR-21 are obtained by construction of the observed score random variable over a sample space of outcomes of a testing procedure and assignment of probabilities to outcomes. The approach is consistent with trends in psychological theory toward objectively defined constructs and avoids redundancy in derivations, as well as connotations which arise from reference to “true values” and “errors.” The present model is shown to be consistent with a relativistic, as opposed to an absolutistic, conception of measurement.
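As a concrete illustration (not part of the original abstract), a minimal Python sketch of the KR-20 and KR-21 formulas mentioned above, assuming a persons-by-items matrix of dichotomous (0/1) scores; the function names and toy data are hypothetical:

```python
import numpy as np

def kr20(scores):
    # KR-20 for a persons-by-items matrix of 0/1 scores:
    # r = k/(k-1) * (1 - sum(p*q) / var(total score)).
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    p = scores.mean(axis=0)                      # item difficulties
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

def kr21(scores):
    # KR-21 approximates KR-20 assuming equally difficult items,
    # using only the mean and variance of the total scores.
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    totals = scores.sum(axis=1)
    m, v = totals.mean(), totals.var(ddof=1)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * v))

# Toy data with no true-score structure, shown only for the mechanics:
X = np.random.binomial(1, 0.6, size=(200, 20))
print(kr20(X), kr21(X))
```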

1971 ◽  
Vol 28 (1) ◽  
pp. 291-301 ◽  
Author(s):  
Donald W. Zimmerman

A model of variability in measurement, which is sufficiently general for a variety of applications and which includes the main content of traditional theories of error of measurement and psychological tests, can be derived from the axioms of probability, without introducing “true values” and “errors.” Beginning with probability spaces (Ω, P1) and (φ, P2), the set Ω representing the outcomes of a measurement procedure and the set φ representing individuals or experimental objects, it is possible to construct suitable product probability spaces and collections of random variables which can yield all results needed to describe random variability and reliability. This paper attempts to fill gaps in the mathematical derivations in many classical theories and at the same time to overcome limitations in the language of “true values” and “errors” by presenting explicitly the essential constructions required for a general probability model.
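The construction can be made concrete with a toy example (an assumption of this note, not taken from the paper): a finite set of individuals carrying the probabilities P2 and, for each individual, a distribution P1 over measurement outcomes. Reliability then appears as the share of the variance of X carried by the conditional means E[X | individual]. A minimal Python sketch:

```python
import numpy as np

# Toy product-space construction: persons (phi) and, per person, a
# distribution over measurement outcomes (omega). The observed score X
# is a random variable on the set of (person, outcome) pairs.
persons = {"a": 0.5, "b": 0.5}                      # P2 over individuals
outcomes_given_person = {                            # P1 over outcomes, per person
    "a": {8: 0.2, 10: 0.6, 12: 0.2},
    "b": {14: 0.3, 16: 0.4, 18: 0.3},
}

# Joint probability of each (person, outcome) point and the score X there.
points = [(pp * po, x)
          for person, pp in persons.items()
          for x, po in outcomes_given_person[person].items()]
probs = np.array([p for p, _ in points])
xs = np.array([x for _, x in points])

ex = (probs * xs).sum()                              # E[X]
var_x = (probs * (xs - ex) ** 2).sum()               # total variance of X

# Between-person variance of the conditional means E[X | person]:
cond_means = {per: sum(po * x for x, po in d.items())
              for per, d in outcomes_given_person.items()}
between = sum(persons[per] * (m - ex) ** 2 for per, m in cond_means.items())

print("reliability =", between / var_x)              # Var(E[X|person]) / Var(X)
```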


2020 ◽  
Author(s):  
Kazuhiro Yamaguchi ◽  
Jonathan Templin

Quantifying the reliability of latent variable estimates in diagnostic classification models has been a difficult topic, complicated by the classification-based nature of these models. In this study, we derive observed score reliability indices based on diagnostic classification models as an extension of classical test theory-based reliability. Additionally, we derive conditional observed sum- and sub-score distributions. In this manner, various conditional expectations and conditional standard error of measurement estimates can be calculated for both total- and sub-scores of a test. The proposed methods provide a variety of expectations and standard errors for attribute estimates, which we demonstrate in an analysis of an empirical test.
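To make the idea of a conditional sum-score distribution concrete (a sketch under made-up item parameters, not the authors' empirical example): once an attribute profile fixes each item's correct-response probability, the conditional distribution of the total score is a Poisson-binomial, from which a conditional expectation and a conditional standard error of measurement follow directly.

```python
import numpy as np

def sum_score_dist(p_correct):
    # Poisson-binomial: distribution of the total score given per-item
    # correct-response probabilities for one attribute profile.
    dist = np.array([1.0])
    for p in p_correct:
        dist = np.convolve(dist, [1.0 - p, p])
    return dist  # dist[s] = P(sum score = s | profile)

# Hypothetical item-correct probabilities for two attribute profiles
# (e.g., "masters" vs. "non-masters" of a single attribute).
profile_probs = {
    "non-master": [0.20, 0.25, 0.30, 0.20, 0.35],
    "master":     [0.80, 0.85, 0.90, 0.75, 0.80],
}

for profile, ps in profile_probs.items():
    dist = sum_score_dist(ps)
    scores = np.arange(len(dist))
    mean = (dist * scores).sum()                         # conditional expectation
    csem = np.sqrt((dist * (scores - mean) ** 2).sum())  # conditional SEM
    print(profile, "E[S|profile]=%.2f" % mean, "CSEM=%.2f" % csem)
```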


2000 ◽  
Vol 177 (S39) ◽  
pp. s15-s20 ◽  
Author(s):  
Aart H. Schene ◽  
Maarten Koeter ◽  
Bob van Wijngaarden ◽  
Helle Charlotte Knudsen ◽  
Morven Leese ◽  
...  

Background: The European Psychiatric Services: Inputs Linked to Outcome Domains and Needs (EPSILON) Study aims to produce standardised versions in five European languages of instruments measuring needs for care, family or caregiving burden, satisfaction with services, quality of life, and sociodemographic and service receipt data.
Aims: To describe the background, rationale and design of the reliability study, focusing on the reliability of instruments, reliability testing theory, a general reliability testing procedure and sample size requirements.
Method: A strict protocol was developed, consisting of definitions of the specific reliability measures used, the statistical methods used to assess these reliability coefficients, the development of statistical programmes to make inter-centre reliability comparisons, criteria for good reliability, and a general format for the reliability analysis.
Conclusion: The reliability analyses are based on classical test theory. Reliability measures used are Cronbach's α, Cohen's κ and the intraclass correlation coefficient. Intersite comparisons were extended with a comparison of the standard error of measurement. Criteria for good reliability may need to be adapted for this type of study. The consequences of low reliability, and of reliability differing between sites, must be considered before pooling data.
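For reference (not part of the EPSILON protocol itself), a minimal Python sketch of two of the quantities named above, Cronbach's α and the standard error of measurement SEM = SD·sqrt(1 - α), computed on simulated parallel-item data:

```python
import numpy as np

def cronbach_alpha(items):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(0)
true = rng.normal(0, 1, size=300)
items = true[:, None] + rng.normal(0, 1, size=(300, 5))  # 5 parallel items

alpha = cronbach_alpha(items)
sd_total = items.sum(axis=1).std(ddof=1)
sem = sd_total * np.sqrt(1 - alpha)       # standard error of measurement
print("alpha=%.2f  SEM=%.2f" % (alpha, sem))
```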


TESTFÓRUM ◽  
2015 ◽  
Vol 4 (6) ◽  
pp. 67-84 ◽
Author(s):  
Hynek Cígler ◽  
Martin Šmíra

Cígler, H., & Šmíra, M.: Error of measurement and the estimation of true score: Selected methods of Classical Test Theory

One of the elementary skills involved in the interpretation of psychological test results is handling the error of measurement. Unfortunately, many Czech psychological tests do not include all the necessary information about the error of measurement (e.g., confidence intervals and standard errors of measurement for different purposes). Even if such information is available, other circumstances of the assessment may need to be considered and the method of estimation adjusted accordingly; it is not always possible to rely on the test developer in such cases. Since applications for such computations are not readily available to test users, they should be capable of performing many of the elementary computations by hand. This paper briefly summarizes common techniques for interpreting the error of measurement using confidence intervals within the framework of Classical Test Theory, supported by detailed examples intended to serve as a guide for psychologists in practice.
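A minimal sketch of the kind of computation the paper describes (the numbers and scale parameters are hypothetical, not taken from the article): the Kelley-regressed true score estimate T = r·X + (1 - r)·M, its standard error of estimation SD·sqrt(r(1 - r)), and the resulting confidence interval.

```python
import numpy as np
from scipy.stats import norm

def true_score_ci(x, mean, sd, rel, conf=0.95):
    # Kelley estimate regresses the observed score toward the group mean.
    t_hat = rel * x + (1 - rel) * mean
    se_est = sd * np.sqrt(rel * (1 - rel))   # standard error of estimation
    z = norm.ppf(0.5 + conf / 2)
    return t_hat, (t_hat - z * se_est, t_hat + z * se_est)

# Hypothetical IQ-style scale: mean 100, SD 15, reliability .90, observed 130.
t_hat, ci = true_score_ci(130, 100, 15, 0.90)
print(t_hat, ci)    # 127.0, roughly (118.2, 135.8)

# The classical SEM-based interval around the observed score, for comparison:
sem = 15 * np.sqrt(1 - 0.90)
print(130 - 1.96 * sem, 130 + 1.96 * sem)
```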


2014 ◽  
Vol 35 (4) ◽  
pp. 201-211 ◽  
Author(s):  
André Beauducel ◽  
Anja Leue

It is shown that a minimal assumption should be added to the assumptions of Classical Test Theory (CTT) in order to have positive inter-item correlations, which are regarded as a basis for the aggregation of items. Moreover, it is shown that the assumption of zero correlations between the error score estimates is substantially violated in the population of individuals when the number of items is small. Instead, a negative correlation between error score estimates occurs. The reason for the negative correlation is that the error score estimates for different items of a scale are based on insufficient true score estimates when the number of items is small. A test of the assumption of uncorrelated error score estimates by means of structural equation modeling (SEM) is proposed that takes this effect into account. The SEM-based procedure is demonstrated by means of empirical examples based on the Edinburgh Handedness Inventory and the Eysenck Personality Questionnaire-Revised.
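The effect described above is easy to reproduce by simulation (a sketch under assumed parallel items, not the authors' SEM procedure): with k items, the error score estimates e_i = x_i - x̄ correlate at about -1/(k - 1) even though the underlying error scores are uncorrelated.

```python
import numpy as np

# With few items, error score estimates (deviations from the insufficient
# true score estimate x_bar) correlate negatively across items.
rng = np.random.default_rng(1)
n, k = 100_000, 4                                  # persons, items
true = rng.normal(0, 1, size=(n, 1))
errors = rng.normal(0, 1, size=(n, k))             # uncorrelated error scores
x = true + errors                                  # observed item scores
e_hat = x - x.mean(axis=1, keepdims=True)          # error score estimates

r = np.corrcoef(e_hat[:, 0], e_hat[:, 1])[0, 1]
print("observed r =", round(r, 3), " expected =", round(-1.0 / (k - 1), 3))
```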


2019 ◽  
Vol 35 (1) ◽  
pp. 55-62 ◽  
Author(s):  
Noboru Iwata ◽  
Akizumi Tsutsumi ◽  
Takafumi Wakita ◽  
Ryuichi Kumagai ◽  
Hiroyuki Noguchi ◽  
...  

Abstract. To investigate the effect of response alternatives/scoring procedures on the measurement properties of the Center for Epidemiologic Studies Depression Scale (CES-D), which has four response alternatives, a polytomous item response theory (IRT) model was applied to the responses of 2,061 workers and university students (1,640 males, 421 females). Test information functions derived from the polytomous IRT analyses of the CES-D data under various scoring procedures indicated that: (1) the CES-D with its standard (0-1-2-3) scoring procedure should be useful for screening to detect subjects at high risk of depression, provided the θ point showing the highest information corresponds to the cut-off point, because of its extremely high information; (2) the CES-D with the 0-1-1-2 scoring procedure could cover a wider range of depressive severity, suggesting that this scoring procedure might be useful where more exhaustive discrimination in symptomatology is of interest; and (3) a revised version of the CES-D, in which the original positively worded items are replaced with negatively worded items, outperformed the original version. These findings could not have been demonstrated by classical test theory analyses, and the utility of this kind of psychometric testing for standard measures of psychological assessment therefore warrants further investigation.
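As a sketch of the machinery behind these test information functions (with made-up item parameters, not the CES-D estimates): under a graded response model, the Fisher information of a polytomous item is the sum over categories of P'_k(θ)² / P_k(θ), which can be evaluated numerically.

```python
import numpy as np

def grm_probs(theta, a, bs):
    # Category probabilities for one graded-response-model item.
    # bs: ascending between-category thresholds (len = n_categories - 1).
    cum = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(bs))))  # P(X >= k)
    cum = np.concatenate(([1.0], cum, [0.0]))
    return cum[:-1] - cum[1:]                                  # P(X = k)

def item_info(theta, a, bs, h=1e-5):
    # Fisher information: sum_k P_k'(theta)^2 / P_k(theta),
    # with the derivative taken by central finite differences.
    p = grm_probs(theta, a, bs)
    dp = (grm_probs(theta + h, a, bs) - grm_probs(theta - h, a, bs)) / (2 * h)
    return (dp ** 2 / p).sum()

# Made-up parameters for one 4-category (0-1-2-3) item.
a, bs = 1.5, [-1.0, 0.0, 1.0]
print([round(item_info(t, a, bs), 3) for t in np.linspace(-3, 3, 7)])
# Test information is the sum of item informations; comparing scoring
# procedures amounts to refitting with collapsed categories (0-1-1-2
# merges the two middle categories).
```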


2009 ◽  
Vol 31 (1) ◽  
pp. 81 ◽
Author(s):  
Takeaki Kumazawa

Classical test theory (CTT) has been widely used to estimate the reliability of measurements. Generalizability theory (G theory), an extension of CTT, is a powerful statistical procedure, particularly useful for performance testing, because it enables estimation of the percentage of variance due to persons and to multiple sources of error. This study focuses on a generalizability study (G study) conducted to investigate such variance components for a paper-and-pencil multiple-choice vocabulary test used as a diagnostic pretest. Further, a decision study (D study) was conducted to compute the generalizability coefficient (G coefficient) for absolute decisions. The results of the G and D studies indicated that 46% of the total variance was due to the items effect; further, the G coefficient for absolute decisions was low.
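A minimal sketch of a one-facet G study and D study of the kind described (simulated data and assumed facet sizes, not the study's vocabulary test): variance components for a crossed persons-by-items design are estimated from expected mean squares, and the dependability coefficient Φ for absolute decisions follows.

```python
import numpy as np

def g_study_pxi(x):
    # Variance components for a crossed persons-by-items (p x i) design,
    # from expected mean squares of the random-effects ANOVA.
    n_p, n_i = x.shape
    grand = x.mean()
    ms_p = n_i * ((x.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
    ms_i = n_p * ((x.mean(axis=0) - grand) ** 2).sum() / (n_i - 1)
    resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0) + grand
    ms_pi = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1))
    var_p = (ms_p - ms_pi) / n_i           # sigma^2(p): persons
    var_i = (ms_i - ms_pi) / n_p           # sigma^2(i): items
    var_pi = ms_pi                         # sigma^2(pi,e): residual
    return var_p, var_i, var_pi

rng = np.random.default_rng(2)
x = (rng.normal(0, 1, (200, 1)) + rng.normal(0, 1, (1, 30))
     + rng.normal(0, 1, (200, 30)))       # toy persons-by-items scores

var_p, var_i, var_pi = g_study_pxi(x)
n_i = 30                                   # D study: test length of interest
phi = var_p / (var_p + (var_i + var_pi) / n_i)   # Phi for absolute decisions
print("components:", var_p, var_i, var_pi, " Phi =", round(phi, 3))
```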

