Revision of a Criterion-Referenced Vocabulary Test Using Generalizability Theory

2009 ◽  
Vol 31 (1) ◽  
pp. 81
Author(s):  
Takeaki Kumazawa

Classical test theory (CTT) has been widely used to estimate the reliability of measurements. Generalizability theory (G theory), an extension of CTT, is a powerful statistical framework, particularly useful for performance testing, because it allows the proportions of variance due to persons and to multiple sources of error to be estimated. This study reports a generalizability study (G study) conducted to investigate such variance components for a paper-and-pencil multiple-choice vocabulary test used as a diagnostic pretest. A decision study (D study) was then conducted to compute the generalizability coefficient (G coefficient) for absolute decisions. The results of the G and D studies indicated that 46% of the total variance was due to the items effect, and the G coefficient for absolute decisions was low.
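As a rough illustration of the kind of computation involved, the sketch below estimates a dependability coefficient for absolute decisions from variance components in a crossed persons × items design. The numbers are invented for illustration (chosen so that the items effect accounts for roughly 46% of the total variance, as reported above) and are not the study's actual estimates.

```python
# Minimal sketch of a D study for a crossed persons x items (p x i) design.
# All variance components below are hypothetical, not the study's estimates.
var_p  = 0.04   # persons (universe-score) variance
var_i  = 0.12   # items variance (about 46% of the total, as in the abstract)
var_pi = 0.10   # persons-by-items interaction / residual variance

n_items = 30    # hypothetical number of items in the D-study design

# For absolute decisions, every source except persons contributes to error.
abs_error = (var_i + var_pi) / n_items

# Dependability (phi) coefficient used for absolute decisions.
phi = var_p / (var_p + abs_error)
print(f"Dependability (phi) for absolute decisions: {phi:.2f}")
```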

2013 ◽  
Vol 93 (4) ◽  
pp. 562-569 ◽  
Author(s):  
Richard A. Preuss

Clinical assessment protocols must produce data that are reliable, with a clinically attainable minimal detectable change (MDC). In a reliability study, generalizability theory has two advantages over classical test theory, and these advantages provide information that allows assessment protocols to be adjusted to match individual patient profiles. First, generalizability theory allows the user to simultaneously consider multiple sources of measurement error variance (facets). Second, it allows the user to generalize the findings of the main study across the different study facets and to recalculate the reliability and MDC for different combinations of facet conditions. In doing so, clinical assessment protocols can be chosen to minimize the number of measures needed to achieve a realistic MDC, to use repeated measures to minimize the MDC, or simply on the basis of the combination that best allows the clinician to monitor an individual patient's progress over a specified period of time.
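A minimal sketch of the second point, under assumptions of my own (a single random facet of trials and made-up variance components, not values from the article), recomputes the dependability coefficient and the MDC for different numbers of repeated measures.

```python
import math

# Illustrative D study: recalculate reliability and the 95% minimal detectable
# change (MDC95) for different numbers of trials. Variance components are hypothetical.
var_person = 25.0   # between-person variance
var_trial  = 4.0    # trial (facet) variance
var_error  = 16.0   # person-by-trial interaction / residual variance

def d_study(n_trials):
    # Absolute error variance for a mean over n_trials measurements.
    abs_err = (var_trial + var_error) / n_trials
    phi = var_person / (var_person + abs_err)   # dependability coefficient
    sem = math.sqrt(abs_err)                    # standard error of measurement
    mdc95 = 1.96 * math.sqrt(2) * sem           # 95% minimal detectable change
    return phi, mdc95

for k in (1, 2, 3, 5):
    phi, mdc = d_study(k)
    print(f"{k} trial(s): phi = {phi:.2f}, MDC95 = {mdc:.1f}")
```

Averaging over more trials shrinks the absolute error variance, which raises the coefficient and lowers the MDC.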


Author(s):  
Hannah Bijlsma ◽  
Rikkert van der Lans ◽  
Tim Mainhard ◽  
Perry den Brok

This chapter discusses student perceptions in terms of three psychometric perspectives that dominate contemporary research on teaching quality, namely Classical Test Theory (CTT), Item Response Theory (IRT), and Generalizability Theory (GT). These perspectives serve as exemplars of the connection between psychometric theories and different views of what a perception is, as well as of how and for what purposes student perceptions should be used. The main message of the chapter is that the choice of a psychometric theory is not merely a technical matter but also has implications for how the nature of perceptions is conceptualized. After presenting and linking each psychometric theory, their strengths and weaknesses in the context of student perceptions of teaching quality, along with issues of practical implementation, are discussed.


2020 ◽  
Author(s):  
Stephen Ross Martin ◽  
Philippe Rast

Reliability is a crucial concept in psychometrics. Although it is typically estimated as a single fixed quantity, previous work suggests that reliability can vary across persons, groups, and covariates. We propose a novel method for estimating and modeling case-specific reliability without repeated measurements or parallel tests. The proposed method employs a “Reliability Factor” that models the error variance of each case across multiple indicators, thereby producing case-specific reliability estimates. Additionally, we use Gaussian process modeling to estimate a non-linear, non-monotonic function between the latent factor itself and the reliability of the measure, providing an analogue to test information functions in item response theory. The reliability factor model is a new tool for examining latent regions with poor conditional reliability, and correlates thereof, within a classical test theory framework.
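The core idea can be sketched in a few lines under the classical definition of reliability; the factor variance and per-case error variances below are invented values, not estimates from the proposed model.

```python
import numpy as np

# Illustrative only: case-specific reliability when each case has its own
# error variance (as a reliability-factor model would estimate).
factor_var = 1.0                                   # latent (true-score) variance
case_error_var = np.array([0.2, 0.5, 1.0, 2.0])   # hypothetical per-case error variances

# Classical-test-theory reliability computed per case:
# true variance / (true variance + case-specific error variance).
case_reliability = factor_var / (factor_var + case_error_var)
print(case_reliability)   # larger error variance -> lower case-specific reliability
```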


2018 ◽  
Author(s):  
Sam Parsons

The relationship between measurement reliability and statistical power is a complex one. Whereas classical test theory defines reliability as the proportion of 'true' variance to total variance (the sum of true-score and error variance), power is functionally related only to total variance. Therefore, to explore direct relationships between reliability and power, one must hold either true-score variance or error variance constant while varying the other. Here, visualisations are used to illustrate the reliability-power relationship under conditions of fixed true-score variance and fixed error variance. These visualisations raise conceptual distinctions between fixing true-score variance and fixing error variance. Namely, when true-score variance is fixed, low reliability (and low power) suggests that a true effect may be hidden by error; when error variance is fixed, by contrast, high reliability (and low power) may simply indicate a very small effect. I raise several observations that I hope will be useful in considering the utility of measurement reliability and its relationship to effect sizes and statistical power.
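One way to see the fixed-true-score-variance case numerically is sketched below (my own illustration, not the paper's code): true-score variance is held at 1, error variance is varied, and two-sample t-test power is approximated for a fixed raw mean difference. As error variance grows, both reliability and power fall.

```python
import math
from scipy import stats

# Illustration: reliability and power when true-score variance is fixed
# and error variance varies. All values are hypothetical.
true_var = 1.0    # fixed true-score variance
raw_diff = 0.5    # fixed raw mean difference between two groups
n = 50            # per-group sample size

for error_var in (0.25, 1.0, 4.0):
    total_var = true_var + error_var
    reliability = true_var / total_var        # classical test theory definition
    d = raw_diff / math.sqrt(total_var)       # standardized effect shrinks as error grows
    # Approximate two-sample t-test power via the noncentral t distribution.
    ncp = d * math.sqrt(n / 2)
    df = 2 * n - 2
    t_crit = stats.t.ppf(0.975, df)
    power = 1 - stats.nct.cdf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)
    print(f"error_var={error_var}: reliability={reliability:.2f}, power={power:.2f}")
```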


2020 ◽  
Author(s):  
Pingguang Lei ◽  
Zheng Yang ◽  
Wei Li ◽  
Jingqing Ou ◽  
Yingli Cun ◽  
...  

Background: Quality of life (QOL) has become a worldwide concern in clinical oncology, and the disease-specific instrument FACT-Hep (Functional Assessment of Cancer Therapy - Hepatobiliary questionnaire) is widely used in English-speaking countries. However, specific instruments for hepatocellular carcinoma patients in China are scarce, and no formal validation of the Simplified Chinese version of the FACT-Hep had been carried out. This study aimed to validate the Chinese FACT-Hep using a combination of classical test theory and generalizability theory. Methods: The Chinese version of the FACT-Hep and the QLICP-LI were administered three times, before and after treatment, to a sample of 114 in-patients with hepatocellular carcinoma. The scale was evaluated with validity and reliability indicators including Cronbach's α, Pearson's r, the intra-class correlation coefficient (ICC), and the standardized response mean (SRM). Generalizability theory (G theory) was also applied to assess the dependability of the measurements and to estimate multiple sources of variance. Results: The internal-consistency Cronbach's α coefficients were greater than 0.70 for all domains, and the test-retest reliability coefficients for all domains and the overall scale were greater than 0.80, ranging from 0.81 to 0.96 (with the exception of Emotional Well-being, 0.74). G-coefficients and Φ-coefficients, together with the estimated variance components, further confirmed the reliability of the scale. The PWB and FWB domains and the overall scale showed significant changes after treatment, with SRMs ranging from 0.40 to 0.69. Conclusions: The Chinese version of the FACT-Hep has good validity, reliability, and responsiveness, and can be used to measure QOL in patients with hepatocellular carcinoma in China.
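For readers unfamiliar with the indices named above, the sketch below computes a Cronbach's α and a standardized response mean on simulated data; the item count, sample size, and score distributions are invented and bear no relation to the study's data.

```python
import numpy as np

# Illustrative sketch only: Cronbach's alpha and the standardized response
# mean (SRM) on simulated, correlated item scores.
rng = np.random.default_rng(0)
n_patients, n_items = 114, 6
common = rng.normal(0, 1, size=(n_patients, 1))                   # shared "true score"
items = common + rng.normal(0, 0.7, size=(n_patients, n_items))   # hypothetical item scores

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score).
total = items.sum(axis=1)
alpha = n_items / (n_items - 1) * (1 - items.var(axis=0, ddof=1).sum() / total.var(ddof=1))

# Standardized response mean: mean change / SD of change (responsiveness).
post = total + rng.normal(2.0, 3.0, size=n_patients)              # hypothetical post-treatment scores
change = post - total
srm = change.mean() / change.std(ddof=1)
print(f"alpha = {alpha:.2f}, SRM = {srm:.2f}")
```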

