Methodology of a multi-site reliability study

2000 ◽  
Vol 177 (S39) ◽  
pp. s15-s20 ◽  
Author(s):  
Aart H. Schene ◽  
Maarten Koeter ◽  
Bob van Wijngaarden ◽  
Helle Charlotte Knudsen ◽  
Morven Leese ◽  
...  

Background The European Psychiatric Services: Inputs Linked to Outcome Domains and Needs (EPSILON) Study aims to produce standardised versions in five European languages of instruments measuring needs for care, family or caregiving burden, satisfaction with services, quality of life, and sociodemographic and service receipt. Aims To describe background, rationale and design of the reliability study, focusing on reliable instruments, reliability testing theory, a general reliability testing procedure and sample size requirements. Method A strict protocol was developed, consisting of definitions of the specific reliability measures used, the statistical methods used to assess these reliability coefficients, the development of statistical programmes to make inter-centre reliability comparisons, criteria for good reliability, and a general format for the reliability analysis. Conclusion The reliability analyses are based on classical test theory. Reliability measures used are Cronbach's α, Cohen's κ and the intraclass correlation coefficient. Intersite comparisons were extended with a comparison of the standard error of measurement. Criteria for good reliability may need to be adapted for this type of study. The consequences of low reliability, and reliability differing between sites, must be considered before pooling data.
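As an illustration of two of the quantities named in this abstract, the following minimal Python sketch computes Cronbach's α from item scores and the standard error of measurement as SD × √(1 − reliability), the usual classical test theory definitions. The function names and the toy data are hypothetical and are not taken from the EPSILON protocol.

```python
from math import sqrt
from statistics import stdev, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item,
    all lists covering the same respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def standard_error_of_measurement(totals, reliability):
    """SEM = SD of observed scores * sqrt(1 - reliability)."""
    return stdev(totals) * sqrt(1 - reliability)


# Hypothetical toy data: 3 items answered by 4 respondents.
toy_items = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]]
toy_totals = [sum(scores) for scores in zip(*toy_items)]
alpha = cronbach_alpha(toy_items)
sem = standard_error_of_measurement(toy_totals, alpha)
```

Because the SEM is expressed in the units of the scale itself, it can be compared across sites even when the sites differ in score variance, which is one reason to report it alongside a reliability coefficient.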

Author(s):  
Lusine Vaganian ◽  
Sonja Bussmann ◽  
Maren Boecker ◽  
Michael Kusch ◽  
Hildegard Labouvie ◽  
...  

Abstract Purpose The World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) assesses disability in individuals irrespective of their health condition. Previous studies validated the usefulness of the WHODAS 2.0 using classical test theory. This study is the first to investigate the psychometric properties of the 12-item WHODAS 2.0 in patients with cancer using item analysis according to the Rasch model. Methods In total, 350 cancer patients participated in the study. Rasch analysis of the 12-item version of the WHODAS 2.0 was conducted and included testing unidimensionality, local independence, and testing for differential item functioning (DIF) with regard to age, gender, type of cancer, presence of metastases, psycho-oncological support, and duration of disease. Results After accounting for local dependence, which was mainly found across items of the same WHODAS domain, satisfactory overall fit to the Rasch model was established (χ2 = 36.14, p = 0.07) with good reliability (PSI = 0.82) and unidimensionality of the scale. DIF was found for gender (testlet ‘Life activities’) and age (testlet ‘Getting around/Self-care’), but the size of DIF was not substantial. Conclusion Overall, the analysis results according to the Rasch model support the use of the WHODAS 2.0 12-item version as a measure of disability in cancer patients.


2020 ◽  
Vol 8 (4_suppl3) ◽  
pp. 2325967120S0018
Author(s):  
Andrea Stracciolini ◽  
Laura Boucher ◽  
Sarah Jackson ◽  
Naomi Brown ◽  
Danielle Magrini ◽  
...  

Background The medial patellofemoral ligament (MPFL) is an important soft tissue constraint that helps prevent patellar dislocations in young athletes. The anatomy of the MPFL has been investigated in cadaveric studies and magnetic resonance studies. No studies to date have provided anatomical data of the MPFL on ultrasonography. Purpose To investigate the feasibility of musculoskeletal ultrasonography for the evaluation of the MPFL, and to determine interrater and intrarater reliability for MPFL ultrasound measures. Methods Ten control participants (20 knees) aged 20 to 50 years underwent ultrasonography performed by 3 researchers (musculoskeletal ultrasound radiologist, athletic trainer/biomechanist, primary care sports medicine physician) from 3 different institutions for interrater reliability testing. Intrarater reliability testing was performed at 2 separate institutions by 4 physicians, each performing the same knee ultrasound protocol on 20 knees in 10 study participants 2 to 3 weeks apart. In total, 180 images were created for interrater reliability, and 480 images for intrarater reliability. Examinations were performed with linear high-frequency transducers (10-18 MHz) with the participant in the supine position and the extremity flexed at 45°. Measurements included ligament length (long axis to ligament) from the patellar to the femoral attachment sites, ligament width (short axis to ligament) at the patellar attachment, and ligament thickness (long axis to ligament) midway between the patella and femur. Mean and SD were calculated for all measurements. Intraclass correlation coefficient (ICC) analysis was used to assess intrarater and interrater reliability. ICC values < 0.40 indicated poor reliability, whereas those between 0.40 and 0.75 indicated fair to good reliability, and those > 0.75 indicated excellent reliability. Results The mean US value for MPFL length was 44.83 mm (SD 6.68), mean thickness 2.66 mm (SD 0.85), and mean width 11.76 mm (SD 2.99).
The overall ICC values for interrater reliability testing indicated fair to good reliability for length measures (0.7) and poor reliability for thickness (–0.1) and width (0.3; Table 1.1). Overall ICC values for intrarater reliability indicated fair to good reliability for length (0.5), excellent for thickness (0.9), and poor reliability for width (–0.3; Table 1.2). Conclusions Musculoskeletal ultrasonography is a feasible and reliable office-based method of measuring MPFL length and thickness. These quantitative measures set the groundwork for establishing normative anatomical measures of the MPFL in athletes and establish a protocol for testing and measuring the MPFL using musculoskeletal ultrasonography. [Table: see text][Table: see text]
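The interpretation bands stated in this abstract (ICC < 0.40 poor, 0.40 to 0.75 fair to good, > 0.75 excellent) can be written as a small lookup. The Python sketch below applies them to the overall ICC values reported in the abstract; the function and dictionary names are hypothetical, and treating exactly 0.40 and 0.75 as "fair to good" is an assumption about how the boundaries are handled.

```python
def classify_icc(icc):
    """Map an ICC value to the reliability band used in the abstract."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:  # assumption: both boundary values fall in the middle band
        return "fair to good"
    return "excellent"


# Overall ICC values as reported in the abstract.
reported_icc = {
    "interrater": {"length": 0.7, "thickness": -0.1, "width": 0.3},
    "intrarater": {"length": 0.5, "thickness": 0.9, "width": -0.3},
}

bands = {
    design: {measure: classify_icc(value) for measure, value in values.items()}
    for design, values in reported_icc.items()
}
```

The resulting bands match the wording of the Results: only intrarater thickness reaches the excellent band, and width is poor under both designs.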


2007 ◽  
Vol 23 (1) ◽  
pp. 39-46 ◽  
Author(s):  
Víctor J. Rubio ◽  
David Aguado ◽  
Pedro M. Hontangas ◽  
José M. Hernández

Item response theory (IRT) provides valuable methods for the analysis of the psychometric properties of a psychological measure. However, IRT has been mainly used for assessing achievements and ability rather than personality factors. This paper presents an application of IRT to a personality measure. Thus, the psychometric properties of a new emotional adjustment measure that consists of 28 six-point graded response items are shown. Classical test theory (CTT) analyses as well as IRT analyses are carried out. Samejima's (1969) graded-response model has been used for estimating item parameters. Results show that the bank of items fulfills model assumptions and fits the data reasonably well, demonstrating the suitability of the IRT models for the description and use of data originating from personality measures. In this sense, the model fulfills the expectations that IRT has undoubted advantages: (1) the invariance of the estimated parameters, (2) the treatment given to the standard error of measurement, and (3) the possibilities offered for the construction of computerized adaptive tests (CAT). The bank of items shows good reliability. It also shows convergent validity compared to the Eysenck Personality Inventory (EPQ-A; Eysenck & Eysenck, 1975) and the Big Five Questionnaire (BFQ; Caprara, Barbaranelli, & Borgogni, 1993).
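To make the graded-response model mentioned in this abstract concrete, the sketch below computes category probabilities for a single item as differences of adjacent cumulative curves. It assumes the logistic form of the model; the function name, the discrimination value and the threshold values are hypothetical and are not the paper's estimates.

```python
from math import exp


def graded_response_probs(theta, a, thresholds):
    """Category probabilities for one item under a logistic graded response model.

    theta: latent trait level; a: item discrimination;
    thresholds: increasing category boundaries b_1 < ... < b_{m-1}.
    Returns m probabilities, one per response category.
    """
    def logistic(x):
        return 1.0 / (1.0 + exp(-x))

    # Cumulative curves P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical six-category item: five thresholds, evaluated at theta = 0.
probs = graded_response_probs(0.0, 1.2, [-2.0, -1.0, 0.0, 1.0, 2.0])
```

With thresholds placed symmetrically around the evaluated trait level, the category probabilities are symmetric and sum to one, which is a quick sanity check on any implementation.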


2021 ◽  
Author(s):  
Lusine Vaganian ◽  
Sonja Bussmann ◽  
Maren Boecker ◽  
Michael Kusch ◽  
Hildegard Labouvie ◽  
...  

Abstract Purpose: The World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) assesses disability in individuals irrespective of their health condition. Previous studies validated the usefulness of the WHODAS 2.0 using classical test theory (CTT). This study is the first to investigate the psychometric properties of the 12-item WHODAS 2.0 in patients with cancer using item response theory (IRT), i.e., item analysis according to the Rasch model. Methods: In total, 350 cancer patients participated in the study. Rasch analysis of the 12-item version of the WHODAS 2.0 included testing unidimensionality, local independence, and testing for differential item functioning (DIF) with regard to age, gender, type of cancer, presence of metastases, psycho-oncological support, and duration of disease. Results: After accounting for local dependence, which was mainly found across items of the same WHODAS domain, satisfactory overall fit to the Rasch model was established (χ2 = 36.14, p = 0.07) with good reliability (PSI = 0.82) and unidimensionality of the scale. DIF was found for gender (testlet ‘Life activities’) and age (testlet ‘Getting around/Self-care’), but the size of DIF was not substantial. Conclusion: Overall, the analysis results according to the Rasch model support the use of the WHODAS 2.0 12-item version as a measure of disability in cancer patients.


1969 ◽  
Vol 25 (1) ◽  
pp. 175-186 ◽  
Author(s):  
Donald W. Zimmerman

A model of variability in measurement which does not employ the concepts of “true score” and “error score” is presented. Reference to an observed score random variable, X, together with the usual axioms of probability, is shown to be a satisfactory basis for derivation of results of the classical test theory which relate observable quantities. In addition, reliability formulas such as the KR 20 and KR 21 are obtained by construction of the observed score random variable over a sample space of outcomes of a testing procedure and assignment of probabilities to outcomes. The approach is consistent with trends in psychological theory toward objectively defined constructs and avoids redundancy in derivations, as well as connotations which arise from reference to “true values” and “errors.” The present model is shown to be consistent with a relativistic, as opposed to an absolutistic, conception of measurement.
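For readers unfamiliar with the KR 20 and KR 21 formulas named in this abstract, the sketch below computes both from a matrix of dichotomous (0/1) item responses using their standard textbook forms. It does not reproduce the paper's derivation; the function names and toy data are hypothetical, and the use of the population variance of total scores is an assumption.

```python
from statistics import mean, pvariance


def kr20(responses):
    """KR-20 for 0/1 data; `responses` holds one list of item scores per respondent."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    # Proportion answering each item correctly.
    p = [mean(row[i] for row in responses) for i in range(k)]
    pq_sum = sum(p_i * (1 - p_i) for p_i in p)
    return k / (k - 1) * (1 - pq_sum / pvariance(totals))


def kr21(responses):
    """KR-21: simplification of KR-20 that treats all items as equally difficult."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    m = mean(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * pvariance(totals)))


# Hypothetical toy data: 4 respondents, 3 items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
r20, r21 = kr20(toy), kr21(toy)
```

KR 21 needs only the mean and variance of total scores, at the cost of never exceeding KR 20 when item difficulties differ, as they do in the toy data.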


2014 ◽  
Vol 35 (4) ◽  
pp. 201-211 ◽  
Author(s):  
André Beauducel ◽  
Anja Leue

It is shown that a minimal assumption should be added to the assumptions of Classical Test Theory (CTT) in order to have positive inter-item correlations, which are regarded as a basis for the aggregation of items. Moreover, it is shown that the assumption of zero correlations between the error score estimates is substantially violated in the population of individuals when the number of items is small. Instead, a negative correlation between error score estimates occurs. The reason for the negative correlation is that the error score estimates for different items of a scale are based on insufficient true score estimates when the number of items is small. A test of the assumption of uncorrelated error score estimates by means of structural equation modeling (SEM) is proposed that takes this effect into account. The SEM-based procedure is demonstrated by means of empirical examples based on the Edinburgh Handedness Inventory and the Eysenck Personality Questionnaire-Revised.


2019 ◽  
Vol 35 (1) ◽  
pp. 55-62 ◽  
Author(s):  
Noboru Iwata ◽  
Akizumi Tsutsumi ◽  
Takafumi Wakita ◽  
Ryuichi Kumagai ◽  
Hiroyuki Noguchi ◽  
...  

Abstract. To investigate the effect of response alternatives and scoring procedures on the measurement properties of the Center for Epidemiologic Studies Depression Scale (CES-D), which has four response alternatives, a polytomous item response theory (IRT) model was applied to the responses of 2,061 workers and university students (1,640 males, 421 females). Test information functions derived from the polytomous IRT analyses of the CES-D data under various scoring procedures indicated that: (1) the CES-D with its standard (0-1-2-3) scoring procedure should be useful for screening to detect subjects at high risk of depression if the θ point showing the highest information corresponds to the cut-off point, because of its extremely high information; (2) the CES-D with the 0-1-1-2 scoring procedure could cover a wider range of depressive severity, suggesting that this scoring procedure might be useful in cases where more exhaustive discrimination in symptomatology is of interest; and (3) the revised version of the CES-D, in which the original positive items are replaced with negatively revised items, outperformed the original version. These findings have never been demonstrated by classical test theory analyses, and thus the utility of this kind of psychometric testing warrants further investigation for the standard measures of psychological assessment.


2009 ◽  
Vol 31 (1) ◽  
pp. 81
Author(s):  
Takeaki Kumazawa

Classical test theory (CTT) has been widely used to estimate the reliability of measurements. Generalizability theory (G theory), an extension of CTT, is a powerful statistical procedure, particularly useful for performance testing, because it enables estimating the proportions of person variance and of multiple sources of error variance. This study focuses on a generalizability study (G study) conducted to investigate such variance components for a paper-and-pencil multiple-choice vocabulary test used as a diagnostic pretest. Further, a decision study (D study) was conducted to compute the generalizability coefficient (G coefficient) for absolute decisions. The results of the G and D studies indicated that 46% of the total variance was due to the items effect; further, the G coefficient for absolute decisions was low.
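The G study and D study steps summarized in this abstract can be sketched for the simplest case. The Python example below assumes a fully crossed persons × items design with one score per cell, which the abstract does not state explicitly; it estimates variance components from ANOVA mean squares and then computes a coefficient for absolute decisions (often written Φ). The function names and toy scores are hypothetical.

```python
def variance_components(scores):
    """G study for a crossed p x i design; scores[p][i] is person p's score on item i."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[i] for row in scores) / n_p for i in range(n_i)]

    ms_p = n_i * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_i = n_p * sum((m - grand) ** 2 for m in i_means) / (n_i - 1)
    ss_res = sum(
        (scores[p][i] - p_means[p] - i_means[i] + grand) ** 2
        for p in range(n_p)
        for i in range(n_i)
    )
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Negative estimates are truncated at zero, a common convention.
    return {
        "p": max((ms_p - ms_res) / n_i, 0.0),
        "i": max((ms_i - ms_res) / n_p, 0.0),
        "pi,e": ms_res,
    }


def phi_coefficient(components, n_items):
    """D study: coefficient for absolute decisions with n_items items."""
    absolute_error = (components["i"] + components["pi,e"]) / n_items
    return components["p"] / (components["p"] + absolute_error)


# Hypothetical toy data: 3 persons x 2 items.
vc = variance_components([[1, 3], [2, 4], [3, 5]])
item_share = vc["i"] / sum(vc.values())  # proportion of total variance due to items
phi = phi_coefficient(vc, n_items=2)
```

Because item variance enters the absolute error term, a large items effect lowers the coefficient for absolute decisions even when persons are ranked consistently, which is in line with the pattern the abstract reports.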

