Guideline for Selecting Types of Reliability and Suitable Intra-class Correlation Coefficients in Clinical Research

Author(s):  
Khalil Taherzadeh Chenani ◽  
Farzan Madadizadeh

Introduction: Reliability is an integral part of measuring the reproducibility of research information. The intra-class correlation coefficient (ICC) is one of the necessary indicators for reporting reliability, but the variety of its forms can be misleading. The main purpose of this study was to introduce the types of reliability and the appropriate ICC indices. Methods: In this tutorial article, the types of reliability and the indicators needed to report results, as well as the types of ICC and their applications, were explained in simple terms. Results: Three general types of reliability were presented: inter-rater reliability, test-retest reliability, and intra-rater reliability. Ten different types of ICC were also introduced and explained. Conclusion: Research results may be misleading if the type of reliability or the calculation criterion is chosen incorrectly. Therefore, to make study results more accurate and valuable, medical researchers must consult relevant guidelines such as this study before conducting reliability analysis.
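The abstract above does not list its ten ICC forms, so the following is only a minimal sketch of three commonly used single-measure forms, computed from a subjects-by-raters table with the usual ANOVA mean squares. The labels ICC(1,1), ICC(2,1) and ICC(3,1) follow the widely used Shrout and Fleiss naming, which is an assumption here rather than something stated in the abstract; the function name `icc_single` and the ratings matrix are purely illustrative.

```python
from statistics import mean


def icc_single(ratings):
    """Single-measure ICCs for a table with one row per subject, one column per rater."""
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares for the two-way layout without replication.
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    ms_within = (ss_total - ss_rows) / (n * (k - 1))

    return {
        # One-way random effects: raters are not modelled as a factor.
        "ICC(1,1)": (ms_rows - ms_within) / (ms_rows + (k - 1) * ms_within),
        # Two-way random effects, absolute agreement.
        "ICC(2,1)": (ms_rows - ms_error)
        / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n),
        # Two-way mixed effects, consistency.
        "ICC(3,1)": (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error),
    }


# Illustrative data: 6 subjects rated by 4 raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_single(ratings)
for name, value in icc.items():
    print(f"{name} = {value:.2f}")
```

On this table the three forms differ widely because the raters use very different average levels. Systematic rater differences lower the agreement-type coefficients but are ignored by the consistency form, which is one way an ill-chosen ICC can mislead.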

2021 ◽  
Vol 12 ◽  
Author(s):  
Wei Xia ◽  
William Ho Cheung Li ◽  
Tingna Liang ◽  
Yuanhui Luo ◽  
Laurie Long Kwan Ho ◽  
...  

Objectives: This study conducted a linguistic and psychometric evaluation of the Chinese Counseling Competencies Scale-Revised (CCS-R). Methods: The Chinese CCS-R was created from the original English version using a standard forward-backward translation process. The psychometric properties of the Chinese CCS-R were examined in a cohort of 208 counselors-in-training by two independent raters. Fifty-three counselors-in-training were asked to undergo another counseling performance evaluation for the test-retest. A confirmatory factor analysis (CFA) was conducted for the Chinese CCS-R, followed by assessments of internal consistency, test-retest reliability, inter-rater reliability, convergent validity, and concurrent validity. Results: The results of the CFA supported the factorial validity of the Chinese CCS-R, with adequate construct replicability. The scale had a McDonald's omega of 0.876, and intraclass correlation coefficients of 0.63 and 0.90 for test-retest reliability and inter-rater reliability, respectively. Significantly positive correlations were observed between the Chinese CCS-R score and scores on the performance checklist (Pearson's γ = 0.781), indicating a large convergent validity, and knowledge on drug abuse (Pearson's γ = 0.833), indicating a moderate concurrent validity. Conclusion: The results support that the Chinese CCS-R is a valid and reliable measure of counseling competencies. Practice implication: The CCS-R provides trainers with a reliable tool to evaluate counseling students' competencies and to facilitate discussions with trainees about their areas for growth.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5775 ◽  
Author(s):  
Yanxiang Yang ◽  
Moritz Schumann ◽  
Shenglong Le ◽  
Shulin Cheng

Background: Objective assessments of sedentary behavior and physical activity (PA) by using accelerometer-based wearable devices are ever expanding, given their importance in the global context of health maintenance. This study aimed to determine the reliability and validity of a new accelerometer-based analyzer (Fibion) for detecting different PAs and estimating energy expenditure (EE) during a simulated free-living day. Methods: The study consisted of two parts: a reliability (n = 18) and a validity (n = 19) test. Reliability was assessed by a 45 min protocol of repeated sitting, standing, and walking (i.e., 3 × 15 min, repeated twice), using both Fibion and ActiGraph. Validity was assessed by a 12 h continuous sequence of tasks of different types (sitting, standing, walking, and cycling) and intensities (light [LPA], moderate [MPA], and vigorous [VPA]) of PA. Two Fibion devices were worn on the thigh (FT) and in the pocket (FP), respectively, and were compared with criterion measures, such as direct observation (criterion 1) and oxygen consumption by a portable gas analyzer, K4b2 (criterion 2). Results: FT (intra-class correlation coefficients (ICCs): 0.687–0.806) provided reliability similar to that of the ActiGraph (ICCs: 0.661–0.806) for EE estimation. However, the measurement error (ME) of FT compared to the actual time records indicated an underestimation of duration by 5.1 ± 1.2%, 3.8 ± 0.3% and 14.9 ± 2.6% during sitting, walking, and standing, respectively. During the validity test, FT but not FP showed a moderate agreement but larger variance with the criteria (1 and 2) in assessing duration of sitting, long sitting, LPA, MPA, and VPA (p > 0.05, ICCs: 0.071–0.537), as well as for EE estimation of standing, LPA, MPA, and VPA (p > 0.05, ICCs: 0.673–0.894). Conclusions: FT provided similar reliability to that of the ActiGraph. However, low correlations between subsequent measurements of both devices indicated large random MEs, which were somewhat diminished during the simulated 12 h real-life test. Furthermore, FT may accurately determine the types and intensities of PA and EE during prolonged periods with substantial changes in postures, indicating that the location of the accelerometer is essential. Further study with a large cohort is needed to confirm the usability of Fibion, especially for detecting low-intensity PAs.


2020 ◽  
Author(s):  
Victoria Long ◽  
Yin Bun Cheung ◽  
Debra Qu ◽  
Katherine Lim ◽  
Guozhang Lee ◽  
...  

Context: Measurement of patient-centred outcomes enables clinicians to focus on patient and family priorities and enables quality of palliative care to be assessed. Objectives: This study aimed to evaluate the validity and reliability of the English and translated Chinese IPOS among advanced cancer patients in Singapore. Methods: IPOS was forward and backward translated from English into Chinese. Structural validity was assessed by confirmatory factor analysis; known-group validity by comparing inpatients and community patients; construct validity by correlating IPOS with Edmonton Symptom Assessment System-revised (ESAS-r) and Functional Assessment of Cancer Therapy–General (FACT-G); internal consistency by Cronbach’s alpha; inter-rater reliability between patient and staff responses; test-retest reliability of patient responses between two timepoints. Results: 111 English-responding and 109 Chinese-responding patients participated. The three-factor structure (Physical Symptoms, Emotional Symptoms and Communication and Practical Issues) was confirmed with Comparative Fit Index and Tucker-Lewis Index > 0.9 and Root Mean Square Error of Approximation < 0.08. Inpatients scored higher than outpatients as hypothesised. Construct validity (Pearson’s correlation coefficient, |r| ≥ 0.608) was shown between the related subscales of IPOS and FACT-G and ESAS-r. Internal consistency was confirmed for total and subscale scores (Cronbach’s alpha ≥ 0.84), except for the Communication and Practical Issues subscale (Cronbach’s alpha = 0.29–0.65). Inter-rater reliability (intra-class correlation coefficient [ICC] ≤ 0.43) between patient and staff responses was insufficient. Test-retest reliability was confirmed with ICC = 0.80 (English) and 0.88 (Chinese) for IPOS Total. Conclusion: IPOS in English and Chinese showed good validity, good internal consistency, and good test-retest reliability, except for the Communication and Practical Issues subscale. There was poor inter-rater reliability between patients and staff.
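Internal consistency is reported above as Cronbach's alpha. As a rough sketch of how that statistic is obtained from item-level responses, the snippet below uses the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response table are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table with one row per respondent, one column per item."""
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents answering 3 items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 5, 5],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Alpha rises when items vary together, so a subscale whose items measure loosely related things can score low even if each item is measured reliably.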


2021 ◽  
pp. jrheum.210175
Author(s):  
Ying Ying Leung ◽  
William Tillett ◽  
Pil Hojgaard ◽  
Ana-Maria Orbai ◽  
Richard Holland ◽  
...  

Objective: Because no data existed, we aimed to derive evidence to support test-retest reliability for the Health Assessment Questionnaire-Disability Index (HAQ-DI) and the Medical Outcome Survey Short-Form-36 item physical functioning domain (SF-36 PF) in psoriatic arthritis (PsA). Methods: We identified datasets that collected relevant data for test-retest reliability for HAQ-DI and SF-36 PF, and evaluated them using OMERACT Filter 2.1 methodology. We calculated intra-class correlation coefficients (ICC) as a measure of test-retest reliability. We then conducted a quality assessment and evaluated the adequacy of test-retest reliability performance. Results: Two datasets were identified for HAQ-DI and one for SF-36 PF in PsA. The quality of the datasets was good. The ICCs for HAQ-DI were excellent in both datasets: 0.94 (95% CI: 0.88 to 0.97) and 0.94 (95% CI: 0.89 to 0.97). The ICC of SF-36 PF was good (0.89, 95% CI: 0.76 to 0.95). The performance of test-retest reliability for both instruments was judged to be adequate. Conclusion: The newly derived data support good and reasonable test-retest reliability for HAQ-DI and SF-36 PF in PsA.


1998 ◽  
Vol 13 (5) ◽  
pp. 264-266 ◽  
Author(s):  
E Corruble ◽  
D Purper ◽  
C Payan ◽  
JD Guelfi

Summary: The inter-rater reliability of the French versions of the MADRS and the DRRS was studied on the basis of 58 videotape records of structured standardised interviews of depressed inpatients under antidepressant treatment. Each patient was assessed by two trained raters, from the same videotape recording. The inter-rater reliability of total scores was high with both scales (intra-class correlation coefficients: 0.86 for MADRS and 0.77 for DRRS). However, the inter-rater reliability for individual items was higher and more homogeneous for the MADRS than for the DRRS. Finally, the structured interview in French appears to be relevant for the MADRS, but it should be improved for the DRRS.


2017 ◽  
Vol 9 (1) ◽  
pp. 5-9
Author(s):  
Konstantinos Kontoangelos ◽  
Sofia Tsiori ◽  
Garyfalia Poulakou ◽  
Konstantinos Protopapas ◽  
Ioannis Katsarolis ◽  
...  

The Greek version of the Davidson Trauma Scale (DTS) was developed to respond to the needs of Greek-speaking individuals. The translated questionnaire was administered to 128 HIV outpatients (aged 37.1±9.1) and 166 control patients (aged 32.4±13.4). In addition to the Greek DTS, subjects were assessed with two other scales useful for assessing validity. In each factor analysis, two components were extracted, based on Cattell's scree test. The two-component solution accounted for 55.34% of the total variation for the frequency variables and 61.45% for the severity variables. The Cronbach's alpha coefficient and Guttman split-half coefficient of the DTS were 0.93 and 0.88, respectively. The test-retest reliability of the Greek version of the DTS proved to be satisfactory. Individual items had good intra-class correlation coefficients higher than 0.5, which means that all questions have high levels of external validity. The psychometric strength of the Greek version of this interview for post-traumatic stress disorder makes it reliable for future use, particularly for screening subjects with a possible diagnosis of post-traumatic stress disorder.


2003 ◽  
Vol 93 (8) ◽  
pp. 995-1005 ◽  
Author(s):  
M. Nita ◽  
M. A. Ellis ◽  
L. V. Madden

Six different individuals (raters) assessed the severity of Phomopsis leaf blight on strawberry leaflets in five experimental repetitions over 2 years by making a direct visual estimation of the percentage of diseased area of each leaflet or by using the Horsfall-Barratt (H-B) disease scale. Intra-rater and inter-rater reliability and accuracy were determined, and then the relationship between visually estimated severity values and actual severity values was evaluated. Agreement in estimated disease severity values between assessment times by the same raters (i.e., intra-rater reliability), and agreement in disease severity values among raters at a single assessment time (i.e., inter-rater reliability), were both high, with most correlation coefficients being greater than 0.85. The intra-class correlation for overall agreement among raters ranged from 0.80 to 0.96 for the five repetitions. Based on the concordance coefficient calculated for each rater in each repetition, agreement between estimated and actual severity (i.e., accuracy) was somewhat lower than reliability. The relationship between estimated and actual severity was linear, and there was a slight trend to overestimate disease severity. The H-B scale was not more reliable or accurate than direct estimation of severity, and the linear relationship between estimated and actual severity did not support the principles underlying the H-B scale. Both size of leaflets and number of lesions per leaflet slightly affected the error in the estimate of disease severity.


2014 ◽  
Vol 114 (1) ◽  
pp. 93-103 ◽  
Author(s):  
Tomas Larson ◽  
Eva Norén Selinus ◽  
Clara Hellner Gumpert ◽  
Thomas Nilsson ◽  
Nóra Kerekes ◽  
...  

The Autism-Tics, AD/HD, and other Comorbidities (A–TAC) inventory is used in epidemiological research to assess neurodevelopmental problems and coexisting conditions. Although the A–TAC has been applied in various populations, data on retest reliability are limited. The objective of the present study was to present additional reliability data. The A–TAC was administered by lay assessors and was completed on two occasions by parents of 400 individual twins, with an average interval of 70 days between test sessions. Intra- and inter-rater reliability were analysed with intraclass correlations and Cohen's κ. A–TAC showed excellent test-retest intraclass correlations for both autism spectrum disorder and attention deficit hyperactivity disorder (each at .84). Most modules in the A–TAC had intra- and inter-rater reliability intraclass correlation coefficients of ≥ .60. Cohen's κ indicated acceptable reliability. The current study provides statistical evidence that the A–TAC yields good test-retest reliability in a population-based cohort of children.
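The abstract above reports Cohen's κ alongside intraclass correlations. For categorical ratings, κ compares the observed proportion of agreement with the agreement expected by chance from each rater's category frequencies: κ = (p_o − p_e) / (1 − p_e). The sketch below shows the unweighted two-rater version. The function name `cohens_kappa` and the screening labels are hypothetical, and the abstract does not say whether the study used a weighted or unweighted κ.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical screening outcomes (1 = positive, 0 = negative) on two occasions.
first = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
second = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohens_kappa(first, second)
print(f"kappa = {kappa:.2f}")
```

Here the two occasions agree on 8 of 10 cases, but half of that agreement would be expected by chance, so κ is noticeably lower than the raw agreement rate.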


Author(s):  
Ghena Ismail ◽  
Jan Looman

Strong inter-rater reliability has been established for the Hare Psychopathy Checklist–Revised (PCL-R), specifically by examiners in research contexts. However, there is less support for inter-rater reliability in applied settings. This study examined archival data that included a sample of sex offenders (n = 178) who entered federal custody between 1992 and 1998. The offenders were assessed using the PCL-R on two occasions. The first assessment occurred at Millhaven Institution, the intake unit for federally incarcerated offenders in the province of Ontario. The second assessment took place upon inmates’ transfer to the Regional Treatment Center, which admits federal inmates with intense psychological and psychiatric needs. Intra-class correlation coefficients (ICCs) were calculated for item, total, factor, and facet scores. The ICC absolute agreement for the PCL-R total and factor scores from raters across both settings was slightly better than what has been previously reported by Hare. Results of this study show that the reliability of PCL-R scores in field settings can be comparable to that in research settings. The authors conclude by highlighting the importance of training, consultation, considering different scores for a given item, and following the guidelines of the manual, in addition to considering measures that enhance the neutrality and reliability of findings in the criminal justice system.

