Comparison of different rating scales for the use in Delphi studies: different scales lead to different consensus and show different test-retest reliability

2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Toni Lange ◽  
Christian Kopkow ◽  
Jörg Lützner ◽  
Klaus-Peter Günther ◽  
Sascha Gravius ◽  
...


1983 ◽  
Vol 10 (1) ◽  
pp. 13-20 ◽  
Author(s):  
S. J. E. Lindsay ◽  
J. F. W. Hodgkins

Recognition by a parent or child of an occlusal abnormality is one of the many factors that may influence a desire for orthodontic treatment. Non-orthodontists may not estimate the severity of malocclusion reliably and may apply different criteria from those used by orthodontists. The present study therefore examined the reliability of parents' and children's ratings of the children's own malocclusions with rating scales under two anchoring conditions, and tested the discrepancy between their estimates and those of a panel of orthodontists. The children's and parents' assessments had limited test-retest reliability, but rather than guessing at random about the severity of the malocclusions, they consistently gave low estimates. These effects were not influenced by the additional anchoring stimuli.


2020 ◽  
pp. 073428292097071
Author(s):  
Michal Jabůrek ◽  
Adam Ťápal ◽  
Šárka Portešová ◽  
Steven I. Pfeiffer

The factor structure, concurrent validity, and test–retest reliability of the Czech translation of the Gifted Rating Scales-School Form [GRS-S; Pfeiffer, S. I., & Jarosewich, T. (2003). GRS (Gifted Rating Scales) manual. Pearson] were evaluated. Ten alternative models were tested, four of which exhibited acceptable fit and interpretability. The factor structure was comparable for parent (n = 277) and teacher (n = 137) raters. High correlations between the factors suggest that raters might be subject to a halo effect. Ratings made by teachers showed a closer relationship with criterion measures (WJ IE II COG, CFT 20-R, and TIM3–5) than ratings made by parents. Test–retest reliability of teacher ratings (median retest interval of 93 days) was quite high for all GRS-S subscales (r = .84–.87).


2012 ◽  
Vol 27 (5) ◽  
pp. 321-328 ◽  
Author(s):  
H. Christiansen ◽  
B. Kis ◽  
O. Hirsch ◽  
S. Matthies ◽  
J. Hebebrand ◽  
...  

Abstract
Background: The German version of the Conners Adult ADHD Rating Scales (CAARS) has shown very high model fit in confirmatory factor analyses, with the established factors inattention/memory problems, hyperactivity/restlessness, impulsivity/emotional lability, and problems with self-concept, in both large healthy control and ADHD patient samples. This study presents data on the psychometric properties of the German CAARS self-report (CAARS-S) and observer-report (CAARS-O) questionnaires.
Methods: The CAARS-S/O and questions on sociodemographic variables were completed by 466 patients with ADHD and 847 healthy control subjects who had already participated in two prior studies; a total of 896 observer data sets were available. Cronbach's alpha was calculated to obtain internal reliability coefficients. Pearson correlations were used to assess test-retest reliability and concurrent, criterion, and discriminant validity. Receiver operating characteristic (ROC) analyses were used to establish sensitivity and specificity for all subscales.
Results: Coefficient alphas ranged from .74 to .95. Test-retest reliability ranged from .85 to .92 for the CAARS-S and from .65 to .85 for the CAARS-O. All CAARS subscales except problems with self-concept correlated significantly with the Barratt Impulsiveness Scale (BIS), but not with the Wender Utah Rating Scale (WURS). Criterion validity was established with ADHD subtype and diagnosis based on DSM-IV criteria. Sensitivity and specificity were high for all four subscales.
Conclusion: The reported results confirm our previous study and show that the German CAARS-S/O represent a reliable and cross-culturally valid measure of current ADHD symptoms in adults.
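The two reliability statistics named in this abstract, internal consistency (Cronbach's alpha) and test-retest correlation (Pearson r), can be computed directly. A minimal sketch in Python with invented scores, not the study's data:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_subjects, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Synthetic 4-item scale answered by 6 subjects (illustrative values only).
scores = np.array([
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [4, 4, 5, 4],
    [2, 2, 2, 3],
    [5, 5, 4, 5],
    [1, 1, 2, 1],
])
alpha = cronbach_alpha(scores)

# Test-retest reliability: Pearson r between total scores at two time points.
test = scores.sum(axis=1)
retest = test + np.array([0, 1, -1, 0, 1, 0])  # hypothetical retest drift
r_tt = np.corrcoef(test, retest)[0, 1]
```

With items that all track the same construct, as here, alpha approaches 1; the same `np.corrcoef` call covers concurrent and discriminant validity correlations between scales.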


Author(s):  
Z Paul Lorenc ◽  
Derek Jones ◽  
Jeongyun Kim ◽  
Hee Min Gwak ◽  
Samixa Batham ◽  
...  

Abstract
Background: Growing demand for minimally invasive aesthetic procedures to correct age-related facial changes and optimize facial proportions has been met with innovation, but has created an unmet need for objective assessment tools to evaluate results empirically.
Objectives: The purpose of this study was to establish the intra- and inter-rater reliability of ordinal, photonumeric, 4- or 5-point rating scales for clinical assessment of facial aesthetics.
Methods: Board-certified plastic surgeons and dermatologists (3 raters) performed live validation of jawline contour, temple volume, chin retrusion, nasolabial folds, vertical perioral lip lines, midface volume loss, lip fullness, and crow’s feet (dynamic and at rest) rating scales over 2 rounds, 2 weeks apart. Subjects selected for live validation represented the full range of scores and included 54-83 subjects for each scale. Test-retest reliability was quantified through intra- and inter-rater reliability, determined from the mean weighted kappa and Round 2 intraclass correlation coefficients (ICCs), respectively. The clinical significance of a one-grade difference was assessed through rater comparison of 31 pairs of side-by-side photographs of subjects with the same grade or a different grade on the developed scales.
Results: The study demonstrated substantial to near-perfect intra-rater and inter-rater reliability of all scales when utilized by trained raters to assess a diverse group of live subjects. Furthermore, the clinical significance of a 1-point difference on all the developed scales was established.
Conclusions: The high test-retest reliability and intuitive layout of these scales provide an objective approach with standardized ratings for clinical assessment of various facial features.
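The weighted kappa used here for intra-rater agreement on ordinal scales can be sketched as follows; the round-1/round-2 grades below are invented for illustration, not the study's data:

```python
import numpy as np

def weighted_kappa(r1, r2, k, weights="quadratic"):
    """Weighted Cohen's kappa for two ratings of the same subjects
    on a k-point ordinal scale (ratings coded 0 .. k-1)."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    # Observed cross-classification of the two ratings, as proportions.
    obs = np.zeros((k, k))
    for a, b in zip(r1, r2):
        obs[a, b] += 1
    obs /= len(r1)
    # Expected co-ratings under independence of the two marginals.
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))
    # Disagreement weights: quadratic penalizes distant grades more heavily.
    i, j = np.indices((k, k))
    if weights == "quadratic":
        w = ((i - j) / (k - 1)) ** 2
    else:  # "linear"
        w = np.abs(i - j) / (k - 1)
    return 1 - (w * obs).sum() / (w * exp).sum()

# Hypothetical round-1 vs. round-2 grades from one rater on a 5-point scale.
round1 = [0, 1, 2, 3, 4, 2, 1, 3]
round2 = [0, 1, 2, 4, 4, 2, 1, 3]
kappa = weighted_kappa(round1, round2, k=5)
```

Perfect agreement yields kappa = 1; quadratic weighting is the usual choice for photonumeric scales because a two-grade discrepancy is penalized four times as much as a one-grade discrepancy.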


2020 ◽  
Vol 77 (11) ◽  
pp. 1119-1125
Author(s):  
Petar Vojvodic ◽  
Ana Andonov ◽  
Dejan Stevanovic ◽  
Ivana Perunicic-Mladenovic ◽  
Goran Mihajlovic ◽  
...  

Background/Aim. Various rating scales for depression are available, but the Montgomery-Asberg Depression Rating Scale (MADRS) is one of the most frequently used. The aim of this study was to analyze the measurement properties of the Serbian version of the MADRS for quantifying depression severity in the clinical setting.
Methods. Two studies were conducted in order to validate the MADRS. The first included sixty-four adult patients with major depressive disorder (MDD) and one test-retest situation; the second included 19 participants (also with MDD) with six test-retest situations. Psychometric evaluation included descriptive analysis, internal consistency, test-retest reliability, and concurrent validity (correlations with the Hamilton Depression Rating Scale-17; HAMD-17).
Results. Test-retest reliability was 0.93 for the MADRS in total, and 0.95 across the six test-retest situations. The MADRS had a one-factor structure, with explained variance of 66.26% at the first testing and 61.29% at retest. There were statistically significant correlations between the MADRS and the HAMD-17 (r = 0.96 at test and r = 0.94 at retest). A strong correlation was also shown between all items on the MADRS and the instrument in total (r = 0.89).
Conclusion. The MADRS showed good psychometric results, and it could be used in everyday clinical practice for discriminating MDD.


1976 ◽  
Vol 70 (6) ◽  
pp. 251-256
Author(s):  
Linda J. Ross ◽  
Patricia A. Gallagher

This study examined how well Devereux behavior rating scales perform as sensitive and reliable instruments for delineating inappropriate behavior among visually impaired children at a residential school. Three Devereux scales were administered: the Child Behavior Rating Scale, the Adolescent Behavior Rating Scale, and the Elementary School Behavior Rating Scale. Houseparents and teachers rated students on the scales, from which obviously inappropriate items had been deleted. One week later, a random sample of students was re-evaluated as a measure of test-retest reliability. The results suggest that the scales could be viable evaluation instruments, though the Child Behavior Rating Scale showed unacceptable test-retest reliability.


Author(s):  
Matthew L. Hall ◽  
Stephanie De Anda

Purpose: The purposes of this study were (a) to introduce “language access profiles” as a viable alternative construct to “communication mode” for describing experience with language input during early childhood for deaf and hard-of-hearing (DHH) children; (b) to describe the development of a new tool for measuring DHH children's language access profiles during infancy and toddlerhood; and (c) to evaluate the novelty, reliability, and validity of this tool.
Method: We adapted an existing retrospective parent report measure of early language experience (the Language Exposure Assessment Tool) to make it suitable for use with DHH populations. We administered the adapted instrument (DHH Language Exposure Assessment Tool [D-LEAT]) to the caregivers of 105 DHH children aged 12 years and younger. To measure convergent validity, we also administered another novel instrument: the Language Access Profile Tool. To measure test–retest reliability, half of the participants were interviewed again after 1 month. We identified groups of children with similar language access profiles by using hierarchical cluster analysis.
Results: The D-LEAT revealed DHH children's diverse experiences with access to language during infancy and toddlerhood. Cluster analysis groupings were markedly different from those derived from more traditional grouping rules (e.g., communication modes). Test–retest reliability was good, especially for the same-interviewer condition. Content, convergent, and face validity were strong.
Conclusions: To optimize DHH children's developmental potential, stakeholders who work at the individual and population levels would benefit from replacing communication mode with language access profiles. The D-LEAT is the first tool that aims to measure this novel construct. Despite limitations that future work aims to address, the present results demonstrate that the D-LEAT represents progress over the status quo.
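The hierarchical cluster analysis used to group children with similar language access profiles can be sketched as follows. The feature columns and values are invented for illustration (they are not the D-LEAT's actual variables):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical language access profiles: one row per child, columns are
# illustrative features (e.g., proportion of input accessible in a spoken
# vs. a signed language).
profiles = np.array([
    [0.90, 0.10],
    [0.85, 0.20],
    [0.80, 0.15],   # mostly spoken-language access
    [0.10, 0.90],
    [0.20, 0.95],
    [0.15, 0.85],   # mostly signed-language access
])

# Agglomerative (hierarchical) clustering with Ward linkage, then cut the
# dendrogram into two groups.
Z = linkage(profiles, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
```

Unlike predefined communication-mode categories, the number and composition of groups here emerge from the profile data; the cut height (or `maxclust` count) is the analyst's choice.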


1982 ◽  
Vol 25 (4) ◽  
pp. 521-527 ◽  
Author(s):  
David C. Shepherd

In 1977, Shepherd and colleagues reported significant correlations (–.90, –.91) between speechreading scores and the latency of a selected negative peak (the VN130 measure) on the averaged visual electroencephalic waveform. The primary purpose of the current study was to examine the stability, or repeatability, of this relation between these cognitive and neurophysiologic measures over a period of several months, and thus support its test-retest reliability. Repeated speechreading word and sentence scores were gathered during three test-retest sessions from each of 20 normal-hearing adults. An average of 56 days elapsed from the end of one speechreading session to the beginning of the next. During each of four other test-retest sessions, averaged visual electroencephalic responses (AVERs) were evoked from each subject. An average of 49 days intervened between AVER sessions. Product-moment correlations computed among repeated word scores and VN130 measures ranged from –.61 to –.89. Based on these findings, it was concluded that the VN130 measure of visual neural firing time is a reliable correlate of speechreading in normal-hearing adults.


2000 ◽  
Vol 16 (1) ◽  
pp. 53-58 ◽  
Author(s):  
Hans Ottosson ◽  
Martin Grann ◽  
Gunnar Kullgren

Summary: Short-term stability or test-retest reliability of self-reported personality traits is likely to be biased if the respondent is affected by a depressive or anxiety state. However, in some studies, DSM-oriented self-reported instruments have proved to be reasonably stable in the short term, regardless of co-occurring depressive or anxiety disorders. In the present study, we examined the short-term test-retest reliability of a new self-report questionnaire for personality disorder diagnosis (DIP-Q) on a clinical sample of 30 individuals, having either a depressive, an anxiety, or no axis-I disorder. Test-retest scorings from subjects with depressive disorders were mostly unstable, with a significant change in fulfilled criteria between entry and retest for three out of ten personality disorders: borderline, avoidant and obsessive-compulsive personality disorder. Scorings from subjects with anxiety disorders were unstable only for cluster C and dependent personality disorder items. In the absence of co-morbid depressive or anxiety disorders, mean dimensional scores of DIP-Q showed no significant differences between entry and retest. Overall, the effect from state on trait scorings was moderate, and it is concluded that test-retest reliability for DIP-Q is acceptable.

