Comparison of different rating scales for the use in Delphi studies: different scales lead to different consensus and show different test-retest reliability

2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Toni Lange ◽  
Christian Kopkow ◽  
Jörg Lützner ◽  
Klaus-Peter Günther ◽  
Sascha Gravius ◽  
...


1983 ◽  
Vol 10 (1) ◽  
pp. 13-20 ◽  
Author(s):  
S. J. E. Lindsay ◽  
J. F. W. Hodgkins

Recognition by a parent or child of an occlusal abnormality is one of the many factors that may influence a desire for orthodontic treatment. Non-orthodontists may not estimate the severity of malocclusion reliably and may apply different criteria from those used by orthodontists. The present study therefore examined the reliability of parents' and children's ratings of the children's own malocclusions with rating scales under two anchoring conditions, and tested the discrepancy between their estimates and those of a panel of orthodontists. The children's and parents' assessments had limited test-retest reliability, but rather than guessing at random about the severity of the malocclusions, they consistently gave low estimates. These effects were not influenced by the additional anchoring stimuli.


2020 ◽  
pp. 073428292097071
Author(s):  
Michal Jabůrek ◽  
Adam Ťápal ◽  
Šárka Portešová ◽  
Steven I. Pfeiffer

The factor structure, concurrent validity, and test–retest reliability of the Czech translation of the Gifted Rating Scales-School Form [GRS-S; Pfeiffer, S. I., & Jarosewich, T. (2003). GRS (Gifted Rating Scales) manual. Pearson] were evaluated. Ten alternative models were tested, four of which exhibited acceptable fit and interpretability. The factor structure was comparable for parent (n = 277) and teacher (n = 137) raters. High correlations between the factors suggest that raters might be subject to a halo effect. Ratings made by teachers showed a closer relationship with criterion measures (WJ IE II COG, CFT 20-R, and TIM3–5) than ratings made by parents. Test–retest reliability of teacher ratings (median retest interval of 93 days) was quite high for all GRS-S subscales (r = .84–.87).


2012 ◽  
Vol 27 (5) ◽  
pp. 321-328 ◽  
Author(s):  
H. Christiansen ◽  
B. Kis ◽  
O. Hirsch ◽  
S. Matthies ◽  
J. Hebebrand ◽  
...  

Abstract
Background: The German version of the Conners Adult ADHD Rating Scales (CAARS) has shown very high model fit in confirmatory factor analyses, with the established factors inattention/memory problems, hyperactivity/restlessness, impulsivity/emotional lability, and problems with self-concept, in both large healthy control and ADHD patient samples. This study presents data on the psychometric properties of the German CAARS self-report (CAARS-S) and observer-report (CAARS-O) questionnaires.
Methods: The CAARS-S/O and questions on sociodemographic variables were completed by 466 patients with ADHD and 847 healthy control subjects who had already participated in two prior studies; a total of 896 observer data sets were available. Cronbach's alpha was calculated to obtain internal reliability coefficients. Pearson correlations were used to assess test-retest reliability and concurrent, criterion, and discriminant validity. Receiver operating characteristic (ROC) analyses were used to establish sensitivity and specificity for all subscales.
Results: Coefficient alphas ranged from .74 to .95. Test-retest reliability ranged from .85 to .92 for the CAARS-S and from .65 to .85 for the CAARS-O. All CAARS subscales except problems with self-concept correlated significantly with the Barratt Impulsiveness Scale (BIS), but not with the Wender Utah Rating Scale (WURS). Criterion validity was established with ADHD subtype and diagnosis based on DSM-IV criteria. Sensitivity and specificity were high for all four subscales.
Conclusion: The reported results confirm our previous study and show that the German CAARS-S/O represent a reliable and cross-culturally valid measure of current ADHD symptoms in adults.
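The two reliability statistics named in this abstract, internal consistency (Cronbach's alpha) and test-retest correlation (Pearson r), can be computed directly. A minimal sketch in Python with invented scores, not the study's data:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_subjects, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Synthetic 4-item scale answered by 6 subjects (illustrative values only).
scores = np.array([
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [4, 4, 5, 4],
    [2, 2, 2, 3],
    [5, 5, 4, 5],
    [1, 1, 2, 1],
])
alpha = cronbach_alpha(scores)

# Test-retest reliability: Pearson r between total scores at two time points.
test = scores.sum(axis=1)
retest = test + np.array([0, 1, -1, 0, 1, 0])  # hypothetical retest drift
r_tt = np.corrcoef(test, retest)[0, 1]
```

With items that all track the same construct, as here, alpha approaches 1; the same `np.corrcoef` call covers concurrent and discriminant validity correlations between scales.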


Author(s):  
Z Paul Lorenc ◽  
Derek Jones ◽  
Jeongyun Kim ◽  
Hee Min Gwak ◽  
Samixa Batham ◽  
...  

Abstract
Background: Growing demand for minimally invasive aesthetic procedures to correct age-related facial changes and optimize facial proportions has been met with innovation, but has created an unmet need for objective assessment tools to evaluate results empirically.
Objectives: The purpose of this study was to establish the intra- and inter-rater reliability of ordinal, photonumeric, 4- or 5-point rating scales for clinical assessment of facial aesthetics.
Methods: Board-certified plastic surgeons and dermatologists (3 raters) performed live validation of jawline contour, temple volume, chin retrusion, nasolabial folds, vertical perioral lip lines, midface volume loss, lip fullness, and crow’s feet (dynamic and at rest) rating scales over 2 rounds, 2 weeks apart. Subjects selected for live validation represented the full range of scores and included 54-83 subjects for each scale. Test-retest reliability was quantified through intra- and inter-rater reliability, determined from the mean weighted kappa and Round 2 intraclass correlation coefficients (ICCs), respectively. The clinical significance of a one-grade difference was assessed through rater comparison of 31 pairs of side-by-side photographs of subjects with the same grade or a different grade on the developed scales.
Results: The study demonstrated substantial to near-perfect intra-rater and inter-rater reliability of all scales when utilized by trained raters to assess a diverse group of live subjects. Furthermore, the clinical significance of a 1-point difference on all the developed scales was established.
Conclusions: The high test-retest reliability and intuitive layout of these scales provide an objective approach with standardized ratings for clinical assessment of various facial features.
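The weighted kappa used here for intra-rater agreement on ordinal scales can be sketched as follows; the round-1/round-2 grades below are invented for illustration, not the study's data:

```python
import numpy as np

def weighted_kappa(r1, r2, k, weights="quadratic"):
    """Weighted Cohen's kappa for two ratings of the same subjects
    on a k-point ordinal scale (ratings coded 0 .. k-1)."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    # Observed cross-classification of the two ratings, as proportions.
    obs = np.zeros((k, k))
    for a, b in zip(r1, r2):
        obs[a, b] += 1
    obs /= len(r1)
    # Expected co-ratings under independence of the two marginals.
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))
    # Disagreement weights: quadratic penalizes distant grades more heavily.
    i, j = np.indices((k, k))
    if weights == "quadratic":
        w = ((i - j) / (k - 1)) ** 2
    else:  # "linear"
        w = np.abs(i - j) / (k - 1)
    return 1 - (w * obs).sum() / (w * exp).sum()

# Hypothetical round-1 vs. round-2 grades from one rater on a 5-point scale.
round1 = [0, 1, 2, 3, 4, 2, 1, 3]
round2 = [0, 1, 2, 4, 4, 2, 1, 3]
kappa = weighted_kappa(round1, round2, k=5)
```

Perfect agreement yields kappa = 1; quadratic weighting is the usual choice for photonumeric scales because a two-grade discrepancy is penalized four times as much as a one-grade discrepancy.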


2020 ◽  
Vol 77 (11) ◽  
pp. 1119-1125
Author(s):  
Petar Vojvodic ◽  
Ana Andonov ◽  
Dejan Stevanovic ◽  
Ivana Perunicic-Mladenovic ◽  
Goran Mihajlovic ◽  
...  

Background/Aim. Various rating scales for depression are available, but the Montgomery-Asberg Depression Rating Scale (MADRS) is one of the most frequently used. The aim of this study was to analyze the measurement properties of the Serbian version of the MADRS for quantifying depression severity in the clinical setting.
Methods. Two studies were conducted in order to validate the MADRS. The first included sixty-four adult patients with major depressive disorder (MDD) and one test-retest situation; the second included 19 participants (also with MDD) with six test-retest situations. Psychometric evaluation included descriptive analysis, internal consistency, test-retest reliability, and concurrent validity (correlations with the Hamilton Depression Rating Scale-17; HAMD-17).
Results. Test-retest reliability was 0.93 for the MADRS in total, and 0.95 across the six test-retest situations. The MADRS had a one-factor structure, with explained variance of 66.26% at the first testing and 61.29% at retest. There were statistically significant correlations between the MADRS and the HAMD-17 (r = 0.96 at test and r = 0.94 at retest). A strong correlation was also shown between all items on the MADRS and the instrument in total (r = 0.89).
Conclusion. The MADRS showed good psychometric results, and it could be used in everyday clinical practice for discriminating MDD.


1976 ◽  
Vol 70 (6) ◽  
pp. 251-256
Author(s):  
Linda J. Ross ◽  
Patricia A. Gallagher

This study examined how well Devereux behavior rating scales perform as sensitive and reliable instruments for delineating inappropriate behavior among visually impaired children at a residential school. Three Devereux scales were administered: the Child Behavior Rating Scale, the Adolescent Behavior Rating Scale, and the Elementary School Behavior Rating Scale. Houseparents and teachers rated students on the scales, from which obviously inappropriate items had been deleted. One week later, a random sample of students was re-evaluated as a measure of test-retest reliability. The results suggest that the scales could be viable evaluation instruments, though the Child Behavior Rating Scale showed unacceptable test-retest reliability.


Author(s):  
Matthew L. Hall ◽  
Stephanie De Anda

Purpose: The purposes of this study were (a) to introduce “language access profiles” as a viable alternative construct to “communication mode” for describing experience with language input during early childhood for deaf and hard-of-hearing (DHH) children; (b) to describe the development of a new tool for measuring DHH children's language access profiles during infancy and toddlerhood; and (c) to evaluate the novelty, reliability, and validity of this tool.
Method: We adapted an existing retrospective parent report measure of early language experience (the Language Exposure Assessment Tool) to make it suitable for use with DHH populations. We administered the adapted instrument (DHH Language Exposure Assessment Tool [D-LEAT]) to the caregivers of 105 DHH children aged 12 years and younger. To measure convergent validity, we also administered another novel instrument: the Language Access Profile Tool. To measure test–retest reliability, half of the participants were interviewed again after 1 month. We identified groups of children with similar language access profiles by using hierarchical cluster analysis.
Results: The D-LEAT revealed DHH children's diverse experiences with access to language during infancy and toddlerhood. Cluster analysis groupings were markedly different from those derived from more traditional grouping rules (e.g., communication modes). Test–retest reliability was good, especially for the same-interviewer condition. Content, convergent, and face validity were strong.
Conclusions: To optimize DHH children's developmental potential, stakeholders who work at the individual and population levels would benefit from replacing communication mode with language access profiles. The D-LEAT is the first tool that aims to measure this novel construct. Despite limitations that future work aims to address, the present results demonstrate that the D-LEAT represents progress over the status quo.
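The hierarchical cluster analysis used to group children with similar language access profiles can be sketched as follows. The feature columns and values are invented for illustration (they are not the D-LEAT's actual variables):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical language access profiles: one row per child, columns are
# illustrative features (e.g., proportion of input accessible in a spoken
# vs. a signed language).
profiles = np.array([
    [0.90, 0.10],
    [0.85, 0.20],
    [0.80, 0.15],   # mostly spoken-language access
    [0.10, 0.90],
    [0.20, 0.95],
    [0.15, 0.85],   # mostly signed-language access
])

# Agglomerative (hierarchical) clustering with Ward linkage, then cut the
# dendrogram into two groups.
Z = linkage(profiles, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
```

Unlike predefined communication-mode categories, the number and composition of groups here emerge from the profile data; the cut height (or `maxclust` count) is the analyst's choice.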


1982 ◽  
Vol 25 (4) ◽  
pp. 521-527 ◽  
Author(s):  
David C. Shepherd

In 1977, Shepherd and colleagues reported significant correlations (–.90, –.91) between speechreading scores and the latency of a selected negative peak (the VN130 measure) on the averaged visual electroencephalic waveform. The primary purpose of the current study was to examine the stability, or repeatability, of this relation between these cognitive and neurophysiologic measures over a period of several months, and thus support its test-retest reliability. Repeated speechreading word and sentence scores were gathered during three test-retest sessions from each of 20 normal-hearing adults. An average of 56 days elapsed from the end of one speechreading session to the beginning of the next. During each of four other test-retest sessions, averaged visual electroencephalic responses (AVERs) were evoked from each subject. An average of 49 days intervened between AVER sessions. Product-moment correlations computed among repeated word scores and VN130 measures ranged from –.61 to –.89. Based on these findings, it was concluded that the VN130 measure of visual neural firing time is a reliable correlate of speechreading in normal-hearing adults.


2000 ◽  
Vol 16 (1) ◽  
pp. 53-58 ◽  
Author(s):  
Hans Ottosson ◽  
Martin Grann ◽  
Gunnar Kullgren

Summary: Short-term stability or test-retest reliability of self-reported personality traits is likely to be biased if the respondent is affected by a depressive or anxiety state. However, in some studies, DSM-oriented self-reported instruments have proved to be reasonably stable in the short term, regardless of co-occurring depressive or anxiety disorders. In the present study, we examined the short-term test-retest reliability of a new self-report questionnaire for personality disorder diagnosis (DIP-Q) on a clinical sample of 30 individuals, having either a depressive, an anxiety, or no axis-I disorder. Test-retest scorings from subjects with depressive disorders were mostly unstable, with a significant change in fulfilled criteria between entry and retest for three out of ten personality disorders: borderline, avoidant and obsessive-compulsive personality disorder. Scorings from subjects with anxiety disorders were unstable only for cluster C and dependent personality disorder items. In the absence of co-morbid depressive or anxiety disorders, mean dimensional scores of DIP-Q showed no significant differences between entry and retest. Overall, the effect from state on trait scorings was moderate, and it is concluded that test-retest reliability for DIP-Q is acceptable.

