Reliability of the Tuck Jump Assessment Using Standardized Rater Training

Author(s):  
Kevin Racine ◽  
Meghan Warren ◽  
Craig Smith ◽  
Monica R. Lininger
2019 ◽  
Vol 48 (3) ◽  
pp. 350-363 ◽  
Author(s):  
EB Caron ◽  
Michela A. Muggeo ◽  
Heather R. Souer ◽  
Jeffrey E. Pella ◽  
Golda S. Ginsburg

Abstract Background: Lowering the cost of assessing clinicians’ competence could promote the scalability of evidence-based treatments such as cognitive behavioral therapy (CBT). Aims: This study examined the concordance between clinicians’, supervisors’ and independent observers’ session-specific ratings of clinician competence in school-based CBT and treatment as usual (TAU). It also investigated the association between clinician competence and supervisory session observation and rater agreement. Method: Fifty-nine school-based clinicians (90% female, 73% Caucasian) were randomly assigned to implement TAU or modular CBT for youth anxiety. Clinicians rated their confidence after each therapy session (n = 1898), and supervisors rated clinicians’ competence after each supervision session (n = 613). Independent observers rated clinicians’ competence from audio recordings (n = 395). Results: Patterns of rater discrepancies differed between the TAU and CBT groups. Correlations with independent raters were low across groups. Clinician competence and session observation were associated with higher agreement among TAU, but not CBT, supervisors and clinicians. Conclusions: These results support the gold standard practice of obtaining independent ratings of adherence and competence in implementation contexts. Further development of measures and/or rater training methods for clinicians and supervisors is needed.
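A minimal sketch of the session-level concordance analysis described above might look like the following. The column names (group, clinician_rating, observer_rating) and the choice of Pearson correlations are illustrative assumptions, not the authors' analysis code.

```python
# Hypothetical sketch: agreement between clinician self-ratings and independent
# observer ratings, computed separately for the TAU and CBT groups.
# Column names are assumptions for illustration, not the study's dataset schema.
import pandas as pd
from scipy.stats import pearsonr

def rater_agreement_by_group(sessions: pd.DataFrame) -> pd.DataFrame:
    """Return Pearson r between clinician and observer ratings for each group."""
    rows = []
    for group, sub in sessions.groupby("group"):
        r, p = pearsonr(sub["clinician_rating"], sub["observer_rating"])
        rows.append({"group": group, "n_sessions": len(sub), "pearson_r": r, "p_value": p})
    return pd.DataFrame(rows)
```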


2008 ◽  
Vol 24 (1) ◽  
pp. 74-79 ◽  
Author(s):  
David A. Cook ◽  
Denise M. Dupras ◽  
Thomas J. Beckman ◽  
Kris G. Thomas ◽  
V. Shane Pankratz

1989 ◽  
Vol 3 (4) ◽  
pp. 387-401 ◽  
Author(s):  
Douglas F. Cellar ◽  
John R. Curtis ◽  
Kim Kohlepp ◽  
Patricia Poczapski ◽  
Sameena Mohiuddin

2021 ◽  
pp. 329-332
Author(s):  
Tobias Haug ◽  
Ute Knoch ◽  
Wolfgang Mann

This chapter is a joint discussion of key issues related to the scoring of signed and spoken language assessments that were discussed in Chapters 9.1 and 9.2. One aspect of signed language assessment that has the potential to stimulate new research in spoken second language (L2) assessment is the scoring of nonverbal speaker behaviors. This aspect is rarely represented in the scoring criteria of spoken assessments and in many cases is not even available to raters during the scoring process. The authors therefore argue for a broadening of the construct of spoken language assessment to include elements of nonverbal communication in the scoring descriptors. Additionally, the chapter discusses the importance of rater training for signed language assessments, the application of Rasch analysis to investigate possible reasons for disagreement between raters, and the need for research on rating scales.
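As a rough illustration of how a Rasch-type analysis can separate rater effects from examinee ability, the sketch below implements the basic many-facet Rasch probability, in which the log-odds of success are modelled as examinee ability minus task difficulty minus rater severity. It is a simplified, hypothetical example, not the analysis used in the chapters discussed.

```python
import math

def mfrm_probability(ability: float, difficulty: float, severity: float) -> float:
    """Probability of success under a dichotomous many-facet Rasch model:
    logit(P) = ability - difficulty - severity (all on the logit scale)."""
    logit = ability - difficulty - severity
    return 1.0 / (1.0 + math.exp(-logit))

# Example: a lenient rater (severity -0.5) vs a severe rater (severity +0.5)
# scoring the same examinee (ability 1.0) on the same task (difficulty 0.0).
print(mfrm_probability(1.0, 0.0, -0.5))  # ~0.82
print(mfrm_probability(1.0, 0.0, 0.5))   # ~0.62
```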


Author(s):  
Emily Q Zhang ◽  
Vivian SY Leung ◽  
Daniel SJ Pang

Rodent grimace scales facilitate assessment of ongoing pain. Reported rater training using these scales varies considerably and may contribute to the observed variability in interrater reliability. This study evaluated the effect of training on interrater reliability with the Rat Grimace Scale (RGS). Two training sets (42 and 150 images) were prepared from acute pain models. Four trainee raters progressed through 2 rounds of training, scoring 42 images (set 1) followed by 150 images (set 2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then rescored (set 2b). Four years later, trainees rescored the 150 images (set 2c). A second group of raters (no-training group) scored the same image sets without review by the experienced rater. Inter- and intrarater reliability were evaluated using the intraclass correlation coefficient (ICC), and ICC values were compared using the Feldt test. In the trainee group, interrater reliability increased from moderate to very good between sets 1 and 2b and increased between sets 2a and 2b. The action units with the highest and lowest ICC at set 2b were orbital tightening and whiskers, respectively. In comparison with an experienced rater, the ICC for all trainees improved, ranging from 0.88 to 0.91 at set 2b. Four years later, very good interrater reliability was retained, and intrarater reliability was good or very good. The interrater reliability of the no-training group was moderate and did not improve from set 1 to set 2b. Training improved interrater reliability, with an associated narrowing of the 95% CI. In addition, training improved agreement with an experienced rater, and this performance was retained.
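The reliability computation this abstract describes (ICC estimates with confidence intervals) can be outlined with a small script. The sketch below uses the pingouin package and assumed long-format column names (image, rater, rgs_score); it illustrates the ICC step only and omits the Feldt test comparison of ICC values.

```python
# Minimal sketch of an interrater reliability check, assuming one row per
# (image, rater) pair in long format. Column names are illustrative assumptions.
import pandas as pd
import pingouin as pg

def interrater_icc(scores: pd.DataFrame) -> pd.DataFrame:
    """Compute intraclass correlation coefficients for RGS scores."""
    return pg.intraclass_corr(
        data=scores,
        targets="image",   # the stimuli being rated
        raters="rater",    # the trainee or experienced raters
        ratings="rgs_score",
    )

# interrater_icc(df) returns a table of ICC variants (ICC1, ICC2, ICC3, ...)
# with point estimates and 95% confidence intervals.
```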


2020 ◽  
Vol 46 (Supplement_1) ◽  
pp. S150-S150
Author(s):  
Barbara Echevarria ◽  
Cong Liu ◽  
Selam Negash ◽  
Mark Opler ◽  
Patricio Molero ◽  
...  

Abstract Background: The Positive and Negative Syndrome Scale (PANSS) (1) is the most widely used endpoint for measuring change in schizophrenia clinical trials. A set of flags has been developed by an ISCTM expert working group to identify potential scoring errors in PANSS assessments (2). Sponsors (the pharmaceutical industry) have taken measures to increase scoring reliability and data quality, such as the use of Independent Review (IRev). We evaluated changes in data quality when site raters are no longer recorded and monitored via IRev by comparing two studies with the same cohort of raters, one with independent review and one without. Methods: Data from PANSS assessments in two global multisite schizophrenia clinical trials were analyzed. We selected data from raters participating in both studies (which ran concurrently for a significant period of time). Raters were rigorously trained on administration and scoring conventions and certified prior to the study through demonstration of adequate interrater reliability. In addition to these steps, raters in Study A were required to audio record all PANSS assessments, with a selected subset of visits subject to IRev. PANSS assessments in Study B were neither recorded nor monitored via IRev. Data quality after study completion was examined by calculating the frequency of anomalous data patterns identified as “high” (very probable or definite error) by the ISCTM working group in both studies. Additionally, we examined the percentage of assessments with lower than expected PANSS interview duration as captured via an eCOA platform. Results: There were 9441 eCOA PANSS assessments in Study A and 6178 in Study B included in this analysis. The proportions of flags that represented highly probable/definite error differed significantly between the studies (9% vs 18% for Study A and B, respectively, p < .01). The most significant differences in ISCTM flags were related to overly consistent scoring patterns (27 or more items scored identically to the prior visit), which occurred with higher frequency in Study B. Additionally, Study B had a significantly higher frequency of assessments flagged for low interview duration (<15 minutes) (1% vs 4% for Study A and B, respectively, p < .01). Discussion: Initial rater training is necessary but not sufficient to ensure adequate data quality in schizophrenia trials. Implementation of additional in-study oversight through Independent Review or similar methods reduces the probability of data error in PANSS assessments, including the appearance of improbable rating patterns and decreased time spent interviewing study subjects. One potential limitation is that Study A is a double-blind study whereas Study B is an open-label extension of Study A.
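The two data-quality flags highlighted in the Results could be approximated with a short post-processing script over exported assessment data. The sketch below is a hypothetical illustration: the column names and data layout are assumptions, not the eCOA platform's schema, and only the "identical scoring" and "short interview" flags from the abstract are shown.

```python
# Hedged sketch of two PANSS data-quality flags: (1) 27 or more of the 30 items
# scored identically to the subject's prior visit, and (2) interview duration
# under 15 minutes. Column names ("subject", "visit", "duration_min", item
# columns P1..G16) are illustrative assumptions.
import pandas as pd

PANSS_ITEMS = (
    [f"P{i}" for i in range(1, 8)]    # 7 positive items
    + [f"N{i}" for i in range(1, 8)]  # 7 negative items
    + [f"G{i}" for i in range(1, 17)] # 16 general psychopathology items
)

def flag_assessments(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values(["subject", "visit"]).copy()
    prior = df.groupby("subject")[PANSS_ITEMS].shift(1)      # previous visit's scores
    identical = (df[PANSS_ITEMS] == prior).sum(axis=1)       # count of unchanged items
    df["flag_identical_scoring"] = identical >= 27           # overly consistent pattern
    df["flag_short_interview"] = df["duration_min"] < 15     # low interview duration
    return df
```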

