Reliability of the Tuck Jump Assessment Using Standardized Rater Training

Author(s):  
Kevin Racine ◽  
Meghan Warren ◽  
Craig Smith ◽  
Monica R. Lininger
2019 ◽  
Vol 48 (3) ◽  
pp. 350-363 ◽  
Author(s):  
EB Caron ◽  
Michela A. Muggeo ◽  
Heather R. Souer ◽  
Jeffrey E. Pella ◽  
Golda S. Ginsburg

Abstract Background: Lowering the cost of assessing clinicians’ competence could promote the scalability of evidence-based treatments such as cognitive behavioral therapy (CBT). Aims: This study examined the concordance between clinicians’, supervisors’ and independent observers’ session-specific ratings of clinician competence in school-based CBT and treatment as usual (TAU). It also investigated the association between clinician competence and supervisory session observation and rater agreement. Method: Fifty-nine school-based clinicians (90% female, 73% Caucasian) were randomly assigned to implement TAU or modular CBT for youth anxiety. Clinicians rated their confidence after each therapy session (n = 1898), and supervisors rated clinicians’ competence after each supervision session (n = 613). Independent observers rated clinicians’ competence from audio recordings (n = 395). Results: Patterns of rater discrepancies differed between the TAU and CBT groups. Correlations with independent raters were low across groups. Clinician competence and session observation were associated with higher agreement among TAU, but not CBT, supervisors and clinicians. Conclusions: These results support the gold standard practice of obtaining independent ratings of adherence and competence in implementation contexts. Further development of measures and/or rater training methods for clinicians and supervisors is needed.
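A minimal sketch of the session-level concordance analysis described above might look like the following. The column names (group, clinician_rating, observer_rating) and the choice of Pearson correlations are illustrative assumptions, not the authors' analysis code.

```python
# Hypothetical sketch: agreement between clinician self-ratings and independent
# observer ratings, computed separately for the TAU and CBT groups.
# Column names are assumptions for illustration, not the study's dataset schema.
import pandas as pd
from scipy.stats import pearsonr

def rater_agreement_by_group(sessions: pd.DataFrame) -> pd.DataFrame:
    """Return Pearson r between clinician and observer ratings for each group."""
    rows = []
    for group, sub in sessions.groupby("group"):
        r, p = pearsonr(sub["clinician_rating"], sub["observer_rating"])
        rows.append({"group": group, "n_sessions": len(sub), "pearson_r": r, "p_value": p})
    return pd.DataFrame(rows)
```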


2008 ◽  
Vol 24 (1) ◽  
pp. 74-79 ◽  
Author(s):  
David A. Cook ◽  
Denise M. Dupras ◽  
Thomas J. Beckman ◽  
Kris G. Thomas ◽  
V. Shane Pankratz

1989 ◽  
Vol 3 (4) ◽  
pp. 387-401 ◽  
Author(s):  
Douglas F. Cellar ◽  
John R. Curtis ◽  
Kim Kohlepp ◽  
Patricia Poczapski ◽  
Sameena Mohiuddin

2021 ◽  
pp. 329-332
Author(s):  
Tobias Haug ◽  
Ute Knoch ◽  
Wolfgang Mann

This chapter is a joint discussion of key issues related to the scoring of signed and spoken language assessments that were discussed in Chapters 9.1 and 9.2. One aspect of signed language assessment that has the potential to stimulate new research in spoken second language (L2) assessment is the scoring of nonverbal speaker behaviors. This aspect is rarely represented in the scoring criteria of spoken assessments and in many cases is not even available to raters during the scoring process. The authors therefore argue for a broadening of the construct of spoken language assessment to include elements of nonverbal communication in the scoring descriptors. Additionally, the chapter discusses the importance of rater training for signed language assessments, the application of Rasch analysis to investigate possible reasons for disagreement between raters, and the need for research on rating scales.
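As a rough illustration of how a Rasch-type analysis can separate rater effects from examinee ability, the sketch below implements the basic many-facet Rasch probability, in which the log-odds of success are modelled as examinee ability minus task difficulty minus rater severity. It is a simplified, hypothetical example, not the analysis used in the chapters discussed.

```python
import math

def mfrm_probability(ability: float, difficulty: float, severity: float) -> float:
    """Probability of success under a dichotomous many-facet Rasch model:
    logit(P) = ability - difficulty - severity (all on the logit scale)."""
    logit = ability - difficulty - severity
    return 1.0 / (1.0 + math.exp(-logit))

# Example: a lenient rater (severity -0.5) vs a severe rater (severity +0.5)
# scoring the same examinee (ability 1.0) on the same task (difficulty 0.0).
print(mfrm_probability(1.0, 0.0, -0.5))  # ~0.82
print(mfrm_probability(1.0, 0.0, 0.5))   # ~0.62
```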


Author(s):  
Emily Q Zhang ◽  
Vivian SY Leung ◽  
Daniel SJ Pang

Rodent grimace scales facilitate assessment of ongoing pain. Reported rater training using these scales varies considerably and may contribute to the observed variability in interrater reliability. This study evaluated the effect of training on interrater reliability with the Rat Grimace Scale (RGS). Two training sets (42 and 150 images) were prepared from acute pain models. Four trainee raters progressed through 2 rounds of training, scoring 42 images (set 1) followed by 150 images (set 2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then rescored (set 2b). Four years later, trainees rescored the 150 images (set 2c). A second group of raters (no-training group) scored the same image sets without review by the experienced rater. Inter- and intrarater reliability were evaluated using the intraclass correlation coefficient (ICC), and ICC values were compared using the Feldt test. In the trainee group, interrater reliability increased from moderate to very good between sets 1 and 2b and increased between sets 2a and 2b. The action units with the highest and lowest ICC at set 2b were orbital tightening and whiskers, respectively. In comparison with an experienced rater, the ICC for all trainees improved, ranging from 0.88 to 0.91 at set 2b. Four years later, very good interrater reliability was retained, and intrarater reliability was good or very good. The interrater reliability of the no-training group was moderate and did not improve from set 1 to set 2b. Training improved interrater reliability, with an associated narrowing of the 95% CI. In addition, training improved agreement with an experienced rater, and this performance was retained.
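The reliability computation this abstract describes (ICC estimates with confidence intervals) can be outlined with a small script. The sketch below uses the pingouin package and assumed long-format column names (image, rater, rgs_score); it illustrates the ICC step only and omits the Feldt test comparison of ICC values.

```python
# Minimal sketch of an interrater reliability check, assuming one row per
# (image, rater) pair in long format. Column names are illustrative assumptions.
import pandas as pd
import pingouin as pg

def interrater_icc(scores: pd.DataFrame) -> pd.DataFrame:
    """Compute intraclass correlation coefficients for RGS scores."""
    return pg.intraclass_corr(
        data=scores,
        targets="image",   # the stimuli being rated
        raters="rater",    # the trainee or experienced raters
        ratings="rgs_score",
    )

# interrater_icc(df) returns a table of ICC variants (ICC1, ICC2, ICC3, ...)
# with point estimates and 95% confidence intervals.
```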


2020 ◽  
Vol 46 (Supplement_1) ◽  
pp. S150-S150
Author(s):  
Barbara Echevarria ◽  
Cong Liu ◽  
Selam Negash ◽  
Mark Opler ◽  
Patricio Molero ◽  
...  

Abstract Background: The Positive and Negative Syndrome Scale (PANSS) (1) is the most widely used endpoint for measuring change in schizophrenia clinical trials. A set of flags has been developed by an ISCTM expert working group to identify potential scoring errors in PANSS assessments (2). Sponsors (the pharmaceutical industry) have taken measures to increase scoring reliability and data quality, such as the use of Independent Review (IRev). We evaluated changes in data quality when site raters are no longer recorded and monitored via IRev by comparing two studies with the same cohort of raters, one with independent review and one without. Methods: Data from PANSS assessments in two global multisite schizophrenia clinical trials were analyzed. We selected data from raters participating in both studies (which ran concurrently for a significant period of time). Raters were rigorously trained on administration and scoring conventions and certified prior to the study through demonstration of adequate interrater reliability. In addition to these steps, raters in Study A were required to audio record all PANSS assessments, with a selected subset of visits subject to IRev. PANSS assessments in Study B were neither recorded nor monitored via IRev. Data quality after study completion was examined by calculating the frequency of anomalous data patterns identified as “high” (very probable or definite error) by the ISCTM working group in both studies. Additionally, we examined the percentage of assessments with lower than expected PANSS interview duration as captured via an eCOA platform. Results: There were 9441 eCOA PANSS assessments in Study A and 6178 in Study B included in this analysis. The proportions of flags that represented highly probable/definite error differed significantly between the studies (9% vs 18% for Study A and B, respectively, p < .01). The most significant differences in ISCTM flags were related to overly consistent scoring patterns (27 or more items scored identically to the prior visit), which occurred with higher frequency in Study B. Additionally, Study B had a significantly higher frequency of assessments flagged for low interview duration (<15 minutes) (1% vs 4% for Study A and B, respectively, p < .01). Discussion: Initial rater training is necessary but not sufficient to ensure adequate data quality in schizophrenia trials. Implementation of additional in-study oversight through Independent Review or similar methods reduces the probability of data error in PANSS assessments, including the appearance of improbable rating patterns and decreased time spent interviewing study subjects. One potential limitation is that Study A is a double-blind study whereas Study B is an open-label extension of Study A.
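The two data-quality flags highlighted in the Results could be approximated with a short post-processing script over exported assessment data. The sketch below is a hypothetical illustration: the column names and data layout are assumptions, not the eCOA platform's schema, and only the "identical scoring" and "short interview" flags from the abstract are shown.

```python
# Hedged sketch of two PANSS data-quality flags: (1) 27 or more of the 30 items
# scored identically to the subject's prior visit, and (2) interview duration
# under 15 minutes. Column names ("subject", "visit", "duration_min", item
# columns P1..G16) are illustrative assumptions.
import pandas as pd

PANSS_ITEMS = (
    [f"P{i}" for i in range(1, 8)]    # 7 positive items
    + [f"N{i}" for i in range(1, 8)]  # 7 negative items
    + [f"G{i}" for i in range(1, 17)] # 16 general psychopathology items
)

def flag_assessments(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values(["subject", "visit"]).copy()
    prior = df.groupby("subject")[PANSS_ITEMS].shift(1)      # previous visit's scores
    identical = (df[PANSS_ITEMS] == prior).sum(axis=1)       # count of unchanged items
    df["flag_identical_scoring"] = identical >= 27           # overly consistent pattern
    df["flag_short_interview"] = df["duration_min"] < 15     # low interview duration
    return df
```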

