The influence of rater training on inter- and intra-rater reliability when using the rat grimace scale

2018 ◽  
Author(s):  
Emily Zhang ◽  
Vivian Leung ◽  
Daniel SJ Pang

Rodent grimace scales facilitate assessment of spontaneous pain and can identify a range of acute pain levels. Reported rater training in using these scales varies considerably and may contribute to observed variability in inter-rater reliability. This study evaluated the effect of training on inter-rater reliability with the Rat Grimace Scale (RGS). Two training sets, of 42 and 150 images, were prepared from several acute pain models. Four trainee raters progressed through 2 rounds of training, first scoring 42 images (S1) followed by 150 images (S2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then re-scored (S2b). Four years after training, all trainees re-scored the 150 images (S2c). Inter- and intra-rater reliability was evaluated using the intra-class correlation coefficient (ICC) and ICCs compared with a Feldt test. Inter-rater reliability increased from moderate (0.58 [95%CI: 0.43-0.72]) to very good (0.85 [0.81-0.88]) between S1 and S2b (p < 0.01) and also increased between S2a and S2b (p < 0.01). The action units with the highest and lowest ICCs at S2b were orbital tightening (0.84 [0.80-0.87]) and whiskers (0.63 [0.57-0.70]), respectively. In comparison to an experienced rater the ICCs for all trainees improved, ranging from 0.88 to 0.91 at S2b. Four years later, very good inter-rater reliability was retained (0.82 [0.76-0.84]) and intra-rater reliability was good or very good (0.78-0.87). Training improves inter-rater reliability between trainees, with an associated reduction in 95%CI. Additionally, training resulted in improved inter-rater reliability alongside an experienced rater. Performance was retained after several years. The beneficial effects of training potentially reduce data variability and improve experimental animal welfare.


2018 ◽  
Author(s):  
Emily Zhang ◽  
Vivian Leung ◽  
Daniel SJ Pang

Background. Rodent grimace scales facilitate evaluation of the affective component of pain and can identify a range of acute pain levels. Reported rater training in the use of these scales varies considerably and may contribute to observed variability in inter-rater reliability. This study evaluated the effect of training on inter-rater reliability with the Rat Grimace Scale (RGS). Methods. Two training sets, of 42 and 150 images, were prepared from several acute pain models. Four trainee raters, with no previous experience with the RGS, progressed through 2 rounds of training, first scoring 42 images (S1) followed by 150 images (S2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were re-scored in a final round (S2b). Inter-rater reliability was evaluated using the intra-class correlation coefficient (ICC), and ICCs were compared with a Feldt test. Results. Inter-rater reliability increased from moderate (ICC 0.58 [95%CI: 0.43-0.72]) to very good (ICC 0.85 [0.81-0.88]) between S1 and S2b (p < 0.01), with a significant increase also observed between S2a and S2b (p < 0.01). The ICCs for the individual action units orbital tightening, ears and nose/cheek also improved from S1 to S2b (p < 0.01). The action units with the highest and lowest ICCs at S2b were orbital tightening (0.84 [0.80-0.87]) and whiskers (0.63 [0.57-0.70]), respectively. In comparison to an experienced rater, the ICCs for all trainees improved, ranging from 0.88 to 0.91 at S2b. Discussion. Training improves inter-rater reliability between trainees, with an associated reduction in 95%CI. Additionally, training resulted in improved inter-rater reliability alongside an experienced rater. Training improves the scoring of individual action units, though scoring of whiskers is more difficult than scoring of other sites. Conclusion. The beneficial effects of training potentially reduce data variability and improve experimental animal welfare.
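As an illustration of the statistic reported above, the following is a minimal sketch of an intra-class correlation coefficient computed from a table of scores. The abstract does not say which ICC form was used, so the sketch assumes the two-way random-effects, absolute-agreement, single-rater form, written ICC(2,1) in Shrout and Fleiss's notation. The function name `icc_2_1` and the example scores are hypothetical and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per scored subject (for example,
    one image), each row holding one score per rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical scores: 6 subjects (rows) rated by 4 raters (columns).
example_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example_scores), 2))  # about 0.29
```

Because this form penalises systematic offsets between raters, a rater who scores consistently higher or lower than the others pulls the coefficient down even when the rank order of subjects agrees.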


2021 ◽  
pp. 003151252110365
Author(s):  
Alessandra V. Prieto ◽  
Kênnea Martins Almeida Ayupe ◽  
Ana C. A. Abreu ◽  
Paulo J. B. Gutierres Filho

Improvement in rider mobility represents an important functional gain for people with disabilities undergoing hippotherapy. However, there is no validated measuring instrument to track and document the rider's progress in riding activities. In this study, we aimed to develop and establish validity evidence for an instrument to assess hippotherapy participants’ mobility on horseback. We report on this development through the stages of: (a) content validation, (b) construct validation, (c) inter- and intra-rater reliability and (d) internal consistency analysis. We evaluated its factor structure with exploratory factor analyses, calculated values for inter- and intra-rater reliability using the intra-class correlation coefficient, and calculated its internal consistency using Cronbach's alpha. We followed recommendations by the Guidelines for Reporting Reliability and Agreement Studies. We found good inter-rater reliability (intra-class correlation coefficient – ICC = 0.991–0.999) and good intra-rater reliability (ICC = 0.997–1.0), and there was excellent internal consistency (Cronbach's α = 0.937–0.999). The instrument’s factor structure grouped its three domains into one factor. As this instrument is theoretically consistent and has been found to be appropriate and reliable for its intended use, it is now available for the measurement of horseback mobility among hippotherapy riders.
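For readers unfamiliar with the internal consistency statistic named above, here is a minimal sketch of Cronbach's alpha using its standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the example scores are hypothetical; the abstract does not describe the authors' software or data layout.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per participant,
    one column per item), using sample variances."""
    k = len(scores[0])  # number of items
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 4 participants, 3 items.
example_items = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(cronbach_alpha(example_items))
```

Alpha approaches 1 when the items rise and fall together across participants, which is why very high values can also signal redundant items.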


Author(s):  
Emily Q Zhang ◽  
Vivian SY Leung ◽  
Daniel SJ Pang

Rodent grimace scales facilitate assessment of ongoing pain. Reported rater training using these scales varies considerably and may contribute to the observed variability in interrater reliability. This study evaluated the effect of training on interrater reliability with the Rat Grimace Scale (RGS). Two training sets (42 and 150 images) were prepared from acute pain models. Four trainee raters progressed through 2 rounds of training, scoring 42 images (set 1) followed by 150 images (set 2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then rescored (set 2b). Four years later, trainees rescored the 150 images (set 2c). A second group of raters (no-training group) scored the same image sets without review with the experienced rater. Inter- and intrarater reliability were evaluated by using the intraclass correlation coefficient (ICC), and ICC values were compared by using the Feldt test. In the trainee group, interrater reliability increased from moderate to very good between sets 1 and 2b and increased between sets 2a and 2b. Action units with the highest and lowest ICC at set 2b were orbital tightening and whiskers, respectively. In comparison to an experienced rater, the ICC for all trainees improved, ranging from 0.88 to 0.91 at set 2b. Four years later, very good interrater reliability was retained, and intrarater reliability was good or very good. The interrater reliability of the no-training group was moderate and did not improve from set 1 to set 2b. Training improved interrater reliability, with an associated reduction in 95%CI. In addition, training improved interrater reliability with an experienced rater, and performance was retained.


2020 ◽  
Vol 21 (Supplement_1) ◽  
Author(s):  
M Gavazzoni ◽  
M Z Zuber ◽  
A P Pozzoli ◽  
M T Taramasso ◽  
F M Maisano

Abstract Background/Introduction. Recently, the central role of invasive hemodynamic monitoring during the MitraClip (Abbott Vascular, Santa Clara, CA, USA) procedure has been raised. After removal of the Steerable Guide Catheter (SGC) at the end of the procedure, the iatrogenic interatrial septum defect (IASd) causes acute sub-clinical hemodynamic changes depending on right atrial (RA) and left atrial (LA) pressures. The ability to assess left atrial pressure (LAP) non-invasively by Doppler echocardiography at the end of the procedure makes it possible to quantify the real hemodynamic impact of the reduction of mitral regurgitation (MR) and leaves the door open to further therapeutic decisions (such as closure of the iatrogenic IASd). Purpose. This prospective study aimed to assess the role of the post-procedural mean trans-atrial gradient measured with continuous-wave (CW) Doppler (DPmean-IAS) in estimating the final mean LAP (m-LAP) after removal of the SGC. Methods. We prospectively computed the trans-atrial CW Doppler tracing to estimate the mean trans-atrial gradient (meanGp-LA-RA) in patients treated with MitraClip; we added an estimate of central venous pressure (CVP) based on: i) dilatation of the inferior vena cava (IVC, mm); ii) presence or absence of systolic excursion of the IVC (end-inspiratory excursion was not evaluable in patients under sedation); iii) hepatic vein dilatation. The sum of the estimated CVP and meanGp-LA-RA (mmHg) represents the m-LAP-Echo-measured at the end of the procedure. This value was compared with the m-LAP measured invasively before removal of the SGC. We tested inter-rater reliability with the intra-class correlation coefficient to compare this method with the gold standard (invasive assessment of LAP). Results. We included 19 patients; the aetiology of MR was degenerative in 89% of cases. Basal m-LAP was 15 ± 13.3 mmHg and decreased by 32% by the end of the procedure (m-LAP at the end: 10.1 ± 3.3 mmHg, p < 0.001). At the end of the procedure, mean Gp-LA-RA was 2.5 ± 1.2 mmHg and CVP 7.5 ± 3.5; the m-LAP-Echo-measured was 9.6 ± 2.4. The delay between the echocardiographic computation of m-LAP and the last available invasive assessment was around 5 minutes (IQR 3-9 min). Inter-rater reliability by the intra-class correlation coefficient was high: 0.8 (95% CI 0.647-0.948, p < 0.01); with the Bland-Altman test, the bias of the measurements was acceptable for this clinical context, with an upper concordance limit of 2.7 mmHg and a lower limit of 4.7 mmHg, and a bias of 0.9 mmHg, not relevant for this clinical purpose. Conclusions. The present study represents the first validation of a Doppler-based method for non-invasively assessing post-procedural LAP in percutaneous mitral valve interventions requiring a transseptal approach. Follow-up is needed to correlate this value with clinical outcomes.
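The Bland-Altman quantities mentioned in the abstract (bias and concordance limits) can be sketched as follows. This is a minimal illustration assuming the conventional definition, in which the bias is the mean of the paired differences and the limits are the bias plus or minus 1.96 standard deviations of those differences. The function name `bland_altman` and the paired values are hypothetical and are not the study's measurements.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias   = mean of the differences (a - b)
    limits = bias -/+ 1.96 * sample standard deviation of the differences
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired pressures in mmHg (invasive vs. echo-derived).
invasive = [10, 12, 9, 11]
echo = [9, 11, 10, 10]
bias, lower, upper = bland_altman(invasive, echo)
print(bias, lower, upper)  # 0.5, about -1.46, about 2.46
```

The limits describe the range within which most differences between the two methods are expected to fall, so whether they are acceptable is a clinical judgement rather than a statistical one.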


2020 ◽  
Author(s):  
Victoria Long ◽  
Yin Bun Cheung ◽  
Debra Qu ◽  
Katherine Lim ◽  
Guozhang Lee ◽  
...  

Abstract Context: Measurement of patient-centred outcomes enables clinicians to focus on patient and family priorities and enables quality of palliative care to be assessed. Objectives: This study aimed to evaluate the validity and reliability of the English and translated Chinese IPOS among advanced cancer patients in Singapore. Methods: IPOS was forward and backward translated from English into Chinese. Structural validity was assessed by confirmatory factor analysis; known-group validity by comparing inpatients and community patients; construct validity by correlating IPOS with Edmonton Symptom Assessment System-revised (ESAS-r) and Functional Assessment of Cancer Therapy–General (FACT-G); internal consistency by Cronbach’s alpha; inter-rater reliability between patient and staff responses; test-retest reliability of patient responses between two timepoints. Results: 111 English-responding and 109 Chinese-responding patients participated. The three-factor structure (Physical Symptoms, Emotional Symptoms and Communication and Practical Issues) was confirmed with Comparative Fit Index and Tucker-Lewis-Index > 0.9 and Root Mean Square Error of Approximation < 0.08. Inpatients scored higher than outpatients as hypothesised. Construct validity (Pearson’s correlation coefficient, r≥|0.608|) was shown between the related subscales of IPOS and FACT-G and ESAS-r. Internal consistency was confirmed for total and subscale scores (Cronbach's alpha ≥ 0.84), except for the Communication and Practical Issues subscale (Cronbach’s alpha = 0.29–0.65). Inter-rater reliability (Intra-class correlation coefficient [ICC] ≤ 0.43) between patient and staff responses was insufficient. Test-retest reliability was confirmed with ICC = 0.80 (English) and 0.88 (Chinese) for IPOS Total. Conclusion: IPOS in English and Chinese showed good validity, good internal consistency, and good test-retest reliability, except for the Communication and Practical Issues subscale. There was poor inter-rater reliability between patients and staff.


Author(s):  
Muthusamy Sivaguru ◽  
Subramaniam Ambusam ◽  
Balasubramanian K. ◽  
Purushothaman Vinosh Kumar ◽  
Vasanthi Rajkumar Krishnan

Maximal aerobic capacity (VO2 max) is one of the important factors that influence swimming performance. Currently, the methods used to measure VO2 max are expensive, require skilled, trained personnel, are not feasible for large-scale use, and are land-based, so they will not be accurate in measuring water-based activity. To measure swimming performance, there is a need for an affordable, feasible, and reliable device. Therefore, the current study aimed to examine the intra-rater reliability of the Garmin Forerunner Fitness Watch 935 in measuring VO2 max among collegiate swimmers during a 200 m swimming task. The VO2 max of 10 collegiate swimmers was measured with the Garmin Forerunner in two trials. The intra-class correlation coefficient (ICC), standard error of measurement (SEM), and Bland-Altman plot were used to establish intra-rater reliability. The Garmin Forerunner showed high reliability and accuracy, with an intra-class correlation coefficient (2,1) of 0.869 and a standard error of measurement of 0.231 ml/kg/min. Further, the Bland-Altman plot showed acceptable agreement between the two trials. The Garmin Forerunner would be a simple, objective and useful device for physiotherapists, trainers and other sports-related disciplines to assess and improve swimming performance by targeting heart rate and VO2 max.


2011 ◽  
Vol 26 (S2) ◽  
pp. 453-453
Author(s):  
A. De Fries ◽  
S. Liechti ◽  
M. Opler ◽  
S. Lane ◽  
E. Ivanova ◽  
...  

Introduction/objectives/aims: We compared cohorts of raters from different countries who received training on the PANSS. We attempted to determine if there was any consistent by-country impact on specific items, factors, or subscales. We also queried raters about their perceptions of the instrument they were asked to use vis-à-vis their local patient population. Methods: The data set comes from standardized rater training events involving raters from four countries: India (n = 83), Russia (n = 59), the US (n = 63), and Romania (n = 76). Raters scored interviews of schizophrenic patients using the PANSS. Scores were compared and intra-class correlation coefficients (ICCs) and rater agreement with “gold standard” scores were evaluated. The results were viewed against raters’ responses to questions about how well the PANSS items correlated to the presentation of symptoms. Results: Raters from the US and Russia demonstrated a higher level of inter-rater consistency with ICCs of 0.883 and 0.835, respectively. For eight PANSS items, all raters demonstrated at least 80% agreement with the gold standard scores. For ten PANSS items, there was at least one country whose raters scored below 60% agreement. The PANSS items with the lower inter-rater reliability were the same items raters indicated as problematic in local settings. Conclusion: The differences in rater performance indicate that standardized rater training is broadly effective but that there are some important differences in the way in which different groups conceptualize symptomatology and corresponding PANSS items. This suggests a need to tailor training to ensure reliability and validity in the use of this instrument.


Rheumatology ◽  
2021 ◽  
Author(s):  
Amber Vanhaecke ◽  
Sven Verschuere ◽  
Veronica Vilela ◽  
Lise Heeman ◽  
Maurizio Cutolo ◽  
...  

Abstract Objectives To investigate the reliability of durometry in systemic sclerosis (SSc), by means of a systematic review and additional pilot study. Methods Literature was systematically reviewed according to the PRISMA guidelines to identify all original studies assessing the reliability of durometry in SSc. Additionally, in the pilot study, intra-rater reliability was evaluated in a first cohort of 74 SSc patients (61 female, 13 LSSc/53 LcSSc/8 DcSSc). In a second separate set of 30 SSc patients (21 female, 4 LSSc/20 LcSSc/6 DcSSc), intra- and inter-rater reliability were evaluated. Results Only two unique records identified through the systematic review were qualified to generate conclusions. Regarding intra-rater reliability, Kissin reported excellent intra-class correlation coefficient values (ICC, 0.86–0.94) for measurements at nine skin sites in two DcSSc patients. Merkel and Kissin described, both in five DcSSc patients, good to excellent inter-rater reliability (ICC, 0.82–0.96 and 0.61–0.85) for measurements at respectively, six and nine skin sites. In our pilot study, ICC for intra-rater reliability at 17 standardized skin sites were excellent in both cohorts, ranging 0.93–0.99 and 0.78–0.98, respectively. ICC for inter-rater reliability at 17 standardized skin sites were good to excellent 0.63–0.93, except for the feet (0.48 and 0.52). Conclusion The preliminary findings in the literature are supported by our pilot study in which we have attested the reliability of durometry in SSc patients. However, prior to including durometry as an (additional) outcome measure in SSc clinical trials, its validation status in the assessment of skin fibrosis needs to be completely attested.


2018 ◽  
Vol 74 (1) ◽  
Author(s):  
Muhammad Dawood ◽  
Pieter J. Becker ◽  
Agatha J. Van Rooijen ◽  
Elzette Korkie

Background: Evidence-based practice requires the use of objective, valid and reliable tests for measuring the length of a muscle. Latissimus Dorsi is a muscle which undergoes length changes (loss of extensibility) and this muscle has a functional role in many aspects of sport and rehabilitation. The loss of extensibility may result in a decreased range of motion at the glenohumeral joint, leading to dysfunction. Objectives: The aim of this study was to assess the inter-rater and intra-rater reliability of a technique adapted by Comerford and Mottram in 2012 for assessing the length of the Latissimus Dorsi (LD) muscle. Method: Fifty-six students from a university’s physiotherapy department participated in this study. Four physiotherapists with clinical experience varying between 10 and 30 years independently performed the test for assessing the length of LD. The test was performed twice by each physiotherapist on every participant during two reading sessions. Results: The intra-class correlation coefficient (ICC) as determined in a mixed-effects, generalised least squares regression analysis was used to assess inter- and intra-rater reliability of the LD length test. A 0.05 level of significance was employed. A sample of 56 participants provided an ICC that varied between 0.76 and 0.55, which is regarded as moderate to poor reliability. The ICC between the experienced raters was found to be 0.48, with a novice rater having an ICC of 0.48 as well. The ICC between all the raters was 0.33, which constituted poor reliability. Conclusion: Given its poor to moderate reliability, the LD length test is not suitable for application in a research setting. Clinical implications: The small differences noted between Reading 1 and Reading 2 regarding the standard deviation of all the raters combined suggest that the LD length test may still prove to be useful in quantifying dysfunction in a clinical setting.

