Measurement of Scapular Asymmetry and Assessment of Shoulder Dysfunction Using the Lateral Scapular Slide Test: A Reliability and Validity Study

Abstract Background and Purpose. The Lateral Scapular Slide Test (LSST) is used to determine scapular position with the arm abducted 0, 45, and 90 degrees in the coronal plane. Assessment of scapular position is based on the derived difference measurement of bilateral scapular distances. The purpose of this study was to assess the reliability of measurements obtained using the LSST and whether they could be used to identify people with and without shoulder impairments. Subjects. Forty-six subjects ranging in age from 18 to 65 years (X̄=30.0, SD=11.1) participated in this study. One group consisted of 20 subjects being treated for shoulder impairments, and one group consisted of 26 subjects without shoulder impairments. Methods. Two measurements in each test position were obtained bilaterally. From the bilateral measurements, we derived the difference measurement. Intraclass correlation coefficients (ICC [1,1]) and the standard error of measurement (SEM) were calculated for intrarater and interrater reliability of the difference in side-to-side measures of scapular distance. Sensitivity and specificity of the LSST for classifying subjects with and without shoulder impairments were also determined. Results. The ICCs for intrarater reliability were .75, .77, and .80 and .52, .66, and .62, respectively, for subjects without and with shoulder impairments in 0, 45, and 90 degrees of abduction. The ICCs for interrater reliability were .67, .43, and .74 and .79, .45, and .57, respectively, for subjects without and with shoulder impairments in 0, 45, and 90 degrees of abduction. The SEMs ranged from 0.57 to 0.86 cm for intrarater reliability and from 0.79 to 1.20 cm for interrater reliability. Using the criterion of greater than 1.0 cm difference, sensitivity and specificity were 35% and 48%, 41% and 54%, and 43% and 56%, respectively, for 0, 45, and 90 degrees of abduction. Sensitivity and specificity based on the criterion of greater than 1.5 cm difference were 28% and 53%, 50% and 58%, and 34% and 52%, respectively, for the 3 scapular positions. Conclusion and Discussion. Our results suggest that measurements of scapular positioning based on the difference in side-to-side scapular distance measures are not reliable. Furthermore, the results suggest that sensitivity and specificity of the LSST measurements are poor and that the LSST should not be used to identify people with and without shoulder dysfunction.
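The statistics named in this abstract can be sketched in a few lines of Python. The sketch below is illustrative only: it assumes the usual one-way random-effects definition of ICC(1,1), the common formula SEM = SD × sqrt(1 − ICC) with SD taken over all pooled scores, and a simple "difference greater than threshold" rule for a positive test. The abstract does not state which exact formulas the authors used, and the function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def icc_1_1(ratings):
    """One-way random-effects ICC(1,1).

    ratings: one row per subject, each row holding k repeated measurements.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def standard_error_of_measurement(ratings):
    """SEM = SD * sqrt(1 - ICC); pooling all scores for SD is an assumption."""
    pooled = [x for row in ratings for x in row]
    icc = icc_1_1(ratings)
    return stdev(pooled) * sqrt(max(0.0, 1.0 - icc))


def sensitivity_specificity(differences_cm, impaired, threshold_cm=1.0):
    """Classify a subject as positive when |side-to-side difference| > threshold."""
    positive = [abs(d) > threshold_cm for d in differences_cm]
    tp = sum(p and i for p, i in zip(positive, impaired))
    fn = sum((not p) and i for p, i in zip(positive, impaired))
    tn = sum((not p) and (not i) for p, i in zip(positive, impaired))
    fp = sum(p and (not i) for p, i in zip(positive, impaired))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: two trials per subject (cm), and four subjects' differences.
trials = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
diffs = [1.2, 0.4, 1.6, 0.8]
has_impairment = [True, True, False, False]
```

With identical repeated trials, as in the hypothetical `trials` data, the within-subject variance is zero, so the ICC is 1 and the SEM is 0. Real measurements would fall somewhere below that.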

Assessment of reliability and validity of the 5-scale grading system of the point-of-care immunoassay for tear matrix metalloproteinase-9

Scientific Reports ◽ 10.1038/s41598-021-92020-6 ◽ 2021 ◽ Vol 11 (1)

Author(s): Minjeong Kim ◽ Ja Young Oh ◽ Seon Ha Bae ◽ Seung Hyeun Lee ◽ Won Jun Lee ◽ ...

Keyword(s): Matrix Metalloproteinase ◽ Calibration Curve ◽ Point Of Care ◽ Interobserver Reliability ◽ Intraclass Correlation ◽ Reliability And Validity ◽ Correlation Coefficients ◽ Grading System ◽ Intraclass Correlation Coefficients ◽ The Difference

Abstract We evaluated the reliability and validity of the 5-scale grading system to interpret the point-of-care immunoassay for tear matrix metalloproteinase (MMP)-9. Six observers graded the red bands in photographs of the readout window of an MMP-9 immunoassay kit (InflammaDry) twice, 2 weeks apart, using the 5-scale grading system (i.e., grades 0–4). Interobserver and intraobserver reliability were evaluated using intraclass correlation coefficients. The interobserver agreements were analyzed according to the severity of tear MMP-9 expression. To validate the system, a concentration calibration curve was made using MMP-9 solutions with reference concentrations, then the distribution of MMP-9 concentrations was analyzed according to the 5-scale grading system. Both intraobserver and interobserver reliability were excellent. The readout grades were significantly correlated with the quantified colorimetric densities. The interobserver variance of readout grades had no correlation with the severity of the measured densities. The band density continued to increase up to a maximal concentration (i.e., 5000 ng/mL) according to the calibration curve. The difference of grades reflected the change of MMP-9 concentrations sensitively, especially between grades 2 and 4. Together, our data indicate that the subjective 5-scale grading system in the point-of-care MMP-9 immunoassay is an easy and reliable method with acceptable accuracy.

Development and Initial Validation of a Project-Based Rubric to Assess the Systems-Based Practice Competency of Residents in the Clinical Chemistry Rotation of a Pathology Residency

Archives of Pathology & Laboratory Medicine ◽ 10.5858/arpa.2013-0046-oa ◽ 2014 ◽ Vol 138 (6) ◽ pp. 809-813

Author(s): Carolyn R. Vitek ◽ Jane C. Dale ◽ Henry A. Homburger ◽ Sandra C. Bryant ◽ Amy K. Saenger ◽ ...

Keyword(s): Critical Thinking ◽ Interrater Reliability ◽ Clinical Chemistry ◽ Core Competencies ◽ Intraclass Correlation ◽ Reliability And Validity ◽ Correlation Coefficients ◽ Thinking Skills ◽ Project Evaluation ◽ Critical Thinking Skills

Context.— Systems-based practice (SBP) is 1 of 6 core competencies required in all resident training programs accredited by the Accreditation Council for Graduate Medical Education. Reliable methods of assessing resident competency in SBP have not been described in the medical literature. Objective.— To develop and validate an analytic grading rubric to assess pathology residents' analyses of SBP problems in clinical chemistry. Design.— Residents were assigned an SBP project based upon unmet clinical needs in the clinical chemistry laboratories. Using an iterative method, we created an analytic grading rubric based on critical thinking principles. Four faculty raters used the SBP project evaluation rubric to independently grade 11 residents' projects during their clinical chemistry rotations. Interrater reliability and Cronbach α were calculated to determine the reliability and validity of the rubric. Project mean scores and range were also assessed to determine whether the rubric differentiated resident critical thinking skills related to the SBP projects. Results.— Overall project scores ranged from 6.56 to 16.50 out of a possible 20 points. Cronbach α ranged from 0.91 to 0.96, indicating that the 4 rubric categories were internally consistent without significant overlap. Intraclass correlation coefficients ranged from 0.63 to 0.81, indicating moderate to strong interrater reliability. Conclusions.— We report development and statistical analysis of a novel SBP project evaluation rubric. The results indicate the rubric can be used to reliably assess pathology residents' critical thinking skills in SBP.
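As a rough illustration of the internal-consistency statistic reported here, Cronbach's α for k rubric categories can be computed from the category variances and the variance of the total score: α = k/(k − 1) × (1 − Σ var(category) / var(total)). The sketch below assumes sample variances and a complete score table. The function name and the example scores are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha.

    scores: one row per rated project, one column per rubric category.
    """
    k = len(scores[0])
    # Sample variance of each category (column) across projects.
    item_variances = [variance(col) for col in zip(*scores)]
    # Sample variance of each project's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores for three projects on two rubric categories.
example_scores = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example_scores)
```

Values near 1 indicate that the categories rise and fall together across projects. The formula is undefined when every project has the same total score.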

Reliability Assessment of Scores From Video-Recorded TGMD-3 Performances

Journal of Motor Learning and Development ◽ 10.1123/jmld.2016-0007 ◽ 2017 ◽ Vol 5 (1) ◽ pp. 59-68 ◽ Cited By ~ 16

Author(s): Pauli Olavi Rintala ◽ Arja Kaarina Sääkslahti ◽ Susanna Iivonen

Keyword(s): Motor Development ◽ Interrater Reliability ◽ Intraclass Correlation ◽ Kappa Statistic ◽ Intrarater Reliability ◽ Gross Motor ◽ Gross Motor Development ◽ Percent Agreement ◽ Two Samples ◽ Ball Skills

This study examined the intrarater and interrater reliability of the Test of Gross Motor Development—3rd Edition (TGMD-3). Participants were 60 Finnish children aged between 3 and 9 years, divided into three separate samples of 20. Two samples of 20 were used to examine the intrarater reliability of two different assessors, and the third sample of 20 was used to establish interrater reliability. Children’s TGMD-3 performances were video-recorded and later assessed using an intraclass correlation coefficient, a kappa statistic, and a percent agreement calculation. The intrarater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.69 to 0.77, and percent agreement ranged from 87 to 91%. The interrater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.56 to 0.64. Percent agreement of 83% was observed for locomotor skills, ball skills, and total skills, respectively. Hop, horizontal jump, and two-hand strike assessments showed the most difference between the assessors. These results show acceptable reliability for the TGMD-3 to analyze children’s gross motor skills.
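Percent agreement and a kappa statistic are both simple to compute for criterion-level scores. The sketch below assumes two assessors scoring the same items and uses Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The abstract does not say which kappa variant was used, so that choice, the function names, and the sample scores are assumptions.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters gave the same score, in percent."""
    matches = sum(x == y for x, y in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(rater_a)
    p_observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical criterion scores (1 = criterion met, 0 = not met).
assessor_1 = [1, 1, 0, 0]
assessor_2 = [1, 0, 0, 0]
```

For these hypothetical scores the raters agree on three of four items (75%), but half of that agreement would be expected by chance, so kappa is 0.5. This gap is why percent agreement tends to look more favorable than chance-corrected statistics.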

Influence of Rater Training on Inter- and Intrarater Reliability When Using the Rat Grimace Scale

Journal of the American Association for Laboratory Animal Science ◽ 10.30802/aalas-jaalas-18-000044 ◽ 2019 ◽ Vol 58 (2) ◽ pp. 178-183 ◽ Cited By ~ 8

Author(s): Emily Q Zhang ◽ Vivian SY Leung ◽ Daniel SJ Pang

Keyword(s): Acute Pain ◽ Interrater Reliability ◽ Intraclass Correlation ◽ Training Group ◽ Intrarater Reliability ◽ Rater Training ◽ Trainee Group ◽ Pain Models ◽ Ongoing Pain ◽ And Performance

Rodent grimace scales facilitate assessment of ongoing pain. Reported rater training using these scales varies considerably and may contribute to the observed variability in interrater reliability. This study evaluated the effect of training on interrater reliability with the Rat Grimace Scale (RGS). Two training sets (42 and 150 images) were prepared from acute pain models. Four trainee raters progressed through 2 rounds of training, scoring 42 images (set 1) followed by 150 images (set 2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then rescored (set 2b). Four years later, trainees rescored the 150 images (set 2c). A second group of raters (no-training group) scored the same image sets without review with the experienced rater. Inter- and intrarater reliability were evaluated by using the intraclass correlation coefficient (ICC), and ICC values were compared by using the Feldt test. In the trainee group, interrater reliability increased from moderate to very good between sets 1 and 2b and increased between sets 2a and 2b. Action units with the highest and lowest ICC at set 2b were orbital tightening and whiskers, respectively. In comparison to an experienced rater, the ICC for all trainees improved, ranging from 0.88 to 0.91 at set 2b. Four years later, very good interrater reliability was retained, and intrarater reliability was good or very good. The interrater reliability of the no-training group was moderate and did not improve from set 1 to set 2b. Training improved interrater reliability, with an associated reduction in 95%CI. In addition, training improved interrater reliability with an experienced rater, and performance was retained.

Could Residents Adequately Assess the Severity of Hidradenitis Suppurativa? Interrater and Intrarater Reliability Assessment of Major Scoring Systems

Dermatology ◽ 10.1159/000501771 ◽ 2019 ◽ Vol 236 (1) ◽ pp. 8-14 ◽ Cited By ~ 1

Author(s): Katarzyna Włodarek ◽ Aleksandra Stefaniak ◽ Łukasz Matusiak ◽ Jacek C. Szepietowski

Keyword(s): Interrater Reliability ◽ Hidradenitis Suppurativa ◽ Intraclass Correlation ◽ Scoring Systems ◽ Staging System ◽ Severity Index ◽ Assessment Tools ◽ Intrarater Reliability ◽ Global Assessment Scale ◽ Interrater Variability

To date, a wide variety of assessment tools have been proposed for hidradenitis suppurativa (HS), but none of them meets the criteria for an ideal score. Because there is no gold standard scoring system, the choice of measurement instrument depends on the purpose of use and even on the physician’s experience with HS. The aim of this study was to assess the intrarater and interrater reliability of 6 scoring systems commonly used for grading severity of HS: the Hurley Staging System, the Refined Hurley Staging, the International Hidradenitis Suppurativa Severity Score System (IHS4), the Hidradenitis Suppurativa Severity Index (HSSI), the Sartorius Hidradenitis Suppurativa Score and the Hidradenitis Suppurativa Physician’s Global Assessment Scale (HS-PGA). On the scoring day, 9 HS patients underwent a physical examination and disease severity assessment by a group of 16 dermatology residents using all evaluated instruments. Then, intrarater reliability was calculated using the intraclass correlation coefficient (ICC), and interrater variability was evaluated using the coefficient of variation (CV). In all 6 scorings the ICCs were >0.75, indicating high intrarater reliability of all presented scales. The study also demonstrated moderate agreement between raters in most of the evaluated measurement instruments. The most reproducible methods, according to CVs, seem to be the Hurley staging, IHS4, and HSSI. None of the 6 evaluated scoring systems showed a significant advantage over the others when comparing ICCs, and all the instruments seem to be very reliable methods. The interrater reliability was usually good, but the most repeatable results between researchers were obtained for the easiest scales, including Hurley scoring, IHS4 and HSSI.
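The coefficient of variation used here for interrater variability is the standard deviation of the raters' scores for one patient, expressed as a percentage of their mean. The minimal sketch below assumes the sample standard deviation and one CV per patient. The function name and the scores are hypothetical, and the abstract does not describe how the authors aggregated CVs across patients.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(scores) / mean(scores)


# Hypothetical severity scores given to one patient by three raters.
one_patient_scores = [8, 10, 12]
cv_percent = coefficient_of_variation(one_patient_scores)
```

A lower CV means the raters' scores cluster more tightly around their mean. The statistic is undefined when the mean score is zero, which matters for scales that allow a score of 0.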

Intra- and Interrater Reliability of Infrared Image Analysis of Facial Acupoints in Individuals with Facial Paralysis

Evidence-based Complementary and Alternative Medicine ◽ 10.1155/2020/9079037 ◽ 2020 ◽ Vol 2020 ◽ pp. 1-9 ◽ Cited By ~ 1

Author(s): Xulong Liu ◽ Jinghui Feng ◽ Jingmin Luan ◽ Chenxi Dong ◽ Hefei Fu ◽ ...

Keyword(s): Image Analysis ◽ Temperature Measurement ◽ Facial Paralysis ◽ Temperature Difference ◽ Interrater Reliability ◽ Influencing Factor ◽ Intraclass Correlation ◽ Infrared Image ◽ Infrared Thermal Imaging ◽ Difference Measurement

Infrared thermography (IRT), as a noncontact tool for temperature measurement, is widely applied in the study of acupuncture modernization. The aim of this study was to assess the intra- and interrater reliability of infrared image analysis of facial acupoints of subjects with facial paralysis and determine the factors influencing the variability of the measured values. A total of 26 patients with facial paralysis on one side, aged 26 to 53 years, participated voluntarily in the study. Facial infrared thermal images of all participants were analyzed by two trained raters at two different time points at a one-week interval. The intraclass correlation coefficient (ICC) was used to determine the intra- and interrater reliability of IRT measurements. The ICC values varied depending on the analyzed acupoints. The reliability of temperature measurement ranged from moderate to excellent (intrarater, ICC ranged from 0.669 to 0.990; interrater, ICC ranged from 0.661 to 0.987). The reliability of temperature difference measurement ranged from low to excellent (intrarater, ICC ranged from 0.412 to 0.882; interrater, ICC ranged from 0.334 to 0.828). The main influencing factor of reliability is the incomplete consistency in selecting acupoint positions when repeatedly positioning the same acupoint manually. Despite low reliability of temperature difference measurement at some acupoints, some auxiliary measures can be used to reduce the error of manual positioning. Thus, infrared thermal imaging still has the potential to assist in objective and quantitative research on acupuncture.

Teleassessment of Gait and Gait Aids: Validity and Interrater Reliability

Physical Therapy ◽ 10.1093/ptj/pzaa005 ◽ 2020 ◽ Vol 100 (4) ◽ pp. 708-717

Author(s): Kavita Venkataraman ◽ Kristopher Amis ◽ Lawrence R Landerman ◽ Kevin Caves ◽ Gerald C Koh ◽ ...

Keyword(s): Repeated Measures ◽ Interrater Reliability ◽ Video Quality ◽ Intraclass Correlation ◽ Reliability And Validity ◽ Slow Motion ◽ Frame Rate ◽ Normal Speed ◽ Lower Frame ◽ Rehabilitation Needs

Abstract Background. Gait and mobility aid assessments are important components of rehabilitation. Given the increasing use of telehealth to meet rehabilitation needs, it is important to examine the feasibility of such assessments within the constraints of telerehabilitation. Objective. The objective of this study was to examine the reliability and validity of the Tinetti Performance-Oriented Mobility Assessment gait scale (POMA-G) and cane height assessment under various video and transmission settings to demonstrate the feasibility of teleassessment. Design. This repeated-measures study compared the test performances of in-person, slow motion (SM) review, and normal-speed (NS) video ratings at various fixed frame rates (8, 15, and 30 frames per second) and bandwidth (128, 384, and 768 kB/s) configurations. Methods. Overall bias, validity, and interrater reliability were assessed for in-person, SM video, and NS video ratings, with SM video rating as the gold standard, as well as for different frame rate and bandwidth configurations within NS videos. Results. There was moderate to good interrater reliability for the POMA-G (intraclass correlation coefficient [ICC] = 0.66–0.77 across all configurations) and moderate validity for in-person (β = 0.62; 95% confidence interval [CI] = 0.37–0.87) and NS video (β = 0.74; 95% CI = 0.67–0.80) ratings compared with the SM video rating. For cane height, interrater reliability was good (ICC = 0.66–0.77), although it was significantly lower at the lowest frame rate (8 frames per second) (ICC = 0.66; 95% CI = 0.54–0.76) and bandwidth (128 kB/s) (ICC = 0.69; 95% CI = 0.57–0.78) configurations. Validity for cane height was good for both in-person (β = 0.80; 95% CI = 0.62–0.98) and NS video (β = 0.86; 95% CI = 0.81–0.90) ratings compared with SM video rating. Limitations. Some lower frame rate and bandwidth configurations may limit the reliability of remote cane height assessments. Conclusions. Teleassessment for POMA-G and cane height using typically available internet and video quality is feasible, valid, and reliable.

Intrarater and Interrater Reliability of Infrared Image Analysis of Forearm Acupoints before and after Moxibustion

Evidence-based Complementary and Alternative Medicine ◽ 10.1155/2020/6328756 ◽ 2020 ◽ Vol 2020 ◽ pp. 1-8

Author(s): Jiali Lou ◽ Yongliang Jiang ◽ Hantong Hu ◽ Xiaoyu Li ◽ Yajun Zhang ◽ ...

Keyword(s): Image Analysis ◽ Correlation Coefficient ◽ Temperature Change ◽ Intraclass Correlation Coefficient ◽ Interrater Reliability ◽ Intraclass Correlation ◽ Infrared Image ◽ Infrared Images ◽ Intrarater Reliability ◽ Before And After

The objective of this study was to determine the intrarater and interrater reliabilities of infrared image analysis of forearm acupoints before and after moxibustion. In this work, infrared images of acupoints in the forearm of 20 volunteers (M/F, 10/10) were collected prior to and after moxibustion by infrared thermography (IRT). Two trained raters performed the analysis of infrared images in two different periods at a one-week interval. The intraclass correlation coefficient (ICC) was calculated to determine the intrarater and interrater reliabilities. With regard to the intrarater reliability, ICC values were between 0.758 and 0.994 (substantial to excellent). For the interrater reliability, ICC values ranged from 0.707 to 0.964 (moderate to excellent). Given that the intrarater and interrater reliability levels show excellent concordance, IRT could be a reliable tool to monitor the temperature change of forearm acupoints induced by moxibustion.

Assessment of the Intrarater and Interrater Reliability of an Established Clinical Task Analysis Methodology

Anesthesiology ◽ 10.1097/00000542-200205000-00016 ◽ 2002 ◽ Vol 96 (5) ◽ pp. 1129-1139 ◽ Cited By ~ 46

Author(s): Jason Slagle ◽ Matthew B. Weinger ◽ My-Than T. Dinh ◽ Vanessa V. Brumer ◽ Kevin Williams

Keyword(s): Real Time ◽ Task Analysis ◽ Interrater Reliability ◽ Intraclass Correlation ◽ Correlation Coefficients ◽ Intrarater Reliability ◽ Intraclass Correlation Coefficients ◽ Percent Time ◽ Analysis Methodology ◽ And Task

Background. Task analysis may be useful for assessing how anesthesiologists alter their behavior in response to different clinical situations. In this study, the authors examined the intraobserver and interobserver reliability of an established task analysis methodology. Methods. During 20 routine anesthetic procedures, a trained observer sat in the operating room and categorized in real time the anesthetist's activities into 38 task categories. Two weeks later, the same observer performed task analysis from videotapes obtained intraoperatively. A different observer performed task analysis from the videotapes on two separate occasions. Data were analyzed for percent of time spent on each task category, average task duration, and number of task occurrences. Rater reliability and agreement were assessed using intraclass correlation coefficients. Results. Intrarater reliability was generally good for categorization of percent time on task and task occurrence (mean intraclass correlation coefficients of 0.84-0.97). There was a comparably high concordance between real-time and video analyses. Interrater reliability was generally good for percent time and task occurrence measurements. However, the interrater reliability of the task duration metric was unsatisfactory, primarily because of the technique used to capture multitasking. Conclusions. A task analysis technique used in anesthesia research for several decades showed good intrarater reliability. Off-line analysis of videotapes is a viable alternative to real-time data collection. Acceptable interrater reliability requires the use of strict task definitions, sophisticated software, and rigorous observer training. New techniques must be developed to more accurately capture multitasking. Substantial effort is required to conduct task analyses that will have sufficient reliability for purposes of research or clinical evaluation.

Alberta Infant Motor Scale: Reliability and Validity When Used on Preterm Infants in Taiwan

Physical Therapy ◽ 10.1093/ptj/80.2.168 ◽ 2000 ◽ Vol 80 (2) ◽ pp. 168-178 ◽ Cited By ~ 76

Author(s): Suh-Fang Jeng ◽ Kuo-Inn Tsou Yau ◽ Li-Chiou Chen ◽ Shu-Fang Hsiao

Keyword(s): Preterm Infants ◽ Interrater Reliability ◽ Physical Therapist ◽ Intraclass Correlation ◽ Reliability And Validity ◽ Correlation Coefficients ◽ Intraclass Correlation Coefficients ◽ Scale Reliability ◽ Scale Scores ◽ Acceptable Reliability

Abstract Background and Purpose. The goal of this study was to examine the reliability and validity of measurements obtained with the Alberta Infant Motor Scale (AIMS) for evaluation of preterm infants in Taiwan. Subjects. Two independent groups of preterm infants were used to investigate the reliability (n=45) and validity (n=41) for the AIMS. Methods. In the reliability study, the AIMS was administered to the infants by a physical therapist, and infant performance was videotaped. The performance was then rescored by the same therapist and by 2 other therapists to examine the intrarater and interrater reliability. In the validity study, the AIMS and the Bayley Motor Scale were administered to the infants at 6 and 12 months of age to examine criterion-related validity. Results. Intraclass correlation coefficients (ICCs) for intrarater and interrater reliability of measurements obtained with the AIMS were high (ICC=.97–.99). The AIMS scores correlated with the Bayley Motor Scale scores at 6 and 12 months (r=.78 and .90), although the AIMS scores at 6 months were only moderately predictive of the motor function at 12 months (r=.56). Conclusion and Discussion. The results suggest that measurements obtained with the AIMS have acceptable reliability and concurrent validity but limited predictive value for evaluating preterm Taiwanese infants.
