Alternative tests for the significance of the intraclass correlation coefficients under unequal family sizes

Subhamoy Pal ◽  
Rong Luo ◽  
Subhash Bagui ◽  
Sudhir Paul
1991 ◽  
Vol 34 (5) ◽  
pp. 989-999 ◽  
Stephanie Shaw ◽  
Truman E. Coggins

This study examines whether observers reliably categorize selected speech production behaviors in hearing-impaired children. A group of experienced speech-language pathologists was trained to score the elicited imitations of 5 profoundly and 5 severely hearing-impaired subjects using the Phonetic Level Evaluation (Ling, 1976). Interrater reliability was calculated using intraclass correlation coefficients. Overall, the magnitude of the coefficients was found to be considerably below what would be accepted in published behavioral research. Failure to obtain acceptably high levels of reliability suggests that the Phonetic Level Evaluation may not yet be an accurate and objective speech assessment measure for hearing-impaired children.
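
The abstract does not say which form of the intraclass correlation coefficient was used. As a minimal sketch, assuming a fully crossed design (every rater scores every subject) and the two-way random-effects, absolute-agreement, single-rater form usually labeled ICC(2,1), the computation could look like the following. The function name `icc_2_1` and the ratings matrix are illustrative; the numbers are a textbook-style example, not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject, one column per rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative ratings only: 6 subjects scored by 4 raters
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```

In this example the raters rank the subjects similarly but differ in their overall level, so the absolute-agreement coefficient comes out low. That is one way an instrument can fall below accepted reliability thresholds.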

Marcos A Soriano ◽  
G Gregory Haff ◽  
Paul Comfort ◽  
Francisco J Amaro-Gahete ◽  
Antonio Torres-González ◽  

The aims of this study were to (I) determine the differences and relationship between the overhead press and split jerk performance in athletes involved in weightlifting training, and (II) explore the magnitude of these differences in one-repetition maximum (1RM) performances between sexes. Sixty-one men (age: 30.4 ± 6.7 years; height: 1.8 ± 0.5 m; body mass 82.5 ± 8.5 kg; weightlifting training experience: 3.7 ± 3.5 yrs) and 21 women (age: 29.5 ± 5.2 yrs; height: 1.7 ± 0.5 m; body mass: 62.6 ± 5.7 kg; weightlifting training experience: 3.0 ± 1.5 yrs) participated. The 1RM performances of the overhead press and split jerk were assessed for all participants, with the overhead press assessed on two occasions to determine between-session reliability. The intraclass correlation coefficient (ICC) and 95% confidence interval showed high reliability for the overhead press (ICC = 0.98 [0.97 – 0.99]). A very strong correlation and significant differences were found between the overhead press and split jerk 1RM performances for all participants (r = 0.90 [0.85 – 0.93], 60.2 ± 18.3 kg, 95.7 ± 29.3 kg, p ≤ 0.001). Men demonstrated stronger correlations between the overhead press and split jerk 1RM performances (r = 0.83 [0.73-0.90], p ≤ 0.001) compared with women (r = 0.56 [0.17-0.80], p = 0.008). These results provide evidence that 1RM performance of the overhead press and split jerk performance are highly related, highlighting the importance of upper-limb strength in the split jerk maximum performance.

Dysphagia ◽  
2021 ◽  
Sofie Albinsson ◽  
Lisa Tuomi ◽  
Christine Wennerås ◽  
Helen Larsson

The lack of a Swedish patient-reported outcome instrument for eosinophilic esophagitis (EoE) has limited the assessment of the disease. The aims of the study were to translate and validate the Eosinophilic Esophagitis Activity Index (EEsAI) to Swedish and to assess the symptom severity of patients with EoE compared to a nondysphagia control group. The EEsAI was translated and adapted to a Swedish cultural context (S-EEsAI) based on international guidelines. The S-EEsAI was validated using adult Swedish patients with EoE (n = 97) and an age- and sex-matched nondysphagia control group (n = 97). All participants completed the S-EEsAI, the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Oesophageal Module 18 (EORTC QLQ-OES18), and supplementary questions regarding feasibility and demographics. Reliability and validity of the S-EEsAI were evaluated by Cronbach’s alpha and Spearman correlation coefficients between the domains of the S-EEsAI and the EORTC QLQ-OES18. A test–retest analysis of 29 patients was evaluated through intraclass correlation coefficients. The S-EEsAI had sufficient reliability with Cronbach’s alpha values of 0.83 and 0.85 for the “visual dysphagia question” and the “avoidance, modification and slow eating score” domains, respectively. The test–retest reliability was sufficient, with good to excellent intraclass correlation coefficients (0.60–0.89). The S-EEsAI domains showed moderate correlation to 6/10 EORTC QLQ-OES18 domains, indicating adequate validity. The patient S-EEsAI results differed significantly from those of the nondysphagia controls (p < 0.001). The S-EEsAI appears to be a valid and reliable instrument for monitoring adult patients with EoE in Sweden.
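
Cronbach's alpha, the internal-consistency statistic reported above, follows the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of the total score). A minimal sketch, assuming complete item-level responses: the function name `cronbach_alpha` and the response matrix are hypothetical and are not S-EEsAI data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one row per respondent, one column per item.
    """
    k = len(scores[0])  # number of items in the scale
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 respondents, 3 items
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 4],
]
print(round(cronbach_alpha(responses), 3))  # 0.934
```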

Jens Sörensen ◽  
Jonny Nordström ◽  
Tomasz Baron ◽  
Stellan Mörner ◽  
Sven-Olof Granstam ◽  

Aim. To develop a method for diagnosing left ventricular (LV) hypertrophy from cardiac perfusion 15O-water positron emission tomography (PET). Methods. We retrospectively pooled data from 139 subjects in four research cohorts. LV remodeling patterns ranged from normal to severe eccentric and concentric hypertrophy. 15O-water PET scans (n = 197) were performed with three different PET devices. A low-end scanner (66 scans) was used for method development, and the remaining scans, acquired with newer devices, were used for a blinded evaluation. Dynamic data were converted into parametric images of perfusable tissue fraction for semi-automatic delineation of the LV wall and calculation of LV mass (LVM) and septal wall thickness (WT). LVM and WT from PET were compared to cardiac magnetic resonance (CMR, n = 47) and WT to 2D-echocardiography (2DE, n = 36). PET accuracy was tested using linear regression, Bland–Altman plots, and ROC curves. Observer reproducibility was evaluated using intraclass correlation coefficients. Results. High correlations were found in the blinded analyses (r ≥ 0.87, P < 0.0001 for all). AUC for detecting increased LVM and WT (> 12 mm and > 15 mm) was ≥ 0.95 (P < 0.0001 for all). Reproducibility was excellent (ICC ≥ 0.93, P < 0.0001). Conclusion. 15O-water PET might detect LV hypertrophy with high accuracy and precision.

Igor Junio de Oliveira Custódio ◽  
Gibson Moreira Praça ◽  
Leandro Vinhas de Paula ◽  
Sarah da Glória Teles Bredt ◽  
Fabio Yuzo Nakamura ◽  

This study aimed to analyze the intersession reliability of global positioning system (GPS)-based distances and accelerometer-based (acceleration) variables in small-sided soccer games (SSG) with and without the offside rule, as well as to compare variables between the tasks. Twenty-four high-level U-17 soccer athletes played 3 versus 3 (plus goalkeepers) SSG in two formats (with and without the offside rule). SSG were performed over eight consecutive weeks (4 weeks for each group), twice a week. The physical demands were recorded using a GPS with an embedded triaxial accelerometer. GPS-based variables (total distance, average speed, and distances covered at different speeds) and accelerometer-based variables (Player Load™, root mean square of the acceleration recorded in each movement axis, and the root mean square of resultant acceleration) were calculated. Results showed that the inclusion of the offside rule reduced the total distance covered (large effect) and the distances covered at moderate speed zones (7–12.9 km/h – moderate effect; 13–17.9 km/h – large effect). In both SSG formats, GPS-based variables presented good to excellent reliability (intraclass correlation coefficients – ICC > 0.62) and accelerometer-based variables presented excellent reliability (ICC values > 0.89). Based on the results of this study, the offside rule decreases the physical demand of 3 versus 3 SSG, and the physical demands required in these SSG present high intersession reliability.

2021 ◽  
Vol 11 (1) ◽  
Minjeong Kim ◽  
Ja Young Oh ◽  
Seon Ha Bae ◽  
Seung Hyeun Lee ◽  
Won Jun Lee ◽  

We evaluated the reliability and validity of the 5-scale grading system to interpret the point-of-care immunoassay for tear matrix metalloproteinase (MMP)-9. Six observers graded the red bands in photographs of the readout window of the MMP-9 immunoassay kit (InflammaDry) twice, 2 weeks apart, using the 5-scale grading system (i.e. grade 0–4). Interobserver and intraobserver reliability were evaluated using intraclass correlation coefficients. The interobserver agreements were analyzed according to the severity of tear MMP-9 expression. To validate the system, a concentration calibration curve was made using MMP-9 solutions with reference concentrations, then the distribution of MMP-9 concentrations was analyzed according to the 5-scale grading system. Both intraobserver and interobserver reliability were excellent. The readout grades were significantly correlated with the quantified colorimetric densities. The interobserver variance of readout grades had no correlation with the severity of the measured densities. The band density continued to increase up to a maximal concentration (i.e. 5000 ng/mL) according to the calibration curve. The difference of grades reflected the change of MMP-9 concentrations sensitively, especially between grade 2 and 4. Together, our data indicate that the subjective 5-scale grading system in the point-of-care MMP-9 immunoassay is an easy and reliable method with acceptable accuracy.

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3065
Ernest Kwesi Ofori ◽  
Shuaijie Wang ◽  
Tanvi Bhatt

Inertial sensors (IS) enable the kinematic analysis of human motion with fewer logistical limitations than the silver standard optoelectronic motion capture (MOCAP) system. However, there are no data on the validity of IS for perturbation training and during the performance of dance. The aim of the present study was to determine the concurrent validity of IS in the analysis of kinematic data during slip and trip-like perturbations and during the performance of dance. Seven IS and the MOCAP system were simultaneously used to capture the reactive response and dance movements of fifteen healthy young participants (Age: 18–35 years). Bland–Altman (BA) plots, root mean square errors (RMSE), Pearson’s correlation coefficients (R), and intraclass correlation coefficients (ICC) were used to compare kinematic variables of interest between the two systems for absolute equivalency and accuracy. Limits of agreement (LOA) of the BA plots ranged from −0.23 to 0.56 and −0.21 to 0.43 for slip and trip stability variables, respectively. The RMSE for slip and trip stabilities were from 0.11 to 0.20 and 0.11 to 0.16, respectively. For the joint mobility in dance, LOA varied from −6.98 to 18.54, while RMSE ranged from 1.90 to 13.06. Comparison of the IS and optoelectronic MOCAP systems for reactive balance and body segmental kinematics revealed that R varied from 0.59 to 0.81 and from 0.47 to 0.85, while ICC was from 0.50 to 0.72 and 0.45 to 0.84, respectively, for slip–trip perturbations and dance. The results indicated moderate to high concurrent validity between the IS and MOCAP systems. These results were consistent with results from similar studies. This suggests that IS are valid tools to quantitatively analyze reactive balance and mobility kinematics during slip–trip perturbation and the performance of dance at any location outside, including the laboratory, clinical and home settings.
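
The two agreement statistics named above can be sketched in a few lines. This assumes the conventional Bland–Altman definition of the limits of agreement (mean difference ± 1.96 × the sample standard deviation of the differences) and paired samples of equal length. The function names and the readings are hypothetical and are not measurements from the study.

```python
from math import sqrt
from statistics import mean, stdev


def limits_of_agreement(a, b):
    """Return (bias, lower LOA, upper LOA) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)   # mean difference between the two systems
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def rmse(a, b):
    """Root mean square error between paired measurements a and b."""
    return sqrt(mean([(x - y) ** 2 for x, y in zip(a, b)]))


# Hypothetical paired readings from two measurement systems
sensor = [1.0, 2.0, 3.0, 4.0]
reference = [1.1, 1.9, 3.2, 3.8]

bias, lower, upper = limits_of_agreement(sensor, reference)
print(round(bias, 3), round(lower, 3), round(upper, 3))
print(round(rmse(sensor, reference), 3))
```

The limits of agreement describe the range within which most individual differences are expected to fall, whereas RMSE summarizes the typical size of the error in a single number. This is why the two are often reported together.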

2021 ◽  
Vol 11 (1) ◽  
Pieter-Jan Verhelst ◽  
H. Matthews ◽  
L. Verstraete ◽  
F. Van der Cruyssen ◽  
D. Mulier ◽  

Automatic craniomaxillofacial (CMF) three-dimensional (3D) dense phenotyping promises quantification of the complete CMF shape compared to the limiting use of sparse landmarks in classical phenotyping. This study assesses the accuracy and reliability of this new approach on the human mandible. Classic and automatic phenotyping techniques were applied on 30 unaltered and 20 operated human mandibles. Seven observers indicated 26 anatomical landmarks on each mandible three times. All mandibles were subjected to three rounds of automatic phenotyping using Meshmonk. The toolbox performed non-rigid surface registration of a template mandibular mesh consisting of 17,415 quasi landmarks on each target mandible, and the quasi landmarks corresponding to the 26 anatomical locations of interest were identified. Repeated-measures reliability was assessed using root mean square (RMS) distances of repeated landmark indications to their centroid. Automatic phenotyping showed very low RMS distances, confirming excellent repeated-measures reliability. The average Euclidean distance between manual and corresponding automatic landmarks was 1.40 mm for the unaltered and 1.76 mm for the operated sample. Centroid sizes from the automatic and manual shape configurations were highly similar, with intraclass correlation coefficients (ICC) of > 0.99. Reproducibility coefficients for centroid size were < 2 mm, accounting for < 1% of the total variability of the centroid size of the mandibles in this sample. ICCs for the multivariate set of 325 interlandmark distances were all > 0.90, again indicating high similarity between shapes quantified by classic or automatic phenotyping. Combined, these findings established high accuracy and repeated-measures reliability of the automatic approach. 3D dense CMF phenotyping of the human mandible using the Meshmonk toolbox introduces a novel improvement in quantifying CMF shape.

Charles L. Nagle ◽  
Ivana Rehman

Listener-based ratings have become a prominent means of defining second language (L2) users’ global speaking ability. In most cases, local listeners are recruited to evaluate speech samples in person. However, in many teaching and research contexts, recruiting local listeners may not be possible or advisable. The goal of this study was to hone a reliable method of recruiting listeners to evaluate L2 speech samples online through Amazon Mechanical Turk (AMT) using a blocked rating design. Three groups of listeners were recruited: local laboratory raters and two AMT groups, one inclusive of the dialects to which L2 speakers had been exposed and another inclusive of a variety of dialects. Reliability was assessed using intraclass correlation coefficients, Rasch models, and mixed-effects models. Results indicate that online ratings can be highly reliable as long as appropriate quality control measures are adopted. The method and results can guide future work with online samples.

2018 ◽  
Vol 25 (3) ◽  
pp. 286-290 ◽  
Elif Bilgic ◽  
Madoka Takao ◽  
Pepa Kaneva ◽  
Satoshi Endo ◽  
Toshitatsu Takao ◽  

Background. Needs assessment identified a gap regarding laparoscopic suturing skills targeted in simulation. This study collected validity evidence for an advanced laparoscopic suturing task using an Endo Stitch™ device. Methods. Experienced (ES) and novice surgeons (NS) performed continuous suturing after watching an instructional video. Scores were based on time and accuracy, and Global Operative Assessment of Laparoscopic Surgery. Data are shown as medians [25th-75th percentiles] (ES vs NS). Interrater reliability was calculated using intraclass correlation coefficients (confidence interval). Results. Seventeen participants were enrolled. Experienced surgeons had significantly greater task (980 [964-999] vs 666 [391-711], P = .0035) and Global Operative Assessment of Laparoscopic Surgery scores (25 [24-25] vs 14 [12-17], P = .0029). Interrater reliabilities for time and accuracy were 1.0 and 0.9 (0.74-0.96), respectively. All experienced surgeons agreed that the task was relevant to practice. Conclusion. This study provides validity evidence for the task as a measure of laparoscopic suturing skill using an automated suturing device. It could help trainees acquire the skills they need to better prepare for clinical learning.
