Doing L2 Speech Research Online: Why and How to Collect Online Ratings Data

Abstract Listener-based ratings have become a prominent means of defining second language (L2) users’ global speaking ability. In most cases, local listeners are recruited to evaluate speech samples in person. However, in many teaching and research contexts, recruiting local listeners may not be possible or advisable. The goal of this study was to hone a reliable method of recruiting listeners to evaluate L2 speech samples online through Amazon Mechanical Turk (AMT) using a blocked rating design. Three groups of listeners were recruited: local laboratory raters and two AMT groups, one inclusive of the dialects to which L2 speakers had been exposed and another inclusive of a variety of dialects. Reliability was assessed using intraclass correlation coefficients, Rasch models, and mixed-effects models. Results indicate that online ratings can be highly reliable as long as appropriate quality control measures are adopted. The method and results can guide future work with online samples.

Developing and validating a methodology for crowdsourcing L2 speech ratings in Amazon Mechanical Turk

Journal of Second Language Pronunciation, 2019, Vol 5 (2), pp. 294–323. DOI: 10.1075/jslp.18016.nag. Cited by ~3.

Author(s): Charles Nagle

Keyword(s): Interrater Reliability; Intraclass Correlation; Correlation Coefficients; Spanish Speakers; Mechanical Turk; Amazon Mechanical Turk; Native Spanish Speakers; Intraclass Correlation Coefficients; Future Data; Rater Severity

Abstract Researchers have increasingly turned to Amazon Mechanical Turk (AMT) to crowdsource speech data, predominantly in English. Although AMT and similar platforms are well positioned to enhance the state of the art in L2 research, it is unclear if crowdsourced L2 speech ratings are reliable, particularly in languages other than English. The present study describes the development and deployment of an AMT task to crowdsource comprehensibility, fluency, and accentedness ratings for L2 Spanish speech samples. Fifty-four AMT workers who were native Spanish speakers from 11 countries participated in the ratings. Intraclass correlation coefficients were used to estimate group-level interrater reliability, and Rasch analyses were undertaken to examine individual differences in rater severity and fit. Excellent reliability was observed for the comprehensibility and fluency ratings, but indices were slightly lower for accentedness, leading to recommendations to improve the task for future data collection.
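
The abstract does not specify which ICC form was used. As a minimal sketch, assuming a fully crossed design (every rater scores every sample, no missing values) and the two-way random-effects model of Shrout and Fleiss (1979), group-level reliability corresponds to the average-measures coefficient ICC(2,k) and single-rater reliability to ICC(2,1). The function name `icc2` and the example table (the classic six-target, four-judge data from Shrout and Fleiss) are illustrative, not taken from this study.

```python
from statistics import mean


def icc2(ratings):
    """Return (ICC(2,1), ICC(2,k)) for a targets-by-raters matrix.

    ratings: list of rows, one row per speech sample, one column per rater.
    Assumes a fully crossed design with no missing values.
    """
    n = len(ratings)     # number of targets (speech samples)
    k = len(ratings[0])  # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: targets, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


# Shrout & Fleiss (1979) example: 6 targets rated by 4 judges.
data = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
single, average = icc2(data)
print(f"ICC(2,1) = {single:.2f}, ICC(2,k) = {average:.2f}")  # ICC(2,1) = 0.29, ICC(2,k) = 0.62
```

Because ICC(2,k) averages over raters, it rises with the number of raters, which is one reason group-level coefficients can be excellent even when individual raters differ in severity.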

Reliability of Mechanical Trunk Responses During Known and Unknown Trunk Perturbations

Journal of Applied Biomechanics, 2016, Vol 32 (1), pp. 86–92. DOI: 10.1123/jab.2015-0120.

Author(s): Markus D. Jakobsen; Mikkel Brandt; Emil Sundstrup; Kenneth Jay; Per Aagaard; ...

Keyword(s): Mechanical Response; Intraclass Correlation; Stopping Time; Correlation Coefficients; Low Back; Work Related; Intraclass Correlation Coefficients; Low Back Injuries; Loading And Unloading; Future Work

This study evaluates the between-day reliability of a newly developed trunk perturbation test and compares mechanical responses during known and unknown conditions. Mechanical trunk responses were measured in 17 female subjects during unloading and loading perturbations of the abdomen (A: preloaded abdomen condition) and low back (B: preloaded back condition). The loading perturbation increased the preload from a 5.5 kg to a 10.9 kg pull on the trunk, whereas the unloading perturbation decreased the pull from 5.5 kg to 0.1 kg. A sequence of loading (known), unloading (known), and randomized loading/unloading (unknown) perturbations was performed for A and B. Between-day reliability of stopping time, trunk displacement, and velocity was quantified using intraclass correlation coefficients (ICCs). ICCs were good to excellent for all loading and unloading measures during the known (0.70–0.98) and unknown (0.64–0.94) perturbations of A and B. In general, larger trunk displacements were seen after the unknown perturbations than after the known perturbations. The method may be used as a diagnostic tool for screening workers who are at risk of future work-related low back injuries.

Confidence Intervals and F Tests for Intraclass Correlation Coefficients Based on Three-Way Mixed Effects Models

Journal of Educational and Behavioral Statistics, 2011, Vol 36 (5), pp. 638–671. DOI: 10.3102/1076998610381399. Cited by ~12.

Author(s): Hong Zhou; Paige Muellerleile; Debra Ingram; Seok P. Wong

Keyword(s): Confidence Intervals; Intraclass Correlation; Correlation Coefficients; Mixed Effects; Mixed Effects Models; Intraclass Correlation Coefficients

A Correction Equation for Jump Height Measured Using the Just Jump System

International Journal of Sports Physiology and Performance, 2016, Vol 11 (4), pp. 555–557. DOI: 10.1123/ijspp.2015-0194. Cited by ~12.

Author(s): John J. McMahon; Paul A. Jones; Paul Comfort

Keyword(s): Intraclass Correlation; Correlation Coefficients; Force Platform; Coefficient Of Determination; Jump Height; Validity And Reliability; Height Measurement; Intraclass Correlation Coefficients; Future Work; Criterion Method

Purpose: To determine the concurrent validity and reliability of the popular Just Jump system (JJS) for determining jump height and, if necessary, to provide a correction equation for future reference.

Methods: Eighteen male college athletes performed 3 bilateral countermovement jumps (CMJs) on 2 JJSs (alternative method) placed on top of a force platform (criterion method). Two JJSs were used to establish consistency between systems. Jump height was calculated from flight time obtained from the JJS and the force platform.

Results: Intraclass correlation coefficients (ICCs) demonstrated excellent within-session reliability of the CMJ height measurement derived from both the JJS (ICC = .96, P < .001) and the force platform (ICC = .96, P < .001). Dependent t tests revealed that the JJS yielded a significantly greater CMJ height than the force platform (0.46 ± 0.09 m vs 0.33 ± 0.08 m; P < .001, Cohen d = 1.39, power = 1.00). There was, however, an excellent relationship between CMJ heights derived from the JJS and the force platform (r = .998, P < .001, power = 1.00), with a coefficient of determination (R²) of .995. The following correction equation was therefore produced: Criterion jump height = (0.8747 × alternative jump height) − 0.0666.

Conclusions: The JJS provides a reliable but overestimated measure of jump height. Practitioners who use the JJS in future work are therefore advised to apply the correction equation presented in this study to the resulting jump-height values.
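
The correction equation is a simple linear map from the JJS scale to the force-platform scale. A minimal sketch, assuming heights are expressed in metres as in the reported means (the abstract does not state the equation's units explicitly); the function name is hypothetical.

```python
def corrected_jump_height(jjs_height_m: float) -> float:
    """Convert a Just Jump system (JJS) jump height to the force-platform scale.

    Applies the published equation: criterion = 0.8747 * alternative - 0.0666.
    Units are assumed to be metres.
    """
    return 0.8747 * jjs_height_m - 0.0666


# Applying the equation to the mean JJS height reported in the abstract.
print(round(corrected_jump_height(0.46), 3))  # 0.336
```

Because the equation is linear, correcting the group mean gives the same value as averaging the corrected individual heights: the 0.46 m JJS mean maps to about 0.34 m, close to the 0.33 m force-platform mean.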

Interobserver Reliability Using the Phonetic Level Evaluation With Severely and Profoundly Hearing-Impaired Children

Journal of Speech Language and Hearing Research, 1991, Vol 34 (5), pp. 989–999. DOI: 10.1044/jshr.3405.989. Cited by ~6.

Author(s): Stephanie Shaw; Truman E. Coggins

Keyword(s): Interrater Reliability; Interobserver Reliability; Intraclass Correlation; Correlation Coefficients; Hearing Impaired; Intraclass Correlation Coefficients; Assessment Measure; Impaired Children; Speech Assessment; Hearing Impaired Children

This study examines whether observers reliably categorize selected speech production behaviors in hearing-impaired children. A group of experienced speech-language pathologists was trained to score the elicited imitations of 5 profoundly and 5 severely hearing-impaired subjects using the Phonetic Level Evaluation (Ling, 1976). Interrater reliability was calculated using intraclass correlation coefficients. Overall, the magnitude of the coefficients was found to be considerably below what would be accepted in published behavioral research. Failure to obtain acceptably high levels of reliability suggests that the Phonetic Level Evaluation may not yet be an accurate and objective speech assessment measure for hearing-impaired children.

Is there a relationship between the overhead press and split jerk maximum performance? Influence of sex

International Journal of Sports Science & Coaching, 2021, pp. 174795412110204. DOI: 10.1177/17479541211020452.

Author(s): Marcos A Soriano; G Gregory Haff; Paul Comfort; Francisco J Amaro-Gahete; Antonio Torres-González; ...

Keyword(s): Confidence Intervals; Body Mass; Upper Limb; High Reliability; Intraclass Correlation; Correlation Coefficients; Training Experience; Maximum Performance; Repetition Maximum; Intraclass Correlation Coefficients

The aims of this study were to (I) determine the differences and relationship between overhead press and split jerk performance in athletes involved in weightlifting training, and (II) explore the magnitude of these differences in one-repetition maximum (1RM) performance between sexes. Sixty-one men (age: 30.4 ± 6.7 years; height: 1.8 ± 0.5 m; body mass: 82.5 ± 8.5 kg; weightlifting training experience: 3.7 ± 3.5 years) and 21 women (age: 29.5 ± 5.2 years; height: 1.7 ± 0.5 m; body mass: 62.6 ± 5.7 kg; weightlifting training experience: 3.0 ± 1.5 years) participated. The 1RM performance of the overhead press and split jerk was assessed for all participants, with the overhead press assessed on two occasions to determine between-session reliability. The intraclass correlation coefficient (ICC) and its 95% confidence interval showed high reliability for the overhead press (ICC = 0.98 [0.97–0.99]). A very strong correlation and significant differences were found between overhead press and split jerk 1RM performance for all participants (r = 0.90 [0.85–0.93]; overhead press: 60.2 ± 18.3 kg; split jerk: 95.7 ± 29.3 kg; p ≤ 0.001). Men demonstrated a stronger correlation between overhead press and split jerk 1RM performance (r = 0.83 [0.73–0.90], p ≤ 0.001) than women (r = 0.56 [0.17–0.80], p = 0.008). These results provide evidence that 1RM performances in the overhead press and split jerk are highly related, highlighting the importance of upper-limb strength for maximal split jerk performance.
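
The abstract does not say how the confidence intervals for r were obtained. One standard approach is Fisher's z-transformation, sketched below under that assumption, with n = 82 for the pooled sample (61 men + 21 women). With the abstract's correlations and sample sizes, the computed intervals closely match those reported above. The function name is hypothetical.

```python
import math


def pearson_r_ci(r: float, n: int, z_crit: float = 1.959964) -> tuple[float, float]:
    """Approximate 95% CI for a Pearson correlation via Fisher's z-transformation."""
    z = math.atanh(r)              # Fisher z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)    # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Correlations and sample sizes taken from the abstract.
for label, r, n in [("all", 0.90, 82), ("men", 0.83, 61), ("women", 0.56, 21)]:
    lo, hi = pearson_r_ci(r, n)
    print(f"{label}: r = {r:.2f} [{lo:.2f}, {hi:.2f}]")
```

The interval is built on the z scale, where the sampling distribution is approximately normal, and then back-transformed with tanh, which is why it is asymmetric around r.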

Patient-Reported Dysphagia in Adults with Eosinophilic Esophagitis: Translation and Validation of the Swedish Eosinophilic Esophagitis Activity Index

Dysphagia, 2021. DOI: 10.1007/s00455-021-10277-5.

Author(s): Sofie Albinsson; Lisa Tuomi; Christine Wennerås; Helen Larsson

Keyword(s): Eosinophilic Esophagitis; Activity Index; Intraclass Correlation; Correlation Coefficients; Control Group; Cronbach's Alpha; Intraclass Correlation Coefficients; Patient Reported; EORTC QLQ

Abstract The lack of a Swedish patient-reported outcome instrument for eosinophilic esophagitis (EoE) has limited the assessment of the disease. The aims of the study were to translate and validate the Eosinophilic Esophagitis Activity Index (EEsAI) into Swedish and to assess the symptom severity of patients with EoE compared with a nondysphagia control group. The EEsAI was translated and adapted to a Swedish cultural context (S-EEsAI) based on international guidelines. The S-EEsAI was validated using adult Swedish patients with EoE (n = 97) and an age- and sex-matched nondysphagia control group (n = 97). All participants completed the S-EEsAI, the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Oesophageal Module 18 (EORTC QLQ-OES18), and supplementary questions regarding feasibility and demographics. Reliability and validity of the S-EEsAI were evaluated using Cronbach’s alpha and Spearman correlation coefficients between the domains of the S-EEsAI and the EORTC QLQ-OES18. Test–retest reliability was evaluated in 29 patients using intraclass correlation coefficients. The S-EEsAI had sufficient reliability, with Cronbach’s alpha values of 0.83 and 0.85 for the “visual dysphagia question” and the “avoidance, modification and slow eating score” domains, respectively. Test–retest reliability was sufficient, with good to excellent intraclass correlation coefficients (0.60–0.89). The S-EEsAI domains showed moderate correlation with 6 of the 10 EORTC QLQ-OES18 domains, indicating adequate validity. The patients’ S-EEsAI results differed significantly from those of the nondysphagia controls (p < 0.001). The S-EEsAI appears to be a valid and reliable instrument for monitoring adult patients with EoE in Sweden.
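
Cronbach's alpha for a k-item domain is α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ), where σ²ᵢ are the item variances and σ²ₜ is the variance of respondents' total scores. A minimal sketch follows; the function name and the toy item scores are hypothetical and are not S-EEsAI data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per item; every list holds the same respondents
    in the same order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical scores: 2 items answered by 4 respondents.
print(round(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]), 2))  # 0.75
```

Sample and population variances give the same alpha as long as they are used consistently, because the n − 1 (or n) factors cancel in the ratio.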

Diagnosis of left ventricular hypertrophy using non-ECG-gated ¹⁵O-water PET

Journal of Nuclear Cardiology, 2021. DOI: 10.1007/s12350-021-02734-3.

Author(s): Jens Sörensen; Jonny Nordström; Tomasz Baron; Stellan Mörner; Sven-Olof Granstam; ...

Keyword(s): Method Development; Intraclass Correlation; Correlation Coefficients; ROC Curves; Left Ventricular; Intraclass Correlation Coefficients; Concentric Hypertrophy; 2D Echocardiography; Positron Emission; Septal Wall Thickness

Aim: To develop a method for diagnosing left ventricular (LV) hypertrophy from cardiac perfusion ¹⁵O-water positron emission tomography (PET).

Methods: We retrospectively pooled data from 139 subjects in four research cohorts. LV remodeling patterns ranged from normal to severe eccentric and concentric hypertrophy. ¹⁵O-water PET scans (n = 197) were performed with three different PET devices. A low-end scanner (66 scans) was used for method development, and the remaining scans, acquired with newer devices, were used for a blinded evaluation. Dynamic data were converted into parametric images of perfusable tissue fraction for semi-automatic delineation of the LV wall and calculation of LV mass (LVM) and septal wall thickness (WT). LVM and WT from PET were compared with cardiac magnetic resonance (CMR, n = 47), and WT with 2D echocardiography (2DE, n = 36). PET accuracy was tested using linear regression, Bland–Altman plots, and ROC curves. Observer reproducibility was evaluated using intraclass correlation coefficients.

Results: High correlations were found in the blinded analyses (r ≥ 0.87, P < 0.0001 for all). The AUC for detecting increased LVM and WT (> 12 mm and > 15 mm) was ≥ 0.95 (P < 0.0001 for all). Reproducibility was excellent (ICC ≥ 0.93, P < 0.0001).

Conclusion: ¹⁵O-water PET might detect LV hypertrophy with high accuracy and precision.
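
A Bland–Altman analysis plots, for each subject, the difference between two methods against their mean, and summarizes agreement as the mean difference (bias) with 95% limits of agreement at bias ± 1.96 SD of the differences. The sketch below computes those summary numbers only; the function name and the paired LV mass values are hypothetical, not data from the study.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) of agreement between paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired LV mass values (g) from PET and CMR.
pet = [152.0, 180.0, 121.0, 210.0, 98.0]
cmr = [148.0, 185.0, 119.0, 204.0, 101.0]
bias, lower, upper = bland_altman(pet, cmr)
print(f"bias = {bias:.1f} g, limits of agreement = [{lower:.1f}, {upper:.1f}] g")
```

Unlike a correlation coefficient, which can be high even when one method is systematically offset, the bias and limits of agreement express disagreement directly in measurement units.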

Intersession reliability of GPS-based and accelerometer-based physical variables in small-sided games with and without the offside rule

Proceedings of the Institution of Mechanical Engineers, Part P: Journal of Sports Engineering and Technology, 2021, pp. 175433712098764. DOI: 10.1177/1754337120987646.

Author(s): Igor Junio de Oliveira Custódio; Gibson Moreira Praça; Leandro Vinhas de Paula; Sarah da Glória Teles Bredt; Fabio Yuzo Nakamura; ...

Keyword(s): Root Mean Square; Global Positioning System; Intraclass Correlation; Correlation Coefficients; Mean Square; Physical Demands; Intraclass Correlation Coefficients; Total Distance; Global Positioning; High Level

This study aimed to analyze the intersession reliability of global positioning system (GPS)-based distances and accelerometer-based (acceleration) variables in small-sided soccer games (SSG) with and without the offside rule, and to compare these variables between the tasks. Twenty-four high-level U-17 soccer athletes played 3 versus 3 (plus goalkeepers) SSG in two formats (with and without the offside rule). SSG were performed twice a week over eight consecutive weeks (4 weeks for each group). Physical demands were recorded using a GPS unit with an embedded triaxial accelerometer. GPS-based variables (total distance, average speed, and distances covered at different speeds) and accelerometer-based variables (Player Load™, the root mean square of the acceleration recorded in each movement axis, and the root mean square of the resultant acceleration) were calculated. Results showed that including the offside rule reduced the total distance covered (large effect) and the distances covered in moderate speed zones (7–12.9 km/h: moderate effect; 13–17.9 km/h: large effect). In both SSG formats, GPS-based variables showed good to excellent reliability (intraclass correlation coefficient [ICC] > 0.62) and accelerometer-based variables showed excellent reliability (ICC > 0.89). These results indicate that the offside rule decreases the physical demands of 3 versus 3 SSG and that the physical demands of these SSG have high intersession reliability.
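
The root mean square (RMS) of a signal is the square root of the mean of its squared samples; for the resultant, each sample's magnitude √(ax² + ay² + az²) is computed first. A minimal sketch, assuming raw triaxial samples are already available as lists (sampling rate, filtering, and units are not specified in the abstract; three-argument `math.hypot` needs Python 3.8+). The function names and sample values are hypothetical, and Player Load™, a proprietary metric, is not reproduced here.

```python
import math


def rms(values):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def resultant_rms(ax, ay, az):
    """RMS of the resultant (vector magnitude) of triaxial acceleration samples."""
    resultant = [math.hypot(x, y, z) for x, y, z in zip(ax, ay, az)]
    return rms(resultant)


# Hypothetical accelerometer samples (in g) for the three axes.
ax = [0.3, -0.1, 0.4, 0.2]
ay = [0.9, 1.1, 1.0, 0.8]
az = [0.1, 0.2, -0.3, 0.0]
print("per-axis RMS:", [round(rms(a), 3) for a in (ax, ay, az)])
print("resultant RMS:", round(resultant_rms(ax, ay, az), 3))
```

Because squaring discards sign, RMS captures the overall intensity of movement along an axis regardless of direction, which a plain mean of signed accelerations would partly cancel out.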

Assessment of reliability and validity of the 5-scale grading system of the point-of-care immunoassay for tear matrix metalloproteinase-9

Scientific Reports, 2021, Vol 11 (1). DOI: 10.1038/s41598-021-92020-6.

Author(s): Minjeong Kim; Ja Young Oh; Seon Ha Bae; Seung Hyeun Lee; Won Jun Lee; ...

Keyword(s): Matrix Metalloproteinase; Calibration Curve; Point Of Care; Interobserver Reliability; Intraclass Correlation; Reliability And Validity; Correlation Coefficients; Grading System; Intraclass Correlation Coefficients; The Difference

Abstract We evaluated the reliability and validity of a 5-scale grading system for interpreting the point-of-care immunoassay for tear matrix metalloproteinase (MMP)-9. Six observers graded the red bands in photographs of the readout window of the MMP-9 immunoassay kit (InflammaDry) twice, 2 weeks apart, using the 5-scale grading system (i.e. grades 0–4). Interobserver and intraobserver reliability were evaluated using intraclass correlation coefficients. Interobserver agreement was analyzed according to the severity of tear MMP-9 expression. To validate the system, a concentration calibration curve was constructed using MMP-9 solutions of reference concentrations, and the distribution of MMP-9 concentrations was then analyzed according to the 5-scale grading system. Both intraobserver and interobserver reliability were excellent. The readout grades were significantly correlated with the quantified colorimetric densities. The interobserver variance of readout grades was not correlated with the severity of the measured densities. According to the calibration curve, band density continued to increase up to the maximal concentration (i.e. 5000 ng/mL). Differences in grade sensitively reflected changes in MMP-9 concentration, especially between grades 2 and 4. Together, our data indicate that the subjective 5-scale grading system for the point-of-care MMP-9 immunoassay is an easy and reliable method with acceptable accuracy.