DOING L2 SPEECH RESEARCH ONLINE: WHY AND HOW TO COLLECT ONLINE RATINGS DATA

Author(s):  
Charles L. Nagle ◽  
Ivana Rehman

Abstract Listener-based ratings have become a prominent means of defining second language (L2) users’ global speaking ability. In most cases, local listeners are recruited to evaluate speech samples in person. However, in many teaching and research contexts, recruiting local listeners may not be possible or advisable. The goal of this study was to hone a reliable method of recruiting listeners to evaluate L2 speech samples online through Amazon Mechanical Turk (AMT) using a blocked rating design. Three groups of listeners were recruited: local laboratory raters and two AMT groups, one inclusive of the dialects to which L2 speakers had been exposed and another inclusive of a variety of dialects. Reliability was assessed using intraclass correlation coefficients, Rasch models, and mixed-effects models. Results indicate that online ratings can be highly reliable as long as appropriate quality control measures are adopted. The method and results can guide future work with online samples.

2019 ◽  
Vol 5 (2) ◽  
pp. 294-323 ◽  
Author(s):  
Charles Nagle

Abstract Researchers have increasingly turned to Amazon Mechanical Turk (AMT) to crowdsource speech data, predominantly in English. Although AMT and similar platforms are well positioned to enhance the state of the art in L2 research, it is unclear if crowdsourced L2 speech ratings are reliable, particularly in languages other than English. The present study describes the development and deployment of an AMT task to crowdsource comprehensibility, fluency, and accentedness ratings for L2 Spanish speech samples. Fifty-four AMT workers who were native Spanish speakers from 11 countries participated in the ratings. Intraclass correlation coefficients were used to estimate group-level interrater reliability, and Rasch analyses were undertaken to examine individual differences in rater severity and fit. Excellent reliability was observed for the comprehensibility and fluency ratings, but indices were slightly lower for accentedness, leading to recommendations to improve the task for future data collection.
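As a rough illustration of the group-level interrater reliability index mentioned above, the sketch below computes two-way random-effects intraclass correlations from a complete samples-by-raters table. The abstract does not say which ICC form was used, so the choice of ICC(2,1) and ICC(2,k), the function name `icc_two_way_random`, and the ratings table are all assumptions made for illustration; none of the numbers come from the study.

```python
def icc_two_way_random(ratings):
    """Return (ICC(2,1), ICC(2,k)) for a complete targets-by-raters table.

    ratings: list of rows, one row per rated speech sample (target),
             one column per rater, with no missing cells.
    """
    n = len(ratings)       # number of targets
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-targets mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


# Illustrative ratings only: 6 samples scored by 4 raters.
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
single, average = icc_two_way_random(toy)
print(round(single, 2), round(average, 2))
```

The average-measures value is the one that speaks to the reliability of the pooled rating across the whole listener group, whereas the single-measures value describes a typical individual rater.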


2016 ◽  
Vol 32 (1) ◽  
pp. 86-92
Author(s):  
Markus D. Jakobsen ◽  
Mikkel Brandt ◽  
Emil Sundstrup ◽  
Kenneth Jay ◽  
Per Aagaard ◽  
...  

This study evaluates the between-day reliability of a newly developed trunk perturbation test and compares mechanical response during known and unknown conditions. Mechanical trunk responses were measured in 17 female subjects during unloading and loading perturbations of the abdomen (A: preloaded abdomen condition) and low back (B: preloaded back condition). The loading perturbation increased the preload from 5.5 kg to a 10.9 kg pull on the trunk whereas the unloading perturbation decreased the pull from 5.5 kg to 0.1 kg. A sequence of loading (known), unloading (known), and randomized loading/unloading (unknown) perturbations was performed for A and B. Between-day reliability of stopping time, trunk displacement, and velocity was quantified using intraclass correlation coefficients (ICCs). ICCs were good to excellent for all loading and unloading measures during the known (0.70–0.98) and unknown (0.64–0.94) perturbations of A and B. In general, larger trunk displacements were seen after the unknown perturbations compared with the known perturbations. The method may be used as a diagnostic tool for screening workers who are at risk of future work-related low back injuries.


2016 ◽  
Vol 11 (4) ◽  
pp. 555-557 ◽  
Author(s):  
John J. McMahon ◽  
Paul A. Jones ◽  
Paul Comfort

Purpose: To determine the concurrent validity and reliability of the popular Just Jump system (JJS) for determining jump height and, if necessary, provide a correction equation for future reference. Methods: Eighteen male college athletes performed 3 bilateral countermovement jumps (CMJs) on 2 JJSs (alternative method) that were placed on top of a force platform (criterion method). Two JJSs were used to establish consistency between systems. Jump height was calculated from flight time obtained from the JJS and force platform. Results: Intraclass correlation coefficients (ICCs) demonstrated excellent within-session reliability of the CMJ height measurement derived from both the JJS (ICC = .96, P < .001) and the force platform (ICC = .96, P < .001). Dependent t tests revealed that the JJS yielded a significantly greater CMJ jump height (0.46 ± 0.09 m vs 0.33 ± 0.08 m) than the force platform (P < .001, Cohen d = 1.39, power = 1.00). There was, however, an excellent relationship between CMJ heights derived from the JJS and force platform (r = .998, P < .001, power = 1.00), with a coefficient of determination (R²) of .995. Therefore, the following correction equation was produced: Criterion jump height = (0.8747 × alternative jump height) – 0.0666. Conclusions: The JJS provides a reliable but overestimated measure of jump height. It is suggested, therefore, that practitioners who use the JJS as part of future work apply the correction equation presented in this study to resultant jump-height values.
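The correction equation can be applied as in the minimal sketch below. Two things are assumptions rather than statements from the abstract: heights are taken to be in metres (the equation then maps the reported JJS mean of 0.46 m to roughly 0.34 m, close to the reported force-platform mean of 0.33 m), and the flight-time conversion uses the standard projectile relation h = g·t²/8, which the abstract does not spell out. The function names are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed standard value)


def height_from_flight_time(flight_time_s):
    """Jump height in metres from flight time, using h = g * t^2 / 8.

    Assumes take-off and landing occur at the same body position.
    """
    return G * flight_time_s ** 2 / 8


def corrected_height(jjs_height_m):
    """Apply the abstract's correction equation to a JJS height in metres.

    Criterion jump height = (0.8747 x alternative jump height) - 0.0666
    """
    return 0.8747 * jjs_height_m - 0.0666


# Reported mean JJS height of 0.46 m, corrected towards the criterion scale.
print(round(corrected_height(0.46), 3))

# Hypothetical flight time of 0.5 s, converted to a height first.
print(round(height_from_flight_time(0.5), 3))
```

Because the equation was fitted to 18 male college athletes, applying it outside the range of heights observed in that sample would be an extrapolation.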


1991 ◽  
Vol 34 (5) ◽  
pp. 989-999 ◽  
Author(s):  
Stephanie Shaw ◽  
Truman E. Coggins

This study examines whether observers reliably categorize selected speech production behaviors in hearing-impaired children. A group of experienced speech-language pathologists was trained to score the elicited imitations of 5 profoundly and 5 severely hearing-impaired subjects using the Phonetic Level Evaluation (Ling, 1976). Interrater reliability was calculated using intraclass correlation coefficients. Overall, the magnitude of the coefficients was found to be considerably below what would be accepted in published behavioral research. Failure to obtain acceptably high levels of reliability suggests that the Phonetic Level Evaluation may not yet be an accurate and objective speech assessment measure for hearing-impaired children.


Author(s):  
Marcos A Soriano ◽  
G Gregory Haff ◽  
Paul Comfort ◽  
Francisco J Amaro-Gahete ◽  
Antonio Torres-González ◽  
...  

The aims of this study were to (I) determine the differences and relationship between overhead press and split jerk performance in athletes involved in weightlifting training, and (II) explore the magnitude of these differences in one-repetition maximum (1RM) performance between sexes. Sixty-one men (age: 30.4 ± 6.7 yrs; height: 1.8 ± 0.5 m; body mass: 82.5 ± 8.5 kg; weightlifting training experience: 3.7 ± 3.5 yrs) and 21 women (age: 29.5 ± 5.2 yrs; height: 1.7 ± 0.5 m; body mass: 62.6 ± 5.7 kg; weightlifting training experience: 3.0 ± 1.5 yrs) participated. The 1RM performance of the overhead press and split jerk was assessed for all participants, with the overhead press assessed on two occasions to determine between-session reliability. The intraclass correlation coefficient (ICC) and 95% confidence interval showed high reliability for the overhead press (ICC = 0.98 [0.97 – 0.99]). A very strong correlation and a significant difference were found between the overhead press and split jerk 1RM performances for all participants (r = 0.90 [0.85 – 0.93]; 60.2 ± 18.3 kg vs 95.7 ± 29.3 kg; p ≤ 0.001). Men demonstrated stronger correlations between the overhead press and split jerk 1RM performances (r = 0.83 [0.73-0.90], p ≤ 0.001) compared with women (r = 0.56 [0.17-0.80], p = 0.008). These results provide evidence that 1RM performance of the overhead press and split jerk performance are highly related, highlighting the importance of upper-limb strength in the split jerk maximum performance.
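The bracketed intervals around the correlations can be approximated with the Fisher z transformation, sketched below. The abstract does not say how its intervals were computed, so this method is an assumption; it happens to land within about 0.01 of the reported bounds for both the men (n = 61) and the women (n = 21). The function name `pearson_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z.

    r: sample correlation, n: sample size (must exceed 3).
    """
    z = math.atanh(r)                      # Fisher z transform of r
    se = 1 / math.sqrt(n - 3)              # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


for label, r, n in [("men", 0.83, 61), ("women", 0.56, 21)]:
    lo, hi = pearson_ci(r, n)
    print(label, round(lo, 2), round(hi, 2))
```

The much wider interval for the women follows directly from the smaller sample: the standard error on the z scale depends only on n, not on r.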


Dysphagia ◽  
2021 ◽  
Author(s):  
Sofie Albinsson ◽  
Lisa Tuomi ◽  
Christine Wennerås ◽  
Helen Larsson

Abstract The lack of a Swedish patient-reported outcome instrument for eosinophilic esophagitis (EoE) has limited the assessment of the disease. The aims of the study were to translate and validate the Eosinophilic Esophagitis Activity Index (EEsAI) to Swedish and to assess the symptom severity of patients with EoE compared to a nondysphagia control group. The EEsAI was translated and adapted to a Swedish cultural context (S-EEsAI) based on international guidelines. The S-EEsAI was validated using adult Swedish patients with EoE (n = 97) and an age- and sex-matched nondysphagia control group (n = 97). All participants completed the S-EEsAI, the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Oesophageal Module 18 (EORTC QLQ-OES18), and supplementary questions regarding feasibility and demographics. Reliability and validity of the S-EEsAI were evaluated by Cronbach’s alpha and Spearman correlation coefficients between the domains of the S-EEsAI and the EORTC QLQ-OES18. A test–retest analysis of 29 patients was evaluated through intraclass correlation coefficients. The S-EEsAI had sufficient reliability with Cronbach’s alpha values of 0.83 and 0.85 for the “visual dysphagia question” and the “avoidance, modification and slow eating score” domains, respectively. The test–retest reliability was sufficient, with good to excellent intraclass correlation coefficients (0.60–0.89). The S-EEsAI domains showed moderate correlation to 6/10 EORTC QLQ-OES18 domains, indicating adequate validity. The patient S-EEsAI results differed significantly from those of the nondysphagia controls (p < 0.001). The S-EEsAI appears to be a valid and reliable instrument for monitoring adult patients with EoE in Sweden.
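For readers unfamiliar with the internal-consistency index used here, the sketch below computes Cronbach's alpha from item scores using the usual variance-ratio formula. The function name `cronbach_alpha` and the three-item, five-respondent data set are hypothetical and are not taken from the S-EEsAI study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    items: one inner list of scores per item, with respondents in the
           same order in every inner list.
    """
    k = len(items)
    # Total score per respondent, summed across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Made-up scores: 3 items answered by 5 respondents.
toy_items = [
    [2, 3, 4, 4, 5],
    [3, 3, 4, 5, 5],
    [2, 4, 3, 5, 4],
]
print(round(cronbach_alpha(toy_items), 2))
```

Alpha rises when items covary strongly relative to their individual variances, which is why it is read as evidence that the items in a domain measure the same underlying construct.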


Author(s):  
Jens Sörensen ◽  
Jonny Nordström ◽  
Tomasz Baron ◽  
Stellan Mörner ◽  
Sven-Olof Granstam ◽  
...  

Abstract Aim To develop a method for diagnosing left ventricular (LV) hypertrophy from cardiac perfusion 15O-water positron emission tomography (PET). Methods We retrospectively pooled data from 139 subjects in four research cohorts. LV remodeling patterns ranged from normal to severe eccentric and concentric hypertrophy. 15O-water PET scans (n = 197) were performed with three different PET devices. A low-end scanner (66 scans) was used for method development, and the remaining scans, acquired with newer devices, were used for a blinded evaluation. Dynamic data were converted into parametric images of perfusable tissue fraction for semi-automatic delineation of the LV wall and calculation of LV mass (LVM) and septal wall thickness (WT). LVM and WT from PET were compared to cardiac magnetic resonance (CMR, n = 47) and WT to 2D-echocardiography (2DE, n = 36). PET accuracy was tested using linear regression, Bland–Altman plots, and ROC curves. Observer reproducibility was evaluated using intraclass correlation coefficients. Results High correlations were found in the blinded analyses (r ≥ 0.87, P < 0.0001 for all). AUC for detecting increased LVM and WT (> 12 mm and > 15 mm) was ≥ 0.95 (P < 0.0001 for all). Reproducibility was excellent (ICC ≥ 0.93, P < 0.0001). Conclusion 15O-water PET might detect LV hypertrophy with high accuracy and precision.


Author(s):  
Igor Junio de Oliveira Custódio ◽  
Gibson Moreira Praça ◽  
Leandro Vinhas de Paula ◽  
Sarah da Glória Teles Bredt ◽  
Fabio Yuzo Nakamura ◽  
...  

This study aimed to analyze the intersession reliability of global positioning system (GPS)-based distance variables and accelerometer-based variables in small-sided soccer games (SSG) with and without the offside rule, as well as to compare these variables between the two formats. Twenty-four high-level U-17 soccer athletes played 3 versus 3 (plus goalkeepers) SSG in two formats (with and without the offside rule). SSG were performed over eight consecutive weeks (4 weeks for each group), twice a week. The physical demands were recorded using a GPS with an embedded triaxial accelerometer. GPS-based variables (total distance, average speed, and distances covered at different speeds) and accelerometer-based variables (Player Load™, root mean square of the acceleration recorded in each movement axis, and the root mean square of resultant acceleration) were calculated. Results showed that the inclusion of the offside rule reduced the total distance covered (large effect) and the distances covered at moderate speed zones (7–12.9 km/h – moderate effect; 13–17.9 km/h – large effect). In both SSG formats, GPS-based variables presented good to excellent reliability (intraclass correlation coefficients – ICC > 0.62) and accelerometer-based variables presented excellent reliability (ICC values > 0.89). Based on the results of this study, the offside rule decreases the physical demand of 3 versus 3 SSG and the physical demands required in these SSG present high intersession reliability.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Minjeong Kim ◽  
Ja Young Oh ◽  
Seon Ha Bae ◽  
Seung Hyeun Lee ◽  
Won Jun Lee ◽  
...  

Abstract We evaluated the reliability and validity of the 5-scale grading system to interpret the point-of-care immunoassay for tear matrix metalloproteinase (MMP)-9. Six observers graded red bands in photographs of the readout window of the MMP-9 immunoassay kit (InflammaDry) twice, at a 2-week interval, based on the 5-scale grading system (i.e. grade 0–4). Interobserver and intraobserver reliability were evaluated using intraclass correlation coefficients. The interobserver agreements were analyzed according to the severity of tear MMP-9 expression. To validate the system, a concentration calibration curve was made using MMP-9 solutions with reference concentrations, then the distribution of MMP-9 concentrations was analyzed according to the 5-scale grading system. Both intraobserver and interobserver reliability were excellent. The readout grades were significantly correlated with the quantified colorimetric densities. The interobserver variance of readout grades had no correlation with the severity of the measured densities. The band density continued to increase up to a maximal concentration (i.e. 5000 ng/mL) according to the calibration curve. The difference of grades reflected the change of MMP-9 concentrations sensitively, especially between grades 2 and 4. Together, our data indicate that the subjective 5-scale grading system in the point-of-care MMP-9 immunoassay is an easy and reliable method with acceptable accuracy.

