Interobserver Reliability Using the Phonetic Level Evaluation With Severely and Profoundly Hearing-Impaired Children

1991 ◽  
Vol 34 (5) ◽  
pp. 989-999 ◽  
Author(s):  
Stephanie Shaw ◽  
Truman E. Coggins

This study examines whether observers reliably categorize selected speech production behaviors in hearing-impaired children. A group of experienced speech-language pathologists was trained to score the elicited imitations of 5 profoundly and 5 severely hearing-impaired subjects using the Phonetic Level Evaluation (Ling, 1976). Interrater reliability was calculated using intraclass correlation coefficients. Overall, the magnitude of the coefficients was found to be considerably below what would be accepted in published behavioral research. Failure to obtain acceptably high levels of reliability suggests that the Phonetic Level Evaluation may not yet be an accurate and objective speech assessment measure for hearing-impaired children.
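For readers unfamiliar with the statistic, the sketch below shows one common way an intraclass correlation coefficient can be computed from a subjects-by-raters table. It assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1); the abstract does not say which form the authors used, so this is an illustration and not a reconstruction of their analysis. The function name `icc_2_1` and the sample data are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per subject, one column per rater.
    Assumption: this ICC form is chosen for illustration only.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-subjects mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: 6 subjects (rows) rated by 4 raters (columns)
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(scores), 2))
```

Because this form penalizes systematic differences between raters, a panel whose members rank subjects similarly but score at different overall levels can still produce a low coefficient.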

2021 ◽  
pp. 1-23
Author(s):  
Kara Vasil ◽  
Jessica Lewis ◽  
Christin Ray ◽  
Jodi Baxter ◽  
Claire Bernstein ◽  
...  

Purpose: The Cochlear Implant Skills Review (CISR) was developed as a measure of cochlear implant (CI) users' skills and knowledge regarding device use. This study aimed to determine intra- and interrater reliability and agreement and establish construct validity for the CISR. Method: In this study, the CISR was developed and administered to a cohort of 30 adult CI users. Participants included new CI users with less than 1 year of CI experience and experienced CI users with greater than 1 year of CI experience. The CISR administration required participants to demonstrate skills using the various features of their CI processors. Intra- and interrater reliability were assessed using intraclass correlation coefficients, agreement was assessed using Cohen's kappa, and construct validity was assessed by relating CISR performance to duration of CI use. Results: Overall reliability for the entire instrument was 92.7%. Inter- and intrarater agreement were generally substantial or higher. Duration of CI use was a significant predictor of CISR performance. Conclusions: The CISR is a reliable and valid assessment measure of device skills and knowledge for adult CI users. Clinicians can use this tool to evaluate areas of needed instruction and counseling and to assess users' skills over time.
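The agreement statistic named in this abstract, Cohen's kappa, corrects the raw proportion of matching ratings for the agreement expected by chance. The sketch below is a minimal unweighted version for two raters; the function name `cohens_kappa` and the pass/fail data are hypothetical, and the abstract does not state whether the authors used a weighted or unweighted variant.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters over the same items.

    Assumption: categories are nominal, so every disagreement counts equally.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)

    # Observed proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal category frequencies
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2

    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical pass (1) / fail (0) scores on four skills from two raters
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
print(kappa)
```

Here the raters match on 3 of 4 items (0.75) while chance agreement is 0.5, so kappa is 0.5. Under the widely used Landis and Koch descriptors, values from 0.61 to 0.80 are labeled "substantial".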


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Minjeong Kim ◽  
Ja Young Oh ◽  
Seon Ha Bae ◽  
Seung Hyeun Lee ◽  
Won Jun Lee ◽  
...  

We evaluated the reliability and validity of a 5-scale grading system for interpreting the point-of-care immunoassay for tear matrix metalloproteinase (MMP)-9. Six observers graded the red bands in photographs of the readout window of the MMP-9 immunoassay kit (InflammaDry) twice, 2 weeks apart, using the 5-scale grading system (i.e., grades 0–4). Interobserver and intraobserver reliability were evaluated using intraclass correlation coefficients. Interobserver agreement was analyzed according to the severity of tear MMP-9 expression. To validate the system, a concentration calibration curve was made using MMP-9 solutions with reference concentrations, and the distribution of MMP-9 concentrations was then analyzed according to the 5-scale grading system. Both intraobserver and interobserver reliability were excellent. The readout grades were significantly correlated with the quantified colorimetric densities. The interobserver variance of readout grades had no correlation with the severity of the measured densities. The band density continued to increase up to a maximal concentration (i.e., 5000 ng/mL) according to the calibration curve. Differences in grade sensitively reflected changes in MMP-9 concentration, especially between grades 2 and 4. Together, our data indicate that the subjective 5-scale grading system in the point-of-care MMP-9 immunoassay is an easy and reliable method with acceptable accuracy.


2018 ◽  
Vol 25 (3) ◽  
pp. 286-290 ◽  
Author(s):  
Elif Bilgic ◽  
Madoka Takao ◽  
Pepa Kaneva ◽  
Satoshi Endo ◽  
Toshitatsu Takao ◽  
...  

Background. A needs assessment identified a gap in the laparoscopic suturing skills targeted in simulation. This study collected validity evidence for an advanced laparoscopic suturing task using an Endo Stitch™ device. Methods. Experienced surgeons (ES) and novice surgeons (NS) performed continuous suturing after watching an instructional video. Performance was scored on time and accuracy and with the Global Operative Assessment of Laparoscopic Surgery. Data are shown as medians [25th-75th percentiles] (ES vs NS). Interrater reliability was calculated using intraclass correlation coefficients (confidence interval). Results. Seventeen participants were enrolled. Experienced surgeons had significantly greater task (980 [964-999] vs 666 [391-711], P = .0035) and Global Operative Assessment of Laparoscopic Surgery scores (25 [24-25] vs 14 [12-17], P = .0029). Interrater reliability for time and accuracy was 1.0 and 0.9 (0.74-0.96), respectively. All experienced surgeons agreed that the task was relevant to practice. Conclusion. This study provides validity evidence for the task as a measure of laparoscopic suturing skill using an automated suturing device. It could help trainees acquire the skills they need to better prepare for clinical learning.


Author(s):  
Ian S. MacLean ◽  
Taylor M. Southworth ◽  
Ian J. Dempsey ◽  
Neal B. Naveen ◽  
Hailey P. Huddleston ◽  
...  

The tibial tubercle–trochlear groove (TT-TG) distance is currently utilized to evaluate knee alignment in patients with patellar instability. Sagittal plane pathology measured by the sagittal tibial tubercle–trochlear groove (sTT-TG) distance has been described in instability but may also be important to consider in patients with cartilage injury. This study aims to (1) describe interobserver reliability of the sTT-TG distance and (2) characterize the change in the sTT-TG distance with respect to changing knee flexion angles. In this cadaveric study, six nonpaired cadaveric knees underwent magnetic resonance imaging (MRI) studies at each of the following degrees of knee flexion: −5, 0, 5, 10, 15, and 20. The sTT-TG distance was measured on the axial T2 sequence. Four reviewers measured this distance for each cadaver at each flexion angle. Intraclass correlation coefficients were calculated to determine interobserver reliability and reproducibility of the sTT-TG measurement. Analysis of variance (ANOVA) and Friedman tests with Bonferroni correction were performed for each cadaver to compare sTT-TG distances at each flexion angle. Significance was defined as p < 0.05. Interobserver reliability of the sTT-TG distance was excellent, with all intraclass correlation coefficients >0.9. The tibial tubercle becomes progressively more posterior in relation to the trochlear groove (more negative sTT-TG distance) with increasing knee flexion. The sTT-TG distance is a measurement that is reliable between attending surgeons and across training levels. The sTT-TG distance is affected by small changes in knee flexion angle. Awareness of knee flexion angle on MRI is important when this measurement is utilized by surgeons.


2019 ◽  
Vol 40 (6) ◽  
pp. 720-726 ◽  
Author(s):  
Jian Zhong Zhang ◽  
François Lintz ◽  
Alessio Bernasconi ◽  
Shu Zhang ◽  

Background: Weightbearing computed tomography (WBCT) is a useful tool for the assessment of hindfoot alignment (HA). Foot ankle offset (FAO) is a recently introduced parameter, determined from WBCT images using semiautomatic software. The aim of this study was to determine the clinical relevance and reproducibility of FAO for the evaluation of HA. Methods: A prospective comparative study was performed on consecutive patients requiring bilateral WBCT between September 2017 and April 2018. Based on the clinical assessment of HA, patients were divided into 3 groups: (1) normal alignment group (G1), (2) valgus (G2), and (3) varus (G3). FAO and long axial view (HACT) were measured on WBCT images, and the groups were compared. The reproducibility of FAO and HACT was determined through intraclass correlation coefficients (ICCs). Regression analysis was performed to investigate the correlation between the 2 methods. Overall, 249 feet (126 patients) were included (G1 = 115, G2 = 78, and G3 = 56 feet). Results: The mean values for FAO and HACT were 1.2% ± 2.8% and 3.9 ± 3.1, respectively, in G1; 8.1% ± 3.7% and 9.7 ± 4.9 in G2; and −6.6% ± 4.8% and −8.2 ± 6.6 in G3. Intra- and interobserver reliability was 0.987 and 0.988 for FAO and 0.949 and 0.949 for HACT, respectively. There was a good linear correlation between HACT and FAO (R² = 0.744), with a regression slope of 1.064. Conclusions: WBCT was a useful method for the characterization of HA. FAO was reproducible and correlated well with physical examination. Level of Evidence: Level II, prospective comparative study.


1997 ◽  
Vol 17 (4) ◽  
pp. 280-287 ◽  
Author(s):  
Margaret Wallen ◽  
Mary-Ann Bonney ◽  
Lyn Lennox

The Handwriting Speed Test (HST), a standardized, norm-referenced test, was developed to provide an objective evaluation of the handwriting speed of school students from approximately 8 to 18 years of age. Part of the test development involved an examination of interrater reliability. Two raters scored 165 (13%) of the total 1292 handwriting samples. Using intraclass correlation coefficients, the interrater reliability was found to be excellent (ICC=1.00, P<0.0001). The process of examining interrater reliability resulted in modification to the scoring criteria of the test. Excellent interrater reliability provides support for the HST as a valuable clinical and research tool.


2013 ◽  
Vol 5 (2) ◽  
pp. 252-256 ◽  
Author(s):  
Hans B. Kersten ◽  
John G. Frohna ◽  
Erin L. Giudice

Background: Competence in evidence-based medicine (EBM) is an important clinical skill. Pediatrics residents are expected to acquire competence in EBM during their education, yet few validated tools exist to assess residents' EBM skills. Objective: We sought to develop a reliable tool to evaluate residents' EBM skills in the critical appraisal of a research article, the development of a written EBM critically appraised topic (CAT) synopsis, and a presentation of the findings to colleagues. Methods: Instrument development used a modified Delphi technique. We defined the skills to be assessed while reviewing (1) a written CAT synopsis and (2) a resident's EBM presentation. We defined skill levels for each item using the Dreyfus and Dreyfus model of skill development and created behavioral anchors using a frame-of-reference training technique to describe performance for each skill level. We evaluated the assessment instrument's psychometric properties, including internal consistency and interrater reliability. Results: The EBM Critically Appraised Topic Presentation Evaluation Tool (EBM C-PET) is composed of 14 items that assess residents' EBM and global presentation skills. Resident presentations (N = 27) and the corresponding written CAT synopses were evaluated using the EBM C-PET. The EBM C-PET had excellent internal consistency (Cronbach α = 0.94). Intraclass correlation coefficients were used to assess interrater reliability. Intraclass correlation coefficients for individual items ranged from 0.31 to 0.74; the average intraclass correlation coefficient across the 14 items was 0.67. Conclusions: We identified essential components of an assessment tool for an EBM CAT synopsis and presentation with excellent internal consistency and a good level of interrater reliability across 3 different institutions. The EBM C-PET is a reliable tool to document resident competence in higher-level EBM skills.


2019 ◽  
Vol 5 (2) ◽  
pp. 294-323 ◽  
Author(s):  
Charles Nagle

Researchers have increasingly turned to Amazon Mechanical Turk (AMT) to crowdsource speech data, predominantly in English. Although AMT and similar platforms are well positioned to enhance the state of the art in L2 research, it is unclear if crowdsourced L2 speech ratings are reliable, particularly in languages other than English. The present study describes the development and deployment of an AMT task to crowdsource comprehensibility, fluency, and accentedness ratings for L2 Spanish speech samples. Fifty-four AMT workers who were native Spanish speakers from 11 countries participated in the ratings. Intraclass correlation coefficients were used to estimate group-level interrater reliability, and Rasch analyses were undertaken to examine individual differences in rater severity and fit. Excellent reliability was observed for the comprehensibility and fluency ratings, but indices were slightly lower for accentedness, leading to recommendations to improve the task for future data collection.


2002 ◽  
Vol 96 (5) ◽  
pp. 1129-1139 ◽  
Author(s):  
Jason Slagle ◽  
Matthew B. Weinger ◽  
My-Than T. Dinh ◽  
Vanessa V. Brumer ◽  
Kevin Williams

Background: Task analysis may be useful for assessing how anesthesiologists alter their behavior in response to different clinical situations. In this study, the authors examined the intraobserver and interobserver reliability of an established task analysis methodology. Methods: During 20 routine anesthetic procedures, a trained observer sat in the operating room and categorized in real time the anesthetist's activities into 38 task categories. Two weeks later, the same observer performed task analysis from videotapes obtained intraoperatively. A different observer performed task analysis from the videotapes on two separate occasions. Data were analyzed for percent of time spent on each task category, average task duration, and number of task occurrences. Rater reliability and agreement were assessed using intraclass correlation coefficients. Results: Intrarater reliability was generally good for categorization of percent time on task and task occurrence (mean intraclass correlation coefficients of 0.84-0.97). There was a comparably high concordance between real-time and video analyses. Interrater reliability was generally good for percent time and task occurrence measurements. However, the interrater reliability of the task duration metric was unsatisfactory, primarily because of the technique used to capture multitasking. Conclusions: A task analysis technique used in anesthesia research for several decades showed good intrarater reliability. Off-line analysis of videotapes is a viable alternative to real-time data collection. Acceptable interrater reliability requires the use of strict task definitions, sophisticated software, and rigorous observer training. New techniques must be developed to more accurately capture multitasking. Substantial effort is required to conduct task analyses that will have sufficient reliability for purposes of research or clinical evaluation.


2000 ◽  
Vol 80 (2) ◽  
pp. 168-178 ◽  
Author(s):  
Suh-Fang Jeng ◽  
Kuo-Inn Tsou Yau ◽  
Li-Chiou Chen ◽  
Shu-Fang Hsiao

Background and Purpose. The goal of this study was to examine the reliability and validity of measurements obtained with the Alberta Infant Motor Scale (AIMS) for evaluation of preterm infants in Taiwan. Subjects. Two independent groups of preterm infants were used to investigate the reliability (n=45) and validity (n=41) of the AIMS. Methods. In the reliability study, the AIMS was administered to the infants by a physical therapist, and infant performance was videotaped. The performance was then rescored by the same therapist and by 2 other therapists to examine the intrarater and interrater reliability. In the validity study, the AIMS and the Bayley Motor Scale were administered to the infants at 6 and 12 months of age to examine criterion-related validity. Results. Intraclass correlation coefficients (ICCs) for intrarater and interrater reliability of measurements obtained with the AIMS were high (ICC=.97–.99). The AIMS scores correlated with the Bayley Motor Scale scores at 6 and 12 months (r=.78 and .90), although the AIMS scores at 6 months were only moderately predictive of the motor function at 12 months (r=.56). Conclusion and Discussion. The results suggest that measurements obtained with the AIMS have acceptable reliability and concurrent validity but limited predictive value for evaluating preterm Taiwanese infants.

