Assessment of Reliability and Validity of the Cochlear Implant Skills Review: A New Measure to Evaluate Cochlear Implant Users' Device Skills and Knowledge

Purpose The Cochlear Implant Skills Review (CISR) was developed as a measure of cochlear implant (CI) users' skills and knowledge regarding device use. This study aimed to determine intra- and interrater reliability and agreement and establish construct validity for the CISR. Method In this study, the CISR was developed and administered to a cohort of 30 adult CI users. Participants included new CI users with less than 1 year of CI experience and experienced CI users with greater than 1 year of CI experience. The CISR administration required participants to demonstrate skills using the various features of their CI processors. Intra- and interrater reliability were assessed using intraclass correlation coefficients, agreement was assessed using Cohen's kappa, and construct validity was assessed by relating CISR performance to duration of CI use. Results Overall reliability for the entire instrument was 92.7%. Inter- and intrarater agreement were generally substantial or higher. Duration of CI use was a significant predictor of CISR performance. Conclusions The CISR is a reliable and valid assessment measure of device skills and knowledge for adult CI users. Clinicians can use this tool to evaluate areas of needed instruction and counseling and to assess users' skills over time.
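The agreement statistic named in the abstract above, Cohen's kappa, corrects the raw proportion of matching ratings for the agreement two raters would reach by chance. A minimal sketch follows, assuming each rater scores every item as a categorical label; the function name `cohen_kappa` and the pass/fail example ratings are hypothetical and are not taken from the CISR study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical pass (1) / fail (0) ratings for six skills.
first_rater = [1, 1, 1, 0, 0, 0]
second_rater = [1, 1, 0, 0, 0, 1]
kappa = cohen_kappa(first_rater, second_rater)
```

Here the raters agree on four of six items, but half of that agreement is expected by chance, so kappa is much lower than the raw 67% agreement.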

Interobserver Reliability Using the Phonetic Level Evaluation With Severely and Profoundly Hearing-Impaired Children

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3405.989 ◽

1991 ◽

Vol 34 (5) ◽

pp. 989-999 ◽

Cited By ~ 6

Author(s):

Stephanie Shaw ◽

Truman E. Coggins

Keyword(s):

Interrater Reliability ◽

Interobserver Reliability ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Hearing Impaired ◽

Intraclass Correlation Coefficients ◽

Assessment Measure ◽

Impaired Children ◽

Speech Assessment ◽

Hearing Impaired Children

This study examines whether observers reliably categorize selected speech production behaviors in hearing-impaired children. A group of experienced speech-language pathologists was trained to score the elicited imitations of 5 profoundly and 5 severely hearing-impaired subjects using the Phonetic Level Evaluation (Ling, 1976). Interrater reliability was calculated using intraclass correlation coefficients. Overall, the magnitude of the coefficients was found to be considerably below what would be accepted in published behavioral research. Failure to obtain acceptably high levels of reliability suggests that the Phonetic Level Evaluation may not yet be an accurate and objective speech assessment measure for hearing-impaired children.
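Interrater reliability of the kind reported above is often summarized with an intraclass correlation coefficient computed from a two-way analysis of variance. The abstract does not state which ICC form was used, so the sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, usually written ICC(2,1). The function name `icc_2_1` and the ratings matrix are illustrative and are not data from the study.

```python
def icc_2_1(scores):
    """ICC(2,1) for a subjects-by-raters matrix (list of equal-length rows)."""
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                 # between-subjects mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Illustrative ratings: six subjects (rows) scored by four raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
```

Because this form penalizes systematic differences between raters, a rater who scores consistently lower than the others pulls the coefficient down even when the rank order of subjects is preserved.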

Alberta Infant Motor Scale: Reliability and Validity When Used on Preterm Infants in Taiwan

Physical Therapy ◽

10.1093/ptj/80.2.168 ◽

2000 ◽

Vol 80 (2) ◽

pp. 168-178 ◽

Cited By ~ 76

Author(s):

Suh-Fang Jeng ◽

Kuo-Inn Tsou Yau ◽

Li-Chiou Chen ◽

Shu-Fang Hsiao

Keyword(s):

Preterm Infants ◽

Interrater Reliability ◽

Physical Therapist ◽

Intraclass Correlation ◽

Reliability And Validity ◽

Correlation Coefficients ◽

Intraclass Correlation Coefficients ◽

Scale Reliability ◽

Scale Scores ◽

Acceptable Reliability

Abstract Background and Purpose. The goal of this study was to examine the reliability and validity of measurements obtained with the Alberta Infant Motor Scale (AIMS) for evaluation of preterm infants in Taiwan. Subjects. Two independent groups of preterm infants were used to investigate the reliability (n=45) and validity (n=41) for the AIMS. Methods. In the reliability study, the AIMS was administered to the infants by a physical therapist, and infant performance was videotaped. The performance was then rescored by the same therapist and by 2 other therapists to examine the intrarater and interrater reliability. In the validity study, the AIMS and the Bayley Motor Scale were administered to the infants at 6 and 12 months of age to examine criterion-related validity. Results. Intraclass correlation coefficients (ICCs) for intrarater and interrater reliability of measurements obtained with the AIMS were high (ICC=.97–.99). The AIMS scores correlated with the Bayley Motor Scale scores at 6 and 12 months (r=.78 and .90), although the AIMS scores at 6 months were only moderately predictive of the motor function at 12 months (r=.56). Conclusion and Discussion. The results suggest that measurements obtained with the AIMS have acceptable reliability and concurrent validity but limited predictive value for evaluating preterm Taiwanese infants.

Reliability and Validity of Trunk Assessment for People With Multiple Sclerosis

Physical Therapy ◽

10.1093/ptj/86.1.66 ◽

2006 ◽

Vol 86 (1) ◽

pp. 66-76 ◽

Cited By ~ 35

Author(s):

Geert Verheyden ◽

Godelieve Nuyens ◽

Alice Nieuwboer ◽

Pol Van Asch ◽

Piet Ketelaer ◽

...

Keyword(s):

Multiple Sclerosis ◽

Construct Validity ◽

Intraclass Correlation ◽

Reliability And Validity ◽

Correlation Coefficients ◽

Weighted Kappa ◽

Altman Analysis ◽

Bland Altman Analysis ◽

Intraclass Correlation Coefficients ◽

Test Retest Reliability

Abstract Background and Purpose. Standardized scales are a prerequisite for rehabilitation and research. This study was designed to determine the reliability and validity of scores on items of the trunk assessment of the Melsbroek Disability Scoring Test (MDST) and Trunk Impairment Scale (TIS) in people with multiple sclerosis (MS). Subjects. Thirty people with MS participated in the study. Methods. Interrater and test-retest reliability and construct validity were assessed. Results. Kappa and weighted kappa values for the items of the trunk assessment of the MDST ranged from .74 to .95, and the kappa and weighted kappa values for the TIS items ranged from .46 to 1.00. Intraclass correlation coefficients for interrater and test-retest agreement were .93 and .92, respectively, for the trunk assessment of the MDST and .97 and .95, respectively, for the TIS. Bland-Altman analysis showed consistency of scores without observer bias. Construct validity was established. Discussion and Conclusion. The MDST and TIS provide reliable assessments of the trunk and are valid scales for measuring trunk performance in people with MS.

Assessment of reliability and validity of the 5-scale grading system of the point-of-care immunoassay for tear matrix metalloproteinase-9

Scientific Reports ◽

10.1038/s41598-021-92020-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Minjeong Kim ◽

Ja Young Oh ◽

Seon Ha Bae ◽

Seung Hyeun Lee ◽

Won Jun Lee ◽

...

Keyword(s):

Matrix Metalloproteinase ◽

Calibration Curve ◽

Point Of Care ◽

Interobserver Reliability ◽

Intraclass Correlation ◽

Reliability And Validity ◽

Correlation Coefficients ◽

Grading System ◽

Intraclass Correlation Coefficients ◽

The Difference

Abstract We evaluated the reliability and validity of the 5-scale grading system to interpret the point-of-care immunoassay for tear matrix metalloproteinase (MMP)-9. Six observers graded the red bands in photographs of the readout window of the MMP-9 immunoassay kit (InflammaDry) twice, with a 2-week interval, based on the 5-scale grading system (i.e. grade 0–4). Interobserver and intraobserver reliability were evaluated using intraclass correlation coefficients. The interobserver agreements were analyzed according to the severity of tear MMP-9 expression. To validate the system, a concentration calibration curve was made using MMP-9 solutions with reference concentrations, then the distribution of MMP-9 concentrations was analyzed according to the 5-scale grading system. Both intraobserver and interobserver reliability were excellent. The readout grades were significantly correlated with the quantified colorimetric densities. The interobserver variance of readout grades had no correlation with the severity of the measured densities. The band density continued to increase up to a maximal concentration (i.e. 5000 ng/mL) according to the calibration curve. The difference in grades sensitively reflected the change in MMP-9 concentrations, especially between grades 2 and 4. Together, our data indicate that the subjective 5-scale grading system in the point-of-care MMP-9 immunoassay is an easy and reliable method with acceptable accuracy.

Development and Initial Validation of a Project-Based Rubric to Assess the Systems-Based Practice Competency of Residents in the Clinical Chemistry Rotation of a Pathology Residency

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2013-0046-oa ◽

2014 ◽

Vol 138 (6) ◽

pp. 809-813

Author(s):

Carolyn R. Vitek ◽

Jane C. Dale ◽

Henry A. Homburger ◽

Sandra C. Bryant ◽

Amy K. Saenger ◽

...

Keyword(s):

Critical Thinking ◽

Interrater Reliability ◽

Clinical Chemistry ◽

Core Competencies ◽

Intraclass Correlation ◽

Reliability And Validity ◽

Correlation Coefficients ◽

Thinking Skills ◽

Project Evaluation ◽

Critical Thinking Skills

Context.— Systems-based practice (SBP) is 1 of 6 core competencies required in all resident training programs accredited by the Accreditation Council for Graduate Medical Education. Reliable methods of assessing resident competency in SBP have not been described in the medical literature. Objective.— To develop and validate an analytic grading rubric to assess pathology residents' analyses of SBP problems in clinical chemistry. Design.— Residents were assigned an SBP project based upon unmet clinical needs in the clinical chemistry laboratories. Using an iterative method, we created an analytic grading rubric based on critical thinking principles. Four faculty raters used the SBP project evaluation rubric to independently grade 11 residents' projects during their clinical chemistry rotations. Interrater reliability and Cronbach α were calculated to determine the reliability and validity of the rubric. Project mean scores and range were also assessed to determine whether the rubric differentiated resident critical thinking skills related to the SBP projects. Results.— Overall project scores ranged from 6.56 to 16.50 out of a possible 20 points. Cronbach α ranged from 0.91 to 0.96, indicating that the 4 rubric categories were internally consistent without significant overlap. Intraclass correlation coefficients ranged from 0.63 to 0.81, indicating moderate to strong interrater reliability. Conclusions.— We report development and statistical analysis of a novel SBP project evaluation rubric. The results indicate the rubric can be used to reliably assess pathology residents' critical thinking skills in SBP.
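The internal-consistency statistic reported above, Cronbach α, compares the sum of the individual item variances with the variance of the total score. A minimal sketch follows, assuming one row of scores per rated project and one column per rubric category; the function name `cronbach_alpha` and the example scores are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows, one row per subject, one column per item."""
    k = len(scores[0])  # number of items (for example, rubric categories)
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two subjects")
    # Sample variance of each item across subjects.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each subject's total score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: four projects rated on three rubric categories.
rubric_scores = [
    [1, 2, 1],
    [2, 3, 2],
    [3, 4, 3],
    [4, 5, 4],
]
alpha = cronbach_alpha(rubric_scores)
```

In this contrived example every category rises in lockstep across projects, so alpha reaches its maximum; real rubric data would give a lower value.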

Development of a Model for the Acquisition and Assessment of Advanced Laparoscopic Suturing Skills Using an Automated Device

Surgical Innovation ◽

10.1177/1553350618764221 ◽

2018 ◽

Vol 25 (3) ◽

pp. 286-290 ◽

Cited By ~ 2

Author(s):

Elif Bilgic ◽

Madoka Takao ◽

Pepa Kaneva ◽

Satoshi Endo ◽

Toshitatsu Takao ◽

...

Keyword(s):

Laparoscopic Surgery ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Instructional Video ◽

Validity Evidence ◽

Laparoscopic Suturing ◽

Intraclass Correlation Coefficients ◽

Operative Assessment ◽

Suturing Skills

Background. Needs assessment identified a gap regarding laparoscopic suturing skills targeted in simulation. This study collected validity evidence for an advanced laparoscopic suturing task using an Endo Stitch™ device. Methods. Experienced (ES) and novice surgeons (NS) performed continuous suturing after watching an instructional video. Scores were based on time and accuracy, and Global Operative Assessment of Laparoscopic Surgery. Data are shown as medians [25th-75th percentiles] (ES vs NS). Interrater reliability was calculated using intraclass correlation coefficients (confidence interval). Results. Seventeen participants were enrolled. Experienced surgeons had significantly greater task (980 [964-999] vs 666 [391-711], P = .0035) and Global Operative Assessment of Laparoscopic Surgery scores (25 [24-25] vs 14 [12-17], P = .0029). Interrater reliability for time and accuracy was 1.0 and 0.9 (0.74-0.96), respectively. All experienced surgeons agreed that the task was relevant to practice. Conclusion. This study provides validity evidence for the task as a measure of laparoscopic suturing skill using an automated suturing device. It could help trainees acquire the skills they need to better prepare for clinical learning.

Using Smart Bracelets to Assess Heart Rate Among Students During Physical Education Lessons: Feasibility, Reliability, and Validity Study (Preprint)

10.2196/preprints.17699 ◽

2020 ◽

Author(s):

Jiangang Sun ◽

Yang Liu

Keyword(s):

Heart Rate ◽

Physical Education ◽

Intraclass Correlation ◽

Reliability And Validity ◽

Correlation Coefficients ◽

Left Wrist ◽

Intraclass Correlation Coefficients ◽

Estimate Reliability ◽

Left And Right ◽

The Right

BACKGROUND An increasing number of wrist-worn wearables are being examined in the context of health care. However, studies of their use during physical education (PE) lessons remain scarce. OBJECTIVE We aim to examine the reliability and validity of the Fizzo Smart Bracelet (Fizzo) in measuring heart rate (HR) in the laboratory and during PE lessons. METHODS In Study 1, 11 healthy subjects (median age 22.0 years, IQR 3.75 years) twice completed a test that involved running on a treadmill at 6 km/h for 12 minutes and 12 km/h for 5 minutes. During the test, participants wore two Fizzo devices, one each on their left and right wrists, to measure their HR. At the same time, the Polar Team2 Pro (Polar), which is worn on the chest, was used as the standard. In Study 2, we went to 10 schools and measured the HR of 24 students (median age 14.0 years, IQR 2.0 years) during PE lessons. During the PE lessons, each student wore a Polar device on their chest and a Fizzo on their right wrist to measure HR data. At the end of the PE lessons, the students and their teachers completed a questionnaire where they assessed the feasibility of Fizzo. The measurements taken by the left wrist Fizzo and the right wrist Fizzo were compared to estimate reliability, while the Fizzo measurements were compared to the Polar measurements to estimate validity. To measure reliability, intraclass correlation coefficients (ICC), mean difference (MD), standard error of measurement (SEM), and mean absolute percentage errors (MAPE) were used. To measure validity, ICC, limits of agreement (LOA), and MAPE were calculated and Bland-Altman plots were constructed. Percentage values were used to estimate the feasibility of Fizzo. RESULTS The Fizzo showed excellent reliability and validity in the laboratory and moderate validity in a PE lesson setting. In Study 1, reliability was excellent (ICC>0.97; MD<0.7; SEM<0.56; MAPE<1.45%). The validity, as determined by comparing the left wrist and right wrist Fizzo measurements with the Polar measurements, was excellent (ICC>0.98; MAPE<1.85%). Bland-Altman plots showed a strong correlation between left wrist Fizzo measurements (bias=0.48, LOA=–3.94 to 4.89 beats per minute) and right wrist Fizzo measurements (bias=0.56, LOA=–4.60 to 5.72 beats per minute). In Study 2, the validity of the Fizzo was lower than in Study 1 but still moderate (ICC>0.70; MAPE<9.0%). The Fizzo showed broader LOA in the Bland-Altman plots during the PE lessons (bias=–2.60, LOA=–38.89 to 33.69 beats per minute). Most participants considered the Fizzo very comfortable and easy to put on. All teachers thought the Fizzo was helpful. CONCLUSIONS When participants ran on a treadmill in the laboratory, both left and right wrist Fizzo measurements were accurate. The validity of the Fizzo was lower in PE lessons but still reached a moderate level. The Fizzo is feasible for use during PE lessons.
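The bias, limits of agreement (LOA), and MAPE figures quoted above can be illustrated with a short calculation. The sketch below assumes paired heart-rate readings from a test device and a reference device, and the common convention of placing the 95% LOA at the bias plus or minus 1.96 standard deviations of the paired differences; the abstract does not state the exact convention used. The function names and the four readings are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(device, reference):
    """Return (bias, lower LOA, upper LOA) for paired measurements."""
    differences = [d - r for d, r in zip(device, reference)]
    bias = mean(differences)                 # mean difference, device minus reference
    half_width = 1.96 * stdev(differences)   # assumes roughly normal differences
    return bias, bias - half_width, bias + half_width


def mape(device, reference):
    """Mean absolute percentage error of the device relative to the reference."""
    errors = [abs(d - r) / r for d, r in zip(device, reference)]
    return 100 * mean(errors)


# Hypothetical heart-rate readings in beats per minute.
wrist_readings = [100, 102, 98, 101]
chest_readings = [99, 100, 99, 100]
bias, lower_loa, upper_loa = bland_altman(wrist_readings, chest_readings)
error_percent = mape(wrist_readings, chest_readings)
```

A small bias with wide limits, as in the PE lesson figures above, means the device is unbiased on average while individual readings can still be far from the reference.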

Interrater Reliability of the Handwriting Speed Test

The Occupational Therapy Journal of Research ◽

10.1177/153944929701700404 ◽

1997 ◽

Vol 17 (4) ◽

pp. 280-287 ◽

Cited By ~ 8

Author(s):

Margaret Wallen ◽

Mary-Ann Bonney ◽

Lyn Lennox

Keyword(s):

Interrater Reliability ◽

Test Development ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Objective Evaluation ◽

Research Tool ◽

School Students ◽

Speed Test ◽

Intraclass Correlation Coefficients ◽

Handwriting Speed

The Handwriting Speed Test (HST), a standardized, norm-referenced test, was developed to provide an objective evaluation of the handwriting speed of school students from approximately 8 to 18 years of age. Part of the test development involved an examination of interrater reliability. Two raters scored 165 (13%) of the total 1292 handwriting samples. Using intraclass correlation coefficients, the interrater reliability was found to be excellent (ICC=1.00, P<0.0001). The process of examining interrater reliability resulted in modification to the scoring criteria of the test. Excellent interrater reliability provides support for the HST as a valuable clinical and research tool.

Validation of an Evidence-Based Medicine Critically Appraised Topic Presentation Evaluation Tool (EBM C-PET)

Journal of Graduate Medical Education ◽

10.4300/jgme-d-12-00049.1 ◽

2013 ◽

Vol 5 (2) ◽

pp. 252-256 ◽

Cited By ~ 2

Author(s):

Hans B. Kersten ◽

John G. Frohna ◽

Erin L. Giudice

Keyword(s):

Internal Consistency ◽

Evidence Based Medicine ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Evidence Based ◽

Evaluation Tool ◽

Intraclass Correlation Coefficients ◽

Excellent Internal Consistency ◽

Based Medicine

Abstract Background Competence in evidence-based medicine (EBM) is an important clinical skill. Pediatrics residents are expected to acquire competence in EBM during their education, yet few validated tools exist to assess residents' EBM skills. Objective We sought to develop a reliable tool to evaluate residents' EBM skills in the critical appraisal of a research article, the development of a written EBM critically appraised topic (CAT) synopsis, and a presentation of the findings to colleagues. Methods Instrument development used a modified Delphi technique. We defined the skills to be assessed while reviewing (1) a written CAT synopsis and (2) a resident's EBM presentation. We defined skill levels for each item using the Dreyfus and Dreyfus model of skill development and created behavioral anchors using a frame-of-reference training technique to describe performance for each skill level. We evaluated the assessment instrument's psychometric properties, including internal consistency and interrater reliability. Results The EBM Critically Appraised Topic Presentation Evaluation Tool (EBM C-PET) is composed of 14 items that assess residents' EBM and global presentation skills. Resident presentations (N = 27) and the corresponding written CAT synopses were evaluated using the EBM C-PET. The EBM C-PET had excellent internal consistency (Cronbach α = 0.94). Intraclass correlation coefficients were used to assess interrater reliability. Intraclass correlation coefficients for individual items ranged from 0.31 to 0.74; the average intraclass correlation coefficient for the 14 items was 0.67. Conclusions We identified essential components of an assessment tool for an EBM CAT synopsis and presentation with excellent internal consistency and a good level of interrater reliability across 3 different institutions. The EBM C-PET is a reliable tool to document resident competence in higher-level EBM skills.

Developing and validating a methodology for crowdsourcing L2 speech ratings in Amazon Mechanical Turk

Journal of Second Language Pronunciation ◽

10.1075/jslp.18016.nag ◽

2019 ◽

Vol 5 (2) ◽

pp. 294-323 ◽

Cited By ~ 3

Author(s):

Charles Nagle

Keyword(s):

Interrater Reliability ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Spanish Speakers ◽

Mechanical Turk ◽

Amazon Mechanical Turk ◽

Native Spanish Speakers ◽

Intraclass Correlation Coefficients ◽

Future Data ◽

Rater Severity

Abstract Researchers have increasingly turned to Amazon Mechanical Turk (AMT) to crowdsource speech data, predominantly in English. Although AMT and similar platforms are well positioned to enhance the state of the art in L2 research, it is unclear if crowdsourced L2 speech ratings are reliable, particularly in languages other than English. The present study describes the development and deployment of an AMT task to crowdsource comprehensibility, fluency, and accentedness ratings for L2 Spanish speech samples. Fifty-four AMT workers who were native Spanish speakers from 11 countries participated in the ratings. Intraclass correlation coefficients were used to estimate group-level interrater reliability, and Rasch analyses were undertaken to examine individual differences in rater severity and fit. Excellent reliability was observed for the comprehensibility and fluency ratings, but indices were slightly lower for accentedness, leading to recommendations to improve the task for future data collection.
