Artifact reduction of coaxial needles in magnetic resonance imaging-guided abdominal interventions at 1.5 T: a phantom study

Abstract. Needle artifacts pose a major limitation for MRI-guided interventions, as they impact the visually perceived needle size and needle-to-target distance. The objective of this agar liver phantom study was to establish an experimental basis to understand and reduce needle artifact formation during MRI-guided abdominal interventions. Using a vendor-specific prototype fluoroscopic T1-weighted gradient echo sequence with real-time multiplanar acquisition at 1.5 T, the influence of 6 parameters (flip angle, bandwidth, matrix, slice thickness, readout direction, intervention angle relative to B0) on artifact formation of 4 different MR-compatible coaxial needles (Nitinol, 16G–22G) was investigated. As one parameter was modified, the others remained constant. For each individual parameter variation, 2 independent and blinded readers rated artifact diameters at 2 predefined positions (15 mm distance from the perceived needle tip and at 50% of the needle length). Differences between the experimental subgroups were assessed by Bonferroni-corrected non-parametric tests. Correlations between continuous variables were expressed by the Bravais–Pearson coefficient and interrater reliability was quantified using the intraclass correlation coefficient. Needle artifact size increased gradually with increasing flip angles (p = 0.002) as well as increasing intervention angles (p < 0.001). Artifact diameters differed significantly between the chosen matrix sizes (p = 0.002) while modifying bandwidth, readout direction, and slice thickness showed no significant differences. Interrater reliability was high (intraclass correlation coefficient 0.776–0.910). To minimize needle artifacts in MRI-guided abdominal interventions while maintaining optimal visibility of the coaxial needle, we suggest medium-range flip angles and low intervention angles relative to B0.
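The abstract does not say which form of the intraclass correlation coefficient was used. As a minimal sketch, assuming the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)), the coefficient for two readers could be computed from a table of artifact diameters as follows. The function name `icc_2_1` and the sample diameters are hypothetical and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per measured object (e.g. needle position),
             each row holding one score per rater.
    Assumption: this ICC form is chosen for illustration only.
    """
    n = len(ratings)        # number of rated objects
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (objects, raters, residual)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-object mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical artifact diameters in mm: [reader 1, reader 2] per measurement
diameters = [[4.1, 4.3], [5.0, 5.2], [6.4, 6.1], [7.8, 8.0], [9.1, 9.4]]
print(round(icc_2_1(diameters), 3))
```

Because this form penalizes systematic offsets between readers, it yields lower values than a consistency-type ICC when one reader consistently measures larger diameters.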

Interobserver Reliability Using the Phonetic Level Evaluation With Severely and Profoundly Hearing-Impaired Children

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3405.989 ◽

1991 ◽

Vol 34 (5) ◽

pp. 989-999 ◽

Cited By ~ 6

Author(s):

Stephanie Shaw ◽

Truman E. Coggins

Keyword(s):

Interrater Reliability ◽

Interobserver Reliability ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Hearing Impaired ◽

Intraclass Correlation Coefficients ◽

Assessment Measure ◽

Impaired Children ◽

Speech Assessment ◽

Hearing Impaired Children

This study examines whether observers reliably categorize selected speech production behaviors in hearing-impaired children. A group of experienced speech-language pathologists was trained to score the elicited imitations of 5 profoundly and 5 severely hearing-impaired subjects using the Phonetic Level Evaluation (Ling, 1976). Interrater reliability was calculated using intraclass correlation coefficients. Overall, the magnitude of the coefficients was found to be considerably below what would be accepted in published behavioral research. Failure to obtain acceptably high levels of reliability suggests that the Phonetic Level Evaluation may not yet be an accurate and objective speech assessment measure for hearing-impaired children.

CODEM Instrument

GeroPsych ◽

10.1024/1662-9647/a000100 ◽

2014 ◽

Vol 27 (1) ◽

pp. 23-31 ◽

Cited By ~ 4

Author(s):

Anne Kuemmel ◽

Julia Haberstroh ◽

Johannes Pantel

Keyword(s):

Convergent Validity ◽

Interrater Reliability ◽

Discriminant Validity ◽

Assessment Tool ◽

Intraclass Correlation ◽

Well Being ◽

Communication Behavior ◽

People With Dementia ◽

Pearson's R

Communication and communication behaviors in situational contexts are essential conditions for well-being and quality of life in people with dementia. Measuring methods, however, are limited. The CODEM instrument, a standardized observational communication behavior assessment tool, was developed and evaluated on the basis of the current state of research in dementia care and social-communicative behavior. Initially, interrater reliability was examined by means of videoratings (N = 10 people with dementia). Thereupon, six caregivers in six German nursing homes observed 69 residents suffering from dementia and used CODEM to rate their communication behavior. The interrater reliability of CODEM was excellent (mean κ = .79; intraclass correlation = .91). Statistical analysis indicated that CODEM had excellent internal consistency (Cronbach’s α = .95). CODEM also showed excellent convergent validity (Pearson’s R = .88) as well as discriminant validity (Pearson’s R = .63). Confirmatory factor analysis verified the two-factor solution of verbal/content aspects and nonverbal/relationship aspects. With regard to the severity of the disease, the content and relational aspects of communication exhibited different trends. CODEM proved to be a reliable, valid, and sensitive assessment tool for examining communication behavior in the field of dementia. CODEM also provides researchers a feasible examination tool for measuring effects of psychosocial intervention studies that strive to improve communication behavior and well-being in dementia.

Reliability of a Modified Medication Appropriateness Index in Community Pharmacies

Annals of Pharmacotherapy ◽

10.1177/106002800303700101 ◽

2003 ◽

Vol 37 (1) ◽

pp. 40-46

Author(s):

Rosemin Kassam ◽

Linda G Martin ◽

Karen B Farris ◽

Homero A Monsanto ◽

Jean-Marie Kaiser

Keyword(s):

Community Pharmacy ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Community Setting ◽

Medication Appropriateness Index ◽

Paired Samples ◽

Psychometric Data ◽

Ambulatory Patients ◽

Pharmacy Setting ◽

Medication Appropriateness

Background. The medication appropriateness index (MAI) has demonstrated reliability in selected outpatient clinics where medical data were easily accessible from medical charts. However, its use in the community setting where patient data may be limited has not been examined. Objective. To evaluate the usefulness of a modified MAI for use in the community pharmacy setting by testing interrater reliability using 3 different rating schemes. Methods. Two raters evaluated 160 medications for 32 elderly ambulatory patients. Patient information was acquired using community pharmacist-collected medication histories. A summated MAI score, percent agreement, κ, positive agreement, negative agreement, and intraclass correlation coefficient were calculated for each criterion using 3 scoring schemes. A paired samples t-test (95% CI) was used to test interrater reliability. Results. The κ statistics were >0.75 for indication and effectiveness, but good (0.41–0.66) for the remaining criteria using the Hanlon scoring scheme. The intraclass coefficients (0.82, 0.86, 0.87) and overall κ (0.65, 0.66, 0.61) were similar for the 3 schemes. Conclusions. This study suggests that the modified MAI has the potential to detect medication appropriateness and inappropriateness in the community pharmacy setting; however, it is not without limitations. Because the MAI has the most clinimetric and psychometric data available, the instrument should be studied further to increase its reliability and generalizability.
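The agreement measures listed in the Methods can be illustrated with a short sketch. It assumes each criterion is reduced to a binary rating per medication (1 = inappropriate, 0 = appropriate), which is a simplification made here for illustration; the function name `agreement_stats` and the sample ratings are hypothetical and are not study data.

```python
def agreement_stats(rater_a, rater_b):
    """Percent agreement, Cohen's kappa, and positive/negative agreement
    for two raters giving binary ratings (1 = positive, 0 = negative)."""
    pairs = list(zip(rater_a, rater_b))
    n = len(pairs)
    a = sum(1 for x, y in pairs if x == 1 and y == 1)  # both positive
    b = sum(1 for x, y in pairs if x == 1 and y == 0)  # only rater A positive
    c = sum(1 for x, y in pairs if x == 0 and y == 1)  # only rater B positive
    d = sum(1 for x, y in pairs if x == 0 and y == 0)  # both negative

    observed = (a + d) / n
    # Agreement expected by chance from each rater's marginal frequencies
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return {
        "percent_agreement": observed,
        "kappa": (observed - expected) / (1 - expected),
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
    }


# Hypothetical ratings of 10 medications on one criterion
rater_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
rater_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
for name, value in agreement_stats(rater_a, rater_b).items():
    print(f"{name}: {value:.2f}")
```

Reporting positive and negative agreement alongside κ is useful when one category is rare, because κ can be low even when raw agreement is high.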

Testing telediagnostic obstetric ultrasound in Peru: a new horizon in expanding access to prenatal ultrasound

BMC Pregnancy and Childbirth ◽

10.1186/s12884-021-03720-w ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Marika Toscano ◽

Thomas J. Marini ◽

Kathryn Drennan ◽

Timothy M. Baran ◽

Jonah Kan ◽

...

Keyword(s):

Excellent Agreement ◽

Rural Areas ◽

Intraclass Correlation ◽

Standard Of Care ◽

The United States ◽

Categorical Variables ◽

Continuous Variables ◽

Second Trimester ◽

Obstetric Ultrasound ◽

Biometric Measurements

Abstract Background. Ninety-four percent of all maternal deaths occur in low- and middle-income countries, and the majority are preventable. Access to quality obstetric ultrasound can identify some complications leading to maternal and neonatal/perinatal mortality or morbidity and may allow timely referral to higher-resource centers. However, there are significant global inequalities in access to imaging and many challenges to deploying ultrasound to rural areas. In this study, we tested a novel obstetric telediagnostic ultrasound system in which the imaging acquisitions are obtained by an operator without prior ultrasound experience using simple scan protocols based only on external body landmarks and uploaded using low-bandwidth internet for asynchronous remote interpretation by an off-site specialist. Methods. This is a single-center pilot study. A nurse and care technician underwent 8 h of training on the telediagnostic system. Subsequently, 126 patients (68 second trimester and 58 third trimester) were recruited at a health center in Lima, Peru and scanned by these ultrasound-naïve operators. The imaging acquisitions were uploaded by the telemedicine platform and interpreted remotely in the United States. Telediagnostic imaging was compared with a concurrently performed standard of care ultrasound obtained and interpreted by an experienced attending radiologist. Cohen’s kappa was used to test agreement between categorical variables. Intraclass correlation and Bland-Altman plots were used to test agreement between continuous variables. Results. Obstetric ultrasound telediagnosis showed excellent agreement with standard of care ultrasound, allowing the identification of number of fetuses (100% agreement), fetal presentation (95.8% agreement, κ = 0.78 (p < 0.0001)), placental location (85.6% agreement, κ = 0.74 (p < 0.0001)), and assessment of normal/abnormal amniotic fluid volume (99.2% agreement) with sensitivity and specificity > 95% for all variables. Intraclass correlation was good or excellent for all fetal biometric measurements (0.81–0.95). The majority (88.5%) of second trimester ultrasound exam biometry measurements produced dating within 14 days of standard of care ultrasound. Conclusion. This obstetric ultrasound telediagnostic system is a promising means to increase access to diagnostic obstetric ultrasound in low-resource settings. The telediagnostic system demonstrated excellent agreement with standard of care ultrasound. Fetal biometric measurements were acceptable for use in the detection of gross discrepancies in fetal size requiring further follow-up.
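For the continuous measurements, the quantities behind a Bland-Altman plot can be sketched as below. This is a generic illustration of the method, assuming the usual 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences); the function name `bland_altman` and the paired values are hypothetical and do not come from the study.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   : mean of the pairwise differences (method_a - method_b)
    limits : bias +/- 1.96 * sample standard deviation of the differences
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical paired biometric measurements in mm
telediagnostic = [48.0, 52.5, 61.0, 70.2, 75.5]
standard_care = [47.2, 53.0, 60.1, 71.0, 74.8]
bias, lower, upper = bland_altman(telediagnostic, standard_care)
print(f"bias={bias:.2f} mm, limits of agreement=({lower:.2f}, {upper:.2f}) mm")
```

In the plot itself, each pair's difference is drawn against the pair's mean, with horizontal lines at the bias and at the two limits.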

Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items

Educational and Psychological Measurement ◽

10.1177/0013164419899731 ◽

2020 ◽

Vol 80 (4) ◽

pp. 808-820

Author(s):

Cindy M. Walker ◽

Sakine Göçer Şahin

Keyword(s):

Differential Item Functioning ◽

Interrater Reliability ◽

Rating Scales ◽

Rating Scale ◽

Intraclass Correlation ◽

Kappa Statistic ◽

Promising Alternative ◽

Constructed Response ◽

Polytomous Item ◽

Item Functioning

The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared with traditional interrater reliability measures. Three different procedures that can be used as measures of interrater reliability were compared: (1) intraclass correlation coefficient (ICC), (2) Cohen’s kappa statistic, and (3) DIF statistic obtained from Poly-SIBTEST. The results of this investigation indicated that DIF procedures appear to be a promising alternative to assess the interrater reliability of constructed response items, or other polytomous types of items, such as rating scales. Furthermore, using DIF to assess interrater reliability does not require a fully crossed design and allows one to determine if a rater is either more severe, or more lenient, in their scoring of each individual polytomous item on a test or rating scale.

Development and Initial Validation of a Project-Based Rubric to Assess the Systems-Based Practice Competency of Residents in the Clinical Chemistry Rotation of a Pathology Residency

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2013-0046-oa ◽

2014 ◽

Vol 138 (6) ◽

pp. 809-813

Author(s):

Carolyn R. Vitek ◽

Jane C. Dale ◽

Henry A. Homburger ◽

Sandra C. Bryant ◽

Amy K. Saenger ◽

...

Keyword(s):

Critical Thinking ◽

Interrater Reliability ◽

Clinical Chemistry ◽

Core Competencies ◽

Intraclass Correlation ◽

Reliability And Validity ◽

Correlation Coefficients ◽

Thinking Skills ◽

Project Evaluation ◽

Critical Thinking Skills

Context.— Systems-based practice (SBP) is 1 of 6 core competencies required in all resident training programs accredited by the Accreditation Council for Graduate Medical Education. Reliable methods of assessing resident competency in SBP have not been described in the medical literature. Objective.— To develop and validate an analytic grading rubric to assess pathology residents' analyses of SBP problems in clinical chemistry. Design.— Residents were assigned an SBP project based upon unmet clinical needs in the clinical chemistry laboratories. Using an iterative method, we created an analytic grading rubric based on critical thinking principles. Four faculty raters used the SBP project evaluation rubric to independently grade 11 residents' projects during their clinical chemistry rotations. Interrater reliability and Cronbach α were calculated to determine the reliability and validity of the rubric. Project mean scores and range were also assessed to determine whether the rubric differentiated resident critical thinking skills related to the SBP projects. Results.— Overall project scores ranged from 6.56 to 16.50 out of a possible 20 points. Cronbach α ranged from 0.91 to 0.96, indicating that the 4 rubric categories were internally consistent without significant overlap. Intraclass correlation coefficients ranged from 0.63 to 0.81, indicating moderate to strong interrater reliability. Conclusions.— We report development and statistical analysis of a novel SBP project evaluation rubric. The results indicate the rubric can be used to reliably assess pathology residents' critical thinking skills in SBP.
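As a rough illustration of the internal-consistency statistic reported here, Cronbach's α can be computed from per-project category scores as sketched below. The layout (one row per project, one column per rubric category), the function name `cronbach_alpha`, and the scores are assumptions made for illustration; the abstract does not describe how the data were arranged.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of rows, one per rated project; each row holds one score
            per rubric category (the "items").
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(row totals))
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores for 5 projects on 4 rubric categories (0-5 points each)
scores = [
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
]
print(round(cronbach_alpha(scores), 2))
```

Values close to 1 indicate that the categories rise and fall together across projects, which is what the reported range of 0.91 to 0.96 suggests.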

Development of a Model for the Acquisition and Assessment of Advanced Laparoscopic Suturing Skills Using an Automated Device

Surgical Innovation ◽

10.1177/1553350618764221 ◽

2018 ◽

Vol 25 (3) ◽

pp. 286-290 ◽

Cited By ~ 2

Author(s):

Elif Bilgic ◽

Madoka Takao ◽

Pepa Kaneva ◽

Satoshi Endo ◽

Toshitatsu Takao ◽

...

Keyword(s):

Laparoscopic Surgery ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Instructional Video ◽

Validity Evidence ◽

Laparoscopic Suturing ◽

Intraclass Correlation Coefficients ◽

Operative Assessment ◽

Suturing Skills

Background. Needs assessment identified a gap regarding laparoscopic suturing skills targeted in simulation. This study collected validity evidence for an advanced laparoscopic suturing task using an Endo Stitch™ device. Methods. Experienced (ES) and novice surgeons (NS) performed continuous suturing after watching an instructional video. Scores were based on time and accuracy, and Global Operative Assessment of Laparoscopic Surgery. Data are shown as medians [25th-75th percentiles] (ES vs NS). Interrater reliability was calculated using intraclass correlation coefficients (confidence interval). Results. Seventeen participants were enrolled. Experienced surgeons had significantly greater task (980 [964-999] vs 666 [391-711], P = .0035) and Global Operative Assessment of Laparoscopic Surgery scores (25 [24-25] vs 14 [12-17], P = .0029). Interrater reliability for time and accuracy were 1.0 and 0.9 (0.74-0.96), respectively. All experienced surgeons agreed that the task was relevant to practice. Conclusion. This study provides validity evidence for the task as a measure of laparoscopic suturing skill using an automated suturing device. It could help trainees acquire the skills they need to better prepare for clinical learning.

Reliability Assessment of Scores From Video-Recorded TGMD-3 Performances

Journal of Motor Learning and Development ◽

10.1123/jmld.2016-0007 ◽

2017 ◽

Vol 5 (1) ◽

pp. 59-68 ◽

Cited By ~ 16

Author(s):

Pauli Olavi Rintala ◽

Arja Kaarina Sääkslahti ◽

Susanna Iivonen

Keyword(s):

Motor Development ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Kappa Statistic ◽

Intrarater Reliability ◽

Gross Motor ◽

Gross Motor Development ◽

Percent Agreement ◽

Two Samples ◽

Ball Skills

This study examined the intrarater and interrater reliability of the Test of Gross Motor Development—3rd Edition (TGMD-3). Participants were 60 Finnish children aged between 3 and 9 years, divided into three separate samples of 20. Two samples of 20 were used to examine the intrarater reliability of two different assessors, and the third sample of 20 was used to establish interrater reliability. Children’s TGMD-3 performances were video-recorded and later assessed using an intraclass correlation coefficient, a kappa statistic, and a percent agreement calculation. The intrarater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.69 to 0.77, and percent agreement ranged from 87 to 91%. The interrater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.56 to 0.64. Percent agreement of 83% was observed for locomotor skills, ball skills, and total skills, respectively. Hop, horizontal jump, and two-hand strike assessments showed the most difference between the assessors. These results show acceptable reliability for the TGMD-3 to analyze children’s gross motor skills.

Reliability and Responsiveness of Two Physical Performance Measures Examined in the Context of a Functional Training Intervention

Physical Therapy ◽

10.1093/ptj/80.1.8 ◽

2000 ◽

Vol 80 (1) ◽

pp. 8-16 ◽

Cited By ~ 61

Author(s):

Mary B King ◽

James O Judge ◽

Robert Whipple ◽

Leslie Wolfson

Keyword(s):

Physical Performance ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Intervention Group ◽

Training Intervention ◽

Control Group ◽

Walk Test ◽

Test Retest Reliability ◽

6 Minute Walk Test ◽

Minute Walk

Abstract Background and Purpose. The reliability and responsiveness of 2 physical performance measures were assessed in this nonrandomized, controlled pilot exercise intervention. Subjects. Forty-five older individuals with mobility impairment (mean age=77.9 years, SD=5.9, range=70–92) were sequentially assigned to participate in an exercise program (intervention group) or to a control group. Methods. The intervention group performed exercise 3 times a week for 12 weeks that targeted muscle force, endurance, balance, and flexibility. Outcome measures were the 8-item Physical Performance Test (PPT-8) and the 6-minute walk test. Test-retest reliability and responsiveness indexes were determined for both tests; interrater reliability was measured for the PPT-8. Results. The intraclass correlation coefficient for interrater reliability for the PPT-8 was .96. Intraclass correlation coefficients for test-retest reliability were .88 for the PPT-8 and .93 for the 6-minute walk test. The intervention group improved 2.4 points and the control group improved 0.7 point on the PPT-8, as compared with baseline measurements. There was no change in 6-minute walk test distance in the intervention group when compared with the control group. The responsiveness index was .8 for the PPT-8 and .6 for the 6-minute walk test. Conclusion and Discussion. Measurements for both the PPT-8 and the 6-minute walk test appeared to be highly reliable. The PPT-8 was more responsive than the 6-minute walk test to change in performance expected with this functional training intervention.

Det första nationella provet i samhällskunskap - en studie i bedömarsamstämmighet (The first national test in samhällskunskap: a study of interrater reliability)

Acta Didactica Norge ◽

10.5617/adno.6283 ◽

2018 ◽

Vol 12 (4) ◽

pp. 13

Author(s):

Arne Löfstedt

Keyword(s):

Interrater Reliability ◽

Intraclass Correlation ◽

Ninth Graders ◽

National Agency ◽

Final Grade ◽

The Subject ◽

And Mathematics ◽

The Stability ◽

National Tests ◽

National Test

The Swedish school subject samhällskunskap (societal knowledge) exists as a subject of its own essentially only in the Nordic countries. In other countries, several subjects, such as geography and civics, share the content. The content of the subject is extensive and changes constantly as society changes. The first national tests in samhällskunskap for all Swedish ninth graders took place in 2013. A large part of the test consists of constructed responses. Given the characteristics of the subject, we consider it especially important to investigate whether these tests are “fair”. The intent of this study is to investigate one aspect of this “fairness”, interrater reliability, meaning the degree to which the same student responses are scored equally by different raters. In 2009, the National Agency for Education in Sweden (Skolverket) conducted a large study of the subjects Swedish, English, and Mathematics, which had national tests at the time. Our study aims to mimic and further develop the design of the 2009 study and was carried out on the first national tests in 2013. The results were analyzed by exploring different reliability measures within the categories of consensus estimates and consistency estimates; among other things, the intraclass correlation measure is discussed. As the 2013 tests were the first of their kind in Sweden, a further purpose was to create a framework for recurring studies of interrater reliability. The rater design, with a relatively large number of teachers from schools all over the country, each assessing a total of three complete student test responses, aimed to mimic the way the tests are assessed in schools. This also allowed us to study the stability of the assessment rubrics. The study itself was extensive and took two full days to carry out. The results indicate good agreement on the final grade of the test, the summary judgment that students receive. The study is meant to be repeated in the coming years.

Keywords: Social science, civics, national testing, interrater reliability, intraclass correlation
