Addition of CT to Improve the Diagnostic Confidence for the Detection of Sacroiliac Joint Erosions in Patients with Equivocal MRI Findings

2021 ◽  
pp. 084653712110565
Author(s):  
Ibrahim M. Nadeem ◽  
Sohaib Munir ◽  
Vincent Leung ◽  
Euan Stubbs

Purpose To determine if CT can improve the diagnostic confidence for the detection of sacroiliac joint (SIJ) erosions in patients with equivocal MRI findings. Methods A retrospective analysis of adult patients who had an SIJ MRI and a subsequent SIJ CT within 12 months was conducted. Using a 5-point Likert scale, two reviewers evaluated the de-identified MRI and CT images in randomized order and in separate sessions to answer the question: “Does the patient have SIJ erosions?” Fisher’s exact test was used to analyze the difference in diagnostic confidence, and the intraclass correlation coefficient (ICC) was used to determine interrater reliability. Results 54 patients were included in the analysis (average age, 43.9 years). The average time interval between initial SIJ MRI and subsequent CT was 14.4 weeks (range, 5.6–50.3 weeks). Compared with MRI, CT yielded significantly more cases rated with definitive diagnostic confidence rather than probable or equivocal confidence (P < .001). Amongst cases with equivocal findings on MRI, 73.2% had definitive diagnoses on CT. There was moderate interrater agreement for MRI, with an ICC of .490 [95% CI, .258–.669], and excellent agreement for CT, with an ICC of .832 [95% CI, .728–.899]. Conclusion Overall, CT led to significantly increased diagnostic confidence and higher interrater reliability for the detection of SIJ erosions compared to MRI. Judicious use of CT may be useful in detecting SIJ erosions in patients with equivocal MRI findings.

2001 ◽  
Vol 81 (2) ◽  
pp. 799-809 ◽  
Author(s):  
Corrie J Odom ◽  
Andrea B Taylor ◽  
Christine E Hurd ◽  
Craig R Denegar

Abstract Background and Purpose. The Lateral Scapular Slide Test (LSST) is used to determine scapular position with the arm abducted 0, 45, and 90 degrees in the coronal plane. Assessment of scapular position is based on the derived difference measurement of bilateral scapular distances. The purpose of this study was to assess the reliability of measurements obtained using the LSST and whether they could be used to identify people with and without shoulder impairments. Subjects. Forty-six subjects ranging in age from 18 to 65 years (X̄=30.0, SD=11.1) participated in this study. One group consisted of 20 subjects being treated for shoulder impairments, and one group consisted of 26 subjects without shoulder impairments. Methods. Two measurements in each test position were obtained bilaterally. From the bilateral measurements, we derived the difference measurement. Intraclass correlation coefficients (ICC [1,1]) and the standard error of measurement (SEM) were calculated for intrarater and interrater reliability of the difference in side-to-side measures of scapular distance. Sensitivity and specificity of the LSST for classifying subjects with and without shoulder impairments were also determined. Results. The ICCs for intrarater reliability were .75, .77, and .80 and .52, .66, and .62, respectively, for subjects without and with shoulder impairments in 0, 45, and 90 degrees of abduction. The ICCs for interrater reliability were .67, .43, and .74 and .79, .45, and .57, respectively, for subjects without and with shoulder impairments in 0, 45 and 90 degrees of abduction. The SEMs ranged from 0.57 to 0.86 cm for intrarater reliability and from 0.79 to 1.20 cm for interrater reliability. Using the criterion of greater than 1.0 cm difference, sensitivity and specificity were 35% and 48%, 41% and 54%, and 43% and 56%, respectively, for 0, 45, and 90 degrees of abduction. 
Sensitivity and specificity based on the criterion of greater than 1.5 cm difference were 28% and 53%, 50% and 58%, and 34% and 52%, respectively, for the 3 scapular positions. Conclusion and Discussion. Our results suggest that measurements of scapular positioning based on the difference in side-to-side scapular distance measures are not reliable. Furthermore, the results suggest that sensitivity and specificity of the LSST measurements are poor and that the LSST should not be used to identify people with and without shoulder dysfunction.
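
The quantities reported in the abstract above (sensitivity, specificity, and the standard error of measurement) can be illustrated with a minimal Python sketch. The counts, the standard deviation, and the function names below are hypothetical and chosen only for illustration; they are not data from the study. The SEM formula used here, SD × sqrt(1 − ICC), is a common textbook definition and is assumed rather than confirmed as the authors' exact method.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # proportion of impaired subjects flagged by the test
    specificity = tn / (tn + fp)  # proportion of unimpaired subjects correctly cleared
    return sensitivity, specificity


def standard_error_of_measurement(sd, icc):
    """Common textbook SEM: SD * sqrt(1 - ICC). Assumed, not taken from the paper."""
    return sd * math.sqrt(1.0 - icc)


# Hypothetical counts for illustration only (not the study's data).
sens, spec = sensitivity_specificity(tp=8, fn=12, tn=13, fp=13)
# Hypothetical SD of 1.5 cm combined with an ICC of .75.
sem = standard_error_of_measurement(sd=1.5, icc=0.75)

print(f"sensitivity={sens:.2f} specificity={spec:.2f} SEM={sem:.2f} cm")
```

Under this formula, a lower ICC inflates the SEM for the same spread of scores, which is one way to read the abstract's link between modest ICCs and SEMs approaching the 1.0 cm decision criterion.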


Author(s):  
Jenell L. S. Wittmer ◽  
James M. LeBreton

Statistics used to index interrater similarity are prevalent in many areas of the social sciences, with multilevel research being one of the most common domains for estimating interrater similarity. Multilevel research spans multiple hierarchical levels, such as individuals, teams, departments, and the organization. There are three main research questions that multilevel researchers answer using indices of interrater agreement and interrater reliability: (a) Does the nesting of lower-level units (e.g., employees) within higher-level units (e.g., work teams) result in the non-independence of residuals, which is an assumption of the general linear model?; (b) Is there sufficient agreement between scores on measures collected from lower-level units (e.g., employees' perceptions of customer service climate) to justify aggregating data to the higher-level (e.g., team-level climate)?; and (c) Following data aggregation, how effective are the higher-level unit means at distinguishing between those higher levels (e.g., how reliably do team climate scores distinguish between the teams)? Interrater agreement and interrater reliability refer to the extent to which lower-level data nested or clustered within a higher-level unit are similar to one another. While closely related, interrater agreement and reliability differ from one another in how similarity is defined. Interrater reliability is the relative consistency in lower-level data. For example, to what degree do the scores assigned by raters tend to correlate with one another? Alternatively, interrater agreement is the consensus of the lower-level data points. For example, estimates of interrater agreement are used to determine the extent to which ratings made by judges/observers could be considered interchangeable or equivalent in terms of their values. 
Thus, while interrater agreement and reliability both estimate the similarity of ratings by judges/observers, they define interrater similarity in slightly different ways, and these statistics are suited to address different types of research questions. The first research question that these statistics address, the issue of non-independence, is typically measured using an intraclass correlation statistic that is a function of both interrater reliability and agreement. However, in the context of non-independence, the intraclass correlation is most often interpreted as an effect size. The second multilevel research question, concerning adequate agreement to aggregate lower-level data to a higher level, would require a measure of interrater agreement, as the researcher is looking for consensus among raters. Finally, the third multilevel research question, concerning the reliability of higher-level means, not only requires a different variation of the intraclass correlation, but is also a function of both interrater reliability and agreement. Multilevel research requires researchers to appropriately apply interrater agreement and/or reliability statistics to their data, as well as follow best practices for calculating and interpreting these statistics.
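
As a rough illustration of the two intraclass correlation variants described above, the following Python sketch computes them from a one-way ANOVA on balanced groups. The labels ICC(1) and ICC(2) follow common multilevel usage and are an assumption here, since the abstract does not name the variants. The team ratings and the function name are hypothetical.

```python
def one_way_icc(groups):
    """Return (ICC(1), ICC(2)) for balanced groups via one-way ANOVA mean squares.

    ICC(1): share of variance attributable to group membership (non-independence).
    ICC(2): reliability of the group means.
    """
    k = len(groups[0])  # ratings per group
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    n = len(groups)
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / k for g in groups]

    # Between-group and within-group mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))

    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2


# Hypothetical climate ratings: four teams, three employees each.
teams = [[4, 5, 4], [2, 2, 3], [5, 5, 4], [1, 2, 2]]
icc1, icc2 = one_way_icc(teams)
print(f"ICC(1)={icc1:.3f} ICC(2)={icc2:.3f}")
```

Neither value is a pure agreement index; a consensus question such as (b) would call for a separate agreement statistic, which this sketch does not attempt.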


2016 ◽  
Vol 2016 ◽  
pp. 1-4 ◽  
Author(s):  
Süleyman Çelebi ◽  
Seyithan Özaydın ◽  
Cemile Beşik Baştaş ◽  
Özgür Kuzdan ◽  
Cankat Erdoğan ◽  
...  

Aim. Vesicoureteral reflux (VUR) is one of the most common conditions seen in pediatric urology. Fortunately, there are many treatment options for this disorder. The grading system for VUR varies among doctors, and the literature on its reliability is sparse. Here, we assessed the effectiveness of the current VUR grading system. Methods. A series of 40 voiding cystourethrogram (VCUG) studies were selected. Four pediatric urologists (PU) and four pediatric radiologists (PR) independently graded each VCUG and then agreed on a uniform interpretation. For statistical analysis the intraclass correlation coefficient (ICC) was applied to assess interrater agreement. Results. ICC values ranging from 0.82 to 0.88 reflected the strong reliability of VCUG for grading cases of VUR among pediatric urologists and radiologists as separate groups, and the reliability between the two groups was also good, as indicated by an ICC of 0.89. Despite the high ICC, disagreement existed between raters; the lowest agreement was associated with middle grades (III and IV). Conclusions. The interrater reliability of the international grading system for VUR was high but imperfect. Thus, grading differences at middle grades can profoundly influence the type of treatment pursued.


2018 ◽  
Vol 09 (04) ◽  
pp. 510-515
Author(s):  
Blagica Stanoevska ◽  
Luis Anunciação ◽  
Jane Squires ◽  
Ajay Singh ◽  
Vladimir Trajkovski

ABSTRACT Context: Early detection of developmental problems is critical, and interventions are more effective when they are carried out early in a child's life. In Macedonia, there are only four centers providing early intervention services. Aims: In this research, we determined the reliability of the translation and adaptation of Ages and Stages Questionnaires 3rd edition (ASQ-3-M) for assessment of children aged 3–5 years old in Macedonia, and reported preliminary results of the gender differences in the development. Materials and Methods: ASQ-3-M was completed by 165 parents and 40 educators in seven kindergarten classrooms. Children were 3–5 years old. Statistical Analysis Used: Cronbach's alpha, Intraclass Correlation coefficient (ICC), and interrater reliability (IRR) were used to assess ASQ-3-M psychometric properties. The Bayesian t-test was performed to estimate the difference in means between males and females. Results: The Cronbach's alpha ranged from 0.65 to 0.87. The overall ICC was 0.89 (ranged from 0.8 to 0.95), which indicates a strong to almost perfect strength of agreement between test-retest. IRR correlation revealed an average of 0.88 (ranged from 0.74 to 0.95), suggesting that ASQ-3-M is reliable and stable. Conclusions: The results from the comparison between males and females on all dimensions of ASQ-3-M were not statistically significant (BF10 <3), indicating no significant gender difference. That said, the ASQ-3 is recommended for routine use in screening children aged 3–5 years old.


2021 ◽  
Author(s):  
Chin-Suk Cho ◽  
Lauren Tollefson ◽  
Kenneth Reckelhoff

Abstract Objective The Thessaly test is a commonly used orthopedic test for meniscus tear evaluation. The study’s objective is to evaluate the degree of medial meniscal extrusion during different loading phases of the Thessaly test. Methods A convenience sample of 60 healthy knees (35 participants) was examined. Sonographic measurement of the degree of physiologic extrusion of the medial meniscus deep to the medial collateral ligament was taken by two examiners at six different loading phases: supine, standing, 5° knee-flexion with internal (IR)/external (ER) rotation and 20° knee-flexion with IR/ER. The difference in meniscal extrusion by knee position was compared with ANOVA. Interrater reliability was assessed using the intraclass correlation coefficient. Results The mean meniscal extrusion for each position was: supine, 2.3±0.5mm; standing, 2.8±0.8mm; 5° IR, 2.3±0.9mm; 5° ER, 2.4±0.7mm; 20° IR, 1.9±0.8mm; and 20° ER, 2.3±0.7mm. A significant increase in extrusion was observed from supine to standing (p<0.05) and from 20° IR to 20° ER (p=0.015). A significant decrease in extrusion was observed from standing to 5° IR (p<0.05), 5° ER (p<0.05), 20° IR (p<0.05) and 20° ER (p<0.05). There was no significant change between 5° IR and 5° ER (p=1.0). Interrater reliability of the measurements across the six positions was poor to moderate (0.35-0.57, p<0.05). Conclusion Our study’s novel findings showed clear dynamic changes during the Thessaly test, which implies an increase in compressive stress across the medial meniscus and a potential mechanism for pain generation during this test. Further testing is needed to address the poor-to-moderate reliability and confirm the findings.


2020 ◽  
Vol 29 (3) ◽  
pp. 50-57
Author(s):  
Pedram Borghei ◽  
Shadman Nemati ◽  
Suzan Adel ◽  
Mehdi Nikkhah ◽  
...  

Background: For many years, Canal Wall Down (CWD) tympanomastoidectomy has been the gold standard for treatment of cholesteatoma; however, this method has long-term complications for the patients. The Intact Canal Wall (ICW) tympanomastoidectomy has relatively fewer complications, but access to the middle-ear recesses is difficult in this method. Therefore, endoscopy is used to visualize the underexposed recesses. Objective: This study aims to compare the incidence of residual cholesteatoma using the two methods of CWD and endoscopic-assisted ICW. Materials and Methods: In this prospective randomized clinical trial, participants were 40 patients with cholesteatoma in the middle ear and mastoid who were candidates for tympanomastoidectomy. They were randomly divided into two groups. In the first group, ICW was performed with endoscopic-assisted visualization, while in the second group, the conventional CWD technique was performed without ossicular reconstruction. All the patients were microscopically examined at 3, 6, 9 and 12 months after surgery. Revision middle ear surgery and possible ossicular reconstruction under local anesthesia were performed one year after the surgery. The presence of cholesteatoma pearl in the middle ear, evaluated by using a 2.7mm 30° endoscope, was recorded as the sign of residual cholesteatoma. Fisher’s exact test and Mann-Whitney U test were used for statistical analysis. Significance level for the tests was set at 5%. Results: The difference in the incidence of residual cholesteatoma between the two groups was not statistically significant (P>0.05). In each group, 20% (n=4) had residual cholesteatoma. The difference in time interval from the first to second surgery was not statistically significant between the study groups (P>0.05). Conclusion: Endoscopic-assisted ICW tympanomastoidectomy is comparable with CWD tympanomastoidectomy in eradication of cholesteatoma, having possibly fewer complications. 
It is recommended that more studies be conducted with a larger sample size and longer follow-up period.
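
The Fisher's exact comparison reported in the abstract above can be sketched in Python. The 2x2 table assumes two groups of 20 patients each, which is inferred from the reported 20% (n=4) per group and the 40 randomized patients rather than stated explicitly. The function name is hypothetical, and the two-sided rule used (summing all tables no more probable than the observed one) is one common convention, not necessarily the one the authors' software applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more likely than the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Rows: ICW and CWD groups; columns: residual cholesteatoma yes / no.
# Group sizes of 20 are an assumption inferred from "20% (n=4)".
p = fisher_exact_two_sided(4, 16, 4, 16)
print(f"p = {p:.3f}")
```

With identical proportions in both groups, the observed table is the most probable one under the null hypothesis, so the p-value is at its maximum, consistent with the reported P>0.05.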


1991 ◽  
Vol 34 (5) ◽  
pp. 989-999 ◽  
Author(s):  
Stephanie Shaw ◽  
Truman E. Coggins

This study examines whether observers reliably categorize selected speech production behaviors in hearing-impaired children. A group of experienced speech-language pathologists was trained to score the elicited imitations of 5 profoundly and 5 severely hearing-impaired subjects using the Phonetic Level Evaluation (Ling, 1976). Interrater reliability was calculated using intraclass correlation coefficients. Overall, the magnitude of the coefficients was found to be considerably below what would be accepted in published behavioral research. Failure to obtain acceptably high levels of reliability suggests that the Phonetic Level Evaluation may not yet be an accurate and objective speech assessment measure for hearing-impaired children.


GeroPsych ◽  
2014 ◽  
Vol 27 (1) ◽  
pp. 23-31 ◽  
Author(s):  
Anne Kuemmel (This author contributed eq ◽  
Julia Haberstroh (This author contributed ◽  
Johannes Pantel

Communication and communication behaviors in situational contexts are essential conditions for well-being and quality of life in people with dementia. Measuring methods, however, are limited. The CODEM instrument, a standardized observational communication behavior assessment tool, was developed and evaluated on the basis of the current state of research in dementia care and social-communicative behavior. Initially, interrater reliability was examined by means of videoratings (N = 10 people with dementia). Thereupon, six caregivers in six German nursing homes observed 69 residents suffering from dementia and used CODEM to rate their communication behavior. The interrater reliability of CODEM was excellent (mean κ = .79; intraclass correlation = .91). Statistical analysis indicated that CODEM had excellent internal consistency (Cronbach’s α = .95). CODEM also showed excellent convergent validity (Pearson’s R = .88) as well as discriminant validity (Pearson’s R = .63). Confirmatory factor analysis verified the two-factor solution of verbal/content aspects and nonverbal/relationship aspects. With regard to the severity of the disease, the content and relational aspects of communication exhibited different trends. CODEM proved to be a reliable, valid, and sensitive assessment tool for examining communication behavior in the field of dementia. CODEM also provides researchers a feasible examination tool for measuring effects of psychosocial intervention studies that strive to improve communication behavior and well-being in dementia.


2003 ◽  
Vol 37 (1) ◽  
pp. 40-46
Author(s):  
Rosemin Kassam ◽  
Linda G Martin ◽  
Karen B Farris ◽  
Homero A Monsanto ◽  
Jean-Marie Kaiser

Background The medication appropriateness index (MAI) has demonstrated reliability in selected outpatient clinics where medical data were easily accessible from medical charts. However, its use in the community setting where patient data may be limited has not been examined. Objective To evaluate the usefulness of a modified MAI for use in the community pharmacy setting by testing interrater reliability using 3 different rating schemes. Methods Two raters evaluated 160 medications for 32 elderly ambulatory patients. Patient information was acquired using community pharmacist-collected medication histories. A summated MAI score, percent agreement, κ, positive agreement, negative agreement, and intraclass correlation coefficient were calculated for each criterion using 3 scoring schemes. A paired samples t-test (95% CI) was used to test interrater reliability. Results The κ statistics were >0.75 for indication and effectiveness, but good (0.41–0.66) for the remaining criteria using the Hanlon scoring scheme. The intraclass coefficients (0.82, 0.86, 0.87) and overall κ (0.65, 0.66, 0.61) were similar for the 3 schemes. Conclusions This study suggests that the modified MAI has the potential to detect medication appropriateness and inappropriateness in the community pharmacy setting; however, it is not without limitations. Because the MAI has the most clinimetric and psychometric data available, the instrument should be studied further to increase its reliability and generalizability.
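
Several of the agreement indices named in the abstract above (percent agreement, κ, positive agreement, and negative agreement) can be computed from a single 2x2 table of two raters' judgments. The Python sketch below uses hypothetical counts and a hypothetical function name; it shows Cohen's κ for two raters and two categories and is not a reconstruction of the study's three scoring schemes.

```python
def agreement_stats(a, b, c, d):
    """Agreement indices for two raters and a binary rating.

    a: both raters say "inappropriate"      b: rater 1 only
    c: rater 2 only                         d: both say "appropriate"
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from each rater's marginal proportions.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return {
        "percent_agreement": p_observed,
        "kappa": (p_observed - p_expected) / (1 - p_expected),
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
    }


# Hypothetical ratings of 100 medications (not the study's data).
stats = agreement_stats(a=40, b=5, c=5, d=50)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Reporting positive and negative agreement alongside κ helps when one category is rare, because κ alone can be low even when raw agreement is high.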


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Minjeong Kim ◽  
Ja Young Oh ◽  
Seon Ha Bae ◽  
Seung Hyeun Lee ◽  
Won Jun Lee ◽  
...  

Abstract We evaluated the reliability and validity of the 5-scale grading system to interpret the point-of-care immunoassay for tear matrix metalloproteinase (MMP)-9. Six observers graded the red bands in photographs of the readout window of an MMP-9 immunoassay kit (InflammaDry) twice, 2 weeks apart, using the 5-scale grading system (i.e., grades 0–4). Interobserver and intraobserver reliability were evaluated using intraclass correlation coefficients. The interobserver agreements were analyzed according to the severity of tear MMP-9 expression. To validate the system, a concentration calibration curve was made using MMP-9 solutions with reference concentrations, then the distribution of MMP-9 concentrations was analyzed according to the 5-scale grading system. Both intraobserver and interobserver reliability were excellent. The readout grades were significantly correlated with the quantified colorimetric densities. The interobserver variance of readout grades had no correlation with the severity of the measured densities. The band density continued to increase up to a maximal concentration (i.e., 5000 ng/mL) according to the calibration curve. The difference in grades sensitively reflected the change in MMP-9 concentrations, especially between grades 2 and 4. Together, our data indicate that the subjective 5-scale grading system in the point-of-care MMP-9 immunoassay is an easy and reliable method with acceptable accuracy.

