Low diagnostic accuracy and inter-observer agreement on CT and MRI in diagnosis of spinal fractures in multiple myeloma

Skeletal disease is common in multiple myeloma. We investigated the inter-observer agreement and diagnostic accuracy of computed tomography (CT) and magnetic resonance imaging (MRI) for spinal fractures in 12 myeloma patients. Two radiologists independently assessed the images. CT, MRI, and other images were combined into a gold standard. Inter-observer agreement was assessed with Cohen’s kappa. Radiologist 1 diagnosed 20 malignant spinal fractures on CT and 26 on MRI, while radiologist 2 diagnosed 12 on CT and 22 on MRI. By comparison, the gold standard identified 10 malignant spinal fractures. The sensitivity for malignant fractures ranged from 0.5 to 1 for CT and MRI, and the specificity from 0.17 to 0.67. On MRI, the specificity for malignant spinal fractures was 0.17 for both radiologists. Inter-observer agreement (Cohen’s kappa) for malignant spinal fractures was -0.42 on CT and -0.13 on MRI; for osteoporotic fractures it was -0.24 on CT and 0.53 on MRI. We conclude that malignant spinal fractures were over-diagnosed on CT and MRI and that inter-observer agreement was extremely poor.
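The abstract reports kappa values below zero. The sketch below shows how unweighted Cohen's kappa is computed for two raters. Kappa compares observed agreement with the agreement expected by chance from each rater's label frequencies, so it turns negative when the raters agree less often than chance would predict. The rating lists are hypothetical toy data, not the study's data.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater1: Sequence[Hashable], rater2: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater1)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[label] / n) * (c2[label] / n) for label in c1.keys() | c2.keys())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical per-vertebra calls: "M" = malignant fracture, "N" = no malignant fracture.
r1 = ["M", "M", "N", "N"]
r2 = ["N", "N", "M", "M"]
print(cohens_kappa(r1, r2))  # -1.0: the raters disagree on every vertebra
```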

Classification of Shoulder X-ray Images with Deep Learning Ensemble Models

Applied Sciences, 2021, Vol 11 (6), pp. 2723. DOI: 10.3390/app11062723

Author(s): Fatih Uysal, Fırat Hardalaç, Ozan Peker, Tolga Tolunay, Nil Tokgöz

Keyword(s): Deep Learning; Performance Test; The Body; Test Accuracy; Cohen's Kappa; X-Ray; AUC Value; Magnetic Resonance Imaging (MRI); Fully Connected

Fractures occur for various reasons in the shoulder, which has a wider range of motion than other joints in the body. These fractures are diagnosed using data from X-radiation (X-ray), magnetic resonance imaging (MRI), or computed tomography (CT). This study aims to help physicians by using artificial intelligence to classify shoulder X-ray images as fracture or non-fracture. To this end, the performance of 26 pre-trained deep learning models in detecting shoulder fractures was evaluated on the musculoskeletal radiographs (MURA) dataset, and two ensemble learning models (EL1 and EL2) were developed. The pre-trained models used are ResNet, ResNeXt, DenseNet, VGG, Inception, MobileNet, and their spinal fully connected (Spinal FC) versions. EL1 and EL2 were built from the best-performing pre-trained models. Their test accuracy was 0.8455 and 0.8472, their Cohen’s kappa was 0.6907 and 0.6942, and their area under the receiver operating characteristic (ROC) curve (AUC) for the fracture class was 0.8862 and 0.8695, respectively. Across the 28 classifiers evaluated in total, EL2 achieved the highest test accuracy and Cohen’s kappa, and EL1 achieved the highest AUC.
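The abstract does not say how EL1 and EL2 combine their member models. The sketch below assumes soft voting, in which each image's fracture probabilities are averaged across models. It then computes two of the reported metrics: accuracy at a 0.5 threshold, and the fracture-class AUC via its rank interpretation. The function names, the threshold, and the probabilities are hypothetical illustrations, not the authors' pipeline.

```python
from collections.abc import Sequence


def soft_vote(model_probs: Sequence[Sequence[float]]) -> list[float]:
    """Average each image's fracture probability across models (assumed soft voting)."""
    if not model_probs:
        raise ValueError("need at least one model")
    return [sum(ps) / len(model_probs) for ps in zip(*model_probs)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of images whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for the fracture class (label 1): the probability that a random fracture
    image scores higher than a random non-fracture image, with ties counting 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical fracture probabilities from two pre-trained models on four images.
y = [1, 0, 1, 0]
model_a = [0.9, 0.2, 0.6, 0.3]
model_b = [0.7, 0.4, 0.8, 0.1]
ensemble = soft_vote([model_a, model_b])
preds = [1 if p >= 0.5 else 0 for p in ensemble]
print(accuracy(y, preds), roc_auc(y, ensemble))
```

The pairwise AUC loop is quadratic in the number of images. That is fine for illustration, but a real evaluation on MURA would use a sorting-based or library implementation.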

Diagnostic performance of 3D-multi-Echo-data-image-combination (MEDIC) for evaluating SLAP lesions of the shoulder

BMC Musculoskeletal Disorders, 2019, Vol 20 (1). DOI: 10.1186/s12891-019-2986-1

Author(s): Felix Wuennemann, Laurent Kintzelé, Felix Zeifang, Michael W. Maier, Iris Burkholder, ...

Keyword(s): Shoulder Pain; Predictive Value; Gold Standard; Proton Density; Diagnostic Challenge; Cohen's Kappa; SLAP Lesions; Sensitivity Specificity; Interreader Agreement

Abstract Background: Superior labral anterior-to-posterior (SLAP) lesions remain a clinical and diagnostic challenge in routine (non-arthrographic) MR examinations of the shoulder. This study prospectively evaluated whether 3D-Multi-Echo-Data-Image-Combination (MEDIC) detects SLAP lesions as well as a routine high-resolution 2D proton-density weighted fat-saturated (PD fs) sequence at 3 T, using arthroscopy as the gold standard. Methods: Seventeen consecutive patients (mean age, 51.6 ± 14.8 years; 11 males) with shoulder pain underwent 3 T MRI including 3D-MEDIC and 2D-PD fs, followed by arthroscopy. Two independent raters, with 4 and 14 years of experience in musculoskeletal MRI respectively, evaluated the presence or absence of SLAP lesions on both sequences. During arthroscopy, two certified orthopedic shoulder surgeons classified SLAP lesions according to Snyder’s criteria. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of 3D-MEDIC and 2D-PD fs for detecting SLAP lesions were calculated with arthroscopy as the gold standard. Interreader agreement and sequence correlation were analyzed using Cohen’s kappa coefficient. Figure 1 shows the excellent visibility of a proven SLAP lesion on 3D-MEDIC, and Fig. 2 shows a false-positive case. Results: Arthroscopy revealed SLAP lesions in 11/17 patients. Using 3D-MEDIC, SLAP lesions were diagnosed in 14/17 patients by reader 1 and in 13/17 patients by reader 2. Using 2D-PD fs, SLAP lesions were diagnosed in 11/17 patients by reader 1 and in 12/17 patients by reader 2. For 3D-MEDIC, sensitivity, specificity, PPV, and NPV were 100.0, 50.0, 78.6, and 100.0% for reader 1, and 100.0, 66.7, 84.6, and 100.0% for reader 2. For 2D-PD fs, they were 90.9, 83.3, 90.9, and 83.3% for reader 1, and 100.0, 83.3, 91.7, and 100.0% for reader 2.
Combining 2D-PD fs with 3D-MEDIC increased specificity from 50.0 to 83.3% for reader 1 and from 66.7 to 100.0% for reader 2. Interreader agreement was almost perfect, with a Cohen’s kappa of 0.82 for 3D-MEDIC and 0.87 for PD fs. Conclusions: With its high sensitivity and NPV, 3D-MEDIC is a valuable tool for evaluating SLAP lesions. Because combining it with routine 2D-PD fs further increases specificity, we recommend adding 3D-MEDIC as a sequence in conventional shoulder protocols for patients with non-specific shoulder pain.
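The four accuracy measures follow directly from a 2×2 table of reader calls against arthroscopy. The abstract gives no table, but the counts can be inferred from the reported figures. For example, reader 1 called 14/17 patients positive on 3D-MEDIC with 100% sensitivity against 11 arthroscopy-positive patients, which implies 11 true positives, 3 false positives and 3 true negatives. The sketch below recomputes the reported percentages from those inferred counts. The class name `TwoByTwo` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """A reader's calls tabulated against the arthroscopic reference standard."""
    tp: int  # lesion on MRI, lesion at arthroscopy
    fp: int  # lesion on MRI, none at arthroscopy
    fn: int  # no lesion on MRI, lesion at arthroscopy
    tn: int  # no lesion on MRI, none at arthroscopy

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def as_percentages(self) -> tuple[float, ...]:
        """Sensitivity, specificity, PPV, NPV in percent, rounded to one decimal."""
        metrics = (self.sensitivity, self.specificity, self.ppv, self.npv)
        return tuple(round(100 * m(), 1) for m in metrics)


# Counts inferred from the abstract's figures (11/17 arthroscopy-positive patients).
reader1_medic = TwoByTwo(tp=11, fp=3, fn=0, tn=3)  # 14/17 called positive
reader1_pdfs = TwoByTwo(tp=10, fp=1, fn=1, tn=5)   # 11/17 called positive
print(reader1_medic.as_percentages())  # (100.0, 50.0, 78.6, 100.0)
print(reader1_pdfs.as_percentages())   # (90.9, 83.3, 90.9, 83.3)
```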

Ultra-short Echo Time MR Imaging in Assessing Cartilage Endplate Damage And Association Between Its Lesion and Disc Degeneration For Chronic Low Back Pain Patients

2021. DOI: 10.21203/rs.3.rs-1078137/v1

Author(s): Zhilin Ji, Weiqiang Dou, Yaru Zhu, Yin Shi, Yuefen Zou

Keyword(s): Image Quality; Observer Agreement; Cohen's Kappa; Kendall's Tau; Cartilage Endplate; Echo Time; Short Echo Time; IVD Degeneration; The Relationship

Abstract Objective: To investigate the feasibility of ultra-short echo time (UTE) MRI for assessing cartilage endplate (CEP) damage, and to evaluate the relationship between the total endplate score (TEPS) and lumbar intervertebral disc (IVD) degeneration. Materials and methods: Thirty-five patients underwent UTE imaging of the lumbar IVDs at 3 T. Subtraction images between short and long TEs were obtained to depict the anatomy of the CEP. The signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) were calculated to assess image quality. A new grading criterion for endplate evaluation was developed in this study, based on Rajasekaran's grading. Two radiologists independently evaluated the CEP and the bony vertebral endplates (VEP) using the new grading criterion and assessed TEPS. Cohen's kappa was used to evaluate inter-observer agreement on endplate damage between the two radiologists. Kendall's tau-b was used to determine the relationship between TEPS and IVD degeneration graded with the Pfirrmann system. Results: Well-structured CEP was depicted on the subtracted UTE images, as confirmed by high SNR (33.0 ± 2.92) and CNR (9.4 ± 2.08) values. The two radiologists evaluated CEP and VEP damage on subtracted UTE images of adequate quality. Inter-observer agreement was excellent (Cohen's kappa 0.839, P < 0.001). On this basis, 138 endplates from 69 IVDs of 35 patients were classified into six grades according to the new grading criterion, and the TEPS of each endplate was calculated. In addition, IVD degeneration was classified into five grades. Kendall's tau-b analysis showed a significant relationship between endplate-damage TEPS and IVD degeneration (r = 0.864, P < 0.001). Conclusion: Given the high image quality, UTE imaging might be considered an effective tool for assessing CEP damage.
Additionally, the calculated TEPS showed a strong positive association with IVD degeneration, suggesting that the severity of endplate damage is closely linked to the degree of IVD degeneration.
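Both TEPS and Pfirrmann grades are ordinal scales with many tied values, which is why the tie-corrected tau-b variant of Kendall's tau is appropriate here. The sketch below is a plain pairwise implementation of the standard tau-b formula. The grade lists are hypothetical and are not the study's data.

```python
import math
from collections.abc import Sequence
from itertools import combinations


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b, which corrects the denominator for tied ranks.

    tau_b = (C - D) / sqrt((n0 - n1) * (n0 - n2)), where C and D count concordant
    and discordant pairs, n0 = n(n-1)/2, and n1, n2 count pairs tied on x and on y.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = untied_x = untied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        untied_x += int(dx != 0)  # accumulates n0 - n1
        untied_y += int(dy != 0)  # accumulates n0 - n2
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = math.sqrt(untied_x * untied_y)
    if denom == 0:
        return math.nan  # undefined when either variable is constant
    return (concordant - discordant) / denom


# Hypothetical per-disc grades: TEPS and Pfirrmann grade (1-5).
teps = [2, 2, 4, 6, 8, 8]
pfirrmann = [1, 2, 2, 3, 4, 5]
print(round(kendall_tau_b(teps, pfirrmann), 3))
```

For real analyses, `scipy.stats.kendalltau` computes the same statistic (tau-b is its default variant) along with a p-value.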

Diagnostic accuracy of B-scan in predicting retinoblastoma in children, taking Magnetic Resonance Imaging of orbits as gold standard

Journal of the Pakistan Medical Association, 2020, pp. 1-7. DOI: 10.47391/jpma.1305

Author(s): Saba Murad, Ishtiaq Ahmed, Hania Ali, Maria Ghani, Sana Murad

Keyword(s): Magnetic Resonance Imaging; Magnetic Resonance; Diagnostic Accuracy; Gold Standard; Radiology Department; Inclusion Criteria; Resonance Imaging; Cross Sectional; Magnetic Resonance Imaging (MRI); Sensitivity Specificity

Abstract The objective of this study was to determine the diagnostic accuracy of B-scan in predicting retinoblastoma (Rb), taking magnetic resonance imaging (MRI) as the gold standard. A cross-sectional validation study was conducted in the Radiology Department of Fauji Foundation Hospital from May 20 to Nov 20, 2017. Children fulfilling the inclusion criteria were enrolled after informed consent, and a detailed history was taken for the investigation of Rb. B-scan of both eyes was performed with a 7.5-10 MHz probe, followed by MRI of both eyes in the same patients on a 1.5 Tesla MRI machine with the help of qualified MRI technicians. Data were analyzed with SPSS version 16.0. The diagnostic accuracy, sensitivity, specificity, PPV and NPV of B-scan in predicting Rb, compared with MRI, were 90.45%, 82.28%, 90.54% and 90.28%, respectively. The study concluded that the diagnostic accuracy of B-scan compared with MRI is substantial for retinoblastoma. Continuous...
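Diagnostic accuracy here is the proportion of children whose B-scan result matches MRI. The abstract does not state the following identity, but it is standard. With $N$ children, a prevalence $p$ of Rb on the reference standard, sensitivity Se and specificity Sp, the counts are $TP = \text{Se}\,pN$ and $TN = \text{Sp}\,(1-p)N$. Accuracy is therefore a prevalence-weighted average of sensitivity and specificity, so it shifts with the case mix of the sample:

```latex
\text{Accuracy}
  = \frac{TP + TN}{N}
  = \frac{\text{Se}\,pN + \text{Sp}\,(1-p)N}{N}
  = \text{Se}\cdot p + \text{Sp}\cdot(1-p)
```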

Cohen’s kappa coefficient of observer agreement: A BASIC program for minicomputers

Behavior Research Methods, 1979, Vol 11 (6), p. 602. DOI: 10.3758/bf03201395. Cited by ~5

Author(s): D. R. Wixon

Keyword(s): BASIC Program; Kappa Coefficient; Observer Agreement; Cohen's Kappa; Cohen's Kappa Coefficient

Reproducible Naevus Counts Using 3D Total Body Photography and Convolutional Neural Networks

Dermatology, 2021, pp. 1-8. DOI: 10.1159/000517218

Author(s): Brigid Betz-Stablein, Brian D’Alessandro, Uyen Koh, Elsemieke Plasmeijer, Monika Janda, ...

Keyword(s): Neural Networks; Convolutional Neural Networks; Gold Standard; Three Dimensional; Body Images; Cohen's Kappa; Lesion Level; Total Body; Sensitivity Specificity

Background: The number of naevi on a person is the strongest risk factor for melanoma; however, naevus counting is highly variable owing to a lack of consistent methodology and poor inter-rater agreement. Machine learning has been shown to be a valuable tool for image classification in dermatology. Objectives: To test whether automated, reproducible naevus counts are possible by combining convolutional neural networks (CNN) with three-dimensional (3D) total body imaging. Methods: Total body images from a study of naevi in the general population were used as the training (82 subjects; 57,742 lesions) and testing (10 subjects; 4,868 lesions) datasets for developing a CNN. A senior dermatologist labelled each lesion as a naevus or not (“non-naevi”), and these labels served as the gold standard. CNN performance was assessed using sensitivity, specificity, and Cohen’s kappa, and evaluated at the lesion level and the person level. Results: Lesion-level analysis comparing the automated counts with the gold standard showed a sensitivity and specificity of 79% (76–83%) and 91% (90–92%), respectively, for lesions ≥2 mm, and 84% (75–91%) and 91% (88–94%) for lesions ≥5 mm. Cohen’s kappa was 0.56 (0.53–0.59) for naevi ≥2 mm, indicating moderate agreement, and 0.72 (0.63–0.80) for naevi ≥5 mm, indicating substantial agreement. For the 10 individuals in the test set, person-level agreement between the automated and gold standard counts, assessed on categorized counts, was 70%. Agreement was lower in subjects with numerous seborrhoeic keratoses. Conclusion: Combining 3D total body photography with CNNs can produce automated naevus counts in reasonable agreement with those of an expert clinician. Such an algorithm may provide a faster, reproducible alternative to traditional in-person total body naevus counts.
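The labels "moderate" (0.56) and "substantial" (0.72) match the widely used Landis and Koch (1977) bands for kappa. The abstract does not name the scale it used, so treat this mapping as an assumption. The helper below is a hypothetical sketch of that banding. The same bands would also label the SLAP study's 0.82 as "almost perfect" and the negative myeloma values as "poor".

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis & Koch (1977) agreement bands (assumed scale)."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "poor"
    # Upper bound (inclusive) of each band above zero.
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


for k in (0.56, 0.72):
    print(k, landis_koch_label(k))
```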

Diagnostic Accuracy of CareStart™ Malaria HRP2 and SD Bioline Pf/PAN for Malaria in Febrile Outpatients in Varying Malaria Transmission Settings in Cameroon

Diagnostics, 2021, Vol 11 (9), pp. 1556. DOI: 10.3390/diagnostics11091556

Author(s): Innocent Mbulli Ali, Akindeh Mbuh Nji, Jacob Chefor Bonkum, Marcel Nyuylam Moyeh, Guenang Kenfack Carole, ...

Keyword(s): Diagnostic Accuracy; Malaria Transmission; Diagnostic Tests; Diagnostic Performance; District Hospital; Rapid Diagnostic Tests; Cohen's Kappa; Malaria Rapid Diagnostic Tests; Catholic Hospital

Background: The number of malaria cases in Cameroon increased in 2018, despite effective interventions, which could reflect changes in provider practice. In this study, we assessed the diagnostic performance of two malaria rapid diagnostic tests (mRDTs) for confirming suspected malaria cases in public and private health facilities in two malaria transmission settings in Cameroon. Methods: We evaluated the diagnostic performance of the CareStart Pf and SD Bioline Pf/PAN mRDTs and compared these parameters by RDT type and transmission setting. Nested PCR and blood film microscopy were used as references. The chi-square test was used for comparisons of independent samples, and McNemar’s test for paired categorical data. A p < 0.05 was considered significant in all comparisons. Analyses were performed in R (v.4.0.2). Results: A total of 1126 participants consented to the study across the four sites. The diagnostic accuracy of the CareStart Pf mRDT was 0.936 (0.911–0.961) in Yaoundé, 0.930 (0.90–0.960) in Ngounso, 0.84 (0.794–0.891) at St Vincent Catholic Hospital Dschang, and 0.407 (0.345–0.468) at Dschang District Hospital. For SD Bioline Pf/PAN, the accuracy was 0.759 (0.738–0.846) at St Vincent Catholic Hospital Dschang and 0.426 (0.372–0.496) at Dschang District Hospital. When PCR was used as the reference, accuracy was slightly lower in each case but not statistically different. The likelihood ratios of the positive and negative tests were high in the high transmission settings of Yaoundé (10.99 (6.24–19.35)) and Ngounso (14.40 (7.89–26.28)) compared with the low transmission settings of Dschang (0.71 (0.37–1.37)) and St Vincent Catholic Hospital (7.37 (4.32–12.59)).
Agreement between the tests was high in Yaoundé (Cohen’s kappa: 0.85 ± 0.05 (0.7–0.95)) and Ngounso (Cohen’s kappa: 0.86 ± 0.05 (0.74–0.97)), moderate at St Vincent Hospital Dschang (Cohen’s kappa: 0.58 ± 0.06 (0.44–0.71)), and poor at the District Hospital Dschang (Cohen’s kappa: −0.11 ± 0.05 (−0.21 to 0.01)). At St Vincent Catholic Hospital Dschang, the diagnostic indicators of SD Bioline Pf/PAN were slightly better than those of the CareStart Pf mRDT, irrespective of the reference test. Conclusions: Publicly procured malaria rapid diagnostic tests in Cameroon have maintained high accuracy (91–94%) in the clinical diagnosis of malaria in high transmission regions, although they failed to reach WHO standards. An exception was the low transmission region of Dschang (West region), where accuracy tended to be lower and varied between facilities in the town. These results underscore the importance of routinely monitoring the quality and performance of malaria RDTs in diverse settings in malaria-endemic areas.
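The abstract reports likelihood ratios and uses McNemar's test for paired comparisons. The sketch below shows the standard definitions of both. LR+ = Se/(1 − Sp) is how much a positive result raises the odds of malaria, and LR− = (1 − Se)/Sp is how much a negative result lowers them. McNemar's test compares two tests run on the same patients using only the discordant pairs. The function names and inputs are hypothetical, and the optional Edwards continuity correction is a common default that may not match the authors' R settings.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios of a diagnostic test."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("need 0 <= sensitivity <= 1 and 0 < specificity < 1")
    lr_pos = sensitivity / (1.0 - specificity)  # odds multiplier after a positive result
    lr_neg = (1.0 - sensitivity) / specificity  # odds multiplier after a negative result
    return lr_pos, lr_neg


def mcnemar(b: int, c: int, continuity: bool = True) -> tuple[float, float]:
    """McNemar's chi-square test for paired binary results.

    b: patients positive on test A but negative on test B.
    c: patients negative on test A but positive on test B.
    Concordant pairs do not enter the statistic.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    diff = abs(b - c)
    if continuity:  # Edwards' continuity correction
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical inputs, not the study's data.
print(likelihood_ratios(0.90, 0.92))
print(mcnemar(b=12, c=3))
```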

Observer Agreement and Cohen’s Kappa

Sequential Analysis and Observational Methods for the Behavioral Sciences, 2012, pp. 57-71. DOI: 10.1017/cbo9781139017343.006

Author(s): Roger Bakeman, Vicenc Quera

Keyword(s): Observer Agreement; Cohen's Kappa

Evaluation of Platelet Indices in Children with Renal Scarring based on Diagnostic Accuracy Criteria and Cohen’s Kappa

Pediatrics International, 2021. DOI: 10.1111/ped.15055

Author(s): Semra Erdoğan, Serra Sürmeli Döven

Keyword(s): Diagnostic Accuracy; Renal Scarring; Cohen's Kappa; Platelet Indices

Assessment of Liver Metastases Using CT and MRI Scans in Patients with Pancreatic Ductal Adenocarcinoma: Effects of Observer Experience on Diagnostic Accuracy

2020. DOI: 10.20944/preprints202005.0490.v1

Author(s): Masakatsu Tsurusaki, Isao Numoto, Teruyoshi Oda, Miyuki Wakana, Ayako Suzuki, ...

Keyword(s): Diagnostic Accuracy; Liver Metastases; Pancreatic Ductal Adenocarcinoma; Sensitivity and Specificity; Correlation Coefficients; Ductal Adenocarcinoma; MRI Scans; Magnetic Resonance Imaging (MRI); The Impact; CT and MRI

This study investigated the impact of radiologic experience on the diagnostic accuracy of computed tomography (CT) vs. magnetic resonance imaging (MRI) reporting of liver metastases of pancreatic ductal adenocarcinoma (LM of PDAC). Intra-individual CT and MRI examinations of 112 patients with clinically proven LM of PDAC were included. Four radiologists with varying experience (A > 20 years, B > 5, C > 1, and D < 1) assessed the liver segments affected by LM of PDAC, as well as the associated metastases in each patient. Their sensitivity and specificity in evaluating the segments were compared. Cohen's kappa (κ) was calculated for the diagnosed liver segments, and intraclass correlation coefficients (ICC) for the number of metastatic lesions per patient. The radiologists' sensitivity and specificity for CT vs. MRI were: reader A, 94.4 and 90.3% vs. 96.6 and 94.8%; reader B, 86.7 and 79.7% vs. 83.9 and 82.0%; reader C, 78.0 and 76.7% vs. 83.3 and 78.9%; and reader D, 71.8 and 79.2% vs. 64.0 and 69.5%. Readers A and B agreed more on MRI (κ = 0.72, p < 0.001; ICC = 0.73, p < 0.001) than on CT (κ = 0.58, p < 0.001; ICC = 0.61, p < 0.001). Readers C and D, by contrast, agreed less on MRI (κ = 0.34, p < 0.001; ICC = 0.42, p < 0.001) than on CT (κ = 0.48, p < 0.001; ICC = 0.59, p < 0.001). Our results indicate that accurate diagnosis of LM of PDAC depends more on radiologic experience with MRI than with CT.
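The abstract reports ICCs for per-patient lesion counts but does not say which ICC form was used. The sketch below assumes the Shrout and Fleiss ICC(2,1), a two-way random-effects, absolute-agreement, single-rater model. It computes the coefficient from the two-way ANOVA mean squares. The function name and the counts are hypothetical. In the example, one reader systematically reports one extra lesion. Absolute agreement penalizes that offset, whereas a consistency-type ICC would not.

```python
from collections.abc import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) (Shrout & Fleiss): two-way random effects, absolute agreement,
    single rater. Rows are patients, columns are readers."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need an n x k table with n >= 2 and k >= 2")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: patients (rows), readers (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_err) / denom


# Hypothetical lesion counts: reader 2 reports exactly one more lesion than reader 1.
counts = [[1, 2], [2, 3], [3, 4]]
print(round(icc_2_1(counts), 3))  # 0.667
```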
