Inter-observer variability between readers of CT images: all for one and one for all

2021 ◽  
Vol 2 (2) ◽  
pp. 105-118
Author(s):  
Nikolas S. Kulberg ◽  
Roman V. Reshetnikov ◽  
Vladimir P. Novik ◽  
Alexey B. Elizarov ◽  
Maxim A. Gusev ◽  
...  

BACKGROUND: The markup of medical image datasets is based on the subjective interpretation of the observed entities by radiologists. There is currently no widely accepted protocol for determining ground truth based on radiologists' reports. AIM: To assess the accuracy of radiologist interpretations and their agreement for the publicly available dataset CTLungCa-500, as well as the relationship between these parameters and the number of independent readers of CT scans. MATERIALS AND METHODS: Thirty-four radiologists took part in the dataset markup. The dataset included 536 patients who were at high risk of developing lung cancer. For each scan, six radiologists worked independently to create a report. After that, an arbitrator reviewed the lesions discovered by them. The number of true-positive, false-positive, true-negative, and false-negative findings was calculated for each reader to assess diagnostic accuracy. Further, the inter-observer variability was analyzed using the percentage agreement metric. RESULTS: An increase in the number of independent readers providing CT scan interpretations leads to an increase in accuracy accompanied by a decrease in agreement. The majority of disagreements concerned the presence of a lung nodule at a specific site of the CT scan. CONCLUSION: If arbitration is provided, an increase in the number of independent initial readers can improve their combined accuracy. The experience and diagnostic accuracy of individual readers have no bearing on the quality of a crowd-tagging annotation. At four independent readings per CT scan, the optimal balance of markup accuracy and cost was achieved.
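
The percentage agreement metric named in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula the authors used, so the pairwise definition below, the function name `pairwise_percent_agreement`, and the sample data are all assumptions made for illustration only.

```python
from itertools import combinations


def pairwise_percent_agreement(ratings):
    """Percentage of (reader pair, item) combinations on which both readers agree.

    ratings[r][i] is the label reader r assigned to item i
    (for example 1 = nodule reported at that site, 0 = not reported).
    """
    n_items = len(ratings[0])
    agree = 0
    total = 0
    for a, b in combinations(ratings, 2):  # every unordered pair of readers
        for i in range(n_items):
            agree += a[i] == b[i]
            total += 1
    return 100.0 * agree / total


# Hypothetical data: three readers, four candidate sites.
readers = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
]
print(f"{pairwise_percent_agreement(readers):.1f}% agreement")
```

With this definition, adding readers adds more pairs that can disagree, which is one plausible way for agreement to fall as the number of readers grows.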

2020 ◽  
Vol 41 (4) ◽  
pp. 240-247
Author(s):  
Lei Yang ◽  
Qingtao Zhao ◽  
Shuyu Wang

Background: Serum periostin has been proposed as a noninvasive biomarker for asthma diagnosis and management. However, its accuracy for the diagnosis of asthma in different populations is not completely clear. Methods: This meta-analysis aimed to evaluate the diagnostic accuracy of periostin level in the clinical determination of asthma. Several medical literature databases were searched for relevant studies through December 1, 2019. The numbers of patients with true-positive, false-positive, false-negative, and true-negative results for the periostin level were extracted from each individual study. We assessed the risk of bias by using Quality Assessment of Diagnostic Accuracy Studies 2. We used the meta-analysis to produce summary estimates of accuracy. Results: In total, nine studies with 1757 subjects met the inclusion criteria. The pooled estimates of sensitivity, specificity, and diagnostic odds ratios for the detection of asthma were 0.58 (95% confidence interval [CI], 0.38‐0.76), 0.86 (95% CI, 0.74‐0.93), and 8.28 (95% CI, 3.67‐18.68), respectively. The area under the summary receiver operating characteristic curve was 0.82 (95% CI, 0.79‐0.85). No significant publication bias was found in this meta‐analysis (p = 0.39). Conclusion: Serum periostin may be used for the diagnosis of asthma, with moderate diagnostic accuracy.


2021 ◽  
Author(s):  
Brigid A McDonald ◽  
Carlos Cardenas ◽  
Nicolette O'Connell ◽  
Sara Ahmed ◽  
Mohamed A. Naser ◽  
...  

Purpose: In order to accurately accumulate delivered dose for head and neck cancer patients treated with the Adapt to Position workflow on the 1.5T magnetic resonance imaging (MRI)-linear accelerator (MR-linac), the low-resolution T2-weighted MRIs used for daily setup must be segmented to enable reconstruction of the delivered dose at each fraction. In this study, our goal is to evaluate various autosegmentation methods for head and neck organs at risk (OARs) on on-board setup MRIs from the MR-linac for off-line reconstruction of delivered dose. Methods: Seven OARs (parotid glands, submandibular glands, mandible, spinal cord, and brainstem) were contoured on 43 images by seven observers each. Ground truth contours were generated using a simultaneous truth and performance level estimation (STAPLE) algorithm. 20 autosegmentation methods were evaluated in ADMIRE: 1-9) atlas-based autosegmentation using a population atlas library (PAL) of 5/10/15 patients with STAPLE, patch fusion (PF), random forest (RF) for label fusion; 10-19) autosegmentation using images from a patient's 1-4 prior fractions (individualized patient prior (IPP)) using STAPLE/PF/RF; 20) deep learning (DL) (3D ResUNet trained on 43 ground truth structure sets plus 45 contoured by one observer). Execution time was measured for each method. Autosegmented structures were compared to ground truth structures using the Dice similarity coefficient, mean surface distance, Hausdorff distance, and Jaccard index. For each metric and OAR, performance was compared to the inter-observer variability using Dunn's test with control. Methods were compared pairwise using the Steel-Dwass test for each metric pooled across all OARs. Further dosimetric analysis was performed on three high-performing autosegmentation methods (DL, IPP with RF and 4 fractions (IPP_RF_4), IPP with 1 fraction (IPP_1)), and one low-performing (PAL with STAPLE and 5 atlases (PAL_ST_5)). 
For five patients, delivered doses from clinical plans were recalculated on setup images with ground truth and autosegmented structure sets. Differences in maximum and mean dose to each structure between the ground truth and autosegmented structures were calculated and correlated with geometric metrics. Results: DL and IPP methods performed best overall, all significantly outperforming inter-observer variability and with no significant difference between methods in pairwise comparison. PAL methods performed worst overall; most were not significantly different from the inter-observer variability or from each other. DL was the fastest method (33 seconds per case) and PAL methods the slowest (3.7–13.8 minutes per case). Execution time increased with number of prior fractions/atlases for IPP and PAL. For DL, IPP_1, and IPP_RF_4, the majority (95%) of dose differences were within 250 cGy from ground truth, but outlier differences up to 785 cGy occurred. Dose differences were much higher for PAL_ST_5, with outlier differences up to 1920 cGy. Dose differences showed weak but significant correlations with all geometric metrics (R² between 0.030 and 0.314). Conclusions: The autosegmentation methods offering the best combination of performance and execution time are DL and IPP_1. Dose reconstruction on on-board T2-weighted MRIs is feasible with autosegmented structures with minimal dosimetric variation from ground truth, but contours should be visually inspected prior to dose reconstruction in an end-to-end dose accumulation workflow.
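
Two of the overlap metrics named in this abstract, the Dice similarity coefficient and the Jaccard index, can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: structures are represented here as sets of voxel coordinates, and the function names and toy data are hypothetical.

```python
def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty structures are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy contours on a 2D grid (voxel coordinates).
autoseg = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (2, 1)}

d = dice(autoseg, truth)
j = jaccard(autoseg, truth)
print(f"Dice = {d:.3f}, Jaccard = {j:.3f}")
```

The two metrics are monotonically related by J = D / (2 - D), so they rank methods identically; the surface-distance metrics in the abstract capture boundary errors that overlap metrics can miss.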


2019 ◽  
Vol 34 (2) ◽  
pp. 306-314
Author(s):  
Do Hyun Kim ◽  
Youngjun Seo ◽  
Kyung Min Kim ◽  
Seoungmin Lee ◽  
Se Hwan Hwang

Background We evaluated the accuracy of nasal endoscopy in diagnosing chronic rhinosinusitis (CRS) compared with paranasal sinus computed tomography (CT). Methods Two authors independently searched 5 databases (PubMed, SCOPUS, Embase, the Web of Science, and the Cochrane database) up to March 2019. For all included studies, we calculated correlation coefficients between the endoscopic and CT scores. We extracted data on true-positive, false-positive, true-negative, and false-negative results. Methodological quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies tool (version 2). Results We included 16 observational or retrospective studies. A high correlation (r = .8543; 95% confidence interval [CI] [0.7685–0.9401], P < .0001, I² = 76.58%) between endoscopy and CT in terms of the diagnostic accuracy for CRS was apparent. The odds ratio (Lund–Kennedy endoscopic score ≥1) was 7.915 (95% CI [4.435–14.124]; I² = 28.361%). The area under the summary receiver operating characteristic curve was 0.765. The sensitivity and specificity were 0.726 (95% CI [0.584–0.834]) and 0.767 (95% CI [0.685–0.849]), respectively. However, high interstudy heterogeneity was evident given the different endoscopic score thresholds used (Lund–Kennedy endoscopic score ≥1 vs 2). In a subgroup analysis of studies using a Lund–Kennedy endoscopic score threshold ≥2, the area under the summary curve was 0.881, and the sensitivity and specificity were 0.874 (95% CI [0.783–0.930]) and 0.793 (95% CI [0.366–0.962]), respectively. Conclusion Nasal endoscopy is a useful diagnostic tool; the Lund–Kennedy score was comparable with that of CT.


2017 ◽  
Vol 8 (1) ◽  
pp. 17-23
Author(s):  
Meher Angez Rahman ◽  
Salauddin Al Azad ◽  
Nazrul Islam ◽  
Sadrul Amin ◽  
Md ziaul Haque ◽  
...  

Background: CT is a very important noninvasive radiological modality for detecting orbital masses in pediatric patients. The purpose of this study was to describe the CT findings of orbital masses among pediatric patients in a tertiary care hospital. Methodology: This cross-sectional study was carried out in the Ophthalmology and the Radiology and Imaging departments of the National Institute of Ophthalmology (NIO) from January 2012 to December 2013. All patients below 18 years of age who presented with a suspected orbital mass at these departments, underwent a CT scan of the orbit for diagnosis, and had histopathology after operation were enrolled in the study. Results: A total of 29 cases were identified as malignant by CT; of these, 27 were true positive and 2 were false positive. CT identified 41 cases as benign, of which 1 was false negative and 40 were true negative. The sensitivity of CT in the diagnosis of orbital tumor was 93.3%. Conclusion: The sensitivity of CT in the diagnosis of orbital tumor was high, and CT is a useful method for differentiating between benign and malignant orbital masses. Anwer Khan Modern Medical College Journal Vol. 8, No. 1: Jan 2017, P 17-23


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7893 ◽  
Author(s):  
Simone Macrì ◽  
Romain J.G. Clément ◽  
Chiara Spinello ◽  
Maurizio Porfiri

Zebrafish (Danio rerio) have recently emerged as a valuable laboratory species in the field of behavioral pharmacology, where they afford rapid and precise high-throughput drug screening. Although the behavioral repertoire of this species manifests in three dimensions (3D), most of the efforts in behavioral pharmacology rely on two-dimensional (2D) projections acquired from a single overhead or front camera. We recently showed that, compared to a 3D scoring approach, 2D analyses could lead to inaccurate claims regarding individual and social behavior of drug-free experimental subjects. Here, we examined whether this conclusion extended to the field of behavioral pharmacology by phenotyping adult zebrafish, acutely exposed to citalopram (30, 50, and 100 mg/L) or ethanol (0.25%, 0.50%, and 1.00%), in the novel tank diving test over a 6-min experimental session. We observed that both compounds modulated the time course of general locomotion and anxiety-related profiles, the latter being represented by specific behaviors (erratic movements and freezing) and avoidance of anxiety-eliciting areas of the test tank (top half and distance from the side walls). We observed that 2D projections of 3D trajectories (ground truth data) may introduce a source of unwanted variation in zebrafish behavioral phenotyping. Predictably, both 2D views underestimate absolute levels of general locomotion. Additionally, while data obtained from a camera positioned on top of the experimental tank are similar to those obtained from a 3D reconstruction, 2D front view data yield false negative findings.


2021 ◽  
Vol 71 (3) ◽  
pp. 1015-19
Author(s):  
Muhammad Atif ◽  
Fida Hussain ◽  
Zaigham Salim Dar ◽  
Jameela Khatoon ◽  
Saadia Ajmal ◽  
...  

Objective: To determine diagnostic accuracy of 99mTc labelled Ubiquicidin (29-41) SPECT/CT for detection of osteomyelitis in diabetic foot patients by taking bone biopsy as gold standard. Study Design: Cross-sectional validation study. Place and Duration of Study: Nuclear Medical Centre, Armed Forces Institute of Pathology, from Apr 2017 to Mar 2018. Methodology: Study assessed 122 patients of both genders, aged between 30-80 years (mean age=55.3 years), presenting with diabetic foot ulcers having suspicion of osteomyelitis, by 99mTc-Ubiquicidin (29-41) SPECT/CT followed by bone biopsy (histopathology and culture) taken as gold standard. Results: Among 122 patients [94 male (77%) and 28 female (23%)], osteomyelitis was histopathologically confirmed in 113 patients. 107 out of these patients were positive for osteomyelitis on 99mTc-UBI (29-41) SPECT/CT (true positives) while 6 were false negative. Out of 9 patients declared negative for osteomyelitis on histopathology and culture, 8 were negative on 99mTc-UBI (29-41) SPECT/CT as well (true negative) while only 1 case came out to be positive (false positive). Thus, the 99mTc-UBI (29-41) scan showed 94.6% sensitivity, 88.89% specificity, 99% positive predictive value, 57% negative predictive value with overall 94.2% diagnostic accuracy. Conclusion: 99mTc labelled Ubiquicidin (29-41) SPECT/CT scan can precisely localize infective focus, in diabetic foot osteomyelitis, with simultaneous discrimination between bone and soft tissues.
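
The accuracy figures in this abstract follow from the four confusion-matrix counts it reports (107 true positives, 6 false negatives, 8 true negatives, 1 false positive). The sketch below recomputes them with the standard definitions; the function name `diagnostic_metrics` is a hypothetical helper, not something taken from the paper.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among diseased
        "specificity": tn / (tn + fp),  # true negatives among non-diseased
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts as reported in the abstract above (biopsy as gold standard).
m = diagnostic_metrics(tp=107, fp=1, fn=6, tn=8)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```

The recomputed values are close to the reported figures. The low negative predictive value reflects the small number of biopsy-negative patients (9 of 122), so a handful of false negatives dominates that ratio.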


2021 ◽  
Vol 28 (08) ◽  
pp. 1166-1171
Author(s):  
Mahwish Yasin ◽  
Huma Muzaffar ◽  
M Ahmed Zamir ◽  
Talha Munir ◽  
...  

Objective: To determine the diagnostic accuracy of the AST to platelet ratio index in detecting significant fibrosis in chronic hepatitis C patients, using histopathology as the gold standard. Study Design: Cross-sectional study. Settings: Department of Medicine, DHQ Hospital, Faisalabad. Period: 1st Oct 2017 to March 2018. Results: Of 158 cases, 48.73% (n=77) were aged 25-40 years and 51.27% (n=81) were aged 41-60 years (mean ± SD, 40.94 ± 9.10 years); 55.06% (n=87) were male and 44.94% (n=71) were female. Mean AST and platelet count were 1.68 ± 0.54 and 191.0 ± 43.75, respectively. On histopathology, significant fibrosis was present in 53.16% (n=84) of patients and absent in 46.84% (n=74). Against this gold standard, the AST to platelet ratio index yielded 51.27% (n=81) true positives, 2.53% (n=4) false positives, 1.89% (n=3) false negatives, and 44.31% (n=70) true negatives; sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were 96.43%, 94.59%, 95.29%, 95.89%, and 95.57%, respectively. Conclusion: The results of the study reveal that the diagnostic accuracy of the AST to platelet ratio for detection of significant fibrosis in chronic hepatitis C patients was satisfactory, and it may be used to avoid invasive liver biopsy when initiating antiviral therapy in these patients.


2019 ◽  
Vol 101-B (10) ◽  
pp. 1218-1229 ◽  
Author(s):  
Till D. Lerch ◽  
Patric Eichelberger ◽  
Heiner Baur ◽  
Florian Schmaranzer ◽  
Emanuel F. Liechti ◽  
...  

Aims Abnormal femoral torsion (FT) is increasingly recognized as an additional cause for femoroacetabular impingement (FAI). It is unknown if in-toeing of the foot is a specific diagnostic sign for increased FT in patients with symptomatic FAI. The aims of this study were to determine: 1) the prevalence and diagnostic accuracy of in-toeing to detect increased FT; 2) if foot progression angle (FPA) and tibial torsion (TT) are different among patients with abnormal FT; and 3) if FPA correlates with FT. Patients and Methods A retrospective, institutional review board (IRB)-approved, controlled study of 85 symptomatic patients (148 hips) with FAI or hip dysplasia was performed in the gait laboratory. All patients had a measurement of FT (pelvic CT scan), TT (CT scan), and FPA (optical motion capture system). We allocated all patients to three groups with decreased FT (< 10°, 37 hips), increased FT (> 25°, 61 hips), and normal FT (10° to 25°, 50 hips). Cluster analysis was performed. Results We found a specificity of 99%, positive predictive value (PPV) of 93%, and sensitivity of 23% for in-toeing (FPA < 0°) to detect increased FT > 25°. Most of the hips with normal or decreased FT had no in-toeing (false-positive rate of 1%). Patients with increased FT had significantly (p < 0.001) more in-toeing than patients with decreased FT. The majority of the patients (77%) with increased FT walk with a normal foot position. The correlation between FPA and FT was significant (r = 0.404, p < 0.001). Five cluster groups were identified. Conclusion In-toeing has a high specificity and high PPV to detect increased FT, but increased FT can be missed because of the low sensitivity and high false-negative rate. These results can be used for diagnosis of abnormal FT in patients with FAI or hip dysplasia undergoing hip arthroscopy or femoral derotation osteotomy. However, most of the patients with increased FT walk with a normal foot position. 
This can lead to underestimation or misdiagnosis of abnormal FT. We recommend measuring FT with CT/MRI scans in all patients with FAI. Cite this article: Bone Joint J 2019;101-B:1218–1229


2016 ◽  
Vol 6 (1) ◽  
pp. 33
Author(s):  
Tahmina Islam ◽  
Salauddin Al-Azad ◽  
Lubna Khondker ◽  
Sabina Akhter

Background: Computed tomography (CT) is the gold standard for exact delineation of paranasal sinus (PNS) disease. There are many radiologically important diseases of the paranasal sinuses. Objective: To evaluate malignant PNS masses by computed tomography and to compare the findings of this modality with histopathological results. Methods: This cross-sectional study was carried out in patients with suspected PNS mass from January 2009 to October 2010. Results: The mean age of the patients was 35.95 ± 18.24; the most common complaint was nasal obstruction (73.7%), and the largest share of patients (53.9%) had a PNS mass in the maxillary sinuses. Out of 76 cases, 21.1% were found to have a malignant mass on CT, and 19.7% had a malignant mass on histopathology. Fourteen cases were diagnosed as malignant PNS mass by CT scan and confirmed by histopathological evaluation (true positives). Two cases were diagnosed as malignant PNS mass by CT scan but not confirmed by histopathological findings (false positives). Of the remaining 60 cases assessed by CT scan, one was confirmed as malignant (false negative) and 59 as benign (true negatives) by histopathology. The sensitivity of CT scan for diagnosing malignant PNS mass was 93.3%, specificity 96.7%, positive predictive value 87.5%, negative predictive value 98.3%, and accuracy 96.1%. Conclusion: CT scan of malignant paranasal sinus masses provides more information and better image quality, and CT diagnosis correlates well with the findings of histopathology.

