Recalibration of deep learning models for abnormality detection in smartphone-captured chest radiograph

2021
Vol 4 (1)
Author(s):
Po-Chih Kuo
Cheng Che Tsai
Diego M. López
Alexandros Karargyris
Tom J. Pollard
...

Abstract Image-based teleconsultation using smartphones has become increasingly popular. In parallel, deep learning algorithms have been developed to detect radiological findings in chest X-rays (CXRs). However, the feasibility of using smartphones to automate this process has yet to be evaluated. This study developed a recalibration method to build deep learning models that detect radiological findings on CXR photographs. Two publicly available databases (MIMIC-CXR and CheXpert) were used to build the models, and four derivative datasets containing 6453 CXR photographs were collected to evaluate model performance. After recalibration, the model achieved areas under the receiver operating characteristic curve of 0.80 (95% confidence interval: 0.78–0.82), 0.88 (0.86–0.90), 0.81 (0.79–0.84), 0.79 (0.77–0.81), 0.84 (0.80–0.88), and 0.90 (0.88–0.92) for detecting cardiomegaly, edema, consolidation, atelectasis, pneumothorax, and pleural effusion, respectively. The recalibration strategy recovered 84.9%, 83.5%, 53.2%, 57.8%, 69.9%, and 83.0%, respectively, of the performance lost by the uncalibrated model. We conclude that the recalibration method can transfer models from digital CXRs to CXR photographs, which is expected to support physicians in their clinical work.
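
The abstract does not define how "recovered performance loss" is computed. One plausible reading, assumed here and not confirmed by the source, is the share of the AUC drop (digital CXR minus uncalibrated photograph) that recalibration wins back. The function name and the AUC values in the usage line are hypothetical.

```python
def recovered_fraction(auc_digital: float, auc_uncalibrated: float, auc_recalibrated: float) -> float:
    """Share of the digital-to-photograph AUC loss recovered by recalibration.

    Assumed definition (not taken from the source):
        (recalibrated - uncalibrated) / (digital - uncalibrated)
    """
    loss = auc_digital - auc_uncalibrated
    if loss <= 0:
        raise ValueError("uncalibrated model shows no performance loss")
    return (auc_recalibrated - auc_uncalibrated) / loss


# Hypothetical AUCs, for illustration only.
print(round(recovered_fraction(0.90, 0.70, 0.85), 3))  # 0.75
```

Under this reading, a value of 1.0 would mean the photograph model fully matches the digital-CXR model, and 0.0 would mean recalibration gained nothing.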

2020
Vol 34 (7)
pp. 717-730
Author(s):
Matthew C. Robinson
Robert C. Glen
Alpha A. Lee

Abstract Machine learning methods may significantly accelerate drug discovery. However, the increasing rate at which new methodological approaches are published raises the fundamental question of how models should be benchmarked and validated. We reanalyze the data generated by a recently published large-scale comparison of machine learning models for bioactivity prediction and arrive at a somewhat different conclusion. We show that the performance of support vector machines is competitive with that of deep learning methods. Additionally, using a series of numerical experiments, we question the relevance of the area under the receiver operating characteristic curve as a metric in virtual screening. We further suggest that the area under the precision–recall curve be used in conjunction with the area under the receiver operating characteristic curve. Our numerical experiments also highlight challenges in estimating the uncertainty in model performance via scaffold-split nested cross-validation.
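
To illustrate why AUROC alone can flatter a virtual screen, the sketch below (a generic illustration, not the authors' experiments) computes AUROC through its rank interpretation and average precision as a step-wise estimate of the area under the precision–recall curve. The toy ranking is hypothetical: 2 actives among 100 compounds, ranked 5th and 10th.

```python
def auroc(scores_pos, scores_neg):
    """AUROC as the probability that a random active outranks a random inactive (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def average_precision(scores_pos, scores_neg):
    """Average precision (step-wise AUPRC estimate); assumes no tied scores."""
    ranked = sorted([(s, 1) for s in scores_pos] + [(s, 0) for s in scores_neg], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each retrieved active
    return total / len(scores_pos)


# Hypothetical screen: 100 compounds with distinct scores, actives at ranks 5 and 10.
scores = [(100 - r) / 100 for r in range(1, 101)]
actives = [scores[4], scores[9]]
inactives = [s for i, s in enumerate(scores) if i not in (4, 9)]
print(round(auroc(actives, inactives), 3))              # 0.939
print(round(average_precision(actives, inactives), 3))  # 0.2
```

The same ranking looks excellent by AUROC yet poor by average precision, because with so few actives the early false positives barely move AUROC but dominate precision.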


2020
Vol 10 (4)
pp. 211
Author(s):
Yong Joon Suh
Jaewon Jung
Bum-Joo Cho

Mammography plays an important role in breast cancer screening among women, and artificial intelligence has enabled the automated detection of diseases on medical images. This study aimed to develop a deep learning model that detects breast cancer in digital mammograms of various densities and to compare its performance with that reported in previous studies. From 1501 subjects who underwent digital mammography between February 2007 and May 2015, craniocaudal and mediolateral view mammograms were included and concatenated for each breast, ultimately producing 3002 merged images. Two convolutional neural networks were trained to detect any malignant lesion on the merged images. Performance was tested on 301 merged images from 284 subjects and compared with a meta-analysis of 12 previous deep learning studies. The mean area under the receiver operating characteristic curve (AUC) for detecting breast cancer in each merged mammogram was 0.952 ± 0.005 with DenseNet-169 and 0.954 ± 0.020 with EfficientNet-B5. Malignancy detection performance decreased as breast density increased (density A, mean AUC = 0.984 vs. density D, mean AUC = 0.902 with DenseNet-169). When patients' age was used as a covariate for malignancy detection, performance changed little (mean AUC, 0.953 ± 0.005). The mean sensitivity and specificity of DenseNet-169 (87% and 88%, respectively) exceeded the mean values obtained in the meta-analysis (81% and 82%, respectively). Deep learning may work efficiently in screening for breast cancer in digital mammograms of various densities, and its performance appears highest in breasts with lower parenchymal density.
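
The density-stratified comparison amounts to computing the AUC separately within each density category. A minimal sketch, using invented scores and density labels rather than the study's data; the function names are illustrative:

```python
from collections import defaultdict


def auroc(pos, neg):
    """Probability that a random malignant case scores above a random benign one (ties = 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_group(records):
    """records: iterable of (group, label, score); label 1 = malignant, 0 = benign."""
    pos, neg = defaultdict(list), defaultdict(list)
    for group, label, score in records:
        (pos if label == 1 else neg)[group].append(score)
    # Only groups that contain both classes have a defined AUROC.
    return {g: auroc(pos[g], neg[g]) for g in pos if g in neg}


# Hypothetical model outputs for density categories A and D, for illustration only.
records = [
    ("A", 1, 0.9), ("A", 1, 0.8), ("A", 0, 0.2), ("A", 0, 0.3),
    ("D", 1, 0.6), ("D", 1, 0.4), ("D", 0, 0.5), ("D", 0, 0.3),
]
print(auroc_by_group(records))  # {'A': 1.0, 'D': 0.75}
```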


2019
Author(s):
Hongyang Li
Yuanfang Guan

Abstract Sleep arousals are transient periods of wakefulness that punctuate sleep. Excessive sleep arousals are associated with many negative effects, including daytime sleepiness and sleep disorders. High-quality annotation of polysomnographic recordings is crucial for diagnosing sleep arousal disorders. Currently, sleep arousals are mainly annotated by human experts who inspect millions of data points manually, which requires considerable time and effort. Here we present a deep learning approach, DeepSleep, which ranked first in the 2018 PhysioNet Challenge for automatically segmenting sleep arousal regions from polysomnographic recordings. DeepSleep delineates sleep arousals accurately (area under the receiver operating characteristic curve of 0.93), at high resolution (5 milliseconds), and quickly (10 seconds per sleep record).
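
Segmenting arousal regions at 5-millisecond resolution means producing one prediction per 5-ms sample, i.e. 200 samples per second. The sketch below is not DeepSleep's post-processing; it shows one straightforward way to turn such per-sample probabilities into time intervals. The 0.5 threshold, the function name, and the toy probabilities are assumptions.

```python
SAMPLE_RATE_HZ = 200  # 5-ms resolution corresponds to 200 samples per second


def arousal_regions(probs, threshold=0.5, fs=SAMPLE_RATE_HZ):
    """Convert per-sample arousal probabilities into (start_s, end_s) intervals.

    A region is a maximal run of consecutive samples with prob >= threshold;
    end_s is exclusive (the time just after the last flagged sample).
    """
    regions, start = [], None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i                      # a new region begins
        elif p < threshold and start is not None:
            regions.append((start / fs, i / fs))
            start = None                   # the current region ends
    if start is not None:                  # region runs to the end of the record
        regions.append((start / fs, len(probs) / fs))
    return regions


# Hypothetical per-sample output: 1 s of sleep, 0.5 s of arousal, 1 s of sleep.
probs = [0.1] * 200 + [0.9] * 100 + [0.2] * 200
print(arousal_regions(probs))  # [(1.0, 1.5)]
```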


2019
Vol 57 (6)
Author(s):
Toine Mercier
Ellen Guldentops
Sofie Patteet
Kurt Beuselinck
Katrien Lagrou
...

ABSTRACT Measuring serum beta-d-glucan (BDG) is a useful tool for supporting a quantitative PCR (qPCR)-based diagnosis of suspected Pneumocystis pneumonia (PCP) with bronchoalveolar lavage (BAL) fluid. Since the 2000s, the Fungitell assay has been the only BDG assay that is both FDA cleared and Conformité Européenne (CE) marked. However, the Wako β-glucan test was also recently CE marked and commercialized. We analyzed archived sera from 116 PCP cases (patients considered to have PCP based on compatible clinical and radiological findings plus a BAL fluid qPCR threshold cycle value of ≤28) and 114 controls (patients with a BAL fluid qPCR threshold cycle value of >45 and no invasive fungal infection) using the Fungitell and Wako assays in parallel, and assessed their diagnostic performance using the manufacturers' proposed cutoffs of 80 pg/ml and 11 pg/ml, respectively. At the proposed cutoffs, the Wako assay was more specific than the Fungitell assay (0.98 versus 0.87, P < 0.001) but less sensitive (0.78 versus 0.85, P = 0.039). Overall performance, as determined by the area under the receiver operating characteristic curve, was similar for both assays. We determined a new Wako assay cutoff (3.616 pg/ml) to match the sensitivity of the Fungitell assay (0.88 at a cutoff of ≥60 pg/ml). Using this newly proposed cutoff, the specificity of the Wako assay was significantly better than that of the Fungitell assay (0.89 versus 0.82, P = 0.011). In conclusion, the Wako assay performed very well compared with the Fungitell assay for the diagnosis of presumed PCP based on qPCR. In addition, unlike the Fungitell assay, the Wako assay allows single-sample testing with lower inter- and intrarun variability. Finally, we propose an optimized cutoff for the Wako assay to reliably exclude PCP.
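
Choosing a Wako cutoff "to match the sensitivity" of another assay can be framed as searching for the highest threshold that still reaches a target sensitivity, then reading off the specificity at that threshold. A minimal sketch under that framing; the authors' exact procedure is not described in the abstract, and the function name and pg/ml values below are invented:

```python
def cutoff_for_sensitivity(case_values, control_values, target_sensitivity):
    """Highest cutoff c such that the rule 'value >= c is positive' reaches the target sensitivity.

    Returns (cutoff, sensitivity, specificity). Only observed case values are tried,
    scanning from the highest down, so the first hit is the most specific qualifying cutoff.
    """
    n_cases, n_controls = len(case_values), len(control_values)
    for c in sorted(set(case_values), reverse=True):
        sens = sum(v >= c for v in case_values) / n_cases
        if sens >= target_sensitivity:
            spec = sum(v < c for v in control_values) / n_controls
            return c, sens, spec
    raise ValueError("target sensitivity cannot be reached")


# Hypothetical BDG levels (pg/ml), for illustration only.
cases = [2, 5, 8, 12, 20]
controls = [0.5, 1, 3, 4, 6]
print(cutoff_for_sensitivity(cases, controls, 0.8))  # (5, 0.8, 0.8)
```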


2020
Vol 102-B (11)
pp. 1574-1581
Author(s):
Si-Cheng Zhang
Jun Sun
Chuan-Bin Liu
Ji-Hong Fang
Hong-Tao Xie
...

Aims: The diagnosis of developmental dysplasia of the hip (DDH) is challenging owing to extensive variation in paediatric pelvic anatomy. Artificial intelligence (AI) may represent an effective diagnostic tool for DDH. Here, we aimed to develop a deep learning system for diagnosing DDH in children from anteroposterior pelvic radiographs and to analyze the feasibility of its application. Methods: In total, 10,219 anteroposterior pelvic radiographs were retrospectively collected from April 2014 to December 2018. Clinicians labelled each radiograph using a uniform standard method. Radiographs were grouped by age and, based on clinical diagnosis, into ‘dislocation’ (dislocation and subluxation) and ‘non-dislocation’ (normal cases and those with dysplasia of the acetabulum) groups. The deep learning system was trained and optimized using 9,081 radiographs; 1,138 test radiographs were then used to compare the diagnoses made by the deep learning system and by clinicians. The accuracy of the deep learning system was assessed using receiver operating characteristic curve analysis, and the consistency of acetabular index measurements was evaluated using Bland-Altman plots. Results: In all, 1,138 patients (242 males; 896 females; mean age 1.5 years (SD 1.79; 0 to 10)) were included in this study. The area under the receiver operating characteristic curve, sensitivity, and specificity of the deep learning system for diagnosing hip dislocation were 0.975, 276/289 (95.5%), and 1,978/1,987 (99.5%), respectively. Compared with clinical diagnoses, the Bland-Altman 95% limits of agreement for the acetabular index, as determined by the deep learning system from radiographs of non-dislocated and dislocated hips, were -3.27° to 2.94° and -7.36° to 5.36°, respectively (p < 0.001). Conclusion: Compared with clinician-led diagnosis, the deep learning system was highly consistent, more convenient, and more effective for diagnosing DDH.
Deep learning systems should be considered for analysis of anteroposterior pelvic radiographs when diagnosing DDH. The deep learning system will improve the current artificially complicated screening referral process. Cite this article: Bone Joint J 2020;102-B(11):1574–1581.
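
The reported sensitivity and specificity follow directly from the stated counts, and Bland-Altman limits of agreement are conventionally the mean difference ± 1.96 standard deviations of the paired differences. A minimal sketch of both; the acetabular index readings are invented, so the resulting limits are not the study's -3.27° to 2.94° or -7.36° to 5.36°:

```python
import statistics


def rate(numerator: int, denominator: int) -> float:
    """Proportion as a percentage, e.g. sensitivity = TP / (TP + FN)."""
    return 100.0 * numerator / denominator


def limits_of_agreement(method_a, method_b):
    """Bland-Altman 95% limits of agreement: mean difference +/- 1.96 * SD of differences."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the paired differences
    return bias - 1.96 * sd, bias + 1.96 * sd


# Counts reported in the abstract.
print(round(rate(276, 289), 1))    # 95.5 (sensitivity)
print(round(rate(1978, 1987), 1))  # 99.5 (specificity)

# Hypothetical acetabular index readings (degrees), for illustration only.
model = [20.0, 25.0, 30.0, 22.0]
clinician = [21.0, 24.0, 31.0, 22.0]
print(limits_of_agreement(model, clinician))
```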


MicroRNA
2018
Vol 8 (1)
pp. 86-92
Author(s):
Shili Jiang
Wei Jiang
Ying Xu
Xiaoning Wang
Yongping Mu
...

Background and Objective: Accurately evaluating the severity of liver cirrhosis is essential for clinical decision making and disease management. This study aimed to evaluate the value of circulating levels of microRNA (miR)-26a and miR-21 as novel noninvasive biomarkers for assessing the severity of cirrhosis in patients with chronic hepatitis B. Methods: Thirty patients with clinically diagnosed chronic hepatitis B-related cirrhosis and 30 healthy individuals were selected. Serum levels of miR-26a and miR-21 were quantified by qRT-PCR. Receiver operating characteristic curve analysis was performed to evaluate the sensitivity and specificity of the miRNAs for detecting the severity of cirrhosis. Results: Serum miR-26a and miR-21 levels were significantly downregulated in patients with severe cirrhosis (Child-Pugh class C) compared with healthy controls (p<0.01 for miR-26a and p<0.001 for miR-21). Circulating miR-26a and miR-21 levels in patients correlated positively with serum albumin concentration and negatively with serum total bilirubin concentration and prothrombin time. Receiver operating characteristic curve analysis revealed that both serum miR-26a and miR-21 levels had high diagnostic accuracy for Child-Pugh class C cirrhosis (miR-26a: cut-off fold change ≤0.4, sensitivity 84.62%, specificity 89.36%, P<0.0001; miR-21: cut-off fold change ≤0.6, sensitivity 84.62%, specificity 78.72%, P<0.0001). Our results indicate that circulating levels of miR-26a and miR-21 are closely related to the extent of liver decompensation, and that decreased levels can discriminate patients with Child-Pugh class C cirrhosis from all cirrhosis cases.
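
A cut-off such as "fold change ≤0.4" is typically read off the ROC curve; one common criterion is Youden's J (sensitivity + specificity − 1). The abstract does not state which criterion was used, so the sketch below is an assumption. Note that the rule is "value ≤ cutoff is positive", because both markers are decreased in Child-Pugh class C. The function name and fold-change values are hypothetical.

```python
def youden_cutoff_low(positive_values, negative_values):
    """Cutoff c maximizing Youden's J = sensitivity + specificity - 1
    for the rule 'value <= c is positive' (suited to a downregulated marker).

    Returns (cutoff, sensitivity, specificity); ties keep the lowest cutoff.
    """
    best = None
    for c in sorted(set(positive_values) | set(negative_values)):
        sens = sum(v <= c for v in positive_values) / len(positive_values)
        spec = sum(v > c for v in negative_values) / len(negative_values)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1:]


# Hypothetical relative fold changes, for illustration only.
child_pugh_c = [0.2, 0.3, 0.35, 0.7]
other = [0.5, 0.6, 0.8, 1.0, 1.2]
print(youden_cutoff_low(child_pugh_c, other))  # (0.35, 0.75, 1.0)
```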


2019
Vol 30 (7-8)
pp. 221-228
Author(s):
Shahab Hajibandeh
Shahin Hajibandeh
Nicholas Hobbs
Jigar Shah
Matthew Harris
...

Aims: To investigate whether an intraperitoneal contamination index (ICI), derived from combined preoperative levels of C-reactive protein, lactate, neutrophils, lymphocytes, and albumin, could predict the extent of intraperitoneal contamination in patients with acute abdominal pathology. Methods: Patients aged over 18 who underwent emergency laparotomy for acute abdominal pathology between January 2014 and October 2018 were randomly divided into primary and validation cohorts. The proposed ICI was calculated for each patient in each cohort. Receiver operating characteristic curve analysis was performed to assess the discrimination of the index and to determine cut-off values of the preoperative ICI that could predict the extent of intraperitoneal contamination. Results: Overall, 468 patients were included in this study: 234 in the primary cohort and 234 in the validation cohort. The analyses identified ICI values of 24.77 and 24.32 as cut-off values for purulent contamination in the primary cohort (area under the curve (AUC): 0.73, P < 0.0001; sensitivity: 84%, specificity: 60%) and validation cohort (AUC: 0.83, P < 0.0001; sensitivity: 91%, specificity: 69%), respectively. Receiver operating characteristic curve analysis also identified ICI values of 33.70 and 33.41 as cut-off values for feculent contamination in the primary cohort (AUC: 0.78, P < 0.0001; sensitivity: 87%, specificity: 64%) and validation cohort (AUC: 0.79, P < 0.0001; sensitivity: 86%, specificity: 73%), respectively. Conclusions: As a predictive measure derived purely from biomarkers, the ICI may be accurate enough to predict the extent of intraperitoneal contamination in patients with acute abdominal pathology and to support decision-making alongside clinical and radiological findings.
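
The cohort design and per-cohort evaluation of a cut-off can be sketched as follows. The abstract does not give the ICI formula, so the sketch takes precomputed index values as input; the assumption that higher index values indicate contamination, the function names, the seed, and the toy data are all hypothetical.

```python
import random


def split_cohorts(patient_ids, seed=0):
    """Randomly divide patients into two equal-sized cohorts (primary, validation).

    With an odd count, the extra patient goes to the validation cohort.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    half = len(ids) // 2
    return ids[:half], ids[half:]


def sens_spec(index_values, contaminated, cutoff):
    """Sensitivity and specificity of the assumed rule 'index >= cutoff predicts contamination'."""
    pairs = list(zip(index_values, contaminated))
    tp = sum(1 for v, y in pairs if v >= cutoff and y)
    fn = sum(1 for v, y in pairs if v < cutoff and y)
    tn = sum(1 for v, y in pairs if v < cutoff and not y)
    fp = sum(1 for v, y in pairs if v >= cutoff and not y)
    return tp / (tp + fn), tn / (tn + fp)


primary, validation = split_cohorts(range(468))
print(len(primary), len(validation))  # 234 234

# Hypothetical index values and outcomes, evaluated at the primary-cohort purulent cut-off.
print(sens_spec([30, 20, 26, 10], [True, True, False, False], 24.77))  # (0.5, 0.5)
```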


2021
Vol 10 (15)
pp. 3231
Author(s):
Marta Gonzalez-Hernandez
Daniel Gonzalez-Hernandez
Daniel Perez-Barbudo
Paloma Rodriguez-Esteve
Nisamar Betancor-Caro
...

Background: Laguna ONhE is an application for the colorimetric analysis of optic nerve images, which topographically assesses the cup and the presence of haemoglobin. Its latest version has been fully automated with five deep learning models. In this paper, perimetry in combination with Laguna ONhE or Cirrus-OCT was evaluated. Methods: The morphology and perfusion estimated by Laguna ONhE were compiled into a “Globin Distribution Function” (GDF). Visual field irregularity was measured with the usual pattern standard deviation (PSD) and with the threshold coefficient of variation (TCV), which assesses the regularity of the field without relying on age-corrected values. In total, 477 normal eyes, 235 confirmed glaucoma cases, and 98 suspected glaucoma cases were examined with Cirrus-OCT and different fundus cameras and perimeters. Results: The best receiver operating characteristic (ROC) results for confirmed and suspected glaucoma were obtained by combining GDF and TCV (AUC: 0.995 and 0.935, respectively; sensitivity at 99% specificity: 94.5% and 45.9%, respectively). The best combination of OCT and perimetry was the vertical cup/disc ratio with PSD (AUC: 0.988 and 0.847, respectively; sensitivity at 99% specificity: 84.7% and 18.4%, respectively). Conclusion: Using Laguna ONhE with the methods described, morphology, perfusion, and function can mutually reinforce one another in glaucoma assessment, providing sensitivity at early stages of the disease.
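
Reporting sensitivity "for 99% specificity" means fixing the threshold on the normal eyes first and then counting how many glaucoma cases exceed it. A minimal sketch, assuming higher scores indicate disease and using invented scores; the function name is illustrative and the study's exact thresholding may differ:

```python
import math


def sensitivity_at_specificity(case_scores, normal_scores, specificity=0.99):
    """Sensitivity of the rule 'score > t is abnormal', where t is the lowest observed
    normal score that keeps at least the requested fraction of normals at or below t."""
    normals = sorted(normal_scores)
    # round() guards against float artefacts in the product (e.g. 0.57 * 100 = 56.99...).
    k = math.ceil(round(specificity * len(normals), 9))  # normals that must be called normal
    t = normals[k - 1]
    return sum(s > t for s in case_scores) / len(case_scores)


# Hypothetical scores: 100 normal eyes scored 0..99 and four glaucomatous eyes.
print(sensitivity_at_specificity([97, 98.5, 99.5, 120], list(range(100))))  # 0.75
```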


Diagnostics
2021
Vol 11 (6)
pp. 1127
Author(s):
Ji Hyung Nam
Dong Jun Oh
Sumin Lee
Hyun Joo Song
Yun Jeong Lim

Capsule endoscopy (CE) quality control requires an objective scoring system to evaluate small bowel (SB) preparation. We propose a deep learning algorithm that calculates SB cleansing scores and verify its performance. A 5-point scoring system based on the clarity of mucosal visualization was used to develop the algorithm (400,000 frames: 280,000 for training and 120,000 for testing). External validation was performed using additional CE cases (n = 50), and the average cleansing scores (1.0 to 5.0) calculated by the algorithm were compared with clinical grades (A to C) assigned by clinicians. On the 120,000 test frames, the algorithm achieved 93% accuracy. A separate CE case showed substantial agreement between the algorithm's scores and clinicians' assessments (Cohen's kappa: 0.672). In the external validation, the cleansing score decreased as the clinical grade worsened (scores of 3.9, 3.2, and 2.5 for grades A, B, and C, respectively; p < 0.001). Receiver operating characteristic curve analysis indicated that a cleansing score cut-off of 2.95 identified clinically adequate preparation. This algorithm provides an objective, automated cleansing score for evaluating SB preparation for CE. These results provide clinical evidence supporting the practical use of deep learning algorithms to evaluate SB preparation quality.
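
The case-level score is described as an average of frame-level cleansing scores. A minimal sketch of that aggregation and of applying the 2.95 cut-off, assuming a plain arithmetic mean over all frames and that scores at or above the cut-off count as adequate (the abstract specifies neither detail); function names and frame scores are hypothetical:

```python
from statistics import mean

ADEQUATE_CUTOFF = 2.95  # cut-off reported in the abstract


def case_cleansing_score(frame_scores):
    """Average per-frame scores (each on the 1-5 scale) into one case-level score."""
    if not frame_scores:
        raise ValueError("a case needs at least one scored frame")
    return mean(frame_scores)


def is_adequate(frame_scores, cutoff=ADEQUATE_CUTOFF):
    """Assumed rule: a case is adequately prepared when its mean score is >= the cut-off."""
    return case_cleansing_score(frame_scores) >= cutoff


# Hypothetical per-frame scores for two cases.
print(is_adequate([4, 3, 3, 2, 4]))  # True  (mean 3.2)
print(is_adequate([2, 3, 2, 3, 2]))  # False (mean 2.4)
```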

