Using deep learning to assist readers during the arbitration process: a lesion-based retrospective evaluation of breast cancer screening performance

Author(s):  
Laura Kerschke ◽  
Stefanie Weigel ◽  
Alejandro Rodriguez-Ruiz ◽  
Nico Karssemeijer ◽  
Walter Heindel

Abstract
Objectives: To evaluate whether artificial intelligence (AI) can discriminate recalled benign from recalled malignant mammographic screening abnormalities and thereby improve screening performance.
Methods: This retrospective study included 2257 full-field digital mammography screening examinations, obtained 2011–2013, of women aged 50–69 years who were recalled for further assessment after independent double-reading with arbitration. The recalls comprised 295 of 305 truly malignant lesions and 2289 benign lesions. A deep learning AI system was used to obtain a score (0–95) for each recalled lesion, representing the likelihood of breast cancer. The lesion-level sensitivity and the proportion of women without false-positive ratings (non-FPR) under AI were estimated as a function of the classification cutoff and compared with those of human readers.
Results: Using a cutoff of 1, AI decreased the proportion of women with false-positives from 89.9 to 62.0%, non-FPR 11.1% vs. 38.0% (difference 26.9%, 95% confidence interval 25.1–28.8%; p < .001), preventing 30.1% of reader-induced false-positive recalls, while reducing sensitivity from 96.7 to 91.1% (5.6%, 3.1–8.0%) compared with human reading. The positive predictive value of recall (PPV-1) increased from 12.8 to 16.5% (3.7%, 3.5–4.0%). In women with mass-related lesions (n = 900), the non-FPR was 14.2% for humans vs. 36.7% for AI (22.4%, 19.8–25.3%) at a sensitivity of 98.5% vs. 97.1% (1.5%, 0–3.5%).
Conclusion: Applying AI during the consensus conference might especially help readers reduce false-positive recalls of masses at the expense of a small loss in sensitivity. Prospective studies are needed to further evaluate the screening benefit of AI in practice.
Key Points
• Integrating artificial intelligence into the arbitration process reduces benign recalls and increases the positive predictive value of recall at the expense of some loss in sensitivity.
• Using the artificial intelligence system to aid the decision to recall a woman seems particularly beneficial for masses, where the system reaches sensitivity comparable to that of the readers but with considerably fewer false-positives.
• About one-fourth of all recalled malignant lesions are not automatically marked by the system, so their evaluation (AI score) must be retrieved manually by the reader. Thorough reading of screening mammograms by readers to identify suspicious lesions therefore remains mandatory.
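The Methods describe estimating lesion-level sensitivity and woman-level non-FPR as a function of the AI score cutoff. The sketch below shows one way to compute that sweep. The `Lesion` record, the convention that a score of 0 marks a lesion the AI did not flag, and the toy data are all assumptions for illustration, not the study's software or data.

```python
from dataclasses import dataclass

@dataclass
class Lesion:
    woman_id: int
    malignant: bool   # ground truth after assessment
    ai_score: int     # 0-95; 0 is assumed here to mean "not marked by the AI"

def evaluate_cutoff(lesions, cutoff):
    """Keep a recall only if its AI score is >= cutoff. Returns
    (lesion-level sensitivity, proportion of women without false-positive ratings)."""
    malignant = [les for les in lesions if les.malignant]
    sensitivity = sum(les.ai_score >= cutoff for les in malignant) / len(malignant)

    women = {les.woman_id for les in lesions}
    # A woman counts as false-positive if any benign lesion is still rated positive.
    fp_women = {les.woman_id for les in lesions
                if not les.malignant and les.ai_score >= cutoff}
    non_fpr = 1 - len(fp_women) / len(women)
    return sensitivity, non_fpr

# Hypothetical recalled lesions (not study data).
lesions = [
    Lesion(1, True, 80), Lesion(1, False, 0),
    Lesion(2, False, 0),
    Lesion(3, False, 35),
    Lesion(4, True, 0),      # malignant but unmarked by the AI
    Lesion(5, True, 60),
    Lesion(6, False, 0),
]

for cutoff in (0, 1, 50):
    sens, non_fpr = evaluate_cutoff(lesions, cutoff)
    print(f"cutoff {cutoff:>2}: sensitivity {sens:.2f}, non-FPR {non_fpr:.2f}")
```

A cutoff of 0 keeps every recalled lesion and so mirrors the readers' own decisions; raising the cutoff trades sensitivity for fewer women with false-positive ratings, which is the trade-off the Results quantify at a cutoff of 1.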

Author(s):  
Pamela Reinagel

Abstract
After an experiment has been completed and analyzed, a trend may be observed that is “not quite significant”. Sometimes in this situation, researchers incrementally grow their sample size N in an effort to achieve statistical significance. This is especially tempting when samples are very costly or time-consuming to collect, such that collecting an entirely new sample larger than N (the statistically sanctioned alternative) would be prohibitive. Such post-hoc sampling, or “N-hacking”, is condemned, however, because it leads to an excess of false positive results. Here Monte Carlo simulations are used to show why and how incremental sampling causes false positives, but also to challenge the claim that it necessarily produces alarmingly high false positive rates. In a parameter regime representative of practice in many research fields, simulations show that the inflation of the false positive rate is modest and easily bounded. But the effect on the false positive rate is only half the story. What many researchers really want to know is how N-hacking affects the likelihood that a positive result reflects a real effect that will replicate: the positive predictive value (PPV). This question has not been considered in the reproducibility literature. The answer depends on the effect size and the prior probability of an effect. Although these values are not known in practice, simulations show that for a wide range of values, the PPV of results obtained by N-hacking is in fact higher than that of non-incremented experiments of the same sample size and statistical power. This is because the increase in false positives is more than offset by the increase in true positives. Therefore, in many situations, adding a few samples to shore up a nearly significant result is in fact statistically beneficial.
In conclusion, if samples are added after an initial hypothesis test, this should be disclosed, and if a p value is reported, it should be corrected. But, contrary to widespread belief, collecting additional samples to resolve a borderline p value is not invalid, and it can confer previously unappreciated advantages for efficiency and positive predictive value.
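The argument can be explored with a small Monte Carlo simulation. This is a minimal sketch, not the author's simulation code. It assumes normally distributed data with known σ = 1 (so a z test stands in for whatever test a real study would use), a hypothetical N-hacking rule (add `n_add` samples whenever `alpha <= p < p_upper`, at most `max_adds` times), and a hypothetical prior probability of a real effect. Because only experiments whose first p value fell below `p_upper` can ever end up significant, the false positive rate under this rule cannot exceed `p_upper`, which illustrates the "easily bounded" point.

```python
import math
import random

def z_test_p(samples, sigma=1.0):
    """Two-sided p value for H0: mean = 0, with known standard deviation sigma."""
    n = len(samples)
    z = (sum(samples) / n) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

def run_experiment(effect, n_init, n_add, max_adds, alpha, p_upper, rng):
    """One experiment; returns True if it ends with p < alpha.
    Hypothetical N-hacking rule: while alpha <= p < p_upper, add n_add
    samples and retest, at most max_adds times."""
    samples = [rng.gauss(effect, 1.0) for _ in range(n_init)]
    p = z_test_p(samples)
    adds = 0
    while alpha <= p < p_upper and adds < max_adds:
        samples += [rng.gauss(effect, 1.0) for _ in range(n_add)]
        p = z_test_p(samples)
        adds += 1
    return p < alpha

def positive_rate(effect, trials, rng, **design):
    """Fraction of simulated experiments that end significant."""
    return sum(run_experiment(effect, rng=rng, **design) for _ in range(trials)) / trials

def ppv(prior, tpr, fpr):
    """Share of significant results that reflect a real effect."""
    return prior * tpr / (prior * tpr + (1 - prior) * fpr)

rng = random.Random(1)
hacked = dict(n_init=12, n_add=4, max_adds=3, alpha=0.05, p_upper=0.10)
fixed = {**hacked, "max_adds": 0}   # same initial N, never incremented
trials = 20_000

fpr_fixed = positive_rate(0.0, trials, rng, **fixed)     # no true effect
fpr_hacked = positive_rate(0.0, trials, rng, **hacked)
tpr_fixed = positive_rate(0.5, trials, rng, **fixed)     # true effect d = 0.5
tpr_hacked = positive_rate(0.5, trials, rng, **hacked)

prior = 0.3  # hypothetical prior probability that an effect exists
print(f"FPR fixed {fpr_fixed:.3f}, N-hacked {fpr_hacked:.3f}")
print(f"PPV fixed {ppv(prior, tpr_fixed, fpr_fixed):.3f}, "
      f"N-hacked {ppv(prior, tpr_hacked, fpr_hacked):.3f}")
```

This toy compares against a fixed design with the same initial N, not the matched sample size and power comparison the abstract describes, so its PPV numbers should not be read as reproducing that result.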


Author(s):  
Youssriah Yahia Sabri ◽  
Ikram Hamed Mahmoud ◽  
Lamis Tarek El-Gendy ◽  
Mohamed Raafat Abd El-Mageed ◽  
Sally Fouad Tadros

Abstract
Background: Pleural disease has many causes, including various benign and malignant etiologies. Diffusion-weighted imaging (DWI) is a non-enhanced functional MRI technique that allows qualitative and quantitative characterization of tissues based on the diffusivity of their water molecules. The aim of this study was to evaluate the diagnostic value of DWI-MRI in detecting and characterizing pleural diseases and its capability to differentiate benign from malignant pleural lesions.
Results: Conventional MRI discriminated benign from malignant lesions using morphological features (contour and thickness) with sensitivity 89.29%, specificity 76%, positive predictive value 89%, negative predictive value 76.92%, and accuracy 85.37%. Using the apparent diffusion coefficient (ADC) as a quantitative DWI parameter, ADC values of malignant pleural diseases were significantly lower than those of benign lesions (P < 0.001). A mean ADC cutoff value of 1.68 × 10⁻³ mm²/s differentiated malignant from benign pleural diseases with sensitivity 89.3%, specificity 100%, positive predictive value 100%, negative predictive value 81.2%, and accuracy 92.68% (P < 0.001).
Conclusion: Although DWI-MRI is unable to differentiate between malignant and benign pleural effusion, its combined morphological and functional information provides a valid non-invasive method to accurately characterize pleural soft tissue diseases, differentiating benign from malignant lesions with higher specificity and accuracy than conventional MRI.
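The reported figures all derive from a 2×2 table built by thresholding ADC. A minimal sketch, assuming a lesion is called malignant when its ADC is strictly below the cutoff (the abstract does not say how ties are handled). The toy lesions are hypothetical, and the counts passed to `diagnostic_metrics` (41 lesions: 25 true positives, 0 false positives, 3 false negatives, 13 true negatives) are our back-calculation as one split consistent with every reported DWI percentage; the abstract does not state them.

```python
def confusion_counts(cases, cutoff):
    """cases: iterable of (adc, is_malignant) pairs. A lesion is called
    malignant when its ADC is below the cutoff (lower ADC = more restricted
    diffusion). Returns (tp, fp, fn, tn)."""
    tp = fp = fn = tn = 0
    for adc, malignant in cases:
        called_malignant = adc < cutoff
        if called_malignant and malignant:
            tp += 1
        elif called_malignant:
            fp += 1
        elif malignant:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV, and accuracy from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

CUTOFF = 1.68e-3  # mm^2/s, the mean-ADC cutoff reported in the abstract
toy_cases = [(0.9e-3, True), (1.2e-3, True), (1.9e-3, True),
             (2.1e-3, False), (1.5e-3, False)]  # hypothetical lesions
print(confusion_counts(toy_cases, CUTOFF))

# Back-calculated (assumed) counts: 25 TP, 0 FP, 3 FN, 13 TN
print(diagnostic_metrics(tp=25, fp=0, fn=3, tn=13))
```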


2021 ◽  
Vol 11 (10) ◽  
pp. 4334
Author(s):  
Guadalupe O. Gutiérrez-Esparza ◽  
Tania A. Ramírez-delReal ◽  
Mireya Martínez-García ◽  
Oscar Infante Vázquez ◽  
Maite Vallejo ◽  
...  

The exponential increase in metabolic syndrome and its association with the risk of morbidity and mortality have prompted the development of tools to diagnose this syndrome early. This work presents a model based on prognostic variables that applies machine learning and deep learning to classify Mexicans with metabolic syndrome without blood screening. The data used in this study contain health parameters related to anthropometric measurements, dietary information, smoking habit, alcohol consumption, quality of sleep, and physical activity from 2289 participants of the Mexico City Tlalpan 2020 cohort. We used accuracy, balanced accuracy, positive predictive value, and negative predictive value criteria to evaluate the performance of the different models and to validate them. Separate models were built for each gender, given the shared features and differing habits. In women, the highest-performing model found that the most relevant features were waist circumference, age, body mass index, waist-to-height ratio, height, sleepiness associated with snoring, dietary habits (coffee, cola soda, whole milk, and Oaxaca cheese), and diastolic and systolic blood pressure. Men’s features were similar to women’s; the differences were in dietary habits, especially coffee, cola soda, flavored sweetened water, and corn tortilla consumption. The positive predictive value obtained was 84.7% for women and 92.29% for men. With these models, we offer a gender-specific tool to help Mexicans prevent metabolic syndrome; it also lays the foundation for monitoring patients and recommending changes in habits.
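Balanced accuracy, one of the listed criteria, averages sensitivity and specificity, so a model that ignores the minority class cannot score well on it even when plain accuracy looks high. A small illustration with hypothetical counts; the abstract does not report the class balance of the cohort.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, balanced accuracy, PPV, and NPV from a 2x2 confusion matrix.
    PPV/NPV are None when no case is predicted in that class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": tp / (tp + fp) if tp + fp else None,
        "npv": tn / (tn + fn) if tn + fn else None,
    }

# Hypothetical cohort: 100 with metabolic syndrome, 900 without.
always_negative = classification_metrics(tp=0, fp=0, fn=100, tn=900)
useful_model = classification_metrics(tp=80, fp=90, fn=20, tn=810)
print(always_negative)
print(useful_model)
```

Both classifiers have accuracy near 0.9, but only the second has a balanced accuracy well above the 0.5 of a classifier that always predicts "no syndrome".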


Author(s):  
Mohamed Zidan ◽  
Shimaa Ali Saad ◽  
Eman Abo Elhamd ◽  
Hosam Eldin Galal ◽  
Reem Elkady

Abstract
Background: Asymmetric breast density is a potentially perplexing finding: it may be due to normal hormonal variation of the parenchymal pattern and summation artifact, or it may indicate a true underlying pathology. The current study aimed to identify the role of diffusion-weighted imaging (DWI) and apparent diffusion coefficient (ADC) values in the assessment of breast asymmetries.
Results: Fifty breast lesions corresponding to the mammographic asymmetry were detected: 35 (70%) benign and 15 (30%) malignant. The mean ADC value was (1.59 ± 0.4) × 10⁻³ mm²/s for benign lesions and (0.82 ± 0.3) × 10⁻³ mm²/s for malignant lesions. An ADC cutoff value of 1.10 × 10⁻³ mm²/s differentiated benign from malignant lesions with sensitivity 80%, specificity 88.6%, positive predictive value 75%, negative predictive value 91%, and accuracy 86%. The best results were achieved with the combined dynamic contrast-enhanced MRI (DCE-MRI) and DWI protocol, with sensitivity 93.3%, specificity 94.3%, positive predictive value 87.5%, negative predictive value 97.1%, and accuracy 94%.
Conclusion: DCE-MRI was the most sensitive method for detecting the underlying malignant pathology of breast asymmetries. However, its limited specificity may cause improper final BI-RADS classification and may increase unnecessary invasive procedures. DWI used as an adjunct to DCE-MRI maintained high sensitivity while increasing the specificity and overall diagnostic accuracy of the breast MRI examination. The best results can be achieved with the combined DCE-MRI and DWI protocol.
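The abstract reports an ADC cutoff of 1.10 × 10⁻³ mm²/s but not how it was chosen. One common ROC-based choice is the cutoff that maximizes Youden's J (sensitivity + specificity - 1). The sketch applies it to hypothetical ADC values in units of 10⁻³ mm²/s and should not be read as the authors' procedure.

```python
def youden_cutoff(adc_values, is_malignant):
    """Pick the ADC cutoff maximizing Youden's J = sensitivity + specificity - 1,
    calling a lesion malignant when its ADC is below the cutoff. Candidate
    cutoffs are midpoints between consecutive distinct ADC values."""
    n_pos = sum(is_malignant)
    n_neg = len(is_malignant) - n_pos
    values = sorted(set(adc_values))
    best_cutoff, best_j = None, -1.0
    for lo, hi in zip(values, values[1:]):
        cutoff = (lo + hi) / 2
        tp = sum(v < cutoff and m for v, m in zip(adc_values, is_malignant))
        tn = sum(v >= cutoff and not m for v, m in zip(adc_values, is_malignant))
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict: keep the first cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical ADC values (x 10^-3 mm^2/s): 5 malignant, then 6 benign lesions.
adc = [0.6, 0.8, 0.9, 1.0, 1.3, 1.05, 1.2, 1.5, 1.7, 1.9, 2.1]
malignant = [True] * 5 + [False] * 6
cutoff, j = youden_cutoff(adc, malignant)
print(f"cutoff {cutoff:.3f} x 10^-3 mm^2/s, J = {j:.2f}")
```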


2019 ◽  
Author(s):  
Rayees Rahman ◽  
Arad Kodesh ◽  
Stephen Z Levine ◽  
Sven Sandin ◽  
Abraham Reichenberg ◽  
...  

Abstract
Importance: Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most children with ASD are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improve developmental course and outcome.
Objective: To develop a machine learning (ML) method that predicts the diagnosis of ASD in offspring in a general population sample, using parental electronic medical records (EMR) available before childbirth.
Design: Prognostic study of EMR data within a single Israeli health maintenance organization, for the parents of 1,397 children with ASD (ICD-9/10) and 94,741 non-ASD children born between January 1, 1997, and December 31, 2008. The complete EMRs of the parents were used to develop various ML models to predict the risk of having a child with ASD.
Main outcomes and measures: Routinely available parental sociodemographic information, medical histories, and prescribed-medication data up to the offspring’s birth were used to generate features to train various machine learning algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross-validation by computing the C statistic, sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value, PPV).
Results: All ML models tested performed similarly, achieving an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85% for predicting ASD in this dataset.
Conclusion and relevance: ML algorithms combined with EMR capture early-life ASD risk. Such approaches may enhance accurate and efficient early detection of ASD in large populations of children.
Key points
Question: Can autism risk in children be predicted using the pre-birth electronic medical records (EMR) of the parents?
Findings: In this population-based study that included 1,397 children with autism spectrum disorder (ASD) and 94,741 non-ASD children, we developed a machine learning classifier for predicting the likelihood of a childhood diagnosis of ASD with an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85%.
Meaning: The results serve as a proof of principle of the potential utility of EMR for identifying a large proportion of future children at high risk of ASD.
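Performance was estimated with 10-fold cross-validation. With roughly 1.5% of children in the ASD class (1,397 of 96,138), folds are often stratified so that each contains a similar share of cases; the abstract does not say whether this was done. A minimal stratified split, assuming hypothetical labels (1 = ASD); in use, a model would be fit on each `train` index set, scored on the matching `test` set, and the per-fold metrics averaged.

```python
import random

def stratified_kfold(labels, k=10, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin
    so every fold gets a similar share of it. Returns [(train_idx, test_idx)]."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits

# Hypothetical labels with ~1.4% positives.
labels = [1] * 14 + [0] * 986
splits = stratified_kfold(labels, k=10, seed=0)
for fold, (train, test) in enumerate(splits):
    print(fold, len(train), len(test), sum(labels[i] for i in test))
```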


PEDIATRICS ◽  
1982 ◽  
Vol 70 (3) ◽  
pp. 464-467 ◽  
Author(s):  
M. Jeffrey Maisels ◽  
Sarah Conrad

A total of 292 transcutaneous bilirubin (TcB) measurements were performed in 157 white full-term infants: 157 measurements were obtained from the forehead and 135 from the midsternum. TcB measurements correlated well with serum bilirubin determinations (r = .93, P < .0001). The sensitivity of the test was 100% and the specificity 97%. It was possible to establish guidelines for TcB measurement that identified all infants whose serum bilirubin concentrations exceeded 12.9 mg/100 ml (221 µmol/liter), with no false-negative and only five false-positive determinations (3%). The positive predictive value of the TcB measurements was 58%. This implies that, in our population, an infant with a TcB index ≥24 has a 58% chance of having a serum bilirubin concentration >12.9 mg/100 ml. The negative predictive value was 100%. Thus, a negative test will correctly predict the absence of hyperbilirubinemia in all cases. Because these measurements were obtained prospectively in a well-baby population with a prevalence of hyperbilirubinemia (>12.9 mg/100 ml) of 4.5%, the positive predictive value should be applicable to other similar populations and will, in fact, be higher in populations with a higher prevalence of hyperbilirubinemia. TcB measurements can be recommended for identifying significant neonatal jaundice in full-term infants. It is important to recognize, however, that because of potential variations in TcB meters, as well as in serum bilirubin measurements across laboratories, each institution should establish its own criteria for the use of this instrument.
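The statement that the positive predictive value rises with prevalence follows from Bayes' theorem: PPV = sens·p / (sens·p + (1 - spec)·(1 - p)). A short sketch; the per-infant counts (7 true positives, 5 false positives, 0 false negatives, 145 true negatives) are our back-calculation from the reported 157 infants, roughly 4.5% prevalence, five false positives, and 58% PPV, not figures stated in the abstract. With those counts the formula gives exactly 7/12 (about 58%) at the study prevalence of 7/157.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumed per-infant counts: 7 TP, 0 FN, 5 FP, 145 TN (157 infants).
sens = 7 / 7
spec = 145 / 150
for prevalence in (7 / 157, 0.10, 0.20):
    print(f"prevalence {prevalence:.3f}: PPV {ppv(sens, spec, prevalence):.3f}")
```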


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e12583-e12583
Author(s):  
Jian Li ◽  
Cai Nian ◽  
Xie Ze-Ming ◽  
Zhou Jingwen ◽  
Huang Kemin

e12583 Background: To improve the performance of ultrasound (US) in diagnosing metastatic axillary lymph nodes (ALN), machine learning was used to extract diagnostic information inherent in ultrasound images and to assist pre-treatment evaluation of ALN in patients with early breast cancer. Methods: A total of 214 eligible patients with 220 breast lesions were prospectively recruited, and the 220 target ALNs of the ipsilateral axillae were examined with ultrasound elastography (UE). Based on feature extraction and fusion of B-mode and shear wave elastography (SWE) images of 140 target ALNs in the training cohort using radiomics and deep learning, with axillary pathological evaluation as the reference, a deep learning-based heterogeneous model (DLHM) was established and then validated on B-mode and SWE images of 80 target ALNs from the testing cohort. Performance was compared between UE based on radiological criteria and DLHM in terms of the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, negative predictive value, and positive predictive value for diagnosing ALN metastasis. Results: DLHM achieved excellent performance in both the training and validation cohorts. In the prospective testing cohort, DLHM demonstrated the best diagnostic performance, with an AUC of 0.911 (95% confidence interval [CI]: 0.826, 0.963) for identifying metastatic ALN, significantly outperforming UE (AUC 0.707, 95% CI: 0.595, 0.804; P < 0.001). Conclusions: DLHM provides an effective, accurate, and non-invasive preoperative method for assisting the diagnosis of ALN metastasis in patients with early breast cancer. [Table: see text]
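AUC, the headline metric here, equals the probability that a randomly chosen metastatic node receives a higher model score than a randomly chosen non-metastatic one (ties counted as one half), which is the Mann-Whitney U statistic divided by the number of positive-negative pairs. A minimal sketch with hypothetical scores; the abstract does not describe how its AUCs, confidence intervals, or P value were computed, and this sketch does not reproduce them.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case scores higher, counting ties as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for metastatic and non-metastatic nodes.
metastatic = [0.9, 0.8, 0.7, 0.4]
non_metastatic = [0.6, 0.3, 0.2, 0.4]
print(auc_mann_whitney(metastatic, non_metastatic))
```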


2020 ◽  
Author(s):  
Bei Zhang ◽  
Li Zhang ◽  
Bingyang Bian ◽  
Fang Lin ◽  
Zining Zhu ◽  
...  

Abstract
BACKGROUND: Whole-body diffusion-weighted imaging (WB-DWI) is commonly used for the detection of multiple myeloma (MM). Comparative data on the efficiency of WB-DWI versus 18F-fluorodeoxyglucose positron emission tomography/computed tomography (18F-FDG PET/CT) for detecting MM are lacking.
METHODS: This was a retrospective, single-center study of twenty-two patients with MM enrolled from January 2019 to December 2019. All patients underwent WB-DWI and 18F-FDG PET/CT. Pathological and clinical manifestations, as well as radiologic follow-up, were used for diagnosis. The overall accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of both methods were compared. The apparent diffusion coefficient (ADC) values of MM lesions and false-positive lesions were estimated.
RESULTS: A total of 214 MM bone lesions were evaluated. WB-DWI showed a higher overall accuracy than PET/CT (75.7% and 55.6%, respectively; p < 0.05). However, for sensitivity, specificity, positive predictive value, and negative predictive value, there were no significant differences between WB-DWI and PET/CT (99.3% and 83.9%, 64.9% and 94.8%, 63.6% and 54.2%, and 98.1% and 65.3%, respectively). The ADC value of MM lesions was significantly lower than that of false-positive lesions (p < 0.001). Receiver operating characteristic (ROC) curve analysis showed an AUC of 0.846; at a cut-off value of 0.745 × 10⁻³ mm²/s, the ADC distinguished MM lesions from non-MM lesions with a sensitivity of 86.0% and a specificity of 82.4%.
CONCLUSION: WB-DWI may be a useful tool for the diagnosis of MM bone disease owing to its higher overall accuracy and the availability of ADC measurements, compared with PET/CT.
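Because WB-DWI and PET/CT were evaluated on the same lesions, comparing their accuracies calls for a paired test. The abstract does not name the test it used; the exact McNemar test sketched here is one common choice, shown as an assumption, with hypothetical discordant-pair counts. Only lesions on which the two methods disagree enter the statistic.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the discordant pairs:
    b = lesions classified correctly by method A only,
    c = lesions classified correctly by method B only."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under H0 each discordant lesion favors A or B with probability 1/2,
    # so min(b, c) follows a Binomial(n, 0.5) lower tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical discordant counts: correct on WB-DWI only vs. on PET/CT only.
print(mcnemar_exact(b=50, c=7))
```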

