Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Yiqiu Shen ◽  
Farah E. Shamout ◽  
Jamie R. Oliver ◽  
Jan Witowski ◽  
Kawshik Kannan ◽  
...  

Abstract Though consistently shown to detect mammographically occult cancers, breast ultrasound has been noted to have high false-positive rates. In this work, we present an AI system that achieves radiologist-level accuracy in identifying breast cancer in ultrasound images. Developed on 288,767 exams, consisting of 5,442,907 B-mode and Color Doppler images, the AI achieves an area under the receiver operating characteristic curve (AUROC) of 0.976 on a test set consisting of 44,755 exams. In a retrospective reader study, the AI achieves a higher AUROC than the average of ten board-certified breast radiologists (AUROC: 0.962 AI, 0.924 ± 0.02 radiologists). With the help of the AI, radiologists decrease their false positive rates by 37.3% and reduce requested biopsies by 27.8%, while maintaining the same level of sensitivity. This highlights the potential of AI in improving the accuracy, consistency, and efficiency of breast ultrasound diagnosis.

2021 ◽  
Author(s):  
Yiqiu Shen ◽  
Farah E. Shamout ◽  
Jamie R. Oliver ◽  
Jan Witowski ◽  
Kawshik Kannan ◽  
...  

Abstract Ultrasound is an important imaging modality for the detection and characterization of breast cancer. Though consistently shown to detect mammographically occult cancers, especially in women with dense breasts, breast ultrasound has been noted to have high false-positive rates. In this work, we present an artificial intelligence (AI) system that achieves radiologist-level accuracy in identifying breast cancer in ultrasound images. To develop and validate this system, we curated a dataset consisting of 288,767 ultrasound exams from 143,203 patients examined at NYU Langone Health, between 2012 and 2019. On a test set consisting of 44,755 exams, the AI system achieved an area under the receiver operating characteristic curve (AUROC) of 0.976. In a reader study, the AI system achieved a higher AUROC than the average of ten board-certified breast radiologists (AUROC: 0.962 AI, 0.924±0.02 radiologists). With the help of the AI, radiologists decreased their false positive rates by 37.4% and reduced the number of requested biopsies by 27.8%, while maintaining the same level of sensitivity. To confirm its generalizability, we evaluated our system on an independent external test dataset where it achieved an AUROC of 0.911. This highlights the potential of AI in improving the accuracy, consistency, and efficiency of breast ultrasound diagnosis worldwide.


2014 ◽  
Vol 99 (12) ◽  
pp. 4589-4599 ◽  
Author(s):  
Carole Spencer ◽  
Ivana Petrovic ◽  
Shireen Fatemi ◽  
Jonathan LoPresti

Context: Reliable thyroglobulin (Tg) autoantibody (TgAb) detection before Tg testing for differentiated thyroid cancer (DTC) is critical when TgAb status (positive/negative) is used to authenticate sensitive second-generation immunometric assay (2GIMA) measurements as free from TgAb interference and when reflexing “TgAb-positive” sera to TgAb-resistant, but less sensitive, Tg methodologies (radioimmunoassay [RIA] or liquid chromatography-tandem mass spectrometry [LC-MS/MS]). Objective: The purpose of this study was to assess how different Kronus (K) vs Roche (R) TgAb method cutoffs for “positivity” influence false-negative vs false-positive serum TgAb misclassifications that may reduce the clinical utility of reflex Tg testing. Methods: Serum Tg2GIMA, TgRIA, and TgLC-MS/MS measurements for 52 TgAb-positive and 37 TgAb-negative patients with persistent/recurrent DTC were compared. A total of 1426 DTC sera with TgRIA of ≥1.0 μg/L had false-negative and false-positive TgAb frequencies determined using low Tg2GIMA/TgRIA ratios (<75%) to indicate TgAb interference. Results: TgAb-negative patients with disease displayed Tg2GIMA, TgRIA, and TgLC-MS/MS serum discordances (% coefficient of variation = 24 ± 20%, range, 0%–100%). Of the TgAb-positive patients with disease, 98% had undetectable/lower Tg2GIMA vs either TgRIA or TgLC-MS/MS (P < .01), whereas 8 of 52 (15%) had undetectable Tg2GIMA + TgLC-MS/MS associated with TgRIA of ≥1.0 μg/L. Receiver operating characteristic curve analysis reported more sensitivity for TgAb method K vs R (81.9% vs 69.1%, P < .001), but receiver operating characteristic curve cutoffs (>0.6 kIU/L [K] vs >40 kIU/L [R]) had unacceptably high false-negative frequencies (22%–32%), whereas false positives approximated 12%. Functional sensitivity cutoffs minimized false negatives (13.5% [K] vs 21.3% [R], P < .01) and severe interferences (Tg2GIMA, <0.10 μg/L) (0.7% [K] vs 2.4% [R], P < .05) but false positives approximated 23%. 
Conclusions: Reliable detection of interfering TgAbs is method and cutoff dependent. No cutoff eliminated both false-negative and false-positive TgAb misclassifications. Functional sensitivity cutoffs were optimal for minimizing false negatives but have inherent imprecision (20% coefficient of variation) that, exacerbated by TgAb biologic variability during DTC monitoring, could cause TgAb status to fluctuate for patients with low TgAb concentrations, prompting unnecessary Tg method changes and disrupting Tg monitoring. Laboratories using reflexing should limit Tg method changes by considering a patient's Tg + TgAb testing history in addition to current TgAb status before Tg method selection.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Hui Huang ◽  
Li Deng ◽  
Liping Jia ◽  
Runan Zhu

Abstract Background The aim of the present study was to develop a clinical scoring system for the diagnosis of hand-foot-mouth disease (HFMD) with improved accuracy. Methods A retrospective analysis was performed on standardized patient history and clinical examination data obtained from 1435 pediatric patients under the age of three years who presented with acute rash illness and underwent enterovirus nucleic acid detection. Patients were then divided into the HFMD (1094 patients) group or non-HFMD (341 patients) group based on a positive or a negative result from the assay, respectively. We then divided the data into a training set (1004 cases, 70%) and a test set (431 cases, 30%) using a random number method. Multivariate logistic regression was performed on 15 clinical variables (e.g. age, exposure history, number of rash spots in a single body region) to identify variables highly predictive of a positive diagnosis in the training set. Using the variables with high impact on the diagnostic accuracy, we generated a scoring system for predicting HFMD and subsequently evaluated this system in the test set by receiver operating characteristic curve (ROC curve). Results Using the logistic model, we identified seven clinical variables (age, exposure history, and rash density at specific regions of the body) to be included into the scoring system. The final scores ranged from − 5 to 24 (higher scores positively predicted HFMD diagnosis). Through our training set, a cutoff score of 7 resulted in a sensitivity of 0.76 and specificity of 0.68. The area under the receiver operating characteristic curve (AUC) was 0.804 (95% confidence interval [CI]: 0.773–0.835) (P < 0.001). Using the test set, we obtained an AUC of 0.76 (95% CI: 0.710–0.810) with a sensitivity of 0.76 and a specificity of 0.62. These results from the test set were consistent with those from the training set. 
Conclusions This study establishes an objective scoring system for the diagnosis of typical and atypical HFMD using measures accessible through routine clinical encounters. Due to the accuracy and sensitivity achieved by this scoring system, it can be employed as a rapid, low-cost method for establishing diagnoses in children with acute rash illness.
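The evaluation step described in this abstract (applying a score cutoff, then reporting sensitivity, specificity and AUC) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy scores are hypothetical, and treating a score at or above the cutoff as a positive prediction is an assumption, since the abstract does not say whether the cutoff is inclusive.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a score cutoff.

    Assumption: a score >= cutoff is called positive.
    Labels are 1 (assay-positive) or 0 (assay-negative).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up toy data for illustration only; not taken from the study.
toy_scores = [12, 9, 7, 5, 8, 6, 3, 1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff=7)
area = auc(toy_scores, toy_labels)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.4f}")
```

The pairwise AUC above is quadratic in the number of cases, which is acceptable for cohorts of this size; a rank-based computation would be the usual choice for larger datasets.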


Stroke ◽  
2019 ◽  
Vol 50 (4) ◽  
pp. 909-916 ◽  
Author(s):  
Manuel Cappellari ◽  
Salvatore Mangiafico ◽  
Valentina Saia ◽  
Giovanni Pracucci ◽  
Sergio Nappini ◽  
...  

Background and Purpose— As a reliable scoring system to detect the risk of symptomatic intracerebral hemorrhage after thrombectomy for ischemic stroke is not yet available, we aimed to develop a nomogram for predicting symptomatic intracerebral hemorrhage in patients with large vessel occlusion in the anterior circulation who received bridging of thrombectomy with intravenous thrombolysis (training set), and to validate the model by using a cohort of patients treated with direct thrombectomy (test set). Methods— We conducted a cohort study on prospectively collected data from 3714 patients enrolled in the IER (Italian Registry of Endovascular Stroke Treatment in Acute Stroke). Symptomatic intracerebral hemorrhage was defined as any type of intracerebral hemorrhage with increase of ≥4 National Institutes of Health Stroke Scale score points from baseline ≤24 hours or death. Based on multivariate logistic models, the nomogram was generated. We assessed the discriminative performance by using the area under the receiver operating characteristic curve. Results— National Institutes of Health Stroke Scale score, onset-to-end procedure time, age, unsuccessful recanalization, and Careggi collateral score composed the IER-SICH nomogram. After removing Careggi collateral score from the first model, a second model including Alberta Stroke Program Early CT Score was developed. The area under the receiver operating characteristic curve of the IER-SICH nomogram was 0.778 in the training set (n=492) and 0.709 in the test set (n=399). The area under the receiver operating characteristic curve of the second model was 0.733 in the training set (n=988) and 0.685 in the test set (n=779). Conclusions— The IER-SICH nomogram is the first model developed and validated for predicting symptomatic intracerebral hemorrhage after thrombectomy. It may provide indications on early identification of patients for more or less postprocedural intensive management.


2017 ◽  
Vol 107 (6) ◽  
pp. 721-731 ◽  
Author(s):  
Daniel D. M. Bassimba ◽  
Jose L. Mira ◽  
Antonio Vicent

Alternaria brown spot (ABS) is a serious fungal disease of mandarin in the Mediterranean Basin. Due to the rigorous fruit quality standards, models for ABS should avoid false negatives. Experiments were conducted with susceptible ‘Fortune’ and ‘Nova’ inoculated at different temperatures and leaf wetness durations, including interrupted periods. Effects of temperature and time elapsed after inoculation were also studied. Disease incidence data were fitted to generalized additive models and a generic infection model. Exposure of trap plants in affected orchards was used for model evaluation, including the Alter-Rater and a simple rule system (SRS). The predictive ability of the models was analyzed using the partial area under the receiver operating characteristic curve in the high-sensitivity range between 0.9 and 1. Postinoculation temperature had a significant effect on disease incidence, with maximum symptom expression after 30 h on Fortune and 60 h on Nova. ABS incidence did not increase after a leaf wetness interruption of 1 h on Nova and 2 h on Fortune. All the models evaluated had high false-positive rates on Fortune. Only the SRS showed a substantial strength of agreement in Nova, with a true-positive rate of 0.93 and false-positive rate of 0.16.


MicroRNA ◽  
2018 ◽  
Vol 8 (1) ◽  
pp. 86-92 ◽  
Author(s):  
Shili Jiang ◽  
Wei Jiang ◽  
Ying Xu ◽  
Xiaoning Wang ◽  
Yongping Mu ◽  
...  

Background and Objective: Accurately evaluating the severity of liver cirrhosis is essential for clinical decision making and disease management. This study aimed to evaluate the value of circulating levels of microRNA (miR)-26a and miR-21 as novel noninvasive biomarkers in detecting severity of cirrhosis in patients with chronic hepatitis B. Methods: Thirty patients with clinically diagnosed chronic hepatitis B-related cirrhosis and 30 healthy individuals were selected. The serum levels of miR-26a and miR-21 were quantified by qRT-PCR. Receiver operating characteristic curve analysis was performed to evaluate the sensitivity and specificity of the miRNAs for detecting the severity of cirrhosis. Results: Serum miR-26a and miR-21 levels were found to be significantly downregulated in patients with severe cirrhosis scored at Child-Pugh class C in comparison to healthy controls (miR-26a p<0.01, and miR-21 p<0.001, respectively). The circulating miR-26a and miR-21 levels in patients were positively correlated with serum albumin concentration but negatively correlated with serum total bilirubin concentration and prothrombin time. Receiver operating characteristic curve analysis revealed that both serum miR-26a and miR-21 levels were associated with a high diagnostic accuracy for patients with cirrhosis scored at Child-Pugh class C (miR-26a Cut-off fold change at ≤0.4, Sensitivity: 84.62%, Specificity: 89.36%, P<0.0001; miR-21 Cut-off fold change at ≤0.6, Sensitivity: 84.62%, Specificity: 78.72%, P<0.0001). Our results indicate that the circulating levels of miR-26a and miR-21 are closely related to the extent of liver decompensation, and the decreased levels are capable of discriminating patients with cirrhosis at Child-Pugh class C from the whole cirrhosis cases.


2019 ◽  
Vol 30 (7-8) ◽  
pp. 221-228
Author(s):  
Shahab Hajibandeh ◽  
Shahin Hajibandeh ◽  
Nicholas Hobbs ◽  
Jigar Shah ◽  
Matthew Harris ◽  
...  

Aims To investigate whether an intraperitoneal contamination index (ICI) derived from combined preoperative levels of C-reactive protein, lactate, neutrophils, lymphocytes and albumin could predict the extent of intraperitoneal contamination in patients with acute abdominal pathology. Methods Patients aged over 18 who underwent emergency laparotomy for acute abdominal pathology between January 2014 and October 2018 were randomly divided into primary and validation cohorts. The proposed intraperitoneal contamination index was calculated for each patient in each cohort. Receiver operating characteristic curve analysis was performed to determine discrimination of the index and cut-off values of preoperative intraperitoneal contamination index that could predict the extent of intraperitoneal contamination. Results Overall, 468 patients were included in this study; 234 in the primary cohort and 234 in the validation cohort. The analyses identified intraperitoneal contamination index of 24.77 and 24.32 as cut-off values for purulent contamination in the primary cohort (area under the curve (AUC): 0.73, P < 0.0001; sensitivity: 84%, specificity: 60%) and validation cohort (AUC: 0.83, P < 0.0001; sensitivity: 91%, specificity: 69%), respectively. Receiver operating characteristic curve analysis also identified intraperitoneal contamination index of 33.70 and 33.41 as cut-off values for feculent contamination in the primary cohort (AUC: 0.78, P < 0.0001; sensitivity: 87%, specificity: 64%) and validation cohort (AUC: 0.79, P < 0.0001; sensitivity: 86%, specificity: 73%), respectively. Conclusions As a predictive measure which is derived purely from biomarkers, intraperitoneal contamination index may be accurate enough to predict the extent of intraperitoneal contamination in patients with acute abdominal pathology and to facilitate decision-making together with clinical and radiological findings.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yang Mi ◽  
Pengfei Qu ◽  
Na Guo ◽  
Ruimiao Bai ◽  
Jiayi Gao ◽  
...  

Abstract Background For most women who have had a previous cesarean section, vaginal birth after cesarean section (VBAC) is a reasonable and safe choice, but it increases the risk of adverse outcomes such as uterine rupture. In order to reduce the risk, we evaluated the factors that may affect VBAC and established a model for predicting the success rate of trial of labor after cesarean section (TOLAC). Methods All patients who gave birth at Northwest Women’s and Children’s Hospital from January 2016 to December 2018, had a history of cesarean section and voluntarily chose the TOLAC were recruited. Among them, 80% of the population was randomly assigned to the training set, while the remaining 20% were assigned to the external validation set. In the training set, univariate and multivariate logistic regression models were used to identify indicators related to successful TOLAC. A nomogram was constructed based on the results of multiple logistic regression analysis, and the selected variables included in the nomogram were used to predict the probability of successfully obtaining TOLAC. The area under the receiver operating characteristic curve was used to judge the predictive ability of the model. Results A total of 778 pregnant women were included in this study. Among them, 595 (76.48%) successfully underwent TOLAC, whereas 183 (23.52%) failed and switched to cesarean section. In multi-factor logistic regression, parity = 1, pre-pregnancy BMI < 24 kg/m2, cervical score ≥ 5, a history of previous vaginal delivery and neonatal birthweight < 3300 g were associated with the success of TOLAC. The area under the receiver operating characteristic curve in the prediction and validation models was 0.815 (95% CI: 0.762–0.854) and 0.730 (95% CI: 0.652–0.808), respectively, indicating that the nomogram prediction model had medium discriminative power. Conclusion TOLAC was useful for reducing the cesarean section rate. 
Being primiparous, not overweight or obese, having a cervical score ≥ 5, a history of previous vaginal delivery or neonatal birthweight < 3300 g were protective indicators. In this study, the validated model had an approving predictive ability.


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
F Kahles ◽  
R.W Mertens ◽  
M.V Rueckbeil ◽  
M.C Arrivas ◽  
J Moellmann ◽  
...  

Abstract Background GLP-1 and GLP-2 (glucagon-like peptide-1/2) are gut derived hormones that are co-secreted from intestinal L-cells in response to food intake. While GLP-1 is known to induce postprandial insulin secretion, GLP-2 enhances intestinal nutrient absorption and is clinically used for the treatment of patients with short bowel syndrome. The relevance of the GLP-2 system for cardiovascular disease is unknown. Purpose The aim of this study was to assess the predictive capacity of GLP-2 for cardiovascular prognosis in patients with myocardial infarction. Methods Total GLP-2 levels, NT-proBNP concentrations and the Global Registry of Acute Coronary Events (GRACE) score were assessed at time of admission in 918 patients with myocardial infarction, among them 597 patients with NSTEMI and 321 with STEMI. The primary composite outcome of the study was the first occurrence of cardiovascular death, nonfatal myocardial infarction, or nonfatal stroke (3-P-MACE) with a median follow-up of 311 days. Results Kaplan-Meier survival plots (separated by the median of GLP-2 with a cut-off value of 4.4 ng/mL) and univariable Cox regression analyses found GLP-2 values to be associated with adverse outcome (logarithmized GLP-2 values HR: 2.87; 95% CI: 1.75–4.68; p<0.0001). Further adjustment for age, sex, smoking, hypertension, hypercholesterolemia, diabetes mellitus, family history of cardiovascular disease, hs-Troponin T, NT-proBNP and hs-CRP levels did not affect the association of GLP-2 with poor prognosis (logarithmized GLP-2 values HR: 2.96; 95% CI: 1.38–6.34; p=0.0053). Receiver operating characteristic curve (ROC) analyses illustrated that GLP-2 is a strong indicator for cardiovascular events and proved to be comparable to other established risk markers (area under the curve of the combined endpoint at 6 months; GLP-2: 0.72; hs-Troponin: 0.56; NT-proBNP: 0.70; hs-CRP: 0.62). 
Adjustment of the GRACE risk estimate by GLP-2 increased the area under the receiver-operating characteristic curve for the combined triple endpoint after 6 months from 0.70 (GRACE) to 0.75 (GRACE + GLP-2) in NSTEMI patients. Addition of GLP-2 to a model containing GRACE and NT-proBNP led to a further improvement in model performance (increase in AUC from 0.72 for GRACE + NT-proBNP to 0.77 for GRACE + NT-proBNP + GLP-2). Conclusions In patients admitted with acute myocardial infarction, GLP-2 levels are associated with adverse cardiovascular prognosis. This demonstrates a strong yet not appreciated crosstalk between the heart and the gut with relevance for cardiovascular outcome. Future studies are needed to further explore this crosstalk with the possibility of new treatment avenues for cardiovascular disease. Funding Acknowledgement Type of funding source: Public grant(s) – National budget only. Main funding source(s): German Society of Cardiology (DGK), German Research Foundation (DFG)

