Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems

2021 ◽  
Vol 3 ◽  
Author(s):  
Usman Mahmood ◽  
Robik Shrestha ◽  
David D. B. Bates ◽  
Lorenzo Mannelli ◽  
Giuseppe Corrias ◽  
...  

Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing and localizing disease on medical images, and improving radiologists' efficiency. A critical component of deploying AI in radiology is gaining confidence in a developed system's efficacy and safety. The current gold-standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen on computed tomography scans.
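The paper's specific tests are not reproduced here, but the core idea of this kind of sanity test can be sketched: hide the anatomy the model is supposed to rely on and check whether performance collapses. Below is a minimal PyTorch sketch under stated assumptions: a trained classifier, a labelled data loader, and a hypothetical mask_fn that zeroes out the region of interest (e.g., the pancreas).

```python
import torch

@torch.no_grad()
def masked_region_sanity_test(model, loader, mask_fn, device="cpu"):
    """Compare accuracy on original vs. region-masked images.

    mask_fn(images) should zero out the region the model is supposed
    to use; if accuracy barely drops when that region is hidden, the
    model is likely exploiting spurious cues elsewhere in the image.
    """
    model = model.eval().to(device)
    correct_orig = correct_masked = total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        correct_orig += (model(images).argmax(dim=1) == labels).sum().item()
        correct_masked += (model(mask_fn(images)).argmax(dim=1) == labels).sum().item()
        total += labels.numel()
    return correct_orig / total, correct_masked / total
```

A large gap between the two accuracies is the expected, healthy result; near-identical accuracies flag a system that performs well for the wrong reasons.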

2019 ◽  
Author(s):  
Jennifer Goldsack ◽  
Andrea Coravos ◽  
Jessie Bakker ◽  
Brinnae Bent ◽  
Ariel V. Dowling ◽  
...  

Digital medicine is an interdisciplinary field, drawing together stakeholders with expertise in engineering, manufacturing, clinical science, data science, biostatistics, regulatory considerations, ethics, patient advocacy, and healthcare policy, to name a few. While this diversity is undoubtedly valuable, it can lead to confusion regarding terminology and best practices. There are many instances, as we detail in this paper, where a single term is used by different groups to mean different things, as well as cases where multiple terms are used to describe essentially the same concept. Our intent is to clarify core terminology and best practices for the evaluation of Biometric Monitoring Technologies (BioMeTs), without unnecessarily introducing new terms. We propose and describe a three-component framework intended to provide a foundational evaluation framework for BioMeTs. This framework includes 1) verification, 2) analytical validation, and 3) clinical validation. We aim for this common vocabulary to enable more effective communication and collaboration, generate a common and meaningful evidence base for BioMeTs, and improve the accessibility of the digital medicine field.


BMJ Open ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. e040778
Author(s):  
Vineet Kumar Kamal ◽  
Ravindra Mohan Pandey ◽  
Deepak Agrawal

Objective: To develop and validate a simple risk score chart to estimate the probability of poor outcomes in patients with severe head injury (HI). Design: Retrospective. Setting: Level-1, government-funded trauma centre, India. Participants: Patients with severe HI admitted to the neurosurgery intensive care unit during 19 May 2010–31 December 2011 (n=946) for model development and, further, data from the same centre with the same inclusion criteria from 1 January 2012 to 31 July 2012 (n=284) for external validation of the model. Outcome(s): In-hospital mortality and unfavourable outcome at 6 months. Results: A total of 39.5% and 70.7% of patients had in-hospital mortality and unfavourable outcome, respectively, in the development data set. Multivariable logistic regression analysis of routinely collected admission characteristics revealed that age (51–60, >60 years), motor score (1, 2, 4), pupillary reactivity (none), presence of hypotension, effaced basal cisterns, and traumatic subarachnoid haemorrhage/intraventricular haematoma were independent predictors of in-hospital mortality, and that age (41–50, 51–60, >60 years), motor score (1–4), pupillary reactivity (none, one), unequal limb movement, and presence of hypotension were independent predictors of unfavourable outcome, as the 95% confidence intervals (CIs) of their odds ratios (ORs) did not contain one. The discriminative ability (area under the receiver operating characteristic curve (95% CI)) of the score chart for in-hospital mortality and the 6-month outcome was excellent in the development data set (0.890 (0.867 to 0.912) and 0.894 (0.869 to 0.918), respectively), in the internal validation data set using the bootstrap resampling method (0.889 (0.867 to 0.909) and 0.893 (0.867 to 0.915), respectively), and in the external validation data set (0.871 (0.825 to 0.916) and 0.887 (0.842 to 0.932), respectively). Calibration showed good agreement between observed outcome rates and predicted risks in the development and external validation data sets (p>0.05). Conclusion: For clinical decision making, these score charts can be used to predict outcomes in new patients with severe HI in India and similar settings.
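As a rough illustration of the internal validation step reported above, an AUC with a bootstrap percentile confidence interval can be sketched as follows; y_true and y_score are hypothetical arrays of observed binary outcomes and predicted risks, not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc(y_true, y_score, n_boot=2000, seed=0):
    """Point-estimate AUC plus a 95% bootstrap percentile interval."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        if len(np.unique(y_true[idx])) < 2:              # AUC needs both classes present
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [2.5, 97.5])
    return roc_auc_score(y_true, y_score), (lo, hi)
```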


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Chi-Tung Cheng ◽  
Chih-Chi Chen ◽  
Chih-Yuan Fu ◽  
Chung-Hsien Chaou ◽  
Yu-Tung Wu ◽  
...  

Abstract Background With recent transformations in medical education, the integration of technology to improve medical students' abilities has become feasible. Artificial intelligence (AI) has impacted several aspects of healthcare, but few studies have focused on medical education. We performed an AI-assisted education study and confirmed that AI can accelerate trainees' medical image learning. Materials We developed an AI-based medical image learning system to highlight hip fractures on plain pelvic films. Thirty medical students were divided into a conventional learning (CL) group and an AI-assisted learning (AIL) group. In the CL group, the participants received a prelearning test and a postlearning test. In the AIL group, the participants received an additional test with AI-assisted education before the postlearning test. We then analyzed changes in diagnostic accuracy. Results Prelearning performance was comparable in both groups. In the CL group, postlearning accuracy (78.66 ± 14.53) was higher than prelearning accuracy (75.86 ± 11.36), but the difference was not significant (p = 0.264). The AIL group showed remarkable improvement: the WithAI score (88.87 ± 5.51) was significantly higher than the prelearning score (75.73 ± 10.58, p < 0.01), and the postlearning score (84.93 ± 14.53) was also better than the prelearning score (p < 0.01). The increase in accuracy was significantly higher in the AIL group than in the CL group. Conclusion The study demonstrated the viability of AI for augmenting medical education. Integrating AI into medical education requires dynamic collaboration from research, clinical, and educational perspectives.
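The within-group pre/post comparisons reported above correspond to a paired test on per-student accuracy scores. A minimal scipy sketch with entirely hypothetical inputs (the per-student data are not published in the abstract):

```python
import numpy as np
from scipy import stats

def pre_post_comparison(pre, post):
    """Paired t-test on per-student accuracies before and after learning."""
    t_stat, p_value = stats.ttest_rel(post, pre)
    return float(np.mean(post) - np.mean(pre)), float(p_value)

# Hypothetical scores for five students, for illustration only.
pre = np.array([72.0, 80.0, 75.0, 68.0, 84.0])
post = np.array([85.0, 91.0, 88.0, 82.0, 93.0])
print(pre_post_comparison(pre, post))  # (mean gain, p-value)
```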


2021 ◽  
pp. 1-10
Author(s):  
Fen Zhang ◽  
Min She

English reading is an efficient means of English learning in college education. However, most current college and university English reading platforms simply make different English books available in electronic form for students to read, which leads to aimless, unguided reading. Based on artificial intelligence algorithms, this paper builds model function modules according to the needs of English reading and learning management in college education and implements the system functions. Moreover, following the design principles of the personalized learning model and the characteristics of personalized online learning, this paper designs a personalized learning system based on meaningful learning theory. In addition, this article verifies and analyzes the model's performance. The research results show that the proposed model is effective.


Endoscopy ◽  
2020 ◽  
Author(s):  
Alanna Ebigbo ◽  
Robert Mendel ◽  
Tobias Rückert ◽  
Laurin Schuster ◽  
Andreas Probst ◽  
...  

Background and aims: The accurate differentiation between T1a and T1b Barrett's cancer has both therapeutic and prognostic implications but is challenging even for experienced physicians. We trained an artificial intelligence (AI) system based on deep artificial neural networks (deep learning) to differentiate between T1a and T1b Barrett's cancer on white-light images. Methods: Endoscopic images from three tertiary care centres in Germany were collected retrospectively. A deep learning system was trained and tested using the principles of cross-validation. A total of 230 white-light endoscopic images (108 T1a and 122 T1b) were evaluated with the AI system. For comparison, the images were also classified by experts specialized in the endoscopic diagnosis and treatment of Barrett's cancer. Results: The sensitivity, specificity, F1 score, and accuracy of the AI system in differentiating between T1a and T1b cancer lesions were 0.77, 0.64, 0.73, and 0.71, respectively. There was no statistically significant difference between the performance of the AI system and that of the human experts, whose sensitivity, specificity, F1 score, and accuracy were 0.63, 0.78, 0.67, and 0.70, respectively. Conclusion: This pilot study demonstrates the first multicentre application of an AI-based system for predicting submucosal invasion in endoscopic images of Barrett's cancer. The AI system scored on a par with international experts in the field, but more work is necessary to improve the system and to apply it to video sequences and real-life settings. Nevertheless, correctly predicting submucosal invasion in Barrett's cancer remains challenging for both experts and AI.
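For reference, the four reported metrics can be computed from binary predictions as sketched below, treating T1b (submucosal invasion) as the positive class; the example labels are placeholders, not the study's data.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

def staging_metrics(y_true, y_pred):
    """Sensitivity, specificity, F1, and accuracy; 1 = T1b (positive class)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "f1": f1_score(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
    }

# Placeholder labels for illustration only.
print(staging_metrics([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0]))
```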


Diagnostics ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 330
Author(s):  
Mio Adachi ◽  
Tomoyuki Fujioka ◽  
Mio Mori ◽  
Kazunori Kubota ◽  
Yuka Kikuchi ◽  
...  

We aimed to evaluate an artificial intelligence (AI) system that can detect and diagnose lesions on maximum intensity projections (MIPs) of dynamic contrast-enhanced (DCE) breast magnetic resonance imaging (MRI). We retrospectively gathered MIPs of DCE breast MRI as training and validation data from 30 and 7 normal individuals, 49 and 20 benign cases, and 135 and 45 malignant cases, respectively. Breast lesions were indicated with a bounding box and labeled as benign or malignant by a radiologist, while the AI system was trained with RetinaNet to detect lesions and estimate the probability of malignancy. The AI system was analyzed using test sets of 13 normal, 20 benign, and 52 malignant cases. Four human readers also scored these test data, with and without the assistance of the AI system, for the possibility of a malignancy in each breast. Using a cutoff value of 2%, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were 0.926, 0.828, and 0.925 for the AI system; 0.847, 0.841, and 0.884 for human readers without AI; and 0.889, 0.823, and 0.899 for human readers with AI, respectively. The AI system showed better diagnostic performance than the human readers (p = 0.002), and because the AI system's assistance improved the human readers' performance, their AUC was significantly higher with than without the AI system (p = 0.039). Our AI system showed high performance in detecting and diagnosing lesions on MIPs of DCE breast MRI and increased the diagnostic performance of human readers.
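The detection component can be sketched with torchvision's off-the-shelf RetinaNet. The study's actual backbone, weights, and preprocessing are not specified in the abstract, so everything below is an assumption that only illustrates the boxes-plus-malignancy-scores inference pattern.

```python
import torch
from torchvision.models.detection import retinanet_resnet50_fpn

# Untrained network, purely to show the interface; torchvision's detection
# models reserve class 0 for background, hence 3 classes here (assumption:
# background, benign, malignant).
model = retinanet_resnet50_fpn(num_classes=3)
model.eval()

with torch.no_grad():
    mip = torch.rand(3, 512, 512)        # stand-in for one MIP image
    out = model([mip])[0]                # list of images in, list of dicts out
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        if score >= 0.02:                # 2% cutoff, as used in the study
            print(label.item(), round(score.item(), 3), box.tolist())
```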


AI Magazine ◽  
2013 ◽  
Vol 34 (3) ◽  
pp. 93-98 ◽  
Author(s):  
Vita Markman ◽  
Georgi Stojanov ◽  
Bipin Indurkhya ◽  
Takashi Kido ◽  
Keiki Takadama ◽  
...  

The Association for the Advancement of Artificial Intelligence was pleased to present the AAAI 2013 Spring Symposium Series, held Monday through Wednesday, March 25-27, 2013. The titles of the eight symposia were Analyzing Microtext, Creativity and (Early) Cognitive Development, Data Driven Wellness: From Self-Tracking to Behavior Change, Designing Intelligent Robots: Reintegrating AI II, Lifelong Machine Learning, Shikakeology: Designing Triggers for Behavior Change, Trust and Autonomous Systems, and Weakly Supervised Learning from Multimedia. This report contains summaries of the symposia, written, in most cases, by the cochairs of each symposium.

