Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems

2021 ◽  
Vol 3 ◽  
Author(s):  
Usman Mahmood ◽  
Robik Shrestha ◽  
David D. B. Bates ◽  
Lorenzo Mannelli ◽  
Giuseppe Corrias ◽  
...  

Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing and localizing disease on medical images, and improving radiologists' efficiency. A critical component of deploying AI in radiology is gaining confidence in a developed system's efficacy and safety. The current gold-standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen on computed tomography scans.
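The paper's specific tests are not reproduced here, but the core idea of this kind of sanity test can be sketched: hide the anatomy the model is supposed to rely on and check whether performance collapses. Below is a minimal PyTorch sketch under stated assumptions: a trained classifier, a labelled data loader, and a hypothetical mask_fn that zeroes out the region of interest (e.g., the pancreas).

```python
import torch

@torch.no_grad()
def masked_region_sanity_test(model, loader, mask_fn, device="cpu"):
    """Compare accuracy on original vs. region-masked images.

    mask_fn(images) should zero out the region the model is supposed
    to use; if accuracy barely drops when that region is hidden, the
    model is likely exploiting spurious cues elsewhere in the image.
    """
    model = model.eval().to(device)
    correct_orig = correct_masked = total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        correct_orig += (model(images).argmax(dim=1) == labels).sum().item()
        correct_masked += (model(mask_fn(images)).argmax(dim=1) == labels).sum().item()
        total += labels.numel()
    return correct_orig / total, correct_masked / total
```

A large gap between the two accuracies is the expected, healthy result; near-identical accuracies flag a system that performs well for the wrong reasons.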

2019 ◽  
Author(s):  
Jennifer Goldsack ◽  
Andrea Coravos ◽  
Jessie Bakker ◽  
Brinnae Bent ◽  
Ariel V. Dowling ◽  
...  

Digital medicine is an interdisciplinary field, drawing together stakeholders with expertise in engineering, manufacturing, clinical science, data science, biostatistics, regulatory considerations, ethics, patient advocacy, and healthcare policy, to name a few. While this diversity is undoubtedly valuable, it can lead to confusion regarding terminology and best practices. There are many instances, as we detail in this paper, where a single term is used by different groups to mean different things, as well as cases where multiple terms are used to describe essentially the same concept. Our intent is to clarify core terminology and best practices for the evaluation of Biometric Monitoring Technologies (BioMeTs), without unnecessarily introducing new terms. We propose and describe a three-component framework intended to provide a foundational evaluation framework for BioMeTs. This framework includes 1) verification, 2) analytical validation, and 3) clinical validation. We aim for this common vocabulary to enable more effective communication and collaboration, generate a common and meaningful evidence base for BioMeTs, and improve the accessibility of the digital medicine field.


BMJ Open ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. e040778
Author(s):  
Vineet Kumar Kamal ◽  
Ravindra Mohan Pandey ◽  
Deepak Agrawal

Objective: To develop and validate a simple risk score chart to estimate the probability of poor outcomes in patients with severe head injury (HI). Design: Retrospective. Setting: Level-1, government-funded trauma centre, India. Participants: Patients with severe HI admitted to the neurosurgery intensive care unit during 19 May 2010–31 December 2011 (n=946) for model development and, further, data from the same centre with the same inclusion criteria from 1 January 2012 to 31 July 2012 (n=284) for external validation of the model. Outcome(s): In-hospital mortality and unfavourable outcome at 6 months. Results: A total of 39.5% and 70.7% of patients had in-hospital mortality and unfavourable outcome, respectively, in the development data set. Multivariable logistic regression analysis of routinely collected admission characteristics revealed that age (51–60, >60 years), motor score (1, 2, 4), pupillary reactivity (none), presence of hypotension, effaced basal cisterns, and traumatic subarachnoid haemorrhage/intraventricular haematoma were independent predictors of in-hospital mortality, and that age (41–50, 51–60, >60 years), motor score (1–4), pupillary reactivity (none, one), unequal limb movement, and presence of hypotension were independent predictors of unfavourable outcome, as the 95% confidence intervals (CIs) of their odds ratios (ORs) did not contain one. The discriminative ability (area under the receiver operating characteristic curve (95% CI)) of the score chart for in-hospital mortality and the 6-month outcome was excellent in the development data set (0.890 (0.867 to 0.912) and 0.894 (0.869 to 0.918), respectively), in the internal validation data set using the bootstrap resampling method (0.889 (0.867 to 0.909) and 0.893 (0.867 to 0.915), respectively), and in the external validation data set (0.871 (0.825 to 0.916) and 0.887 (0.842 to 0.932), respectively). Calibration showed good agreement between observed outcome rates and predicted risks in the development and external validation data sets (p>0.05). Conclusion: For clinical decision making, these score charts can be used to predict outcomes in new patients with severe HI in India and similar settings.
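As a rough illustration of the internal validation step reported above, an AUC with a bootstrap percentile confidence interval can be sketched as follows; y_true and y_score are hypothetical arrays of observed binary outcomes and predicted risks, not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc(y_true, y_score, n_boot=2000, seed=0):
    """Point-estimate AUC plus a 95% bootstrap percentile interval."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        if len(np.unique(y_true[idx])) < 2:              # AUC needs both classes present
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [2.5, 97.5])
    return roc_auc_score(y_true, y_score), (lo, hi)
```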


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Chi-Tung Cheng ◽  
Chih-Chi Chen ◽  
Chih-Yuan Fu ◽  
Chung-Hsien Chaou ◽  
Yu-Tung Wu ◽  
...  

Abstract Background With recent transformations in medical education, the integration of technology to improve medical students' abilities has become feasible. Artificial intelligence (AI) has impacted several aspects of healthcare, but few studies have focused on medical education. We performed an AI-assisted education study and confirmed that AI can accelerate trainees' medical image learning. Materials We developed an AI-based medical image learning system to highlight hip fractures on plain pelvic films. Thirty medical students were divided into a conventional learning (CL) group and an AI-assisted learning (AIL) group. In the CL group, the participants received a prelearning test and a postlearning test. In the AIL group, the participants received an additional test with AI-assisted education before the postlearning test. We then analyzed changes in diagnostic accuracy. Results Prelearning performance was comparable in both groups. In the CL group, postlearning accuracy (78.66 ± 14.53) was higher than prelearning accuracy (75.86 ± 11.36), but the difference was not significant (p = 0.264). The AIL group showed remarkable improvement: the WithAI score (88.87 ± 5.51) was significantly higher than the prelearning score (75.73 ± 10.58, p < 0.01), and the postlearning score (84.93 ± 14.53) was also better than the prelearning score (p < 0.01). The increase in accuracy was significantly higher in the AIL group than in the CL group. Conclusion The study demonstrated the viability of AI for augmenting medical education. Integrating AI into medical education requires dynamic collaboration from research, clinical, and educational perspectives.
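The within-group pre/post comparisons reported above correspond to a paired test on per-student accuracy scores. A minimal scipy sketch with entirely hypothetical inputs (the per-student data are not published in the abstract):

```python
import numpy as np
from scipy import stats

def pre_post_comparison(pre, post):
    """Paired t-test on per-student accuracies before and after learning."""
    t_stat, p_value = stats.ttest_rel(post, pre)
    return float(np.mean(post) - np.mean(pre)), float(p_value)

# Hypothetical scores for five students, for illustration only.
pre = np.array([72.0, 80.0, 75.0, 68.0, 84.0])
post = np.array([85.0, 91.0, 88.0, 82.0, 93.0])
print(pre_post_comparison(pre, post))  # (mean gain, p-value)
```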


2021 ◽  
pp. 1-10
Author(s):  
Fen Zhang ◽  
Min She

English reading is an efficient means of English learning in college education. However, most current college and university English reading platforms simply make different English books available in electronic form for students to read, which leads to aimless, unguided reading. Based on artificial intelligence algorithms, this paper builds model function modules according to the needs of English reading and learning management in college education and implements the system functions. Moreover, following the design principles of the personalized learning model and the characteristics of personalized online learning, this paper designs a personalized learning system based on meaningful learning theory. In addition, this article verifies and analyzes the model's performance. The research results show that the proposed model is effective.


Endoscopy ◽  
2020 ◽  
Author(s):  
Alanna Ebigbo ◽  
Robert Mendel ◽  
Tobias Rückert ◽  
Laurin Schuster ◽  
Andreas Probst ◽  
...  

Background and aims: The accurate differentiation between T1a and T1b Barrett's cancer has both therapeutic and prognostic implications but is challenging even for experienced physicians. We trained an artificial intelligence (AI) system based on deep artificial neural networks (deep learning) to differentiate between T1a and T1b Barrett's cancer on white-light images. Methods: Endoscopic images from three tertiary care centres in Germany were collected retrospectively. A deep learning system was trained and tested using the principles of cross-validation. A total of 230 white-light endoscopic images (108 T1a and 122 T1b) were evaluated with the AI system. For comparison, the images were also classified by experts specialized in the endoscopic diagnosis and treatment of Barrett's cancer. Results: The sensitivity, specificity, F1 score, and accuracy of the AI system in differentiating between T1a and T1b cancer lesions were 0.77, 0.64, 0.73, and 0.71, respectively. There was no statistically significant difference between the performance of the AI system and that of the human experts, whose sensitivity, specificity, F1 score, and accuracy were 0.63, 0.78, 0.67, and 0.70, respectively. Conclusion: This pilot study demonstrates the first multicentre application of an AI-based system for predicting submucosal invasion in endoscopic images of Barrett's cancer. The AI system scored on a par with international experts in the field, but more work is necessary to improve the system and to apply it to video sequences and real-life settings. Nevertheless, correctly predicting submucosal invasion in Barrett's cancer remains challenging for both experts and AI.
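For reference, the four reported metrics can be computed from binary predictions as sketched below, treating T1b (submucosal invasion) as the positive class; the example labels are placeholders, not the study's data.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

def staging_metrics(y_true, y_pred):
    """Sensitivity, specificity, F1, and accuracy; 1 = T1b (positive class)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "f1": f1_score(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
    }

# Placeholder labels for illustration only.
print(staging_metrics([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0]))
```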


Diagnostics ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 330
Author(s):  
Mio Adachi ◽  
Tomoyuki Fujioka ◽  
Mio Mori ◽  
Kazunori Kubota ◽  
Yuka Kikuchi ◽  
...  

We aimed to evaluate an artificial intelligence (AI) system that can detect and diagnose lesions on maximum intensity projections (MIPs) of dynamic contrast-enhanced (DCE) breast magnetic resonance imaging (MRI). We retrospectively gathered MIPs of DCE breast MRI as training and validation data from 30 and 7 normal individuals, 49 and 20 benign cases, and 135 and 45 malignant cases, respectively. Breast lesions were indicated with a bounding box and labeled as benign or malignant by a radiologist, while the AI system was trained with RetinaNet to detect lesions and estimate the probability of malignancy. The AI system was analyzed using test sets of 13 normal, 20 benign, and 52 malignant cases. Four human readers also scored these test data, with and without the assistance of the AI system, for the possibility of a malignancy in each breast. Using a cutoff value of 2%, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were 0.926, 0.828, and 0.925 for the AI system; 0.847, 0.841, and 0.884 for human readers without AI; and 0.889, 0.823, and 0.899 for human readers with AI, respectively. The AI system showed better diagnostic performance than the human readers (p = 0.002), and because the AI system's assistance improved the human readers' performance, their AUC was significantly higher with than without the AI system (p = 0.039). Our AI system showed high performance in detecting and diagnosing lesions on MIPs of DCE breast MRI and increased the diagnostic performance of human readers.
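The detection component can be sketched with torchvision's off-the-shelf RetinaNet. The study's actual backbone, weights, and preprocessing are not specified in the abstract, so everything below is an assumption that only illustrates the boxes-plus-malignancy-scores inference pattern.

```python
import torch
from torchvision.models.detection import retinanet_resnet50_fpn

# Untrained network, purely to show the interface; torchvision's detection
# models reserve class 0 for background, hence 3 classes here (assumption:
# background, benign, malignant).
model = retinanet_resnet50_fpn(num_classes=3)
model.eval()

with torch.no_grad():
    mip = torch.rand(3, 512, 512)        # stand-in for one MIP image
    out = model([mip])[0]                # list of images in, list of dicts out
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        if score >= 0.02:                # 2% cutoff, as used in the study
            print(label.item(), round(score.item(), 3), box.tolist())
```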


AI Magazine ◽  
2013 ◽  
Vol 34 (3) ◽  
pp. 93-98 ◽  
Author(s):  
Vita Markman ◽  
Georgi Stojanov ◽  
Bipin Indurkhya ◽  
Takashi Kido ◽  
Keiki Takadama ◽  
...  

The Association for the Advancement of Artificial Intelligence was pleased to present the AAAI 2013 Spring Symposium Series, held Monday through Wednesday, March 25-27, 2013. The titles of the eight symposia were Analyzing Microtext, Creativity and (Early) Cognitive Development, Data Driven Wellness: From Self-Tracking to Behavior Change, Designing Intelligent Robots: Reintegrating AI II, Lifelong Machine Learning, Shikakeology: Designing Triggers for Behavior Change, Trust and Autonomous Systems, and Weakly Supervised Learning from Multimedia. This report contains summaries of the symposia, written, in most cases, by the cochairs of each symposium.

