Abstract 141: Artificial Intelligence to Improve the Detection and Triage of Cerebral Aneurysms

Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Vitor Mendes Pereira ◽  
Yoni Donner ◽  
Gil Levi ◽  
Nicole Cancelliere ◽  
Erez Wasserman ◽  
...  

Cerebral aneurysms (CAs) may occur in 5-10% of the population. They are often missed because their diagnosis requires a very methodical approach. We developed an artificial intelligence algorithm to assist in and supervise the detection of CAs. Methods: We developed an automated algorithm to detect CAs, based on a 3D convolutional neural network modeled as a U-net. We included all saccular CAs from 2014 to 2016 from a single center. Normal and pathological datasets were prepared and annotated in 3D using an in-house platform. To assess accuracy and optimize the model, we evaluated preliminary results on a validation dataset. After the algorithm was trained, a test dataset was used to evaluate final CA detection and aneurysm measurements. The accuracy of the algorithm was derived using ROC curves and Pearson correlation tests. Results: We used 528 CTAs with 674 aneurysms at the following locations: ACA (3%), ACA/ACOM (26.1%), ICA/MCA (26.3%), MCA (29.4%), PCA/PCOM (2.3%), basilar (6.6%), vertebral (2.3%), and other (3.7%). The training dataset consisted of 189 CA scans. We plotted ROC curves and achieved an AUC of 0.85 for unruptured and 0.88 for ruptured CAs. We improved model performance by enlarging the training dataset through various data augmentation methods to leverage the data fully. The final model was tested on 528 CTAs using 5-fold cross-validation, plus an additional set of 2400 normal CTAs. There was a significant improvement over the initial assessment, with an AUC of 0.93 for unruptured and 0.94 for ruptured CAs. The algorithm detected larger aneurysms more accurately, reaching an AUC of 0.97 and 91.5% specificity at 90% sensitivity for aneurysms larger than 7 mm. It also accurately detected CAs at the following locations: basilar (AUC 0.97) and MCA/ACOM (AUC 0.94). Volume measurements (mm3) by the model achieved a Pearson correlation of 0.9936 against the annotated volumes.
Conclusion: The Viz.ai aneurysm algorithm was able to detect and measure ruptured and unruptured CAs in consecutive CTAs. The model demonstrates that a deep learning AI algorithm can achieve clinically useful levels of accuracy for clinical decision support.
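The reported AUCs condense the whole ROC curve into a single number. As a rough illustration (not the authors' implementation), the AUC can be computed directly from predicted scores and binary labels via the rank-based Mann-Whitney formulation:

```python
def auc_from_scores(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive scores higher than a randomly chosen
    negative (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: scores that mostly rank positives above negatives.
print(auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This rank formulation is equivalent to integrating the ROC curve and is convenient for quick sanity checks.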

2019 ◽  
Author(s):  
Chin Lin ◽  
Yu-Sheng Lou ◽  
Chia-Cheng Lee ◽  
Chia-Jung Hsu ◽  
Ding-Chung Wu ◽  
...  

BACKGROUND An artificial intelligence-based algorithm has shown powerful performance in coding the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) in discharge notes. However, its performance still requires improvement compared with human experts. The major disadvantage of the previous algorithm is its lack of understanding of medical terminology. OBJECTIVE We propose methods based on the human learning process and conduct a series of experiments to validate the improvements they bring. METHODS We compared two data sources for training the word-embedding model: English Wikipedia and PubMed journal abstracts. Moreover, fixed, changeable, and double-channel embedding tables were tested. Additional tricks were also applied to improve accuracy. We used these methods to identify the three chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. For training, 94,483 labeled discharge notes from June 1, 2015 to June 30, 2017 from the Tri-Service General Hospital in Taipei, Taiwan, were used. To evaluate performance, 24,762 discharge notes from July 1, 2017 to December 31, 2017, from the same hospital were used. Moreover, 74,324 additional discharge notes collected from seven other hospitals were also tested. The F-measure was the main global measure of effectiveness. RESULTS In understanding medical terminology, the PubMed-embedding model (Pearson correlation = 0.60/0.57) outperformed the Wikipedia-embedding model (Pearson correlation = 0.35/0.31). In ICD-10-CM coding accuracy, the changeable model using both the PubMed and Wikipedia embeddings achieved the highest testing mean F-measure (0.7311 and 0.6639 at the Tri-Service General Hospital and the seven other hospitals, respectively).
Moreover, a proposed hybrid sampling method, an augmentation trick to keep the algorithm from identifying negative terms, was found to further improve model performance. CONCLUSIONS The proposed model architecture and training method, named ICD10Net, is the first expert-level model practically applied to daily work. The model can also be applied to unstructured information extraction from free-text medical writing. We have developed a web app to demonstrate our work (https://linchin.ndmctsgh.edu.tw/app/ICD10/).
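The F-measure used as the global effectiveness metric is the harmonic mean of precision and recall. A minimal sketch with hypothetical counts (not the paper's data):

```python
def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall,
    computed from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 80 true positives, 20 false positives, 20 false negatives.
print(f_measure(80, 20, 20))  # 0.8
```

For multi-label ICD coding, per-code counts are typically pooled (micro-averaged) or averaged per code (macro-averaged) before reporting a single figure.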


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hojjat Salehinejad ◽  
Jumpei Kitamura ◽  
Noah Ditkofsky ◽  
Amy Lin ◽  
Aditya Bharatha ◽  
...  

Abstract: Machine learning (ML) holds great promise in transforming healthcare. While published studies have shown the utility of ML models in interpreting medical imaging examinations, these are often evaluated under laboratory settings. The importance of real world evaluation is best illustrated by case studies that have documented successes and failures in the translation of these models into clinical environments. A key prerequisite for the clinical adoption of these technologies is demonstrating generalizable ML model performance under real world circumstances. The purpose of this study was to demonstrate that ML model generalizability is achievable in medical imaging with the detection of intracranial hemorrhage (ICH) on non-contrast computed tomography (CT) scans serving as the use case. An ML model was trained using 21,784 scans from the RSNA Intracranial Hemorrhage CT dataset while generalizability was evaluated using an external validation dataset obtained from our busy trauma and neurosurgical center. This real world external validation dataset consisted of every unenhanced head CT scan (n = 5965) performed in our emergency department in 2019 without exclusion. The model demonstrated an AUC of 98.4%, sensitivity of 98.8%, and specificity of 98.0% on the test dataset. On external validation, the model demonstrated an AUC of 95.4%, sensitivity of 91.3%, and specificity of 94.1%. Evaluating the ML model using a real world external validation dataset that is temporally and geographically distinct from the training dataset indicates that ML generalizability is achievable in medical imaging applications.
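The sensitivity and specificity figures above follow directly from confusion-matrix counts at the chosen operating threshold. A sketch with hypothetical counts (illustrative only, not the study's raw data):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts per 1000 ICH-positive and 1000 ICH-negative scans,
# chosen to show how figures like 91.3% / 94.1% would arise.
sens, spec = sensitivity_specificity(tp=913, fn=87, tn=941, fp=59)
print(sens, spec)  # 0.913 0.941
```

On an external validation set, the counts come from applying the frozen model and its fixed threshold to every scan without exclusion, exactly as described above.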


2021 ◽  
Vol 21 (6) ◽  
pp. 257-264
Author(s):  
Hoseon Kang ◽  
Jaewoong Cho ◽  
Hanseung Lee ◽  
Jeonggeun Hwang ◽  
Hyejin Moon

Urban flooding occurs during heavy rains of short duration, so quick and accurate warnings of inundation danger are required. Previous research proposed statistics-based methods for estimating urban flood alert criteria from flood damage records and rainfall data, and developed a Neuro-Fuzzy model for predicting appropriate flood alert criteria. A variety of artificial intelligence algorithms have been applied to predicting urban flood alert criteria, and their usability and predictive precision have improved with recent advances in artificial intelligence. This study therefore predicted flood alert criteria using an Artificial Neural Network (ANN) algorithm and analyzed the effect of augmenting the training data. The predictive performance of the ANN model was an RMSE of 3.39-9.80 mm; with the augmented training data it was an RMSE of 1.08-6.88 mm, an improvement of 29.8-82.6%.
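The RMSE and relative-improvement figures quoted above can be sketched as follows; the 9.80 → 6.88 mm pair reproduces the abstract's 29.8% lower bound, while the pairing of the other endpoints is not stated, so we avoid assuming it:

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(predicted))

def improvement_pct(rmse_before, rmse_after):
    """Relative RMSE reduction after augmenting the training data."""
    return 100 * (rmse_before - rmse_after) / rmse_before

print(round(improvement_pct(9.80, 6.88), 1))  # 29.8
```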


2021 ◽  
Author(s):  
Ying-Shi Sun ◽  
Yu-Hong Qu ◽  
Dong Wang ◽  
Yi Li ◽  
Lin Ye ◽  
...  

Abstract Background: Computer-aided diagnosis using deep learning algorithms has been initially applied in the field of mammography, but there is no large-scale clinical application. Methods: This study proposed to develop and verify an artificial intelligence model based on mammography. First, mammograms retrospectively collected from six centers were randomized to a training dataset and a validation dataset for establishing the model. Second, the model was tested by comparing 12 radiologists' performance with and without it. Finally, prospective multicenter mammograms were diagnosed by radiologists with the model. The detection and diagnostic capabilities were evaluated using the free-response receiver operating characteristic (FROC) curve and the ROC curve. Results: The sensitivity of the model for detecting lesions after matching was 0.908 at a false-positive rate of 0.25 in unilateral images. The area under the ROC curve (AUC) for distinguishing benign from malignant lesions was 0.855 (95% CI: 0.830, 0.880). The performance of the 12 radiologists with the model was higher than that of the radiologists alone (AUC: 0.852 vs. 0.808, P = 0.005). The mean reading time decreased with the model (from 80.18 s to 62.28 s, P = 0.03). In prospective application, the detection sensitivity reached 0.887 at a false-positive rate of 0.25; the AUC of radiologists with the model was 0.983 (95% CI: 0.978, 0.988), with sensitivity, specificity, PPV, and NPV of 94.36%, 98.07%, 87.76%, and 99.09%, respectively. Conclusions: The artificial intelligence model exhibits high accuracy for detecting and diagnosing breast lesions, improves diagnostic accuracy, and saves time. Trial registration: NCT, NCT03708978. Registered 17 April 2018, https://register.clinicaltrials.gov/prs/app/ NCT03708978
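The FROC operating points quoted (sensitivity at 0.25 false positives per image) come from sweeping the detector's score threshold. A hedged sketch (not the study's evaluation code), where `detections` is a list of (score, is_true_positive) pairs pooled over all images:

```python
def sensitivity_at_fp_rate(detections, n_lesions, n_images, fp_per_image=0.25):
    """Sweep the score threshold from strict to lenient and return the
    sensitivity at the most lenient threshold whose false-positive
    rate per image still stays within the budget.
    `detections`: (score, is_true_positive) pairs pooled over all images."""
    best_sensitivity = 0.0
    tp = fp = 0
    for score, is_tp in sorted(detections, key=lambda d: -d[0]):
        if is_tp:
            tp += 1
        else:
            fp += 1
        if fp / n_images <= fp_per_image:
            best_sensitivity = tp / n_lesions
    return best_sensitivity

# Toy run: 2 lesions across 4 images, one false positive within budget.
dets = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
print(sensitivity_at_fp_rate(dets, n_lesions=2, n_images=4))  # 1.0
```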


Author(s):  
James P. Howard ◽  
Catherine C. Stowell ◽  
Graham D. Cole ◽  
Kajaluxy Ananthan ◽  
Camelia D. Demetrescu ◽  
...  

Background: Artificial intelligence (AI) for echocardiography requires training and validation to standards expected of humans. We developed an online platform and established the Unity Collaborative to build a dataset of expertise from 17 hospitals for training, validation, and standardization of such techniques. Methods: The training dataset consisted of 2056 individual frames drawn at random from 1265 parasternal long-axis video-loops of patients undergoing clinical echocardiography in 2015 to 2016. Nine experts labeled these images using our online platform. From this, we trained a convolutional neural network to identify keypoints. Subsequently, 13 experts labeled a validation dataset of the end-systolic and end-diastolic frame from 100 new video-loops, twice each. The 26-opinion consensus was used as the reference standard. The primary outcome was precision SD, the SD of the differences between AI measurement and expert consensus. Results: In the validation dataset, the AI’s precision SD for left ventricular internal dimension was 3.5 mm. For context, precision SD of individual expert measurements against the expert consensus was 4.4 mm. Intraclass correlation coefficient between AI and expert consensus was 0.926 (95% CI, 0.904–0.944), compared with 0.817 (0.778–0.954) between individual experts and expert consensus. For interventricular septum thickness, precision SD was 1.8 mm for AI (intraclass correlation coefficient, 0.809; 0.729–0.967), versus 2.0 mm for individuals (intraclass correlation coefficient, 0.641; 0.568–0.716). For posterior wall thickness, precision SD was 1.4 mm for AI (intraclass correlation coefficient, 0.535 [95% CI, 0.379–0.661]), versus 2.2 mm for individuals (0.366 [0.288–0.462]). We present all images and annotations. This highlights challenging cases, including poor image quality and tapered ventricles. Conclusions: Experts at multiple institutions successfully cooperated to build a collaborative AI. 
This performed as well as individual experts. Future echocardiographic AI research should use a consensus of experts as a reference. Our collaborative welcomes new partners who share our commitment to publish all methods, code, annotations, and results openly.
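The primary outcome, precision SD, is simply the standard deviation of the AI-minus-consensus differences. A minimal sketch with made-up measurements in mm (not the study's data):

```python
import statistics

def precision_sd(ai_measurements, consensus_measurements):
    """SD of the differences between AI measurements and the expert
    consensus, as defined in the abstract above."""
    diffs = [a - c for a, c in zip(ai_measurements, consensus_measurements)]
    return statistics.stdev(diffs)

# Hypothetical left-ventricular dimensions (mm): AI vs. 26-opinion consensus.
print(round(precision_sd([52, 48, 51, 49], [50, 50, 50, 50]), 2))  # 1.83
```

Unlike a mean-error (bias) metric, this spread-of-differences definition penalizes inconsistency rather than a constant offset.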


2021 ◽  
Author(s):  
Ionut Cosmin Sandric ◽  
Viorel Ilinca ◽  
Radu Irimia ◽  
Zenaida Chitu ◽  
Marta Jurchescu ◽  
...  

<p>Rapid mapping of landslides plays an important role in both the science and emergency-management communities: it helps decision-makers act in quasi-real-time and reduce losses. With the increasing availability of high-resolution satellite and aerial imagery, the spatial accuracy of landslide mapping has also increased, providing ever more accurate maps of landslide locations. In line with the latest developments in unmanned aerial vehicles and artificial intelligence, the current study provides an insight into the process of mapping landslides from full-motion videos by means of artificial intelligence. To achieve this goal, several drone flights were performed over areas located in the Romanian Subcarpathians, using quadcopters (DJI Phantom 4 and DJI Mavic 2 Enterprise) equipped with a 12 MP RGB camera. The flights were planned and executed to obtain an optimal number of pictures and videos, taken from various angles and heights over the study areas. Using Structure from Motion techniques, each dataset was processed and orthorectified. Similarly, each video was processed and transformed into a full-motion video with coordinates allocated to each frame. Samples of specific landslide features were collected by hand from the pictures and video frames and used to build the database needed to train a Mask RCNN model. The samples were divided into two datasets: 80% were used for training and the remaining 20% for validation. The model was trained for 50 epochs and reached an accuracy of approximately 86% on the training dataset and about 82% on the validation dataset. The study is part of an ongoing project, SlideMap 416PED, financed by UEFISCDI, Romania. More details about the project can be found at https://slidemap.geo-spatial.ro.</p>
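The 80/20 sample split described above can be sketched as follows; the function name and fixed seed are illustrative choices of ours, not details from the study:

```python
import random

def split_samples(samples, train_frac=0.8, seed=42):
    """Shuffle annotated samples reproducibly, then cut into
    training and validation subsets (default 80/20)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Toy usage with 100 placeholder sample IDs.
train, val = split_samples(list(range(100)))
print(len(train), len(val))  # 80 20
```

Shuffling before the cut matters: samples collected flight-by-flight are spatially correlated, and an unshuffled split would leak neighbouring frames across the two sets.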


2019 ◽  
Vol 9 (1) ◽  
pp. 171 ◽  
Author(s):  
Wei Chen ◽  
Zenghui Sun ◽  
Jichang Han

The main aim of this study was to compare the performances of the hybrid approaches of traditional bivariate weights of evidence (WoE) with multivariate logistic regression (WoE-LR) and machine learning-based random forest (WoE-RF) for landslide susceptibility mapping. The performance of the three landslide models was validated with receiver operating characteristic (ROC) curves and area under the curve (AUC). The results showed that the areas under the curve obtained using the WoE, WoE-LR, and WoE-RF methods were 0.720, 0.773, and 0.802 for the training dataset, and were 0.695, 0.763, and 0.782 for the validation dataset, respectively. The results demonstrate the superiority of hybrid models and that the resultant maps would be useful for land use planning in landslide-prone areas.
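The bivariate weight of evidence underlying all three models scores each factor class by the log-ratio of its landslide share to its non-landslide share. A minimal sketch (variable names are ours):

```python
import math

def weight_of_evidence(events_in_class, nonevents_in_class,
                       total_events, total_nonevents):
    """WoE = ln( (share of landslide cells in this factor class) /
                 (share of stable cells in this factor class) )."""
    return math.log((events_in_class / total_events) /
                    (nonevents_in_class / total_nonevents))

# A class holding 30% of landslide cells but only 10% of stable cells
# is favourable to landsliding (positive WoE).
print(round(weight_of_evidence(30, 10, 100, 100), 3))  # 1.099
```

In the hybrid WoE-LR and WoE-RF approaches, these per-class WoE values replace the raw factor categories as inputs to the logistic regression or random forest.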


Author(s):  
Rohit Ghosh ◽  
Omar Smadi

Pavement distresses lead to pavement deterioration and failure. Accurate identification and classification of distresses helps agencies evaluate the condition of their pavement infrastructure and assists in decision-making on pavement maintenance and rehabilitation. The state of the art is automated pavement distress detection using vision-based methods. This study implements two deep learning techniques, Faster Region-based Convolutional Neural Networks (R-CNN) and You Only Look Once (YOLO) v3, for automated distress detection and classification of high-resolution (1,800 × 1,200) three-dimensional (3D) asphalt and concrete pavement images. The training and validation dataset contained 625 images with distresses manually annotated with bounding boxes representing the location and type of each distress, plus 798 no-distress images. Data augmentation was performed to give a more balanced representation of class labels and to prevent overfitting. YOLO and Faster R-CNN achieved 89.8% and 89.6% accuracy, respectively. Precision-recall curves were used to determine the average precision (AP), the area under the precision-recall curve. The AP values for YOLO and Faster R-CNN were 90.2% and 89.2%, respectively, indicating strong performance for both models. Receiver operating characteristic (ROC) curves were also developed; the resulting area-under-the-curve values of 0.96 for YOLO and 0.95 for Faster R-CNN likewise indicate robust performance. Finally, the models were evaluated with confusion matrices comparing them against manual quality assurance and quality control (QA/QC) performed on automated pavement data. The very high level of agreement with manual QA/QC, namely 97.6% for YOLO and 96.9% for Faster R-CNN, suggests that the proposed methodology has potential as a replacement for manual QA/QC.
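Average precision, the area under the precision-recall curve used above, can be computed from score-ranked detections. An illustrative sketch (not the study's evaluation code):

```python
def average_precision(labels, scores):
    """AP: area under the precision-recall curve, accumulated as the
    precision at each true-positive rank weighted by its recall step."""
    ranked = sorted(zip(scores, labels), key=lambda x: -x[0])
    n_pos = sum(labels)
    tp = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += (tp / rank) / n_pos  # precision at this recall step
    return ap

# Toy ranking: a true detection, a false alarm, then another true detection.
print(round(average_precision([1, 0, 1], [0.9, 0.8, 0.7]), 4))  # 0.8333
```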


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 2989
Author(s):  
Luis Vogado ◽  
Rodrigo Veras ◽  
Kelson Aires ◽  
Flávio Araújo ◽  
Romuere Silva ◽  
...  

Leukaemia is a dysfunction that affects the production of white blood cells in the bone marrow. Young cells are abnormally produced, replacing normal blood cells, so the person suffers problems in transporting oxygen and in fighting infections. This article proposes a convolutional neural network (CNN) named LeukNet that was inspired by the convolutional blocks of VGG-16, but with smaller dense layers. To define the LeukNet parameters, we evaluated different CNN models and fine-tuning methods using 18 image datasets with different resolution, contrast, colour, and texture characteristics. We applied data augmentation operations to expand the training dataset, and 5-fold cross-validation led to an accuracy of 98.61%. To evaluate the CNN's generalisation ability, we applied a cross-dataset validation technique. The accuracies obtained in cross-dataset experiments on three datasets were 97.04%, 82.46%, and 70.24%, surpassing those of current state-of-the-art methods. We conclude that using the most common and deepest CNNs may not be the best choice for applications where the images to be classified differ from those used in pre-training. Additionally, the adopted cross-dataset validation approach proved to be an excellent way to evaluate the generalisation capability of a model, as it considers model performance on unseen data, which is paramount for CAD systems.
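The cross-dataset protocol trains on one dataset and scores on each of the others. A skeletal sketch, where `evaluate` is a hypothetical stand-in for training the network on one dataset and measuring accuracy on another (names are ours, not the paper's):

```python
def cross_dataset_validation(dataset_names, train_name, evaluate):
    """Return accuracy on every dataset other than the training one,
    probing generalisation to unseen image characteristics."""
    return {name: evaluate(train_name, name)
            for name in dataset_names if name != train_name}

# Toy usage with a dummy evaluate function returning a fixed accuracy.
scores = cross_dataset_validation(["A", "B", "C"], "A",
                                  lambda train, test: 0.9)
print(scores)  # {'B': 0.9, 'C': 0.9}
```

The key design point is that the held-out datasets never contribute to training or hyperparameter tuning, unlike folds in ordinary cross-validation.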


2020 ◽  
Author(s):  
Junyi Li ◽  
Xuejie Zhang ◽  
Xiaobing Zhou

BACKGROUND In recent years, with the growth in the amount of information and the importance of information screening, increasing attention has been paid to the calculation of textual semantic similarity. In the medical field, with the rapid increase in electronic medical data, electronic medical records and medical research documents have become important data resources for clinical research, and medical textual semantic similarity calculation has become an urgent problem to solve. The 2019 N2C2/OHNLP shared task Track on Clinical Semantic Textual Similarity is one of the significant tasks for medical textual semantic similarity calculation. OBJECTIVE This research aims to solve two problems: 1) medical datasets are small, which leads to insufficient learning and understanding by the models; 2) information is lost during long-distance propagation, which prevents the models from grasping key information. METHODS This paper combines a text data augmentation method and a self-ensemble ALBERT model under semi-supervised learning to perform clinical textual semantic similarity calculation. RESULTS Compared with the competing methods in the 2019 N2C2/OHNLP Track 1 ClinicalSTS, our method achieves a state-of-the-art result with a Pearson correlation coefficient of 0.92, surpassing the best previous result by 2 percentage points. CONCLUSIONS When a medical dataset is small, data augmentation and improved semi-supervised learning can increase the dataset size and boost the learning efficiency of the model. Additionally, self-ensembling improves model performance significantly. These results show that our method performs well and has great potential for related medical problems.
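The Pearson correlation coefficient used for evaluation measures linear agreement between predicted and gold similarity scores. A minimal sketch:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Perfectly linear predictions correlate at 1.0 with the gold scores.
print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```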

