DOP13 Artificial Intelligence (AI) in endoscopy - Deep learning for detection and scoring of Ulcerative Colitis (UC) disease activity under multiple scoring systems

2021 ◽  
Vol 15 (Supplement_1) ◽  
pp. S051-S052
Author(s):  
M Byrne ◽  
J East ◽  
M Iacucci ◽  
R Panaccione ◽  
R Kalapala ◽  
...  

Abstract Background Computer vision and deep learning (DL) approaches to tissue characterization of disease activity in Ulcerative Colitis (UC) through the Mayo Endoscopic Subscore (MES) have shown good results in central reading for clinical trials. The UCEIS (Ulcerative Colitis Endoscopic Index of Severity), being a more granular index, may be more reflective of disease activity and better suited to artificial intelligence (AI). We set out to build UC detection and scoring into a single tool and graphical user interface (GUI), improving the accuracy and precision of MES and UCEIS scores and reducing the time elapsed between video collection, quality assurance and final scoring. We applied DL models to detect and filter scorable frames, assess the quality of endoscopic recordings and predict MES and UCEIS scores in videos of patients with UC. Methods We leveraged >375,000 frames from endoscopy cases recorded with Olympus scopes (190 and 180 Series). Experienced endoscopists and 9 labellers tagged ~22,000 (6%) images showing normal mucosa, disease states (MES or UCEIS subscores) and non-scorable frames. Total frames were split into three categories: training (60%), testing (20%) and validation (20%). Detection used a Convolutional Neural Network (CNN), Inception V3, including a biopsy and post-biopsy detector, an out-of-the-body framework and a blue-light algorithm. A similar architecture was used for scoring, with multiple separate units and corresponding dense layers on top of the CNN providing continuous scores for 5 separate outputs: MES, aggregate UCEIS and the individual components Vascular Pattern, Bleeding and Ulcers. Results Multiple metrics were used to evaluate the detection models. Overall performance reached an accuracy of ~88%, with similar precision and recall across all classes. MAE (distance from ground truth) and mean bias (tendency to over- or under-predict) were used to assess the performance of the scoring model. Our model performs well: predicted distributions are relatively close to the labelled ground truth data, and MAE and bias for all frames are relatively low considering the magnitude of the scoring scale. To leverage all our models, we developed a practical tool intended to improve the efficiency and accuracy of the reading and scoring process for UC at different stages of the clinical journey. Conclusion We propose a DL approach based on labelled images to automate a workflow for improving and accelerating UC disease detection and scoring using MES and UCEIS scores. Our deep learning model identifies relevant features for scoring disease activity in UC patients, aligns well with both scoring guidelines and the performance of experts, and demonstrates strong promise for generalization. Going forward, we aim to continue developing our detection and scoring tool. With a detailed workflow supported by deep learning models, we have a driving function to create a precise and potentially superhuman-level AI to score disease activity.
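A minimal sketch of the kind of architecture described in the abstract, assuming Keras/TensorFlow: an Inception V3 backbone with five regression heads (MES, aggregate UCEIS, Vascular Pattern, Bleeding, Ulcers). The layer widths, optimizer and training details are assumptions for illustration; only the backbone and the five continuous outputs are stated in the abstract.

```python
# Hypothetical sketch: Inception V3 backbone with five regression heads.
# Layer sizes and optimizer settings are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionV3

backbone = InceptionV3(include_top=False, weights="imagenet",
                       input_shape=(299, 299, 3), pooling="avg")

def regression_head(features, name):
    # One small dense unit per score, emitting a continuous prediction.
    x = layers.Dense(256, activation="relu")(features)
    return layers.Dense(1, name=name)(x)

features = backbone.output
outputs = [regression_head(features, name)
           for name in ("mes", "uceis_total", "vascular_pattern",
                        "bleeding", "ulcers")]

model = Model(inputs=backbone.input, outputs=outputs)
model.compile(optimizer="adam", loss="mae")  # MAE matches the reported metric
```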

2021 ◽  
Vol 93 (6) ◽  
pp. AB196-AB197
Author(s):  
Michael F. Byrne ◽  
James E. East ◽  
Marietta Iacucci ◽  
Remo Panaccione ◽  
Rakesh Kalapala ◽  
...  

2021 ◽  
Vol 14 ◽  
pp. 263177452199062
Author(s):  
Benjamin Gutierrez Becker ◽  
Filippo Arcadu ◽  
Andreas Thalhammer ◽  
Citlalli Gamez Serna ◽  
Owen Feehan ◽  
...  

Introduction: The Mayo Clinic Endoscopic Subscore is a commonly used grading system to assess the severity of ulcerative colitis. Correctly grading colonoscopies using the Mayo Clinic Endoscopic Subscore is a challenging task, with suboptimal rates of interrater and intrarater variability observed even among experienced and sufficiently trained experts. In recent years, several machine learning algorithms have been proposed in an effort to improve the standardization and reproducibility of Mayo Clinic Endoscopic Subscore grading. Methods: Here we propose an end-to-end, fully automated system based on deep learning to predict a binary version of the Mayo Clinic Endoscopic Subscore directly from raw colonoscopy videos. Unlike previous studies, the proposed method mimics the assessment done in practice by a gastroenterologist, that is, traversing the whole colonoscopy video, identifying visually informative regions and computing an overall Mayo Clinic Endoscopic Subscore. The proposed deep learning–based system was trained and deployed on raw colonoscopies using Mayo Clinic Endoscopic Subscore ground truth provided only at the colon-section level, without manually selecting the frames driving the severity scoring of ulcerative colitis. Results and Conclusion: Our evaluation on 1672 endoscopic videos from a multisite data set drawn from the etrolizumab Phase II Eucalyptus and Phase III Hickory and Laurel clinical trials shows that the proposed methodology can grade endoscopic videos with a high degree of accuracy and robustness (Area Under the Receiver Operating Characteristic Curve = 0.84 for Mayo Clinic Endoscopic Subscore ⩾ 1, 0.85 for Mayo Clinic Endoscopic Subscore ⩾ 2 and 0.85 for Mayo Clinic Endoscopic Subscore ⩾ 3) with reduced amounts of manual annotation. Plain language summary Artificial intelligence can be used to automatically assess full endoscopic videos and estimate the severity of ulcerative colitis. In this work, we present an artificial intelligence algorithm for the automatic grading of ulcerative colitis in full endoscopic videos. Our artificial intelligence models were trained and evaluated on a large and diverse set of colonoscopy videos obtained from concluded clinical trials. We demonstrate not only that artificial intelligence is able to accurately grade full endoscopic videos, but also that using diverse data sets obtained from multiple sites is critical to train robust AI models that could potentially be deployed on real-world data.
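The reported AUROCs correspond to the subscore binarized at three thresholds. A hedged sketch of that evaluation step is below; the arrays are synthetic placeholders, not study data, and the model's actual video-level aggregation is not specified in the abstract.

```python
# Illustrative sketch: scoring video-level severity estimates against
# binarized Mayo Endoscopic Subscore thresholds (>=1, >=2, >=3).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
video_mces = rng.integers(0, 4, size=200)             # ground-truth subscore per video (0-3)
video_scores = video_mces + rng.normal(0, 0.8, 200)   # hypothetical continuous model output

for threshold in (1, 2, 3):
    labels = (video_mces >= threshold).astype(int)
    auc = roc_auc_score(labels, video_scores)
    print(f"AUROC for subscore >= {threshold}: {auc:.2f}")
```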


Algorithms ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 212
Author(s):  
Youssef Skandarani ◽  
Pierre-Marc Jodoin ◽  
Alain Lalande

Deep learning methods are the de facto solutions to a multitude of medical image analysis tasks. Cardiac MRI segmentation is one such application, which, like many others, requires a large amount of annotated data so that a trained network can generalize well. Unfortunately, the process of having a large number of images manually curated by medical experts is both slow and extremely expensive. In this paper, we set out to explore whether expert knowledge is a strict requirement for the creation of annotated data sets on which machine learning can successfully be trained. To do so, we gauged the performance of three segmentation models, namely U-Net, Attention U-Net, and ENet, trained with different loss functions on expert and non-expert ground truth for cardiac cine–MRI segmentation. Evaluation was done with classic segmentation metrics (Dice index and Hausdorff distance) as well as clinical measurements, such as the ventricular ejection fractions and the myocardial mass. The results reveal that the generalization performance of a segmentation neural network trained on non-expert ground truth data is, for all practical purposes, as good as that trained on expert ground truth data, particularly when the non-expert receives a decent level of training, highlighting an opportunity for the efficient and cost-effective creation of annotations for cardiac data sets.
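A minimal sketch of the metrics named above, computed from binary segmentation masks; voxel spacing, mask shapes and the clinical-metric formulas here are standard assumptions rather than details taken from the paper.

```python
# Hedged sketch: Dice index, Hausdorff distance and ejection fraction
# from binary masks. Inputs are assumed NumPy boolean/0-1 arrays.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(a: np.ndarray, b: np.ndarray) -> float:
    # Dice index between two binary masks.
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    # Symmetric Hausdorff distance between mask point sets (voxel units).
    pa, pb = np.argwhere(a), np.argwhere(b)
    return max(directed_hausdorff(pa, pb)[0], directed_hausdorff(pb, pa)[0])

def ejection_fraction(ed_mask: np.ndarray, es_mask: np.ndarray,
                      voxel_volume_ml: float) -> float:
    # Ejection fraction (%) from end-diastolic and end-systolic ventricle masks.
    edv = ed_mask.sum() * voxel_volume_ml
    esv = es_mask.sum() * voxel_volume_ml
    return 100.0 * (edv - esv) / edv
```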


Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Benjamin Zahneisen ◽  
Matus Straka ◽  
Shalini Bammer ◽  
Greg Albers ◽  
Roland Bammer

Introduction: Ruling out hemorrhage (stroke or traumatic) prior to administration of thrombolytics is critical for Code Strokes. A triage software that identifies hemorrhages on head CTs and alerts radiologists would help to streamline patient care and increase diagnostic confidence and patient safety. ML approach: We trained a deep convolutional network with a hybrid 3D/2D architecture on unenhanced head CTs of 805 patients. Our training dataset comprised 348 positive hemorrhage cases (IPH=245, SAH=67, Sub/Epi-dural=70, IVH=83) (128 female) and 457 normal controls (217 female). Lesion outlines were drawn by experts and stored as binary masks that were used as ground truth data during the training phase (random 80/20 train/test split). Diagnostic sensitivity and specificity were defined on a per-patient study level, i.e. a single, binary decision for the presence/absence of a hemorrhage on a patient’s CT scan. Final validation was performed in 380 patients (167 positive). Tool: The hemorrhage detection module was prototyped in Python/Keras. It runs on a local LINUX server (4 CPUs, no GPUs) and is embedded in a larger image processing platform dedicated to stroke. Results: Processing time for a standard whole-brain CT study (3–5 mm slices) was around 2 minutes. Upon completion, an instant notification (by email and/or mobile app) was sent to users to alert them about the suspected presence of a hemorrhage. Relative to neuroradiologist gold standard reads, the algorithm’s sensitivity and specificity were 90.4% and 92.5%, respectively (95% CI: 85%-94% for both). Detection of acute intracranial hemorrhage can be automated by deploying deep learning. It yielded very high sensitivity/specificity when compared to gold standard reads by a neuroradiologist. Volumes as small as 0.5 mL could be detected reliably in the test dataset. The software can be deployed in busy practices to prioritize worklists and alert health care professionals to speed up therapeutic decision processes and interventions.
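A minimal sketch of the per-patient evaluation described above, in Python (the abstract states the tool was prototyped in Python/Keras): one binary decision per CT study compared against expert reads. The arrays below are placeholders, not the study's validation data.

```python
# Hedged sketch: per-study sensitivity and specificity from binary decisions.
import numpy as np

def sensitivity_specificity(y_true: np.ndarray, y_pred: np.ndarray):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn), tn / (tn + fp)

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])   # neuroradiologist gold standard (toy)
y_pred = np.array([1, 0, 0, 0, 1, 0, 1, 1])   # algorithm's per-study decision (toy)
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```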


2020 ◽  
Vol 14 (Supplement_1) ◽  
pp. S393-S394
Author(s):  
D Pigniczki ◽  
K Szántó ◽  
M Rutka ◽  
K Farkas ◽  
A Bálint ◽  
...  

Abstract Background Budesonide is an oral corticosteroid widely used in moderate-to-severe ulcerative colitis (UC) to obtain and maintain remission in cases where 5-aminosalicylic acid was ineffective. Unlike previous budesonide formulations, which are absorbed in the ileum and ascending colon, the new-generation budesonide-MMX allows absorption throughout the whole colon, and therefore across the whole potentially inflamed area in UC. We aimed to evaluate the efficacy and safety of budesonide-MMX in our UC patients meeting the above-mentioned therapeutic criteria in a real-life study. Methods We enrolled 22 patients with mild-to-severe UC in this single-centre prospective study up to August 2019. Patients received 9 mg oral budesonide-MMX once daily for 8 weeks. Laboratory parameters (cholesterol, triglyceride, CRP) and serum hormone levels (parathormone [PTH], dehydroepiandrosterone [DHEA] and cortisol) were monitored before and after the 8-week therapy to follow metabolic and hormonal changes. During these visits, body composition analysis was also performed with an InBody 770 device to observe adverse steroid effects of budesonide-MMX with respect to body fat mass, body mass index, protein content of the body and bone mineral content. Disease activity was followed by the partial Mayo (pMayo) score. Statistical analysis was performed with the paired t-test and the Wilcoxon signed-rank test in SigmaPlot 1.25. Results A total of 22 patients (age: 44.4 ± 15.1 years, 6 male and 16 female) received the 2-month budesonide-MMX therapy (2.0 ± 0.3 months). Mean disease duration was 8.3 years. By the end of follow-up, 15 (68.2%) patients achieved remission and 7 patients (31.8%) were primary non-responders. Disease activity decreased significantly, from a mean of 3.95 to 1.64 (p < 0.001). No significant changes were observed in any body composition parameter. Regarding the laboratory parameters, serum cholesterol showed a significant increase (p < 0.001), while triglyceride and CRP did not change significantly. Serum cortisol levels were elevated (p < 0.001), while PTH and DHEA showed no significant decrease. Only two patients experienced side effects: one of them hypertension, headache and acne, while the other patient experienced mild diarrhoea. One patient had a relapse during the treatment. Conclusion In our study, budesonide-MMX proved to be safe, with a low number of side effects, while more than two-thirds of the patients achieved remission with this short-term therapy. Hormonal changes were not remarkable, although the drug’s effects on serum lipid levels require further examination.
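A hedged sketch of the paired before/after comparison described above, using SciPy in place of SigmaPlot; the values are synthetic placeholders, not patient data.

```python
# Illustrative sketch: paired t-test and Wilcoxon signed-rank test on
# baseline vs. week-8 partial Mayo scores (synthetic example values).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pmayo_before = rng.integers(2, 7, size=22).astype(float)                 # baseline pMayo
pmayo_after = np.clip(pmayo_before - rng.normal(2.3, 1.0, 22), 0, None)  # week 8 pMayo

t_stat, p_t = stats.ttest_rel(pmayo_before, pmayo_after)   # paired t-test
w_stat, p_w = stats.wilcoxon(pmayo_before, pmayo_after)    # Wilcoxon signed-rank test
print(f"paired t-test p={p_t:.3g}, Wilcoxon p={p_w:.3g}")
```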


2020 ◽  
Vol 117 (20) ◽  
pp. 10762-10768
Author(s):  
Yang Yang ◽  
Wu Youyou ◽  
Brian Uzzi

Replicability tests of scientific papers show that the majority of papers fail replication. Moreover, failed papers circulate through the literature as quickly as replicating papers. This dynamic weakens the literature, raises research costs, and demonstrates the need for new approaches for estimating a study’s replicability. Here, we trained an artificial intelligence model to estimate a paper’s replicability using ground truth data on studies that had passed or failed manual replication tests, and then tested the model’s generalizability on an extensive set of out-of-sample studies. The model predicts replicability better than the base rate of reviewers and comparably to prediction markets, the best present-day method for predicting replicability. In out-of-sample tests on manually replicated papers from diverse disciplines and methods, the model had strong accuracy levels of 0.65 to 0.78. Exploring the reasons behind the model’s predictions, we found no evidence for bias based on topics, journals, disciplines, base rates of failure, persuasion words, or novelty words like “remarkable” or “unexpected.” We did find that the model’s accuracy is higher when trained on a paper’s text rather than its reported statistics and that n-grams, higher-order word combinations that humans have difficulty processing, correlate with replication. We discuss how combining human and machine intelligence can raise confidence in research, provide research self-assessment techniques, and create methods that are scalable and efficient enough to review the ever-growing numbers of publications—a task that entails extensive human resources to accomplish with prediction markets and manual replication alone.
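A hedged sketch of the general idea of text-based replicability prediction with n-gram features; the pipeline, features and toy corpus below are illustrative assumptions, not the paper's actual model or data.

```python
# Illustrative sketch: n-gram text features feeding a binary replicability
# classifier (TF-IDF + logistic regression as a stand-in model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

papers = [
    "we observe a remarkable effect of priming on recall",
    "the intervention produced a small but consistent improvement",
    "an unexpected interaction emerged in the exploratory analysis",
    "effects were stable across three preregistered replications",
]
replicated = [0, 1, 0, 1]  # 1 = passed manual replication (toy labels)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),   # unigrams through trigrams
    LogisticRegression(max_iter=1000),
)
model.fit(papers, replicated)
print(model.predict_proba(["a consistent improvement across replications"]))
```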


Author(s):  
Johannes Thomsen ◽  
Magnus B. Sletfjerding ◽  
Stefano Stella ◽  
Bijoya Paul ◽  
Simon Bo Jensen ◽  
...  

Abstract Single-molecule Förster resonance energy transfer (smFRET) is a mature and adaptable method for studying the structure of biomolecules and integrating their dynamics into structural biology. The development of high-throughput methodologies and the growth of commercial instrumentation have outpaced the development of rapid, standardized, and fully automated methodologies to objectively analyze the wealth of produced data. Here we present DeepFRET, an automated standalone solution based on deep learning, in which the only crucial human intervention in going from raw microscope images to histograms of biomolecule behavior is a user-adjustable quality threshold. Integrating all standard features of smFRET analysis, DeepFRET outputs common kinetic metrics for biomolecules. We validated the utility of DeepFRET by performing quantitative analysis on simulated ground truth data and real smFRET data. The classification accuracy of DeepFRET outperformed human operators and the commonly used hard thresholds, reaching >95% precision on ground truth data while requiring only a fraction of the time (<1% compared to human operators). Its flawless and rapid operation on real data demonstrates its wide applicability. This level of classification was achieved without any preprocessing or parameter setting by human operators, demonstrating DeepFRET’s capacity to objectively quantify biomolecular dynamics. The provided standalone executable, based on open-source code, capitalises on the widespread adoption of machine learning and may contribute to the effort of benchmarking smFRET for structural biology insights.
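A minimal sketch, assuming background-corrected donor/acceptor intensity traces: computing apparent FRET efficiency and applying a user-adjustable quality threshold, analogous in spirit to the confidence filter described above (DeepFRET itself uses a deep classifier, not this heuristic).

```python
# Hedged sketch: apparent FRET efficiency and a confidence-threshold filter.
import numpy as np

def fret_efficiency(donor: np.ndarray, acceptor: np.ndarray) -> np.ndarray:
    # Apparent FRET efficiency E = I_A / (I_A + I_D), frame by frame.
    return acceptor / (acceptor + donor)

def keep_trace(confidence: float, threshold: float = 0.95) -> bool:
    # Traces whose classifier confidence exceeds the threshold are analysed.
    return confidence >= threshold

donor = np.array([820.0, 790.0, 410.0, 395.0])
acceptor = np.array([180.0, 200.0, 590.0, 610.0])
print(fret_efficiency(donor, acceptor))   # low-E then high-E state
print(keep_trace(0.97))                   # True at the default 0.95 threshold
```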


Author(s):  
Qiang Zhang ◽  
Matthew K. Burrage ◽  
Elena Lukaschuk ◽  
Mayooran Shanmuganathan ◽  
Iulia A. Popescu ◽  
...  

Background: Late gadolinium enhancement (LGE) cardiovascular magnetic resonance (CMR) imaging is the gold standard for non-invasive myocardial tissue characterization, but requires intravenous contrast agent administration. A contrast-agent-free technology to replace LGE is highly desirable, enabling faster and cheaper CMR scans. Methods: A CMR Virtual Native Enhancement (VNE) imaging technology was developed using artificial intelligence. The deep learning model for generating VNE uses multiple streams of convolutional neural networks to exploit and enhance the existing signals in native T1-maps (pixel-wise maps of tissue T1 relaxation times) and cine imaging of cardiac structure and function, presenting them as LGE-equivalent images. The VNE generator was trained using generative adversarial networks. This technology was first developed on CMR datasets from the multi-center Hypertrophic Cardiomyopathy Registry (HCMR), using HCM as an exemplar. The datasets were randomized into two independent groups for deep learning training and testing. The test data of VNE and LGE were scored and contoured by experienced human operators to assess image quality, visuospatial agreement and myocardial lesion burden quantification. Image quality was compared using the nonparametric Wilcoxon test. Intra- and inter-observer agreement was analyzed using intraclass correlation coefficients (ICC). Lesion quantification by VNE and LGE was compared using linear regression and ICC. Results: 1348 HCM patients provided 4093 triplets of matched T1-maps, cines, and LGE datasets. After randomization and data quality control, 2695 datasets were used for VNE method development, and 345 for independent testing. VNE had significantly better image quality than LGE, as assessed by 4 operators (n=345 datasets, p<0.001, Wilcoxon test). VNE revealed characteristic HCM lesions in high visuospatial agreement with LGE. In 121 patients (n=326 datasets), VNE correlated with LGE in detecting and quantifying both hyper-intensity myocardial lesions (r=0.77-0.79, ICC=0.77-0.87; p<0.001) and intermediate-intensity lesions (r=0.70-0.76, ICC=0.82-0.85; p<0.001). The native CMR images (cine plus T1-map) required for VNE can be acquired within 15 minutes. Producing a VNE image takes less than one second. Conclusions: VNE is a new CMR technology that resembles conventional LGE, without the need for contrast administration. VNE achieved high agreement with LGE in the distribution and quantification of lesions, with significantly better image quality.
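A hypothetical sketch, in the spirit of the multi-stream design described above: one convolutional stream for native T1-maps and one for cine frames, fused to produce an LGE-like image. Layer sizes, input shapes and the fusion scheme are illustrative assumptions; the actual VNE architecture and GAN training setup are not detailed in this abstract.

```python
# Hedged sketch: a two-stream convolutional generator fusing T1-map and cine inputs.
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_stream(x, name):
    # Small convolutional feature extractor for one input stream.
    x = layers.Conv2D(32, 3, padding="same", activation="relu", name=f"{name}_c1")(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu", name=f"{name}_c2")(x)
    return x

t1_in = layers.Input(shape=(256, 256, 1), name="t1_map")
cine_in = layers.Input(shape=(256, 256, 1), name="cine_frame")

fused = layers.Concatenate()([conv_stream(t1_in, "t1"),
                              conv_stream(cine_in, "cine")])
vne_out = layers.Conv2D(1, 1, activation="sigmoid", name="vne_image")(fused)

generator = Model(inputs=[t1_in, cine_in], outputs=vne_out)
generator.summary()
```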

