Transcriptomic learning for digital pathology

2019 ◽  
Author(s):  
Benoît Schmauch ◽  
Alberto Romagnoni ◽  
Elodie Pronier ◽  
Charlie Saillard ◽  
Pascale Maillé ◽  
...  

Deep learning methods for digital pathology analysis have proved an effective way to address multiple clinical questions, from diagnosis to prognosis and even to prediction of treatment outcomes. They have also recently been used to predict gene mutations from pathology images, but no comprehensive evaluation of their potential for extracting molecular features from histology slides has yet been performed. We propose a novel approach based on the integration of multiple data modalities, and show that our deep learning model, HE2RNA, can be trained to systematically predict RNA-Seq profiles from whole-slide images alone, without the need for expert annotation. HE2RNA is interpretable by design, opening up new opportunities for virtual staining: it provides a virtual spatialization of gene expression, as validated by double staining on an independent dataset. Moreover, the transcriptomic representation learned by HE2RNA can be transferred to improve predictive performance on other tasks, particularly for small datasets. As an example of a task with direct clinical impact, we studied the prediction of microsatellite instability from hematoxylin & eosin-stained images, and our results show that the transferred representation improves performance in this setting.
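To make the weak-supervision idea concrete, below is a minimal PyTorch sketch of a tile-aggregation model in the spirit of HE2RNA: a shared MLP scores precomputed tile features for each gene, and the slide-level prediction averages the top-scoring tiles, so the per-tile scores can also be read off as a spatial map of predicted expression. The feature dimension, number of genes, and top-k value are illustrative assumptions, not the published configuration.

```python
# Minimal sketch (not the authors' exact HE2RNA architecture): a weakly
# supervised aggregation model mapping precomputed tile features from a
# whole-slide image to a slide-level gene-expression vector.
import torch
import torch.nn as nn


class TileToExpression(nn.Module):
    def __init__(self, n_features: int = 2048, n_genes: int = 100, top_k: int = 25):
        super().__init__()
        self.top_k = top_k
        # Shared MLP applied to every tile, producing per-tile gene scores.
        self.tile_mlp = nn.Sequential(
            nn.Linear(n_features, 512),
            nn.ReLU(),
            nn.Linear(512, n_genes),
        )

    def forward(self, tiles: torch.Tensor) -> torch.Tensor:
        # tiles: (n_tiles, n_features) for one slide.
        scores = self.tile_mlp(tiles)              # (n_tiles, n_genes)
        k = min(self.top_k, scores.shape[0])
        top_scores, _ = scores.topk(k, dim=0)      # best tiles per gene
        return top_scores.mean(dim=0)              # (n_genes,) slide-level prediction


model = TileToExpression()
tiles = torch.randn(400, 2048)   # hypothetical slide with 400 pre-encoded tiles
rna = torch.randn(100)           # hypothetical RNA-Seq targets for 100 genes
loss = nn.functional.mse_loss(model(tiles), rna)
loss.backward()
```

The per-tile scores computed before aggregation are what would be visualized for the virtual spatialization of gene expression described above.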

2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
S Rao ◽  
Y Li ◽  
R Ramakrishnan ◽  
A Hassaine ◽  
D Canoy ◽  
...  

Abstract

Background/Introduction: Predicting incident heart failure has been challenging. Deep learning models, when applied to rich electronic health records (EHR), offer some theoretical advantages. However, empirical evidence for their superior performance is limited, and they commonly remain uninterpretable, hampering their wider use in medical practice.

Purpose: We developed a deep learning framework for more accurate and yet interpretable prediction of incident heart failure.

Methods: We used longitudinally linked EHR from practices across England, involving 100,071 patients, 13% of whom had been diagnosed with incident heart failure during follow-up. We investigated the predictive performance of a novel transformer deep learning model, "Transformer for Heart Failure" (BEHRT-HF), and validated it using both an external held-out dataset and internal five-fold cross-validation, measuring the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Predictor groups included all outpatient and inpatient diagnoses within their temporal context, medications, age, and calendar year for each encounter. Treating diagnoses as anchors, we alternately removed the other modalities (ablation study) to understand the importance of individual modalities to the performance of incident heart failure prediction. Using perturbation-based techniques, we investigated the importance of associations between selected predictors and heart failure to improve model interpretability.

Results: BEHRT-HF achieved high accuracy, with AUROC 0.932 and AUPRC 0.695 on external validation, and AUROC 0.933 (95% CI: 0.928, 0.938) and AUPRC 0.700 (95% CI: 0.682, 0.718) on internal validation. Compared to the state-of-the-art recurrent deep learning model RETAIN-EX, BEHRT-HF improved AUPRC by 0.079 and AUROC by 0.030. The ablation study showed that medications were strong predictors and that calendar year was more important than age. Using perturbation, we identified and ranked the strength of associations between diagnoses and heart failure. For instance, the method showed that established risk factors, including myocardial infarction, atrial fibrillation and flutter, and hypertension, were all strongly associated with the heart failure prediction. Additionally, when the population was stratified into age groups, incident occurrence of a given disease generally contributed more to heart failure prediction when diagnosed at younger ages than later in life.

Conclusions: Our state-of-the-art deep learning framework outperforms the predictive performance of existing models whilst enabling a data-driven exploration of the relative contribution of a range of risk factors in the context of other temporal information.

Funding Acknowledgement: Type of funding source: Private grant(s) and/or Sponsorship. Main funding source(s): National Institute for Health Research, Oxford Martin School, Oxford Biomedical Research Centre
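As a rough illustration of the modelling idea (not the published BEHRT-HF implementation), the sketch below embeds each encounter's diagnosis or medication code together with its age and calendar-year context, feeds the encounter sequence to a Transformer encoder, and pools the output into a single heart-failure risk score; vocabulary sizes and dimensions are placeholders.

```python
# Hedged sketch of a BEHRT-style encoder for incident heart-failure
# prediction: summed embeddings of code, age, and calendar year per
# encounter, followed by a Transformer encoder and a pooled risk head.
import torch
import torch.nn as nn


class EHRTransformer(nn.Module):
    def __init__(self, n_codes=10000, n_ages=120, n_years=50, d_model=128):
        super().__init__()
        self.code_emb = nn.Embedding(n_codes, d_model, padding_idx=0)
        self.age_emb = nn.Embedding(n_ages, d_model)
        self.year_emb = nn.Embedding(n_years, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, 1)

    def forward(self, codes, ages, years, pad_mask):
        # codes/ages/years: (batch, seq_len) integer tensors per encounter;
        # pad_mask: (batch, seq_len) bool tensor, True where padded.
        x = self.code_emb(codes) + self.age_emb(ages) + self.year_emb(years)
        h = self.encoder(x, src_key_padding_mask=pad_mask)
        # Mean-pool over non-padded encounters, then predict HF risk.
        mask = (~pad_mask).unsqueeze(-1).float()
        pooled = (h * mask).sum(1) / mask.sum(1).clamp(min=1)
        return torch.sigmoid(self.head(pooled)).squeeze(-1)
```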


2018 ◽  
Vol 19 (9) ◽  
pp. 2817 ◽  
Author(s):  
Haixia Long ◽  
Bo Liao ◽  
Xingyu Xu ◽  
Jialiang Yang

Protein hydroxylation is a type of post-translational modification (PTM) that plays critical roles in human diseases. Protein sequences contain many uncharacterized proline and lysine residues, and the question that needs to be answered is which of these residues can be hydroxylated and which cannot. The answer will not only help elucidate the mechanism of hydroxylation but can also benefit the development of new drugs. In this paper, we propose a novel approach for predicting hydroxylation using a hybrid deep learning model that integrates a convolutional neural network (CNN) and a long short-term memory network (LSTM). We employed a pseudo amino acid composition (PseAAC) method to construct valid benchmark datasets based on a sliding-window strategy, and used the position-specific scoring matrix (PSSM) to represent samples as inputs to the deep learning model. In addition, we compared our method with popular predictors including CNN, iHyd-PseAAC, and iHyd-PseCp. The 5-fold cross-validation results all demonstrate that our method significantly outperforms the other methods in prediction accuracy.
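A minimal sketch of the hybrid architecture described above, assuming a sliding window encoded by its PSSM rows; the layer sizes are illustrative and not the published configuration.

```python
# Hedged sketch: a hybrid CNN + LSTM classifier over PSSM-encoded sequence
# windows centred on a candidate proline/lysine residue.
import torch
import torch.nn as nn


class CnnLstmHydroxylation(nn.Module):
    def __init__(self, pssm_dim: int = 20):
        super().__init__()
        # 1D convolution scans local patterns along the sequence window.
        self.conv = nn.Sequential(
            nn.Conv1d(pssm_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Bidirectional LSTM captures longer-range dependencies in the window.
        self.lstm = nn.LSTM(64, 32, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(64, 1)  # hydroxylated vs. not

    def forward(self, pssm_window: torch.Tensor) -> torch.Tensor:
        # pssm_window: (batch, window_length, pssm_dim)
        x = self.conv(pssm_window.transpose(1, 2))   # (batch, 64, window_length)
        out, _ = self.lstm(x.transpose(1, 2))        # (batch, window_length, 64)
        return torch.sigmoid(self.fc(out[:, -1]))    # probability per window


model = CnnLstmHydroxylation()
windows = torch.randn(16, 21, 20)  # hypothetical batch of 21-residue windows
probs = model(windows)
```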


Author(s):  
Antonios Alexos ◽  
Sotirios Chatzis

In this paper, we address the problem of understanding why a deep learning model decides that an individual is or is not eligible for a loan. We propose a novel approach for inferring which attributes matter most to the decision in each specific individual case. Specifically, we leverage concepts from neural attention to devise a novel feature-wise attention mechanism. As we show on real-world datasets, our approach offers unique insights into the importance of various features by producing a decision explanation for each specific loan case. At the same time, we observe that our novel mechanism generates decisions that are much closer to those made by human experts than the decisions of existing competitors.
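As an illustration of the general idea (the paper's exact mechanism may differ), the sketch below attaches an attention sub-network to a tabular classifier: it produces one weight per input feature, the weighted features drive the loan decision, and the weights themselves serve as the per-applicant explanation.

```python
# Generic feature-wise attention sketch for tabular loan data (illustrative,
# not the authors' published mechanism).
import torch
import torch.nn as nn


class FeatureAttentionClassifier(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.Tanh(),
            nn.Linear(hidden, n_features),
            nn.Softmax(dim=-1),          # one weight per feature, summing to 1
        )
        self.classifier = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor):
        weights = self.attention(x)              # (batch, n_features)
        logits = self.classifier(x * weights)    # attend, then classify
        return torch.sigmoid(logits).squeeze(-1), weights
```

Inspecting the returned weights for a single applicant gives the feature-level explanation of that decision referred to above.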


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Jian Xing ◽  
Miao Yu ◽  
Shupeng Wang ◽  
Yaru Zhang ◽  
Yu Ding

Several studies have shown that the phone number and the call behavior generated by a phone call reveal the type of call. By analyzing phone number rules and call behavior patterns, we can recognize fraudulent phone calls. The success of this recognition depends heavily on the particular set of features used to construct the classifier. Since these features are engineered by human labor, any change in telephone fraud tactics can render these carefully constructed features ineffective. In this paper, we show that we can automate the feature engineering process and thus automatically recognize fraudulent phone calls by applying our proposed novel approach based on deep learning. We design and construct a new classifier based on Call Detail Records (CDR) for fraudulent phone call recognition, and find that our deep learning-based approach outperforms competing methods. Experimental results demonstrate the effectiveness of the proposed approach: the obtained accuracy exceeds 99%, and the most performant deep learning model is, on average, 4.7% more accurate than the state-of-the-art recognition model. Furthermore, we show that our deep learning approach is very stable in real-world environments, and that the implicit features it learns automatically are far more resilient to changes in a fraudulent phone number and its call behavior over time. We conclude that the ability to automatically construct the most relevant phone number and call behavior features, and to recognize fraudulent calls accurately from them, makes our deep learning-based approach a precise, efficient, and robust technique for this task.
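The abstract does not spell out the architecture, but purely as an illustration of learning features from raw CDR fields instead of hand-engineering them, the sketch below embeds the dialed number as a character sequence with an LSTM and combines it with a few numeric call-behaviour statistics; all field names and dimensions are hypothetical.

```python
# Hypothetical sketch only: learn phone-number and call-behaviour features
# jointly rather than hand-crafting them. The digit sequence of the number
# is embedded and summarised by an LSTM; numeric behaviour statistics
# (e.g. call duration, calls per day) are concatenated before classification.
import torch
import torch.nn as nn


class CdrFraudClassifier(nn.Module):
    def __init__(self, n_symbols: int = 12, n_stats: int = 8, hidden: int = 32):
        super().__init__()
        self.digit_emb = nn.Embedding(n_symbols, 16, padding_idx=0)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + n_stats, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, digits: torch.Tensor, stats: torch.Tensor) -> torch.Tensor:
        # digits: (batch, max_len) integer-encoded phone number
        # stats:  (batch, n_stats) numeric call-behaviour features
        _, (h, _) = self.lstm(self.digit_emb(digits))
        x = torch.cat([h[-1], stats], dim=-1)
        return torch.sigmoid(self.head(x)).squeeze(-1)
```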


Author(s):  
M. Knott ◽  
R. Groenendijk

Abstract. This research is the first to apply MeshCNN – a deep learning model that is specifically designed for 3D triangular meshes – in the photogrammetry domain. We highlight the challenges that arise when applying a mesh-based deep learning model to a photogrammetric mesh, especially with respect to dataset properties, and we provide solutions for preparing a remotely sensed mesh for a machine learning task. The most notable pre-processing step proposed is a novel application of the Breadth-First Search algorithm for chunking a large mesh into computable pieces. Furthermore, this work extends MeshCNN such that photometric features based on the mesh texture are considered in addition to the geometric information. Experiments show that including color information improves the predictive performance of the model by a large margin. Moreover, experimental results indicate that segmentation performance could be advanced substantially with the introduction of a high-quality benchmark for semantic segmentation on meshes.
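A minimal sketch of the BFS chunking step under assumed inputs: given a face-adjacency map of the mesh, breadth-first search grows connected chunks of at most a fixed number of faces, and faces left in the frontier when a chunk fills up seed later chunks.

```python
# Illustrative BFS chunking of a mesh into computable pieces; the
# face-adjacency dictionary is an assumed input representation.
from collections import deque


def chunk_mesh(face_adjacency, chunk_size):
    """face_adjacency maps a face index to the indices of its neighbours."""
    visited = set()
    chunks = []
    for seed in face_adjacency:
        if seed in visited:
            continue
        chunk = []
        queue = deque([seed])
        while queue and len(chunk) < chunk_size:
            face = queue.popleft()
            if face in visited:
                continue
            visited.add(face)
            chunk.append(face)
            for neighbour in face_adjacency[face]:
                if neighbour not in visited:
                    queue.append(neighbour)
        chunks.append(chunk)
    return chunks


# Example: a toy strip of four faces split into chunks of at most two faces.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(chunk_mesh(adjacency, chunk_size=2))  # e.g. [[0, 1], [2, 3]]
```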


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Zhaorui Zuo ◽  
Penglei Wang ◽  
Xiaowei Chen ◽  
Li Tian ◽  
Hui Ge ◽  
...  

Abstract

Background: One of the major challenges in precision medicine is the accurate prediction of an individual patient's response to drugs. A great number of computational methods have been developed to predict compound activity from genomic profiles or chemical structures, but more exploration is needed to combine genetic mutation, gene expression, and cheminformatics in one machine learning model.

Results: We present a novel deep learning model that integrates gene expression, genetic mutation, and the chemical structure of compounds in a multi-task convolutional architecture. We applied our model to the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. We selected relevant cancer-related genes based on an oncology genetics database and the L1000 landmark genes, and used their expression and mutations as genomic features in model training. We obtained the cheminformatics features for compounds from PubChem or ChEMBL. We find that combining gene expression, genetic mutation, and cheminformatics features greatly enhances predictive performance.

Conclusion: We implemented an extended graph neural network for molecular graphs and a convolutional neural network for gene features. By employing multi-task learning and self-attention to capture the similarity between compounds, our model outperforms recently published methods on the same training and testing datasets.
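As a hedged sketch of the fusion described above (dimensions and the very simple one-step message passing are stand-ins for the paper's graph neural network and multi-task architecture), a compound embedding is combined with a 1D-CNN encoding of per-gene expression and mutation channels to predict drug response.

```python
# Illustrative fusion of a compound representation with genomic features
# for drug-response prediction; not the published architecture.
import torch
import torch.nn as nn


class DrugResponseModel(nn.Module):
    def __init__(self, atom_dim=32, hidden=128):
        super().__init__()
        self.atom_update = nn.Linear(atom_dim, hidden)       # GNN stand-in
        self.gene_cnn = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=7, padding=3),      # channels: expression, mutation
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Sequential(
            nn.Linear(hidden + 16, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, atom_feats, adjacency, gene_feats):
        # atom_feats: (n_atoms, atom_dim); adjacency: (n_atoms, n_atoms)
        # gene_feats: (batch, 2, n_genes) with expression and mutation rows.
        messages = adjacency @ atom_feats                     # aggregate neighbours
        mol = torch.relu(self.atom_update(messages)).mean(0)  # (hidden,)
        genes = self.gene_cnn(gene_feats).squeeze(-1)         # (batch, 16)
        mol = mol.unsqueeze(0).expand(genes.shape[0], -1)
        return self.head(torch.cat([mol, genes], dim=-1)).squeeze(-1)
```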


Author(s):  
David J. Alouani ◽  
Roshani R.P. Rajapaksha ◽  
Mehul Jani ◽  
Daniel D. Rhoads ◽  
Navid Sadri

Real-time polymerase chain reaction (RT-PCR) is widely used to diagnose human pathogens. RT-PCR data are traditionally analyzed by estimating the threshold cycle (CT) at which the fluorescence signal produced by emission of a probe crosses a baseline level. Current models used to estimate the CT value are based on approximations that do not adequately account for the stochastic variation of the fluorescence signal detected during RT-PCR. Less common deviations become more apparent as the sample size increases, as is the case in the current SARS-CoV-2 pandemic. In this work we employ a method independent of the CT value to interpret RT-PCR data. In this novel approach we built and trained a deep learning model, qPCRdeepNet, to analyze the fluorescent readings obtained during RT-PCR. We describe how this model can be deployed as a quality assurance tool to monitor results interpretation in real time. The model's performance with the TaqPath COVID-19 Combo Kit, widely used for SARS-CoV-2 detection, is described. This model can be applied broadly for the primary interpretation of RT-PCR assays and could potentially replace the CT interpretive paradigm.
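A generic illustration of the curve-level idea, not the published qPCRdeepNet architecture: a small 1D CNN classifies a raw per-cycle fluorescence trace as positive or negative, bypassing the CT threshold entirely; the 40-cycle length is an assumption.

```python
# Illustrative classifier over raw amplification curves instead of CT values.
import torch
import torch.nn as nn

curve_classifier = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=5, padding=2),    # input: (batch, 1, n_cycles)
    nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(32, 1),                              # positive vs. negative logit
)

fluorescence = torch.randn(8, 1, 40)               # hypothetical 40-cycle traces
probabilities = torch.sigmoid(curve_classifier(fluorescence)).squeeze(-1)
```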


2021 ◽  
Author(s):  
Kristopher D McCombe ◽  
Stephanie G Craig ◽  
Amélie Viratham Pulsawatdi ◽  
Javier I Quezada-Marín ◽  
Matthew Hagan ◽  
...  

The growth of digital pathology over the past decade has opened new research pathways and insights in cancer prediction and prognosis. In particular, there has been a surge in deep learning and computer vision techniques for analysing digital images. Common practice in this area is to use image pre-processing and augmentation to prevent bias and overfitting, creating a more robust deep learning model. Herein we introduce HistoClean, a user-friendly graphical user interface that brings together multiple image processing modules into one easy-to-use toolkit. In this study, we use HistoClean to pre-process images for a simple convolutional neural network used to detect stromal maturity, improving the accuracy of the model at the tile, region-of-interest, and patient levels. HistoClean is free and open source and can be downloaded from the GitHub repository: https://github.com/HistoCleanQUB/HistoClean.
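For readers unfamiliar with the kind of pre-processing and augmentation referred to above, the snippet below shows a typical tile-level augmentation pipeline using torchvision; it illustrates the general practice rather than HistoClean's own modules.

```python
# Typical histology-tile augmentation pipeline (illustrative; not taken from
# HistoClean itself): random flips, rotations, and mild colour jitter reduce
# overfitting to staining and orientation artefacts.
from torchvision import transforms

tile_augmentation = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=90),
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.02),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
# Usage: augmented_tensor = tile_augmentation(pil_image_tile)
```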

