Genome annotation across species using deep convolutional neural networks

2020 ◽  
Vol 6 ◽  
pp. e278 ◽  
Author(s):  
Ghazaleh Khodabandelou ◽  
Etienne Routhier ◽  
Julien Mozziconacci

The application of deep neural networks is a rapidly expanding field that now reaches many disciplines, including genomics. In particular, convolutional neural networks have been exploited to identify the functional role of short genomic sequences. These approaches rely on gathering large sets of sequences with a known functional role, extracted from whole-genome annotations. These sets are then split into learning, test and validation sets in order to train the networks. While the resulting networks perform well on validation sets, they often perform poorly when applied to whole genomes, in which the ratio of positive to negative examples can be very different from that in the training set. Here we address this issue by assessing the genome-wide performance of networks trained with sets exhibiting different ratios of positive to negative examples. As a case study, we use sequences encompassing gene starts from the RefGene database as positive examples and random genomic sequences as negative examples. We then demonstrate that models trained using data from one organism can be used to predict gene-start sites in a related species, provided the training sets give good genome-wide performance. This cross-species application of convolutional neural networks provides a new way to annotate any genome from existing high-quality annotations in a related reference species. It also provides a way to determine whether the sequence motifs recognised by chromatin-associated proteins in different species are conserved or not.
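The effect of class ratio described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the true-positive and false-positive rates, and the genome coordinates are hypothetical values chosen for illustration. The first function draws random negative windows at a chosen ratio; the second shows how the expected precision of a fixed classifier falls as negatives become more frequent.

```python
import random


def build_training_set(positive_starts, genome_length, window, neg_per_pos, seed=0):
    """Return shuffled (start, label) pairs with neg_per_pos negatives per positive.

    Negatives are window starts drawn uniformly at random from the genome.
    Assumption: no filtering is done, so a random window may overlap a positive.
    """
    rng = random.Random(seed)
    examples = [(start, 1) for start in positive_starts]
    for _ in range(neg_per_pos * len(positive_starts)):
        examples.append((rng.randrange(0, genome_length - window), 0))
    rng.shuffle(examples)
    return examples


def precision_at_ratio(tpr, fpr, neg_per_pos):
    """Expected precision when each positive comes with neg_per_pos negatives.

    tpr: fraction of positives correctly detected.
    fpr: fraction of negatives wrongly called positive.
    """
    true_pos = tpr
    false_pos = fpr * neg_per_pos
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical classifier: 90% sensitivity, 1% false-positive rate.
    for ratio in (1, 10, 100, 1000):
        print(ratio, precision_at_ratio(0.9, 0.01, ratio))
```

With these assumed rates, precision is 0.9 at 10 negatives per positive but drops to roughly 0.47 at 100 negatives per positive, which is why a network that looks accurate on a balanced validation set can perform poorly genome-wide.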

2018 ◽  
Author(s):  
Ghazaleh Khodabandelou ◽  
Etienne Routhier ◽  
Julien Mozziconacci

The application of deep neural networks is today a fast-growing field in many disciplines. In genomics, the development of deep neural networks is expected to revolutionize current practice. Several approaches relying on convolutional neural networks have been developed to associate short genomic sequences with a functional role, such as promoters, enhancers or protein binding sites along genomes. These approaches rely on generating batches of sequences with known annotations for training. While they perform well when predicting annotations on a test subset of these batches, they usually perform poorly when applied genome-wide. In this study, we address this issue and propose an optimal strategy to train convolutional neural networks for this specific application. We use transcription start sites as a case study and show that a model trained on one organism can be used to predict transcription start sites in a different species. This cross-species application of convolutional neural networks trained with genomic sequence data provides a new technique to annotate any genome from previously existing annotations in related species. It also provides a way to determine whether the sequence patterns recognized by chromatin-associated proteins in different species are conserved or not.


2019 ◽  
Author(s):  
Amelia J. Solon ◽  
Vernon J. Lawhern ◽  
Jonathan Touryan ◽  
Jonathan R. McDaniel ◽  
Anthony J. Ries ◽  
...  

Deep convolutional neural networks (CNNs) have previously been shown to be useful tools for signal decoding and analysis in a variety of complex domains, such as image processing and speech recognition. By learning from large amounts of data, the representations encoded by these deep networks are often invariant to moderate changes in the underlying feature spaces. Recently, we proposed a CNN architecture that could be applied to electroencephalogram (EEG) decoding and analysis. In this article, we train our CNN model using data from prior experiments in order to later decode the P300 evoked response from an unseen, hold-out experiment. We analyze the CNN output as a function of the underlying variability in the P300 response and demonstrate that the CNN output is sensitive to the experiment-induced changes in the neural response. We then assess the utility of our approach as a means of improving the overall signal-to-noise ratio in the EEG record. Finally, we show an example of how CNN-based decoding can be applied to the analysis of complex data.


2020 ◽  
Author(s):  
Zijun Zhang ◽  
Christopher Y. Park ◽  
Chandra L. Theesfeld ◽  
Olga G. Troyanskaya

Convolutional neural networks (CNNs) have become a standard for the analysis of biological sequences. Tuning network architectures is essential for CNN performance, yet it requires substantial knowledge of machine learning and a commitment of time and effort. This process thus imposes a major barrier to the broad and effective application of modern deep learning in genomics. Here, we present AMBER, a fully automated framework to efficiently design and apply CNNs for genomic sequences. AMBER designs optimal models for user-specified biological questions through state-of-the-art Neural Architecture Search (NAS). We applied AMBER to the task of modelling genomic regulatory features and demonstrated that the predictions of the AMBER-designed model are significantly more accurate than those of equivalent baseline non-NAS models and match or even exceed those of published expert-designed models. Interpretation of the AMBER architecture search revealed its design principle of utilizing the full space of computational operations to accurately model genomic sequences. Furthermore, we illustrated the use of AMBER to accurately discover functional genomic variants in allele-specific binding and disease heritability enrichment. AMBER provides an efficient automated method for designing accurate deep learning models in genomics.


2020 ◽  
Vol 2020 (10) ◽  
pp. 28-1-28-7 ◽  
Author(s):  
Kazuki Endo ◽  
Masayuki Tanaka ◽  
Masatoshi Okutomi

Classification of degraded images is very important in practice because images are usually degraded by compression, noise, blurring, etc. Nevertheless, most research in image classification focuses only on clean images without any degradation. Some papers have already proposed deep convolutional neural networks composed of an image restoration network and a classification network to classify degraded images. This paper proposes an alternative approach in which we use a degraded image and an additional degradation parameter for classification. The proposed classification network has two inputs: the degraded image and the degradation parameter. A network that estimates the degradation parameter is also incorporated for cases where the degradation parameters of the degraded images are unknown. The experimental results show that the proposed method outperforms a straightforward approach in which the classification network is trained with degraded images only.


2019 ◽  
Vol 277 ◽  
pp. 02024 ◽  
Author(s):  
Lincan Li ◽  
Tong Jia ◽  
Tianqi Meng ◽  
Yizhe Liu

In this paper, an accurate two-stage deep learning method is proposed to detect vulnerable plaques in cardiovascular ultrasonic images. First, a Fully Convolutional Neural Network (FCN) named U-Net is used to segment the original Intravascular Optical Coherence Tomography (IVOCT) cardiovascular images. We experiment with different threshold values to find the best threshold for removing noise and background in the original images. Second, a modified Faster R-CNN is adopted for precise detection. The modified Faster R-CNN utilizes six-scale anchors (12², 16², 32², 64², 128², 256²) instead of the conventional one-scale or three-scale approaches. We first present three problems in cardiovascular vulnerable plaque diagnosis, then demonstrate how our method solves these problems. The proposed method applies deep convolutional neural networks to the whole diagnostic procedure. Test results show that the Recall rate, Precision rate, IoU (Intersection-over-Union) rate and Total score are 0.94, 0.885, 0.913 and 0.913 respectively, higher than those of the first-place team in the CCCV2017 Cardiovascular OCT Vulnerable Plaque Detection Challenge. The AP of the designed Faster R-CNN is 83.4%, higher than that of conventional approaches which use one-scale or three-scale anchors. These results demonstrate the superior performance of our proposed method and the power of deep learning approaches in diagnosing cardiovascular vulnerable plaques.
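The multi-scale anchors and the IoU metric mentioned in this abstract can be sketched in Python as follows. This is an illustration only, not the authors' implementation: it assumes the six anchor scales are square boxes with side lengths of 12, 16, 32, 64, 128 and 256 pixels, that boxes are given as (x1, y1, x2, y2) corner coordinates, and the function names are hypothetical.

```python
def square_anchors(cx, cy, sides=(12, 16, 32, 64, 128, 256)):
    """Return one square anchor box per side length, centred on (cx, cy).

    Assumption: a single 1:1 aspect ratio; real detectors usually add more.
    """
    boxes = []
    for side in sides:
        half = side / 2.0
        boxes.append((cx - half, cy - half, cx + half, cy + half))
    return boxes


def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap extent along each axis, clipped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical 60x60 ground-truth box centred at (100, 100).
    truth = (70.0, 70.0, 130.0, 130.0)
    for anchor in square_anchors(100.0, 100.0):
        print(anchor, iou(anchor, truth))
```

Having more anchor scales means that some anchor is likely to overlap strongly with a target of almost any size. In this example the 64-pixel anchor matches the 60-pixel target best.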

