Genome Functional Annotation across Species using Deep Convolutional Neural Networks

ABSTRACT: The application of deep neural networks is today a rapidly growing field in many disciplines. In genomics, the development of deep neural networks is expected to revolutionize current practice. Several approaches relying on convolutional neural networks have been developed to associate short genomic sequences with a functional role, such as promoters, enhancers or protein binding sites along genomes. These approaches rely on the generation of batches of sequences with known annotations for training purposes. While they perform well when predicting annotations from a test subset of these batches, they usually perform poorly when applied genome-wide. In this study, we address this issue and propose an optimal strategy to train convolutional neural networks for this specific application. We use transcription start sites as a case study and show that a model trained on one organism can be used to predict transcription start sites in a different species. This cross-species application of convolutional neural networks trained with genomic sequence data provides a new technique to annotate any genome from previously existing annotations in related species. It also provides a way to determine whether the sequence patterns recognized by chromatin-associated proteins in different species are conserved or not.


Genome annotation across species using deep convolutional neural networks

PeerJ Computer Science ◽

10.7717/peerj-cs.278 ◽

2020 ◽

Vol 6 ◽

pp. e278 ◽

Cited By ~ 2

Author(s):

Ghazaleh Khodabandelou ◽

Etienne Routhier ◽

Julien Mozziconacci

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Functional Role ◽

Genomic Sequences ◽

Sequence Motifs ◽

Deep Convolutional Neural Networks ◽

Genome Wide ◽

Associated Proteins ◽

Using Data ◽

Genome Annotations

The application of deep neural networks is a rapidly expanding field now reaching many disciplines, including genomics. In particular, convolutional neural networks have been exploited for identifying the functional role of short genomic sequences. These approaches rely on gathering large sets of sequences with known functional roles, extracted from whole-genome annotations. These sets are then split into learning, test and validation sets in order to train the networks. While the resulting networks perform well on validation sets, they often perform poorly when applied to whole genomes, in which the ratio of positive to negative examples can be very different from that in the training set. Here we address this issue by assessing the genome-wide performance of networks trained with sets exhibiting different ratios of positive to negative examples. As a case study, we use sequences encompassing gene starts from the RefGene database as positive examples and random genomic sequences as negative examples. We then demonstrate that models trained using data from one organism can be used to predict gene-start sites in a related species, provided the training sets used give good genome-wide performance. This cross-species application of convolutional neural networks provides a new way to annotate any genome from existing high-quality annotations in a related reference species. It also provides a way to determine whether the sequence motifs recognised by chromatin-associated proteins in different species are conserved or not.
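The gap between validation and genome-wide performance described above follows from simple arithmetic on class ratios. The sketch below is illustrative only: the function name and the sensitivity, specificity and class-ratio values are hypothetical and are not taken from the paper.

```python
def genome_wide_precision(sensitivity: float,
                          specificity: float,
                          positive_fraction: float) -> float:
    """Expected precision when a classifier with fixed sensitivity and
    specificity is applied to data with the given fraction of positives."""
    true_pos = sensitivity * positive_fraction
    false_pos = (1.0 - specificity) * (1.0 - positive_fraction)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 95% sensitivity and 95% specificity.
balanced = genome_wide_precision(0.95, 0.95, 0.5)       # 1:1 validation set
genome_wide = genome_wide_precision(0.95, 0.95, 0.001)  # 1 positive per 1000 windows

print(f"precision on balanced set: {balanced:.3f}")
print(f"precision genome-wide:     {genome_wide:.3f}")
```

With these assumed numbers, the same classifier goes from a precision of 0.95 on a balanced set to roughly 0.02 when positives make up one window in a thousand, which is why the ratio of positive to negative training examples matters for genome-wide use.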


Faculty Opinions recommendation of Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.737976542.793577284 ◽

2020 ◽

Author(s):

Erich Bornberg-Bauer ◽

Daniel Dowling

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Genomic Sequence ◽

mRNA Abundance ◽

Deep Convolutional Neural Networks


Faculty Opinions recommendation of Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.737976542.793586931 ◽

2021 ◽

Author(s):

Roderic Guigo ◽

Manuel Muñoz Aguirre

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Genomic Sequence ◽

mRNA Abundance ◽

Deep Convolutional Neural Networks


Deep convolutional neural networks for predicting leukemia-related transcription factor binding sites from DNA sequence data

Chemometrics and Intelligent Laboratory Systems ◽

10.1016/j.chemolab.2020.103976 ◽

2020 ◽

Vol 199 ◽

pp. 103976 ◽

Cited By ~ 1

Author(s):

Jian He ◽

Xuemei Pu ◽

Menglong Li ◽

Chuan Li ◽

Yanzhi Guo

Keyword(s):

Neural Networks ◽

Transcription Factor ◽

DNA Sequence ◽

Convolutional Neural Networks ◽

Binding Sites ◽

Sequence Data ◽

Transcription Factor Binding Sites ◽

Deep Convolutional Neural Networks ◽

DNA Sequence Data ◽

Related Transcription Factor


Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks

Cell Reports ◽

10.1016/j.celrep.2020.107663 ◽

2020 ◽

Vol 31 (7) ◽

pp. 107663 ◽

Cited By ~ 14

Author(s):

Vikram Agarwal ◽

Jay Shendure

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Genomic Sequence ◽

mRNA Abundance ◽

Deep Convolutional Neural Networks


Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks

10.1101/416685 ◽

2018 ◽

Cited By ~ 4

Author(s):

Vikram Agarwal ◽

Jay Shendure

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Transcriptional Activity ◽

Genomic Sequence ◽

Specific Gene ◽

Grand Challenge ◽

mRNA Levels ◽

Primary Sequence ◽

Deep Convolutional Neural Networks ◽

CpG Dinucleotides

SUMMARY: Algorithms that accurately predict gene structure from primary sequence alone were transformative for annotating the human genome. Can we also predict the expression levels of genes based solely on genome sequence? Here we sought to apply deep convolutional neural networks towards this goal. Surprisingly, a model that includes only promoter sequences and features associated with mRNA stability explains 59% and 71% of variation in steady-state mRNA levels in human and mouse, respectively. This model, which we call Xpresso, more than doubles the accuracy of alternative sequence-based models, and isolates rules as predictive as models relying on ChIP-seq data. Xpresso recapitulates genome-wide patterns of transcriptional activity and predicts the influence of enhancers, heterochromatic domains, and microRNAs. Model interpretation reveals that promoter-proximal CpG dinucleotides strongly predict transcriptional activity. Looking forward, we propose the accurate prediction of cell type-specific gene expression based solely on primary sequence as a grand challenge for the field.


Detecting adaptive introgression in human evolution using convolutional neural networks

10.1101/2020.09.18.301069 ◽

2020 ◽

Cited By ~ 2

Author(s):

Graham Gower ◽

Pablo Iáñez Picazo ◽

Matteo Fumagalli ◽

Fernando Racimo

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Genomic Sequence ◽

Sequence Data ◽

Simulated Data ◽

Alternative Methods ◽

Adaptive Introgression ◽

Donor Population ◽

Human Genomic ◽

Related Population

Abstract: Studies in a variety of species have shown evidence for positively selected variants introduced into one population via introgression from another, distantly related population - a process known as adaptive introgression. However, there are few explicit frameworks for jointly modelling introgression and positive selection, in order to detect these variants using genomic sequence data. Here, we develop an approach based on convolutional neural networks (CNNs). CNNs do not require the specification of an analytical model of allele frequency dynamics, and have outperformed alternative methods for classification and parameter estimation tasks in various areas of population genetics. Thus, they are potentially well suited to the identification of adaptive introgression. Using simulations, we trained CNNs on genotype matrices derived from genomes sampled from the donor population, the recipient population and a related non-introgressed population, in order to distinguish regions of the genome evolving under adaptive introgression from those evolving neutrally or experiencing selective sweeps. Our CNN architecture exhibits 95% accuracy on simulated data, even when the genomes are unphased, and accuracy decreases only moderately in the presence of heterosis. As a proof of concept, we applied our trained CNNs to human genomic datasets - both phased and unphased - to detect candidates for adaptive introgression that shaped our evolutionary history.


Detecting adaptive introgression in human evolution using convolutional neural networks

eLife ◽

10.7554/elife.64669 ◽

2021 ◽

Vol 10 ◽

Author(s):

Graham Gower ◽

Pablo Iáñez Picazo ◽

Matteo Fumagalli ◽

Fernando Racimo

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Genomic Sequence ◽

Sequence Data ◽

Simulated Data ◽

Alternative Methods ◽

Adaptive Introgression ◽

Donor Population ◽

Human Genomic ◽

Related Population

Studies in a variety of species have shown evidence for positively selected variants introduced into a population via introgression from another, distantly related population - a process known as adaptive introgression. However, there are few explicit frameworks for jointly modelling introgression and positive selection, in order to detect these variants using genomic sequence data. Here, we develop an approach based on convolutional neural networks (CNNs). CNNs do not require the specification of an analytical model of allele frequency dynamics, and have outperformed alternative methods for classification and parameter estimation tasks in various areas of population genetics. Thus, they are potentially well suited to the identification of adaptive introgression. Using simulations, we trained CNNs on genotype matrices derived from genomes sampled from the donor population, the recipient population and a related non-introgressed population, in order to distinguish regions of the genome evolving under adaptive introgression from those evolving neutrally or experiencing selective sweeps. Our CNN architecture exhibits 95% accuracy on simulated data, even when the genomes are unphased, and accuracy decreases only moderately in the presence of heterosis. As a proof of concept, we applied our trained CNNs to human genomic datasets - both phased and unphased - to detect candidates for adaptive introgression that shaped our evolutionary history.
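As a rough illustration of what a genotype-matrix input can look like, the sketch below collapses phased 0/1 haplotype rows into unphased allele counts and stacks three populations into one matrix. It is an assumption-laden example, not the authors' preprocessing: the function names, the row layout (two consecutive haplotype rows per diploid individual) and the simple row-wise stacking are all hypothetical.

```python
import numpy as np


def unphased_genotypes(haplotypes: np.ndarray) -> np.ndarray:
    """Collapse a (2 * n_individuals, n_sites) 0/1 haplotype matrix into an
    (n_individuals, n_sites) matrix of allele counts (0, 1 or 2).

    Assumes rows 2*i and 2*i + 1 belong to the same diploid individual.
    """
    if haplotypes.shape[0] % 2 != 0:
        raise ValueError("expected two haplotype rows per diploid individual")
    n_individuals = haplotypes.shape[0] // 2
    # Group the two haplotypes of each individual and sum them per site.
    return haplotypes.reshape(n_individuals, 2, -1).sum(axis=1)


def stack_populations(donor: np.ndarray,
                      recipient: np.ndarray,
                      outgroup: np.ndarray) -> np.ndarray:
    """Stack per-population matrices row-wise over a shared set of sites."""
    return np.concatenate([donor, recipient, outgroup], axis=0)


# Toy data: two diploid individuals, three sites.
toy_haplotypes = np.array([[0, 1, 1],
                           [1, 1, 0],
                           [0, 0, 0],
                           [0, 1, 0]])
toy_genotypes = unphased_genotypes(toy_haplotypes)
toy_input = stack_populations(toy_genotypes, toy_genotypes, toy_genotypes)
```

Summing the two haplotypes discards phase information, which is the sense in which an unphased matrix carries less information than a phased one.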


AUTOMATIC DIAGNOSIS OF BREAST CANCER IN HISTOLOGY IMAGES USING DEEP CONVOLUTIONAL NEURAL NETWORKS

KỶ YẾU HỘI NGHỊ KHOA HỌC CÔNG NGHỆ QUỐC GIA LẦN THỨ XI NGHIÊN CỨU CƠ BẢN VÀ ỨNG DỤNG CÔNG NGHỆ THÔNG TIN ◽

10.15625/vap.2018.0009 ◽

2018 ◽

Author(s):

Hung Le Minh ◽

Manh Mai Van ◽

Toan Tran Dinh ◽

Tot Tran Dac ◽

Tran Van Lang

Keyword(s):

Breast Cancer ◽

Neural Networks ◽

Convolutional Neural Networks ◽

Deep Convolutional Neural Networks ◽

Automatic Diagnosis


CNN-based Classification of Degraded Images

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.10.ipas-028 ◽

2020 ◽

Vol 2020 (10) ◽

pp. 28-1-28-7 ◽

Cited By ~ 1

Author(s):

Kazuki Endo ◽

Masayuki Tanaka ◽

Masatoshi Okutomi

Keyword(s):

Neural Networks ◽

Image Restoration ◽

Image Classification ◽

Convolutional Neural Networks ◽

Deep Convolutional Neural Networks ◽

Alternative Approach ◽

Degraded Image ◽

Degraded Images ◽

Straightforward Approach

Classification of degraded images is very important in practice because images are usually degraded by compression, noise, blurring, etc. Nevertheless, most of the research in image classification only focuses on clean images without any degradation. Some papers have already proposed deep convolutional neural networks composed of an image restoration network and a classification network to classify degraded images. This paper proposes an alternative approach in which we use a degraded image and an additional degradation parameter for classification. The proposed classification network has two inputs which are the degraded image and the degradation parameter. The estimation network of degradation parameters is also incorporated if degradation parameters of degraded images are unknown. The experimental results showed that the proposed method outperforms a straightforward approach where the classification network is trained with degraded images only.
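A classifier that takes both a degraded image and a degradation parameter can be sketched as two input branches whose features are merged before the output layer. The example below is a minimal illustration under stated assumptions: it uses dense layers rather than convolutions, merges the branches by concatenation, and all names, shapes and random weights are hypothetical. The abstract does not specify how the paper combines the two inputs.

```python
import numpy as np

rng = np.random.default_rng(0)


def two_input_logits(images: np.ndarray,
                     degradation: np.ndarray,
                     w_img: np.ndarray,
                     w_deg: np.ndarray,
                     w_out: np.ndarray) -> np.ndarray:
    """Forward pass of a toy two-input classifier.

    images:      (batch, height, width) degraded images
    degradation: (batch, 1) degradation parameter, e.g. a noise level
    """
    # Image branch: flatten, linear layer, ReLU.
    img_feat = np.maximum(images.reshape(images.shape[0], -1) @ w_img, 0.0)
    # Degradation branch: linear layer, ReLU.
    deg_feat = np.maximum(degradation @ w_deg, 0.0)
    # Merge the two branches by concatenation (an assumption of this sketch).
    fused = np.concatenate([img_feat, deg_feat], axis=1)
    return fused @ w_out


batch, height, width, n_classes = 4, 8, 8, 10
images = rng.random((batch, height, width))
noise_level = rng.random((batch, 1))  # hypothetical degradation parameter

w_img = rng.normal(size=(height * width, 16))
w_deg = rng.normal(size=(1, 4))
w_out = rng.normal(size=(16 + 4, n_classes))

logits = two_input_logits(images, noise_level, w_img, w_deg, w_out)
predicted_class = logits.argmax(axis=1)
```

When the degradation parameter is unknown, the `noise_level` array would come from a separate estimation network, as the abstract describes, rather than being supplied directly.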
