scholarly journals Genome Functional Annotation across Species using Deep Convolutional Neural Networks

2018 ◽  
Author(s):  
Ghazaleh Khodabandelou ◽  
Etienne Routhier ◽  
Julien Mozziconacci

ABSTRACTDeep neural network application is today a skyrocketing field in many disciplinary domains. In genomics the development of deep neural networks is expected to revolutionize current practice. Several approaches relying on convolutional neural networks have been developed to associate short genomic sequences with a functional role such as promoters, enhancers or protein binding sites along genomes. These approaches rely on the generation of sequences batches with known annotations for learning purpose. While they show good performance to predict annotations from a test subset of these batches, they usually perform poorly when applied genome-wide.In this study, we address this issue and propose an optimal strategy to train convolutional neural networks for this specific application. We use as a case study transcription start sites and show that a model trained on one organism can be used to predict transcription start sites in a different specie. This cross-species application of convolutional neural networks trained with genomic sequence data provides a new technique to annotate any genome from previously existing annotations in related species. It also provides a way to determine whether the sequence patterns recognized by chromatin associated proteins in different species are conserved or not.

2020 ◽  
Vol 6 ◽  
pp. e278 ◽  
Author(s):  
Ghazaleh Khodabandelou ◽  
Etienne Routhier ◽  
Julien Mozziconacci

Application of deep neural network is a rapidly expanding field now reaching many disciplines including genomics. In particular, convolutional neural networks have been exploited for identifying the functional role of short genomic sequences. These approaches rely on gathering large sets of sequences with known functional role, extracting those sequences from whole-genome-annotations. These sets are then split into learning, test and validation sets in order to train the networks. While the obtained networks perform well on validation sets, they often perform poorly when applied on whole genomes in which the ratio of positive over negative examples can be very different than in the training set. We here address this issue by assessing the genome-wide performance of networks trained with sets exhibiting different ratios of positive to negative examples. As a case study, we use sequences encompassing gene starts from the RefGene database as positive examples and random genomic sequences as negative examples. We then demonstrate that models trained using data from one organism can be used to predict gene-start sites in a related species, when using training sets providing good genome-wide performance. This cross-species application of convolutional neural networks provides a new way to annotate any genome from existing high-quality annotations in a related reference species. It also provides a way to determine whether the sequence motifs recognised by chromatin-associated proteins in different species are conserved or not.


2018 ◽  
Author(s):  
Vikram Agarwal ◽  
Jay Shendure

SUMMARYAlgorithms that accurately predict gene structure from primary sequence alone were transformative for annotating the human genome. Can we also predict the expression levels of genes based solely on genome sequence? Here we sought to apply deep convolutional neural networks towards this goal. Surprisingly, a model that includes only promoter sequences and features associated with mRNA stability explains 59% and 71% of variation in steady-state mRNA levels in human and mouse, respectively. This model, which we call Xpresso, more than doubles the accuracy of alternative sequence-based models, and isolates rules as predictive as models relying on ChIP-seq data. Xpresso recapitulates genome-wide patterns of transcriptional activity and predicts the influence of enhancers, heterochromatic domains, and microRNAs. Model interpretation reveals that promoter-proximal CpG dinucleotides strongly predict transcriptional activity. Looking forward, we propose the accurate prediction of cell type-specific gene expression based solely on primary sequence as a grand challenge for the field.


Author(s):  
Graham Gower ◽  
Pablo Iáñez Picazo ◽  
Matteo Fumagalli ◽  
Fernando Racimo

AbstractStudies in a variety of species have shown evidence for positively selected variants introduced into one population via introgression from another, distantly related population—a process known as adaptive introgression. However, there are few explicit frameworks for jointly modelling introgression and positive selection, in order to detect these variants using genomic sequence data. Here, we develop an approach based on convolutional neural networks (CNNs). CNNs do not require the specification of an analytical model of allele frequency dynamics, and have outperformed alternative methods for classification and parameter estimation tasks in various areas of population genetics. Thus, they are potentially well suited to the identification of adaptive introgression. Using simulations, we trained CNNs on genotype matrices derived from genomes sampled from the donor population, the recipient population and a related non-introgressed population, in order to distinguish regions of the genome evolving under adaptive introgression from those evolving neutrally or experiencing selective sweeps. Our CNN architecture exhibits 95% accuracy on simulated data, even when the genomes are unphased, and accuracy decreases only moderately in the presence of heterosis. As a proof of concept, we applied our trained CNNs to human genomic datasets—both phased and unphased—to detect candidates for adaptive introgression that shaped our evolutionary history.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Graham Gower ◽  
Pablo Iáñez Picazo ◽  
Matteo Fumagalli ◽  
Fernando Racimo

Studies in a variety of species have shown evidence for positively selected variants introduced into a population via introgression from another, distantly related population - a process known as adaptive introgression. However, there are few explicit frameworks for jointly modelling introgression and positive selection, in order to detect these variants using genomic sequence data. Here, we develop an approach based on convolutional neural networks (CNNs). CNNs do not require the specification of an analytical model of allele frequency dynamics, and have outperformed alternative methods for classification and parameter estimation tasks in various areas of population genetics. Thus, they are potentially well suited to the identification of adaptive introgression. Using simulations, we trained CNNs on genotype matrices derived from genomes sampled from the donor population, the recipient population and a related non-introgressed population, in order to distinguish regions of the genome evolving under adaptive introgression from those evolving neutrally or experiencing selective sweeps. Our CNN architecture exhibits 95% accuracy on simulated data, even when the genomes are unphased, and accuracy decreases only moderately in the presence of heterosis. As a proof of concept, we applied our trained CNNs to human genomic datasets - both phased and unphased - to detect candidates for adaptive introgression that shaped our evolutionary history.


2020 ◽  
Vol 2020 (10) ◽  
pp. 28-1-28-7 ◽  
Author(s):  
Kazuki Endo ◽  
Masayuki Tanaka ◽  
Masatoshi Okutomi

Classification of degraded images is very important in practice because images are usually degraded by compression, noise, blurring, etc. Nevertheless, most of the research in image classification only focuses on clean images without any degradation. Some papers have already proposed deep convolutional neural networks composed of an image restoration network and a classification network to classify degraded images. This paper proposes an alternative approach in which we use a degraded image and an additional degradation parameter for classification. The proposed classification network has two inputs which are the degraded image and the degradation parameter. The estimation network of degradation parameters is also incorporated if degradation parameters of degraded images are unknown. The experimental results showed that the proposed method outperforms a straightforward approach where the classification network is trained with degraded images only.


Sign in / Sign up

Export Citation Format

Share Document