Projection layers improve deep learning models of regulatory DNA function

F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 151 ◽  
Author(s):  
Alex Hawkins-Hooker ◽  
Henry Kenlay ◽  
John E. Reid

With the increasing application of deep learning methods to the modelling of regulatory DNA sequences has come an interest in exploring what types of architecture are best suited to the domain. Networks designed to predict many functional characteristics of noncoding DNA in a multitask framework have to recognise a large number of motifs and as a result benefit from large numbers of convolutional filters in the first layer. The use of large first layers in turn motivates an exploration of strategies for addressing the resulting sparsity of output and risk of overfitting. To this end we propose the use of a dimensionality-reducing linear projection layer after the initial motif-recognising convolutions. In experiments with a reduced version of the DeepSEA dataset we find that inserting this layer in combination with dropout into convolutional and convolutional-recurrent architectures can improve predictive performance across a range of first-layer sizes. We further validate our approach by incorporating the projection layer into a new convolutional-recurrent architecture which achieves state-of-the-art performance on the full DeepSEA dataset. Analysis of the learned projection weights shows that the inclusion of this layer simplifies the network’s internal representation of the occurrence of motifs, notably by projecting features representing forward and reverse-complement motifs to similar positions in the lower-dimensional feature space output by the layer.
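The core idea, a linear projection applied at every sequence position after the first convolutional layer, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the filter count, filter width, projection size, dropout rate, dropout placement, and all function names below are assumptions chosen for illustration, and the weights are random rather than learned.

```python
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def conv1d_relu(x, filters):
    """Valid 1D convolution followed by ReLU.

    x: positions x 4 channels; filters: list of (width x 4) weight matrices.
    Returns (positions - width + 1) rows of len(filters) activations.
    """
    width = len(filters[0])
    out = []
    for start in range(len(x) - width + 1):
        row = []
        for f in filters:
            s = sum(f[k][c] * x[start + k][c]
                    for k in range(width) for c in range(4))
            row.append(max(0.0, s))
        out.append(row)
    return out

def dropout(h, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`."""
    keep = 1.0 - rate
    return [[a / keep if rng.random() < keep else 0.0 for a in row]
            for row in h]

def project(h, weights):
    """Position-wise linear projection with no nonlinearity.

    weights: n_proj vectors, each of length n_filters.
    """
    return [[sum(w * a for w, a in zip(vec, row)) for vec in weights]
            for row in h]

# Hypothetical sizes: many first-layer filters reduced to a few features.
rng = random.Random(0)
n_filters, width, n_proj = 32, 8, 4
filters = [[[rng.gauss(0, 0.1) for _ in range(4)] for _ in range(width)]
           for _ in range(n_filters)]
proj_w = [[rng.gauss(0, 0.1) for _ in range(n_filters)]
          for _ in range(n_proj)]

x = one_hot("ACGTTGCAACGTAGCTAGCT")         # 20 positions x 4 channels
h = conv1d_relu(x, filters)                 # 13 positions x 32 filters
z = project(dropout(h, 0.2, rng), proj_w)   # 13 positions x 4 features
print(len(h), len(h[0]), len(z), len(z[0]))
```

Because the projection has no nonlinearity, it is equivalent to a convolution of width 1, and two filters that respond to a motif and its reverse complement can be mapped to nearby points in the output space simply by receiving similar projection weights.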


PLoS ONE ◽  
2019 ◽  
Vol 14 (6) ◽  
pp. e0218073 ◽  
Author(s):  
Rajiv Movva ◽  
Peyton Greenside ◽  
Georgi K. Marinov ◽  
Surag Nair ◽  
Avanti Shrikumar ◽  
...  

1991 ◽  
Vol 96 (2) ◽  
pp. 162-167 ◽  
Author(s):  
Chuan-Kui Jiang ◽  
Howard S Epstein ◽  
Marjana Tomic ◽  
Irwin M Freedberg ◽  
Miroslav Blumenberg

2018 ◽  
Author(s):  
Rajiv Movva ◽  
Peyton Greenside ◽  
Georgi K. Marinov ◽  
Surag Nair ◽  
Avanti Shrikumar ◽  
...  

The relationship between noncoding DNA sequence and gene expression is not well understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset that measures the activity of ~500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and fine map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.


2018 ◽  
Author(s):  
George E. Gentsch ◽  
Thomas Spruce ◽  
Nick D. L. Owens ◽  
James C. Smith

Embryonic development yields many different cell types in response to just a few families of inductive signals. The property of a signal-receiving cell that determines how it responds to such signals, including the activation of cell type-specific genes, is known as its competence. Here, we show how maternal factors modify chromatin to specify initial competence in the frog Xenopus tropicalis. We identified the earliest engaged regulatory DNA sequences, and inferred from them critical activators of the zygotic genome. Of these, we showed that the pioneering activity of the maternal pluripotency factors Pou5f3 and Sox3 predefines competence for germ layer formation by extensively remodeling compacted chromatin before the onset of signaling. The remodeling includes the opening and marking of thousands of regulatory elements, extensive chromatin looping, and the co-recruitment of signal-mediating transcription factors. Our work identifies significant developmental principles that inform our understanding of how pluripotent stem cells interpret inductive signals.


2004 ◽  
Vol 40 ◽  
pp. 121-136 ◽  
Author(s):  
Bruce Gottlieb ◽  
Lenore K Beitel ◽  
Jianhui Wu ◽  
Youssef A Elhaji ◽  
Mark Trifiro

The androgen receptor (AR) protein regulates transcription of certain genes. Usually this depends upon a central DNA-binding domain that permits the binding of androgen–AR complexes to regulatory DNA sequences near or in a target gene. The AR also has a C-terminal ligand-binding domain and an N-terminal transcription modulatory domain. These N- and C-terminal domains interact directly, and with co-regulatory, non-receptor proteins, to exert precise control over a gene’s transcription rate. The precise roles of these proteins are active research areas. Severe X-linked AR gene (AR) mutations cause complete androgen insensitivity, mild ones impair virilization with or without infertility, and moderate ones yield a wide phenotypic spectrum sometimes among siblings. Different phenotype expressivity may reflect variability of AR-interactive proteins. Mutations occur throughout the AR but are concentrated in specific areas of the gene known as hot spots. A number of these mutations of somatic origin are associated with prostate cancer. N-terminal polyglutamine (polyGln) tract expansion reduces AR transactivation, and when there are more than 38 glutamine residues it causes spinobulbar muscular atrophy, a motor neuron disease, due to a gain of function. Variations in polyGln tract length have been associated as risk factors with prostate, breast, uterine, endometrial and colorectal cancer, as well as male infertility.


1991 ◽  
Vol 11 (10) ◽  
pp. 5154-5163
Author(s):  
D D Barker ◽  
H Wu ◽  
S Hartung ◽  
M Breindl ◽  
R Jaenisch

The Mov13 mouse strain carries a mutation in the alpha 1(I) procollagen gene which is due to the insertion of a Moloney murine leukemia provirus into the first intron. This insertion results in the de novo methylation of the provirus and flanking DNA, the alteration of chromatin structure, and the transcriptional inactivity of the collagen promoter. To address the mechanism of mutagenesis, we reintroduced a cloned and therefore demethylated version of the Mov13 mutant allele into mouse fibroblasts. The transfected gene was not transcribed, indicating that the transcriptional defect was not due to the hypermethylation. Rather, this result strongly suggests that the mutation is due to the displacement or disruption of cis-acting regulatory DNA sequences within the first intron. We also constructed a Mov13 variant allele containing a single long terminal repeat instead of the whole provirus. This construct also failed to express mRNA, indicating that the Mov13 mutation does not revert by provirus excision as has been observed for other retrovirus-induced mutations.

