Targeted optimization of regulatory DNA sequences with neural editing architectures

2019 ◽  
Author(s):  
Anvita Gupta ◽  
Anshul Kundaje

Abstract Targeted optimization of existing DNA sequences for useful properties has the potential to enable several synthetic biology applications, from modifying DNA to treat genetic disorders to designing regulatory elements that fine-tune context-specific gene expression. Current approaches for targeted genome editing are largely based on prior biological knowledge or ad hoc rules. Few if any machine learning approaches exist for targeted optimization of regulatory DNA sequences. Here, we propose a novel generative neural network architecture for targeted DNA sequence editing (the EDA architecture), consisting of an encoder, decoder, and analyzer. We showcase the use of EDA to optimize regulatory DNA sequences to bind to the transcription factor SPI1. Compared to other state-of-the-art approaches such as a textual variational autoencoder and rule-based editing, EDA significantly improves the predicted binding of SPI1 to genomic sequences with a minimal set of edits. We also use EDA to design regulatory elements with optimized grammars of CREB1 binding sites that can tune reporter expression levels as measured by massively parallel reporter assays (MPRA). We analyze the properties of the binding sites in the edited sequences and find patterns that are consistent with previously reported grammatical rules which tie gene expression to CRE binding site density, spacing and affinity.

2007 ◽  
Vol 4 (2) ◽  
pp. 1-23
Author(s):  
Amitava Karmaker ◽  
Kihoon Yoon ◽  
Mark Doderer ◽  
Russell Kruzelock ◽  
Stephen Kwek

Summary Revealing the complex interactions between trans- and cis-regulatory elements and identifying their potential binding sites are fundamental problems in understanding gene expression. Progress in ChIP-chip technology facilitates the identification of DNA sequences that are recognized by a specific transcription factor. However, protein-DNA binding is a necessary, but not sufficient, condition for transcriptional regulation. To further confirm a regulatory relationship, we need to demonstrate that the expression levels of the transcription factor and its target gene are correlated. Here, instead of using a linear correlation coefficient, we used a non-linear function that seems to better capture possible regulatory relationships. By analyzing tissue-specific gene expression profiles of human and mouse, we delineate a list of transcription factor-gene pairs with highly correlated expression levels, which may have regulatory relationships. Using two closely related species (human and mouse), we perform comparative genome analysis to cross-validate the quality of our predictions. Our findings are confirmed by matching against publicly available TFBS databases (such as TRANSFAC and ConSite) and by reviewing the biological literature. For example, according to our analysis, 80% and 85.71% of the target genes associated with the E2F5 and RELB transcription factors, respectively, have the corresponding known binding sites. We also substantiated our results on some oncogenes with the biomedical literature. Moreover, we performed further analysis on them and found that BCR and DEK may be regulated by some common transcription factors. Similar results for the BTG1, FCGR2B and LCK genes were also reported.
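To illustrate why a linear correlation coefficient can understate a regulatory relationship, the sketch below compares Pearson correlation with Spearman rank correlation on a made-up, monotonic but non-linear expression relationship. The abstract does not say which non-linear function the authors used; Spearman correlation is a stand-in chosen for illustration, and the expression values and the names `tf_expr` and `gene_expr` are hypothetical.

```python
import math

def pearson(x, y):
    """Linear (Pearson) correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Rank-based correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical expression levels across six tissues (illustrative only):
# the target gene responds exponentially to the transcription factor.
tf_expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_expr = [math.exp(v) for v in tf_expr]

print("Pearson: ", pearson(tf_expr, gene_expr))
print("Spearman:", spearman(tf_expr, gene_expr))
```

On this toy input the rank-based measure reports a perfect monotonic association, while the linear coefficient is noticeably lower, which is the kind of gap a non-linear measure is meant to close.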


2018 ◽  
Author(s):  
George E. Gentsch ◽  
Thomas Spruce ◽  
Nick D. L. Owens ◽  
James C. Smith

Abstract Embryonic development yields many different cell types in response to just a few families of inductive signals. The property of a signal-receiving cell that determines how it responds to such signals, including the activation of cell type-specific genes, is known as its competence. Here, we show how maternal factors modify chromatin to specify initial competence in the frog Xenopus tropicalis. We identified the earliest engaged regulatory DNA sequences, and inferred from them critical activators of the zygotic genome. Of these, we showed that the pioneering activity of the maternal pluripotency factors Pou5f3 and Sox3 predefines competence for germ layer formation by extensively remodeling compacted chromatin before the onset of signaling. The remodeling includes the opening and marking of thousands of regulatory elements, extensive chromatin looping, and the co-recruitment of signal-mediating transcription factors. Our work identifies significant developmental principles that inform our understanding of how pluripotent stem cells interpret inductive signals.


2016 ◽  
Author(s):  
Monther Alhamdoosh ◽  
Dianhui Wang

Protein-DNA binding affinity is still poorly understood for many transcription factors (TFs). Although several approaches have been proposed in the literature to model the DNA-binding specificity of TFs, they still have some limitations. Most of the methods require a cut-off threshold in order to classify a K-mer as a binding site (BS), and such a threshold is usually chosen by hand rather than by a principled method. Some other approaches use prior knowledge of the biological context of regulatory elements in the genome along with machine learning algorithms to build classifier models for TFBSs. Notably, these methods deliberately select the training and testing datasets so that they are very separable. Hence, the current methods do not actually capture the TF-DNA binding relationship. In this paper, we present a threshold-free framework based on a novel ensemble learning algorithm in order to locate TFBSs in DNA sequences. Our proposed approach creates TF-specific classifier models using genome-wide DNA-binding experiments and prior biological knowledge of DNA sequences and TF binding preferences. Systematic background filtering algorithms are utilized to remove non-functional K-mers from the training and testing datasets. To reduce the complexity of the classifier models, a fast feature selection algorithm is employed. Finally, the created classifier models are used to scan new DNA sequences and identify potential binding sites. The analysis results show that our proposed approach is able to identify novel binding sites in Saccharomyces cerevisiae. homepage.cs.latrobe.edu.au/dwang/DNNESCANweb
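The threshold dependence that this abstract criticizes can be seen in a minimal sketch of conventional motif scanning. The code scores every K-mer against a position probability matrix and calls it a binding site when its log-odds score reaches a cut-off. The matrix `PPM`, the sequence `SEQ`, the uniform background and both thresholds are hypothetical values chosen for illustration. This is the kind of thresholded baseline the authors argue against, not their ensemble method.

```python
import math

# Hypothetical position probability matrix for a 4-bp motif (consensus ACGT).
PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency

def log_odds(kmer):
    """Sum of per-position log2(motif probability / background probability)."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PPM, kmer))

def scan(seq, threshold):
    """Return (position, K-mer) pairs whose score reaches the threshold."""
    k = len(PPM)
    hits = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if log_odds(kmer) >= threshold:
            hits.append((i, kmer))
    return hits

SEQ = "TTACGTTTACGATT"  # made-up sequence: one exact and one near match

print(scan(SEQ, 5.0))  # strict cut-off
print(scan(SEQ, 3.0))  # looser cut-off
```

The set of called sites changes with the cut-off alone: the strict threshold keeps only the exact consensus match, while the looser one also admits the one-mismatch K-mer.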


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
George E. Gentsch ◽  
Thomas Spruce ◽  
Nick D. L. Owens ◽  
James C. Smith

Abstract Embryonic development yields many different cell types in response to just a few families of inductive signals. The property of signal-receiving cells that determines how they respond to inductive signals is known as competence, and it differs in different cell types. Here, we explore the ways in which maternal factors modify chromatin to specify initial competence in the frog Xenopus tropicalis. We identify early-engaged regulatory DNA sequences, and infer from them critical activators of the zygotic genome. Of these, we show that the pioneering activity of the maternal pluripotency factors Pou5f3 and Sox3 determines competence for germ layer formation by extensively remodelling compacted chromatin before the onset of inductive signalling. This remodelling includes the opening and marking of thousands of regulatory elements, extensive chromatin looping, and the co-recruitment of signal-mediating transcription factors. Our work identifies significant developmental principles that inform our understanding of how pluripotent stem cells interpret inductive signals.


Fagopyrum ◽  
2018 ◽  
Vol 35 (1) ◽  
pp. 5-17 ◽  
Author(s):  
Upasna Chettry ◽  
Lashaihun Dohtdong ◽  
N. K. Chrungoo

Multiple sequence alignment of the 5' UTR of SSP genes from accessions of Fagopyrum esculentum revealed the invariant nature of the sequences, with the transcription start site (TSS) at P761 and the TATA box located 30 bp upstream of the TSS. Other cis-elements identified in the sequences included the legumin box (-581, -524, -184, -135, -91), the -131 prolamin box, the DOF element (-718, -649, -540, -432, -272, -225, -128) and the CAAT box (-692, -530, -475, -411, -282, -168, -54). Other elements identified included those involved in abscisic acid signalling, viz. ABI3 at P-470, -95 and -68, RAV1 at P-694 and -543, and AGL15 at P-671. A comparative analysis of regulatory elements of SSP gene promoters of distantly related species revealed the presence of five cis-regulatory elements, viz. the TATA box, E-box, RY element, CAAT box and Endosperm box, which interplay in seed-specific SSP gene expression. Other modulators influencing seed-specific gene expression detected in the sequences included the ABA-responsive elements ABI3, RAV1 and AGL15, which play an integral role in seed maturation. Identification of potential nucleosome binding sites in SSP gene promoters of Cicer arietinum, Brassica napus, B. campestris, Vicia faba, and Pisum sativum at positions 78, 635, 195, 112 and 152, respectively, suggests spatial fine-tuning of SSP gene transcriptional regulation in these species. On the other hand, the absence of nucleosome binding sites in the promoters of Fagopyrum esculentum, Zea mays, Avena sativa, Triticum aestivum and Oryza sativa may indicate relatively easier access of transcription factors to the proximal promoter, thereby providing a higher level of gene expression.


2021 ◽  
Author(s):  
Eeshit Dhaval Vaishnav ◽  
Carl G. de Boer ◽  
Moran Yassour ◽  
Jennifer Molinet ◽  
Lin Fan ◽  
...  

Mutations in non-coding cis-regulatory DNA sequences can alter gene expression, organismal phenotype, and fitness. Fitness landscapes, which map DNA sequence to organismal fitness, are a long-standing goal in biology, but have remained elusive because it is challenging to generalize accurately to the vast space of possible sequences using models built on measurements from a limited number of endogenous regulatory sequences. Here, we construct a sequence-to-expression model for such a landscape and use it to decipher principles of cis-regulatory evolution. Using tens of millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we construct a deep transformer neural network model that generalizes with exceptional accuracy, and enables sequence design for gene expression engineering. Using our model, we predict and experimentally validate expression divergence under random genetic drift and strong-selection weak-mutation regimes, show that conflicting expression objectives in different environments constrain expression adaptation, and find that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for detecting selective constraint on gene expression using our model and natural sequence variation, and validate it using observed cis-regulatory diversity across 1,011 yeast strains, cross-species RNA-seq from three different clades, and measured expression-to-fitness curves. Finally, we develop a characterization of regulatory evolvability, use it to visualize fitness landscapes in two dimensions, discover evolvability archetypes, quantify the mutational robustness of individual sequences and highlight the mutational robustness of extant natural regulatory sequence populations. Our work provides a general framework that addresses key questions in the evolution of cis-regulatory sequences.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
José L Ruiz ◽  
Lisa C Ranford-Cartwright ◽  
Elena Gómez-Díaz

Abstract Anopheles gambiae mosquitoes are primary human malaria vectors, but we know very little about their mechanisms of transcriptional regulation. We profiled chromatin accessibility by the assay for transposase-accessible chromatin by sequencing (ATAC-seq) in laboratory-reared A. gambiae mosquitoes experimentally infected with the human malaria parasite Plasmodium falciparum. By integrating ATAC-seq, RNA-seq and ChIP-seq data, we showed a positive correlation between accessibility at promoters and introns, gene expression and active histone marks. By comparing expression and chromatin structure patterns in different tissues, we were able to infer cis-regulatory elements controlling tissue-specific gene expression and to predict the in vivo binding sites of relevant transcription factors. The ATAC-seq assay also allowed the precise mapping of active regulatory regions, including novel transcription start sites and enhancers that were annotated to mosquito immune-related genes. Not only is this study important for advancing our understanding of mechanisms of transcriptional regulation in the mosquito vector of human malaria, but the information we produced also has great potential for developing new mosquito-control and anti-malaria strategies.


2021 ◽  
pp. 002203452110120
Author(s):  
C. Gluck ◽  
S. Min ◽  
A. Oyelakin ◽  
M. Che ◽  
E. Horeth ◽  
...  

The parotid, submandibular, and sublingual glands represent a trio of oral secretory glands whose primary function is to produce saliva, facilitate digestion of food, provide protection against microbes, and maintain oral health. While recent studies have begun to shed light on the global gene expression patterns and profiles of salivary glands, particularly those of mice, relatively little is known about the location and identity of transcriptional control elements. Here we have established the epigenomic landscape of the mouse submandibular salivary gland (SMG) by performing chromatin immunoprecipitation sequencing experiments for 4 key histone marks. Our analysis of the comprehensive SMG data sets and comparisons with those from other adult organs have identified critical enhancers and super-enhancers of the mouse SMG. By further integrating these findings with complementary RNA-sequencing based gene expression data, we have unearthed a number of molecular regulators such as members of the Fox family of transcription factors that are enriched and likely to be functionally relevant for SMG biology. Overall, our studies provide a powerful atlas of cis-regulatory elements that can be leveraged for better understanding the transcriptional control mechanisms of the mouse SMG, discovery of novel genetic switches, and modulating tissue-specific gene expression in a targeted fashion.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Julius Judd ◽  
Hayley Sanderson ◽  
Cédric Feschotte

Abstract Background: Transposable elements are increasingly recognized as a source of cis-regulatory variation. Previous studies have revealed that transposons are often bound by transcription factors and some have been co-opted into functional enhancers regulating host gene expression. However, the process by which transposons mature into complex regulatory elements, like enhancers, remains poorly understood. To investigate this process, we examined the contribution of transposons to the cis-regulatory network controlling circadian gene expression in the mouse liver, a well-characterized network serving an important physiological function. Results: ChIP-seq analyses reveal that transposons and other repeats contribute ~14% of the binding sites for core circadian regulators (CRs) including BMAL1, CLOCK, PER1/2, and CRY1/2, in the mouse liver. RSINE1, an abundant murine-specific SINE, is the only transposon family enriched for CR binding sites across all datasets. Sequence analyses and reporter assays reveal that the circadian regulatory activity of RSINE1 stems from the presence of imperfect CR binding motifs in the ancestral RSINE1 sequence. These motifs matured into canonical motifs through point mutations after transposition. Furthermore, maturation occurred preferentially within elements inserted in the proximity of ancestral CR binding sites. RSINE1 also acquired motifs that recruit nuclear receptors known to cooperate with CRs to regulate circadian gene expression specifically in the liver. Conclusions: Our results suggest that the birth of enhancers from transposons is predicated both by the sequence of the transposon and by the cis-regulatory landscape surrounding their genomic integration site.

