A comprehensive fitness landscape model reveals the evolutionary history and future evolvability of eukaryotic cis-regulatory DNA sequences

2021 ◽  
Author(s):  
Eeshit Dhaval Vaishnav ◽  
Carl G. de Boer ◽  
Moran Yassour ◽  
Jennifer Molinet ◽  
Lin Fan ◽  
...  

Mutations in non-coding cis-regulatory DNA sequences can alter gene expression, organismal phenotype, and fitness. Fitness landscapes, which map DNA sequence to organismal fitness, are a long-standing goal in biology, but have remained elusive because it is challenging to generalize accurately to the vast space of possible sequences using models built on measurements from a limited number of endogenous regulatory sequences. Here, we construct a sequence-to-expression model for such a landscape and use it to decipher principles of cis-regulatory evolution. Using tens of millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we construct a deep transformer neural network model that generalizes with exceptional accuracy and enables sequence design for gene expression engineering. Using our model, we predict and experimentally validate expression divergence under random genetic drift and strong selection weak mutation regimes, show that conflicting expression objectives in different environments constrain expression adaptation, and find that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for detecting selective constraint on gene expression using our model and natural sequence variation, and validate it using observed cis-regulatory diversity across 1,011 yeast strains, cross-species RNA-seq from three different clades, and measured expression-to-fitness curves. Finally, we develop a characterization of regulatory evolvability, use it to visualize fitness landscapes in two dimensions, discover evolvability archetypes, quantify the mutational robustness of individual sequences, and highlight the mutational robustness of extant natural regulatory sequence populations. Our work provides a general framework that addresses key questions in the evolution of cis-regulatory sequences.

2018 ◽  
Author(s):  
Rajiv Movva ◽  
Peyton Greenside ◽  
Georgi K. Marinov ◽  
Surag Nair ◽  
Avanti Shrikumar ◽  
...  

The relationship between noncoding DNA sequence and gene expression is not well understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset that measures the activity of ~500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and fine map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.


Author(s):  
A. Meera ◽  
Lalitha Rangarajan

Understanding how the regulation of gene networks is orchestrated is an important challenge for characterizing complex biological processes. The DNA sequences that comprise promoters do not provide much direct information about regulation. A substantial part of the regulation results from the interaction of transcription factors (TFs) with specific cis-regulatory DNA sequences. These regulatory sequences are organized in a modular fashion, with each module (enhancer) containing one or more binding sites for a specific combination of TFs. In the present work, the authors propose to investigate the inter-motif distance between the important motifs in the promoter sequences of citrate synthase of different mammals. The authors use a new distance measure to compare the promoter sequences. Results reveal that there exists more similarity between organisms in the same chromosome.
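
As a rough illustration of what an inter-motif distance can mean, the sketch below locates exact matches of two motifs in a promoter string and reports the base-pair gap from each site of the first motif to the nearest downstream site of the second. This is not the authors' distance measure, which the abstract does not specify. The function names, the example promoter, and the two motif strings are all hypothetical, and real analyses would usually score matches with position weight matrices rather than exact strings.

```python
import re


def motif_positions(sequence, motif):
    """Return 0-based start positions of every exact match of motif,
    including overlapping matches (found with a zero-width lookahead)."""
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [match.start() for match in pattern.finditer(sequence)]


def inter_motif_distances(sequence, motif_a, motif_b):
    """For each motif_a site, return the number of bases between its end
    and the start of the nearest non-overlapping downstream motif_b site.
    Sites of motif_a with no downstream motif_b site are skipped."""
    starts_b = motif_positions(sequence, motif_b)
    gaps = []
    for start_a in motif_positions(sequence, motif_a):
        end_a = start_a + len(motif_a)
        downstream = [b for b in starts_b if b >= end_a]
        if downstream:
            gaps.append(min(downstream) - end_a)
    return gaps


# Hypothetical promoter fragment and motifs, chosen only for illustration.
promoter = "GGTATAAAGCCGGGCGGTTTATAAACGGGCGG"
print(inter_motif_distances(promoter, "TATAAA", "GGGCGG"))
```

Comparing the resulting lists of gaps across orthologous promoters is one simple way to ask whether motif spacing is conserved. Any actual similarity score built on top of these gaps would be a further assumption.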


2021 ◽  
Author(s):  
Timothy T. Harden ◽  
Ben J. Vincent ◽  
Angela H. DePace

Most animal transcription factors are categorized as activators or repressors without specifying their mechanisms of action. Defining their specific roles is critical for deciphering the logic of transcriptional regulation and predicting the function of regulatory sequences. Here, we define the kinetic roles of three activating transcription factors in the Drosophila embryo—Zelda, Bicoid and Stat92E—by introducing their binding sites into the even skipped stripe 2 enhancer and measuring transcriptional output with live imaging. We find that these transcription factors act on different subsets of kinetic parameters, and these subsets can change over the course of nuclear cycle (NC) 14. These transcription factors all increase the fraction of active nuclei. Zelda dramatically shortens the time interval between the start of NC 14 and initial activation, and Stat92E increases the duration of active transcription intervals throughout NC 14. Zelda also decreases the time intervals between instances of active transcription early in NC 14, while Stat92E does so later. Different transcription factors therefore play distinct kinetic roles in activating transcription; this has consequences for understanding both regulatory DNA sequences and the biochemical function of transcription factors.


2019 ◽  
Author(s):  
Anvita Gupta ◽  
Anshul Kundaje

Targeted optimization of existing DNA sequences for useful properties has the potential to enable several synthetic biology applications, from modifying DNA to treat genetic disorders to designing regulatory elements to fine-tune context-specific gene expression. Current approaches for targeted genome editing are largely based on prior biological knowledge or ad-hoc rules. Few if any machine learning approaches exist for targeted optimization of regulatory DNA sequences. Here, we propose a novel generative neural network architecture for targeted DNA sequence editing – the EDA architecture – consisting of an encoder, decoder, and analyzer. We showcase the use of EDA to optimize regulatory DNA sequences to bind to the transcription factor SPI1. Compared to other state-of-the-art approaches such as a textual variational autoencoder and rule-based editing, EDA significantly improves predicted binding of SPI1 of genomic sequences with the minimal set of edits. We also use EDA to design regulatory elements with optimized grammars of CREB1 binding sites that can tune reporter expression levels as measured by massively parallel reporter assays (MPRA). We analyze the properties of the binding sites in the edited sequences and find patterns that are consistent with previously reported grammatical rules which tie gene expression to CRE binding site density, spacing and affinity.


1990 ◽  
Vol 259 (4) ◽  
pp. L185-L197
Author(s):  
B. R. Stripp ◽  
J. A. Whitsett ◽  
D. L. Lattier

Gene transcription is regulated by the formation of protein-DNA complexes that influence the rate of specific initiation of transcription by RNA polymerase. Recent experimental advances allowing the identification of cis regulatory sequences that specify the binding of trans acting protein factors have made significant contributions to our understanding of the mechanistic complexities of transcriptional regulation. These methodologies have prompted the use of similar strategies to elucidate transcriptional control mechanisms involved in the tissue specific and developmental regulation of pulmonary surfactant protein gene expression. The purpose of this review is to describe various methodologies by which molecular biologists identify and subsequently assay regions of nucleic acids presumed to be integral in gene regulation at the level of transcription. It is well established that genes encoding surfactant proteins are subject to regulation by hormones, cytokines, and a variety of biologically active reagents. Perhaps future studies utilizing molecular tools outlined in this review will be valuable in identification of DNA sequences and protein factors required for the regulation of lung surfactant genes.


PLoS ONE ◽  
2019 ◽  
Vol 14 (6) ◽  
pp. e0218073 ◽  
Author(s):  
Rajiv Movva ◽  
Peyton Greenside ◽  
Georgi K. Marinov ◽  
Surag Nair ◽  
Avanti Shrikumar ◽  
...  

1991 ◽  
Vol 96 (2) ◽  
pp. 162-167 ◽  
Author(s):  
Chuan-Kui Jiang ◽  
Howard S Epstein ◽  
Marjana Tomic ◽  
Irwin M Freedberg ◽  
Miroslav Blumenberg
