Automated Model-Predictive Design of Synthetic Promoters to Control Transcriptional Profiles in Bacteria

2021 ◽  
Author(s):  
Travis L La Fleur ◽  
Ayaan Hossain ◽  
Howard M Salis

Transcription rates are regulated by the interactions between RNA polymerase, sigma factor, and promoter DNA sequences in bacteria. However, it remains unclear how non-canonical sequence motifs collectively control transcription rates. Here, we combined massively parallel assays, biophysics, and machine learning to develop a 346-parameter model that predicts site-specific transcription initiation rates for any σ70 promoter sequence, validated across 17396 bacterial promoters with diverse sequences. We applied the model to predict genetic context effects, design σ70 promoters with desired transcription rates, and identify undesired promoters inside engineered genetic systems. The model provides a biophysical basis for understanding gene regulation in natural genetic systems and precise transcriptional control for engineering synthetic genetic systems.

Author(s):  
Masahiko Imashimizu ◽  
Yuji Tokunaga ◽  
Ariel Afek ◽  
Hiroki Takahashi ◽  
Nobuo Shimamoto ◽  
...  

In the process of transcription initiation by RNA polymerase, promoter DNA sequences affect multiple reaction pathways determining the productivity of transcription. However, the question of how the molecular mechanism of transcription initiation depends on sequence properties of promoter DNA remains poorly understood. Here, combining the statistical mechanical approach with high-throughput sequencing results, we characterize abortive transcription and pausing during transcription initiation by Escherichia coli RNA polymerase at a genome-wide level. Our results suggest that initially transcribed sequences enriched with thymine bases represent the signal inducing abortive transcription. On the other hand, certain repetitive sequence elements broadly embedded in promoter regions constitute the signal inducing pausing. Both signals decrease the productivity of transcription initiation. Based on solution NMR and in vitro transcription measurements, we also suggest that repetitive sequence elements of promoter DNA modulate the rigidity of its double-stranded form, which profoundly influences the reaction coordinates of the productive initiation via pausing.


Author(s):  
Drake Jensen ◽  
Eric A. Galburt

The fitness of an individual bacterial cell is highly dependent upon temporally tuning gene expression levels when subjected to different environmental cues. Kinetic regulation of transcription initiation is a key step in modulating the levels of transcribed genes to promote bacterial survival. The initiation phase encompasses the binding of RNA polymerase (RNAP) to promoter DNA and a series of coupled protein-DNA conformational changes prior to entry into processive elongation. The time required to complete the initiation phase can vary by orders of magnitude and is ultimately dictated by the DNA sequence of the promoter. In this review, we aim to provide the required background to understand how promoter sequence motifs may affect initiation kinetics during promoter recognition and binding, subsequent conformational changes which lead to DNA opening around the transcription start site, and promoter escape. By calculating the steady-state flux of RNA production as a function of these effects, we illustrate that the presence/absence of a consensus promoter motif cannot be used in isolation to make conclusions regarding promoter strength. Instead, the entire series of linked, sequence-dependent structural transitions must be considered holistically. Finally, we describe how individual transcription factors take advantage of the broad distribution of sequence-dependent basal kinetics to either increase or decrease RNA flux.
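The steady-state flux argument in the review above can be illustrated with a minimal sketch. This is not the authors' kinetic scheme. It assumes a simplified three-step model (rapid-equilibrium binding, reversible opening, irreversible escape), and the function name and every rate constant are hypothetical values chosen only for illustration.

```python
def steady_state_flux(kb_times_rnap, k_open, k_close, k_escape):
    """Steady-state RNA flux per promoter (per second) for an assumed scheme.

    Assumptions (illustrative, not taken from the review):
      * RNAP binding is in rapid equilibrium, so the closed-complex
        occupancy of the pre-open pool is theta = K_B[R] / (1 + K_B[R]).
      * Closed complexes open with rate k_open; open complexes either
        close again (k_close) or escape (k_escape), freeing the promoter.
    """
    theta = kb_times_rnap / (1.0 + kb_times_rnap)
    forward = k_open * theta
    # Balance of open-complex formation and loss at steady state.
    open_fraction = forward / (forward + k_close + k_escape)
    return k_escape * open_fraction


# Hypothetical promoter with tight, consensus-like binding but slow escape.
tight_binder = steady_state_flux(kb_times_rnap=10.0, k_open=1.0,
                                 k_close=0.01, k_escape=0.01)
# Hypothetical promoter with weaker binding but fast escape.
weak_binder = steady_state_flux(kb_times_rnap=1.0, k_open=1.0,
                                k_close=0.01, k_escape=0.5)

print(f"tight binder flux: {tight_binder:.4f} per s")
print(f"weak binder flux:  {weak_binder:.4f} per s")
```

Under these assumed numbers the weaker binder produces more RNA, because escape rather than binding limits the tight binder. This is the sense in which a single consensus motif cannot predict promoter strength on its own.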


2021 ◽  
Vol 118 (30) ◽  
pp. e2021941118
Author(s):  
Dylan M. Plaskon ◽  
Kate L. Henderson ◽  
Lindsey C. Felth ◽  
Cristen M. Molzahn ◽  
Claire Evensen ◽  
...  

Transcription initiation is highly regulated by promoter sequence, transcription factors, and ligands. All known transcription inhibitors, an important class of antibiotics, act in initiation. To understand regulation and inhibition, the biophysical mechanisms of formation and stabilization of the “open” promoter complex (OC), of synthesis of a short RNA–DNA hybrid upon nucleotide addition, and of escape of RNA polymerase (RNAP) from the promoter must be understood. We previously found that RNAP forms three different OC with λPR promoter DNA. The 37 °C RNAP-λPR OC (RPO) is very stable. At lower temperatures, RPO is less stable and in equilibrium with an intermediate OC (I3). Here, we report step-by-step rapid quench-flow kinetic data for initiation and growth of the RNA–DNA hybrid at 25 and 37 °C that yield rate constants for each step of productive nucleotide addition. Analyzed together, with previously published data at 19 °C, our results reveal that I3 and not RPO is the productive initiation complex at all temperatures. From the strong variations of rate constants and activation energies and entropies for individual steps of hybrid extension, we deduce that contacts of RNAP with the bubble strands are disrupted stepwise as the hybrid grows and translocates. Stepwise disruption of RNAP-strand contacts is accompanied by stepwise bubble collapse, base stacking, and duplex formation, as the hybrid extends to a 9-mer prior to disruption of upstream DNA–RNAP contacts and escape of RNAP from the promoter.
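Activation energies of the kind mentioned above are commonly estimated from rate constants measured at two temperatures. The sketch below uses the two-point Arrhenius relation as a stand-in. The abstract does not state which analysis the authors applied, and the rate constants in the example are hypothetical, not values from the study.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def activation_energy(k1, t1_celsius, k2, t2_celsius):
    """Two-point Arrhenius estimate of the activation energy in J/mol.

    From ln k = ln A - Ea / (R T):
        Ea = R * ln(k2 / k1) / (1/T1 - 1/T2)
    """
    t1 = t1_celsius + 273.15
    t2 = t2_celsius + 273.15
    return GAS_CONSTANT * math.log(k2 / k1) / (1.0 / t1 - 1.0 / t2)


# Hypothetical rate constants for one nucleotide-addition step.
ea = activation_energy(k1=1.0, t1_celsius=25.0, k2=3.0, t2_celsius=37.0)
print(f"Estimated activation energy: {ea / 1000:.1f} kJ/mol")
```

With three temperatures, as in the study (19, 25 and 37 °C), a linear fit of ln k against 1/T would be the more usual choice than a two-point estimate.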


2005 ◽  
Vol 187 (19) ◽  
pp. 6762-6769 ◽  
Author(s):  
Olga V. Kourennaia ◽  
Laura Tsujikawa ◽  
Pieter L. deHaseth

Upon the exposure of Escherichia coli to high temperature (heat shock), cellular levels of the transcription factor σ32 rise greatly, resulting in the increased formation of the σ32 holoenzyme, which is capable of transcription initiation at heat shock promoters. Higher levels of heat shock proteins render the cell better able to cope with the effects of higher temperatures. To conduct structure-function studies on σ32 in vivo, we have carried out site-directed mutagenesis and employed a previously developed system involving σ32 expression from one plasmid and a β-galactosidase reporter gene driven by the σ32-dependent groE promoter on another in order to monitor the effects of single amino acid substitutions on σ32 activity. It was found that the recognition of the −35 region involves similar amino acid residues in regions 4.2 of E. coli σ32 and σ70. Three conserved amino acids in region 2.3 of σ32 were found to be only marginally important in determining activity in vivo. Differences between σ32 and σ70 in the effects of mutation in region 2.4 on the activities of the two sigma factors are consistent with the pronounced differences between both the amino acid sequences in this region and the recognized promoter DNA sequences.


Biomolecules ◽  
2020 ◽  
Vol 10 (9) ◽  
pp. 1299 ◽  
Author(s):  
Masahiko Imashimizu ◽  
Yuji Tokunaga ◽  
Ariel Afek ◽  
Hiroki Takahashi ◽  
Nobuo Shimamoto ◽  
...  

In the process of transcription initiation by RNA polymerase, promoter DNA sequences affect multiple reaction pathways determining the productivity of transcription. However, the question of how the molecular mechanism of transcription initiation depends on the sequence properties of promoter DNA remains poorly understood. Here, combining the statistical mechanical approach with high-throughput sequencing results, we characterize abortive transcription and pausing during transcription initiation by Escherichia coli RNA polymerase at a genome-wide level. Our results suggest that initially transcribed sequences, when enriched with thymine bases, contain the signal for inducing abortive transcription, whereas certain repetitive sequence elements embedded in promoter regions constitute the signal for inducing pausing. Both signals decrease the productivity of transcription initiation. Based on solution NMR and in vitro transcription measurements, we suggest that repetitive sequence elements within the promoter DNA modulate the nonlocal base pair stability of its double-stranded form. This stability profoundly influences the reaction coordinates of the productive initiation via pausing.


2021 ◽  
Author(s):  
Jamie Y Auxillos ◽  
Samuel J Haynes ◽  
Abhishek Jain ◽  
Clemence Alibert ◽  
Weronika Danecka ◽  
...  

Genes are commonly abstracted into a coding sequence and cis-regulatory elements (CREs), such as promoter and terminator regions, and short sequence motifs within these regions. Modern cloning techniques allow easy assembly of synthetic genetic constructs from discrete cis-regulatory modules. However, it is unclear how much the contributions of CREs to gene expression depend on other CREs in the host gene. Using budding yeast, we probe the extent of composability, or independent effects, of distinct CREs. We confirm that the quantitative effect of a terminator on gene expression depends on both promoter and coding sequence. We then explore whether individual cis-regulatory motifs within terminator regions display similar context dependence, focusing on putative regulatory motifs inferred using transcriptome-wide datasets of mRNA decay. We construct a library of diverse reporter genes, consisting of different combinations of motifs within various terminator contexts, paired with different promoters, to test the extent of composability. Our results show that the effect of a motif on RNA abundance depends both on its host terminator, and also on the associated promoter sequence. Consequently, this emphasises the need for improved motif inference algorithms that include both local and global context effects, which in turn could aid researchers in the accurate use of diverse CREs for the engineering of synthetic genetic constructs.


2021 ◽  
Author(s):  
Shao-Pei Chou ◽  
Adriana K Alexander ◽  
Edward J Rice ◽  
Lauren A Choate ◽  
Paula E Cohen ◽  
...  

How DNA sequence affects the dynamics and position of RNA Polymerase II during transcription remains poorly understood. Here we used naturally occurring genetic variation in F1 hybrid mice to explore how DNA sequence differences affect the genome-wide distribution of Pol II. We measured the position and orientation of Pol II in eight organs collected from heterozygous F1 hybrid mice using ChRO-seq. Our data revealed a strong genetic basis for the precise coordinates of transcription initiation and promoter proximal pause, which was composed of both existing and novel DNA sequence motifs, and allowed us to redefine molecular models of core transcriptional processes. Our results implicate the strength of base pairing between A-T or G-C dinucleotides as a key determinant of the position of Pol II initiation and pause. We reveal substantial differences in the position of transcription termination, which frequently do not affect the composition of the mature mRNA. Finally, we identified frequent, organ-specific changes in transcription that affect mRNA and ncRNA expression across broad genomic domains. Collectively, we reveal how DNA sequences shape core transcriptional processes at single nucleotide resolution in mammals.


Author(s):  
David P. Bazett-Jones ◽  
Mark L. Brown

A multisubunit RNA polymerase enzyme is ultimately responsible for transcription initiation and elongation of RNA, but recognition of the proper start site by the enzyme is regulated by general, temporal and gene-specific trans-factors interacting at promoter and enhancer DNA sequences. To understand the molecular mechanisms which precisely regulate the transcription initiation event, it is crucial to elucidate the structure of the transcription factor/DNA complexes involved. Electron spectroscopic imaging (ESI) provides the opportunity to visualize individual DNA molecules. Enhancement of DNA contrast with ESI is accomplished by imaging with electrons that have interacted with inner shell electrons of phosphorus in the DNA backbone. Phosphorus detection at this intermediately high level of resolution (≈1 nm) permits selective imaging of the DNA, to determine whether the protein factors compact, bend or wrap the DNA. Simultaneously, mass analysis and phosphorus content can be measured quantitatively, using adjacent DNA or tobacco mosaic virus (TMV) as mass and phosphorus standards. These two parameters provide stoichiometric information relating the ratios of protein:DNA content.


Author(s):  
Yanrong Ji ◽  
Zhihan Zhou ◽  
Han Liu ◽  
Ramana V Davuluri

Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios. Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture global and transferrable understanding of genomic DNA sequences based on up and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory elements prediction and demonstrate its ease of use, accuracy and efficiency. We show that the single pre-trained transformers model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationship within input sequences for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that pre-trained DNABERT with human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned to many other sequence analysis tasks. Availability and implementation: The source code, pretrained and finetuned model for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT). Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Francisco Macías ◽  
Raquel Afonso-Lehmann ◽  
Patricia E. Carreira ◽  
M. Carmen Thomas

Background: Trypanosomatid genomes are colonized by active and inactive mobile DNA elements, such as LINE, SINE-like, SIDER and DIRE retrotransposons. These elements all share a 77-nucleotide-long sequence at their 5′ ends, known as Pr77, which activates transcription, thereby generating abundant unspliced and translatable transcripts. However, the transcription factors that mediate this process have not yet been reported. Methods: Recombinant TATA-binding protein (TBP) and small nuclear RNA-activating protein 50 kDa (SNAP50), and specific antibodies raised against them, were generated. Protein capture assays, electrophoretic mobility-shift assays (EMSA) and EMSA competition assays were carried out using these proteins and nuclear proteins of the parasite, with specific DNA sequences as probes, to detect direct interaction of these transcription factors with the Pr77 sequence. Results: This study identified TBP and SNAP50 as part of the DNA-protein complex formed by the Pr77 promoter sequence and nuclear proteins of Trypanosoma cruzi. TBP establishes direct and specific contact with the Pr77 sequence, where the DPE and DPE downstream regions are docking sites with preferential binding. TBP binds cooperatively (Hill coefficient = 1.67) to Pr77 and to both strands of the Pr77 sequence, while the conformation of this highly structured sequence is not involved in TBP binding. Direct binding of SNAP50 to the Pr77 sequence is weak and may be mediated by protein–protein interactions through other trypanosomatid nuclear proteins. Conclusions: Identification of the transcription factors that mediate Pr77 transcription may help to elucidate how these retrotransposons are mobilized within the trypanosomatid genomes and their roles in gene regulation processes in this human parasite.
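To give a sense of what a Hill coefficient of 1.67 implies, the sketch below evaluates the standard Hill equation. Only the coefficient comes from the abstract. The dissociation constant, the concentration units and the function names are hypothetical, and the abstract does not say how the authors fitted their binding data.

```python
def hill_fraction_bound(ligand, k_half, hill_n):
    """Fraction of DNA bound according to the Hill equation.

    theta = [L]^n / (K^n + [L]^n), where K is the ligand concentration
    giving half-maximal binding.
    """
    return ligand ** hill_n / (k_half ** hill_n + ligand ** hill_n)


def fold_range_10_to_90(hill_n):
    """Fold change in ligand needed to go from 10% to 90% bound."""
    return 81.0 ** (1.0 / hill_n)


HILL_N = 1.67   # reported for TBP binding to Pr77
K_HALF = 1.0    # hypothetical half-saturation concentration (arbitrary units)

for tbp in (0.25, 0.5, 1.0, 2.0, 4.0):
    bound = hill_fraction_bound(tbp, K_HALF, HILL_N)
    print(f"[TBP] = {tbp:>4} -> fraction bound = {bound:.2f}")

print(f"10%-90% span, n = 1.67: {fold_range_10_to_90(HILL_N):.1f}-fold")
print(f"10%-90% span, n = 1.00: {fold_range_10_to_90(1.0):.1f}-fold")
```

A non-cooperative binder (n = 1) needs an 81-fold increase in protein to move from 10% to 90% occupancy. With n = 1.67 the same transition takes roughly a 14-fold increase, which is the practical meaning of the moderate positive cooperativity reported.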

