Programmable cross-ribosome-binding sites to fine-tune the dynamic range of transcription factor-based biosensor

Abstract: Currently, predictive translation tuning of regulatory elements to the desired output of transcription factor (TF)-based biosensors remains a challenge. The gene expression of a biosensor system must exhibit appropriate translation intensity, which is controlled by the ribosome-binding site (RBS), to achieve fine-tuning of its dynamic range (i.e., the fold change in gene expression between the presence and absence of inducer) by adjusting the translation level of the TF and reporter. However, existing TF-based biosensors generally suffer from unpredictable dynamic range. Here, we elucidated the connections and partial mechanisms between RBS, translation level, protein folding and dynamic range, and presented a design platform that predictably tuned the dynamic range of biosensors based on deep learning of large datasets of cross-RBSs (cRBSs). In doing so, a library containing 7053 designed cRBSs was divided into five sub-libraries through fluorescence-activated cell sorting to establish a classification model based on a convolutional neural network. Finally, the present work exhibited a powerful platform to enable predictable translation tuning of RBS to the dynamic range of biosensors.
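The abstract defines dynamic range as the fold change in gene expression between the presence and absence of inducer. A minimal Python sketch of that ratio follows; the function name, the construct names, and the fluorescence values are hypothetical illustrations, not data or code from the paper.

```python
def dynamic_range(induced: float, uninduced: float) -> float:
    """Fold change in reporter output: induced signal divided by uninduced signal."""
    if uninduced <= 0:
        raise ValueError("uninduced signal must be positive")
    return induced / uninduced


# Hypothetical reporter readings (arbitrary fluorescence units) for two constructs:
# (signal with inducer, signal without inducer). These are made-up numbers.
readings = {"cRBS_a": (5200.0, 130.0), "cRBS_b": (900.0, 300.0)}

ranges = {name: dynamic_range(on, off) for name, (on, off) in readings.items()}
```

In this toy example the construct with the lower uninduced background ends up with the larger dynamic range, which is why tuning the translation level of both the TF and the reporter matters.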


Fine-tuning biosensor dynamic range based on rational design of cross-ribosome-binding sites in bacteria

10.1101/2020.01.27.922302 ◽

2020 ◽

Author(s):

Nana Ding ◽

Shenghu Zhou ◽

Zhenqi Yuan ◽

Xiaojuan Zhang ◽

Jing Chen ◽

...

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Deep Learning ◽

Translation Initiation ◽

Rational Design ◽

Dynamic Range ◽

Regulatory Elements ◽

Fine Tuning ◽

Initiation Rate ◽

Ribosome Binding

Abstract: Currently, predictive translation tuning of regulatory elements to the desired output of transcription factor-based biosensors remains a challenge. The gene expression of a biosensor system must exhibit appropriate translation intensity, which is controlled by the ribosome-binding site (RBS), to achieve fine-tuning of its dynamic range (i.e., the fold change in gene expression between the presence and absence of inducer) by adjusting the translation initiation rate of the transcription factor and reporter. However, existing genetically encoded biosensors generally suffer from unpredictable translation tuning of regulatory elements to dynamic range. Here, we elucidated the connections and partial mechanisms between RBS, translation initiation rate, protein folding and dynamic range, and presented a rational design platform that predictably tuned the dynamic range of biosensors based on deep learning of large datasets of cross-RBSs (cRBSs). A library containing 24,000 semi-rationally designed cRBSs was constructed using DNA microarray and was divided into five sub-libraries through fluorescence-activated cell sorting. To explore the relationship between cRBSs and dynamic range, we established a classification model, based on a convolutional neural network, from the cRBSs and the average dynamic range of the five sub-libraries to accurately predict the dynamic range of biosensors. Thus, this work provides a powerful platform to enable predictable translation tuning of RBS to the dynamic range of biosensors.
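The classification setup described above pairs each cRBS sequence with one of five sorted sub-libraries. The abstract does not say how sequences are presented to the convolutional network; one-hot encoding is a common choice and is assumed here purely for illustration. The function name, the example sequences, and the class labels below are hypothetical.

```python
from typing import List, Tuple

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a nucleotide sequence as one row of four 0/1 values per base (A, C, G, T)."""
    rows = []
    for base in seq.upper().replace("U", "T"):
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([int(base == b) for b in BASES])
    return rows


# Hypothetical labelled examples: (sequence, sub-library index 0..4).
# Neither the sequences nor the labels come from the paper.
examples: List[Tuple[str, int]] = [("AGGAGG", 4), ("TTTCCC", 0)]
encoded = [(one_hot(seq), label) for seq, label in examples]
```

Each encoded sequence is a length-by-4 matrix, the shape a one-dimensional convolutional classifier with five output classes would typically consume.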


Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping

10.1101/2020.01.23.915405 ◽

2020 ◽

Cited By ~ 1

Author(s):

Simon Höllerer ◽

Laetitia Papaxanthos ◽

Anja Cathrin Gumpinger ◽

Katrin Fischer ◽

Christian Beisel ◽

...

Keyword(s):

Gene Expression ◽

Deep Learning ◽

Large Scale ◽

Dynamic Range ◽

Regulatory Elements ◽

Major Advance ◽

Single Experiment ◽

Quantitative Expression ◽

Very Large Datasets ◽

A Site

Abstract: Predicting quantitative effects of gene regulatory elements (GREs) on gene expression is a longstanding challenge in biology. Machine learning models for gene expression prediction may be able to address this challenge, but they require experimental datasets that link large numbers of GREs to their quantitative effect. However, current methods to generate such datasets experimentally are either restricted to specific applications or limited by their technical complexity and error-proneness. Here we introduce DNA-based phenotypic recording as a widely applicable and practical approach to generate very large datasets linking GREs to quantitative functional readouts of high precision, temporal resolution, and dynamic range, solely relying on sequencing. This is enabled by a novel DNA architecture comprising a site-specific recombinase, a GRE that controls recombinase expression, and a DNA substrate modifiable by the recombinase. Both GRE sequence and substrate state can be determined in a single sequencing read, and the frequency of modified substrates amongst constructs harbouring the same GRE is a quantitative, internally normalized readout of this GRE’s effect on recombinase expression. Using next-generation sequencing, the quantitative expression effect of extremely large GRE sets can be assessed in parallel. As a proof of principle, we apply this approach to record translation kinetics of more than 300,000 bacterial ribosome binding sites (RBSs), collecting over 2.7 million sequence-function pairs in a single experiment. Further, we generalize from these large-scale datasets by a novel deep learning approach that combines ensembling and uncertainty modelling to predict the function of untested RBSs with high accuracy, substantially outperforming state-of-the-art methods. The combination of DNA-based phenotypic recording and deep learning represents a major advance in our ability to predict quantitative function from genetic sequence.
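The readout described above is the frequency of modified substrates among reads that carry the same GRE. A minimal Python sketch of that tally follows, assuming each sequencing read has already been parsed into a (GRE sequence, substrate-modified flag) pair; the function name and the toy reads are hypothetical, and the authors' actual pipeline is not shown in the abstract.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def modified_fraction(reads: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """For each GRE sequence, return the fraction of its reads with a modified substrate."""
    counts = defaultdict(lambda: [0, 0])  # GRE -> [modified reads, total reads]
    for gre, is_modified in reads:
        counts[gre][0] += int(is_modified)
        counts[gre][1] += 1
    return {gre: modified / total for gre, (modified, total) in counts.items()}


# Toy parsed reads (made-up sequences and states).
reads = [
    ("AGGAGG", True), ("AGGAGG", True), ("AGGAGG", False), ("AGGAGG", True),
    ("TTTCCC", False), ("TTTCCC", True),
]
fractions = modified_fraction(reads)
```

Because the numerator and denominator come from the same pool of reads for a given GRE, the fraction is normalized per construct and does not depend on how many reads each GRE received.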


Evolutionary Analysis of Transcriptional Regulation Mediated by Cdx2 in Rodents

10.1101/2021.03.01.433326 ◽

2021 ◽

Author(s):

Weizheng Liang ◽

Guipeng Li ◽

Huanhuan Cui ◽

Yukai Wang ◽

Wencheng Wei ◽

...

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Amino Acid ◽

Dna Binding ◽

Phenotypic Diversity ◽

Embryonic Stem ◽

Regulatory Elements ◽

Evolutionary Analysis ◽

Protein Sequence Analysis ◽

Systematic Analysis

Abstract: Differences in gene expression, which can arise from divergence in cis-regulatory elements or alterations in transcription factor binding specificity, are one of the most important causes of phenotypic diversity during evolution. By protein sequence analysis, we observed high sequence conservation in the DNA binding domain (DBD) of the transcription factor Cdx2 across many vertebrates, whereas three amino acid changes were exclusively found in mouse Cdx2 (mCdx2), suggesting potential positive selection in the mouse lineage. Multi-omics analyses were then carried out to investigate the effects of these changes. Surprisingly, there were no significant functional differences between mCdx2 and its rat homologue (rCdx2), and none of the three amino acid changes had any impact on its function. Finally, we used rat-mouse allodiploid embryonic stem cells (RMES) to study the cis effects of Cdx2-mediated gene regulation between the two rodents. Interestingly, whereas Cdx2 binding is largely divergent between mouse and rat, the transcriptional effect induced by Cdx2 is conserved to a much larger extent. Author summary: Our study 1) represented a first systematic analysis of species-specific adaptation in the DNA binding pattern of a transcription factor. Although the mouse-specific amino acid changes did not manifest functional impact in our system, several explanations may account for this (see the Discussion for details); 2) represented a first study of cis-regulation between two reproductively isolated species by using a novel allodiploid system; 3) demonstrated a higher conservation of transcriptional output than that of DNA binding, suggesting the evolvability/plasticity of the latter; 4) finally provided a rich data resource for Cdx2-mediated regulation, including gene expression, chromatin accessibility and DNA binding.


Early genetic responses in rat vascular tissue after simulated diving

Physiological Genomics ◽

10.1152/physiolgenomics.00073.2012 ◽

2012 ◽

Vol 44 (24) ◽

pp. 1201-1207 ◽

Cited By ~ 13

Author(s):

Ingrid Eftedal ◽

Arve Jørgensen ◽

Ragnhild Røsbjørgen ◽

Arnar Flatberg ◽

Alf O. Brubakk

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Plasminogen Activator ◽

Differential Gene Expression ◽

Vascular Function ◽

Plasminogen Activator Inhibitor ◽

Fine Tuning ◽

Plasminogen Activator Inhibitor 1 ◽

Differential Gene ◽

Activator Inhibitor

Diving causes a transient reduction of vascular function, but the mechanisms behind this are largely unknown. The aim of this study was therefore to analyze genetic reactions that may be involved in acute changes of vascular function in divers. Rats were exposed to 709 kPa of hyperbaric air (149 kPa Po2) for 50 min followed by postdive monitoring of vascular bubble formation and full genome microarray analysis of the aorta from diving rats (n = 8) and unexposed controls (n = 9). Upregulation of 23 genes was observed 1 h after simulated diving. The differential gene expression was characteristic of cellular responses to oxidative stress, with functions of upregulated genes including activation and fine-tuning of stress-responsive transcription, cytokine/cytokine receptor signaling, molecular chaperoning, and coagulation. By qRT-PCR, we verified increased transcription of neuron-derived orphan receptor-1 (Nr4a3), plasminogen activator inhibitor 1 (Serpine1), cytokine TWEAK receptor FN14 (Tnfrsf12a), transcription factor class E basic helix-loop-helix protein 40 (Bhlhe40), and adrenomedullin (Adm). Hypoxia-inducible transcription factor HIF1 subunit HIF1-α was stabilized in the aorta 1 h after diving, and after 4 h there was a fivefold increase in total protein levels of the procoagulant plasminogen activator inhibitor 1 (PAI1) in blood plasma from diving rats. The study did not have sufficient power for individual assessment of effects of hyperoxia and decompression-induced bubbles on postdive gene expression. However, differential gene expression in rats without venous bubbles was similar to that of all the diving rats, indicating that elevated Po2 instigated the observed genetic reactions.


Ship Classification in High-Resolution SAR Images Using Deep Learning of Small Datasets

Sensors ◽

10.3390/s18092929 ◽

2018 ◽

Vol 18 (9) ◽

pp. 2929 ◽

Cited By ~ 14

Author(s):

Yuanyuan Wang ◽

Chao Wang ◽

Hong Zhang

Keyword(s):

Deep Learning ◽

High Resolution ◽

Classification Accuracy ◽

Experimental Results ◽

Fine Tuning ◽

Classification Model ◽

Great Success ◽

Sar Images ◽

Convolutional Networks ◽

Ship Classification

With the capability to automatically learn discriminative features, deep learning has experienced great success in natural images but has rarely been explored for ship classification in high-resolution SAR images due to the training bottleneck caused by the small datasets. In this paper, convolutional neural networks (CNNs) are applied to ship classification by using SAR images with the small datasets. First, ship chips are constructed from high-resolution SAR images and split into training and validation datasets. Second, a ship classification model is constructed based on very deep convolutional networks (VGG). Then, VGG is pretrained via ImageNet, and fine tuning is utilized to train our model. Six scenes of COSMO-SkyMed images are used to evaluate our proposed model with regard to the classification accuracy. The experimental results reveal that (1) our proposed ship classification model trained by fine tuning achieves more than 95% average classification accuracy, even with 5-cross validation; (2) compared with other models, the ship classification model based on VGG16 achieves at least 2% higher accuracies for classification. These experimental results reveal the effectiveness of our proposed method.


Classification of Kidney Cancer Data Using Cost-Sensitive Hybrid Deep Learning Approach

Symmetry ◽

10.3390/sym12010154 ◽

2020 ◽

Vol 12 (1) ◽

pp. 154 ◽

Cited By ~ 5

Author(s):

Ho Sun Shon ◽

Erdenebileg Batbaatar ◽

Kyoung Ok Kim ◽

Eun Jong Cha ◽

Kyung-Ah Kim

Keyword(s):

Gene Expression ◽

Data Mining ◽

Deep Learning ◽

Kidney Cancer ◽

Gene Expression Data ◽

Genomic Data ◽

Classification Model ◽

Expression Data ◽

Cancer Data ◽

Prognosis Prediction

Recently, large-scale bioinformatics and genomic data have been generated using advanced biotechnology methods, thus increasing the importance of analyzing such data. Numerous data mining methods have been developed to process genomic data in the field of bioinformatics. We extracted significant genes for the prognosis prediction of 1157 patients using gene expression data from patients with kidney cancer. We then proposed an end-to-end, cost-sensitive hybrid deep learning (COST-HDL) approach with a cost-sensitive loss function for classification tasks on imbalanced kidney cancer data. Here, we combined the deep symmetric auto encoder; the decoder is symmetric to the encoder in terms of layer structure, with reconstruction loss for non-linear feature extraction and neural network with balanced classification loss for prognosis prediction to address data imbalance problems. Combined clinical data from patients with kidney cancer and gene data were used to determine the optimal classification model and estimate classification accuracy by sample type, primary diagnosis, tumor stage, and vital status as risk factors representing the state of patients. Experimental results showed that the COST-HDL approach was more efficient with gene expression data for kidney cancer prognosis than other conventional machine learning and data mining techniques. These results could be applied to extract features from gene biomarkers for prognosis prediction of kidney cancer and prevention and early diagnosis.
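The abstract mentions a cost-sensitive loss for imbalanced classes but does not give its form. One common way to make a classification loss cost-sensitive is to weight each sample's cross-entropy by the inverse frequency of its class; the Python sketch below illustrates that general idea only and should not be read as the COST-HDL loss. All names, labels, and probabilities are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs: List[List[float]], labels: Sequence[int],
                           weights: Dict[int, float]) -> float:
    """Weighted mean of -log p(true class), so rare-class errors cost more."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Toy imbalanced labels: three majority-class samples, one minority-class sample.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)

# Made-up predicted probabilities; the minority sample (last row) is misclassified.
probs = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
loss_weighted = weighted_cross_entropy(probs, labels, weights)
loss_plain = weighted_cross_entropy(probs, labels, {0: 1.0, 1: 1.0})
```

With these toy numbers the weighted loss is larger than the unweighted one, because the single misclassified minority sample contributes as much as the three majority samples combined.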


Insight into GATA1 transcriptional activity through interrogation of cis elements disrupted in human erythroid disorders

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1521754113 ◽

2016 ◽

Vol 113 (16) ◽

pp. 4434-4439 ◽

Cited By ~ 40

Author(s):

Aoi Wakabayashi ◽

Jacob C. Ulirsch ◽

Leif S. Ludwig ◽

Claudia Fiorini ◽

Makiko Yasuda ◽

...

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Transcriptional Activity ◽

Regulatory Elements ◽

Binding Motif ◽

Targeted Disruption ◽

Altered Gene Expression ◽

Whole Exome ◽

Altered Gene ◽

Insight Into

Whole-exome sequencing has been incredibly successful in identifying causal genetic variants and has revealed a number of novel genes associated with blood and other diseases. One limitation of this approach is that it overlooks mutations in noncoding regulatory elements. Furthermore, the mechanisms by which mutations in transcriptional cis-regulatory elements result in disease remain poorly understood. Here we used CRISPR/Cas9 genome editing to interrogate three such elements harboring mutations in human erythroid disorders, which in all cases are predicted to disrupt a canonical binding motif for the hematopoietic transcription factor GATA1. Deletions of as few as two to four nucleotides resulted in a substantial decrease (>80%) in target gene expression. Isolated deletions of the canonical GATA1 binding motif completely abrogated binding of the cofactor TAL1, which binds to a separate motif. Having verified the functionality of these three GATA1 motifs, we demonstrate strong evolutionary conservation of GATA1 motifs in regulatory elements proximal to other genes implicated in erythroid disorders, and show that targeted disruption of such elements results in altered gene expression. By modeling transcription factor binding patterns, we show that multiple transcription factors are associated with erythroid gene expression, and have created predictive maps modeling putative disruptions of their binding sites at key regulatory elements. Our study provides insight into GATA1 transcriptional activity and may prove a useful resource for investigating the pathogenicity of noncoding variants in human erythroid disorders.


Identifying Transcription Regulatory Elements in the Human and Mouse Genomes Using Tissue-specific Gene Expression Profiles

Journal of Integrative Bioinformatics ◽

10.1515/jib-2007-59 ◽

2007 ◽

Vol 4 (2) ◽

pp. 1-23

Author(s):

Amitava Karmaker ◽

Kihoon Yoon ◽

Mark Doderer ◽

Russell Kruzelock ◽

Stephen Kwek

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Transcription Factors ◽

Binding Sites ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Regulatory Elements ◽

Specific Gene ◽

Tissue Specific Gene ◽

Human And Mouse

Summary: Revealing the complex interaction between trans- and cis-regulatory elements and identifying these potential binding sites are fundamental problems in understanding gene expression. Progress in ChIP-chip technology facilitates identifying DNA sequences that are recognized by a specific transcription factor. However, protein-DNA binding is a necessary, but not sufficient, condition for transcription regulation. We need to demonstrate that their gene expression levels are correlated to further confirm a regulatory relationship. Here, instead of using a linear correlation coefficient, we used a non-linear function that seems to better capture possible regulatory relationships. By analyzing tissue-specific gene expression profiles of human and mouse, we delineate a list of transcription factor-gene pairs with highly correlated expression levels, which may have regulatory relationships. Using two closely related species (human and mouse), we perform comparative genome analysis to cross-validate the quality of our prediction. Our findings are confirmed by matching publicly available TFBS databases (like TRANSFAC and ConSite) and by reviewing biological literature. For example, according to our analysis, 80% and 85.71% of the target genes associated with the E2F5 and RELB transcription factors, respectively, have the corresponding known binding sites. We also substantiated our results on some oncogenes with the biomedical literature. Moreover, we performed further analysis on them and found that BCR and DEK may be regulated by some common transcription factors. Similar results for the BTG1, FCGR2B and LCK genes were also reported.


Genomic Characterization of Endothelial Enhancers Reveals a Multifunctional Role for NR2F2 in Regulation of Arteriovenous Gene Expression

Circulation Research ◽

10.1161/circresaha.119.316075 ◽

2020 ◽

Vol 126 (7) ◽

pp. 875-888 ◽

Cited By ~ 6

Author(s):

Samir Sissaoui ◽

Jun Yu ◽

Aimin Yan ◽

Rui Li ◽

Onur Yukselen ◽

...

Keyword(s):

Gene Expression ◽

Endothelial Cells ◽

Transcription Factor ◽

Deep Sequencing ◽

Chromatin Immunoprecipitation ◽

Regulatory Elements ◽

Bhlh Transcription Factor ◽

Genome Wide ◽

A Genome

Rationale: Significant progress has revealed transcriptional inputs that underlie regulation of artery and vein endothelial cell fates. However, little is known concerning genome-wide regulation of this process. Therefore, such studies are warranted to address this gap. Objective: To identify and characterize artery- and vein-specific endothelial enhancers in the human genome, thereby gaining insights into mechanisms by which blood vessel identity is regulated. Methods and Results: Using chromatin immunoprecipitation and deep sequencing for markers of active chromatin in human arterial and venous endothelial cells, we identified several thousand artery- and vein-specific regulatory elements. Computational analysis revealed that NR2F2 (nuclear receptor subfamily 2, group F, member 2) sites were overrepresented in vein-specific enhancers, suggesting a direct role in promoting vein identity. Subsequent integration of chromatin immunoprecipitation and deep sequencing data sets with RNA sequencing revealed that NR2F2 regulated 3 distinct aspects related to arteriovenous identity. First, consistent with previous genetic observations, NR2F2 directly activated enhancer elements flanking cell cycle genes to drive their expression. Second, NR2F2 was essential to directly activate vein-specific enhancers and their associated genes. Our genomic approach further revealed that NR2F2 acts with ERG (ETS-related gene) at many of these sites to drive vein-specific gene expression. Finally, NR2F2 directly repressed only a small number of artery enhancers in venous cells to prevent their activation, including a distal element upstream of the artery-specific transcription factor, HEY2 (hes related family bHLH transcription factor with YRPW motif 2). In arterial endothelial cells, this enhancer was normally bound by ERG, which was also required for arterial HEY2 expression. 
By contrast, in venous endothelial cells, NR2F2 was bound to this site, together with ERG, and prevented its activation. Conclusions: By leveraging a genome-wide approach, we revealed mechanistic insights into how NR2F2 functions in multiple roles to maintain venous identity. Importantly, characterization of its role at a crucial artery enhancer upstream of HEY2 established a novel mechanism by which artery-specific expression can be achieved.


Enhancer Landscapes Reveal Transcription Factor Network Dependencies in Chronic Lymphocytic Leukemia

Blood ◽

10.1182/blood.v126.23.436.436 ◽

2015 ◽

Vol 126 (23) ◽

pp. 436-436 ◽

Cited By ~ 1

Author(s):

Christopher J. Ott ◽

Alexander J. Federation ◽

Siddha Kasar ◽

Josephine L. Klitgaard ◽

Stacey M. Fernandes ◽

...

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Chronic Lymphocytic Leukemia ◽

Genome Sequencing ◽

Chromatin Structure ◽

Regulatory Elements ◽

Chromatin Accessibility ◽

Lymphocytic Leukemia ◽

Protein Coding ◽

Transcription Factor Network

Abstract Genome sequencing efforts of chronic lymphocytic leukemia have revealed mutations that disrupt protein-coding elements of the genome (Puente et al, 2011; Wang et al, 2011; Landau et al, 2013). Recently, comprehensive whole-genome sequencing efforts have begun to reveal the genetic aberrations that occur outside of protein-coding exons, many that may perturb gene regulatory sites (Puente et al, 2015). These include enhancer elements that make physical contact with gene promoters to regulate gene expression in a cell-type specific manner. While mutations certainly promote CLL leukemogenesis, epigenomic alterations may also play an important role in facilitating disease progression and maintenance by inducing the gene expression aberrations that have long been observed in CLL. Epigenomic alterations include chromatin structure changes that facilitate altered transcription and chromatin factor recruitment to regulatory elements. While comprehensive genome-wide DNA methylation studies have been performed on human cancers and normal cell counterparts including CLL, other comprehensive studies of cancer epigenomes have been lacking. We have completed an analysis of chromatin structures in a cohort of primary chronic lymphocytic leukemia (CLL) samples with comparisons to normal CD19+ B lymphocytes (n = 18 CLL samples, n = 5 normal B lymphocyte samples). We used chromatin accessibility assays (ATAC-seq) and genome-wide enhancer mapping (H3K27ac ChIP-seq) to comprehensively define the transcriptionally active chromatin landscape of CLL. We have discovered greater than 15,000 novel regulatory elements when compared to previously annotated regulatory elements. Moreover, sites within the loci of several hundred genes were found to have large regions of gained chromatin accessibility and H3K27 acetylation, revealing the appearance of aberrant enhancer activity. 
These gained enhancer elements correspond with increased gene expression and are found at gene loci such as LEF1, PLCG1, CTLA4, and ITGB1. We have also systematically identified the super-enhancers of CLL - large complex regulatory regions that possess unique tissue-specific regulatory capabilities. Many of these super-enhancers are found in normal B lymphocytes, yet the super-enhancers at the ITGB1 and LEF1 loci are CLL-specific and may be considered to facilitate leukemia-specific expression. We have found CLL-specific enhancers are also significantly associated with annotated CLL risk variants, and have identified enhancer-associated SNPs found within CLL-risk loci predicted to disrupt transcription factor binding sites. These include SNPs at the IRF8 and LEF1 loci that lead to the creation and destruction of SMAD4 and RXRA binding sites, respectively. Additionally, we have analyzed whole-genome sequencing data from a subset of our sample cohort. Mutational hotspots in the CXCR4 and BACH2 promoters occur within open, acetylated regions. Moreover, we discover recurrent mutations in enhancers of the ETS1 and ST6GAL1 loci that have not been previously annotated. Using a transcription factor network modeling approach, we used these global chromatin structure characteristics to determine networks that are highly active in CLL. We find that transcription factors such as NFATc1, E2F5, and NR3C2 are among the most interconnected transcription factors of the CLL genome, and their connectivity is significantly higher in CLL cells compared to normal B cells. In contrast, network profiling of CLL cells predicts loss of connectivity for MXI1, a negative regulator of the MYC oncogene. By treating cells with specific pharmacological inhibitors of NFAT family members including cyclosporin and FK506, we are able to reduce NFAT-mediated network connectivity, resulting in a selective loss of NFAT-bound enhancers. 
This leads to CLL cell death in vitro of both cell lines and primary CLL patient samples. Our results reveal the unique chromatin structure landscape of CLL for the first time, and identify the CLL-specific enhancer elements that confer the transcriptional dysregulation that has long been observed in this disease. Use of these chromatin structure analyses and enhancer landscapes has allowed us to construct the intrinsic transcription factor network of CLL, and determine a particular dependency on NFAT signaling for cell survival. Disclosures: No relevant conflicts of interest to declare.
