TITER: predicting translation initiation sites by deep learning

2017 ◽  
Author(s):  
Sai Zhang ◽  
Hailin Hu ◽  
Tao Jiang ◽  
Lei Zhang ◽  
Jianyang Zeng

Abstract
Motivation: Translation initiation is a key step in the regulation of gene expression. In addition to the annotated translation initiation sites (TISs), translation may also start at multiple alternative TISs (including both AUG and non-AUG codons), which makes it challenging to predict TISs and to study the underlying regulatory mechanisms. Meanwhile, the advent of several high-throughput sequencing techniques for profiling initiating ribosomes at single-nucleotide resolution, e.g., GTI-seq and QTI-seq, provides abundant data for systematically studying the general principles of translation initiation and for developing computational methods for TIS identification.
Methods: We have developed a deep learning based framework, named TITER, for accurately predicting TISs on a genome-wide scale based on QTI-seq data. TITER extracts the sequence features of translation initiation from the sequence contexts surrounding TISs using a hybrid neural network, and further integrates the prior preference of TIS codon composition into a unified prediction framework.
Results: Extensive tests demonstrated that TITER can greatly outperform the state-of-the-art prediction methods in identifying TISs. In addition, TITER was able to identify important sequence signatures for individual types of TIS codons, including a Kozak-sequence-like motif for the AUG start codon. Furthermore, the TITER prediction score can be related to the strength of translation initiation in various biological scenarios, including the repressive effect of upstream open reading frames (uORFs) on gene expression and the mutational effects influencing translation initiation efficiency.
Availability: TITER is available as open-source software and can be downloaded from https://github.com/zhangsaithu/
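The Methods summary names two ingredients: features learned from the sequence context around a candidate TIS, and a prior over TIS codon types. A minimal Python sketch of the data-preparation side follows. The flank length, the `codon_prior` table, the function names and the simple product used to merge the prior with a network output are all illustrative assumptions, not TITER's actual code or parameters.

```python
# Illustrative sketch only; function names and parameters are hypothetical.
import numpy as np

BASES = "ACGT"
BASE_INDEX = {b: i for i, b in enumerate(BASES)}

def extract_window(transcript, tis_pos, flank):
    """Return `flank` nt upstream, the 3-nt candidate codon at tis_pos (0-based),
    and `flank` nt downstream; positions beyond the transcript ends are padded with N."""
    padded = "N" * flank + transcript.upper() + "N" * flank
    # Original index i sits at padded index i + flank, so the window
    # [tis_pos - flank, tis_pos + 3 + flank) starts at padded index tis_pos.
    return padded[tis_pos:tis_pos + 2 * flank + 3]

def one_hot(seq):
    """Encode a DNA string as an (L, 4) float array; N and other symbols become all-zero rows."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in BASE_INDEX:
            out[pos, BASE_INDEX[base]] = 1.0
    return out

def combined_score(model_prob, codon, codon_prior):
    """Hypothetical merge of a classifier output with a per-codon prior (simple product).
    Codons missing from the prior get a score of 0."""
    return model_prob * codon_prior.get(codon, 0.0)
```

In such a setup, `one_hot(extract_window(...))` would be the network input, and the prior could, for example, be estimated from how often each codon type appears among observed TISs.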

2009 ◽  
Vol 191 (10) ◽  
pp. 3203-3211 ◽  
Author(s):  
Karla D. Passalacqua ◽  
Anjana Varadarajan ◽  
Brian D. Ondov ◽  
David T. Okou ◽  
Michael E. Zwick ◽  
...  

ABSTRACT Although gene expression has been studied in bacteria for decades, many aspects of the bacterial transcriptome remain poorly understood. Transcript structure, operon linkages, and information on absolute abundance all provide valuable insights into gene function and regulation, but none has ever been determined on a genome-wide scale for any bacterium. Indeed, these aspects of the prokaryotic transcriptome have been explored on a large scale in only a few instances, and consequently little is known about the absolute composition of the mRNA population within a bacterial cell. Here we report the use of a high-throughput sequencing-based approach to assemble the first comprehensive, single-nucleotide-resolution view of a bacterial transcriptome. We sampled the Bacillus anthracis transcriptome under a variety of growth conditions and showed that the data provide an accurate and high-resolution map of transcript start sites and operon structure throughout the genome. Further, the sequence data identified previously nonannotated regions with significant transcriptional activity and enhanced the accuracy of existing genome annotations. Finally, our data provide estimates of absolute transcript abundance and suggest that there is significant transcriptional heterogeneity within a clonal, synchronized bacterial population. Overall, our results offer an unprecedented view of gene expression and regulation in a bacterial cell.
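The abstract reports mapping operon structure from sequencing coverage but does not spell out the rule. One simple heuristic, assumed here purely for illustration and not taken from the paper, links two neighbouring genes on the same strand when strand-specific read coverage never drops below a threshold across the gap between them. The `Gene` class, the `min_cov` threshold and the coverage layout are hypothetical.

```python
# Illustrative sketch only; class, function and threshold are hypothetical.
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"

def linked_pairs(genes, coverage, min_cov):
    """Return name pairs (in genomic coordinate order) of neighbouring same-strand genes
    whose intergenic gap has strand-specific coverage >= min_cov at every base.
    coverage maps strand -> per-base read depth along the chromosome."""
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for a, b in zip(ordered, ordered[1:]):
        if a.strand != b.strand:
            continue  # neighbours on opposite strands are never linked
        gap = coverage[a.strand][a.end:b.start]
        # Abutting or overlapping genes give an empty gap, and all() of nothing is True.
        if all(depth >= min_cov for depth in gap):
            pairs.append((a.name, b.name))
    return pairs
```

A coverage dip to zero between two genes splits them into separate transcription units under this rule, which is why real analyses usually tune the threshold to sequencing depth.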


2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Godfrey Grech ◽  
Marieke von Lindern

Organisation of RNAs into functional subgroups that are translated in response to extrinsic and intrinsic factors underlies a relatively unexplored mode of gene expression modulation that drives cell fate in the same manner as regulation of the transcriptome by transcription factors. Recent studies on the molecular mechanisms of inflammatory responses and haematological disorders clearly indicate that posttranscriptional regulation at the level of translation initiation, mRNA stability, and protein isoform synthesis is implicated in the tight regulation of gene expression. This paper outlines how these posttranscriptional control mechanisms, including control at the level of translation initiation factors and the role of RNA binding proteins, affect haematopoiesis. The clinical relevance of these mechanisms in haematological disorders clearly indicates their potential therapeutic implications and the need for molecular tools that allow measurement at the level of translational control. Although the importance of miRNAs in translational control is well recognised and extensively studied, this paper does not give a detailed account of this level of control.


2017 ◽  
Author(s):  
Adam J. Hockenberry ◽  
Aaron J. Stern ◽  
Luís A.N. Amaral ◽  
Michael C. Jewett

Abstract The Shine-Dalgarno (SD) sequence is often found upstream of protein-coding genes across the bacterial kingdom, where it enhances start codon recognition via hybridization to the anti-SD (aSD) sequence on the small ribosomal subunit. Despite widespread conservation of the aSD sequence, the proportion of SD-led genes within a genome varies widely across species, and the evolutionary pressures shaping this variation remain largely unknown. Here, we conduct a phylogenetically informed analysis and show that species capable of rapid growth have a significantly higher proportion of SD-led genes in their genome, suggesting a role for SD sequences in meeting the protein production demands of rapidly growing species. Further, we show that utilization of the SD sequence mechanism co-varies with: i) genomic traits that are indicative of efficient translation, and ii) optimal growth temperatures. In contrast to prior surveys, our results demonstrate that variation in translation initiation mechanisms across genomes is largely predictable, and that SD sequence utilization is part of a larger suite of translation-associated traits whose diversity is driven by the differential growth strategies of individual species.
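The proportion of SD-led genes is the central quantity in this abstract. As a rough illustration of how such a proportion can be computed, the sketch below flags a gene as SD-led when a short substring of the commonly cited consensus AGGAGG appears in a window upstream of the start codon. This string-matching proxy, the k-mer length and the window bounds are assumptions for illustration; it is not the classification criterion used in the study, which works from aSD hybridization.

```python
# Illustrative motif-matching proxy; names, k and window are hypothetical choices.
SD_CONSENSUS = "AGGAGG"  # commonly cited Shine-Dalgarno core consensus, DNA alphabet

def sd_kmers(k):
    """All length-k substrings of the consensus."""
    return {SD_CONSENSUS[i:i + k] for i in range(len(SD_CONSENSUS) - k + 1)}

def is_sd_led(upstream, k=4, window=(-20, -4)):
    """`upstream` ends immediately 5' of the start codon (its last base is position -1).
    `window` is a half-open [lo, hi) range of negative positions; the default scans -20..-5.
    Returns True if any k-mer of the consensus occurs in that window."""
    n = len(upstream)
    lo, hi = window
    # max(0, ...) keeps short sequences from wrapping around via negative indices.
    region = upstream.upper()[max(0, n + lo):max(0, n + hi)]
    kmers = sd_kmers(k)
    return any(region[i:i + k] in kmers for i in range(len(region) - k + 1))

def sd_fraction(upstream_seqs, **kwargs):
    """Fraction of genes whose upstream sequence passes is_sd_led."""
    seqs = list(upstream_seqs)
    if not seqs:
        return 0.0
    return sum(is_sd_led(s, **kwargs) for s in seqs) / len(seqs)
```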


2021 ◽  
Vol 12 ◽  
Author(s):  
Huiyuan Wang ◽  
Sheng Liu ◽  
Xiufang Dai ◽  
Yongkang Yang ◽  
Yunjun Luo ◽  
...  

Populus trichocarpa (P. trichocarpa) is a model tree for the investigation of wood formation. In recent years, researchers have generated a large amount of high-throughput sequencing data in P. trichocarpa. However, no comprehensive database providing multi-omics associations for the investigation of secondary growth in response to diverse stresses has been reported. Therefore, we developed a public repository that presents comprehensive measurements of gene expression and post-transcriptional regulation by integrating 144 RNA-Seq, 33 ChIP-seq, and six single-molecule real-time (SMRT) isoform sequencing (Iso-seq) libraries prepared from tissues subjected to different stresses. All samples from the different studies were analyzed with unified parameters to obtain gene expression, co-expression networks, and differentially expressed genes (DEGs), which allowed comparison of results across studies and treatments. In addition to gene expression, we also identified and deposited pre-processed data on alternative splicing (AS), alternative polyadenylation (APA), and alternative transcription initiation (ATI). The post-transcriptional regulation, differential expression, and co-expression network datasets were integrated into a new P. trichocarpa Stem Differentiating Xylem (PSDX) database, which further highlights gene families of RNA-binding proteins and stress-related genes. PSDX also provides tools for data query and visualization, a genome browser, and BLAST for sequence-based queries. Much of the data is also available for bulk download. PSDX supports research on secondary growth in response to stress in P. trichocarpa and should provide new insights useful for improving stress tolerance in woody plants.
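The abstract mentions co-expression networks computed with unified parameters but does not state the method. As an assumption for illustration only, a common simple construction connects two genes when the Pearson correlation of their expression profiles across samples meets a threshold. The `r_min` cutoff and the function name are hypothetical, and this is not PSDX's pipeline.

```python
# Illustrative sketch only; threshold and function name are hypothetical.
import numpy as np

def coexpression_edges(expr, gene_ids, r_min=0.9):
    """expr: genes x samples array (e.g. log-transformed expression).
    Returns (gene_a, gene_b, r) for gene pairs with Pearson r >= r_min.
    Genes with constant expression give NaN correlations and are never connected."""
    expr = np.asarray(expr, dtype=float)
    r = np.corrcoef(expr)  # rows are treated as variables, i.e. genes
    edges = []
    for i in range(len(gene_ids)):
        for j in range(i + 1, len(gene_ids)):
            if r[i, j] >= r_min:
                edges.append((gene_ids[i], gene_ids[j], float(r[i, j])))
    return edges
```

Only positively correlated pairs are kept here; an unsigned network would compare `abs(r[i, j])` instead.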


2021 ◽  
Vol 8 ◽  
Author(s):  
Vandana Yadav ◽  
Inayat Ullah Irshad ◽  
Hemant Kumar ◽  
Ajeet K. Sharma

Quantitative prediction of protein synthesis requires accurate translation initiation and codon translation rates. Ribosome profiling data, which provide the steady-state distribution of relative ribosome occupancies along a transcript, can be used to extract these rate parameters. Various methods have been developed in the past few years to estimate translation initiation and codon translation rates from ribosome profiling data. In this review, we provide a detailed analysis of the key methods employed to extract the translation rate parameters from ribosome profiling data. We further discuss how these approaches were used to decipher the role of various structural and sequence-based features of mRNA molecules in the regulation of gene expression. The use of these accurate rate parameters in computational modeling of protein synthesis may provide new insights into the kinetic control of gene expression.
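A simple relation shows why occupancy profiles constrain rates. In the low-density limit of exclusion-process models of translation, ribosomes rarely block one another, so the steady-state flux is J ≈ α (the initiation rate) and also J = k_i ρ_i at every codon i, where k_i is the elongation rate and ρ_i the ribosome density. Hence k_i = α / ρ_i, and relative occupancies alone give rates only up to a common scale factor. The sketch below assumes this low-density approximation and hypothetical function names; it is not any particular method covered by the review.

```python
# Low-density sketch; function names and the pseudocount are hypothetical.
import numpy as np

def relative_elongation_rates(reads_per_codon, pseudocount=0.5):
    """With negligible ribosome interference, J = k_i * rho_i is the same at every codon,
    so k_i is proportional to 1 / rho_i. Returns rates scaled to mean 1, since read counts
    fix only relative densities. The pseudocount avoids division by zero."""
    rho = np.asarray(reads_per_codon, dtype=float) + pseudocount
    k = 1.0 / rho
    return k / k.mean()

def elongation_rates_from_density(density, initiation_rate):
    """If absolute densities (ribosomes per codon, all > 0) and the initiation rate are known,
    the flux balance J = alpha = k_i * rho_i gives k_i = alpha / rho_i."""
    rho = np.asarray(density, dtype=float)
    return initiation_rate / rho
```

Codons with more footprints come out slower under this relation, which matches the intuition that ribosomes accumulate where elongation is slow.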


2019 ◽  
Author(s):  
Kathrin Witmer ◽  
Sabine AK Fraschka ◽  
Dina Vlachou ◽  
Richárd Bártfai ◽  
George K Christophides

ABSTRACT Epigenetic regulation of gene expression is an important attribute in the survival and adaptation of the malaria parasite Plasmodium in its human host. Our understanding of epigenetic regulation of gene expression in Plasmodium developmental stages beyond asexual replication in the mammalian host is sparse. We used chromatin immunoprecipitation (ChIP) and RNA sequencing to create an epigenetic and transcriptomic map of the development of the murine parasite Plasmodium berghei from asexual blood stages to male and female gametocytes, and finally to ookinetes. We show that heterochromatin protein 1 (HP1) almost exclusively associates with variantly expressed gene families at subtelomeric regions and remains stable across stages and various parasite lines. Variant expression based on heterochromatic silencing is observed in only very few genes. In contrast, the active histone mark histone 3 lysine 9 acetylation (H3K9ac) is found between heterochromatin boundaries and occurs as a sharp peak around the start codon of ribosomal protein genes. H3K9ac occupancy positively correlates with transcript abundance in asexual blood stages, male gametocytes, and ookinetes. Interestingly, H3K9ac occupancy does not correlate with transcript abundance in female gametocytes. Finally, we identify novel DNA motifs upstream of ookinete-specific genes thought to be involved in transcriptional activation upon fertilization.
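The abstract reports that H3K9ac occupancy correlates with transcript abundance in some stages but not in female gametocytes, without naming the statistic used. A rank-based (Spearman) correlation over genes is one reasonable choice for comparing ChIP-seq and RNA-seq signals, because it ignores their different scales. The sketch assumes per-gene occupancy and expression values have already been computed; the function names are hypothetical and this is not the authors' analysis.

```python
# Illustrative sketch of Spearman correlation; function names are hypothetical.
import numpy as np

def average_ranks(x):
    """1-based ranks, with tied values assigned the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1  # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the average ranks of a and b."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])
```

Usage: `spearman(h3k9ac_per_gene, expression_per_gene)` for each stage, where both hypothetical inputs list genes in the same order.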


2019 ◽  
Vol 294 (28) ◽  
pp. 10998-11010 ◽  
Author(s):  
Xiao-Juan Yang ◽  
Hong Zhu ◽  
Shi-Rong Mu ◽  
Wen-Juan Wei ◽  
Xun Yuan ◽  
...  

The Y-box binding protein 1 (YB-1) is a member of the cold shock domain (CSD) protein family and is recognized as an oncogenic factor in several solid tumors. By binding to RNA, YB-1 participates in several steps of posttranscriptional regulation of gene expression, including mRNA splicing, stability, and translation; microRNA processing; and stress granule assembly. However, the mechanisms of YB-1–mediated regulation of RNAs remain unclear. Previously, we used both systematic evolution of ligands by exponential enrichment (SELEX) and individual-nucleotide resolution UV cross-linking and immunoprecipitation coupled with RNA-Seq (iCLIP-Seq) analyses, which defined the RNA-binding consensus sequence of YB-1 as CA(U/C)C. We also reported that through binding to its core motif CAUC in primary transcripts, YB-1 regulates the alternative splicing of a CD44 variable exon and the biogenesis of miR-29b-2 at both the Drosha and Dicer steps. To elucidate the molecular basis of YB-1–RNA interactions, we report high-resolution crystal structures of the YB-1 CSD in complex with different RNA oligos at 1.7 Å resolution. The structures revealed that the CSD interacts with RNA mainly through π–π stacking interactions assembled by four highly conserved aromatic residues. Interestingly, the YB-1 CSD forms a homodimer in solution, and we observed that two residues at the dimer interface, Tyr-99 and Asp-105, are important for YB-1 CSD dimerization. Substituting these two residues with Ala reduced the CSD's RNA-binding activity and abrogated the splicing activation of YB-1 targets. The YB-1 CSD–RNA structures presented here at atomic resolution provide mechanistic insights into gene expression regulated by CSD-containing proteins.
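The consensus CA(U/C)C given in the abstract can be scanned for directly, which makes concrete what it matches (CAUC or CACC). This is an illustration of the consensus pattern, not a predictor of YB-1 binding; the function names are hypothetical, and accepting DNA input by converting T to U is a convenience assumption.

```python
# Illustrative motif scan; function names are hypothetical.
import re

# CA(U/C)C written as a regular expression; the lookahead also reports overlapping matches.
YB1_MOTIF = re.compile(r"(?=(CA[UC]C))")

def find_yb1_sites(rna):
    """0-based start positions of CA(U/C)C matches in an RNA (or DNA) sequence."""
    seq = rna.upper().replace("T", "U")
    return [m.start() for m in YB1_MOTIF.finditer(seq)]

def core_sites(rna):
    """Subset of sites that match the core motif CAUC."""
    seq = rna.upper().replace("T", "U")
    return [p for p in find_yb1_sites(seq) if seq[p:p + 4] == "CAUC"]
```

A four-letter motif like this occurs frequently by chance, so in practice such scans are combined with binding data such as the SELEX and iCLIP-Seq results described above.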


Author(s):  
Fanli Meng ◽  
Hainan Zhao ◽  
Bo Zhu ◽  
Tao Zhang ◽  
Mingyu Yang ◽  
...  

Abstract Enhancers located in introns are abundant and play a major role in the regulation of gene expression in mammalian species. By contrast, the functions of intronic enhancers in plants remain largely unexplored, and only a handful of plant intronic enhancers have been reported. We performed a genome-wide prediction of intronic enhancers in Arabidopsis thaliana using open chromatin signatures based on DNase I sequencing. We identified 941 candidate intronic enhancers associated with 806 genes in seedling tissue and 1,271 intronic enhancers associated with 1,069 genes in floral tissue. We validated the function of 15 of 21 (71%) of the predicted intronic enhancers in transgenic assays using a reporter gene. We also created deletion lines of three intronic enhancers associated with two different genes using CRISPR/Cas. Deletion of these enhancers, which span key transcription factor binding sites, did not abolish gene expression but caused varying levels of transcriptional repression of their cognate genes. Remarkably, the transcriptional repression in the deletion lines occurred at specific developmental stages and resulted in distinct phenotypic effects on plant morphology and development. These results indicate that the three intronic enhancers are important in fine-tuning the tissue- and development-specific expression of their cognate genes.


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 2392-2392 ◽  
Author(s):  
Ilango Balakrishnan ◽  
Xiaodong Yang ◽  
Beverly Torok-Storb ◽  
Jay Hesselberth ◽  
Manoj M Pillai

Abstract 2392 There is increasing recognition of the role of small noncoding RNAs in post-transcriptional regulation of gene expression in diverse tissues of eukaryotic organisms, including vertebrates. MicroRNAs (miRNAs) are the best studied of these small RNAs and are thought to act by binding to the 3' untranslated regions (3' UTRs) of mature mRNAs in a sequence-specific fashion, preventing the initiation of peptide translation and/or initiating mRNA degradation. Recent evidence suggests that miRNA-based regulation might involve binding to regions other than 3' UTRs, including coding regions. Current approaches to defining miRNA-mRNA interactions are mostly restricted to bioinformatic prediction, protein down-regulation following in vitro transfection of miRNA precursors, and luciferase assays to determine binding to 3' UTRs. None of these methods, however, shows direct interaction between a specific miRNA and its purported target RNA. Bioinformatics-based approaches are also prone to false-positive and false-negative results given the short length of sequence matching and their reliance on heuristics and cross-species conservation. Newer genome-wide approaches such as HITS-CLIP (High-Throughput Sequencing following Cross-Linking Immunoprecipitation, or CLIP-Seq) overcome some of these limitations by directly isolating the miRNA-mRNA interactome bound to argonaute (AGO), a critical component of the RNA-induced silencing complex (RISC)1. HITS-CLIP uses the ability of ultraviolet (UV) light to cross-link RNAs to proteins in close proximity. The cross-linked miRNA-mRNA-AGO complexes are then isolated, and the RNA is reverse transcribed into cDNA libraries that are sequenced by next-generation sequencing (NGS). Given the widespread role of miRNAs in several vertebrate tissues, we hypothesized that miRNA regulation of gene expression is operant in the hematopoietic microenvironment (ME) and thus contributes to the regulation of hematopoiesis.
We therefore used HITS-CLIP to analyze the miRNA-mRNA interactome of three key cellular components of the ME: stromal cells, endothelium, and macrophages. We have previously reported on the use of the stromal cell lines Hs27a and Hs5 to define specific functional niches within the ME. Hs27a can functionally support primitive hematopoietic stem and progenitor cells (HSPC) in cobblestone areas (CSAs) and expresses high levels of factors known to support HSPC, such as SDF1, Jagged1, and Angiopoietin1. In contrast, Hs5 drives HSPC toward mature lineages and secretes high levels of cytokines such as IL1, IL6, and GCSF. Human umbilical vein endothelial cells (HUVECs) and MCSF-treated CD14+ cells were used for the endothelial and macrophage cultures, respectively. The HITS-CLIP datasets from each of these populations were enriched for a putative binding site for miR-9 in the coding region of Matrix Metalloproteinase 2 (MMP2) mRNA. MMP2 belongs to a family of endopeptidases critical for remodeling of the extracellular matrix in several tissues and for the egress/homing of HSPC to their functional niches in the ME. Functional binding of miR-9 to MMP2 was validated by Western blotting of stromal cells transfected with miR-9, which revealed a >50% reduction in protein levels compared with control-transfected cells. This was also confirmed by gelatin zymography, which showed significantly reduced MMP2 activity in stromal cells transfected with miR-9. Finally, to confirm direct binding of miR-9 to the putative binding region on the MMP2 transcript, we cloned this microRNA responsive region (MRE) downstream of the Renilla luciferase gene and measured its activity in luciferase assays. MiR-9 transfection down-regulated luciferase activity by >50%, confirming direct binding to the MRE.
Our results show that genome-wide approaches such as HITS-CLIP can be used to define in vivo miRNA-mRNA interactions in the ME and should be considered in studies that define such interactions, given the significant false-positive and false-negative results associated with approaches based on bioinformatics alone. The approach can also define specific interactions between miRNAs and mRNAs, such as MMP2, of relevance to regulation of the hematopoietic ME. Disclosures: No relevant conflicts of interest to declare.
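The abstract attributes false positives in bioinformatic prediction partly to the short length of the sequence match. A worked count makes this concrete: a specific 7-nt site is expected about once per 4^7 = 16,384 positions of uniformly random sequence, so a transcriptome-wide scan returns many chance matches. The sketch below assumes the common convention that the seed spans miRNA nucleotides 2 to 8, and the miRNA sequence in the usage note is made up; it is not the authors' HITS-CLIP analysis.

```python
# Illustrative seed-match sketch; function names and the toy sequence are hypothetical.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna, start=2, end=8):
    """mRNA site (5'->3') complementary to miRNA positions start..end (1-based, inclusive),
    i.e. the reverse complement of the seed."""
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    return seed.translate(COMPLEMENT)[::-1]

def find_sites(mrna, site):
    """0-based start positions of exact (possibly overlapping) site matches."""
    seq = mrna.upper().replace("T", "U")
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]

def expected_chance_matches(length, k=7):
    """Expected exact matches to one k-mer in a uniform random sequence of this length."""
    return max(length - k + 1, 0) / 4 ** k
```

For the made-up miRNA "AUCCGAAGGUACG", `seed_site` gives "CUUCGGA", and `expected_chance_matches(16390)` is exactly 1.0, so even a modest set of transcripts contains many seed matches with no regulatory meaning.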


2018 ◽  
Author(s):  
Zhaolian Lu ◽  
Zhenguo Lin

Abstract Transcription initiation is finely regulated to ensure the proper expression and function of genes. The regulation of transcription initiation in response to various environmental cues in the model organism Saccharomyces cerevisiae has not been systematically investigated. In this study, we generated quantitative maps of transcription start sites (TSSs) at single-nucleotide resolution for S. cerevisiae grown in nine different conditions using no-amplification non-tagging Cap analysis of gene expression (nAnT-iCAGE) sequencing. Based on 337 million uniquely mapped CAGE tags, we mapped ~1 million well-supported TSSs, suggesting highly pervasive transcription initiation in the compact yeast genome. The comprehensive TSS maps allowed us to identify core promoters for ~96% of verified protein-coding genes and to revise the predicted translation start codon for 183 genes. We found that 56% of yeast genes have at least two core promoters and that alternative usage of different core promoters within a gene is widespread in response to changing environments. More importantly, most core promoter shifts are coupled with differential gene expression, indicating that core promoter shifts might play an important role in controlling the transcriptional activity of yeast genes. Based on their dynamic activities, we classified yeast core promoters as constitutive (55%) or inducible (45%). The two classes of core promoters exhibit distinct patterns in transcriptional abundance, chromatin structure, promoter shape, and sequence context. In summary, the quantitative TSS maps generated in this study improved the annotation of the yeast genome and revealed the highly pervasive and dynamic nature of transcription initiation in yeast.
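CAGE tags pile up at individual TSSs, and nearby TSSs are usually grouped into clusters before core promoters are called; promoter shape is then often summarized by the interquantile width of the tag distribution. The sketch below uses a simple distance-based grouping and a 10th-to-90th percentile width. The `max_dist` of 20 bp, the quantiles and the function names are assumptions, not necessarily the parameters or pipeline used in this study.

```python
# Illustrative sketch; function names, max_dist and quantiles are hypothetical.
def cluster_tss(positions_counts, max_dist=20):
    """Group TSSs on one strand of one chromosome: a TSS within max_dist bp of the
    previous TSS joins its cluster. positions_counts maps position -> CAGE tag count."""
    clusters = []
    for pos in sorted(positions_counts):
        if clusters and pos - clusters[-1][-1] <= max_dist:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return [{p: positions_counts[p] for p in c} for c in clusters]

def interquantile_width(cluster, lo=0.1, hi=0.9):
    """Width (bp, inclusive) between the positions where the cumulative tag
    fraction first reaches lo and hi."""
    total = sum(cluster.values())
    cum = 0
    q_lo = q_hi = None
    for pos in sorted(cluster):
        cum += cluster[pos]
        if q_lo is None and cum >= lo * total:
            q_lo = pos
        if q_hi is None and cum >= hi * total:
            q_hi = pos
    return q_hi - q_lo + 1
```

A broad promoter yields a wide interquantile width, while a sharp one concentrates most tags within a few bases; comparing such widths across conditions is one way to look at promoter shape.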

