Structure and Complexity of a Bacterial Transcriptome

2009 ◽  
Vol 191 (10) ◽  
pp. 3203-3211 ◽  
Author(s):  
Karla D. Passalacqua ◽  
Anjana Varadarajan ◽  
Brian D. Ondov ◽  
David T. Okou ◽  
Michael E. Zwick ◽  
...  

ABSTRACT Although gene expression has been studied in bacteria for decades, many aspects of the bacterial transcriptome remain poorly understood. Transcript structure, operon linkages, and information on absolute abundance all provide valuable insights into gene function and regulation, but none has ever been determined on a genome-wide scale for any bacterium. Indeed, these aspects of the prokaryotic transcriptome have been explored on a large scale in only a few instances, and consequently little is known about the absolute composition of the mRNA population within a bacterial cell. Here we report the use of a high-throughput sequencing-based approach in assembling the first comprehensive, single-nucleotide resolution view of a bacterial transcriptome. We sampled the Bacillus anthracis transcriptome under a variety of growth conditions and showed that the data provide an accurate and high-resolution map of transcript start sites and operon structure throughout the genome. Further, the sequence data identified previously nonannotated regions with significant transcriptional activity and enhanced the accuracy of existing genome annotations. Finally, our data provide estimates of absolute transcript abundance and suggest that there is significant transcriptional heterogeneity within a clonal, synchronized bacterial population. Overall, our results offer an unprecedented view of gene expression and regulation in a bacterial cell.
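The abstract describes reading transcript start sites and operon structure off sequence data. One simple way to picture this is to scan per-base read coverage on one strand. A rise above a coverage threshold marks a candidate 5′ end. Two adjacent genes covered by one unbroken segment are candidates for operon linkage. This is a hypothetical sketch, not the authors' pipeline. The threshold `min_cov`, the function names, and the toy coverage values are all assumptions.

```python
from typing import List, Tuple

def transcribed_segments(coverage: List[int], min_cov: int = 5) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs where per-base coverage >= min_cov.

    Hypothetical heuristic: on the plus strand, each run start is a candidate
    transcript start site and each run end a candidate 3' end.
    """
    segments = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_cov and start is None:
            start = pos                      # coverage rises: candidate 5' end
        elif depth < min_cov and start is not None:
            segments.append((start, pos))    # coverage drops: candidate 3' end
            start = None
    if start is not None:                    # run reaches the end of the region
        segments.append((start, len(coverage)))
    return segments

def operon_linked(segments: List[Tuple[int, int]],
                  gene_a: Tuple[int, int], gene_b: Tuple[int, int]) -> bool:
    """True if a single continuous segment spans both genes (hypothetical linkage test)."""
    lo = min(gene_a[0], gene_b[0])
    hi = max(gene_a[1], gene_b[1])
    return any(s <= lo and hi <= e for s, e in segments)

# Toy plus-strand coverage: genes at [2, 6) and [6, 10) are covered without a break.
cov = [0, 0, 8, 9, 9, 7, 6, 8, 9, 8, 0, 0]
segs = transcribed_segments(cov)
print(segs)                                  # [(2, 10)]
print(operon_linked(segs, (2, 6), (6, 10)))  # True
```

A single fixed threshold is the simplest possible choice. Real data would call for strand-specific coverage, smoothing, and gene-specific expression levels.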

2017 ◽  
Author(s):  
Sai Zhang ◽  
Hailin Hu ◽  
Tao Jiang ◽  
Lei Zhang ◽  
Jianyang Zeng

Abstract Motivation Translation initiation is a key step in the regulation of gene expression. In addition to the annotated translation initiation sites (TISs), translation may also start at multiple alternative TISs (including both AUG and non-AUG codons), which makes it challenging to predict TISs and study the underlying regulatory mechanisms. Meanwhile, the advent of several high-throughput sequencing techniques for profiling initiating ribosomes at single-nucleotide resolution, e.g., GTI-seq and QTI-seq, provides abundant data for systematically studying the general principles of translation initiation and for developing computational methods for TIS identification. Methods We have developed a deep learning based framework, named TITER, for accurately predicting TISs on a genome-wide scale based on QTI-seq data. TITER extracts the sequence features of translation initiation from the sequence contexts surrounding TISs using a hybrid neural network and further integrates the prior preference of TIS codon composition into a unified prediction framework. Results Extensive tests demonstrated that TITER can greatly outperform the state-of-the-art prediction methods in identifying TISs. In addition, TITER was able to identify important sequence signatures for individual types of TIS codons, including a Kozak-sequence-like motif for the AUG start codon. Furthermore, the TITER prediction score can be related to the strength of translation initiation in various biological scenarios, including the repressive effect of upstream open reading frames (uORFs) on gene expression and the mutational effects influencing translation initiation efficiency. Availability TITER is available as open-source software and can be downloaded from https://github.com/zhangsaithu/
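TITER scores the sequence context around candidate TISs, including non-AUG codons. The sketch below shows a hypothetical preprocessing step for that kind of model. It is not TITER's code. It enumerates ATG and the nine near-cognate codons that differ from ATG at one position, which is one common way to define non-AUG candidates; TITER's own candidate set may differ. It then extracts a fixed window around each candidate and one-hot encodes it as neural-network input. Sequences use the DNA alphabet (ATG for AUG). The flank lengths and function names are assumptions.

```python
from itertools import product

BASES = "ACGT"

def near_cognate_codons(ref: str = "ATG") -> set:
    """Codons that differ from ref at exactly one position (e.g. CTG, GTG, ACG)."""
    return {"".join(c) for c in product(BASES, repeat=3)
            if sum(a != b for a, b in zip(c, ref)) == 1}

def candidate_tis_windows(seq: str, upstream: int = 10, downstream: int = 10):
    """Yield (position, codon, window) for ATG and near-cognate candidate codons.

    The window covers `upstream` nt before the codon, the codon itself, and
    `downstream` nt after it; candidates too close to either end are skipped.
    """
    seq = seq.upper()
    codons = {"ATG"} | near_cognate_codons()
    for i in range(upstream, len(seq) - 3 - downstream + 1):
        codon = seq[i:i + 3]
        if codon in codons:
            yield i, codon, seq[i - upstream:i + 3 + downstream]

def one_hot(window: str):
    """Encode a DNA window as 4-element indicator vectors in A, C, G, T order."""
    return [[int(base == b) for b in BASES] for base in window]

seq = "GCCACCATGGCG"  # toy sequence with a Kozak-like context before ATG
print(list(candidate_tis_windows(seq, upstream=3, downstream=3)))
# [(6, 'ATG', 'ACCATGGCG')]
```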


Parasitology ◽  
2009 ◽  
Vol 136 (5) ◽  
pp. 469-485 ◽  
Author(s):  
A. S. TAFT ◽  
J. J. VERMEIRE ◽  
J. BERNIER ◽  
S. R. BIRKELAND ◽  
M. J. CIPRIANO ◽  
...  

SUMMARY Infection of the snail, Biomphalaria glabrata, by the free-swimming miracidial stage of the human blood fluke, Schistosoma mansoni, and its subsequent development to the parasitic sporocyst stage is critical to establishment of viable infections and continued human transmission. We performed a genome-wide expression analysis of the S. mansoni miracidia and developing sporocyst using Long Serial Analysis of Gene Expression (LongSAGE). Five cDNA libraries were constructed from miracidia and in vitro cultured 6- and 20-day-old sporocysts maintained in sporocyst medium (SM) or in SM conditioned by previous cultivation with cells of the B. glabrata embryonic (Bge) cell line. We generated 21 440 SAGE tags and mapped 13 381 to the S. mansoni gene predictions (v4.0e) either by estimating theoretical 3′ UTR lengths or using existing 3′ EST sequence data. Overall, 432 transcripts were found to be differentially expressed amongst all 5 libraries. In total, 172 tags were differentially expressed between miracidia and 6-day conditioned sporocysts and 152 were differentially expressed between miracidia and 6-day unconditioned sporocysts. In addition, 53 and 45 tags, respectively, were differentially expressed in 6-day and 20-day cultured sporocysts, due to the effects of exposure to Bge cell-conditioned medium.
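Mapping tags to gene predictions requires knowing which tag each transcript should yield. In LongSAGE, the tag is taken from the 3′-most NlaIII (CATG) anchoring site and is commonly described as 21 nt including the CATG. Predicted gene models usually lack UTRs, so an estimated 3′ UTR length is appended downstream of the coding sequence. The sketch below computes such a "virtual" tag. It is not the authors' mapping code. It assumes plus-strand, half-open coordinates, and the function names are hypothetical.

```python
from typing import Optional

NLAIII_SITE = "CATG"  # anchoring-enzyme (NlaIII) recognition site

def virtual_longsage_tag(transcript: str, tag_len: int = 21) -> Optional[str]:
    """Return the tag starting at the 3'-most CATG site, or None.

    None is returned when there is no CATG site, or when the 3'-most site is
    too close to the 3' end to give a full-length tag (a simplification).
    """
    seq = transcript.upper()
    pos = seq.rfind(NLAIII_SITE)
    if pos == -1:
        return None
    tag = seq[pos:pos + tag_len]
    return tag if len(tag) == tag_len else None

def tag_with_estimated_utr(genome: str, cds_start: int, cds_end: int,
                           utr_len: int, tag_len: int = 21) -> Optional[str]:
    """Extend a predicted CDS (plus strand, half-open) by an assumed 3' UTR length."""
    end = min(len(genome), cds_end + utr_len)
    return virtual_longsage_tag(genome[cds_start:end], tag_len)

transcript = "AAACATG" + "ACGTACGTACGTACGTA"
print(virtual_longsage_tag(transcript))  # CATGACGTACGTACGTACGTA
```

Varying `utr_len` in this sketch shows why the estimated UTR length matters. If it is too short, the tag-bearing site can fall outside the modeled transcript.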


2020 ◽  
Author(s):  
Yang Young Lu ◽  
Jiaxing Bai ◽  
Yiwen Wang ◽  
Ying Wang ◽  
Fengzhu Sun

Abstract Motivation Rapid developments in sequencing technologies have boosted the generation of high volumes of sequence data. To archive and analyze those data, one primary step is sequence comparison. Alignment-free sequence comparison based on k-mer frequencies offers a computationally efficient solution, yet in practice, the k-mer frequency vectors for large k of practical interest lead to excessive memory and storage consumption. Results We report CRAFT, a general genomic/metagenomic search engine to learn compact representations of sequences and perform fast comparison between DNA sequences. Specifically, given genome or high throughput sequencing (HTS) data as input, CRAFT maps the data into a much smaller embedding space and locates the best matching genome in the archived massive sequence repositories. With a 10²–10⁴-fold reduction of storage space, CRAFT performs fast queries for gigabytes of data within seconds or minutes, achieving performance comparable to six state-of-the-art alignment-free measures. Availability CRAFT offers a user-friendly graphical user interface with one-click installation on Windows and Linux operating systems, freely available at https://github.com/jiaxingbai/CRAFT. Supplementary information Supplementary data are available at Bioinformatics online.
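The memory problem described here follows from the size of a dense k-mer frequency vector, which has 4^k entries over the DNA alphabet. The sketch below shows the baseline alignment-free representation that CRAFT is designed to compress. It is not CRAFT's learned embedding. It stores overlapping k-mer counts sparsely and compares them with cosine similarity, which is one common choice among alignment-free measures. Cosine similarity is scale-invariant, so raw counts give the same result as frequencies. Function names are hypothetical.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any k-mer that contains a non-ACGT base."""
    seq = seq.upper()
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(km for km in kmers if set(km) <= set("ACGT"))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse k-mer count vectors."""
    dot = sum(a[km] * b[km] for km in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def dense_dimension(k: int) -> int:
    """Number of entries in a dense k-mer vector over {A, C, G, T}."""
    return 4 ** k

print(dense_dimension(12))  # 16777216
x = kmer_counts("ACGTACGTAA", 3)
y = kmer_counts("ACGTACGTTT", 3)
print(round(cosine_similarity(x, y), 3))  # 0.849
```

Each increment of k multiplies the dense vector length by four. That growth is the storage cost a compact embedding aims to avoid.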


2021 ◽  
Vol 12 ◽  
Author(s):  
Huiyuan Wang ◽  
Sheng Liu ◽  
Xiufang Dai ◽  
Yongkang Yang ◽  
Yunjun Luo ◽  
...  

Populus trichocarpa (P. trichocarpa) is a model tree for the investigation of wood formation. In recent years, researchers have generated a large amount of high-throughput sequencing data in P. trichocarpa. However, no comprehensive database that provides multi-omics associations for the investigation of secondary growth in response to diverse stresses has been reported. Therefore, we developed a public repository that presents comprehensive measurements of gene expression and post-transcriptional regulation by integrating 144 RNA-Seq, 33 ChIP-seq, and six single-molecule real-time (SMRT) isoform sequencing (Iso-seq) libraries prepared from tissues subjected to different stresses. All samples from the different studies were analyzed with unified parameters to obtain gene expression, co-expression networks, and differentially expressed genes (DEGs), which allowed comparison of results across studies and treatments. In addition to gene expression, we also identified and deposited pre-processed data on alternative splicing (AS), alternative polyadenylation (APA), and alternative transcription initiation (ATI). The post-transcriptional regulation, differential expression, and co-expression network datasets were integrated into a new P. trichocarpa Stem Differentiating Xylem (PSDX) database, which further highlights gene families of RNA-binding proteins and stress-related genes. PSDX also provides tools for data query and visualization, a genome browser, and a BLAST option for sequence-based queries. Much of the data is also available for bulk download. PSDX supports research on secondary growth in response to stresses in P. trichocarpa and will provide new insights that can be useful for improving stress tolerance in woody plants.


2010 ◽  
Vol 08 (supp01) ◽  
pp. 177-192 ◽  
Author(s):  
XI WANG ◽  
ZHENGPENG WU ◽  
XUEGONG ZHANG

Owing to its unprecedented resolution and detailed information, RNA-seq technology based on next-generation high-throughput sequencing significantly boosts the ability to study transcriptomes. Estimating genes' transcript abundance, or gene expression levels, has always been an important question in research on transcriptional regulation and gene function. Based on the concept of Reads Per Kilobase per Million mapped reads (RPKM), two strategies are currently used to estimate gene expression levels: taking the union-intersection genes (UI-based) and summing the inferred isoform abundances (isoform-based). The two strategies produce different estimates. In this paper, we made the first attempt to compare their performance through a series of simulation studies. Our results showed that the isoform-based method not only gives more accurate estimates but also has less uncertainty than the UI-based strategy. When the non-uniformity of read distribution is taken into account, the isoform-based method can further reduce estimation errors. We applied both strategies to real RNA-seq datasets of technical replicates and found that the isoform-based strategy also performs better. For more accurate estimation of gene expression levels from RNA-seq data, even if isoform abundances are not of interest, it is still better to first infer the isoform abundances and sum them to obtain the expression level of the gene as a whole.
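Both strategies build on the RPKM normalization. It divides a feature's read count by the feature's length in kilobases and by the total number of mapped reads in millions. Equivalently, RPKM = reads × 10⁹ / (length in bp × total mapped reads). The sketch below applies this formula in the two ways the abstract compares, using made-up inputs. For the UI-based estimate, the counts and length stand for whatever union-intersection region the chosen gene model defines. For the isoform-based estimate, the per-isoform read counts are assumed to come from a separate abundance-inference step, which is not shown. Function names are hypothetical.

```python
from typing import Iterable, Tuple

def rpkm(reads: float, length_bp: int, total_mapped: int) -> float:
    """Reads Per Kilobase per Million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped)

def ui_based(ui_reads: int, ui_length_bp: int, total_mapped: int) -> float:
    """UI-based: a single RPKM over the gene's union-intersection region."""
    return rpkm(ui_reads, ui_length_bp, total_mapped)

def isoform_based(isoforms: Iterable[Tuple[float, int]], total_mapped: int) -> float:
    """Isoform-based: sum of per-isoform RPKMs from (estimated_reads, length_bp) pairs."""
    return sum(rpkm(reads, length, total_mapped) for reads, length in isoforms)

total = 10_000_000
print(ui_based(400, 2000, total))                        # 20.0
print(isoform_based([(300, 2000), (100, 1000)], total))  # 25.0
```

The isoform-based sum normalizes each isoform by its own length. Reads from a short isoform therefore contribute more per read than they would under a single gene-level length.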


Author(s):  
Tao Yan ◽  
Yao Yao ◽  
Dezhi Wu ◽  
Lixi Jiang

Abstract Rapeseed (Brassica napus L.) is a typical polyploid crop and one of the most important oilseed crops worldwide. With the rapid progress of high-throughput sequencing technologies and the reduction of sequencing costs, large-scale genomic data for specific crops have become available. Raw sequence data are mostly deposited in the Sequence Read Archive of the National Center for Biotechnology Information (NCBI) and the European Nucleotide Archive (ENA), which are freely accessible to all researchers. However, practical tools are needed to make efficient use of these large raw datasets. Here, we report a web-based rapeseed genomic variation database (BnaGVD, http://rapeseed.biocloud.net/home). It can be used to retrieve genomic variations, such as single nucleotide polymorphisms (SNPs) and insertions/deletions (InDels), across a worldwide collection of rapeseed accessions. The current release of BnaGVD contains 34,591,899 high-quality SNPs and 12,281,923 high-quality InDels and provides search tools to retrieve genomic variations and gene annotations across 1,007 accessions of worldwide rapeseed germplasm. We implement a variety of built-in tools (e.g., BnaGWAS, BnaPCA, and BnaStructure) to help users perform in-depth analyses. We recommend this web resource for accelerating studies on functional genomics and the screening of molecular markers for rapeseed breeding.


2017 ◽  
Author(s):  
Veronika A. Herzog ◽  
Brian Reichholf ◽  
Tobias Neumann ◽  
Philipp Rescheneder ◽  
Pooja Bhat ◽  
...  

Abstract Gene expression profiling by high-throughput sequencing reveals qualitative and quantitative changes in RNA species at steady-state but obscures the intracellular dynamics of RNA transcription, processing and decay. We developed thiol(SH)-linked alkylation for the metabolic sequencing of RNA (SLAM-seq), an orthogonal chemistry-based epitranscriptomics-sequencing technology that uncovers 4-thiouridine (s4U)-incorporation in RNA species at single-nucleotide resolution. In combination with well-established metabolic RNA labeling protocols and coupled to standard, low-input, high-throughput RNA sequencing methods, SLAM-seq enables rapid access to RNA polymerase II-dependent gene expression dynamics in the context of total RNA. When applied to mouse embryonic stem cells, SLAM-seq provides global and transcript-specific insights into pluripotency-associated gene expression. We validated the method by showing that the RNA polymerase II-dependent transcriptional output scales with Oct4/Sox2/Nanog-defined enhancer activity; and we provide quantitative and mechanistic evidence for transcript-specific RNA turnover mediated by post-transcriptional gene regulatory pathways initiated by microRNAs and N6-methyladenosine. SLAM-seq facilitates the dissection of fundamental mechanisms that control gene expression in an accessible, cost-effective, and scalable manner. One Sentence Summary: Chemical nucleotide-analog derivatization provides global insights into transcriptional and post-transcriptional gene regulation.
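In SLAM-seq, alkylated s4U is read as C rather than T during reverse transcription. Reads from newly synthesized (labeled) RNA therefore carry T>C conversions relative to the reference. The sketch below counts these conversions. It assumes sense-strand reads already aligned without gaps to a reference segment of the same length; reads from the opposite strand would show A>G instead. The function names and the one-conversion labeling threshold are hypothetical. Real pipelines also model sequencing errors and SNPs.

```python
from typing import Iterable, Tuple

def count_t_to_c(read: str, ref: str) -> int:
    """Count positions where the reference has T and the read has C.

    Assumes a gap-free alignment of a sense-strand read to an
    equal-length reference segment.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference segment must have equal length")
    return sum(1 for r, g in zip(read.upper(), ref.upper()) if g == "T" and r == "C")

def labeled_fraction(pairs: Iterable[Tuple[str, str]], min_conversions: int = 1) -> float:
    """Fraction of (read, reference) pairs with at least min_conversions T>C conversions."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    labeled = sum(1 for read, ref in pairs if count_t_to_c(read, ref) >= min_conversions)
    return labeled / len(pairs)

aligned = [("ACCGC", "ATCGT"), ("ATCGT", "ATCGT")]
print(count_t_to_c(*aligned[0]))  # 2
print(labeled_fraction(aligned))  # 0.5
```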


Author(s):  
Yang Young Lu ◽  
Jiaxing Bai ◽  
Yiwen Wang ◽  
Ying Wang ◽  
Fengzhu Sun

Abstract Motivation Rapid developments in sequencing technologies have boosted the generation of high volumes of sequence data. To archive and analyze those data, one primary step is sequence comparison. Alignment-free sequence comparison based on k-mer frequencies offers a computationally efficient solution, yet in practice, the k-mer frequency vectors for large k of practical interest lead to excessive memory and storage consumption. Results We report CRAFT, a general genomic/metagenomic search engine to learn compact representations of sequences and perform fast comparison between DNA sequences. Specifically, given genome or high throughput sequencing data as input, CRAFT maps the data into a much smaller embedding space and locates the best matching genome in the archived massive sequence repositories. With a 10²–10⁴-fold reduction of storage space, CRAFT performs fast queries for gigabytes of data within seconds or minutes, achieving performance comparable to six state-of-the-art alignment-free measures. Availability and implementation CRAFT offers a user-friendly graphical user interface with one-click installation on Windows and Linux operating systems, freely available at https://github.com/jiaxingbai/CRAFT. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 16 (06) ◽  
pp. 1840028 ◽  
Author(s):  
Joungmin Choi ◽  
Yoonjae Park ◽  
Sun Kim ◽  
Heejoon Chae

In recent years, many studies have used DNA methylome data to answer fundamental biological questions. Bisulfite sequencing (BS-seq) has enabled genome-wide measurement of absolute DNA methylation levels at single-nucleotide resolution. However, because of the ambiguity introduced by bisulfite treatment, the alignment step is still considered a heavy burden, especially in large-scale epigenetic research. We present Cloud-BS, an efficient BS-seq aligner designed for parallel execution in a distributed environment. Using the Apache Hadoop framework, Cloud-BS splits sequencing reads into multiple blocks and transfers them to distributed nodes. By dividing each alignment procedure into separate map and reduce tasks and optimizing an internal key-value structure based on the MapReduce programming model, the algorithm significantly improves alignment performance without sacrificing mapping accuracy. In addition, Cloud-BS minimizes the innate burden of configuring a distributed environment by providing a pre-configured cloud image. Cloud-BS shows significantly improved bisulfite alignment performance compared with other existing BS-seq aligners. We believe our algorithm facilitates large-scale methylome data analysis. The algorithm is freely available at https://paryoja.github.io/Cloud-BS/.
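The map/reduce split described above can be illustrated on a single machine. The sketch below is a hypothetical toy, not Cloud-BS or Hadoop code. Reads are first split into blocks. The map step aligns each read by exact matching in three-letter space: every C is converted to T in both read and reference, the usual way to absorb bisulfite conversion. It then emits (reference position, methylated?) pairs at reference cytosines. The reduce step groups those pairs by position into methylation levels. The sketch assumes forward-strand, error-free reads and discards reads that match nowhere or in more than one place. All names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def chunk(reads: List[str], block_size: int) -> Iterator[List[str]]:
    """Split reads into blocks, as would be shipped to worker nodes."""
    for i in range(0, len(reads), block_size):
        yield reads[i:i + block_size]

def map_block(block: List[str], reference: str) -> Iterator[Tuple[int, int]]:
    """Map: align in C->T space, emit (ref_pos, 1 if methylated else 0) at reference Cs."""
    converted_ref = reference.replace("C", "T")
    for read in block:
        query = read.replace("C", "T")
        hit = converted_ref.find(query)
        if hit == -1 or converted_ref.find(query, hit + 1) != -1:
            continue  # skip unmapped or multi-mapped reads
        for offset, base in enumerate(read):
            pos = hit + offset
            if reference[pos] == "C":
                # A C surviving bisulfite treatment indicates methylation; T indicates none.
                yield pos, 1 if base == "C" else 0

def reduce_calls(pairs: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Reduce: group calls by reference position and average them."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for pos, call in pairs:
        grouped[pos].append(call)
    return {pos: sum(calls) / len(calls) for pos, calls in sorted(grouped.items())}

def methylation_levels(reads: List[str], reference: str, block_size: int = 2) -> Dict[int, float]:
    """Run map over every block, then a single reduce over all emitted pairs."""
    emitted = (kv for block in chunk(reads, block_size) for kv in map_block(block, reference))
    return reduce_calls(emitted)

print(methylation_levels(["ACGTT", "ATGTT", "GTTGC"], "ACGTTGCA"))  # {1: 0.5, 6: 1.0}
```

In a real MapReduce job, each block would be mapped on a different node. The framework would shuffle pairs by key so that one reducer sees every call for a given position.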


2005 ◽  
Vol 18 (3) ◽  
pp. 229-243 ◽  
Author(s):  
Thomas A. Randall ◽  
Rex A. Dwyer ◽  
Edgar Huitema ◽  
Katinka Beyer ◽  
Cristina Cvitanich ◽  
...  

To survey the gene content of the important pathogen Phytophthora infestans, large-scale cDNA and genomic sequencing was performed. A set of 75,757 high-quality expressed sequence tags (ESTs) from P. infestans was obtained from 20 cDNA libraries representing a broad range of growth conditions, stress responses, and developmental stages. These included libraries from P. infestans-potato and P. infestans-tomato interactions, from which 963 pathogen ESTs were identified. To complement the ESTs, onefold coverage of the P. infestans genome was obtained and regions of coding potential were identified. A unigene set of 18,256 sequences was derived from the EST and genomic data and characterized for potential functions, stage-specific patterns of expression, and codon bias. Cluster analysis of ESTs revealed major differences between the expressed gene content of mycelial and spore-related stages, as well as affinities between some growth conditions. Comparisons with databases of fungal pathogenicity genes revealed conserved elements of pathogenicity, such as class III pectate lyases, despite the considerable evolutionary distance between oomycetes and fungi. Thirty-seven genes encoding components of flagella were also identified. Several genes not anticipated to occur in oomycetes were detected, including chitin synthases, phosphagen kinases, and a bacterial-type FtsZ cell-division protein. The sequence data described are available in a searchable public database.

