PRECISE 2.0: an expanded high-quality RNA-seq compendium for Escherichia coli K-12 reveals high-resolution transcriptional regulatory structure

2021
Authors: Cameron R. Lamoureux, Katherine T. Decker, Anand V. Sastry, John Luke McConn, Ye Gao, ...

Uncovering the structure of the transcriptional regulatory network (TRN) that modulates gene expression in prokaryotes remains an important challenge. Transcriptomics data is plentiful, necessitating the development of scalable methods for converting this data into useful knowledge about the TRN. Previously, we published the PRECISE dataset for Escherichia coli K-12 MG1655, containing 278 RNA-seq datasets created using a standardized protocol. Here, we present PRECISE 2.0, which is nearly three times the size of the original PRECISE dataset and also created using a standardized protocol. We analyze PRECISE 2.0 at multiple scales, demonstrating multiple analytical strategies for extracting knowledge from this dataset. Specifically, we: (1) highlight patterns in gene expression across the dataset; (2) utilize independent component analysis to extract 218 independently modulated groups of genes (iModulons) that describe the TRN at the systems level; (3) demonstrate the utility of iModulons over traditional differential expression analysis; and (4) uncover 6 new potential regulons. Thus, PRECISE 2.0 is a large-scale, high-quality transcriptomics dataset which may be analyzed at multiple scales to yield important biological insights.
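In ICA-based analyses like the one described above, each independent component assigns a weight to every gene, and an iModulon is the set of genes with unusually large weights in one component. The sketch below is a simplification that assumes a fixed, hypothetical weight cutoff and toy weights. It is not the statistical thresholding procedure used for PRECISE 2.0.

```python
def imodulon_members(weights, cutoff):
    """Genes whose absolute weight in one independent component is >= cutoff.

    weights: mapping gene -> weight in a single component (one column of M).
    cutoff: hypothetical fixed threshold, used only for illustration.
    """
    members = [gene for gene, w in weights.items() if abs(w) >= cutoff]
    # Strongest contributors first; the gene name breaks ties deterministically.
    return sorted(members, key=lambda g: (-abs(weights[g]), g))


if __name__ == "__main__":
    # Toy weights for hypothetical genes, not values from the dataset.
    toy = {"g1": 0.21, "g2": 0.18, "g3": -0.15, "g4": 0.01, "g5": -0.02}
    print(imodulon_members(toy, 0.1))  # ['g1', 'g2', 'g3']
```

Negative weights count as members because a gene can be strongly down-weighted in a component.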

2022
Authors: Yuan Yuan, Yara Seif, Kevin Rychel, Reo Yoo, Siddharth M Chauhan, ...

Salmonella enterica Typhimurium is a serious pathogen that is involved in human nontyphoidal infections. Tackling Typhimurium infections is difficult due to the species' dynamic adaptation to its environment, which is dictated by a complex transcriptional regulatory network (TRN). While traditional biomolecular methods provide characterizations of specific regulators, it is laborious to construct the global TRN structure from this bottom-up approach. Here, we used a machine learning technique to understand the transcriptional signatures of S. enterica Typhimurium from the top down, as a whole and in individual strains. Furthermore, we conducted a cross-strain comparison of 6 strains in serovar Typhimurium to investigate similarities and differences in their TRNs with pan-genomic analysis. By decomposing all the publicly available RNA-Seq data of Typhimurium with independent component analysis (ICA), we obtained over 400 independently modulated sets of genes, called iModulons. Through analysis of these iModulons, we 1) discover three transport iModulons linked to antibiotic resistance, 2) describe concerted responses to cationic antimicrobial peptides (CAMPs), 3) uncover evidence towards new regulons, and 4) identify two iModulons linked to bile responses in strain ST4/74. We extend this analysis across the pan-genome to show that strain-specific iModulons 5) reveal different genetic signatures in pathogenicity islands that explain phenotypes and 6) capture the activity of different phages in the studied strains. Using all high-quality publicly available RNA-Seq data to date, we present a comprehensive, data-driven Typhimurium TRN. It is conceivable that, with more high-quality datasets from more strains, the approach used in this study will continue to guide investigation of the Typhimurium pan-transcriptome.
Interactive dashboards for all gene modules in this project are available at https://imodulondb.org/ to enable browsing for interested researchers.
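A cross-strain comparison of iModulons needs some way to decide which gene sets correspond between strains. The sketch below pairs iModulons by Jaccard overlap of their gene sets. It assumes genes have already been mapped to shared ortholog-group IDs and uses a hypothetical minimum-overlap cutoff. This illustrates the idea only and is not necessarily the procedure used in the study.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two gene sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_imodulons(strain1, strain2, min_jaccard=0.5):
    """Pair each iModulon of strain1 with its best-overlapping iModulon in strain2.

    strain1, strain2: mapping iModulon name -> set of ortholog-group IDs (hypothetical).
    min_jaccard: hypothetical cutoff; unmatched iModulons are candidates for
    strain-specific signals.
    """
    pairs = {}
    for name1, genes1 in strain1.items():
        best_name, best_score = None, 0.0
        for name2, genes2 in strain2.items():
            score = jaccard(genes1, genes2)
            if score > best_score:
                best_name, best_score = name2, score
        if best_name is not None and best_score >= min_jaccard:
            pairs[name1] = (best_name, best_score)
    return pairs


if __name__ == "__main__":
    # Hypothetical ortholog-group IDs and iModulon names.
    s1 = {"Fur-1": {"og1", "og2", "og3"}, "Phage-A": {"og9"}}
    s2 = {"Fur": {"og1", "og2", "og4"}, "SPI-X": {"og7"}}
    print(match_imodulons(s1, s2))  # {'Fur-1': ('Fur', 0.5)}
```

Here "Phage-A" has no counterpart in the second strain, which is the kind of result that would flag a strain-specific iModulon.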


2018
Authors: Ye Gao, James T. Yurkovich, Sang Woo Seo, Ilyas Kabimoldayev, Andreas Dräger, ...

ABSTRACT Transcriptional regulation enables cells to respond to environmental changes. Yet, among the estimated 304 candidate transcription factors (TFs) in Escherichia coli K-12 MG1655, 185 have been experimentally identified and only a few dozen of them have been fully characterized by ChIP methods. Understanding the remaining TFs is key to improving our knowledge of the E. coli transcriptional regulatory network (TRN). Here, we developed an integrated workflow for the computational prediction and comprehensive experimental validation of TFs using a suite of genome-wide experiments. We applied this workflow to: 1) identify 16 candidate TFs from over a hundred candidate uncharacterized genes; 2) capture a total of 255 DNA binding peaks for 10 candidate TFs, resulting in six high-confidence binding motifs; 3) reconstruct the regulons of these 10 TFs by determining gene expression changes upon deletion of each TF; and 4) determine the regulatory roles of three TFs (YiaJ, YdcI, and YeiE) as regulators of L-ascorbate utilization, proton transfer and acetate metabolism, and iron homeostasis under iron-limited conditions, respectively. Together, these results demonstrate how this workflow can be used to discover, characterize, and elucidate regulatory functions of uncharacterized TFs in parallel.
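Step 2 above turns binding peaks into motifs. As a much simpler illustration of what a motif summarizes, the sketch below builds per-position base counts and a majority-rule consensus from binding sites that are assumed to be already extracted and aligned to equal length. It is not a motif-discovery algorithm and not the tool used in the study; the example sequences are hypothetical.

```python
from collections import Counter


def consensus(sites):
    """Majority-rule consensus and per-position base counts for aligned binding sites.

    sites: equal-length DNA strings, e.g. sequences extracted around binding peaks.
    """
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    columns = [Counter(s[i] for s in sites) for i in range(length)]
    # Most frequent base per position; alphabetical order breaks ties.
    motif = "".join(min(col, key=lambda b: (-col[b], b)) for col in columns)
    return motif, columns


if __name__ == "__main__":
    motif, counts = consensus(["TTGACA", "TTGATA", "TTTACA", "CTGACA"])
    print(motif)  # TTGACA
```

The per-position counts in `counts` are the raw material for a position weight matrix, which carries more information than the consensus string alone.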


Animals, 2021, Vol 11 (6), pp. 1745
Authors: Ben-Ben Miao, Su-Fang Niu, Ren-Xie Wu, Zhen-Bang Liang, Bao-Gui Tang, ...

Pearl gentian grouper (Epinephelus fuscoguttatus ♀ × Epinephelus lanceolatus ♂) is a fish of high commercial value in the aquaculture industry in Asia. However, this hybrid fish is not cold-tolerant, and the molecular regulatory mechanism underlying its response to cold stress remains largely elusive. This study therefore investigated the liver transcriptomic responses of pearl gentian grouper by comparing the gene expression of cold stress groups (20, 15, 12, and 12 °C for 6 h) with that of the control group (25 °C) using PacBio SMRT-Seq and Illumina RNA-Seq technologies. In the SMRT-Seq analysis, a total of 11,033 full-length transcripts were generated and used as reference sequences for the subsequent RNA-Seq analysis. In the RNA-Seq analysis, 3271 differentially expressed genes (DEGs), two low-temperature-specific modules (tan and blue modules), and two significantly expressed gene sets (profiles 0 and 19) were screened by differential expression analysis, weighted gene co-expression network analysis (WGCNA), and short time-series expression miner (STEM), respectively. The intersection of these analyses further revealed several key genes, including PCK, ALDOB, FBP, G6pC, CPT1A, PPARα, SOCS3, PPP1CC, CYP2J, HMGCR, CDKN1B, and GADD45Bc. These genes were significantly enriched in carbohydrate metabolism, lipid metabolism, signal transduction, and endocrine system pathways. All of these pathways are linked to biological functions relevant to cold adaptation, such as energy metabolism, stress-induced cell membrane changes, and transduction of stress signals. Taken together, our study outlines the complex regulatory network of functional genes in the liver of pearl gentian grouper, which could help prevent damage caused by cold stress in this species.
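The abstract does not state the fold-change or significance cutoffs used to call the 3271 DEGs. Purely as an illustration, the sketch below applies the widely used Benjamini-Hochberg correction and keeps genes passing hypothetical |log2 fold change| and FDR cutoffs. It covers only the differential expression filter, not WGCNA or STEM, and assumes per-gene log2 fold changes and raw p-values are already computed.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, fc_cutoff=1.0, fdr=0.05):
    """Genes with |log2 fold change| >= fc_cutoff and BH-adjusted p < fdr.

    fc_cutoff and fdr are hypothetical defaults, not the study's settings.
    """
    padj = benjamini_hochberg(pvalues)
    return [g for g, fc, q in zip(genes, log2fc, padj) if abs(fc) >= fc_cutoff and q < fdr]


if __name__ == "__main__":
    genes = ["a", "b", "c", "d"]
    print(select_degs(genes, [2.0, -1.5, 0.3, 3.0], [0.01, 0.04, 0.03, 0.2]))  # ['a']
```

Gene "b" has a raw p-value below 0.05 but is dropped after correction, which is the reason for adjusting p-values when thousands of genes are tested at once.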


mSystems, 2020, Vol 5 (6)
Authors: Kumari Sonal Choudhary, Julia A. Kleinmanns, Katherine Decker, Anand V. Sastry, Ye Gao, ...

ABSTRACT Escherichia coli uses two-component systems (TCSs) to respond to environmental signals. TCSs affect gene expression and are part of E. coli’s global transcriptional regulatory network (TRN). Here, we identified the regulons of five TCSs in E. coli MG1655: BaeSR and CpxAR, which were stimulated by ethanol stress; KdpDE and PhoRB, induced by limiting potassium and phosphate, respectively; and ZraSR, stimulated by zinc. We analyzed RNA-seq data using independent component analysis (ICA). ChIP-exo data were used to validate condition-specific target gene binding sites. Based on these data, we (i) identify the target genes for each TCS; (ii) show how the target genes are transcribed in response to stimulus; and (iii) reveal novel relationships between TCSs, which indicate noncognate inducers for various response regulators, such as BaeR to iron starvation, CpxR to phosphate limitation, and PhoB and ZraR to cell envelope stress. Our understanding of the TRN in E. coli is thus notably expanded.
IMPORTANCE E. coli is a common commensal microbe found in the human gut microenvironment; however, some strains cause diseases such as diarrhea, urinary tract infections, and meningitis. E. coli’s two-component systems (TCSs) modulate target gene expression, especially related to virulence, pathogenesis, and antimicrobial peptides, in response to environmental stimuli. Thus, it is of utmost importance to understand the transcriptional regulation of TCSs to infer bacterial environmental adaptation and disease pathogenicity. Utilizing a combinatorial approach integrating RNA sequencing (RNA-seq), independent component analysis, chromatin immunoprecipitation coupled with exonuclease treatment (ChIP-exo), and data mining, we suggest five different modes of TCS transcriptional regulation. Our data further highlight noncognate inducers of TCSs, which emphasizes the cross-regulatory nature of TCSs in E. coli and suggests that TCSs may have a role beyond their cognate functionalities. In summary, these results can contribute to understanding the metabolic capabilities of bacteria and to correctly predicting complex phenotypes under diverse conditions, especially when further incorporated into genome-scale metabolic models.
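One way to combine the two data types named above is to keep only those ICA-derived genes whose promoter region overlaps a ChIP-exo binding peak. The sketch assumes hypothetical promoter and peak coordinates on one shared coordinate axis and a hypothetical padding window. The study's own validation procedure may differ.

```python
def validated_targets(imodulon_genes, peaks, promoters, window=0):
    """Return iModulon genes whose promoter region overlaps at least one binding peak.

    imodulon_genes: gene names from an ICA-derived gene set.
    peaks: list of (start, end) peak coordinates (closed intervals).
    promoters: mapping gene -> (start, end) promoter coordinates (hypothetical input).
    window: extra bases added on each side of a promoter before testing overlap.
    """
    hits = set()
    for gene in imodulon_genes:
        if gene not in promoters:
            continue  # no annotated promoter, so the gene cannot be checked
        p_start, p_end = promoters[gene]
        p_start, p_end = p_start - window, p_end + window
        # Two closed intervals overlap when neither lies entirely before the other.
        if any(start <= p_end and end >= p_start for start, end in peaks):
            hits.add(gene)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical genes and coordinates.
    promoters = {"g1": (100, 200), "g3": (500, 600), "g4": (900, 1000)}
    peaks = [(150, 170), (610, 630)]
    print(validated_targets(["g1", "g2", "g3", "g4"], peaks, promoters))  # ['g1']
```

Widening `window` to 20 would also admit `g3`, so the padding choice directly changes which targets count as bound.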


2019, Vol 9 (1)
Authors: Mikhail Pomaznoy, Ashu Sethi, Jason Greenbaum, Bjoern Peters

Abstract RNA-seq methods are widely utilized for transcriptomic profiling of biological samples. However, there are known caveats of this technology that can skew the gene expression estimates. Specifically, if the library preparation protocol does not retain RNA strand information, then some genes can be erroneously quantitated. Although strand-specific protocols have been established, a significant portion of RNA-seq data is generated in a non-strand-specific manner. We used a comprehensive stranded RNA-seq dataset of 15 blood cell types to identify genes for which expression would be erroneously estimated if strand information was not available. We found that about 10% of all genes and 2.5% of protein coding genes have a two-fold or higher difference in estimated expression when strand information of the reads was ignored. We used parameters of read alignments of these genes to construct a machine learning model that can identify which genes in an unstranded dataset might have incorrect expression estimates and which ones do not. We also show that differential expression analysis of genes with biased expression estimates in unstranded read data can be recovered by limiting the reads considered to those which span exonic boundaries. The resulting approach is implemented as a package available at https://github.com/mikpom/uslcount.
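Two ideas from this abstract can be made concrete: flagging genes whose estimate changes two-fold or more when strand information is ignored, and restricting counts to reads that span exon boundaries. The stdlib sketch below assumes per-gene estimates are already available and that alignments are given as SAM CIGAR strings, where an `N` operation marks a skipped (intronic) reference region. The pseudocount is an arbitrary assumption, and this is not the uslcount package or its API.

```python
import math
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def spans_junction(cigar):
    """True if the alignment skips a reference region (N), i.e. crosses an exon-exon junction."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def junction_read_count(cigars):
    """Count reads whose alignment spans an exon-exon junction."""
    return sum(1 for c in cigars if spans_junction(c))


def flag_strand_biased(stranded, unstranded, min_fold=2.0, pseudocount=1.0):
    """Genes whose unstranded estimate differs from the stranded one by >= min_fold (either direction).

    stranded, unstranded: mapping gene -> expression estimate (e.g. read counts).
    pseudocount: arbitrary value that avoids division by zero.
    """
    flagged = []
    for gene, s in stranded.items():
        u = unstranded.get(gene, 0.0)
        ratio = (u + pseudocount) / (s + pseudocount)
        if abs(math.log2(ratio)) >= math.log2(min_fold):
            flagged.append(gene)
    return flagged


if __name__ == "__main__":
    print(flag_strand_biased({"g1": 99, "g2": 50}, {"g1": 199, "g2": 60}))  # ['g1']
    print(junction_read_count(["50M200N50M", "100M"]))  # 1
```

Reads that span a junction can only come from the spliced transcript, which is why counting them sidesteps much of the ambiguity from overlapping antisense genes.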


2019, Vol 10 (1)
Authors: Anand V. Sastry, Ye Gao, Richard Szubin, Ying Hefner, Sibei Xu, ...

Abstract Underlying cellular responses is a transcriptional regulatory network (TRN) that modulates gene expression. A useful description of the TRN would decompose the transcriptome into targeted effects of individual transcriptional regulators. Here, we apply unsupervised machine learning to a diverse compendium of over 250 high-quality Escherichia coli RNA-seq datasets to identify 92 statistically independent signals that modulate the expression of specific gene sets. We show that 61 of these transcriptomic signals represent the effects of currently characterized transcriptional regulators. Condition-specific activation of signals is validated by exposure of E. coli to new environmental conditions. The resulting decomposition of the transcriptome provides: a mechanistic, systems-level, network-based explanation of responses to environmental and genetic perturbations; a guide to gene and regulator function discovery; and a basis for characterizing transcriptomic differences in multiple strains. Taken together, our results show that signal summation describes the composition of a model prokaryotic transcriptome.
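The "signal summation" idea can be written as X ≈ M·A: each column of M holds one signal's gene weights, each row of A holds that signal's activity across samples, and each expression value is a sum of per-signal contributions. The pure-Python sketch below uses toy matrices and assumes M and A have already been estimated (for example by ICA). It illustrates the summation only, not how the signals are inferred.

```python
def reconstruct(M, A):
    """Expression X = M . A from gene weights M (genes x k) and activities A (k x samples)."""
    k = len(A)
    if any(len(row) != k for row in M):
        raise ValueError("M must have one column per row of A")
    n_samples = len(A[0])
    return [[sum(M[g][j] * A[j][s] for j in range(k)) for s in range(n_samples)]
            for g in range(len(M))]


def contributions(M, A, gene, sample):
    """Per-signal terms M[gene][j] * A[j][sample]; they sum to X[gene][sample]."""
    return [M[gene][j] * A[j][sample] for j in range(len(A))]


if __name__ == "__main__":
    M = [[1.0, 0.0], [0.5, 2.0]]   # toy weights: 2 genes x 2 signals
    A = [[1.0, -1.0], [0.0, 3.0]]  # toy activities: 2 signals x 2 samples
    print(reconstruct(M, A))           # [[1.0, -1.0], [0.5, 5.5]]
    print(contributions(M, A, 1, 1))   # [-0.5, 6.0]
```

Reading off the per-signal terms shows which signal drives a gene's change in a given sample, which is the interpretive benefit the decomposition offers over looking at X directly.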


BMC Genomics, 2019, Vol 20 (1)
Authors: Paulo Rapazote-Flores, Micha Bayer, Linda Milne, Claus-Dieter Mayer, John Fuller, ...

Abstract Background The time required to analyse RNA-seq data varies considerably, due to discrete steps for computational assembly, quantification of gene expression and splicing analysis. Recent fast non-alignment tools such as Kallisto and Salmon overcome these problems, but these tools require a high-quality, comprehensive reference transcript dataset (RTD), and such datasets are rarely available in plants. Results A high-quality, non-redundant barley gene RTD and database (Barley Reference Transcripts – BaRTv1.0) has been generated. BaRTv1.0 was constructed from a range of tissues, cultivars and abiotic treatments, and transcripts were assembled and aligned to the barley cv. Morex reference genome (Mascher et al. Nature; 544: 427–433, 2017). Full-length cDNAs from the barley variety Haruna nijo (Matsumoto et al. Plant Physiol; 156: 20–28, 2011) were used to determine transcript coverage, and high-resolution RT-PCR validated alternatively spliced (AS) transcripts of 86 genes in five different organs and tissues. These methods were used as benchmarks to select an optimal barley RTD. BaRTv1.0-Quantification of Alternatively Spliced Isoforms (QUASI) was also created to overcome inaccurate quantification due to variation in 5′ and 3′ UTR ends of transcripts. BaRTv1.0-QUASI was used for accurate transcript quantification of RNA-seq data of five barley organs/tissues. This analysis identified 20,972 significant differentially expressed genes, 2791 differentially alternatively spliced genes and 2768 transcripts with differential transcript usage. Conclusion A high-confidence barley reference transcript dataset consisting of 60,444 genes with 177,240 transcripts has been generated. Compared to current barley transcripts, BaRTv1.0 transcripts are generally longer, are less fragmented and have improved gene models that are well supported by splice junction reads. Precise transcript quantification using BaRTv1.0 allows routine analysis of gene expression and AS.
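Kallisto and Salmon report abundances in transcripts per million (TPM). The simplified sketch below computes TPM from read counts and transcript lengths, assuming counts are already assigned to transcripts and using raw rather than effective lengths. The real tools estimate counts probabilistically and use effective lengths, so this only shows why the transcript models in a reference dataset, including their UTR boundaries, influence the reported values.

```python
def tpm(counts, lengths):
    """Transcripts per million from per-transcript read counts and lengths (same length units).

    TPM_i = (c_i / l_i) / sum_j (c_j / l_j) * 1e6
    """
    if set(counts) != set(lengths):
        raise ValueError("counts and lengths must cover the same transcripts")
    rates = {t: counts[t] / lengths[t] for t in counts}  # reads per unit length
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned; TPM is undefined")
    return {t: r / total * 1e6 for t, r in rates.items()}


if __name__ == "__main__":
    # Hypothetical transcripts: t2 has twice the reads but four times the length.
    print(tpm({"t1": 100, "t2": 200}, {"t1": 1000, "t2": 4000}))
```

Because each count is divided by length, a transcript model with an overextended UTR lowers that transcript's TPM even when the read count is unchanged.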

