Targeted enrichment outperforms other enrichment techniques and enables more multi-species RNA-Seq analyses

2018 ◽  
Author(s):  
Matthew Chung ◽  
Laura Teigen ◽  
Hong Liu ◽  
Silvia Libro ◽  
Amol Shetty ◽  
...  

Abstract Enrichment methodologies enable analysis of minor members in multi-species transcriptomic studies. We compared standard enrichment of bacterial and eukaryotic mRNA to targeted enrichment with Agilent SureSelect (AgSS) capture for Brugia malayi, Aspergillus fumigatus, and the Wolbachia endosymbiont of B. malayi (wBm). Without introducing significant systematic bias, AgSS capture quantitatively enriched samples, resulting in more reads mapping to the target organism. The AgSS-enriched libraries consistently showed a positive linear correlation with their unenriched counterparts (r² = 0.559–0.867). Up to 2,242-fold enrichment of RNA from the target organism was obtained, following a power law (r² = 0.90), with the greatest fold enrichment achieved in samples with the largest ratio difference between the major and minor members. Although using a single total RNA library for both prokaryote and eukaryote in a single sample could be beneficial when RNA is limiting, we observed fewer reads mapping to protein-coding genes and more reads multi-mapping to rRNAs in AgSS enrichments from eukaryotic total RNA libraries than from eukaryotic poly(A)-enriched libraries. Our results support using Agilent SureSelect targeted enrichment on poly(A)-enriched libraries for eukaryotic captures and on total RNA libraries for prokaryotic captures to increase the robustness of multi-species transcriptomic studies.
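A minimal sketch of the two quantities reported in the abstract above, assuming fold enrichment is defined as the fraction of reads mapping to the target organism in the AgSS-enriched library divided by that fraction in the matched unenriched library, and that the power law is fit by least squares on log10-transformed values. The function names, the choice of predictor (the target's read fraction before capture) and the numbers in the test are illustrative assumptions, not the authors' code or data.

```python
import math


def fold_enrichment(target_enriched, total_enriched, target_unenriched, total_unenriched):
    """Assumed definition: (target/total reads after capture) / (target/total reads before)."""
    return (target_enriched / total_enriched) / (target_unenriched / total_unenriched)


def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on log10(x), log10(y).

    Returns (a, b, r2), with r2 computed in log space.
    """
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                      # slope in log-log space = exponent
    log_a = my - b * mx                # intercept in log-log space = log10(a)
    ss_res = sum((v - (log_a + b * u)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - my) ** 2 for v in ly)
    return 10 ** log_a, b, 1 - ss_res / ss_tot
```

If the predictor is the target's pre-capture read fraction, a negative fitted exponent b would correspond to the reported pattern that the rarest targets gain the most from capture.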

2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Matthew Chung ◽  
Laura Teigen ◽  
Hong Liu ◽  
Silvia Libro ◽  
Amol Shetty ◽  
...  

Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 390-390
Author(s):  
Paul F. Bray ◽  
Steven E. McKenzie ◽  
Leonard C. Edelstein ◽  
Srikanth Nagalla ◽  
Kathleen Delgrosso ◽  
...  

Abstract 390: A conspicuous lesson that has emerged from the 1000 Genomes Project is that genetic variation in the population is greater than previously appreciated. Transcriptomics is rapidly assuming a prominent role in understanding the basic molecular mechanisms that account for variation within the normal population and in disease states. Besides protein-coding RNAs, the importance of non-coding RNAs (ncRNAs), primarily as regulators of gene expression, is well recognized but largely unexplored. The platelet transcriptome reflects megakaryocyte RNA content at the time of proplatelet release, subsequent splicing events, selective packaging and platelet RNA stability. An accurate understanding of the platelet transcriptome has both biological relevance (improved understanding of platelet protein translation and of the mechanisms of megakaryocyte/platelet gene expression) and clinical relevance (novel biomarkers of disease). We carried out transcriptome sequencing of total RNA isolated from leukocyte-depleted platelet preparations from four healthy adults using an AB/LT SOLiD™ system. For each individual, we constructed 3 libraries: a) long (≥ 40 nucleotides) total RNA, b) long RNA depleted of rRNA, and c) short (< 40 nucleotides) RNA. Approximately 1 billion reads from the 12 datasets were mapped to each chromosome and strand of the human genome. About one-third mapped uniquely, similar to other unbiased methods such as SAGE. Normalizing for transcript length and scaling to the β-actin expression level made it possible to scale expression appropriately within a read set and to compare expression levels across read sets. Of the known protein-coding loci, ∼9,500 were present in human platelets. Plotting the number of protein-coding genes as a function of normalized expression level revealed different gene estimates between total and rRNA-depleted RNA preparations, and substantial inter-individual variation in the less abundant genes. 
RT-PCR validated the RNA-seq estimates of transcript levels across a range of >3 orders of magnitude of normalized read counts (r=0.7757; p=0.0001). A strong correlation was measured between mRNAs identified by RNA-seq and 3 published microarray datasets for well-expressed mRNAs, although RNA-seq identified many more transcripts of lower abundance. Unexpectedly, ribosomal RNA depletion significantly and adversely affected estimates of the relative abundance of transcripts, including members of the RNA interference pathway (DGCR8, DROSHA, XPO5, DICER1, EIF2C1-4), which exhibited large differences (up to 32-fold) between the total and rRNA-depleted preparations. A rigorous and highly stringent approach identified bona fide intronic regions that gave rise to 6,992 and 1,236 currently uncharacterized long and short RNA transcripts, respectively. We discovered numerous previously unreported antisense transcripts: 1) to known protein-coding regions of the genome, 2) to 10 miRNA precursors, where each locus generated 1–2 distinct antisense transcripts, presumably mature and “star” miRNAs, and 3) long and short RNAs antisense to several known repeat families. We did not observe enrichment of long intergenic ncRNAs. We considered various possible explanations for the ∼60% of sequence reads that could not be mapped to the genome. Much more lenient parameter settings accounted for only ∼6.5% of sequenced reads. An even smaller fraction of reads was accounted for when considering all possible combinations of exon-exon junctions in the genome (12,382,819 junctions) and the highly polymorphic HLA region of chr 6, indicating that these did not contribute in any substantive manner to the platelet transcriptome. Lastly, RNA-seq was highly reproducible (>97 for 1 subject studied on 4 occasions). In summary, our work reveals a richness and diversity of platelet RNA molecules, suggesting a context in which platelet biology transcends protein- and mRNA-centric descriptions. 
We will provide a publicly available web tool of these data embedded in a local mirror of the UCSC genome browser, facilitating the elucidation of previously unappreciated molecular species and molecular interactions. This will eventually permit an improved understanding of the molecular mechanisms that regulate platelet physiology and that contribute to disorders of thrombosis, hemostasis and inflammation. Disclosures: No relevant conflicts of interest to declare.
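A minimal sketch of the normalization described in the first part of this abstract, assuming expression is first converted to reads per kilobase of transcript and then divided by the β-actin (gene symbol ACTB) value from the same read set. The abstract does not give the exact formula, so the definitions, the gene name GENE_X and the counts and lengths below are illustrative assumptions.

```python
def reads_per_kb(counts, lengths_bp):
    """Length-normalize raw read counts to reads per kilobase of transcript."""
    return {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}


def scale_to_reference(rpk, reference="ACTB"):
    """Express each gene relative to the reference gene (beta-actin by default)."""
    ref_value = rpk[reference]
    if ref_value == 0:
        raise ValueError(f"reference gene {reference!r} has zero expression")
    return {gene: value / ref_value for gene, value in rpk.items()}


# Hypothetical counts and transcript lengths for one read set.
counts = {"ACTB": 2000, "GENE_X": 500}
lengths_bp = {"ACTB": 2000, "GENE_X": 1000}
scaled = scale_to_reference(reads_per_kb(counts, lengths_bp))
print(scaled)  # {'ACTB': 1.0, 'GENE_X': 0.5}
```

A read set sequenced to twice the depth doubles every count, including ACTB, so the scaled values are unchanged; this is what allows comparison across read sets.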


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Andrea Hita ◽  
Gilles Brocart ◽  
Ana Fernandez ◽  
Marc Rehmsmeier ◽  
Anna Alemany ◽  
...  

Abstract Background Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from distinct RNA biotypes vary in length, biogenesis, and function, can overlap in a genomic region, and may be present in the genome at high copy number. Consequently, reads from total-RNA-seq libraries may produce ambiguous genomic alignments, demanding flexible quantification approaches. Results Here we present Multi-Graph count (MGcount), a total-RNA-seq quantification tool combining two strategies for handling ambiguous alignments. First, MGcount assigns reads hierarchically to small-RNA and long-RNA features to account for length disparity when transcripts overlap in the same genomic position. Next, MGcount aggregates RNA products with similar sequences where reads systematically multi-map, using a graph-based approach. MGcount outputs a transcriptomic count matrix compatible with RNA-sequencing downstream analysis pipelines, with both bulk and single-cell resolution, and the graphs that model repeated transcript structures for different biotypes. The software can be used as a Python module or as a single-file executable program. Conclusions MGcount is a flexible total-RNA-seq quantification tool that successfully integrates reads that align to multiple genomic locations or that overlap with multiple gene features. Its approach is suitable for the simultaneous estimation of protein-coding, long non-coding and small non-coding transcript concentration, in both precursor and processed forms. Both source code and compiled software are available at https://github.com/hitaandrea/MGcount.
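A simplified sketch of the two strategies described above, written without reference to MGcount's actual code: reads overlapping a small-RNA feature are assigned to it before any long-RNA feature, and features that repeatedly share multi-mapping reads are merged. Merging here uses connected components over features linked by at least `min_shared` common reads; the threshold, function names and feature IDs are hypothetical, and MGcount's own graph method may differ.

```python
from collections import defaultdict
from itertools import combinations


def assign_hierarchically(overlaps, small_features):
    """Prefer small-RNA features; fall back to all overlapped (long-RNA) features."""
    small = [f for f in overlaps if f in small_features]
    return small if small else list(overlaps)


def multimapping_groups(read_hits, min_shared=2):
    """Merge features that share at least `min_shared` multi-mapping reads.

    read_hits: dict mapping read ID -> set of feature IDs the read aligned to.
    Returns sorted lists of feature IDs, one list per aggregated feature.
    """
    shared = defaultdict(int)
    features = set()
    for hits in read_hits.values():
        features.update(hits)
        for a, b in combinations(sorted(hits), 2):
            shared[(a, b)] += 1          # edge weight = number of shared reads

    parent = {f: f for f in features}    # union-find forest

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for (a, b), n in shared.items():
        if n >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for f in features:
        groups[find(f)].add(f)
    return sorted(sorted(g) for g in groups.values())


hits = {
    "r1": {"tRNA_a", "tRNA_b"},
    "r2": {"tRNA_a", "tRNA_b"},
    "r3": {"tRNA_b", "tRNA_c"},
    "r4": {"mRNA_x"},
}
print(multimapping_groups(hits))  # [['mRNA_x'], ['tRNA_a', 'tRNA_b'], ['tRNA_c']]
```

Raising `min_shared` keeps only features that multi-map systematically, rather than merging on a single coincidental read.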


2018 ◽  
Author(s):  
Douglas C. Wu ◽  
Jun Yao ◽  
Kevin S. Ho ◽  
Alan M. Lambowitz ◽  
Claus O. Wilke

Abstract Background Alignment-free RNA quantification tools have significantly increased the speed of RNA-seq analysis. However, it is unclear whether these state-of-the-art RNA-seq analysis pipelines can quantify small RNAs as accurately as long RNAs in the context of total RNA quantification. Result We comprehensively tested and compared four RNA-seq pipelines for the accuracy of gene quantification and fold-change estimation on a novel total RNA benchmarking dataset in which small non-coding RNAs are highly represented along with long RNAs. The four pipelines comprised two commonly used alignment-free pipelines and two variants of alignment-based pipelines. We found that all pipelines quantified the expression of long, highly abundant genes with high accuracy. However, alignment-free pipelines performed systematically worse in quantifying lowly abundant and small RNAs. Conclusion We have shown that alignment-free and traditional alignment-based quantification methods performed similarly for common gene targets, such as protein-coding genes. However, we identified a potential pitfall in analyzing and quantifying lowly expressed genes and small RNAs with alignment-free pipelines, especially when these small RNAs contain mutations.
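One way to score fold-change estimation against a benchmark with known expected fold changes is sketched below: compute a log2 fold change per gene and take its absolute error. The pseudocount, the assumption that counts are already depth-normalized, and the gene names are illustrative; the abstract does not specify the metrics the authors used.

```python
import math


def log2_fold_change(count_a, count_b, pseudocount=1.0):
    """log2((b + pseudocount) / (a + pseudocount)); the pseudocount avoids log(0)."""
    return math.log2((count_b + pseudocount) / (count_a + pseudocount))


def fold_change_errors(counts_a, counts_b, expected_log2fc, pseudocount=1.0):
    """Absolute error of each gene's estimated log2 fold change versus the expected value.

    counts_a, counts_b: depth-normalized counts for the two conditions (assumed).
    expected_log2fc: known log2 fold changes from the benchmark design.
    """
    return {
        gene: abs(log2_fold_change(counts_a[gene], counts_b[gene], pseudocount) - truth)
        for gene, truth in expected_log2fc.items()
    }


def mean_abs_error(errors):
    return sum(errors.values()) / len(errors)


# Hypothetical estimates from one pipeline for two benchmark features.
est_a = {"long_gene": 99.0, "small_rna": 1.0}
est_b = {"long_gene": 399.0, "small_rna": 3.0}
expected = {"long_gene": 2.0, "small_rna": 2.0}
errors = fold_change_errors(est_a, est_b, expected)
```

Summaries such as the mean absolute error can then be compared between pipelines, separately for long and small RNAs.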


2021 ◽  
Vol 22 (14) ◽  
pp. 7298
Author(s):  
Izabela Rudzińska ◽  
Małgorzata Cieśla ◽  
Tomasz W. Turowski ◽  
Alicja Armatowska ◽  
Ewa Leśniewska ◽  
...  

The coordinated transcription of the genome is a fundamental mechanism in molecular biology. Transcription in eukaryotes is carried out by three main RNA polymerases: Pol I, II, and III. One basic question is how a decrease in tRNA levels, caused by downregulating Pol III efficiency, influences the expression pattern of protein-coding genes. The purpose of this study was to determine mRNA levels in the yeast mutant rpc128-1007 and its overdose suppressors, RBS1 and PRT1. The rpc128-1007 mutation prevents assembly of the Pol III complex and functionally mimics similar mutations in human Pol III that cause hypomyelinating leukodystrophies. We applied RNA-seq followed by hierarchical clustering of the complete RNA-seq transcriptome and functional analysis of genes from the clusters. mRNA upregulation in rpc128-1007 cells was generally stronger than downregulation. The observed induction of mRNA expression was mostly indirect and resulted from derepression of the general transcription factor Gcn4, which was modulated differently by the suppressor genes. The rpc128-1007 mutation, regardless of the presence of suppressors, also resulted in a weak increase in the expression of ribosome biogenesis genes. Genes downregulated by the reduction in Pol III assembly include those encoding the proteasome complex. In summary, our results identify regulatory links affected by Pol III assembly that contribute differently to cellular fitness.


2021 ◽  
Vol 22 (5) ◽  
pp. 2683
Author(s):  
Princess D. Rodriguez ◽  
Hana Paculova ◽  
Sophie Kogut ◽  
Jessica Heath ◽  
Hilde Schjerven ◽  
...  

Non-coding RNAs (ncRNAs) comprise a diverse class of non-protein-coding transcripts that regulate critical cellular processes associated with cancer. Advances in RNA sequencing (RNA-Seq) have led to the characterization of ncRNA expression across different types of human cancer. Through comprehensive RNA-Seq profiling, a growing number of studies demonstrate that ncRNAs, including long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), play central roles in the pathogenesis of progenitor B-cell acute lymphoblastic leukemia (B-ALL). Furthermore, owing to their central roles in cellular homeostasis and their potential as biomarkers, the study of ncRNAs continues to provide new insight into the molecular mechanisms of B-ALL. This article reviews the ncRNA signatures reported for all B-ALL subtypes, focusing on technological developments in transcriptome profiling and recently discovered examples of ncRNAs with biological and therapeutic relevance in B-ALL.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Étienne Fafard-Couture ◽  
Danny Bergeron ◽  
Sonia Couture ◽  
Sherif Abou-Elela ◽  
Michelle S. Scott

Abstract Background Small nucleolar RNAs (snoRNAs) are mid-size non-coding RNAs required for ribosomal RNA modification, implying a ubiquitous tissue distribution linked to ribosome synthesis. However, increasing numbers of studies identify extra-ribosomal roles of snoRNAs in modulating gene expression, suggesting more complex snoRNA abundance patterns. Therefore, there is a great need for mapping the snoRNome in different human tissues as the blueprint for snoRNA functions. Results We used a low structure bias RNA-Seq approach to accurately quantify snoRNAs and compare them to the entire transcriptome in seven healthy human tissues (breast, ovary, prostate, testis, skeletal muscle, liver, and brain). We identify 475 expressed snoRNAs categorized in two abundance classes that differ significantly in their function, conservation level, and correlation with their host gene: 390 snoRNAs are uniformly expressed and 85 are enriched in the brain or reproductive tissues. Most tissue-enriched snoRNAs are embedded in lncRNAs and display strong correlation of abundance with them, whereas uniformly expressed snoRNAs are mostly embedded in protein-coding host genes and are mainly non- or anticorrelated with them. Fifty-nine percent of the non-correlated or anticorrelated protein-coding host gene/snoRNA pairs feature dual-initiation promoters, compared to only 16% of the correlated non-coding host gene/snoRNA pairs. Conclusions Our results demonstrate that snoRNAs are not a single homogeneous group of housekeeping genes but include highly regulated tissue-enriched RNAs. Indeed, our work indicates that the architecture of snoRNA host genes varies to uncouple the host and snoRNA expressions in order to meet the different snoRNA abundance levels and functional needs of human tissues.
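A minimal sketch of how snoRNA/host-gene pairs could be sorted into correlated, non-correlated and anticorrelated groups from their abundances across the seven tissues, assuming a Pearson correlation with a symmetric cutoff. The cutoff of 0.5 and the abundance values in the test are hypothetical; the abstract does not state the authors' criterion.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant abundances")
    return sxy / math.sqrt(sxx * syy)


def classify_pair(sno_abundance, host_abundance, cutoff=0.5):
    """Label a snoRNA/host-gene pair by the sign and strength of their correlation.

    The cutoff is a hypothetical threshold, applied symmetrically.
    """
    r = pearson(sno_abundance, host_abundance)
    if r >= cutoff:
        return "correlated"
    if r <= -cutoff:
        return "anticorrelated"
    return "non-correlated"
```

With seven tissues, each correlation rests on only seven points, so the choice of cutoff strongly affects how many pairs fall into each group.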


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Geneviève Bart ◽  
Daniel Fischer ◽  
Anatoliy Samoylenko ◽  
Artem Zhyvolozhnyi ◽  
Pavlo Stehantsev ◽  
...  

Abstract Background Human sweat is a mixture of secretions from three types of glands: eccrine, apocrine, and sebaceous. Eccrine glands open directly onto the skin surface and produce large amounts of water-based fluid in response to heat, emotion, and physical activity, whereas the other glands produce oily fluids and waxy sebum. While most body fluids have been shown to contain nucleic acids, both as ribonucleoprotein complexes and associated with extracellular vesicles (EVs), these have not been investigated in sweat. In this study we aimed to explore and characterize the nucleic acids associated with sweat particles. Results We used next-generation sequencing (NGS) to characterize DNA and RNA in pooled and individual samples of EV-enriched sweat collected from volunteers performing rigorous exercise. In all sequenced samples, we identified DNA originating from all human chromosomes, but only the mitochondrial chromosome was highly represented, with 100% coverage. Most of the DNA mapped to unannotated regions of the human genome, with some regions highly represented in all samples. Approximately 5% of the reads mapped to other genomes, including bacteria (83%), archaea (3%), and viruses (13%); the identified bacterial species were consistent with those commonly colonizing the skin of the human upper body and arms. Small RNA-seq of EV-enriched pooled sweat RNA resulted in 74% of the trimmed reads mapping to the human genome, with 29% corresponding to unannotated regions. Over 70% of the RNA reads mapping to an annotated region were tRNA, while misc. RNA (18.5%), protein-coding RNA (5%) and miRNA (1.85%) were much less represented. RNA-seq of individually processed EV-enriched sweat collections generally resulted in a lower percentage of reads mapping to the human genome (7–45%), with 50–60% of those reads mapping to unannotated regions of the genome and 30–55% being tRNAs, and lower percentages of reads being rRNA, lincRNA, misc. RNA, and protein-coding RNA. 
Conclusions Our data demonstrate that sweat, like all other body fluids, contains a wealth of nucleic acids, including DNA and RNA of human and microbial origin, opening the possibility of investigating sweat as a source of biomarkers for specific health parameters.


2022 ◽  
Vol 0 (0) ◽  
Author(s):  
V. Janett Olzog ◽  
Lena I. Freist ◽  
Robin Goldmann ◽  
Jörg Fallmann ◽  
Christina E. Weinberg

Abstract Self-cleaving ribozymes are catalytic RNAs and can be found in all domains of life. They catalyze a site-specific cleavage that results in a 5′ fragment with a 2′,3′ cyclic phosphate (2′,3′ cP) and a 3′ fragment with a 5′ hydroxyl (5′ OH) end. Recently, several strategies to enrich self-cleaving ribozymes by targeted biochemical methods have been introduced by us and others. Here, we develop an alternative strategy in which 5′ OH RNAs are specifically ligated by RtcB ligase, which first guanylates the 3′ phosphate of the adapter and then ligates it directly to RNAs with 5′ OH ends. Our results demonstrate that adapter ligation to highly structured ribozyme fragments is much more efficient with the thermostable RtcB ligase from Pyrococcus horikoshii than with the broadly applied Escherichia coli enzyme. Moreover, we investigated DNA, RNA and modified RNA adapters for their suitability in RtcB ligation reactions. We used the optimized RtcB-mediated ligation to produce RNA-seq libraries and captured a spiked 3′ twister ribozyme fragment from E. coli total RNA. This RNA-seq-based method can be used to detect ribozyme fragments as well as other cellular RNAs with 5′ OH termini from total RNA.

