Accurate quantification of overlapping herpesvirus transcripts from RNA-seq data

2021 ◽  
Author(s):  
Alejandro Casco ◽  
Akansha Gupta ◽  
Mitchell Hayes ◽  
Reza Djavadian ◽  
Makoto Ohashi ◽  
...  

Herpesviruses employ extensive bidirectional transcription of overlapping genes to overcome length constraints on their gene product repertoire. As a consequence, many lytic transcripts cannot be measured individually by RT-qPCR or conventional RNA-seq analysis. Bruce et al. (Pathogens 2017, 6, 11; doi:10.3390/pathogens6010011) proposed an approximation method using Unique CoDing Sequences (UCDS) to estimate lytic gene abundance from KSHV RNA-seq data. Although UCDS has been widely employed, its accuracy, to our knowledge, has never been rigorously validated for any herpesvirus. In this study, we use CAGE-seq as a gold standard to determine the accuracy of UCDS for estimating EBV lytic gene expression levels from RNA-seq data. We also introduce the Unique TranScript (UTS) method that, like UCDS, estimates transcript abundance from changes in mean RNA-seq read depth. UTS is distinguished by its use of empirically determined 5’ and 3’ transcript ends, rather than coding sequence annotations. Compared to conventional read assignment, both UCDS and UTS improved quantitation accuracy of overlapping genes, with UTS giving the most accurate results. The UTS method discards fewer reads and may be advantageous for experiments with less sequencing depth. UTS is compatible with any aligner and, unlike isoform-aware alignment methods, can be implemented on a laptop computer. Our findings demonstrate that accuracy achieved by complex and expensive techniques such as CAGE-seq can be approximated using conventional short-read RNA-seq data when read assignment methods address transcript overlap. Although our study focuses on EBV transcription, the UTS method should be applicable across all herpesviruses and other genomes with extensively overlapping transcriptomes.
IMPORTANCE: Many viruses employ extensively overlapping transcript structures. This complexity makes it difficult to quantify gene expression using conventional methods, including RNA-seq. Although high-throughput techniques that overcome these limitations exist, they are complex, expensive, and scarce in herpesvirus literature relative to short-read RNA-seq. Here, using Epstein-Barr virus (EBV) as a model, we demonstrate that conventional RNA-seq analysis methods fail to accurately quantify abundance of many overlapping transcripts. We further show that the previously described Unique CoDing Sequence (UCDS) and our Unique TranScript (UTS) methods greatly improve the accuracy of EBV lytic gene measurements obtained from RNA-seq data. The UTS method has the advantages of discarding fewer reads and being implementable on a laptop computer. Although this study focuses on EBV, the UCDS and UTS methods should be applicable across herpesviruses and for other viruses that make extensive use of overlapping transcription.
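
The abstract states only that UCDS and UTS estimate abundance from changes in mean read depth; it does not give the procedure. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a simplified, hypothetical layout in which several transcripts on one strand start at different 5' positions and share a single 3' end, so that the depth in each segment is the sum of all transcripts that have already started. The function names and inputs are illustrative assumptions.

```python
from statistics import mean


def mean_depth(depth, start, end):
    """Mean per-base read depth over the half-open interval [start, end)."""
    return mean(depth[start:end])


def estimate_by_depth_change(depth, starts, end):
    """Estimate abundance of nested transcripts that share one 3' end.

    depth  : per-base read depth along one strand (list of numbers)
    starts : 5' start of each transcript, sorted ascending (assumed input)
    end    : shared 3' end, exclusive (assumed input)

    Each transcript is credited with the increase in mean depth between
    the segment upstream of its start and the segment that begins at it.
    """
    bounds = list(starts) + [end]
    estimates = []
    upstream = 0.0
    for i in range(len(starts)):
        segment = mean_depth(depth, bounds[i], bounds[i + 1])
        # Negative changes are clipped to zero (an assumption of this sketch).
        estimates.append(max(segment - upstream, 0.0))
        upstream = segment
    return estimates


# Toy coverage: three transcripts starting at 0, 100 and 200, all ending at 300.
toy_depth = [10] * 100 + [30] * 100 + [35] * 100
toy_estimates = estimate_by_depth_change(toy_depth, [0, 100, 200], 300)
print(toy_estimates)
```

On this toy coverage the three transcripts are credited with depths of 10, 20 and 5. Real data would also need strand-specific coverage and handling of overlapping 3' ends, which this sketch leaves out.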

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shuhua Zhan ◽  
Cortland Griswold ◽  
Lewis Lukens

Abstract
Background: Genetic variation for gene expression is a source of phenotypic variation for natural and agricultural species. The common approach to map and to quantify gene expression from genetically distinct individuals is to assign their RNA-seq reads to a single reference genome. However, RNA-seq reads from alleles dissimilar to this reference genome may fail to map correctly, causing transcript levels to be underestimated. Presently, the extent of this mapping problem is not clear, particularly in highly diverse species. We investigated whether mapping bias occurred and whether chromosomal features were associated with mapping bias. Zea mays presents a model species to assess these questions, given it has genotypically distinct and well-studied genetic lines.
Results: In Zea mays, the inbred B73 genome is the standard reference genome and template for RNA-seq read assignments. In the absence of mapping bias, B73 and a second inbred line, Mo17, would each have an approximately equal number of regulatory alleles that increase gene expression. Remarkably, Mo17 had 2–4 times fewer such positively acting alleles than did B73 when RNA-seq reads were aligned to the B73 reference genome. Reciprocally, over one-half of the B73 alleles that increased gene expression were not detected when reads were aligned to the Mo17 genome template. Genes at dissimilar chromosomal ends were strongly affected by mapping bias, and genes at more similar pericentromeric regions were less affected. Biased transcript estimates were higher in untranslated regions and lower in splice junctions. Bias occurred across software and alignment parameters.
Conclusions: Mapping bias very strongly affects gene transcript abundance estimates in maize, and bias varies across chromosomal features. Individual genome or transcriptome templates are likely necessary for accurate transcript estimation across genetically variable individuals in maize and other species.


Author(s):  
Marine Guilcher ◽  
Arnaud Liehrmann ◽  
Chloé Seyman ◽  
Thomas Blein ◽  
Guillem Rigaill ◽  
...  

Plastid gene expression involves many post-transcriptional maturation steps resulting in a complex transcriptome composed of multiple isoforms. Although short-read RNA-seq has considerably improved our understanding of the molecular mechanisms controlling these processes, it is unable to sequence full-length transcripts. This information is crucial, however, for understanding the interplay between the various steps of plastid gene expression. Here, the study of the Arabidopsis leaf plastid transcriptome using Nanopore sequencing showed that many splicing and editing events were not independent but co-occurring. For a given transcript, maturation events also appeared to be chronologically ordered, with splicing happening after most sites are edited.


2016 ◽  
Author(s):  
Olivier Poirion ◽  
Xun Zhu ◽  
Travers Ching ◽  
Lana X. Garmire

Abstract: Despite its popularity, characterization of subpopulations with transcript abundance is subject to a significant amount of noise. We propose to use effective and expressed nucleotide variations (eeSNVs) from scRNA-seq as alternative features for tumor subpopulation identification. We developed a linear modeling framework, SSrGE, to link eeSNVs associated with gene expression. In all the datasets tested, eeSNVs achieve better accuracies than gene expression for identifying subpopulations. Previously validated cancer-relevant genes are also highly ranked, confirming the significance of the method. Moreover, SSrGE is capable of analyzing coupled DNA-seq and RNA-seq data from the same single cells, demonstrating its value in integrating multi-omics single cell techniques. In summary, SNV features from scRNA-seq data have merits for both subpopulation identification and linkage of genotype-phenotype relationship. The method SSrGE is available at https://github.com/lanagarmire/SSrGE.


2010 ◽  
Vol 08 (supp01) ◽  
pp. 177-192 ◽  
Author(s):  
XI WANG ◽  
ZHENGPENG WU ◽  
XUEGONG ZHANG

Due to its unprecedented high resolution and detailed information, RNA-seq technology based on next-generation high-throughput sequencing significantly boosts the ability to study transcriptomes. The estimation of genes' transcript abundance levels or gene expression levels has always been an important question in research on transcriptional regulation and gene functions. On the basis of the concept of Reads Per Kilo-base per Million reads (RPKM), taking the union-intersection genes (UI-based) and summing up inferred isoform abundance (isoform-based) are the two current strategies to estimate gene expression levels, but they produce different estimations. In this paper, we made the first attempt to compare the two strategies' performances through a series of simulation studies. Our results showed that the isoform-based method not only gives more accurate estimation but also has less uncertainty than the UI-based strategy. If the non-uniformity of read distribution is taken into account, the isoform-based method can further reduce estimation errors. We applied both strategies to real RNA-seq datasets of technical replicates, and found that the isoform-based strategy also displays a better performance. For a more accurate estimation of gene expression levels from RNA-seq data, even if the abundance levels of isoforms are not of interest, it is still better to first infer the isoform abundance and sum them up to get the expression level of a gene as a whole.
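
To make the two strategies concrete, the sketch below computes RPKM with its standard definition (reads per kilobase of sequence per million mapped reads) and then forms a gene-level value in the two ways the abstract contrasts. It is an illustration only: the read counts assigned to each isoform and to the union-intersection region are hypothetical inputs, and the isoform inference step that the paper evaluates is not shown.

```python
def rpkm(reads, length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped_reads)


def gene_expression_isoform_based(isoforms, total_mapped_reads):
    """Sum the RPKM of each isoform.

    isoforms: list of (reads_assigned_to_isoform, isoform_length_bp)
    pairs, assumed to come from a separate isoform inference step.
    """
    return sum(rpkm(reads, length, total_mapped_reads)
               for reads, length in isoforms)


def gene_expression_ui_based(ui_reads, ui_length_bp, total_mapped_reads):
    """RPKM computed over the union-intersection region of the gene.

    ui_reads and ui_length_bp are assumed inputs: the reads falling in
    that region and the region's total length.
    """
    return rpkm(ui_reads, ui_length_bp, total_mapped_reads)


# Hypothetical gene with two isoforms in a library of 10 million mapped reads.
total = 10_000_000
isoform_value = gene_expression_isoform_based([(300, 1500), (200, 1000)], total)
ui_value = gene_expression_ui_based(250, 1000, total)
print(isoform_value, ui_value)
```

With these made-up numbers the isoform-based value is 40.0 and the UI-based value is 25.0, which shows how the two strategies can disagree for the same gene.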


2014 ◽  
Author(s):  
David G. Robinson ◽  
Jean Wang ◽  
John D. Storey

Understanding the differences between microarray and RNA-Seq technologies for measuring gene expression is necessary for informed design of experiments and choice of data analysis methods. Previous comparisons have come to sometimes contradictory conclusions, which we suggest result from a lack of attention to the intensity-dependent nature of variation generated by the technologies. To examine this trend, we carried out a parallel nested experiment performed simultaneously on the two technologies that systematically split variation into four stages (treatment, biological variation, library preparation, and chip/lane noise), allowing a separation and comparison of the sources of variation in a well-controlled cellular system, Saccharomyces cerevisiae. With this novel dataset, we demonstrate that power and accuracy are more dependent on per-gene read depth in RNA-Seq than they are on fluorescence intensity in microarrays. However, we carried out qPCR validations which indicate that microarrays may demonstrate greater systematic bias in low-intensity genes than in RNA-seq.


Genes ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 89 ◽  
Author(s):  
Sheng-Kai Hsu ◽  
Ana Marija Jakšić ◽  
Viola Nolte ◽  
Neda Barghi ◽  
François Mallard ◽  
...  

Gene expression profiling is one of the most reliable high-throughput phenotyping methods, allowing researchers to quantify the transcript abundance of expressed genes. Because many biotic and abiotic factors influence gene expression, it is recommended to control them as tightly as possible. Here, we show that a 24 h age difference of Drosophila simulans females that were subjected to RNA sequencing (RNA-Seq) five and six days after eclosure resulted in more than 2000 differentially expressed genes. This is twice the number of genes that changed expression during 100 generations of evolution in a novel hot laboratory environment. Importantly, most of the genes differing in expression due to age introduce false positives or negatives if an adaptive gene expression analysis is not controlled for age. Our results indicate that tightly controlled experimental conditions, including precise developmental staging, are needed for reliable gene expression analyses, in particular in an evolutionary framework.


2015 ◽  
Vol 9 ◽  
pp. BBI.S33124 ◽  
Author(s):  
Peter R. LoVerso ◽  
Christopher M. Wachter ◽  
Feng Cui

The mammalian brain is characterized by distinct classes of cells that differ in morphology, structure, signaling, and function. Dysregulation of gene expression in these cell populations leads to various neurological disorders. Neural cells often need to be acutely purified from animal brains for research, which requires complicated procedures and specific expertise. Primary culture of these cells in vitro is a viable alternative, but the differences in gene expression of cells grown in vitro and in vivo remain unclear. Here, we cultured three major neural cell classes of rat brain (ie, neurons, astrocytes, and oligodendrocyte precursor cells [OPCs]) obtained from commercial sources. We measured transcript abundance of these cell types by RNA sequencing (RNA-seq) and compared them with their counterparts acutely purified from mouse brains. Cross-species RNA-seq data analysis revealed hundreds of genes that are differentially expressed between the cultured and acutely purified cells. Astrocytes have more such genes compared to neurons and OPCs, indicating that signaling pathways are greatly perturbed in cultured astrocytes. This dataset provides a powerful resource to demonstrate the similarities and differences of biological processes in mammalian neural cells grown in vitro and in vivo at the molecular level.


2021 ◽  
Vol 22 (20) ◽  
pp. 11297
Author(s):  
Marine Guilcher ◽  
Arnaud Liehrmann ◽  
Chloé Seyman ◽  
Thomas Blein ◽  
Guillem Rigaill ◽  
...  

Plastid gene expression involves many post-transcriptional maturation steps resulting in a complex transcriptome composed of multiple isoforms. Although short-read RNA-Seq has considerably improved our understanding of the molecular mechanisms controlling these processes, it is unable to sequence full-length transcripts. This information is crucial, however, when it comes to understanding the interplay between the various steps of plastid gene expression. Here, we describe a protocol to study the plastid transcriptome using nanopore sequencing. In the leaf of Arabidopsis thaliana, with about 1.5 million strand-specific reads mapped to the chloroplast genome, we could recapitulate most of the complexity of the plastid transcriptome (polygenic transcripts, multiple isoforms associated with post-transcriptional processing) using virtual Northern blots. Even if the transcripts longer than about 2,500 nucleotides were missing, the study of the co-occurrence of editing and splicing events identified 42 pairs of events that were not occurring independently. This study also highlighted a preferential chronology of maturation events with splicing happening after most sites were edited.
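
The abstract reports pairs of editing and splicing events that did not occur independently but does not name the statistical test used. As one plausible way to check independence on long reads, the sketch below builds a 2x2 table from per-read observations and applies a two-sided Fisher's exact test written with the standard library only. The choice of test, the function names, and the input format are assumptions of this sketch, not details from the study.

```python
from math import comb


def contingency_from_reads(reads):
    """Count reads by (edited, spliced) status.

    reads: iterable of (edited, spliced) booleans, one pair per long read
    covering both sites (assumed input format).
    Returns (a, b, c, d) = (both, edited only, spliced only, neither).
    """
    a = b = c = d = 0
    for edited, spliced in reads:
        if edited and spliced:
            a += 1
        elif edited:
            b += 1
        elif spliced:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for a 2x2 table."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Toy example: six reads in which editing and splicing always coincide.
toy_reads = [(True, True)] * 3 + [(False, False)] * 3
table = contingency_from_reads(toy_reads)
p_value = fisher_exact_two_sided(*table)
print(table, p_value)
```

A small p-value suggests the two events are not independent on the same molecule. Testing many event pairs, as in a whole transcriptome, would also call for multiple-testing correction, which is omitted here.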


2016 ◽  
Author(s):  
Lukas Paul ◽  
Petra Kubala ◽  
Gudrun Horner ◽  
Michael Ante ◽  
Igor Holländer ◽  
...  

Abstract: Spike-In RNA variants (SIRVs) enable, for the first time, the validation of RNA sequencing workflows using external isoform transcript controls. 69 transcripts, derived from seven human model genes, cover the eukaryotic transcriptome complexity of start- and end-site variations, alternative splicing, overlapping genes, and antisense transcription in a condensed format. Reference RNA samples were spiked with SIRV mixes and sequenced, and four data evaluation pipelines were challenged, as examples, to account for biases introduced by the RNA-Seq workflow. The deviations of the respective isoform quantifications from the known inputs make it possible to determine the comparability of sequencing experiments and to extrapolate the degree to which alterations in an RNA-Seq workflow affect gene expression measurements. The SIRVs as external isoform controls are an important gauge for inter-experimental comparability and a modular spike-in contribution to clear the way for diagnostic RNA-Seq applications.
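
One simple way to express a deviation of measured isoform abundance from a known spike-in input is a log2 ratio of relative abundances. The sketch below illustrates that idea only; the abstract does not specify the metric used, and the function name, the dictionary inputs, and the transcript labels are hypothetical.

```python
from math import log2


def log2_deviation(measured, expected):
    """Log2 ratio of measured to expected relative abundance per transcript.

    measured : dict of transcript name -> quantified abundance
    expected : dict of transcript name -> known input amount
    Both are converted to fractions of their totals before comparison.
    A transcript with zero measured abundance yields -inf.
    """
    m_total = sum(measured.values())
    e_total = sum(expected.values())
    deviations = {}
    for name, e_value in expected.items():
        m_frac = measured[name] / m_total
        e_frac = e_value / e_total
        deviations[name] = log2(m_frac / e_frac) if m_frac > 0 else float("-inf")
    return deviations


# Two hypothetical spike-in transcripts mixed in equal amounts.
toy = log2_deviation({"tx_a": 30, "tx_b": 10}, {"tx_a": 1, "tx_b": 1})
print(toy)
```

A value of 0 means the pipeline recovered the input proportion exactly; here the second transcript comes out at -1, meaning it was measured at half its expected share.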


2020 ◽  
Author(s):  
Abdellah Barakate ◽  
Jamie Orr ◽  
Miriam Schreiber ◽  
Isabelle Colas ◽  
Dominika Lewandowska ◽  
...  

Abstract: In flowering plants, successful germinal cell development and meiotic recombination depend upon a combination of environmental and genetic factors. To gain insights into this specialised reproductive development programme we used short- and long-read RNA-sequencing (RNA-seq) to study the temporal dynamics of transcript abundance in immuno-cytologically staged barley (Hordeum vulgare) anthers and meiocytes. We show that the most significant transcriptional changes occur at the transition from pre-meiosis to leptotene–zygotene, which is followed by largely stable transcript abundance throughout prophase I. Our analysis reveals that the developing anthers and meiocytes are enriched in long non-coding RNAs (lncRNAs) and that entry to meiosis is characterized by their robust and significant down regulation. Intriguingly, only 24% of a collection of putative meiotic gene orthologues showed differential transcript abundance in at least one stage or tissue comparison. Changes in the abundance of numerous transcription factors, representatives of the small RNA processing machinery, and post-translational modification pathways highlight the complexity of the regulatory networks involved. These developmental, time-resolved, and dynamic transcriptomes increase our understanding of anther and meiocyte development and will help guide future research.
One sentence summary: Analysis of RNA-seq data from meiotically staged barley anthers and meiocytes highlights the role of lncRNAs within a complex network of transcriptional and post-transcriptional regulation accompanied by a hiatus in differential gene expression during prophase I.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Robbie Waugh.

