Long-read Assays Shed New Light on the Transcriptome Complexity of a Viral Pathogen and on Virus-Host Interaction

2020 ◽  
Author(s):  
Dóra Tombácz ◽  
István Prazsák ◽  
Zoltán Maróti ◽  
Norbert Moldován ◽  
Zsolt Csabai ◽  
...  

Abstract Characterization of global transcriptomes using conventional short-read sequencing is challenging because of the insensitivity of these platforms to transcript isoforms, multigenic RNA molecules, and transcriptional overlaps, etc. Long-read sequencing (LRS) can overcome these limitations by reading full-length transcripts. Employment of these technologies has led to the redefinition of transcriptional complexities in reported organisms. In this study, we applied LRS platforms from Pacific Biosciences and Oxford Nanopore Technologies to profile the dynamic vaccinia virus (VACV) transcriptome and assess the effect of viral infection on host gene expression. We performed cDNA and direct RNA sequencing analyses and revealed an extremely complex transcriptional landscape of this virus. In particular, VACV genes produce large numbers of transcript isoforms that vary in their start and termination sites. A significant fraction of VACV transcripts start or end within coding regions of neighboring genes. We distinguished five classes of host genes according to their temporal responses to viral infection. This study provides novel insights into the transcriptomic profile of a viral pathogen and the effect of the virus on host gene expression. Author Summary Viral transcriptomes that are determined using conventional (first- and second-generation) sequencing techniques are incomplete because these platforms are inefficient or fail to distinguish between types of transcripts and transcript isoforms. In particular, conventional sequencing techniques fail to distinguish between parallel overlapping transcripts, including alternative polycistronic transcripts, transcriptional start site (TSS) and transcriptional end site (TES) isoforms, and splice variants and RNA molecules that are produced by transcriptional read-throughs. 
Long-read sequencing (LRS) can provide complete sets of RNA molecules, and can therefore be used to assemble complete transcriptome atlases of organisms. Although vaccinia virus (VACV) does not produce spliced RNAs, its transcriptome contains large numbers of TSSs and TESs for individual viral genes and has a high degree of polycistronism, together leading to enormous complexity. In this study, we applied single molecule real-time and nanopore-based cDNA and direct-RNA sequencing methods to investigate transcripts of VACV and the host organism.

Author(s):  
Fairlie Reese ◽  
Ali Mortazavi

Abstract Motivation Long-read RNA-sequencing technologies such as PacBio and Oxford Nanopore have discovered an explosion of new transcript isoforms that are difficult to visually analyze using currently available tools. We introduce the Swan Python library, which is designed to analyze and visualize transcript models. Results Swan finds 4909 differentially expressed transcripts between cell lines HepG2 and HFFc6, including 279 that are differentially expressed even though the parent gene is not. Additionally, Swan discovers 285 reproducible exon skipping and 47 intron retention events not recorded in the GENCODE v29 annotation. Availability and implementation The Swan library for Python 3 is available on PyPi at https://pypi.org/project/swan-vis/ and on GitHub at https://github.com/mortazavilab/swan_vis.


2020 ◽  
Vol 7 (Supplement_1) ◽  
pp. S633-S634
Author(s):  
Rachael E Mahle ◽  
Sunil Suchindran ◽  
Ricardo Henao ◽  
Julie M Steinbrink ◽  
Thomas W Burke ◽  
...  

Abstract Background Difficulty distinguishing bacterial and viral infections contributes to excess antibiotic use. A host response strategy overcomes many limitations of pathogen-based tests, but depends on a functional immune system. This approach may therefore be limited in immunocompromised (IC) hosts. Here, we evaluated a host response test in IC subjects, which has not been extensively studied in this manner. Methods An 81-gene signature was measured using qRT-PCR in previously enrolled IC subjects (chemotherapy, solid organ transplant, immunomodulatory agents, AIDS) with confirmed bacterial infection, viral infection, or non-infectious illness (NI). A regularized logistic regression model estimated the likelihood of bacterial, viral, and noninfectious classes. Clinical adjudication was the reference standard. Results A host gene expression model trained in a cohort of 136 immunocompetent subjects (43 bacterial, 41 viral, and 52 NI) had an overall accuracy of 84.6% for the diagnosis of bacterial vs. non-bacterial infection and 80.8% for viral vs. non-viral infection. The model was validated in an independent cohort of 134 IC subjects (64 bacterial, 28 viral, 42 NI). The overall accuracy was 73.9% for bacterial infection (p=0.03 vs. training cohort) and 75.4% for viral infection (p=0.27). Test utility could be improved by reporting probability ranges. For example, results divided into probability quartiles would allow the highest quartile to be used to rule in infection and the lowest to rule out infection. For IC subjects in the lowest quartile, the test had 90.1% and 96.4% sensitivity for bacterial and viral infection, respectively. For the highest quartile, the test had 91.4% and 84.0% specificity for bacterial and viral infection, respectively. The type or number of immunocompromising conditions did not impact performance. 
[Figure: Illness Etiology Probabilities]
Conclusion A host gene expression test discriminated bacterial, viral, and non-infectious etiologies at a lower overall accuracy in IC patients compared to immunocompetent patients, though this difference was only significant for bacterial vs. non-bacterial disease. With modified interpretive criteria, a host response strategy may offer clinically useful and complementary diagnostic information for IC patients. Disclosures Thomas W. Burke, PhD, Predigen, Inc (Consultant); Geoffrey S. Ginsburg, MD PhD, Predigen, Inc (Shareholder, Other Financial or Material Support); Christopher W. Woods, MD, MPH, FIDSA, Predigen, Inc (Shareholder, Other Financial or Material Support); Ephraim L. Tsalik, MD, MHS, PhD, FIDSA, Predigen, Inc (Scientific Research Study Investigator, Shareholder, Other Financial or Material Support)
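
The quartile-based interpretation described in the abstract above can be illustrated with a minimal sketch. It is not the authors' model or code. It assumes that "probability quartiles" means four equal bands of the predicted probability (cut-offs 0.25 and 0.75), that sensitivity is assessed at the rule-out cut-off and specificity at the rule-in cut-off, and it uses made-up toy data. All function names are hypothetical.

```python
# Hypothetical sketch: rule-in / rule-out reporting from predicted probabilities.
# Assumption: quartiles are equal-width bands of the probability scale.

def interpret(prob, low=0.25, high=0.75):
    """Map a predicted probability of infection to a reporting band."""
    if prob < low:
        return "rule out"
    if prob >= high:
        return "rule in"
    return "indeterminate"

def sensitivity_at(probs, labels, threshold):
    """Fraction of truly infected subjects NOT ruled out at this threshold."""
    positives = [p for p, y in zip(probs, labels) if y == 1]
    return sum(p >= threshold for p in positives) / len(positives)

def specificity_at(probs, labels, threshold):
    """Fraction of truly uninfected subjects NOT ruled in at this threshold."""
    negatives = [p for p, y in zip(probs, labels) if y == 0]
    return sum(p < threshold for p in negatives) / len(negatives)

# Toy data (invented for illustration): 1 = infection present, 0 = absent.
probs = [0.05, 0.20, 0.40, 0.60, 0.80, 0.95, 0.30, 0.10]
labels = [0, 1, 0, 1, 1, 1, 1, 0]

rule_out_sensitivity = sensitivity_at(probs, labels, 0.25)
rule_in_specificity = specificity_at(probs, labels, 0.75)
bands = [interpret(p) for p in probs]
```

Reporting a band rather than a single binary call trades coverage for confidence: subjects in the middle bands receive an indeterminate result, while the extreme bands carry the high sensitivity or specificity.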


2018 ◽  
Author(s):  
Koen Van Den Berge ◽  
Katharina Hembach ◽  
Charlotte Soneson ◽  
Simone Tiberi ◽  
Lieven Clement ◽  
...  

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq datasets as well as the performance of the myriad of methods developed. In this review, we give an overall view of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.


Viruses ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 2531
Author(s):  
Riteng Zhang ◽  
Peixin Wang ◽  
Xin Ma ◽  
Yifan Wu ◽  
Chen Luo ◽  
...  

The TRS-mediated discontinuous transcription process is a hallmark of Arteriviruses. Precise assessment of the intricate subgenomic RNA (sg mRNA) populations is required to understand the kinetics of viral transcription. It is difficult to reconstruct and comprehensively quantify splicing events using short-read sequencing, making the identification of transcription-regulatory sequences (TRS) particularly problematic. Here, we applied long-read direct RNA sequencing to characterize the recombined RNA molecules produced in porcine alveolar macrophages during early passage infection of porcine reproductive and respiratory syndrome virus (PRRSV). Based on sequencing two PRRSV isolates, namely XM-2020 and GD, we revealed a high-resolution and diverse transcriptional landscape in PRRSV. The data revealed intriguing differences in subgenomic recombination types between the two PRRSVs while also demonstrating a TRS-independent heterogeneous subpopulation not previously observed in Arteriviruses. We find that TRS usage is a regulated process and that both strains share a common preferred TRS. This study also identified a substantial number of TRS-mediated transcript variants, including alternative sg mRNAs encoding the same annotated ORF, as well as putative sg mRNAs encoding nested internal ORFs, implying that the genetic information encoded in PRRSV may be more intensively expressed. Epigenetic modifications have emerged as an essential regulatory layer in gene expression. Here, we gained a deeper understanding of m5C modification in poly(A) RNA, elucidating a potential link between methylation and transcriptional regulation. Collectively, our findings provided meaningful insights for redefining the transcriptome complexity of PRRSV. This will assist in filling the research gaps and developing strategies for better control of PRRS.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Zoltán Maróti ◽  
Dóra Tombácz ◽  
István Prazsák ◽  
Norbert Moldován ◽  
Zsolt Csabai ◽  
...  

Abstract Objective In this study, we applied two long-read sequencing (LRS) approaches, single-molecule real-time and nanopore-based sequencing, to investigate the time-lapse transcriptome patterns of host gene expression as a response to Vaccinia virus infection. Transcriptomes determined using short-read sequencing approaches are incomplete because these platforms are inefficient or fail to distinguish between polycistronic RNAs, transcript isoforms, transcriptional start sites, as well as transcriptional readthroughs and overlaps. Long-read sequencing is able to read full-length nucleic acids and can therefore be used to assemble complete transcriptome atlases. Results In this work, we identified a number of novel transcripts and transcript isoforms of Chlorocebus sabaeus. Additionally, analysis of the most abundant 768 host transcripts revealed a significant overrepresentation of the class of genes in the “regulation of signaling receptor activity” Gene Ontology annotation as a result of viral infection.


2020 ◽  
Vol 7 (Supplement_1) ◽  
pp. S629-S630
Author(s):  
Nicholas Bodkin ◽  
Melissa H Ross ◽  
Ricardo Henao ◽  
Ephraim L Tsalik

Abstract Background Host gene expression has emerged as a promising diagnostic strategy to discriminate bacterial and viral infection. Multiple gene signatures of varying size and complexity have been developed in various clinical populations. However, there has been no systematic comparison of these signatures. It is also unclear how these signatures apply to different clinical populations. This meta-analysis examined 19 published signatures, validated in 49 publicly available datasets for a total of 4750 patients. The objectives were to understand how the signatures compared to each other with respect to composition and performance, and to evaluate their performance in different patient subgroups. Methods Signatures were characterized with respect to size, platform, and discovery population. For each of the 19 signatures, we ran leave-one-out cross-validation to generate AUCs for bacterial classification and viral classification. We then applied dataset-specific thresholds to generate accuracies, pooling patients across datasets. Results Signature performance varied significantly, with a median AUC across all validation datasets ranging from 0.55 to 0.94 for bacterial classification and 0.79 to 0.96 for viral classification. Signature size varied (1-341 genes), with smaller signatures generally performing more poorly for both bacterial classification (P < .001) and viral classification (P = 0.02). Viral infection was easier to diagnose than bacterial infection (85% vs. 80% overall accuracy, respectively; P < .001). Host gene expression classifiers performed more poorly in children younger than 12 years compared to those older than 12 years for both bacterial infection (77% vs. 83%, respectively; P < .001) and viral infection (82% vs. 89%, respectively; P < .001). We did not observe differences based on illness severity as defined by ICU care for either bacterial or viral infections. 
Conclusion We observed significant differences among gene expression signatures for bacterial/viral discrimination, though these were not due to variations in the discovery methods or populations. Signature size directly correlated with test performance, which was generally better for the diagnosis of viral infection and in populations older than 12 years. Disclosures Ephraim L. Tsalik, MD, MHS, PhD, Predigen (Shareholder, Other Financial or Material Support, Founder)
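
The two performance measures named in the abstract above, AUC and accuracy at a threshold, can be illustrated with a minimal sketch. It is not the authors' pipeline: it omits the leave-one-out cross-validation step, uses invented toy scores, and computes AUC by direct pairwise comparison (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, counting ties as one half). Function names are hypothetical.

```python
# Hypothetical sketch: AUC and thresholded accuracy for one signature's scores.

def auc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold):
    """Fraction of subjects classified correctly when score >= threshold means positive."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)

# Toy data (invented for illustration): 1 = bacterial, 0 = non-bacterial.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]

toy_auc = auc(scores, labels)
toy_accuracy = accuracy(scores, labels, threshold=0.5)
```

AUC is threshold-free, which makes it convenient for comparing signatures across datasets, whereas accuracy depends on the chosen cut-off; this is consistent with the abstract's use of dataset-specific thresholds before pooling accuracies.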


2018 ◽  
Author(s):  
István Prazsák ◽  
Norbert Moldován ◽  
Dóra Tombácz ◽  
Klára Megyeri ◽  
Attila Szűcs ◽  
...  

Abstract Background Varicella zoster virus (VZV) is a human pathogenic alphaherpesvirus harboring a relatively large DNA molecule. The VZV transcriptome has already been analyzed by microarray and short-read sequencing analyses. However, both approaches have substantial limitations when used for structural characterization of transcript isoforms, even if supplemented with primer extension or other techniques. Among others, they are inefficient in distinguishing between embedded RNA molecules, transcript isoforms, including splice and length variants, as well as between alternative polycistronic transcripts. It has been demonstrated in several studies that long-read sequencing is able to circumvent these problems. Results In this work, we report the analysis of the VZV lytic transcriptome using the Oxford Nanopore Technologies sequencing platform. These investigations have led to the identification of 114 novel transcripts, including mRNAs, non-coding RNAs, polycistronic RNAs and complex transcripts, as well as 10 novel spliced transcripts and 27 novel transcription start site isoforms and transcription end site isoforms. A novel class of transcripts, the nroRNAs, is described in this study. These transcripts are encoded by the genomic region located in close vicinity to the viral replication origin. We also show that the VZV latency transcript (VLT) exhibits a more complex structural variation than formerly believed. Additionally, we have detected RNA editing in a novel non-coding RNA molecule. Conclusions Our investigations disclosed a composite transcriptomic architecture of VZV, including the discovery of novel RNA molecules and transcript isoforms, as well as a complex meshwork of transcriptional read-throughs and overlaps. The results represent a substantial advance in the annotation of the VZV transcriptome and in understanding the molecular biology of the herpesviruses in general.


BMC Biology ◽  
2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Signe Altmäe ◽  
Nerea M. Molina ◽  
Alberto Sola-Leyva

Abstract A recent paper in BMC Biology entitled “A tissue level atlas of the healthy human virome” by Kumata et al. describes a meta-transcriptomic analysis of RNA-sequencing datasets from the Genotype-Tissue Expression (GTEx) Project. Using a workflow that maps the GTEx sequences to the human genome, then screens unmapped sequences to detect viral transcripts, the authors present a quantitative analysis of the presence of different viruses in the non-diseased tissues of over 500 individuals and assess the impact of these viruses on host gene expression. Here we draw attention to an issue not acknowledged in this study. Namely, by relying solely on GTEx datasets, which are enriched for transcripts with poly(A) tails, the analysis will have missed non-poly(A) viral transcripts, rendering this tissue level atlas of the virome incomplete. A commentary on Kumata et al. (BMC Biol 18:55, 2020).


2017 ◽  
Vol 4 (1) ◽  
Author(s):  
Zsolt Balázs ◽  
Dóra Tombácz ◽  
Attila Szűcs ◽  
Michael Snyder ◽  
Zsolt Boldogkői

Abstract Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome was first sequenced in 1989, and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only a few of its transcripts have been fully annotated. We hereby present a long-read RNA sequencing dataset of HCMV-infected human lung fibroblast cells sequenced by the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HCMV strain Towne) reads. The described data include raw and processed sequencing files, and combined with other datasets, they can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms or to identify genetic variants.



