Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform

2017 ◽  
Vol 4 (1) ◽  
Author(s):  
Zsolt Balázs ◽  
Dóra Tombácz ◽  
Attila Szűcs ◽  
Michael Snyder ◽  
Zsolt Boldogkői

Abstract Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome was first sequenced in 1989, and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only a few of its transcripts have been fully annotated. We hereby present a long-read RNA sequencing dataset of HCMV-infected human lung fibroblast cells sequenced on the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HCMV strain Towne) reads. The described data include raw and processed sequencing files and, combined with other datasets, can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms or to identify genetic variants.
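As a quick sanity check on the read counts reported in the abstract above, the share of viral reads in the dataset can be computed directly. This is only a minimal sketch: the two counts come from the abstract, while the variable names are illustrative and not part of the published dataset.

```python
# Read counts as reported in the abstract (human and HCMV strain Towne reads).
human_reads = 122_636
viral_reads = 33_086

# Total reads and the proportion that map to the virus.
total_reads = human_reads + viral_reads
viral_fraction = viral_reads / total_reads

print(f"total reads: {total_reads}")
print(f"viral fraction: {viral_fraction:.1%}")
```

Roughly one read in five is viral, which is worth keeping in mind when estimating how much viral coverage a comparable experiment might yield.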

2018 ◽  
Vol 5 (1) ◽  
Author(s):  
Zsolt Balázs ◽  
Dóra Tombácz ◽  
Attila Szűcs ◽  
Michael Snyder ◽  
Zsolt Boldogkői

2021 ◽  
Author(s):  
Gábor Torma ◽  
Dóra Tombácz ◽  
Norbert Moldován ◽  
Ádám Fülöp ◽  
István Prazsák ◽  
...  

Abstract In this study, we used two long-read sequencing (LRS) techniques, Sequel from Pacific Biosciences and MinION from Oxford Nanopore Technologies, for the transcriptional characterization of a prototype baculovirus, Autographa californica multiple nucleopolyhedrovirus. LRS is able to read full-length RNA molecules, and thereby to distinguish between transcript isoforms, mono- and polycistronic RNAs, and overlapping transcripts. Altogether, we detected 875 transcripts, of which 759 are novel and 116 have been annotated previously. These RNA molecules include 41 novel putative protein-coding transcripts (each containing 5’-truncated in-frame ORFs), 14 monocistronic transcripts, 99 multicistronic RNAs, 101 non-coding RNAs, and 504 length isoforms. We also detected RNA methylation in 12 viral genes and RNA hyper-editing in the longer 5’-UTR transcript isoform of the ORF 19 gene.
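The transcript counts in the abstract above can be cross-checked with a short tally. The sketch below assumes that the five listed categories together make up the novel transcripts; the abstract does not state this explicitly, but the category counts do sum to the reported 759. The dictionary keys are illustrative labels, not the authors' terminology.

```python
# Category counts as listed in the abstract.
# Assumption: these five categories partition the novel transcripts.
novel_by_category = {
    "putative protein-coding (5'-truncated in-frame ORFs)": 41,
    "monocistronic": 14,
    "multicistronic": 99,
    "non-coding": 101,
    "length isoforms": 504,
}
previously_annotated = 116

# Sum the categories and add the previously annotated transcripts.
novel_total = sum(novel_by_category.values())
grand_total = novel_total + previously_annotated

print(f"novel transcripts: {novel_total}")
print(f"all detected transcripts: {grand_total}")
```

Both totals agree with the figures given in the abstract (759 novel, 875 overall).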


Viruses ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 2531
Author(s):  
Riteng Zhang ◽  
Peixin Wang ◽  
Xin Ma ◽  
Yifan Wu ◽  
Chen Luo ◽  
...  

The TRS-mediated discontinuous transcription process is a hallmark of Arteriviruses. Precise assessment of the intricate subgenomic RNA (sg mRNA) populations is required to understand the kinetics of viral transcription. It is difficult to reconstruct and comprehensively quantify splicing events using short-read sequencing, making the identification of transcription-regulatory sequences (TRS) particularly problematic. Here, we applied long-read direct RNA sequencing to characterize the recombined RNA molecules produced in porcine alveolar macrophages during early-passage infection with porcine reproductive and respiratory syndrome virus (PRRSV). Based on sequencing two PRRSV isolates, namely XM-2020 and GD, we revealed a high-resolution and diverse transcriptional landscape in PRRSV. The data revealed intriguing differences in subgenomic recombination types between the two PRRSVs while also demonstrating a TRS-independent heterogeneous subpopulation not previously observed in Arteriviruses. We find that TRS usage is a regulated process and that both strains share a common preferred TRS. This study also identified a substantial number of TRS-mediated transcript variants, including alternative sg mRNAs encoding the same annotated ORF, as well as putative sg mRNAs encoding nested internal ORFs, implying that the genetic information encoded in PRRSV may be more intensively expressed. Epigenetic modifications have emerged as an essential regulatory layer in gene expression. Here, we gained a deeper understanding of m5C modification in poly(A) RNA, elucidating a potential link between methylation and transcriptional regulation. Collectively, our findings provide meaningful insights for redefining the transcriptome complexity of PRRSV. This will assist in filling the research gaps and developing strategies for better control of PRRS.


2018 ◽  
Vol 9 ◽  
Author(s):  
Zsolt Balázs ◽  
Dóra Tombácz ◽  
Attila Szűcs ◽  
Michael Snyder ◽  
Zsolt Boldogkői

2020 ◽  
Author(s):  
Gábor Torma ◽  
Dóra Tombácz ◽  
Zsolt Csabai ◽  
Norbert Moldován ◽  
István Mészáros ◽  
...  

ABSTRACT African swine fever virus (ASFV) is a large DNA virus belonging to the Asfarviridae family. Despite its agricultural importance, little is known about the fundamental molecular mechanisms of this pathogen. Understanding its genetic regulation provides new insights into the pathogenicity of the virus, which can help prevent epidemics. Short-read sequencing (SRS) is able to produce a huge amount of high-precision sequencing reads for transcriptomic profiling, but it is inefficient for the comprehensive annotation of transcriptomes. Long-read sequencing (LRS) is able to overcome some of the limitations of SRS, but it also has drawbacks, such as low coverage and a high error rate. The limitations of the two approaches can be surmounted by the combined use of these techniques. In this study, we used Illumina SRS and Oxford Nanopore Technologies LRS platforms with multiple library preparation methods (amplified and direct cDNA sequencing and native RNA sequencing) for constructing the transcriptomic atlas of ASFV. This work identified a large number of novel genes, transcripts and RNA isoforms, and annotated the precise termini of previously described RNA molecules. In contrast to the current view that the ASFV transcripts are monocistronic, we detected a significant extent of polycistronism. A multifaceted meshwork of transcriptional overlaps was also discovered.


2020 ◽  
Author(s):  
Dóra Tombácz ◽  
István Prazsák ◽  
Zoltán Maróti ◽  
Norbert Moldován ◽  
Zsolt Csabai ◽  
...  

Abstract Characterization of global transcriptomes using conventional short-read sequencing is challenging because of the insensitivity of these platforms to transcript isoforms, multigenic RNA molecules, and transcriptional overlaps. Long-read sequencing (LRS) can overcome these limitations by reading full-length transcripts. Employment of these technologies has led to the redefinition of transcriptional complexities in reported organisms. In this study, we applied LRS platforms from Pacific Biosciences and Oxford Nanopore Technologies to profile the dynamic vaccinia virus (VACV) transcriptome and assess the effect of viral infection on host gene expression. We performed cDNA and direct RNA sequencing analyses and revealed an extremely complex transcriptional landscape of this virus. In particular, VACV genes produce large numbers of transcript isoforms that vary in their start and termination sites. A significant fraction of VACV transcripts start or end within coding regions of neighboring genes. We distinguished five classes of host genes according to their temporal responses to viral infection. This study provides novel insights into the transcriptomic profile of a viral pathogen and the effect of the virus on host gene expression. Author Summary Viral transcriptomes that are determined using conventional (first- and second-generation) sequencing techniques are incomplete because these platforms are inefficient or fail to distinguish between types of transcripts and transcript isoforms. In particular, conventional sequencing techniques fail to distinguish between parallel overlapping transcripts, including alternative polycistronic transcripts, transcriptional start site (TSS) and transcriptional end site (TES) isoforms, and splice variants and RNA molecules that are produced by transcriptional read-throughs.
Long-read sequencing (LRS) can provide complete sets of RNA molecules, and can therefore be used to assemble complete transcriptome atlases of organisms. Although vaccinia virus (VACV) does not produce spliced RNAs, its transcriptome contains large numbers of TSSs and TESs for individual viral genes and has a high degree of polycistronism, together leading to enormous complexity. In this study, we applied single-molecule real-time and nanopore-based cDNA and direct RNA sequencing methods to investigate transcripts of VACV and the host organism.


mSystems ◽  
2021 ◽  
Vol 6 (2) ◽  
Author(s):  
V. Vern Lee ◽  
Louise M. Judd ◽  
Aaron R. Jex ◽  
Kathryn E. Holt ◽  
Christopher J. Tonkin ◽  
...  

ABSTRACT Alternative splicing is a widespread phenomenon in metazoans by which single genes are able to produce multiple isoforms of the gene product. However, this has been poorly characterized in apicomplexans, a major phylum that includes some of the most important global parasites. Efforts have been hampered by atypical transcriptomic features, such as the high AU content of Plasmodium RNA, but also the limitations of short-read sequencing in deciphering complex splicing events. In this study, we utilized the long-read direct RNA sequencing platform developed by Oxford Nanopore Technologies to survey the alternative splicing landscape of Toxoplasma gondii and Plasmodium falciparum. We find that while native RNA sequencing has a reduced throughput, it allows us to obtain full-length or nearly full-length transcripts with comparable quantification to Illumina sequencing. By comparing these data with available gene models, we find widespread alternative splicing, particularly intron retention, in these parasites. Most of these transcripts contain premature stop codons, suggesting that in these parasites, alternative splicing represents a pathway to transcriptomic diversity, rather than expanding proteomic diversity. Moreover, alternative splicing rates are comparable between parasites, suggesting a shared splicing machinery, despite notable transcriptomic differences between the parasites. This study highlights a strategy for using long-read sequencing to understand splicing events at the whole-transcript level and has implications for the future interpretation of transcriptome sequencing studies. IMPORTANCE We have used a novel nanopore sequencing technology to directly analyze parasite transcriptomes. The very long reads of this technology reveal the full-length genes of the parasites that cause malaria and toxoplasmosis. Gene transcripts must undergo a process called splicing before they can be translated into protein.
Our analysis reveals that these parasites very frequently only partially process their gene products, in a manner that departs dramatically from their human hosts.


2017 ◽  
Author(s):  
Jingyuan Hu ◽  
Prech Uapinyoying ◽  
Jeremy Goecks

Abstract Background Long-read RNA sequencing, such as Pacific Biosciences’ Iso-Seq method, enables generation of sequencing reads that are 10 kilobases or even longer. These reads are ideal for discovering splice junctions and resolving full-length gene transcripts without time-consuming and error-prone techniques such as transcript assembly and junction inference. Results Iso-Seq Browser is a Web-based visual analytics tool for long-read RNA sequencing data produced by Pacific Biosciences’ isoform sequencing (Iso-Seq) techniques. Key features of the Iso-Seq Browser are: 1) an exon-only web-based interface with zooming and exon highlighting for exploring reference gene transcripts and novel gene isoforms, 2) automated grouping of transcripts and isoforms by similarity, 3) many customization features for data exploration and creating publication-ready figures, and 4) exporting selected isoforms into FASTA files for further analysis. Iso-Seq Browser is written in Python using several scientific libraries. The application and analyses described in this paper are freely available to both academic and commercial users at https://github.com/goeckslab/isoseq-browser. Conclusions Iso-Seq Browser enables interactive genome-wide visual analysis of long RNA sequence reads. Through visualization, highlighting, clustering, and filtering of gene isoforms, ISB makes it simple to identify novel isoforms and novel isoform features such as exons, introns and untranslated regions.


2019 ◽  
Author(s):  
Ying-Chih Wang ◽  
Nathan D Olson ◽  
Gintaras Deikus ◽  
Hardik Shah ◽  
Aaron M Wenger ◽  
...  

Abstract Single-molecule long-read sequencing datasets were generated for a son-father-mother trio of Han Chinese descent that is part of the Genome In a Bottle (GIAB) consortium portfolio. The dataset was generated using the Pacific Biosciences Sequel System. The son and each parent were sequenced to an average coverage of 60× and 30×, respectively, with N50 subread lengths between 16 and 18 kb. Raw reads and reads aligned to both GRCh37 and GRCh38 are available at the NCBI GIAB ftp site (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ChineseTrio/) and the raw read data are archived in NCBI SRA (SRX4739017, SRX4739121, and SRX4739122). This dataset is available for anyone to develop and evaluate long-read bioinformatics methods.
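The N50 subread length quoted above is a standard summary statistic: the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal sketch of how it can be computed follows. The function name `n50` and the toy read lengths are hypothetical and are not taken from the GIAB dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest read down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up read lengths (not real data).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

Because N50 weights reads by the bases they contribute, it is typically larger than the median read length, so the two should not be compared directly.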

