Dual Platform Long-Read RNA-Sequencing Dataset of the Human Cytomegalovirus Lytic Transcriptome

Abstract Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome was first sequenced in 1989, and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only a few of its transcripts have been fully annotated. Here we present a long-read RNA sequencing dataset of HCMV-infected human lung fibroblast cells sequenced on the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules, and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HCMV strain Towne) reads. The described data include raw and processed sequencing files; combined with other datasets, they can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base-calling algorithms, or to identify genetic variants.
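
As a rough illustration of how human and viral read counts like those above could be tallied, the sketch below groups aligned reads by the reference sequence they map to and computes the viral share. The contig name in `VIRAL_CONTIGS` is a hypothetical placeholder, not a name taken from the dataset, and treating every non-viral reference as human is a simplifying assumption.

```python
from collections import Counter

# Hypothetical name for the viral reference contig; the real files may differ.
VIRAL_CONTIGS = {"HCMV_Towne"}

def tally_reads(reference_names):
    """Count reads per organism from the reference name each read aligned to."""
    counts = Counter()
    for name in reference_names:
        # Simplification: anything that is not a viral contig is counted as human.
        organism = "viral" if name in VIRAL_CONTIGS else "human"
        counts[organism] += 1
    return counts

def viral_fraction(human_reads, viral_reads):
    """Share of all reads that are viral."""
    return viral_reads / (human_reads + viral_reads)

# Read counts reported in the abstract above.
print(f"{viral_fraction(122_636, 33_086):.1%}")  # prints 21.2%
```

With the reported counts, roughly one read in five is viral.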

Long-read RNA sequencing analysis of the lytic human cytomegalovirus transcriptome

10.14232/phd.10113 ◽

2019 ◽

Author(s):

Zsolt Balázs

Keyword(s):

RNA Sequencing ◽

Human Cytomegalovirus ◽

Sequencing Analysis ◽

Long Read

Genome sequences of human cytomegalovirus strain TB40/E variants propagated in fibroblasts and epithelial cells

Virology Journal ◽

10.1186/s12985-021-01583-3 ◽

2021 ◽

Vol 18 (1) ◽

Author(s):

Ahmed Al Qaffas ◽

Salvatore Camiolo ◽

Mai Vo ◽

Alexis Aguiar ◽

Amine Ourahmane ◽

...

Keyword(s):

Epithelial Cells ◽

Human Cytomegalovirus ◽

Viral Entry ◽

Sequence Data ◽

Laboratory Strain ◽

Serial Passage ◽

Wild Type Virus ◽

Protein Coding ◽

Genetic Changes ◽

Long Read

Abstract The advent of whole genome sequencing has revealed that common laboratory strains of human cytomegalovirus (HCMV) have major genetic deficiencies resulting from serial passage in fibroblasts. In particular, tropism for epithelial and endothelial cells is lost due to mutations disrupting genes UL128, UL130, or UL131A, which encode subunits of a virion-associated pentameric complex (PC) important for viral entry into these cells but not for entry into fibroblasts. The endothelial cell-adapted strain TB40/E has a relatively intact genome and has emerged as a laboratory strain that closely resembles wild-type virus. However, several heterogeneous TB40/E stocks and cloned variants exist that display a range of sequence and tropism properties. Here, we report the use of PacBio sequencing to elucidate the genetic changes that occurred, both at the consensus level and within subpopulations, upon passaging a TB40/E stock on ARPE-19 epithelial cells. The long-read data also facilitated examination of the linkage between mutations. Consistent with inefficient ARPE-19 cell entry, at least 83% of viral genomes present before adaptation contained changes impacting PC subunits. In contrast, and consistent with the importance of the PC for entry into endothelial and epithelial cells, genomes after adaptation lacked these or additional mutations impacting PC subunits. The sequence data also revealed six single noncoding substitutions in the inverted repeat regions, single nonsynonymous substitutions in genes UL26, UL69, US28, and UL122, and a frameshift truncating gene UL141. Among the changes affecting protein-coding regions, only the one in UL122 was strongly selected. This change, resulting in a D390H substitution in the encoded protein IE2, has been previously implicated in rendering another viral protein, UL84, essential for viral replication in fibroblasts. This finding suggests that IE2, and perhaps its interactions with UL84, have important functions unique to HCMV replication in epithelial cells.

TagSeqTools: a flexible and comprehensive analysis pipeline for NAD tagSeq data

10.1101/2020.03.09.982934 ◽

2020 ◽

Cited By ~ 1

Author(s):

Huan Zhong ◽

Zongwei Cai ◽

Zhu Yang ◽

Yiji Xia

Keyword(s):

RNA Sequencing ◽

Comprehensive Analysis ◽

Enzymatic Reactions ◽

Computational Tool ◽

Sequencing Data ◽

Analysis Pipeline ◽

Oxford Nanopore ◽

Long Read ◽

Identification And Characterization

Abstract NAD tagSeq has recently been developed for the identification and characterization of NAD+-capped RNAs (NAD-RNAs). This method uses chemo-enzymatic reactions to label the NAD-RNAs with a synthetic RNA tag before subjecting them to Oxford Nanopore direct RNA sequencing. A computational tool designed for analyzing the sequencing data of tagged RNA will facilitate the broader application of this method. Hence, we introduce TagSeqTools as a flexible, general pipeline for the identification and quantification of tagged RNAs (i.e., NAD+-capped RNAs) using long-read transcriptome sequencing data generated by the NAD tagSeq method. TagSeqTools comprises two major modules: TagSeek, for differentiating tagged and untagged reads, and TagSeqQuant, for the quantification and further characterization of genes and isoforms. In addition, the pipeline integrates some advanced functions to identify antisense transcription or splicing, and supports reformatting the data for visualization. TagSeqTools therefore provides a convenient and comprehensive workflow for researchers to analyze the data produced by the NAD tagSeq method or by other tagging-based experiments using Oxford Nanopore direct RNA sequencing. The pipeline is available at https://github.com/dorothyzh/TagSeqTools, under Apache License 2.0.
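
The idea of separating tagged from untagged reads can be sketched in a few lines. This is a minimal illustration, not the TagSeek implementation: the function names, the window size, the mismatch threshold, and the use of substitution-only (Hamming) matching are all assumptions made for the example, and real nanopore reads also contain insertions and deletions that this sketch ignores.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def is_tagged(read, tag, window=40, max_mismatches=2):
    """Return True if `tag` occurs within the first `window` bases of `read`,
    allowing up to `max_mismatches` substitutions (no indels)."""
    region = read[:window]
    # Slide the tag across the window and test each candidate position.
    for start in range(len(region) - len(tag) + 1):
        if hamming(region[start:start + len(tag)], tag) <= max_mismatches:
            return True
    return False

def split_reads(reads, tag, **kwargs):
    """Partition reads into (tagged, untagged) lists."""
    tagged, untagged = [], []
    for read in reads:
        (tagged if is_tagged(read, tag, **kwargs) else untagged).append(read)
    return tagged, untagged

# Hypothetical tag sequence, used only for illustration.
TAG = "ACGTACGTAC"
reads = ["GGACGTACGTACTTTTTTTT", "GGACGTTCGTACTTTTTTTT", "TTTTTTTTTTTTTTTTTTTT"]
tagged, untagged = split_reads(reads, TAG)
```

Here the first read carries an exact copy of the tag, the second carries it with one substitution, and the third carries no tag, so two reads end up in `tagged` and one in `untagged`.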

Erratum: Corrigendum: Long-read sequencing of the human cytomegalovirus transcriptome with the Pacific Biosciences RSII platform

Scientific Data ◽

10.1038/sdata.2018.32 ◽

2018 ◽

Vol 5 (1) ◽

Author(s):

Zsolt Balázs ◽

Dóra Tombácz ◽

Attila Szűcs ◽

Michael Snyder ◽

Zsolt Boldogkői

Keyword(s):

Human Cytomegalovirus ◽

Pacific Biosciences ◽

The Pacific ◽

Long Read

Abstract 2724: A combination of short-read and long-read RNA sequencing reveals NOVA1’s role in telomere biology

10.1158/1538-7445.sabcs18-2724 ◽

2019 ◽

Author(s):

Andrew T. Ludlow ◽

Mohammed E. Sayed ◽

Aaron L. Slusher ◽

Mark Ribick ◽

Anisha Pancholi ◽

...

Keyword(s):

RNA Sequencing ◽

Short Read ◽

Long Read ◽

Telomere Biology

Swan: a library for the analysis and visualization of long-read transcriptomes

Bioinformatics ◽

10.1093/bioinformatics/btaa836 ◽

2020 ◽

Author(s):

Fairlie Reese ◽

Ali Mortazavi

Keyword(s):

RNA Sequencing ◽

Cell Lines ◽

Intron Retention ◽

Exon Skipping ◽

Differentially Expressed ◽

Transcript Isoforms ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Long Read

Abstract Motivation: Long-read RNA-sequencing technologies such as PacBio and Oxford Nanopore have led to the discovery of a large number of new transcript isoforms, which are difficult to analyze visually using currently available tools. We introduce the Swan Python library, which is designed to analyze and visualize transcript models. Results: Swan finds 4909 differentially expressed transcripts between cell lines HepG2 and HFFc6, including 279 that are differentially expressed even though the parent gene is not. Additionally, Swan discovers 285 reproducible exon skipping and 47 intron retention events not recorded in the GENCODE v29 annotation. Availability and implementation: The Swan library for Python 3 is available on PyPI at https://pypi.org/project/swan-vis/ and on GitHub at https://github.com/mortazavilab/swan_vis.
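
How a transcript can be differentially expressed while its parent gene is not may be easier to see with a toy calculation of isoform proportions. The sketch below is not the Swan API; the function name, the transcript identifiers, and the counts are invented for illustration.

```python
def isoform_proportions(counts):
    """Convert per-transcript counts for one gene into proportions."""
    total = sum(counts.values())
    if total == 0:
        return {tx: 0.0 for tx in counts}
    return {tx: c / total for tx, c in counts.items()}

# Toy counts for one hypothetical gene with two isoforms in two cell lines.
cell_a = {"tx1": 90, "tx2": 10}
cell_b = {"tx1": 10, "tx2": 90}

# Gene-level totals are identical, so the gene itself shows no change.
gene_total_a = sum(cell_a.values())
gene_total_b = sum(cell_b.values())

# Isoform usage, however, flips between the two cell lines.
prop_a = isoform_proportions(cell_a)
prop_b = isoform_proportions(cell_b)
shift = {tx: prop_b[tx] - prop_a[tx] for tx in cell_a}
```

In this toy case both cell lines have 100 counts for the gene, yet the share of `tx1` drops from 0.9 to 0.1, which is the kind of change a gene-level test would miss.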

Targeted Long-Read RNA Sequencing Demonstrates Transcriptional Diversity Driven by Splice-Site Variation in MYBPC3

Circulation Genomic and Precision Medicine ◽

10.1161/circgen.119.002464 ◽

2019 ◽

Vol 12 (5) ◽

Cited By ~ 1

Author(s):

Alexandra Dainis ◽

Elizabeth Tseng ◽

Tyson A. Clark ◽

Ting Hon ◽

Matthew Wheeler ◽

...

Keyword(s):

RNA Sequencing ◽

Splice Site ◽

Site Variation ◽

Long Read

Accurate spliced alignment of long RNA sequencing reads

10.1101/2020.09.02.279208 ◽

2020 ◽

Author(s):

Kristoffer Sahlin ◽

Veli Mäkinen

Keyword(s):

RNA Sequencing ◽

State Of The Art ◽

Synthetic Data ◽

Biological Data ◽

Alignment Method ◽

Spliced Alignment ◽

Sequencing Technique ◽

Long Read ◽

New Challenges ◽

Novel Isoforms

Abstract Long-read RNA sequencing techniques are quickly establishing themselves as the primary sequencing technique to study the transcriptome landscape. Many such analyses depend on spliced alignment of reads to the genome. However, the error rate and read length of long-read technologies create new challenges for accurately aligning these reads. We present an alignment method, uLTRA, that on simulated and synthetic data shows higher accuracy than state-of-the-art aligners, with substantially higher accuracy for small exons. We show several examples on biological data where uLTRA aligns to known and novel isoforms with exon structures that are not detected by other aligners. uLTRA is available at https://github.com/ksahlin/ultra.
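
To make concrete what an exon structure recovered from a spliced alignment looks like, the sketch below converts a SAM-style CIGAR string into exon blocks, treating each `N` operation (a skipped reference region) as an intron. It illustrates how alignments are read, not how uLTRA computes them; the function name and the example alignment are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def exon_blocks(ref_start, cigar):
    """Return exon blocks as 0-based, half-open (start, end) reference intervals."""
    blocks = []
    block_start = ref_start
    pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "MD=X":
            # These operations consume reference bases inside an exon.
            pos += length
        elif op == "N":
            # A skipped region closes the current exon and opens the next one.
            if pos > block_start:
                blocks.append((block_start, pos))
            pos += length
            block_start = pos
        # I, S, H and P do not consume reference bases.
    if pos > block_start:
        blocks.append((block_start, pos))
    return blocks

# Hypothetical alignment with an 8-base internal exon between two introns.
blocks = exon_blocks(100, "50M200N8M1000N60M")
```

For this example the blocks are `(100, 150)`, `(350, 358)` and `(1358, 1418)`; the 8-base middle block is the kind of small exon that is easy for an aligner to miss.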

RNA sequencing data: hitchhiker's guide to expression analysis

10.7287/peerj.preprints.27283 ◽

2018 ◽

Author(s):

Koen Van Den Berge ◽

Katharina Hembach ◽

Charlotte Soneson ◽

Simone Tiberi ◽

Lieven Clement ◽

...

Keyword(s):

Gene Expression ◽

RNA Sequencing ◽

Large Scale ◽

Science Studies ◽

RNA-Seq ◽

Sequencing Data ◽

Data Types ◽

The Past ◽

Long Read ◽

Statistical Approaches

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies as well as agricultural and clinical applications. In the past 10 years or so, much has been learned about the characteristics of RNA-seq datasets and about the performance of the myriad methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.
