Dual Platform Long-Read RNA-Sequencing Dataset of the Human Cytomegalovirus Lytic Transcriptome

2018 ◽  
Vol 9 ◽  
Author(s):  
Zsolt Balázs ◽  
Dóra Tombácz ◽  
Attila Szűcs ◽  
Michael Snyder ◽  
Zsolt Boldogkői
2017 ◽  
Vol 4 (1) ◽  
Author(s):  
Zsolt Balázs ◽  
Dóra Tombácz ◽  
Attila Szűcs ◽  
Michael Snyder ◽  
Zsolt Boldogkői

Abstract Long-read RNA sequencing allows for the precise characterization of full-length transcripts, which makes it an indispensable tool in transcriptomics. The human cytomegalovirus (HCMV) genome was first sequenced in 1989, and although short-read sequencing studies have uncovered much of the complexity of its transcriptome, only a few of its transcripts have been fully annotated. We hereby present a long-read RNA sequencing dataset of HCMV-infected human lung fibroblast cells sequenced on the Pacific Biosciences RSII platform. Seven SMRT cells were sequenced using oligo(dT) primers to reverse transcribe poly(A)-selected RNA molecules, and one library was prepared using random primers for the reverse transcription of the rRNA-depleted sample. Our dataset contains 122,636 human and 33,086 viral (HCMV strain Towne) reads. The described data include raw and processed sequencing files, and combined with other datasets, they can be used to validate transcriptome analysis tools, to compare library preparation methods, to test base calling algorithms, or to identify genetic variants.


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Ahmed Al Qaffas ◽  
Salvatore Camiolo ◽  
Mai Vo ◽  
Alexis Aguiar ◽  
Amine Ourahmane ◽  
...  

Abstract The advent of whole genome sequencing has revealed that common laboratory strains of human cytomegalovirus (HCMV) have major genetic deficiencies resulting from serial passage in fibroblasts. In particular, tropism for epithelial and endothelial cells is lost due to mutations disrupting genes UL128, UL130, or UL131A, which encode subunits of a virion-associated pentameric complex (PC) important for viral entry into these cells but not for entry into fibroblasts. The endothelial cell-adapted strain TB40/E has a relatively intact genome and has emerged as a laboratory strain that closely resembles wild-type virus. However, several heterogeneous TB40/E stocks and cloned variants exist that display a range of sequence and tropism properties. Here, we report the use of PacBio sequencing to elucidate the genetic changes that occurred, both at the consensus level and within subpopulations, upon passaging a TB40/E stock on ARPE-19 epithelial cells. The long-read data also facilitated examination of the linkage between mutations. Consistent with inefficient ARPE-19 cell entry, at least 83% of viral genomes present before adaptation contained changes impacting PC subunits. In contrast, and consistent with the importance of the PC for entry into endothelial and epithelial cells, genomes after adaptation lacked these or additional mutations impacting PC subunits. The sequence data also revealed six single noncoding substitutions in the inverted repeat regions, single nonsynonymous substitutions in genes UL26, UL69, US28, and UL122, and a frameshift truncating gene UL141. Among the changes affecting protein-coding regions, only the one in UL122 was strongly selected. This change, resulting in a D390H substitution in the encoded protein IE2, has been previously implicated in rendering another viral protein, UL84, essential for viral replication in fibroblasts. This finding suggests that IE2, and perhaps its interactions with UL84, have important functions unique to HCMV replication in epithelial cells.


Author(s):  
Huan Zhong ◽  
Zongwei Cai ◽  
Zhu Yang ◽  
Yiji Xia

Abstract NAD tagSeq has recently been developed for the identification and characterization of NAD+-capped RNAs (NAD-RNAs). This method uses chemo-enzymatic reactions to label the NAD-RNAs with a synthetic RNA tag before subjecting them to Oxford Nanopore direct RNA sequencing. A computational tool designed for analyzing the sequencing data of tagged RNA will facilitate the broader application of this method. Hence, we introduce TagSeqTools as a flexible, general pipeline for the identification and quantification of tagged RNAs (i.e., NAD+-capped RNAs) using long-read transcriptome sequencing data generated by the NAD tagSeq method. TagSeqTools comprises two major modules: TagSeek, for differentiating tagged and untagged reads, and TagSeqQuant, for the quantification and further characterization of genes and isoforms. The pipeline also integrates some advanced functions to identify antisense transcription or splicing, and supports reformatting the data for visualization. TagSeqTools therefore provides a convenient and comprehensive workflow for researchers to analyze the data produced by the NAD tagSeq method or other tagging-based experiments using Oxford Nanopore direct RNA sequencing. The pipeline is available at https://github.com/dorothyzh/TagSeqTools, under Apache License 2.0.
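
The tagged/untagged split described for TagSeek can be illustrated with a minimal Python sketch. This is not the TagSeqTools implementation: the tag sequence, the search window, the mismatch tolerance, and the names `has_tag` and `split_reads` are all hypothetical, chosen only to show the general idea of looking for a known tag near one end of each read.

```python
from typing import Dict, List, Tuple

# Hypothetical tag sequence for illustration only; NOT the real NAD tagSeq tag.
TAG = "CTGATCGTAGCTAGGCTA"


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_tag(read: str, tag: str = TAG, window: int = 40,
            max_mismatches: int = 2) -> bool:
    """Return True if `tag` occurs within the first `window` bases of `read`
    with at most `max_mismatches` substitutions (indels are not handled)."""
    region = read[:window]
    # Slide the tag across the region; the range is empty if the region
    # is shorter than the tag, in which case the read is untagged.
    for start in range(len(region) - len(tag) + 1):
        if hamming(region[start:start + len(tag)], tag) <= max_mismatches:
            return True
    return False


def split_reads(reads: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Partition read names into (tagged, untagged), preserving input order."""
    tagged: List[str] = []
    untagged: List[str] = []
    for name, seq in reads.items():
        (tagged if has_tag(seq) else untagged).append(name)
    return tagged, untagged


if __name__ == "__main__":
    example = {
        "r1": "GG" + TAG + "AAAATTTTCCCC",   # exact tag after a 2-base offset
        "r2": "A" * 60,                      # no tag
        "r3": "A" + TAG[1:] + "TTTTGGGG",    # tag with one substitution
    }
    print(split_reads(example))
```

Nanopore reads carry insertions and deletions as well as substitutions, so a production tool would more plausibly use an alignment-based search than the substitution-only comparison sketched here.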


2018 ◽  
Vol 5 (1) ◽  
Author(s):  
Zsolt Balázs ◽  
Dóra Tombácz ◽  
Attila Szűcs ◽  
Michael Snyder ◽  
Zsolt Boldogkői

2019 ◽  
Author(s):  
Andrew T. Ludlow ◽  
Mohammed E. Sayed ◽  
Aaron L. Slusher ◽  
Mark Ribick ◽  
Anisha Pancholi ◽  
...  

Author(s):  
Fairlie Reese ◽  
Ali Mortazavi

Abstract Motivation: Long-read RNA-sequencing technologies such as PacBio and Oxford Nanopore have discovered an explosion of new transcript isoforms that are difficult to visually analyze using currently available tools. We introduce the Swan Python library, which is designed to analyze and visualize transcript models. Results: Swan finds 4909 differentially expressed transcripts between cell lines HepG2 and HFFc6, including 279 that are differentially expressed even though the parent gene is not. Additionally, Swan discovers 285 reproducible exon skipping and 47 intron retention events not recorded in the GENCODE v29 annotation. Availability and implementation: The Swan library for Python 3 is available on PyPI at https://pypi.org/project/swan-vis/ and on GitHub at https://github.com/mortazavilab/swan_vis.


Author(s):  
Alexandra Dainis ◽  
Elizabeth Tseng ◽  
Tyson A. Clark ◽  
Ting Hon ◽  
Matthew Wheeler ◽  
...  

2020 ◽  
Author(s):  
Kristoffer Sahlin ◽  
Veli Mäkinen

Abstract Long-read RNA sequencing techniques are quickly establishing themselves as the primary sequencing technique to study the transcriptome landscape. Many such analyses depend on splice alignment of reads to the genome. However, the error rate and sequencing length of long-read technologies create new challenges for accurately aligning these reads. We present an alignment method, uLTRA, that, on simulated and synthetic data, shows higher accuracy than state-of-the-art aligners, with substantially higher accuracy for small exons. We show several examples on biological data where uLTRA aligns to known and novel isoforms with exon structures that are not detected with other aligners. uLTRA is available at https://github.com/ksahlin/ultra.


2018 ◽  
Author(s):  
Koen Van Den Berge ◽  
Katharina Hembach ◽  
Charlotte Soneson ◽  
Simone Tiberi ◽  
Lieven Clement ◽  
...  

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies as well as agricultural and clinical settings. In the past 10 years or so, much has been learned about the characteristics of RNA-seq datasets as well as the performance of the myriad methods developed. In this review, we give an overall view of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.

