Time-Course Transcriptome Profiling of a Poxvirus Using Long-Read Full-Length Assay

Pathogens, 2021, Vol 10 (8), pp. 919
Author(s): Dóra Tombácz, István Prazsák, Gábor Torma, Zsolt Csabai, Zsolt Balázs, ...

Viral transcriptomes determined using first- and second-generation sequencing techniques are incomplete. Because of their short read length, these methods are inefficient at, or fail in, distinguishing between transcript isoforms, polycistronic RNAs, and transcriptional overlaps and readthroughs. Additionally, these approaches are insensitive in identifying splice sites and transcriptional start sites (TSSs) and, in most cases, transcriptional end sites (TESs), especially in transcript isoforms with varying transcript ends and in multi-spliced transcripts. Long-read sequencing is able to read full-length nucleic acids and can therefore be used to assemble complete transcriptome atlases. Although vaccinia virus (VACV) does not produce spliced RNAs, its transcriptome has a high diversity of TSSs and TESs and a high degree of polycistronism, which together create enormous complexity. We applied single-molecule real-time (SMRT) and nanopore-based sequencing methods to investigate the time-lapse transcriptome patterns of VACV gene expression.

2021, Vol 14 (1)
Author(s): Zoltán Maróti, Dóra Tombácz, István Prazsák, Norbert Moldován, Zsolt Csabai, ...

Objective: In this study, we applied two long-read sequencing (LRS) approaches, single-molecule real-time (SMRT) and nanopore-based sequencing, to investigate the time-lapse transcriptome patterns of host gene expression in response to vaccinia virus infection. Transcriptomes determined using short-read sequencing approaches are incomplete because these platforms are inefficient at, or fail in, distinguishing between polycistronic RNAs, transcript isoforms, and transcriptional start sites, as well as transcriptional readthroughs and overlaps. Long-read sequencing is able to read full-length nucleic acids and can therefore be used to assemble complete transcriptome atlases. Results: In this work, we identified a number of novel transcripts and transcript isoforms of Chlorocebus sabaeus. Additionally, analysis of the 768 most abundant host transcripts revealed that genes annotated with the Gene Ontology term “regulation of signaling receptor activity” were significantly overrepresented as a result of viral infection.


Forests, 2020, Vol 11 (8), pp. 866
Author(s): Lei Kan, Qicong Liao, Zhiyao Su, Yushan Tan, Shuyu Wang, ...

Madhuca pasquieri (Dubard) Lam. is a tree listed on the International Union for Conservation of Nature Red List and a Class II national key protected wild plant in China, known for its seed oil and timber. However, the lack of genomic and transcriptomic data for this species hampers studies of its reproduction, utilization, and conservation. Here, single-molecule long-read sequencing (PacBio) and next-generation sequencing (Illumina) were combined to obtain the transcriptome from five developmental stages of M. pasquieri. Overall, PacBio detected 25,339 transcript isoforms, in which 24,492 coding sequences (CDSs), 9440 simple sequence repeats (SSRs), 149 long non-coding RNAs (lncRNAs), and 182 alternative splicing (AS) events were identified; most AS events were retained introns (RI). In addition, 1058 transcripts were identified as transcription factors (TFs) belonging to 51 TF families. PacBio recovered more full-length transcript isoforms, which were longer and had higher expression levels, whereas de novo assembly of the Illumina data captured a larger number of transcripts (124,405). Using the Nr, Swiss-Prot, KOG, and KEGG databases, 24,405 (96.31%) of the PacBio transcripts were annotated. Functional annotation revealed a role for the auxin, abscisic acid, gibberellin, and cytokinin metabolic pathways during seed germination and post-germination. These findings support further studies of the seed germination mechanism and genome of M. pasquieri, as well as better protection of this endangered species.


2020, Vol 3 (1)
Author(s): Yueming Hu, Xing-Sheng Shu, Jiaxian Yu, Ming-an Sun, Zewei Chen, ...

Human genes give rise to a large variety of isoforms after transcription, encoding distinct transcripts that exert different functions. Single-molecule RNA sequencing facilitates accurate identification of these isoforms by substantially extending read length. However, the mRNA molecules captured by single-molecule RNA sequencing represent gene and isoform diversity poorly. Here, we show that applying a cDNA normalization procedure before library preparation for PacBio RS II sequencing captures 3.2–6.0-fold more full-length, high-quality isoform species in different human samples than the non-normalized procedure. Many lowly expressed, functionally important isoforms can be detected. In addition, normalized PacBio RNA sequencing resolves more allele-specific haplotype transcripts. Finally, we apply the cDNA-normalization-based long-read RNA sequencing method to profile the transcriptome of human gastric signet-ring cell carcinomas and identify new cancer-specific transcriptome signatures, thereby demonstrating the utility of the improved protocols in gene expression studies.


2018, Vol 2018, pp. 1-6
Author(s): Shang-Qian Xie, Yue Han, Xiao-Zhou Chen, Tai-Yu Cao, Kai-Kai Ji, ...

An accurate landscape of transcript isoforms is important for understanding gene function and gene regulation. However, building complete transcripts from the short reads generated by next-generation sequencing is very challenging. Isoform sequencing (Iso-Seq) with single-molecule sequencing technologies, such as PacBio SMRT, provides long reads that span entire transcript isoforms and therefore do not require assembly. We have therefore developed ISOdb, a comprehensive resource database for hosting Iso-Seq datasets, carrying out in-depth analyses of them, and visualising the full-length transcript isoforms. The current version of ISOdb has collected 93 publicly available Iso-Seq samples from eight species and presents them at two levels: (1) the sample level, including metadata, long-read distribution, isoform numbers, and alternative splicing (AS) events of each sample; and (2) the gene level, including the total isoforms, the number of novel isoforms, the number of novel AS events, and isoform visualisation for each gene. In addition, ISOdb provides a web interface for uploading sample information to facilitate the collection and analysis of researchers’ datasets. ISOdb is currently the first repository that offers comprehensive resources and convenient public access for hosting, analysing, and visualising Iso-Seq data, and it is freely available.


2021
Author(s): Mingwei Sun, Yilian Zhao, Xiaobin Shao, Jintao Ge, Xueyan Tang, ...

Transcriptional diversity plays important roles in plant biological regulation. However, because full-length transcripts are difficult to obtain, the available characterization of the tiger lily (Lilium lancifolium Thunb.) transcriptome is still incomplete. To improve the completeness of tiger lily transcriptome information, PacBio single-molecule real-time (SMRT) long-read sequencing was employed for whole-transcriptome profiling. A total of 815,624 circular consensus sequence (CCS) reads with a mean length of 1,295 bp were obtained. Of these, 61,744 were full-length reads containing the 5′ primer, the 3′ primer, and the poly(A) tail, and 3,319 EST-derived SSRs were developed from 2,968 unigenes. From the resulting informative reference transcriptome, 768 transcription factors and 6,852 long non-coding RNAs were identified, providing a comprehensive framework for the transcriptional regulation network. Of all the annotated transcripts, 15,608 were assigned to 25 Clusters of euKaryotic Orthologous Groups (KOG) categories, and 10,706 unigenes were categorized into 52 functional groups within three main categories. These results provide a comprehensive set of reference transcripts and further improve our understanding of the tiger lily transcriptome.
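
The full-length criterion used above (a read carrying the 5′ primer, the 3′ primer, and the poly(A) tail) can be illustrated with a minimal Python sketch. It assumes hypothetical primer sequences, a hypothetical minimum poly(A) length of 15 nt, and exact string matching; real Iso-Seq processing uses kit-specific primers and error-tolerant matching, so this only shows the logic of the classification, not the pipeline the authors ran.

```python
import re

# HYPOTHETICAL primer sequences and poly(A) threshold, for illustration only;
# real libraries use kit-specific primers.
FIVE_PRIME_PRIMER = "ACACTCTTTC"
THREE_PRIME_PRIMER = "GTACTCTGCG"
MIN_POLYA_LENGTH = 15

# 5' primer, then the transcript body, then a poly(A) run, then the 3' primer.
FULL_LENGTH_PATTERN = re.compile(
    re.escape(FIVE_PRIME_PRIMER)
    + r"[ACGT]+?"
    + r"A{%d,}" % MIN_POLYA_LENGTH
    + re.escape(THREE_PRIME_PRIMER)
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_full_length(read: str) -> bool:
    """True if the read (or its reverse complement) has 5' primer, poly(A), and 3' primer.

    Both orientations are checked because a consensus read may represent either strand.
    """
    read = read.upper()
    return any(
        FULL_LENGTH_PATTERN.search(s) for s in (read, reverse_complement(read))
    )


if __name__ == "__main__":
    body = "GCTAGCTAGGCT"  # hypothetical transcript sequence
    full = FIVE_PRIME_PRIMER + body + "A" * 20 + THREE_PRIME_PRIMER
    print(is_full_length(full))                      # True
    print(is_full_length(reverse_complement(full)))  # True
    print(is_full_length(body + "A" * 20 + THREE_PRIME_PRIMER))  # False: no 5' primer
```

Reads that fail the check (for example, those truncated at the 5′ end) would be counted as non-full-length in this simplified scheme.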


BMC Genomics, 2019, Vol 20 (1)
Author(s): Ke Teng, Wenjun Teng, Haifeng Wen, Yuesen Yue, Weier Guo, ...

Background: Carex L., a genus of grass-like plants commonly known as sedges, is distributed worldwide and contributes to turf management, forage production, and ecological conservation. The development of next-generation sequencing (NGS) technologies has considerably improved our understanding of the transcriptome complexity of Carex L. and provided a valuable genetic reference. However, the current transcriptome remains unsatisfactory, mainly because of the great difficulty of obtaining full-length transcripts. Results: In this study, we employed PacBio single-molecule real-time (SMRT) long-read sequencing for whole-transcriptome profiling of Carex breviculmis. We generated 60,353 high-confidence non-redundant transcripts with an average length of 2302 bp. A total of 3588 alternative splicing events and 1273 long non-coding RNAs were identified. Furthermore, 40,347 complete coding sequences were predicted, providing an informative reference transcriptome. In addition, the transcriptional regulation of C. breviculmis in response to shade stress was explored by mapping the NGS data to the reference transcriptome constructed by SMRT sequencing. Conclusions: This study provides the first full-length reference transcriptome of C. breviculmis, obtained by SMRT sequencing. The resulting transcriptome atlas will not only facilitate future functional genomics studies but also pave the way for selective breeding and genetic engineering projects for C. breviculmis.


2020, Vol 15 (2), pp. 165-172
Author(s): Chaithra Pradeep, Dharam Nandan, Arya A. Das, Dinesh Velayutham

Background: The standard approach for transcriptomic profiling uses high-throughput short-read sequencing, a field mainly dominated by Illumina. However, short reads have limitations in transcriptome assembly and in obtaining full-length transcripts, because transcriptomes are complex, containing transcripts of variable length and multiple alternatively spliced isoforms. Recent advances in long-read sequencing by Oxford Nanopore Technologies (ONT), which offers both cDNA and direct RNA sequencing, have brought a paradigm change in sequencing technology that greatly improves assembly and expression estimates. ONT sequences molecules without fragmentation, producing ultra-long reads that allow entire genes and transcripts to be fully characterized. In addition, the direct RNA sequencing method circumvents the reverse transcription and amplification steps. Objective: In this study, RNA sequencing methods were assessed by comparing data from Illumina (ILM), ONT cDNA (OCD), and ONT direct RNA (ODR) sequencing. Methods: The sensitivity and specificity of isoform detection were determined from the data generated by Illumina, ONT cDNA, and ONT direct RNA sequencing, using Saccharomyces cerevisiae as a model. Comparative studies were conducted with two pipelines to detect isoforms, novel genes, and variable gene lengths. Results: Mapping metrics and qualitative profiles for the different pipelines are presented to help understand these disruptive technologies. The variability attributable to the sequencing technology and to the analysis pipeline was studied.
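
The sensitivity and specificity named in the Methods can be made concrete with a short Python sketch. It assumes that detected isoforms have already been matched to reference annotation IDs (for example, by identical intron chains), and it reports precision, TP / (TP + FP), in place of specificity, because true negatives are not well defined for isoform discovery; the abstract does not state the exact definitions the authors used. The isoform IDs are hypothetical.

```python
from typing import Set, Tuple


def isoform_detection_metrics(detected: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Return (sensitivity, precision) of detected isoform IDs against a reference set.

    sensitivity = TP / (TP + FN): fraction of reference isoforms that were recovered.
    precision   = TP / (TP + FP): fraction of detected isoforms that are in the reference.
    """
    tp = len(detected & reference)  # detected and annotated
    fn = len(reference - detected)  # annotated but missed
    fp = len(detected - reference)  # detected but not annotated (false or novel)
    sensitivity = tp / (tp + fn) if reference else 0.0
    precision = tp / (tp + fp) if detected else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Hypothetical isoform IDs for one sequencing platform.
    reference = {"tx1", "tx2", "tx3", "tx4"}
    detected = {"tx1", "tx2", "novel1"}
    print(isoform_detection_metrics(detected, reference))  # (0.5, 0.6666666666666666)
```

Running the same function once per platform (ILM, OCD, ODR) and per pipeline would give directly comparable numbers; note that genuinely novel isoforms are penalized as false positives under this simple definition.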


2021
Author(s): Chao Fang, Xiaohuan Sun, Fei Fan, Xiaowei Zhang, Ou Wang, ...

Although several large-scale environmental microbial projects have been initiated in the past two decades, understanding of the role of complex microbiotas is still constrained by the difficulty of detecting and identifying unknown microorganisms [1-6]. Currently, hypervariable regions of rRNA genes and internal transcribed spacer regions are broadly used to identify bacteria and fungi within complex communities [7, 8], but taxonomic and phylogenetic resolution is hampered by insufficient sequencing length [9-11]. Direct sequencing of full-length rRNA genes is currently limited by read length when second-generation sequencing is used, or by reduced quality and throughput when single-molecule sequencing is used. We developed a novel method to sequence and assemble nearly full-length rRNA genes using second-generation sequencing. Benchmarking was performed on mock bacterial and fungal communities as well as two forest soil samples. Most rRNA gene sequences of all species in the mock community samples were successfully recovered, with identities above 99.5% relative to the reference sequences. For the soil samples, we obtained excellent coverage, identified a large number of putative new species, and observed high abundance correlation between replicates. This approach provides a cost-effective method for obtaining extensive and accurate information on complex environmental microbial communities.
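
The 99.5% identity threshold above refers to comparisons between recovered and reference rRNA gene sequences. A minimal Python sketch of one common identity definition (identical alignment columns divided by all alignment columns, with gaps counted as differences) is shown below; it assumes a pairwise alignment has already been computed, and other tools use other denominators, so the abstract's exact definition may differ.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity over a pairwise alignment; '-' denotes a gap.

    Identity = identical, non-gap columns / total alignment columns.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        q == r and q != "-" for q, r in zip(aligned_query, aligned_ref)
    )
    return 100.0 * matches / len(aligned_query)


def passes_identity_threshold(
    aligned_query: str, aligned_ref: str, threshold: float = 99.5
) -> bool:
    """True if identity is strictly above the threshold (99.5% as in the abstract)."""
    return percent_identity(aligned_query, aligned_ref) > threshold


if __name__ == "__main__":
    ref = "ACGT" * 250              # hypothetical 1,000-column alignment
    query = "C" + ref[1:-1] + "-"   # one substitution and one terminal gap
    print(percent_identity(query, ref))           # 99.8
    print(passes_identity_threshold(query, ref))  # True
```

With a 1,000-column alignment, the threshold tolerates at most four differing columns, which shows how strict the benchmark criterion is.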


2021, Vol 12
Author(s): Tianpeng Chang, Bingxing An, Mang Liang, Xinghai Duan, Lili Du, ...

Cattle (Bos taurus) are among the most widely distributed livestock species in the world and provide high-quality milk and meat that have a large impact on the quality of human life. Therefore, accurate and complete transcriptome and genome annotations are of great value to cattle breeding research. In this study, we used error-corrected PacBio single-molecule real-time (SMRT) data to perform whole-transcriptome profiling in cattle. In total, 22.5 Gb of subreads were generated, yielding 381,423 circular consensus sequences (CCSs), among which 276,295 full-length non-chimeric (FLNC) sequences were identified. After correction with Illumina short reads, we obtained 22,353 error-corrected isoforms. Transcriptome structural analysis detected 305 alternative splicing (AS) events and 3,795 alternative polyadenylation (APA) sites. Furthermore, we identified 457 novel genes, 120 putative transcription factors (TFs), and 569 novel long non-coding RNAs (lncRNAs). Taken together, this research improves our understanding of, and provides new insights into, the complexity of full-length transcripts in cattle.


2015
Author(s): Yuta Suzuki, Jonas Korlach, Stephen W. Turner, Tatsuya Tsukahara, Junko Taniguchi, ...

Determining the methylation state of regions with high copy numbers is challenging for second-generation sequencing, because the read length is insufficient to map reads uniquely, especially when repetitive regions are long and nearly identical to each other. Single-molecule real-time (SMRT) sequencing is a promising method for observing such regions, because it is not vulnerable to GC bias, produces long reads, and yields kinetic information that is sensitive to DNA modifications. We propose a novel algorithm that combines the kinetic information for neighboring CpG sites to increase confidence in identifying the methylation states of those sites. Both the sensitivity and the precision of our algorithm were ∼93.7% on a per-CpG-site basis for the genome of an inbred medaka (Oryzias latipes) strain at a practical read coverage of ∼30-fold. The method is also quantitatively accurate: we observed a high correlation coefficient (R = 0.884) between our method and bisulfite sequencing, and 92.0% of CpG sites were concordant within 0.25. Using this method, we characterized the methylation landscape of repetitive elements, such as LINEs, in the human genome, revealing a strong correlation between CpG density and unmethylation and detecting unmethylation hot spots in LTRs and LINEs. We also uncovered the methylation states of nearly identical active transposons: two novel LINE insertions (∼99% identity, 6050 base pairs (bp) long) in the human genome and sixteen Tol2 elements (>99.8% identity, 4682 bp long) in the medaka genome.
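
The two agreement statistics reported above, the correlation with bisulfite sequencing (R = 0.884) and the fraction of CpG sites whose methylation levels agree within 0.25, can be computed as in the following Python sketch. It assumes per-site methylation levels in [0, 1] from both methods are already paired by CpG position and that R is a Pearson correlation coefficient; it does not implement the authors' kinetic-signal algorithm, and the example values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def concordance_fraction(
    x: Sequence[float], y: Sequence[float], tol: float = 0.25
) -> float:
    """Fraction of paired sites whose methylation levels differ by at most `tol`."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - b) <= tol for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Hypothetical per-CpG methylation levels from SMRT kinetics and bisulfite sequencing.
    smrt = [0.90, 0.10, 0.80, 0.20, 0.95]
    bisulfite = [0.85, 0.05, 0.40, 0.30, 1.00]
    print(round(pearson_r(smrt, bisulfite), 3))
    print(concordance_fraction(smrt, bisulfite))  # 0.8
```

The two numbers capture different things: R measures how well the methods rank sites relative to each other, whereas the concordance fraction checks absolute agreement site by site, so a method can score well on one and poorly on the other.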

