PacBio Single-Molecule Long-Read Sequencing Provides New Light on the Complexity of Full-Length Transcripts in Cattle

Cattle (Bos taurus) is one of the most widely distributed livestock species in the world and provides high-quality milk and meat, which have a huge impact on the quality of human life. Accurate and complete transcriptome and genome annotation is therefore of great value to cattle breeding research. In this study, we used error-corrected PacBio single-molecule real-time (SMRT) data to perform whole-transcriptome profiling in cattle. In total, 22.5 Gb of subreads were generated, including 381,423 circular consensus sequences (CCSs), among which 276,295 full-length non-chimeric (FLNC) sequences were identified. After correction by Illumina short reads, we obtained 22,353 error-corrected isoforms. A total of 305 alternative splicing (AS) events and 3,795 alternative polyadenylation (APA) sites were detected by transcriptome structural analysis. Furthermore, we identified 457 novel genes, 120 putative transcription factors (TFs), and 569 novel long non-coding RNAs (lncRNAs). Taken together, this research improves our understanding of, and provides new insights into, the complexity of full-length transcripts in cattle.


Full-Length Transcriptome Sequencing and EST-SSR marker development of tiger lily (Lilium lancifolium Thunb.)

10.21203/rs.3.rs-415397/v1 ◽

2021 ◽

Author(s):

Mingwei Sun ◽

Yilian Zhao ◽

Xiaobin Shao ◽

Jintao Ge ◽

Xueyan Tang ◽

...

Keyword(s):

Single Molecule ◽

Consensus Sequence ◽

Transcriptome Profiling ◽

Full Length ◽

Biological Regulation ◽

Regulation Network ◽

Long Read ◽

Lilium Lancifolium ◽

Comprehensive Framework ◽

Whole Transcriptome

It is well known that transcriptional diversity plays important roles in plant biological regulation. However, because full-length transcripts are difficult to obtain, the available characterization of the tiger lily (Lilium lancifolium Thunb.) transcriptome is still incomplete. To improve the integrity of tiger lily transcriptome information, PacBio single-molecule real-time (SMRT) long-read sequencing technology was employed for whole-transcriptome profiling. A total of 815,624 circular consensus sequence (CCS) reads with a mean length of 1,295 bp were obtained. Among these, 61,744 were full-length reads containing the 5′ primer, the 3′ primer, and the poly(A) tail, and 3,319 EST-derived SSRs were developed from 2,968 unigenes. From the resulting reference transcriptome, 768 transcription factors and 6,852 long non-coding RNAs were identified, providing a comprehensive framework of the transcriptional regulation network. Of all the annotated transcripts, 15,608 were distributed into 25 Clusters of euKaryotic Orthologous Groups (KOG) categories, and 10,706 unigenes were categorized into 52 functional groups across three categories. These results provide a comprehensive set of reference transcripts and further improve our understanding of the tiger lily transcriptome.


Single-Molecule Long-Read Sequencing Reveals the Diversity of Full-Length Transcripts in Leaves of Gnetum (Gnetales)

International Journal of Molecular Sciences ◽

10.3390/ijms20246350 ◽

2019 ◽

Vol 20 (24) ◽

pp. 6350 ◽

Cited By ~ 2

Author(s):

Nan Deng ◽

Chen Hou ◽

Fengfeng Ma ◽

Caixia Liu ◽

Yuxin Tian

Keyword(s):

Single Molecule ◽

Developmental Stages ◽

Alternative Polyadenylation ◽

Full Length ◽

Stomatal Development ◽

RNA Seq ◽

Leaf Transcriptome ◽

Long Read ◽

Non Coding RNAs ◽

A Site

The limitations of RNA sequencing make it difficult to accurately predict alternative splicing (AS) and alternative polyadenylation (APA) events and long non-coding RNAs (lncRNAs), all of which reveal transcriptomic diversity and the complexity of gene regulation. Gnetum, a genus with ambiguous phylogenetic placement in seed plants, has a distinct stomatal structure and photosynthetic characteristics. In this study, a full-length transcriptome of Gnetum luofuense leaves at different developmental stages was sequenced with the latest PacBio Sequel platform. After correction with short reads generated by Illumina RNA-Seq, 80,496 full-length transcripts were obtained, of which 5269 reads were identified as isoforms of novel genes. Additionally, 1660 lncRNAs and 12,998 AS events were detected. In total, 5647 genes in the G. luofuense leaves had APA featuring at least one poly(A) site. Moreover, 67 and 30 genes from the bHLH gene family, which plays an important role in stomatal development and photosynthesis, were identified from the G. luofuense genome and leaf transcripts, respectively. This leaf transcriptome supplements the reference genome of G. luofuense, and the AS events and lncRNAs detected provide valuable resources for future studies investigating the low photosynthetic capacity of Gnetum.


Full-length Transcriptome Analysis of Pecan (Carya illinoinensis) Kernels

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab182 ◽

2021 ◽

Author(s):

Chengcai Zhang ◽

Huadong Ren ◽

Xiaohua Yao ◽

Kailiang Wang ◽

Jun Chang

Keyword(s):

Single Molecule ◽

Molecular Mechanisms ◽

Draft Genome ◽

Alternative Polyadenylation ◽

Full Length ◽

Functional Annotations ◽

Predominant Type ◽

Long Read ◽

Non Coding RNAs ◽

Novel Isoforms

Pecan is rich in bioactive components such as fatty acids and flavonoids and is an important nut type worldwide. Therefore, the molecular mechanisms of phytochemical biosynthesis in pecan are a focus of research. Recently, a draft genome and several transcriptomes have been published. However, the full-length mRNA transcripts remain unclear, and the regulatory mechanisms behind the biosynthesis and accumulation of quality components have not been fully investigated. In this study, single-molecule long-read sequencing technology was used to obtain full-length transcripts of pecan kernels. In total, 37 504 isoforms of 16 702 genes were mapped to the reference genome. The numbers of known isoforms, new isoforms, and novel isoforms were 9013 (24.03%), 26 080 (69.54%), and 2411 (6.51%), respectively. Over 80% of the transcripts (30 751, 81.99%) had functional annotations. A total of 15 465 alternative splicing (AS) events and 65 761 alternative polyadenylation events were detected; among the AS events, retained intron was the predominant type (5652, 36.55%). Furthermore, 1894 long non-coding RNAs and 1643 transcription factors were predicted using bioinformatics methods. Finally, the structural genes associated with fatty acid (FA) and flavonoid biosynthesis were characterized. A high frequency of AS accuracy (70.31%) was observed in FA synthesis-associated genes. The present study provides a full-length transcriptome dataset of pecan kernels, which will significantly enhance the understanding of the regulatory basis of phytochemical biosynthesis during pecan kernel maturation.


Single-nucleus full-length RNA profiling in plants incorporates isoform information to facilitate cell type identification

10.1101/2020.11.25.397919 ◽

2020 ◽

Author(s):

Yanping Long ◽

Zhijian Liu ◽

Jinbu Jia ◽

Weipeng Mo ◽

Liang Fang ◽

...

Keyword(s):

Single Cell ◽

Single Molecule ◽

Large Scale ◽

Alternative Polyadenylation ◽

Full Length ◽

Cell Type ◽

Arabidopsis Root ◽

RNA Profiling ◽

Long Read ◽

Single Nucleus

The broad application of large-scale single-cell RNA profiling in plants has been restricted by the prerequisite of protoplasting. We recently found that the Arabidopsis nucleus contains abundant polyadenylated mRNAs, many of which are incompletely spliced. To capture the isoform information, we combined 10x Genomics and Nanopore long-read sequencing to develop a protoplasting-free full-length single-nucleus RNA profiling method in plants. Using Arabidopsis root, our results demonstrated that nuclear mRNAs faithfully retain cell identity information, and that single-molecule full-length RNA sequencing could further improve cell type identification by revealing splicing status and alternative polyadenylation at the single-cell level.


PacBio single-molecule long-read sequencing shed new light on the complexity of the Carex breviculmis transcriptome

BMC Genomics ◽

10.1186/s12864-019-6163-6 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 2

Author(s):

Ke Teng ◽

Wenjun Teng ◽

Haifeng Wen ◽

Yuesen Yue ◽

Weier Guo ◽

...

Keyword(s):

Single Molecule ◽

Average Length ◽

Transcriptome Profiling ◽

Full Length ◽

Regulation Mechanism ◽

Forage Production ◽

SMRT Sequencing ◽

Reference Transcriptome ◽

Long Read ◽

Ecological Conservation

Background: Carex L., a grass genus commonly known as sedges, is distributed worldwide and contributes constructively to turf management, forage production, and ecological conservation. The development of next-generation sequencing (NGS) technologies has considerably improved our understanding of the transcriptome complexity of Carex L. and provided a valuable genetic reference. However, the current transcriptome is not satisfactory, mainly because of the enormous difficulty in obtaining full-length transcripts. Results: In this study, we employed PacBio single-molecule real-time (SMRT) long-read sequencing technology for whole-transcriptome profiling in Carex breviculmis. We generated 60,353 high-confidence non-redundant transcripts with an average length of 2302 bp. A total of 3588 alternative splicing events and 1273 long non-coding RNAs were identified. Furthermore, 40,347 complete coding sequences were predicted, providing an informative reference transcriptome. In addition, the transcriptional regulation mechanism of C. breviculmis in response to shade stress was further explored by mapping the NGS data to the reference transcriptome constructed by SMRT sequencing. Conclusions: This study provided a full-length reference transcriptome of C. breviculmis using the SMRT sequencing method for the first time. The transcriptome atlas obtained will not only facilitate future functional genomics studies but also pave the way for further selective and genetic engineering breeding projects for C. breviculmis.


Reconstruction and functional annotation of Ascosphaera apis full-length transcriptome via PacBio single-molecule long-read sequencing

10.1101/770040 ◽

2019 ◽

Author(s):

Dafu Chen ◽

Yu Du ◽

Xiaoxue Fan ◽

Zhiwei Zhu ◽

Haibin Jiang ◽

...

Keyword(s):

Single Molecule ◽

Average Length ◽

Full Length ◽

Ascosphaera Apis ◽

Protein Coding ◽

Entomogenous Fungi ◽

Consensus Sequences ◽

Long Read ◽

Non Coding RNAs ◽

Lower Expression

Ascosphaera apis is a widespread fungal pathogen of honeybee larvae that results in chalkbrood disease, leading to heavy losses for the beekeeping industry in China and many other countries. This work was aimed at generating a full-length transcriptome of A. apis using PacBio single-molecule real-time (SMRT) sequencing. Here, more than 23.97 Gb of clean reads were generated from long-read sequencing of A. apis mycelia, including 464,043 circular consensus sequences (CCS) and 394,142 full-length non-chimeric (FLNC) reads. In total, we identified 174,095 high-confidence transcripts covering 5141 known genes with an average length of 2728 bp. We also discovered 2405 genic loci and 11,623 isoforms that have not yet been annotated within the current reference genome. Additionally, 16,049, 10,682, 4520, and 7253 of the discovered transcripts have annotations in the Non-redundant protein (Nr), Clusters of Eukaryotic Orthologous Groups (KOG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, respectively. Moreover, 1205 long non-coding RNAs (lncRNAs) were identified, which have fewer exons, shorter exon and intron lengths, shorter transcript lengths, lower GC percent, lower expression levels, and fewer alternative splicing (AS) events compared with protein-coding transcripts. A total of 253 members from 17 transcription factor (TF) families were identified from our transcript datasets. Finally, the expression of A. apis isoforms was validated using a molecular approach. Overall, this is the first report of a full-length transcriptome of entomogenous fungi including A. apis. Our data offer a comprehensive set of reference transcripts and hence contribute to improving the genome annotation and transcriptomic study of A. apis.


Time-Course Transcriptome Profiling of a Poxvirus Using Long-Read Full-Length Assay

Pathogens ◽

10.3390/pathogens10080919 ◽

2021 ◽

Vol 10 (8) ◽

pp. 919

Author(s):

Dóra Tombácz ◽

István Prazsák ◽

Gábor Torma ◽

Zsolt Csabai ◽

Zsolt Balázs ◽

...

Keyword(s):

Single Molecule ◽

Time Course ◽

Transcriptome Profiling ◽

Time Lapse ◽

Full Length ◽

Read Length ◽

Transcript Isoforms ◽

Long Read ◽

Second Generation Sequencing ◽

Transcriptional Start Sites

Viral transcriptomes that are determined using first- and second-generation sequencing techniques are incomplete. Due to the short read length, these methods are inefficient or fail to distinguish between transcript isoforms, polycistronic RNAs, and transcriptional overlaps and readthroughs. Additionally, these approaches are insensitive for the identification of splice and transcriptional start sites (TSSs) and, in most cases, transcriptional end sites (TESs), especially in transcript isoforms with varying transcript ends, and in multi-spliced transcripts. Long-read sequencing is able to read full-length nucleic acids and can therefore be used to assemble complete transcriptome atlases. Although vaccinia virus (VACV) does not produce spliced RNAs, its transcriptome has a high diversity of TSSs and TESs, and a high degree of polycistronism that leads to enormous complexity. We applied single-molecule, real-time, and nanopore-based sequencing methods to investigate the time-lapse transcriptome patterns of VACV gene expression.


PacBio Single-Molecule Long-Read Sequencing Reveals Genes Tolerating Manganese Stress in Schima superba Saplings

Frontiers in Genetics ◽

10.3389/fgene.2021.635043 ◽

2021 ◽

Vol 12 ◽

Author(s):

Fiza Liaquat ◽

Muhammad Farooq Hussain Munis ◽

Samiah Arif ◽

Urooj Haroon ◽

Jianxin Shi ◽

...

Keyword(s):

Single Molecule ◽

Gene Annotation ◽

Treated Group ◽

Full Length ◽

Open Reading Frames ◽

Interacting Protein ◽

Schima Superba ◽

Long Read ◽

First Time ◽

Potential Tool

Schima superba (Theaceae) is a subtropical evergreen tree that is widely used for forest firebreaks and gardening. It tolerates salt and typically accumulates elevated amounts of manganese in its leaves. With a large ecological amplitude, this tree species grows quickly. Due to its substantial biomass, it has great potential for soil remediation. To comprehensively characterize its mRNA, we employed PacBio sequencing technology for the first time to generate the S. superba transcriptome. Overall, 511,759 full-length non-chimeric reads were acquired, and 163,834 high-quality full-length reads were obtained. In total, 93,362 open reading frames were obtained, of which 78,255 were complete. In gene annotation analyses, 91,082, 71,839, 38,914, and 38,376 transcripts were annotated in the Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Genes (COG), Gene Ontology (GO), and Non-Redundant (Nr) databases, respectively. To identify long non-coding RNAs (lncRNAs), we used four computational methods, namely protein families (Pfam), Coding Potential Calculator (CPC), Coding-Potential Assessment Tool (CPAT), and Coding-Non-Coding Index (CNCI), and observed 8,551, 9,174, 20,720, and 18,669 lncRNAs, respectively. Moreover, nine genes were randomly selected for expression analysis, which showed the highest expression for Gene 6 (Na_Ca_ex gene), and the expression of CAX (CAX-interacting protein 4) was higher in the manganese (Mn)-treated group. This work provided a significant number of full-length transcripts and refined the annotation of the reference genome, which will ease advanced genetic analyses of S. superba.


Characterization of Full-Length Transcriptome Sequences and Splice Variants of Lateolabrax maculatus by Single-Molecule Long-Read Sequencing and Their Involvement in Salinity Regulation

Frontiers in Genetics ◽

10.3389/fgene.2019.01126 ◽

2019 ◽

Vol 10 ◽

Cited By ~ 4

Author(s):

Yuan Tian ◽

Haishen Wen ◽

Xin Qi ◽

Xiaoyan Zhang ◽

Shikai Liu ◽

...

Keyword(s):

Single Molecule ◽

Splice Variants ◽

Full Length ◽

Long Read ◽

Lateolabrax Maculatus ◽

Transcriptome Sequences


Transcriptome Profiling Provides Insight into the Genes in Carotenoid Biosynthesis during the Mesocarp and Seed Developmental Stages of Avocado (Persea americana)

International Journal of Molecular Sciences ◽

10.3390/ijms20174117 ◽

2019 ◽

Vol 20 (17) ◽

pp. 4117 ◽

Cited By ~ 8

Author(s):

Yu Ge ◽

Zhihao Cheng ◽

Xiongyuan Si ◽

Weihong Ma ◽

Lin Tan ◽

...

Keyword(s):

Single Molecule ◽

Developmental Stages ◽

Gene Dosage ◽

Average Length ◽

Beta Carotene ◽

Transcriptome Profiling ◽

Carotenoid Biosynthesis ◽

Persea Americana ◽

Sequencing Data ◽

Long Read

Avocado (Persea americana Mill.) is an economically important crop because of its high nutritional value. However, the absence of a sequenced avocado reference genome has hindered investigations of secondary metabolism. For next-generation high-throughput transcriptome sequencing, we obtained 365,615,152 and 348,623,402 clean reads as well as 109.13 and 104.10 Gb of sequencing data for avocado mesocarp and seed, respectively, during five developmental stages. High-quality reads were assembled into 100,837 unigenes with an average length of 847.40 bp (N50 = 1725 bp). Additionally, 16,903 differentially expressed genes (DEGs) were detected, 17 of which were related to carotenoid biosynthesis. The expression levels of most of these 17 DEGs were higher in the mesocarp than in the seed during the five developmental stages. In this study, the avocado mesocarp and seed transcriptomes were also sequenced using single-molecule long-read sequencing to acquire 25.79 and 17.67 Gb of clean data, respectively. We identified 233,014 and 238,219 consensus isoforms in avocado mesocarp and seed, respectively. Furthermore, 104 and 59 isoforms were found to correspond to the 11 putative carotenoid biosynthesis-related genes in the avocado mesocarp and seed, respectively. The isoform numbers of 10 of the 11 putative genes involved in the carotenoid biosynthetic pathway were higher in the mesocarp than in the seed. In addition, alpha- and beta-carotene contents in the avocado mesocarp and seed during the five developmental stages were measured, and they were higher in the mesocarp than in the seed, which validated the results of transcriptome profiling. Gene expression changes and the associated variations in gene dosage could influence carotenoid biosynthesis. These results will help to further elucidate carotenoid biosynthesis in avocado.
