PacBio Single-Molecule Long-Read Sequencing Provides New Light on the Complexity of Full-Length Transcripts in Cattle

2021 ◽  
Vol 12 ◽  
Author(s):  
Tianpeng Chang ◽  
Bingxing An ◽  
Mang Liang ◽  
Xinghai Duan ◽  
Lili Du ◽  
...  

Cattle (Bos taurus) is one of the most widely distributed livestock species in the world and provides high-quality milk and meat, which have a huge impact on the quality of human life. Therefore, accurate and complete transcriptome and genome annotations are of great value to cattle breeding research. In this study, we used error-corrected PacBio single-molecule real-time (SMRT) data to perform whole-transcriptome profiling in cattle. In total, 22.5 Gb of subreads was generated, including 381,423 circular consensus sequences (CCSs), among which 276,295 full-length non-chimeric (FLNC) sequences were identified. After correction by Illumina short reads, we obtained 22,353 error-corrected isoforms. A total of 305 alternative splicing (AS) events and 3,795 alternative polyadenylation (APA) sites were detected by transcriptome structural analysis. Furthermore, we identified 457 novel genes, 120 putative transcription factors (TFs), and 569 novel long non-coding RNAs (lncRNAs). Taken together, this research improves our understanding of, and provides new insights into, the complexity of full-length transcripts in cattle.

2021 ◽  
Author(s):  
Mingwei Sun ◽  
Yilian Zhao ◽  
Xiaobin Shao ◽  
Jintao Ge ◽  
Xueyan Tang ◽  
...  

Abstract It is well known that transcriptional diversity plays important roles in plant biological regulation. However, because full-length transcripts are difficult to obtain, the available characterization of the tiger lily (Lilium lancifolium Thunb) transcriptome is still incomplete. To improve the integrity of tiger lily transcriptome information, PacBio single-molecule long-read sequencing (SMRT) technology was employed for whole-transcriptome profiling. A total of 815,624 circular consensus sequence (CCS) reads with a mean length of 1,295 bp were obtained. Among these, 61,744 were full-length reads containing the 5’ primer, the 3’ primer, and the poly(A) tail, and 3,319 EST-derived SSRs were developed from 2,968 unigenes. With the resulting reference transcriptome, 768 transcription factors and 6,852 long non-coding RNAs were identified, providing a comprehensive framework of the transcriptional regulation network. Of all the annotated transcripts, 15,608 were distributed into 25 Clusters of euKaryotic Orthologous Groups (KOG) categories, and 10,706 unigenes were categorized into 52 functional groups, which were divided into three categories. These results provide a comprehensive set of reference transcripts and further improve our understanding of the tiger lily transcriptome.


2019 ◽  
Vol 20 (24) ◽  
pp. 6350 ◽  
Author(s):  
Nan Deng ◽  
Chen Hou ◽  
Fengfeng Ma ◽  
Caixia Liu ◽  
Yuxin Tian

The limitations of RNA sequencing make it difficult to accurately predict alternative splicing (AS) and alternative polyadenylation (APA) events and long non-coding RNAs (lncRNAs), all of which reveal transcriptomic diversity and the complexity of gene regulation. Gnetum, a genus with ambiguous phylogenetic placement in seed plants, has a distinct stomatal structure and photosynthetic characteristics. In this study, a full-length transcriptome of Gnetum luofuense leaves at different developmental stages was sequenced with the latest PacBio Sequel platform. After correction by short reads generated by Illumina RNA-Seq, 80,496 full-length transcripts were obtained, of which 5269 reads were identified as isoforms of novel genes. Additionally, 1660 lncRNAs and 12,998 AS events were detected. In total, 5647 genes in the G. luofuense leaves had APA, characterized by at least one poly(A) site. Moreover, 67 and 30 genes from the bHLH gene family, which play an important role in stomatal development and photosynthesis, were identified from the G. luofuense genome and leaf transcripts, respectively. This leaf transcriptome supplements the reference genome of G. luofuense, and the AS events and lncRNAs detected provide valuable resources for future studies investigating the low photosynthetic capacity of Gnetum.


Author(s):  
Chengcai Zhang ◽  
Huadong Ren ◽  
Xiaohua Yao ◽  
Kailiang Wang ◽  
Jun Chang

Abstract Pecan is rich in bioactive components such as fatty acids and flavonoids and is an important nut type worldwide. Therefore, the molecular mechanisms of phytochemical biosynthesis in pecan are a focus of research. Recently, a draft genome and several transcriptomes have been published. However, the full-length mRNA transcripts remain unclear, and the regulatory mechanisms behind the biosynthesis and accumulation of quality components have not been fully investigated. In this study, single-molecule long-read sequencing technology was used to obtain full-length transcripts of pecan kernels. In total, 37 504 isoforms of 16 702 genes were mapped to the reference genome. The numbers of known isoforms, new isoforms, and novel isoforms were 9013 (24.03%), 26 080 (69.54%), and 2411 (6.51%), respectively. Over 80% of the transcripts (30 751, 81.99%) had functional annotations. A total of 15 465 alternative splicing (AS) events and 65 761 alternative polyadenylation events were detected; among the AS events, retained intron was the predominant type (5652, 36.55%). Furthermore, 1894 long non-coding RNAs and 1643 transcription factors were predicted using bioinformatics methods. Finally, the structural genes associated with fatty acid (FA) and flavonoid biosynthesis were characterized. A high frequency of AS accuracy (70.31%) was observed in FA synthesis-associated genes. The present study provides a full-length transcriptome dataset of pecan kernels, which will significantly enhance the understanding of the regulatory basis of phytochemical biosynthesis during pecan kernel maturation.


2020 ◽  
Author(s):  
Yanping Long ◽  
Zhijian Liu ◽  
Jinbu Jia ◽  
Weipeng Mo ◽  
Liang Fang ◽  
...  

Abstract The broad application of large-scale single-cell RNA profiling in plants has been restricted by the prerequisite of protoplasting. We recently found that the Arabidopsis nucleus contains abundant polyadenylated mRNAs, many of which are incompletely spliced. To capture the isoform information, we combined 10x Genomics and Nanopore long-read sequencing to develop a protoplasting-free full-length single-nucleus RNA profiling method in plants. Using Arabidopsis root, our results demonstrated that nuclear mRNAs faithfully retain cell identity information, and that single-molecule full-length RNA sequencing could further improve cell type identification by revealing splicing status and alternative polyadenylation at the single-cell level.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Ke Teng ◽  
Wenjun Teng ◽  
Haifeng Wen ◽  
Yuesen Yue ◽  
Weier Guo ◽  
...  

Abstract Background Carex L., a grass genus commonly known as sedges, is distributed worldwide and contributes constructively to turf management, forage production, and ecological conservation. The development of next-generation sequencing (NGS) technologies has considerably improved our understanding of the transcriptome complexity of Carex L. and provided a valuable genetic reference. However, the current transcriptome is not satisfactory, mainly because of the enormous difficulty in obtaining full-length transcripts. Results In this study, we employed PacBio single-molecule long-read sequencing (SMRT) technology for whole-transcriptome profiling in Carex breviculmis. We generated 60,353 high-confidence non-redundant transcripts with an average length of 2302 bp. A total of 3588 alternative splicing events and 1273 long non-coding RNAs were identified. Furthermore, 40,347 complete coding sequences were predicted, providing an informative reference transcriptome. In addition, the transcriptional regulation mechanism of C. breviculmis in response to shade stress was further explored by mapping the NGS data to the reference transcriptome constructed by SMRT sequencing. Conclusions This study provided a full-length reference transcriptome of C. breviculmis using the SMRT sequencing method for the first time. The transcriptome atlas obtained will not only facilitate future functional genomics studies but also pave the way for further selective and genetic engineering breeding projects for C. breviculmis.


2019 ◽  
Author(s):  
Dafu Chen ◽  
Yu Du ◽  
Xiaoxue Fan ◽  
Zhiwei Zhu ◽  
Haibin Jiang ◽  
...  

Abstract Ascosphaera apis is a widespread fungal pathogen of honeybee larvae that results in chalkbrood disease, leading to heavy losses for the beekeeping industry in China and many other countries. This work was aimed at generating a full-length transcriptome of A. apis using PacBio single-molecule real-time (SMRT) sequencing. Here, more than 23.97 Gb of clean reads was generated from long-read sequencing of A. apis mycelia, including 464,043 circular consensus sequences (CCS) and 394,142 full-length non-chimeric (FLNC) reads. In total, we identified 174,095 high-confidence transcripts covering 5141 known genes with an average length of 2728 bp. We also discovered 2405 genic loci and 11,623 isoforms that have not been annotated yet within the current reference genome. Additionally, 16,049, 10,682, 4520, and 7253 of the discovered transcripts have annotations in the Non-redundant protein (Nr), Clusters of Eukaryotic Orthologous Groups (KOG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, respectively. Moreover, 1205 long non-coding RNAs (lncRNAs) were identified, which have fewer exons, shorter exon and intron lengths, shorter transcript lengths, lower GC percent, lower expression levels, and fewer alternative splicing (AS) events, compared with protein-coding transcripts. A total of 253 members from 17 transcription factor (TF) families were identified from our transcript datasets. Finally, the expression of A. apis isoforms was validated using a molecular approach. Overall, this is the first report of a full-length transcriptome of entomogenous fungi including A. apis. Our data offer a comprehensive set of reference transcripts and hence contribute to improving the genome annotation and transcriptomic study of A. apis.


Pathogens ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 919
Author(s):  
Dóra Tombácz ◽  
István Prazsák ◽  
Gábor Torma ◽  
Zsolt Csabai ◽  
Zsolt Balázs ◽  
...  

Viral transcriptomes that are determined using first- and second-generation sequencing techniques are incomplete. Due to the short read length, these methods are inefficient or fail to distinguish between transcript isoforms, polycistronic RNAs, and transcriptional overlaps and readthroughs. Additionally, these approaches are insensitive for the identification of splice and transcriptional start sites (TSSs) and, in most cases, transcriptional end sites (TESs), especially in transcript isoforms with varying transcript ends, and in multi-spliced transcripts. Long-read sequencing is able to read full-length nucleic acids and can therefore be used to assemble complete transcriptome atlases. Although vaccinia virus (VACV) does not produce spliced RNAs, its transcriptome has a high diversity of TSSs and TESs, and a high degree of polycistronism that leads to enormous complexity. We applied single-molecule, real-time, and nanopore-based sequencing methods to investigate the time-lapse transcriptome patterns of VACV gene expression.


2021 ◽  
Vol 12 ◽  
Author(s):  
Fiza Liaquat ◽  
Muhammad Farooq Hussain Munis ◽  
Samiah Arif ◽  
Urooj Haroon ◽  
Jianxin Shi ◽  
...  

Schima superba (Theaceae) is a subtropical evergreen tree that is used widely for forest firebreaks and gardening. It tolerates salt and typically accumulates elevated amounts of manganese in its leaves. With a large ecological amplitude, this tree species grows quickly. Due to its substantial biomass, it has great potential for soil remediation. To comprehensively characterize its mRNA, we employed PacBio sequencing technology for the first time to generate the S. superba transcriptome. In this analysis, 511,759 full-length non-chimeric reads were acquired, and 163,834 high-quality full-length reads were obtained. Overall, 93,362 open reading frames were obtained, of which 78,255 were complete. In gene annotation analyses, 91,082, 71,839, 38,914, and 38,376 transcripts were assigned to the Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Genes (COG), Gene Ontology (GO), and Non-Redundant (Nr) databases, respectively. To identify long non-coding RNAs (lncRNAs), we utilized four computational methods, namely the protein families (Pfam) database, Coding Potential Calculator (CPC), Coding-Potential Assessment Tool (CPAT), and Coding-Non-Coding Index (CNCI), and observed 8,551, 9,174, 20,720, and 18,669 lncRNAs, respectively. Moreover, nine genes were randomly selected for expression analysis, which showed that Gene 6 (Na_Ca_ex gene) had the highest expression and that expression of CAX (CAX-interacting protein 4) was higher in the manganese (Mn)-treated group. This work provided a significant number of full-length transcripts and refined the annotation of the reference genome, which will ease advanced genetic analyses of S. superba.


2019 ◽  
Vol 20 (17) ◽  
pp. 4117 ◽  
Author(s):  
Yu Ge ◽  
Zhihao Cheng ◽  
Xiongyuan Si ◽  
Weihong Ma ◽  
Lin Tan ◽  
...  

Avocado (Persea americana Mill.) is an economically important crop because of its high nutritional value. However, the absence of a sequenced avocado reference genome has hindered investigations of secondary metabolism. For next-generation high-throughput transcriptome sequencing, we obtained 365,615,152 and 348,623,402 clean reads as well as 109.13 and 104.10 Gb of sequencing data for avocado mesocarp and seed, respectively, during five developmental stages. High-quality reads were assembled into 100,837 unigenes with an average length of 847.40 bp (N50 = 1725 bp). Additionally, 16,903 differentially expressed genes (DEGs) were detected, 17 of which were related to carotenoid biosynthesis. The expression levels of most of these 17 DEGs were higher in the mesocarp than in the seed during five developmental stages. In this study, the avocado mesocarp and seed transcriptomes were also sequenced using single-molecule long-read sequencing to acquire 25.79 and 17.67 Gb of clean data, respectively. We identified 233,014 and 238,219 consensus isoforms in avocado mesocarp and seed, respectively. Furthermore, 104 and 59 isoforms were found to correspond to the putative 11 carotenoid biosynthetic-related genes in the avocado mesocarp and seed, respectively. The isoform numbers of 10 out of the putative 11 genes involved in the carotenoid biosynthetic pathway were higher in the mesocarp than in the seed. In addition, alpha- and beta-carotene contents in the avocado mesocarp and seed during five developmental stages were measured, and they were higher in the mesocarp than in the seed, which validated the results of transcriptome profiling. Gene expression changes and the associated variations in gene dosage could influence carotenoid biosynthesis. These results will help to further elucidate carotenoid biosynthesis in avocado.
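
The avocado abstract reports both an average unigene length and an N50, and the two can differ considerably. As a minimal sketch of how an N50 is conventionally computed (the length at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total), the Python snippet below uses a hypothetical function name `n50` and made-up toy lengths. It is not the authors' pipeline and does not use their data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths, chosen only to illustrate the calculation.
toy_lengths = [200, 300, 500, 800, 1200, 2000]
toy_mean = sum(toy_lengths) / len(toy_lengths)
toy_n50 = n50(toy_lengths)
print(f"mean = {toy_mean:.2f} bp, N50 = {toy_n50} bp")
```

Because long sequences contribute more bases, the N50 of a length distribution with many short sequences tends to sit above the mean, which is the pattern seen in the reported unigene statistics.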

