Single-molecule Real-time (SMRT) Isoform Sequencing (Iso-Seq) in Plants: The Status of the Bioinformatics Tools to Unravel the Transcriptome Complexity

2019 ◽  
Vol 14 (7) ◽  
pp. 566-573 ◽  
Author(s):  
Yubang Gao ◽  
Feihu Xi ◽  
Hangxiao Zhang ◽  
Xuqing Liu ◽  
Huiyuan Wang ◽  
...  

Background: The advent of Single-Molecule Real-time (SMRT) Isoform Sequencing (Iso-Seq) has paved the way to obtaining longer full-length transcripts. This method has proved far superior to Next Generation Sequencing (NGS)-based short-read sequencing (RNA-Seq) for identifying full-length splice variants and other post-transcriptional events. Several bioinformatics tools for analyzing Iso-Seq data have been developed, and some are still being refined to address different aspects of transcriptome complexity. However, a comprehensive summary of the available tools and their utility is still lacking. Objective: Here, we summarized the existing Iso-Seq analysis tools and presented an integrated bioinformatics pipeline for Iso-Seq analysis, which overcomes the limitations of NGS and generates long contiguous Full-Length Non-Chimeric (FLNC) reads for the analysis of post-transcriptional events. Results: In this review, we summarized recent applications of Iso-Seq in plants, which include improved genome annotations, identification of novel genes and lncRNAs, identification of full-length splice isoforms, and detection of novel Alternative Splicing (AS) and Alternative Polyadenylation (APA) events. In addition, we discussed the bioinformatics pipeline for comprehensive Iso-Seq data analysis, including how to reduce the error rate in the reads and how to identify and quantify post-transcriptional events. Furthermore, the visualization of Iso-Seq data was discussed as well. Finally, we discussed methods to combine Iso-Seq data with RNA-Seq for transcriptome quantification. Conclusion: Overall, this review demonstrates that Iso-Seq is pivotal for analyzing transcriptome complexity and that this new method offers unprecedented opportunities to comprehensively understand transcript diversity.

DNA Research ◽  
2019 ◽  
Vol 26 (4) ◽  
pp. 301-311 ◽  
Author(s):  
Yue Zhang ◽  
Tonny Maraga Nyong'A ◽  
Tao Shi ◽  
Pingfang Yang

Alternative splicing (AS) plays a critical role in regulating different physiological and developmental processes in eukaryotes by dramatically increasing the diversity of the transcriptome and the proteome. However, the saturation and complexity of AS remain unclear in lotus because full-length isoforms with multiple splice events have rarely been obtained. In this study, we apply a hybrid assembly strategy, combining single-molecule real-time sequencing and Illumina RNA-seq, to gain comprehensive insight into the lotus transcriptomic landscape. We identified 211,802 high-quality full-length non-chimeric reads, with 192,690 non-redundant isoforms, and updated the lotus reference gene model. Moreover, our analysis identified a total of 104,288 AS events from 16,543 genes, with alternative 3ʹ splice-site being the predominant model, followed by intron retention. By exploring tissue datasets, 370 tissue-specific AS events were identified among 12 tissues. Both the tissue-specific genes and isoforms might play important roles in tissue or organ development, and partly fit the ‘ABCE’ model in floral tissues. The large number of AS events and isoform variants identified in our study enhances the understanding of transcriptional diversity in lotus and provides a valuable resource for further functional genomic studies.


2019 ◽  
Vol 20 (24) ◽  
pp. 6350 ◽  
Author(s):  
Nan Deng ◽  
Chen Hou ◽  
Fengfeng Ma ◽  
Caixia Liu ◽  
Yuxin Tian

The limitations of RNA sequencing make it difficult to accurately predict alternative splicing (AS) and alternative polyadenylation (APA) events and long non-coding RNAs (lncRNAs), all of which reveal transcriptomic diversity and the complexity of gene regulation. Gnetum, a genus with ambiguous phylogenetic placement in seed plants, has a distinct stomatal structure and photosynthetic characteristics. In this study, a full-length transcriptome of Gnetum luofuense leaves at different developmental stages was sequenced with the latest PacBio Sequel platform. After correction with short reads generated by Illumina RNA-Seq, 80,496 full-length transcripts were obtained, of which 5269 reads were identified as isoforms of novel genes. Additionally, 1660 lncRNAs and 12,998 AS events were detected. In total, 5647 genes in the G. luofuense leaves had APA featured by at least one poly(A) site. Moreover, 67 and 30 genes from the bHLH gene family, which play an important role in stomatal development and photosynthesis, were identified from the G. luofuense genome and leaf transcripts, respectively. This leaf transcriptome supplements the reference genome of G. luofuense, and the AS events and lncRNAs detected provide valuable resources for future studies investigating the low photosynthetic capacity of Gnetum.


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Chong Tan ◽  
Hongxin Liu ◽  
Jie Ren ◽  
Xueling Ye ◽  
Hui Feng ◽  
...  

Background: Anther development has been extensively studied at the transcriptional level, but a systematic analysis of full-length transcripts on a genome-wide scale has not yet been published. Here, the Pacific Biosciences (PacBio) Sequel platform and next-generation sequencing (NGS) technology were combined to generate full-length sequences and complete structures of transcripts in anthers of Chinese cabbage. Results: Using single-molecule real-time sequencing (SMRT), a total of 1,098,119 circular consensus sequences (CCSs) were generated with a mean length of 2664 bp. More than 75% of the CCSs were considered full-length non-chimeric (FLNC) reads. After error correction, 725,731 high-quality FLNC reads were estimated to carry 51,501 isoforms from 19,503 loci, consisting of 38,992 novel isoforms from known genes and 3691 novel isoforms from novel genes. Of the novel isoforms, we identified 407 long non-coding RNAs (lncRNAs) and 37,549 open reading frames (ORFs). Furthermore, a total of 453,270 alternative splicing (AS) events were identified, and the majority of AS models in anthers were determined to be approximate exon skipping (XSKIP) events. Of the key genes regulated during anther development, AS events were mainly identified in the genes SERK1, CALS5, NEF1, and CESA1/3. Additionally, we identified 104 fusion transcripts and 5806 genes that had alternative polyadenylation (APA). Conclusions: Our work demonstrated the transcriptome diversity and complexity of anther development in Chinese cabbage. The findings provide a basis for further genome annotation and transcriptome research in Chinese cabbage.


2021 ◽  
Author(s):  
Yufei Wang ◽  
Siyu Xie ◽  
Jialiang Li ◽  
Jieshi Tang ◽  
Tsam Ju ◽  
...  

Objectives: Cupressaceae is the second largest family of coniferous trees (Coniferopsida), with important economic and ecological value. However, like other conifers, the members of Cupressaceae have extremely large genomes (>8 gigabases), which has limited research on these taxa. A high-quality transcriptome is an important resource for gene discovery and annotation in non-model organisms. Data description: Juniperus squamata, a tetraploid species that is widely distributed in Asian mountains, represents the largest genus, Juniperus, in Cupressaceae. Single-molecule real-time sequencing was used to obtain the full-length transcriptome of Juniperus squamata. The full-length transcriptome was corrected with Illumina RNA-seq data from the same individual. A total of 47,860 non-redundant full-length transcripts, with an N50 of 2,839, were obtained. Simple sequence repeats for Juniperus squamata were also identified. These data present the first comprehensive transcriptome characterization of a Cupressaceae species and provide an important reference for future research on the genomic evolutionary history of Cupressaceae plants and of conifers more broadly.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 320
Author(s):  
Lorissa I. McDougall ◽  
Ryan M. Powell ◽  
Magdalena Ratajska ◽  
Chi F. Lynch-Sutherland ◽  
Sultana Mehbuba Hossain ◽  
...  

Melanoma comprises <5% of cutaneous malignancies, yet it causes a significant proportion of skin cancer-related deaths worldwide. While new therapies for melanoma have been developed, not all patients respond well. Thus, further research is required to better predict patient outcomes. Using long-range nanopore sequencing, RT-qPCR, and RNA sequencing analyses, we examined the transcription of BARD1 splice isoforms in melanoma cell lines and patient tissue samples. Seventy-six BARD1 mRNA variants were identified in total, with several previously characterised isoforms (γ, φ, δ, ε, and η) contributing to a large proportion of the expressed transcripts. In addition, we identified four novel splice events, namely, Δ(E3_E9), ▼(i8), IVS10+131▼46, and IVS10▼176, occurring in various combinations in multiple transcripts. We found that short-read RNA-Seq analyses were limited in their ability to predict isoforms containing multiple non-contiguous splicing events, as compared to long-range nanopore sequencing. These studies suggest that further investigations into the functional significance of the identified BARD1 splice variants in melanoma are warranted.


2019 ◽  
Author(s):  
Anne Deslattes Mays ◽  
Marcel O. Schmidt ◽  
Garrett T. Graham ◽  
Elizabeth Tseng ◽  
Primo Baybayan ◽  
...  

Hematopoietic cells are continuously replenished from progenitor cells that reside in the bone marrow. To evaluate molecular changes during this process, we analyzed the transcriptomes of freshly harvested human bone marrow progenitor (lineage-negative) and differentiated (lineage-positive) cells by single-molecule, real-time (SMRT) full-length RNA sequencing. This analysis revealed a ∼5-fold higher number of transcript isoforms than previously detected and showed a distinct composition of individual transcript isoforms characteristic of bone marrow subpopulations. A detailed analysis of mRNA isoforms transcribed from the ANXA1 and EEF1A1 loci confirmed their distinct composition. Mass spectrometry validated the expression of proteins predicted from the transcriptome analysis, including previously unknown protein isoforms predicted e.g. for EEF1A1. These protein isoforms distinguished the lineage-negative cell population from the lineage-positive cell population. Finally, transcript isoforms expressed from paralogous gene loci (e.g. CFD, GATA2, HLA-A, B & C) also distinguished cell subpopulations but were only detectable by full-length RNA sequencing. Thus, qualitatively distinct transcript isoforms from individual genomic loci separate bone marrow cell subpopulations, indicating complex transcriptional regulation and protein isoform generation during hematopoiesis.


2018 ◽  
Vol 19 (2) ◽  
pp. 136-146 ◽  
Author(s):  
Takahiro Mimori ◽  
Jun Yasuda ◽  
Yoko Kuroki ◽  
Tomoko F. Shibata ◽  
Fumiki Katsuoka ◽  
...  

PLoS ONE ◽  
2020 ◽  
Vol 15 (9) ◽  
pp. e0238942
Author(s):  
Cuiping Pan ◽  
Yongqing Wang ◽  
Lian Tao ◽  
Hui Zhang ◽  
Qunxian Deng ◽  
...  
