Global Survey of the Full-Length Cabbage Transcriptome (Brassica oleracea var. capitata L.) Reveals Key Alternative Splicing Events Involved in Growth and Disease Response

2021 ◽  
Vol 22 (19) ◽  
pp. 10443
Author(s):  
Yong Wang ◽  
Jialei Ji ◽  
Long Tong ◽  
Zhiyuan Fang ◽  
Limei Yang ◽  
...  

Cabbage (Brassica oleracea L. var. capitata L.) is an important vegetable crop cultivated around the world. Previous studies of cabbage gene transcripts were primarily based on next-generation sequencing (NGS) technology, which cannot provide accurate information on transcript assembly and structure. To overcome these issues and analyze the whole cabbage transcriptome at the isoform level, PacBio RS II Single-Molecule Real-Time (SMRT) sequencing technology was used for a global survey of the full-length transcriptomes of five cabbage tissue types (root, stem, leaf, flower, and silique). A total of 77,048 isoforms, capturing 18,183 annotated genes, were discovered from the sequencing data generated through SMRT. The patterns of both alternative splicing (AS) and alternative polyadenylation (APA) were comprehensively analyzed. In total, we detected 13,468 genes that had isoforms containing APA sites and 8978 genes that underwent AS events. Moreover, 5272 long non-coding RNAs (lncRNAs) were discovered, and most exhibited tissue-specific expression. In total, 3147 transcription factors (TFs) were detected and 10 significant gene co-expression network modules were identified. In addition, we found that Fusarium wilt, black rot and clubroot infection significantly influenced AS in resistant cabbage. In summary, this study provides abundant cabbage isoform transcriptome data, which promote reannotation of the cabbage genome, deepen our understanding of its post-transcriptional regulation mechanisms, and can be used for future functional genomic research.

DNA Research ◽  
2019 ◽  
Vol 26 (4) ◽  
pp. 301-311 ◽  
Author(s):  
Yue Zhang ◽  
Tonny Maraga Nyong'A ◽  
Tao Shi ◽  
Pingfang Yang

Abstract Alternative splicing (AS) plays a critical role in regulating different physiological and developmental processes in eukaryotes by dramatically increasing the diversity of the transcriptome and the proteome. However, the saturation and complexity of AS remain unclear in lotus because full-length multiple-splice isoforms have rarely been obtained. In this study, we apply a hybrid assembly strategy, combining single-molecule real-time sequencing and Illumina RNA-seq, to gain comprehensive insight into the lotus transcriptomic landscape. We identified 211,802 high-quality full-length non-chimeric reads, with 192,690 non-redundant isoforms, and updated the lotus reference gene model. Moreover, our analysis identified a total of 104,288 AS events from 16,543 genes, with alternative 3ʹ splice site being the predominant type, followed by intron retention. By exploring tissue datasets, 370 tissue-specific AS events were identified among 12 tissues. Both the tissue-specific genes and isoforms might play important roles in tissue or organ development, and partly fit the ‘ABCE’ model in floral tissues. The large number of AS events and isoform variants identified in our study enhances the understanding of transcriptional diversity in lotus and provides a valuable resource for further functional genomic studies.


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
D Oehler ◽  
A Goedecke ◽  
A Spychala ◽  
K Lu ◽  
N Gerdes ◽  
...  

Abstract Background Alternative splicing is a process by which exons within a pre-mRNA are joined or skipped, resulting in multiple isoforms being encoded by a single gene. Alternative splicing affecting transcription factors may have substantial impact on cellular dynamics. The PPARG Coactivator 1 Alpha (PGC1-α) is a major modulator in energy metabolism. Data from murine skeletal muscle revealed distinctive isoform patterns giving rise to different phenotypes, i.e. mitogenesis and hypertrophy. Here, we aimed to establish a complete dataset of isoforms in murine and human heart, applying single-molecule real-time (SMRT) sequencing as a novel approach to identify transcripts without need for assembly, resulting in true full-length sequences. Moreover, we aimed to unravel the functional relevance of the various isoforms during experimental ischemia reperfusion (I/R). Methods RNA isolation was performed in murine (C57Bl/6J) or human heart tissue (obtained during LVAD surgery), followed by library preparation and SMRT sequencing. Bioinformatic analysis was done using a modified IsoSeq3 pipeline and OS-tools. PGC1-α isoforms were identified by similarity search against exonic sequences within the full-length, non-concatemer (FLNC) reads. Isoforms with an open reading frame (ORF) were manually curated and validated by PCR and Sanger sequencing. I/R was induced by ligation of the LAD for 45 min in mice on standard chow as well as on a high-fat, high-sucrose diet. Area at risk (AAR) and remote tissue were collected three and 16 days after I/R or sham surgery (n=4 per time point). Promoter patterns were analyzed by qPCR. Results Deciphering the full-length transcriptome of murine and human heart resulted in ∼60000 isoforms with 99% accuracy on mRNA sequence. Focusing on murine PGC1-α isoforms, we discovered and verified 15 novel transcripts generated by hitherto unknown splicing events.
Additionally, we identified a novel exon 1 originating between the known promoters followed by a valid ORF, suggesting the discovery of a novel promoter. Remarkably, we found a homologous novel exon 1 in human heart, suggesting conservation of the postulated promoter. In I/R, the AAR exhibited significantly lower expression of established and novel promoters compared to remote tissue under standard chow 3 d post I/R. At 16 d post I/R, the difference between AAR and remote tissue had equalized under standard chow but persisted under the high-fat diet. Conclusion Applying the SMRT technique, we generated for the first time a complete full-length transcriptome of the murine and human heart, identifying 15 novel potentially coding transcripts of PGC1-α and a novel exon 1. These transcripts are differentially regulated in experimental I/R in AAR and remote myocardium, suggesting that transcriptional regulation and alternative splicing modulate PGC1-α function in the heart. Differences between standard chow and high-fat diet suggest an impact of impaired glucose metabolism on regulatory processes after myocardial infarction. Funding Acknowledgement Type of funding source: Public grant(s) – National budget only. Main funding source(s): Collaborative Research Centre 1116 (German Research Foundation)


2018 ◽  
Vol 35 (15) ◽  
pp. 2654-2656 ◽  
Author(s):  
Guoli Ji ◽  
Wenbin Ye ◽  
Yaru Su ◽  
Moliang Chen ◽  
Guangzao Huang ◽  
...  

Abstract Summary Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. Availability and implementation AStrap is available for download at https://github.com/BMILAB/AStrap. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 20 (24) ◽  
pp. 6350 ◽  
Author(s):  
Nan Deng ◽  
Chen Hou ◽  
Fengfeng Ma ◽  
Caixia Liu ◽  
Yuxin Tian

The limitations of RNA sequencing make it difficult to accurately predict alternative splicing (AS) and alternative polyadenylation (APA) events and long non-coding RNAs (lncRNAs), all of which reveal transcriptomic diversity and the complexity of gene regulation. Gnetum, a genus with ambiguous phylogenetic placement in seed plants, has a distinct stomatal structure and photosynthetic characteristics. In this study, a full-length transcriptome of Gnetum luofuense leaves at different developmental stages was sequenced with the latest PacBio Sequel platform. After correction with short reads generated by Illumina RNA-Seq, 80,496 full-length transcripts were obtained, of which 5269 reads were identified as isoforms of novel genes. Additionally, 1660 lncRNAs and 12,998 AS events were detected. In total, 5647 genes in the G. luofuense leaves had APA, with at least one poly(A) site each. Moreover, 67 and 30 genes from the bHLH gene family, which play an important role in stomatal development and photosynthesis, were identified from the G. luofuense genome and leaf transcripts, respectively. This leaf transcriptome supplements the reference genome of G. luofuense, and the AS events and lncRNAs detected provide valuable resources for future studies investigating the low photosynthetic capacity of Gnetum.


Author(s):  
Chengcai Zhang ◽  
Huadong Ren ◽  
Xiaohua Yao ◽  
Kailiang Wang ◽  
Jun Chang

Abstract Pecan is rich in bioactive components such as fatty acids and flavonoids and is an important nut type worldwide. Therefore, the molecular mechanisms of phytochemical biosynthesis in pecan are a focus of research. Recently, a draft genome and several transcriptomes have been published. However, the full-length mRNA transcripts remain unclear, and the regulatory mechanisms behind the biosynthesis and accumulation of quality components have not been fully investigated. In this study, single-molecule long-read sequencing technology was used to obtain full-length transcripts of pecan kernels. In total, 37 504 isoforms of 16 702 genes were mapped to the reference genome. The numbers of known isoforms, new isoforms, and novel isoforms were 9013 (24.03%), 26 080 (69.54%), and 2411 (6.51%), respectively. Over 80% of the transcripts (30 751, 81.99%) had functional annotations. A total of 15 465 alternative splicing (AS) events and 65 761 alternative polyadenylation events were detected; retained intron was the predominant AS type (5652, 36.55%). Furthermore, 1894 long non-coding RNAs and 1643 transcription factors were predicted using bioinformatics methods. Finally, the structural genes associated with fatty acid (FA) and flavonoid biosynthesis were characterized. A high frequency of AS accuracy (70.31%) was observed in FA synthesis-associated genes. The present study provides a full-length transcriptome dataset of pecan kernels, which will significantly enhance the understanding of the regulatory basis of phytochemical biosynthesis during pecan kernel maturation.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yupeng Cui ◽  
Xinqiang Gao ◽  
Jianshe Wang ◽  
Zengzhen Shang ◽  
Zhibin Zhang ◽  
...  

Artemisia argyi is an important medicinal plant widely utilized for moxibustion heat therapy in China. The terpenoid biosynthesis process in A. argyi is speculated to play a key role in conferring its medicinal value. However, the molecular mechanism underlying terpenoid biosynthesis remains unclear, in part because the reference genome of A. argyi is unavailable. Moreover, the full-length transcriptome of A. argyi has not yet been sequenced. Therefore, in this study, de novo transcriptome sequencing of A. argyi root, stem, and leaf tissues was performed to obtain candidate genes related to terpenoid biosynthesis, combining the PacBio single-molecule real-time (SMRT) and Illumina next-generation sequencing (NGS) platforms. More than 55.4 Gb of sequencing data and 108,846 full-length non-chimeric reads were generated by the Illumina and PacBio platforms, respectively. Then, 53,043 consensus isoforms were clustered and used to represent 36,820 non-redundant transcripts, of which 34,839 (94.62%) were annotated in public databases. In the comparison sets of leaves vs roots and leaves vs stems, 13,850 (7,566 up-regulated, 6,284 down-regulated) and 9,502 (5,284 up-regulated, 4,218 down-regulated) differentially expressed transcripts (DETs) were obtained, respectively. Specifically, the expression profile and KEGG functional enrichment analysis of these DETs indicated that they were significantly enriched in the biosynthesis of amino acids, carotenoids, diterpenoids and flavonoids, as well as the metabolic processes of glycine, serine and threonine. Moreover, multiple genes encoding significant enzymes or transcription factors related to diterpenoid biosynthesis were highly expressed in the A. argyi leaves. Additionally, several transcription factor families, such as RLK-Pelle_LRR-L-1 and RLK-Pelle_DLSV, were also identified.
In conclusion, this study offers a valuable resource for transcriptome information, and provides a functional genomic foundation for further research on molecular mechanisms underlying the medicinal use of A. argyi leaves.


2020 ◽  
Vol 10 (10) ◽  
pp. 3505-3514
Author(s):  
Hongmei Zhuang ◽  
Qiang Wang ◽  
Hongwei Han ◽  
Huifang Liu ◽  
Hao Wang

This study aimed to generate the full-length transcriptome of Xinjiang green and purple turnips (Brassica rapa var. rapa) using single-molecule real-time (SMRT) sequencing. Samples of two varieties of Brassica rapa var. rapa at five developmental stages were collected and combined to perform SMRT sequencing. Meanwhile, next-generation sequencing was performed to correct the SMRT sequencing data. A series of analyses were performed to investigate the transcript structure. Finally, the obtained transcripts were mapped to the genome of Brassica rapa ssp. pekinensis Chiifu to identify potential novel transcripts. For green turnip (F01), a total of 19.54 Gb of clean data were obtained from 8 cells. The numbers of reads of insert (ROI) and full-length non-chimeric (FLNC) reads were 510,137 and 267,666, respectively. In addition, 82,640 consensus isoforms were obtained in the isoform sequence clustering, of which 69,480 were high-quality, and the 13,160 low-quality sequences were corrected using Illumina RNA-seq data. For purple turnip (F02), there were 20.41 Gb of clean data, 552,829 ROIs, and 274,915 FLNC sequences. A total of 93,775 consensus isoforms were obtained, of which 78,798 were high-quality, and the 14,977 low-quality sequences were corrected. Following the removal of redundant sequences, there were 46,516 and 49,429 non-redundant transcripts for F01 and F02, respectively; 7,774 and 9,385 alternative splicing events were predicted for F01 and F02; 63,890 simple sequence repeats, 59,460 complete coding sequences, and 535 long non-coding RNAs were predicted. Moreover, 5,194 and 5,369 novel transcripts were identified by mapping to Brassica rapa ssp. pekinensis Chiifu. The obtained transcriptome data may improve turnip genome annotation and facilitate further study of the Brassica rapa var. rapa genome and transcriptome.


2021 ◽  
Vol 23 (Supplement_6) ◽  
pp. vi4-vi4
Author(s):  
Takahide Nejo ◽  
Darwin Kwok ◽  
Kevin Leung ◽  
Lin Wang ◽  
Albert Wang ◽  
...  

Abstract BACKGROUND To develop effective immunotherapy for gliomas, it is crucial to expand the repertoire of targetable antigens. Recent studies have suggested that alternative splicing (AS), or the tumor-specific junctions derived from it (“neojunctions”), could generate cryptic amino acid sequences that can be a source of neoantigens. In this study, we investigated neojunctions based on multifaceted transcriptomic and proteomic analyses, seeking potential cell-surface antigens that may be targeted by CAR. METHODS For screening, we analyzed bulk RNA-sequencing data of TCGA-GBM/LGG with high tumor purity (n = 429) and GTEx normal tissues (n = 9,166). Cohorts of spatially mapped intratumoral samples and longitudinally collected tumors were used to determine clonality and stability of the candidate neojunctions. Nanopore long-read amplicon sequencing was deployed to confirm the full-length transcript sequence. Their protein-level expression was explored by analyzing the Clinical Proteomic Tumor Analysis Consortium (CPTAC)-GBM proteomics dataset. RESULTS In the screening analysis comparing TCGA and GTEx datasets, we identified 218 neojunctions with adequate expression, prevalence, and tumor-specificity. Of these, 12 were predicted to be cell-surface antigens. Eight of the 12, such as BCAN, DLL3, and PTPRZ1, were also observed in multiple cases of another validation dataset. In the analysis of tumors with spatially mapped intratumoral samples, 7 of the 12 were recurrently detected in no less than 50% of the samples in multiple cases. In addition, 5 of the 12 were found to be conserved in primary and recurrent pairs of tumors in multiple cases. Full-length transcript sequencing corroborated our predictions based on short reads, and also demonstrated more complex AS patterns. Finally, CPTAC-GBM proteomics analysis identified one cryptic peptide that substantiated the corresponding transcriptome-based prediction.
CONCLUSION: We identified neojunctions with the potential to generate cell-surface antigens. These multifaceted transcriptomic and proteomic analyses provide the rationale to pursue the development of immunotherapy targeting neojunction-derived antigens.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yanzhi Feng ◽  
Yang Zhao ◽  
Jiajia Zhang ◽  
Baoping Wang ◽  
Chaowei Yang ◽  
...  

Abstract Paulownia catalpifolia is an important, fast-growing timber species known for its high density, color and texture. However, few transcriptomic and genetic studies have been conducted in P. catalpifolia. In this study, single-molecule real-time sequencing technology was applied to obtain the full-length transcriptome of P. catalpifolia leaves treated with varying degrees of drought stress. The sequencing data were then used to search for microsatellites, or simple sequence repeats (SSRs). A total of 28.83 Gb of data were generated, 25,969 high-quality (HQ) transcripts with an average length of 1624 bp were acquired after removing the redundant reads, and 25,602 HQ transcripts (98.59%) were annotated using public databases. Among the HQ transcripts, 16,722 intact coding sequences, 149 long non-coding RNAs and 179 alternative splicing events were predicted. A total of 7367 SSR loci were distributed throughout 6293 HQ transcripts, of which 763 were complex SSRs and 6604 were complete SSRs. The SSR appearance frequency was 28.37%, and the average distribution distance was 5.59 kb. Among the 6604 complete SSR loci, 1–3 nucleotide repeats were dominant, occupying 97.85% of the total SSR loci, of which mono-, di- and tri-nucleotide repeats were 44.68%, 33.86% and 19.31%, respectively. We detected 112 repeat motifs, of which A/T (42.64%), AG/CT (12.22%), GA/TC (9.63%), GAA/TTC (1.57%) and CCA/TGG (1.54%) were most common in mono-, di- and tri-nucleotide repeats, respectively. The length of the repeat SSR motifs was 10–88 bp, and 4997 (75.67%) were ≤ 20 bp. This study provides a novel full-length transcriptome reference for P. catalpifolia and will facilitate the identification of germplasm resources and breeding of new drought-resistant P. catalpifolia varieties.
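
The SSR search summarized in this abstract can be illustrated with a minimal sketch. The abstract does not name the tool or the repeat thresholds that were used, so the function names `find_ssrs` and `ssr_frequency` and the minimum repeat counts below (10 for mono-, 6 for di-, 5 for tri-nucleotide motifs) are assumptions chosen for illustration only, not the authors' settings. The sketch scans a sequence for perfect tandem repeats and reproduces the reported appearance frequency as SSR loci divided by the number of HQ transcripts.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only;
# the abstract does not report the thresholds actually used).
MIN_REPEATS = {1: 10, 2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect
    mono-, di- and tri-nucleotide tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip longer motifs that are really homopolymer runs (e.g. "AA").
            if unit_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


def ssr_frequency(n_ssr_loci, n_transcripts):
    """SSR appearance frequency as a percentage of transcripts."""
    return 100.0 * n_ssr_loci / n_transcripts


# Toy sequence with one mono-, one di- and one tri-nucleotide repeat.
toy = "GC" + "A" * 12 + "GTC" + "AG" * 7 + "TTC" + "GAA" * 5 + "CT"
ssrs = find_ssrs(toy)
frequency = round(ssr_frequency(7367, 25969), 2)
```

With the counts reported in the abstract, 763 complex plus 6604 complete SSRs gives the 7367 loci, and 7367 loci over 25,969 HQ transcripts gives the stated 28.37% appearance frequency.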


2020 ◽  
Author(s):  
Yanping Long ◽  
Zhijian Liu ◽  
Jinbu Jia ◽  
Weipeng Mo ◽  
Liang Fang ◽  
...  

Abstract The broad application of large-scale single-cell RNA profiling in plants has been restricted by the prerequisite of protoplasting. We recently found that the Arabidopsis nucleus contains abundant polyadenylated mRNAs, many of which are incompletely spliced. To capture the isoform information, we combined 10x Genomics and Nanopore long-read sequencing to develop a protoplasting-free full-length single-nucleus RNA profiling method in plants. Our results demonstrated using Arabidopsis root that nuclear mRNAs faithfully retain cell identity information, and single-molecule full-length RNA sequencing could further improve cell type identification by revealing splicing status and alternative polyadenylation at single-cell level.

