Baltica: integrated splice junction usage analysis

Alternative splicing is a tightly regulated co- and post-transcriptional process contributing to the transcriptome diversity observed in eukaryotes. Several methods for detecting differential junction usage (DJU) from RNA sequencing (RNA-seq) datasets exist. Yet, efforts to integrate the results from DJU methods are lacking. Here, we present Baltica, a framework that provides workflows for quality control, de novo transcriptome assembly with StringTie2, and currently 4 DJU methods: rMATS, JunctionSeq, Majiq, and LeafCutter. Baltica puts the results from different DJU methods into context by integrating the results at the junction level. We present Baltica using 2 datasets, one containing known artificial transcripts (SIRVs) and the second dataset of paired Illumina and Oxford Nanopore Technologies RNA-seq. The data integration allows the user to compare the performance of the tools and reveals that JunctionSeq outperforms the other methods, in terms of F1 score, for both datasets. Finally, we demonstrate for the first time that meta-classifiers trained on scores of multiple methods outperform classifiers trained on scores of a single method, emphasizing the application of our data integration approach for differential splicing identification. Baltica is available at https://github.com/dieterich-lab/Baltica under MIT license.
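
The F1-based comparison of DJU methods at the junction level can be illustrated with a minimal sketch. It assumes each method's significant calls and the ground truth are available as sets of junction keys. The `(chromosome, start, end, strand)` key, the function names, and the toy data are hypothetical illustrations, not Baltica's actual data model or API.

```python
from typing import Dict, Set, Tuple

# Hypothetical junction key: (chromosome, start, end, strand).
# This is an assumption for illustration, not Baltica's internal representation.
Junction = Tuple[str, int, int, str]


def f1_score(called: Set[Junction], truth: Set[Junction]) -> float:
    """Return the harmonic mean of precision and recall over junction sets."""
    true_positives = len(called & truth)
    if true_positives == 0:
        return 0.0  # also covers empty call sets and empty truth sets
    precision = true_positives / len(called)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def compare_methods(
    calls: Dict[str, Set[Junction]], truth: Set[Junction]
) -> Dict[str, float]:
    """Score every method against the same truth set."""
    return {method: f1_score(junctions, truth) for method, junctions in calls.items()}


# Toy data (invented): three true junctions and two hypothetical methods.
truth = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr2", 50, 150, "-")}
calls = {
    "method_a": {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")},
    "method_b": {("chr1", 100, 200, "+"), ("chr3", 10, 20, "+")},
}
scores = compare_methods(calls, truth)
```

Keying every method's output on the same junction coordinates is what makes the scores comparable; a real analysis would first have to reconcile each tool's coordinate conventions.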

Extending rnaSPAdes functionality for hybrid transcriptome assembly

10.1101/2020.01.24.918482 ◽

2020 ◽

Cited By ~ 1

Author(s):

Andrey D. Prjibelski ◽

Giuseppe D. Puglia ◽

Dmitry Antipov ◽

Elena Bushmanova ◽

Daniela Giordano ◽

...

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Rna Seq ◽

Structure Information ◽

Oxford Nanopore ◽

Complete Sequences ◽

Novel Method ◽

Hybrid Assemblies ◽

Oxford Nanopore Technologies ◽

Alternative Isoforms

Abstract. Background: De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when the reference genome is not available or poorly annotated. However, due to the short length of Illumina reads it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. The recently emerged possibility of generating long RNA reads, such as those from PacBio and Oxford Nanopore, may dramatically improve the assembly quality, and thus the subsequent analysis. While reference-based tools for analysing long RNA reads were recently developed, there is no established pipeline for de novo assembly of such data. Results: In this work we present a novel method that enables high-quality de novo transcriptome assemblies by combining the accuracy and reliability of short reads with exon structure information derived from long error-prone reads. The algorithm is designed by incorporating the existing hybridSPAdes approach into the rnaSPAdes pipeline and adapting it for transcriptomic data. Conclusion: To evaluate the benefit of using long RNA reads we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms compared to the case when only short-read data is used. Availability and implementation: rnaSPAdes is implemented in C++ and Python and is freely available for Linux and MacOS under GPLv2 license at cab.spbu.ru/software/rnaspades/ and github.com/ablab/spades.

Transcriptome landscape of the developing olive fruit fly embryo delineated by Oxford Nanopore long-read RNA-Seq

10.1101/478172 ◽

2018 ◽

Cited By ~ 5

Author(s):

Anthony Bayega ◽

Spyros Oikonomopoulos ◽

Eleftherios Zorbas ◽

Yu Chang Wang ◽

Maria-Eleni Gregoriou ◽

...

Keyword(s):

Embryo Development ◽

De Novo ◽

Transcriptome Assembly ◽

Fruit Fly ◽

Full Length ◽

Olive Fruit Fly ◽

Rna Seq ◽

Olive Fruit ◽

Olive Fly ◽

Oxford Nanopore

Abstract. The olive fruit fly or olive fly (Bactrocera oleae) is the most important pest of cultivated olive trees. Like all insects the olive fly undergoes complete metamorphosis. However, the transcription dynamics that occur during early embryonic development have not been explored, while detailed transcriptomic analysis in the absence of a fully annotated genome is challenging. We collected olive fly embryos at hourly intervals for the first 6 hours of development and performed full-length cDNA-Seq using a purpose-designed SMARTer cDNA synthesis protocol followed by sequencing on the MinION (Oxford Nanopore Technologies). We generated 31 million total reads across the timepoints (median yield 4.2 million per timepoint). The reads showed a 98% alignment rate to the olive fly genome and a 91% alignment rate to the NCBI predicted B. oleae gene models. Over 50% of the expressed genes had at least one read covering their entire length, validating our full-length RNA-Seq procedure. Expression of 68% of the predicted B. oleae genes was detected in the first six hours of development. We generated a de novo transcriptome assembly of the olive fly and identified 3553 novel genes and a total of 79,810 transcripts; a fourfold increase in transcriptome diversity compared to the NCBI predicted transcriptome. On a global scale, the first six hours of embryo development were characterized by dramatic transcriptome changes with the total number of transcripts per embryo dropping to half from the first hour to the second hour of embryo development. Clustering of genes based on temporal co-expression followed by gene-set enrichment analysis of genes expressed in the first six hours of embryo development showed that genes involved in transcription and translation, macro-molecule biosynthesis, and neurodevelopment were highly enriched. These data provide the first insight into the transcriptome landscape of the developing olive fly embryo. The data also reveal transcript signatures of sex development. Overall, full-length sequencing of the cDNA molecules permitted a detailed characterization of the isoform complexity and the transcriptional dynamics of the first embryonic stages of B. oleae.

Improved Annotation with de novo Transcriptome Assembly in Four Social Amoeba Species

10.1101/054536 ◽

2016 ◽

Author(s):

Reema Singh ◽

Hajara M. Lawal ◽

Christina Schilde ◽

Gernot Glöeckner ◽

Geoff J. Barton ◽

...

Keyword(s):

Genome Sequencing ◽

Empirical Data ◽

Genome Annotation ◽

De Novo ◽

Transcriptome Assembly ◽

Rna Seq ◽

De Novo Transcriptome ◽

Novel Genes ◽

Gene Models ◽

First Time

Abstract. Background: Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA-seq data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. Results: An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Core Eukaryotic Genes Mapping Approach (CEGMA) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models including changes to hundreds or thousands of protein products. Putative novel genes were also identified and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. Conclusions: In taking a whole transcriptome approach to genome annotation with empirical data we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy of follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects.

RNA-Seq Based De Novo Transcriptome Assembly and Gene Discovery of Cistanche deserticola Fleshy Stem

PLoS ONE ◽

10.1371/journal.pone.0125722 ◽

2015 ◽

Vol 10 (5) ◽

pp. e0125722 ◽

Cited By ~ 8

Author(s):

Yuli Li ◽

Xiliang Wang ◽

Tingting Chen ◽

Fuwen Yao ◽

Cuiping Li ◽

...

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Gene Discovery ◽

De Novo Transcriptome Assembly ◽

Rna Seq ◽

De Novo Transcriptome ◽

Cistanche Deserticola

A consensus approach to vertebrate de novo transcriptome assembly from RNA-seq data: assembly of the duck (Anas platyrhynchos) transcriptome

Frontiers in Genetics ◽

10.3389/fgene.2014.00190 ◽

2014 ◽

Vol 5 ◽

Cited By ~ 24

Author(s):

Joanna Moreton ◽

Stephen P. Dunham ◽

Richard D. Emes

Keyword(s):

De Novo ◽

Anas Platyrhynchos ◽

Transcriptome Assembly ◽

De Novo Transcriptome Assembly ◽

Rna Seq ◽

Consensus Approach ◽

De Novo Transcriptome

A practical guide to build de-novo assemblies for single tissues of non-model organisms: the example of a Neotropical frog

PeerJ ◽

10.7717/peerj.3702 ◽

2017 ◽

Vol 5 ◽

pp. e3702 ◽

Cited By ~ 5

Author(s):

Santiago Montero-Mendieta ◽

Manfred Grabherr ◽

Henrik Lantz ◽

Ignacio De la Riva ◽

Jennifer A. Leonard ◽

...

Keyword(s):

Defense Mechanisms ◽

De Novo ◽

Transcriptome Assembly ◽

Cost Effective ◽

Model Organisms ◽

Rna Seq ◽

Assembly Pipeline ◽

Wide Variability ◽

History Of ◽

Inexperienced User

Whole genome sequencing (WGS) is a very valuable resource to understand the evolutionary history of poorly known species. However, in organisms with large genomes, as most amphibians, WGS is still excessively challenging and transcriptome sequencing (RNA-seq) represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome and the transcriptome must be assembled de-novo. We used RNA-seq to obtain the transcriptomic profile for Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggested that genes related to immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a pipeline to assist with pre-processing, assembling, evaluating and functionally annotating a de-novo transcriptome from RNA-seq data of non-model organisms. Our pipeline guides the inexperienced user in an intuitive way through all the necessary steps to build de-novo transcriptome assemblies using readily available software and is freely available at: https://github.com/biomendi/TRANSCRIPTOME-ASSEMBLY-PIPELINE/wiki.

Transcriptomic and Lipidomic Analysis of Lipids in Forsythia suspensa

Frontiers in Genetics ◽

10.3389/fgene.2021.758326 ◽

2021 ◽

Vol 12 ◽

Author(s):

Bei Wu ◽

Yinping Li ◽

Wenjia Zhao ◽

Zhiqiang Meng ◽

Wen Ji ◽

...

Keyword(s):

De Novo ◽

Growth Stages ◽

Rna Seq ◽

Dynamic Regulation ◽

Kegg Pathways ◽

Reference Information ◽

Forsythia Suspensa ◽

First Time ◽

Differential Genes ◽

Lipid Components

Forsythiae Fructus (Lianqiao in Chinese) is widely used in traditional Chinese medicine. The lipid components in Forsythiae Fructus are the basis of plant growth and active metabolism. Samples were collected at two growth stages for a comprehensive study. Transcriptomic and lipidomic analyses were performed using the RNA-seq and UPLC-Q-TOF-MS techniques, respectively. For the first time, 5802 lipid components were reported in Lianqiao, comprising 31.7% glycerolipids, 16.57% phospholipids, 13.18% sphingolipids, and 10.54% fatty acids. Lipid components such as terpenes and flavonoids have pharmacological activity, but their content was low. Among the lipids isolated from Forsythiae Fructus, 139 showed significant differences between the May and July harvest periods. The lipids of natural products are mainly concentrated in pregnenolones and polyvinyl lipids. RNA-Seq analysis revealed 92,294 unigenes, and 1533 of these were differentially expressed. There were 551 differential genes enriched in 119 KEGG pathways. The de novo synthesis pathways of terpenoids and flavonoids were explored. Combining the results of lipidomics and transcriptomics, it is hypothesized that the synthesis of abscisic acid, a terpenoid, may be under the dynamic regulation of genes EC: 1.1.1.288, EC: 1.14.14.137 and EC: 1.13.11.51 in a balanced state. In the synthesis of gibberellin, GA20-oxidase (GA20ox, EC: 1.14.11.12) and GA3-oxidase (GA3ox, EC: 1.14.11.15) catalyze the production of active GAs, and EC: 1.14.11.13 is the metabolic enzyme of active GAs. In the synthesis of flavonoids, MF (multifunctional), PAL (phenylalanine ammonia-lyase), CHS (chalcone synthase), ANS (anthocyanidin synthase), and FLS (flavonol synthase) are all key enzymes. The results of the present study provide valuable reference information for further research on the metabolic pathways of the secondary metabolites of Forsythia suspensa.

Circular RNA profiling reveals abundant and diverse circRNAs of SARS-CoV-2, SARS-CoV and MERS-CoV origin

10.1101/2020.12.07.415422 ◽

2020 ◽

Author(s):

Shaomin Yang ◽

Hong Zhou ◽

Ruth Cruz-Cosme ◽

Mingde Liu ◽

Jiayu Xu ◽

...

Keyword(s):

De Novo ◽

Splice Junction ◽

Circular Rna ◽

Circular Rnas ◽

Sequencing Data ◽

Rna Profiling ◽

Systematic Strategy ◽

Coronavirus Infection ◽

Abundance And Diversity ◽

First Time

Abstract. Circular RNAs (circRNAs) encoded by DNA genomes have been identified across host and pathogen species as parts of the transcriptome. Accumulating evidence indicates that circRNAs play critical roles in autoimmune diseases and viral pathogenesis. Here we report that RNA viruses of the Betacoronavirus genus of Coronaviridae, SARS-CoV-2, SARS-CoV and MERS-CoV, encode a novel type of circRNAs. Through de novo circRNA analyses of publicly available coronavirus-infection related deep RNA-Sequencing data, we identified 351, 224 and 2,764 circRNAs derived from SARS-CoV-2, SARS-CoV and MERS-CoV, respectively, and characterized two major back-splice events shared by these viruses. Coronavirus-derived circRNAs are more abundant and longer compared to host genome-derived circRNAs. Using a systematic strategy to amplify and identify back-splice junction sequences, we experimentally identified over 100 viral circRNAs from SARS-CoV-2 infected Vero E6 cells. This collection of circRNAs provided the first line of evidence for the abundance and diversity of coronavirus-derived circRNAs and suggested possible mechanisms driving circRNA biogenesis from RNA genomes. Our findings highlight circRNAs as an important component of the coronavirus transcriptome. Summary: We report for the first time that abundant and diverse circRNAs are generated by SARS-CoV-2, SARS-CoV and MERS-CoV and represent a novel type of circRNAs that differ from circRNAs encoded by DNA genomes.

De novo transcriptome assembly of RNA-Seq reads with different strategies

Science China Life Sciences ◽

10.1007/s11427-011-4256-9 ◽

2011 ◽

Vol 54 (12) ◽

pp. 1129-1133 ◽

Cited By ~ 11

Author(s):

Geng Chen ◽

KangPing Yin ◽

Charles Wang ◽

TieLiu Shi

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

De Novo Transcriptome Assembly ◽

Rna Seq ◽

De Novo Transcriptome

rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data

10.1101/420208 ◽

2018 ◽

Cited By ~ 13

Author(s):

Elena Bushmanova ◽

Dmitry Antipov ◽

Alla Lapidus ◽

Andrey D. Prjibelski

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Model Organisms ◽

Challenging Problem ◽

Rna Seq ◽

De Novo Transcriptome ◽

Weak Points ◽

Transcriptome Reconstruction ◽

Evaluation Approaches ◽

Genome Assembler

Abstract. Summary: The possibility of generating large RNA-seq datasets has led to the development of various reference-based and de novo transcriptome assemblers with their own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to the model organisms with finished and annotated genomes. De novo transcriptome reconstruction from short reads remains an open challenging problem, which is complicated by the varying expression levels across different genes, alternative splicing and paralogous genes. In this paper we describe a novel transcriptome assembler called rnaSPAdes, which is developed on top of the SPAdes genome assembler and explores surprising computational parallels between assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-Seq datasets, and briefly highlight strong and weak points of different assemblers. Availability and implementation: rnaSPAdes is implemented in C++ and Python and is freely available at cab.spbu.ru/software/rnaspades/.
