Evaluating a lightweight transcriptome assembly pipeline on two closely related ascidian species

10.7287/peerj.preprints.505 ◽

2014 ◽

Cited By ~ 2

Author(s):

Elijah K Lowe ◽

Billie J Swalla ◽

C. Titus Brown

Keyword(s):

Developmental Stages ◽

De Novo ◽

Transcriptome Assembly ◽

Marine Species ◽

Model Organisms ◽

The Past ◽

Ascidian Species ◽

Assembly Pipeline ◽

Quality Filtering

De novo transcriptome sequencing and assembly for non-model organisms has become prevalent in the past decade. However, most assembly approaches are computationally expensive, and little in-depth evaluation has been done to compare de novo approaches. We sequenced several developmental stages of two free-spawning marine species—Molgula occulta and Molgula oculata—assembled their transcriptomes using four different combinations of preprocessing and assembly approaches, and evaluated the quality of the assembly. We present a straightforward and reproducible mRNAseq assembly protocol that combines quality filtering, digital normalization, and assembly, together with several metrics to evaluate our de novo assemblies. The use of digital normalization in the protocol reduces the time and memory needed to complete the assembly and makes this pipeline available to labs without large computing infrastructure. Despite varying widely in basic assembly statistics, all of the assembled transcriptomes evaluate well in metrics such as gene recovery and estimated completeness.
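As a rough illustration of the digital normalization step mentioned in this abstract, the sketch below keeps a read only while the median abundance of its k-mers, counted over the reads kept so far, is below a cutoff. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the exact `Counter`-based counting (production tools typically use a probabilistic counting structure), the default `k` and `cutoff` values, and the omission of reverse-complement handling are all illustrative choices.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer of seq, in order (hypothetical helper)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k=20, cutoff=20):
    """Keep a read only if the median count of its k-mers is below cutoff.

    Counts come only from reads that were already kept, so highly covered
    regions stop accumulating reads once they reach the target coverage.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: no k-mers to judge it by
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept
```

For example, `digital_normalization(["ACGTACGTAC"] * 10 + ["TTTTGGGGCC"], k=4, cutoff=3)` discards most copies of the repeated read while retaining the read that was seen only once, which is the source of the time and memory savings the abstract describes.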


A practical guide to build de-novo assemblies for single tissues of non-model organisms: the example of a Neotropical frog

PeerJ ◽

10.7717/peerj.3702 ◽

2017 ◽

Vol 5 ◽

pp. e3702 ◽

Cited By ~ 5

Author(s):

Santiago Montero-Mendieta ◽

Manfred Grabherr ◽

Henrik Lantz ◽

Ignacio De la Riva ◽

Jennifer A. Leonard ◽

...

Keyword(s):

Defense Mechanisms ◽

De Novo ◽

Transcriptome Assembly ◽

Cost Effective ◽

Model Organisms ◽

Rna Seq ◽

Assembly Pipeline ◽

Wide Variability ◽

History Of ◽

Inexperienced User

Whole genome sequencing (WGS) is a very valuable resource for understanding the evolutionary history of poorly known species. However, in organisms with large genomes, such as most amphibians, WGS is still excessively challenging, and transcriptome sequencing (RNA-seq) represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome, so the transcriptome must be assembled de-novo. We used RNA-seq to obtain the transcriptomic profile for Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggested that genes related to the immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a pipeline to assist with pre-processing, assembling, evaluating and functionally annotating a de-novo transcriptome from RNA-seq data of non-model organisms. Our pipeline guides the inexperienced user in an intuitive way through all the necessary steps to build de-novo transcriptome assemblies using readily available software and is freely available at: https://github.com/biomendi/TRANSCRIPTOME-ASSEMBLY-PIPELINE/wiki.


Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies

10.7287/peerj.preprints.2284 ◽

2016 ◽

Author(s):

Cédric Cabau ◽

Frédéric Escudié ◽

Anis Djari ◽

Yann Guiguen ◽

Julien Bobe ◽

...

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Error Rates ◽

Rna Seq ◽

De Novo Transcriptome ◽

Software Packages ◽

Redundancy Reduction ◽

Assembly Pipeline ◽

Free Open Source

Background: De novo transcriptome assembly of short reads is now a common step in expression analysis of organisms lacking a reference genome sequence. Several software packages are available to perform this task. Even if their results are of good quality, it is still possible to improve them in several ways, including redundancy reduction or error correction. Trinity and Oases are two commonly used de novo transcriptome assemblers. The contig sets they produce are of good quality. Still, their compaction (number of contigs needed to represent the transcriptome) and their quality (chimera and nucleotide error rates) can be improved. Results: We built a de novo RNA-Seq Assembly Pipeline (DRAP) which wraps these two assemblers (Trinity and Oases) in order to improve their results regarding the above-mentioned criteria. DRAP reduces the number of resulting contigs of the assemblies by 1.3- to 15-fold, depending on the read set and the assembler used. This article presents seven assembly comparisons showing, in some cases, drastic improvements when using DRAP. DRAP does not significantly impair assembly quality metrics such as read realignment rate or protein reconstruction counts. Conclusion: Transcriptome assembly is a challenging computational task. Even though good solutions are already available to end users, these solutions can still be improved while conserving the overall representation and quality of the assembly. The de novo RNA-Seq Assembly Pipeline (DRAP) is an easy-to-use software package for producing compact and corrected transcript sets. DRAP is free, open-source and available at http://www.sigenae.org/drap.


Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies

10.7287/peerj.preprints.2284v1 ◽

2016 ◽

Cited By ~ 1

Author(s):

Cédric Cabau ◽

Frédéric Escudié ◽

Anis Djari ◽

Yann Guiguen ◽

Julien Bobe ◽

...

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Error Rates ◽

Rna Seq ◽

De Novo Transcriptome ◽

Software Packages ◽

Redundancy Reduction ◽

Assembly Pipeline ◽

Free Open Source

Background: De novo transcriptome assembly of short reads is now a common step in expression analysis of organisms lacking a reference genome sequence. Several software packages are available to perform this task. Even if their results are of good quality, it is still possible to improve them in several ways, including redundancy reduction or error correction. Trinity and Oases are two commonly used de novo transcriptome assemblers. The contig sets they produce are of good quality. Still, their compaction (number of contigs needed to represent the transcriptome) and their quality (chimera and nucleotide error rates) can be improved. Results: We built a de novo RNA-Seq Assembly Pipeline (DRAP) which wraps these two assemblers (Trinity and Oases) in order to improve their results regarding the above-mentioned criteria. DRAP reduces the number of resulting contigs of the assemblies by 1.3- to 15-fold, depending on the read set and the assembler used. This article presents seven assembly comparisons showing, in some cases, drastic improvements when using DRAP. DRAP does not significantly impair assembly quality metrics such as read realignment rate or protein reconstruction counts. Conclusion: Transcriptome assembly is a challenging computational task. Even though good solutions are already available to end users, these solutions can still be improved while conserving the overall representation and quality of the assembly. The de novo RNA-Seq Assembly Pipeline (DRAP) is an easy-to-use software package for producing compact and corrected transcript sets. DRAP is free, open-source and available at http://www.sigenae.org/drap.


Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies

PeerJ ◽

10.7717/peerj.2988 ◽

2017 ◽

Vol 5 ◽

pp. e2988 ◽

Cited By ~ 45

Author(s):

Cédric Cabau ◽

Frédéric Escudié ◽

Anis Djari ◽

Yann Guiguen ◽

Julien Bobe ◽

...

Keyword(s):

Reference Genome ◽

De Novo ◽

Transcriptome Assembly ◽

Error Rates ◽

Rna Seq ◽

Software Packages ◽

Redundancy Reduction ◽

Assembly Pipeline ◽

Free Open Source

Background: De novo transcriptome assembly of short reads is now a common step in expression analysis of organisms lacking a reference genome sequence. Several software packages are available to perform this task. Even if their results are of good quality, it is still possible to improve them in several ways, including redundancy reduction or error correction. Trinity and Oases are two commonly used de novo transcriptome assemblers. The contig sets they produce are of good quality. Still, their compaction (number of contigs needed to represent the transcriptome) and their quality (chimera and nucleotide error rates) can be improved. Results: We built a de novo RNA-Seq Assembly Pipeline (DRAP) which wraps these two assemblers (Trinity and Oases) in order to improve their results regarding the above-mentioned criteria. DRAP reduces the number of resulting contigs of the assemblies by 1.3- to 15-fold, depending on the read set and the assembler used. This article presents seven assembly comparisons showing, in some cases, drastic improvements when using DRAP. DRAP does not significantly impair assembly quality metrics such as read realignment rate or protein reconstruction counts. Conclusion: Transcriptome assembly is a challenging computational task. Even though good solutions are already available to end users, these solutions can still be improved while conserving the overall representation and quality of the assembly. The de novo RNA-Seq Assembly Pipeline (DRAP) is an easy-to-use software package for producing compact and corrected transcript sets. DRAP is free, open-source and available under the GPL v3 license at http://www.sigenae.org/drap.
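To make the compaction metric in this abstract concrete, the sketch below drops contigs that are exact substrings of a longer contig and reports the fold reduction in contig count. This is only a stand-in for redundancy reduction in general: the abstract does not describe how DRAP compacts assemblies, so the function names and the exact-containment rule are assumptions, and reverse complements and near-identical sequences are deliberately ignored.

```python
def remove_contained(contigs):
    """Drop duplicates and contigs fully contained in a longer kept contig.

    Illustrative redundancy filter only; not DRAP's actual procedure.
    """
    kept = []
    # Visit longest contigs first so any container is already in `kept`.
    for contig in sorted(set(contigs), key=len, reverse=True):
        if not any(contig in longer for longer in kept):
            kept.append(contig)
    return kept


def fold_reduction(count_before, count_after):
    """Compaction expressed as 'N-fold fewer contigs'."""
    return count_before / count_after
```

With the five contigs `["ACGTACGTTT", "ACGTAC", "GTTT", "CCCCGG", "ACGTAC"]`, two survive the filter, so `fold_reduction(5, 2)` gives a 2.5-fold reduction, which falls within the 1.3- to 15-fold range the abstract reports for real assemblies.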


The brain transcriptome of the wolf spider, Schizocosa ocreata

BMC Research Notes ◽

10.1186/s13104-021-05648-y ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Daniel Stribling ◽

Peter L. Chang ◽

Justin E. Dalton ◽

Christopher A. Conow ◽

Malcolm Rosenthal ◽

...

Keyword(s):

Gene Expression ◽

De Novo ◽

Transcriptome Assembly ◽

Model Organisms ◽

De Novo Transcriptome Assembly ◽

De Novo Transcriptome ◽

Wolf Spiders ◽

Schizocosa Ocreata ◽

Genomic Studies ◽

The Brain

Objectives: Arachnids have fascinating and unique biology, particularly for questions on sex differences and behavior, creating the potential for development of powerful emerging models in this group. Recent advances in genomic techniques have paved the way for a significant increase in the breadth of genomic studies in non-model organisms. One growing area of research is comparative transcriptomics. When phylogenetic relationships to model organisms are known, comparative genomic studies provide context for analysis of homologous genes and pathways. The goal of this study was to lay the groundwork for comparative transcriptomics of sex differences in the brain of wolf spiders, a non-model organism of the phylum Euarthropoda, by generating transcriptomes and analyzing gene expression. Data description: To examine sex-differential gene expression, short read transcript sequencing and de novo transcriptome assembly were performed. Messenger RNA was isolated from brain tissue of male and female subadult and mature wolf spiders (Schizocosa ocreata). The raw data consist of sequences for the two different life stages in each sex. Computational analyses on these data include de novo transcriptome assembly and differential expression analyses. Sample-specific and combined transcriptomes, gene annotations, and differential expression results are described in this data note and are available from public databases.


Characterization of Embryonic Skin Transcriptome in Anser cygnoides at Three Feather Follicles Developmental Stages

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400875 ◽

2019 ◽

Vol 10 (2) ◽

pp. 443-454

Author(s):

Chang Liu ◽

Cornelius Tlotliso Sello ◽

Yujian Sui ◽

Jingtao Hu ◽

Shaokang Chen ◽

...

Keyword(s):

Signaling Pathway ◽

Developmental Stages ◽

De Novo ◽

Transcriptome Assembly ◽

Expression Profiles ◽

Expression Patterns ◽

Mitogen Activated Protein Kinase ◽

Receptor Interaction ◽

Keratinocyte Proliferation ◽

Skin Development

In order to enrich the Anser cygnoides genome and identify the gene expression profiles of primary and secondary feather follicle development, a de novo transcriptome assembly of skin tissues was established by analyzing three developmental stages at embryonic day 14, 18, and 28 (E14, E18, E28). Sequencing generated 436,730,608 clean reads from nine libraries, which were assembled de novo into 56,301 unigenes. There were 2,298, 9,423 and 12,559 unigenes showing differential expression in the three stages, respectively. Furthermore, differentially expressed genes (DEGs) were functionally classified according to Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and series-cluster analysis. Relevant specific GO terms such as epithelium development, regulation of keratinocyte proliferation, and morphogenesis of an epithelium were identified. In all, 15,144 DEGs were clustered into eight profiles with distinct expression patterns and 2,424 DEGs were assigned to 198 KEGG pathways. Skin development related pathways (mitogen-activated protein kinase signaling pathway, extracellular matrix-receptor interaction, Wingless-type signaling pathway) and genes (delta like canonical Notch ligand 1, fibroblast growth factor 2, Snail family transcriptional repressor 2, bone morphogenetic protein 6, polo like kinase 1) were identified, and eight DEGs were selected to verify the reliability of the transcriptome results by real-time quantitative PCR. The findings of this study provide key insights into the complex molecular mechanisms and breeding techniques underlying the developmental characteristics of skin and feather follicles in Anser cygnoides.


rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data

10.1101/420208 ◽

2018 ◽

Cited By ~ 13

Author(s):

Elena Bushmanova ◽

Dmitry Antipov ◽

Alla Lapidus ◽

Andrey D. Prjibelski

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Model Organisms ◽

Challenging Problem ◽

Rna Seq ◽

De Novo Transcriptome ◽

Weak Points ◽

Transcriptome Reconstruction ◽

Evaluation Approaches ◽

Genome Assembler

Summary: The possibility of generating large RNA-seq datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to model organisms with finished and annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, which is complicated by varying expression levels across different genes, alternative splicing and paralogous genes. In this paper we describe a novel transcriptome assembler called rnaSPAdes, which is developed on top of the SPAdes genome assembler and explores surprising computational parallels between assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-Seq datasets, and briefly highlight strong and weak points of different assemblers. Availability and implementation: rnaSPAdes is implemented in C++ and Python and is freely available at cab.spbu.ru/software/rnaspades/.


K-mer clustering algorithm using a MapReduce framework: application to the parallelization of the Inchworm module of Trinity

10.1101/149948 ◽

2017 ◽

Author(s):

Chang Sik Kim ◽

Martyn D. Winn ◽

Vipin Sachdeva ◽

Kirk E. Jordan

Keyword(s):

Clustering Algorithm ◽

De Novo ◽

Transcriptome Assembly ◽

Initial Step ◽

Computer Hardware ◽

Model Organisms ◽

De Bruijn Graph ◽

Memory Representation ◽

Novel Approach ◽

Sequencing Problems

Background: De novo transcriptome assembly is an important technique for understanding gene expression in non-model organisms. Many de novo assemblers using the de Bruijn graph of a set of the RNA sequences rely on in-memory representation of this graph. However, current methods analyse the complete set of read-derived k-mer sequences at once, resulting in the need for computer hardware with large shared memory. Results: We introduce a novel approach that clusters k-mers as the first step. The clusters correspond to small sets of gene products, which can be processed quickly to give candidate transcripts. We implement the clustering step using the MapReduce approach for parallelising the analysis of large datasets, which enables the use of compute clusters. The computational task is distributed across the compute system, and no specialised hardware is required. Using this approach, we have re-implemented the Inchworm module from the widely used Trinity pipeline, and tested the method in the context of the full Trinity pipeline. Validation tests on a range of real datasets show large reductions in the runtime and per-node memory requirements, when making use of a compute cluster. Conclusions: Our study shows that MapReduce-based clustering has great potential for distributing challenging sequencing problems, without loss of accuracy. Although we have focussed on the Trinity package, we propose that such clustering is a useful initial step for other assembly pipelines.
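The idea of clustering k-mers with a map step and a reduce step can be sketched in a few lines. The example below is a single-process toy under stated assumptions, not the authors' implementation: the map step keys each k-mer by its (k-1)-mer prefix and suffix, the reduce step groups k-mers that share a key, and a union-find pass merges the groups into clusters. The abstract does not specify the clustering criterion, so the shared (k-1)-mer rule and all function names here are illustrative.

```python
from collections import defaultdict


def kmer_set(seq, k):
    """Distinct k-mers of one read (hypothetical helper)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def map_phase(all_kmers):
    """Map: emit (key, k-mer) pairs keyed by (k-1)-mer prefix and suffix."""
    for km in all_kmers:
        yield km[:-1], km
        yield km[1:], km


def reduce_phase(pairs):
    """Reduce: collect the k-mers that share each (k-1)-mer key."""
    groups = defaultdict(list)
    for key, km in pairs:
        groups[key].append(km)
    return groups


def cluster_kmers(reads, k):
    """Return clusters (sets) of k-mers linked by shared (k-1)-mers."""
    all_kmers = set().union(*(kmer_set(r, k) for r in reads))
    parent = {km: km for km in all_kmers}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for group in reduce_phase(map_phase(all_kmers)).values():
        root = find(group[0])
        for km in group[1:]:
            parent[find(km)] = root

    clusters = defaultdict(set)
    for km in all_kmers:
        clusters[find(km)].add(km)
    return list(clusters.values())
```

Calling `cluster_kmers(["ACGTAC", "GGGTTT"], 4)` yields two clusters, one per read, because the reads share no 3-mer. In a real MapReduce deployment the map and reduce steps would run on separate nodes, which is where the per-node memory savings described in the abstract would come from.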


Initial virome characterization of the common cnidarian lab model Nematostella vectensis

10.1101/2020.01.14.906370 ◽

2020 ◽

Author(s):

Magda Lewandowska ◽

Yael Hazan ◽

Yehu Moran

Keyword(s):

Developmental Stages ◽

De Novo ◽

Artemia Salina ◽

Model Organisms ◽

Nematostella Vectensis ◽

Virus Identification ◽

Functional Studies ◽

Powerful Approach ◽

Viral Communities ◽

Viral Sequences

The role of viruses in forming a stable holobiont has been a subject of extensive research in recent years. However, many emerging model organisms still lack any data on the composition of the associated viral communities. Here, we re-analyzed seven publicly available transcriptome datasets of the starlet sea anemone Nematostella vectensis, the most commonly used anthozoan lab model, and searched for viral sequences. We applied a straightforward yet powerful approach of de novo assembly followed by homology-based virus identification and a multi-step, thorough taxonomic validation. The comparison of different lab populations of N. vectensis revealed the existence of a core virome composed of 21 viral sequences, present in all adult datasets. Unexpectedly, we observed an almost complete lack of viruses in the samples from the early developmental stages, which, together with the identification of viruses shared with the major food source in the lab, the brine shrimp Artemia salina, sheds new light on the course of viral species acquisition in N. vectensis. Our study provides an initial yet comprehensive insight into the N. vectensis virome and sets the first foundation for functional studies of viruses and antiviral systems in this lab model cnidarian.
