The Oyster River Protocol: a multi-assembler and kmer approach for de novo transcriptome assembly

Characterizing transcriptomes in non-model organisms has resulted in a massive increase in our understanding of biological phenomena. This boon, largely made possible via high-throughput sequencing, means that studies of functional, evolutionary, and population genomics are now being done by hundreds or even thousands of labs around the world. For many, these studies begin with a de novo transcriptome assembly, which is a technically complicated process involving several discrete steps. The Oyster River Protocol (ORP), described here, implements a standardized and benchmarked set of bioinformatic processes, resulting in an assembly of higher quality than those produced by other standard assembly methods. Specifically, ORP-produced assemblies have higher Detonate and TransRate scores and mapping rates, largely because the protocol leverages a multi-assembler and kmer assembly process, thereby bypassing the shortcomings of any one approach. These improvements are important, as previously unassembled transcripts are included in ORP assemblies, resulting in a significant enhancement of the power of downstream analysis. Further, as part of this study, I show that assembly quality is unrelated to the number of reads generated above 30 million reads. Code Availability: The version-controlled open-source code is available at https://github.com/macmanes-lab/Oyster_River_Protocol. Instructions for software installation and use, and other details, are available at http://oyster-river-protocol.rtfd.org/.

10.1101/177253 ◽ 2017 ◽ Cited By ~ 2

Author(s): Matthew D. MacManes

Keyword(s): High Throughput Sequencing ◽ Population Genomics ◽ De Novo ◽ Transcriptome Assembly ◽ Model Organisms ◽ De Novo Transcriptome ◽ Link Type ◽ Biological Phenomena ◽ Complicated Process ◽ Downstream Analysis

Establishing evidenced-based best practice for the de novo assembly and evaluation of transcriptomes from non-model organisms

10.1101/035642 ◽ 2015 ◽ Cited By ~ 25

Author(s): Matthew D MacManes

Keyword(s): Best Practice ◽ High Throughput Sequencing ◽ Population Genomics ◽ De Novo ◽ Transcriptome Assembly ◽ Model Organisms ◽ Single Individual ◽ Biological Phenomena ◽ Or Gene ◽ Evidenced Based

Characterizing transcriptomes in both model and non-model organisms has resulted in a massive increase in our understanding of biological phenomena. This boon, largely made possible via high-throughput sequencing, means that studies of functional, evolutionary, and population genomics are now being done by hundreds or even thousands of labs around the world. For many, these studies begin with a de novo transcriptome assembly, which is a technically complicated process involving several discrete steps. Each step may be accomplished in one of several different ways, using different software packages, each producing different results. This analytical complexity raises the question: which method(s) are optimal? Using reference and non-reference based evaluative methods, I propose a set of guidelines that aim to standardize and facilitate the process of transcriptome assembly. These recommendations include the generation of between 20 million and 40 million sequencing reads from a single individual where possible, error correction of reads, gentle quality trimming, assembly filtering using Transrate and/or gene expression, annotation using dammit, and appropriate reporting. These recommendations have been extensively benchmarked and applied to publicly available transcriptomes, resulting in improvements in both content and contiguity. To facilitate the implementation of the proposed standardized methods, I have released a set of version-controlled open-source code, The Oyster River Protocol for Transcriptome Assembly, available at http://oyster-river-protocol.rtfd.org/.

The brain transcriptome of the wolf spider, Schizocosa ocreata

BMC Research Notes ◽ 10.1186/s13104-021-05648-y ◽ 2021 ◽ Vol 14 (1)

Author(s): Daniel Stribling ◽ Peter L. Chang ◽ Justin E. Dalton ◽ Christopher A. Conow ◽ Malcolm Rosenthal ◽ ...

Keyword(s): Gene Expression ◽ De Novo ◽ Transcriptome Assembly ◽ Model Organisms ◽ De Novo Transcriptome Assembly ◽ De Novo Transcriptome ◽ Wolf Spiders ◽ Schizocosa Ocreata ◽ Genomic Studies ◽ The Brain

Abstract. Objectives: Arachnids have fascinating and unique biology, particularly for questions on sex differences and behavior, creating the potential for development of powerful emerging models in this group. Recent advances in genomic techniques have paved the way for a significant increase in the breadth of genomic studies in non-model organisms. One growing area of research is comparative transcriptomics. When phylogenetic relationships to model organisms are known, comparative genomic studies provide context for analysis of homologous genes and pathways. The goal of this study was to lay the groundwork for comparative transcriptomics of sex differences in the brain of wolf spiders, a non-model organism of the phylum Euarthropoda, by generating transcriptomes and analyzing gene expression. Data description: To examine sex-differential gene expression, short-read transcript sequencing and de novo transcriptome assembly were performed. Messenger RNA was isolated from brain tissue of male and female subadult and mature wolf spiders (Schizocosa ocreata). The raw data consist of sequences for the two different life stages in each sex. Computational analyses on these data include de novo transcriptome assembly and differential expression analyses. Sample-specific and combined transcriptomes, gene annotations, and differential expression results are described in this data note and are available from public databases.

A Pipeline for Non-model Organisms for de novo Transcriptome Assembly, Annotation, and Gene Ontology Analysis Using Open Tools: Case Study with Scots Pine

BIO-PROTOCOL ◽ 10.21769/bioprotoc.3912 ◽ 2021 ◽ Vol 11 (3)

Author(s): Gustavo Duarte ◽ Polina Yu. ◽ Stanislav Geras’kin

Keyword(s): Gene Ontology ◽ Scots Pine ◽ De Novo ◽ Transcriptome Assembly ◽ Model Organisms ◽ Gene Ontology Analysis ◽ De Novo Transcriptome Assembly ◽ De Novo Transcriptome

High-throughput sequencing and de novo transcriptome assembly of Swertia japonica to identify genes involved in the biosynthesis of therapeutic metabolites

Plant Cell Reports ◽ 10.1007/s00299-016-2021-z ◽ 2016 ◽ Vol 35 (10) ◽ pp. 2091-2111 ◽ Cited By ~ 20

Author(s): Amit Rai ◽ Michimi Nakamura ◽ Hiroki Takahashi ◽ Hideyuki Suzuki ◽ Kazuki Saito ◽ ...

Keyword(s): High Throughput ◽ High Throughput Sequencing ◽ De Novo ◽ Transcriptome Assembly ◽ De Novo Transcriptome Assembly ◽ De Novo Transcriptome ◽ Swertia Japonica

A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms

BMC Genomics ◽ 10.1186/s12864-017-3735-1 ◽ 2017 ◽ Vol 18 (S4) ◽ Cited By ~ 5

Author(s): Sing-Hoi Sze ◽ Meaghan L. Pimsler ◽ Jeffery K. Tomberlin ◽ Corbin D. Jones ◽ Aaron M. Tarone

Keyword(s): Efficient Algorithm ◽ De Novo ◽ Transcriptome Assembly ◽ Model Organisms ◽ De Novo Transcriptome Assembly ◽ De Novo Transcriptome ◽ Memory Efficient

DTA-SiST: de novo transcriptome assembly by using simplified suffix trees

BMC Bioinformatics ◽ 10.1186/s12859-019-3272-9 ◽ 2019 ◽ Vol 20 (S25)

Author(s): Jin Zhao ◽ Haodi Feng ◽ Daming Zhu ◽ Chi Zhang ◽ Ying Xu

Keyword(s): Suffix Tree ◽ High Throughput Sequencing ◽ De Novo ◽ State Of The Art ◽ Linear Time ◽ Transcriptome Assembly ◽ De Novo Transcriptome Assembly ◽ Suffix Trees ◽ De Novo Transcriptome ◽ Hybrid Strategy

Abstract. Background: Alternative splicing allows the pre-mRNAs of a gene to be spliced into various mRNAs, which greatly increases the diversity of proteins. High-throughput sequencing of mRNAs has revolutionized our ability to reconstruct transcripts. However, the massive volume of short reads makes de novo transcript assembly an algorithmic challenge. Results: We develop a novel framework, called DTA-SiST, for de novo transcriptome assembly based on suffix trees. DTA-SiST first extends contigs by reads that have the longest overlaps with the contigs’ termini. These reads can be found in time linear in the lengths of the reads through a well-designed suffix tree structure. Then, DTA-SiST constructs splicing graphs based on contigs for each gene locus. Finally, DTA-SiST proposes two strategies to extract transcript-representing paths: a depth-first enumeration strategy and a hybrid strategy based on length and coverage. We implemented both strategies and compared them with state-of-the-art de novo assemblers on both simulated and real datasets. Experimental results showed that the depth-first enumeration strategy always performs better on recall, and also better on precision for smaller datasets, while the hybrid strategy leads on precision for large datasets. Conclusions: DTA-SiST is more competitive than the other compared de novo assemblers, especially on precision, owing to the read-based contig extension strategy and the transcript extraction rules.
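The contig-extension step described in this abstract can be illustrated with a minimal sketch. It is not the DTA-SiST implementation: a naive scan over all reads stands in for the suffix-tree lookup, only rightward extension is shown, and reverse complements and sequencing errors are ignored. The function names `overlap_len` and `extend_right` and the `min_overlap` parameter are hypothetical, introduced only for illustration.

```python
from typing import List


def overlap_len(contig: str, read: str, min_overlap: int) -> int:
    """Return the length of the longest suffix of `contig` that equals a
    prefix of `read`, or 0 if no overlap of at least `min_overlap` exists.
    The overlap is capped at len(read) - 1 so the read always adds bases."""
    max_len = min(len(contig), len(read) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if contig.endswith(read[:k]):
            return k
    return 0


def extend_right(contig: str, reads: List[str], min_overlap: int = 3) -> str:
    """Greedily extend `contig` to the right, each time using the unused
    read with the longest overlap with the contig's current terminus.
    Naive stand-in (assumption): a suffix tree would replace this scan."""
    remaining = list(reads)
    while True:
        best_k, best_idx = 0, None
        for i, read in enumerate(remaining):
            k = overlap_len(contig, read, min_overlap)
            if k > best_k:
                best_k, best_idx = k, i
        if best_idx is None:
            # No remaining read overlaps the terminus; extension is done.
            return contig
        read = remaining.pop(best_idx)
        # Append only the part of the read that lies beyond the overlap.
        contig += read[best_k:]


if __name__ == "__main__":
    toy_reads = ["TACAGGT", "GGTCC", "TTTT"]
    print(extend_right("GATTACA", toy_reads, min_overlap=3))
```

Each read is consumed at most once, so the loop terminates. The scan costs time proportional to the number of reads per extension step, which is the cost the abstract says the suffix tree structure avoids.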

Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms

BMC Bioinformatics ◽ 10.1186/1471-2105-13-170 ◽ 2012 ◽ Vol 13 (1) ◽ pp. 170 ◽ Cited By ~ 24

Author(s): Berat Z Haznedaroglu ◽ Darryl Reeves ◽ Hamid Rismani-Yazdi ◽ Jordan Peccia

Keyword(s): High Throughput ◽ Functional Annotation ◽ De Novo ◽ Transcriptome Assembly ◽ Model Organisms ◽ De Novo Transcriptome Assembly ◽ Sequencing Data ◽ De Novo Transcriptome ◽ Short Read ◽ Short Read Sequencing

Data on the first functionally-annotated de novo transcriptome assembly for North American flying squirrels (genus Glaucomys)

Data in Brief ◽ 10.1016/j.dib.2021.107267 ◽ 2021 ◽ pp. 107267

Author(s): Michael G.C. Brown ◽ Jeff Bowman ◽ Paul J. Wilson

Keyword(s): North American ◽ De Novo ◽ Transcriptome Assembly ◽ De Novo Transcriptome Assembly ◽ De Novo Transcriptome ◽ Flying Squirrels

Sub-Chronic Effects of Slight PAH- and PCB-Contaminated Mesocosms in Paracentrotus lividus Lmk: A Multi-Endpoint Approach and De Novo Transcriptomic

International Journal of Molecular Sciences ◽ 10.3390/ijms22136674 ◽ 2021 ◽ Vol 22 (13) ◽ pp. 6674

Author(s): Luisa Albarano ◽ Valerio Zupo ◽ Davide Caramiello ◽ Maria Toscanesi ◽ Marco Trifuoggi ◽ ...

Keyword(s): De Novo ◽ Sea Urchins ◽ Transcriptome Assembly ◽ Paracentrotus Lividus ◽ Marine Biota ◽ De Novo Transcriptome Assembly ◽ De Novo Transcriptome ◽ Limit Values ◽ The Impact ◽ Endpoint Approach

Sediment pollution is a major issue in coastal areas, potentially endangering human health and marine environments. We investigated the short-term sublethal effects of sediments contaminated with polycyclic aromatic hydrocarbons (PAHs) and polychlorinated biphenyls (PCBs) on the sea urchin Paracentrotus lividus over two months. Spiking occurred at concentrations below the threshold limit values (TLVs) permitted by law (TLV for PAHs = 900 µg/L, TLV for PCBs = 8 µg/L; Italian Legislative Decree 173/2016). A multi-endpoint approach was adopted, considering both adults (mortality, bioaccumulation, and gonadal index) and embryos (embryotoxicity, genotoxicity, and de novo transcriptome assembly). The slight concentrations of PAHs and PCBs added to the mesocosms were observed to readily compartmentalize in adults, falling below the detection limits just one week after their addition. Reconstructed sediment and seawater, used as negative controls, did not affect sea urchins. PAH- and PCB-spiked mesocosms were observed to impair P. lividus at various endpoints, including bioaccumulation and embryo development (mainly PAHs) and genotoxicity (PAHs and PCBs). In particular, genotoxicity tests revealed that PAHs and PCBs affected the development of P. lividus embryos derived from exposed adults. Negative effects were also detected by generating and annotating a de novo transcriptome assembly, as well as by real-time qPCR performed to identify genes differentially expressed in adults exposed to the two contaminants. The effects on sea urchins (both adults and embryos) at background concentrations of PAHs and PCBs below the TLVs suggest a need for further investigation of the impact of slight concentrations of such contaminants on marine biota.
