The Oyster River Protocol: a multi-assembler and kmer approach for de novo transcriptome assembly

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5428 ◽  
Author(s):  
Matthew D. MacManes

Characterizing transcriptomes in non-model organisms has resulted in a massive increase in our understanding of biological phenomena. This boon, largely made possible via high-throughput sequencing, means that studies of functional, evolutionary, and population genomics are now being done by hundreds or even thousands of labs around the world. For many, these studies begin with a de novo transcriptome assembly, which is a technically complicated process involving several discrete steps. The Oyster River Protocol (ORP), described here, implements a standardized and benchmarked set of bioinformatic processes, resulting in an assembly with enhanced qualities over other standard assembly methods. Specifically, ORP-produced assemblies have higher Detonate and TransRate scores and mapping rates, largely because the protocol leverages a multi-assembler and kmer assembly process, thereby bypassing the shortcomings of any one approach. These improvements are important, as previously unassembled transcripts are included in ORP assemblies, resulting in a significant enhancement of the power of downstream analysis. Further, as part of this study, I show that assembly quality is unrelated to the number of reads generated, above 30 million reads. Code Availability: The version-controlled open-source code is available at https://github.com/macmanes-lab/Oyster_River_Protocol. Instructions for software installation and use, and other details are available at http://oyster-river-protocol.rtfd.org/.

2017 ◽  
Author(s):  
Matthew D. MacManes

Abstract Characterizing transcriptomes in non-model organisms has resulted in a massive increase in our understanding of biological phenomena. This boon, largely made possible via high-throughput sequencing, means that studies of functional, evolutionary and population genomics are now being done by hundreds or even thousands of labs around the world. For many, these studies begin with a de novo transcriptome assembly, which is a technically complicated process involving several discrete steps. The Oyster River Protocol (ORP), described here, implements a standardized and benchmarked set of bioinformatic processes, resulting in an assembly with enhanced qualities over other standard assembly methods. Specifically, ORP-produced assemblies have higher Detonate and TransRate scores and mapping rates, largely because the protocol leverages a multi-assembler and kmer assembly process, thereby bypassing the shortcomings of any one approach. These improvements are important, as previously unassembled transcripts are included in ORP assemblies, resulting in a significant enhancement of the power of downstream analysis. Further, as part of this study, I show that assembly quality is unrelated to the number of reads generated, above 30 million reads. Code Availability: The version-controlled open-source code is available at https://github.com/macmanes-lab/Oyster_River_Protocol. Instructions for software installation and use, and other details are available at http://oyster-river-protocol.rtfd.org/.


2015 ◽  
Author(s):  
Matthew D MacManes

Characterizing transcriptomes in both model and non-model organisms has resulted in a massive increase in our understanding of biological phenomena. This boon, largely made possible via high-throughput sequencing, means that studies of functional, evolutionary and population genomics are now being done by hundreds or even thousands of labs around the world. For many, these studies begin with a de novo transcriptome assembly, which is a technically complicated process involving several discrete steps. Each step may be accomplished in one of several different ways, using different software packages, each producing different results. This analytical complexity raises the question: which method(s) are optimal? Using reference and non-reference based evaluative methods, I propose a set of guidelines that aim to standardize and facilitate the process of transcriptome assembly. These recommendations include the generation of between 20 million and 40 million sequencing reads from a single individual where possible, error correction of reads, gentle quality trimming, assembly filtering using Transrate and/or gene expression, annotation using dammit, and appropriate reporting. These recommendations have been extensively benchmarked and applied to publicly available transcriptomes, resulting in improvements in both content and contiguity. To facilitate the implementation of the proposed standardized methods, I have released a set of version-controlled open-source code, The Oyster River Protocol for Transcriptome Assembly, available at http://oyster-river-protocol.rtfd.org/.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Daniel Stribling ◽  
Peter L. Chang ◽  
Justin E. Dalton ◽  
Christopher A. Conow ◽  
Malcolm Rosenthal ◽  
...  

Abstract Objectives Arachnids have fascinating and unique biology, particularly for questions on sex differences and behavior, creating the potential for development of powerful emerging models in this group. Recent advances in genomic techniques have paved the way for a significant increase in the breadth of genomic studies in non-model organisms. One growing area of research is comparative transcriptomics. When phylogenetic relationships to model organisms are known, comparative genomic studies provide context for analysis of homologous genes and pathways. The goal of this study was to lay the groundwork for comparative transcriptomics of sex differences in the brain of wolf spiders, a non-model organism of the phylum Euarthropoda, by generating transcriptomes and analyzing gene expression. Data description To examine sex-differential gene expression, short read transcript sequencing and de novo transcriptome assembly were performed. Messenger RNA was isolated from brain tissue of male and female subadult and mature wolf spiders (Schizocosa ocreata). The raw data consist of sequences for the two different life stages in each sex. Computational analyses on these data include de novo transcriptome assembly and differential expression analyses. Sample-specific and combined transcriptomes, gene annotations, and differential expression results are described in this data note and are available from public databases.


BMC Genomics ◽  
2017 ◽  
Vol 18 (S4) ◽  
Author(s):  
Sing-Hoi Sze ◽  
Meaghan L. Pimsler ◽  
Jeffery K. Tomberlin ◽  
Corbin D. Jones ◽  
Aaron M. Tarone

2019 ◽  
Vol 20 (S25) ◽  
Author(s):  
Jin Zhao ◽  
Haodi Feng ◽  
Daming Zhu ◽  
Chi Zhang ◽  
Ying Xu

Abstract Background Alternative splicing allows the pre-mRNAs of a gene to be spliced into various mRNAs, which greatly increases the diversity of proteins. High-throughput sequencing of mRNAs has revolutionized our ability to reconstruct transcripts. However, the massive number of short reads makes de novo transcript assembly an algorithmic challenge. Results We develop a novel framework, called DTA-SiST, for de novo transcriptome assembly based on suffix trees. DTA-SiST first extends contigs by reads that have the longest overlaps with the contigs' termini. These reads can be found in time linear in the lengths of the reads through a well-designed suffix tree structure. Then, DTA-SiST constructs splicing graphs based on contigs for each gene locus. Finally, DTA-SiST proposes two strategies to extract transcript-representing paths: a depth-first enumeration strategy and a hybrid strategy based on length and coverage. We implemented the above two strategies and compared them with the state-of-the-art de novo assemblers on both simulated and real datasets. Experimental results showed that the depth-first enumeration strategy always performs better on recall, and also better on precision for smaller datasets, while the hybrid strategy leads on precision for big datasets. Conclusions DTA-SiST is more competitive than the other compared de novo assemblers, especially on the precision measure, due to the read-based contig extension strategy and the elegant transcript extraction rules.
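The contig-extension step described in this abstract can be illustrated with a small sketch. This is not the DTA-SiST implementation: it replaces the suffix tree with a naive suffix/prefix comparison (so it is not linear time), and the function names `overlap_len` and `extend_right` and the `min_overlap` parameter are hypothetical, introduced only for illustration.

```python
def overlap_len(contig: str, read: str, min_overlap: int) -> int:
    """Longest suffix of `contig` that equals a proper prefix of `read`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    Naive scan; a suffix tree would answer this query much faster.
    """
    # Cap at len(read) - 1 so an accepted read always adds at least one base.
    max_len = min(len(contig), len(read) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if contig.endswith(read[:k]):
            return k
    return 0


def extend_right(contig: str, reads: list[str], min_overlap: int = 3) -> str:
    """Greedily extend the 3' end of `contig` with the best-overlapping read."""
    remaining = list(reads)
    while True:
        best_k, best_i = 0, -1
        for i, read in enumerate(remaining):
            k = overlap_len(contig, read, min_overlap)
            if k > best_k:
                best_k, best_i = k, i
        if best_i < 0:
            # No remaining read overlaps the contig terminus well enough.
            return contig
        read = remaining.pop(best_i)  # each read is used at most once
        contig += read[best_k:]       # append only the non-overlapping tail


if __name__ == "__main__":
    print(extend_right("ACGTAC", ["TACGGA", "GGATTC", "CCCC"]))
```

The sketch only extends in one direction and ignores sequencing errors, reverse complements, and ties between reads, all of which a real assembler has to handle.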


2021 ◽  
Vol 22 (13) ◽  
pp. 6674
Author(s):  
Luisa Albarano ◽  
Valerio Zupo ◽  
Davide Caramiello ◽  
Maria Toscanesi ◽  
Marco Trifuoggi ◽  
...  

Sediment pollution is a major issue in coastal areas, potentially endangering human health and marine environments. We investigated the short-term sublethal effects of sediments contaminated with polycyclic aromatic hydrocarbons (PAHs) and polychlorinated biphenyls (PCBs) on the sea urchin Paracentrotus lividus for two months. Spiking occurred at concentrations below the threshold limit values (TLVs) permitted by law (TLV for PAHs = 900 µg/L, TLV for PCBs = 8 µg/L; Italian Legislative Decree 173/2016). A multi-endpoint approach was adopted, considering both adults (mortality, bioaccumulation and gonadal index) and embryos (embryotoxicity, genotoxicity and de novo transcriptome assembly). The low concentrations of PAHs and PCBs added to the mesocosms were observed to readily compartmentalize in adults, falling below the detection limits just one week after their addition. Reconstructed sediment and seawater, as negative controls, did not affect sea urchins. PAH- and PCB-spiked mesocosms were observed to impair P. lividus at various endpoints, including bioaccumulation and embryo development (mainly PAHs) and genotoxicity (PAHs and PCBs). In particular, genotoxicity tests revealed that PAHs and PCBs affected the development of P. lividus embryos deriving from exposed adults. Negative effects were also detected by generating a de novo transcriptome assembly and its annotation, as well as by real-time qPCR performed to identify genes differentially expressed in adults exposed to the two contaminants. The effects on sea urchins (both adults and embryos) at background concentrations of PAHs and PCBs below the TLVs suggest a need for further investigations on the impact of low concentrations of such contaminants on marine biota.

