Pincho: A Modular Approach to High Quality De Novo Transcriptomics

Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 953
Author(s):  
Randy Ortiz ◽  
Priyanka Gera ◽  
Christopher Rivera ◽  
Juan C. Santos

Transcriptomic reconstructions without a reference (i.e., de novo) are common for data samples derived from non-model biological systems. These assemblies involve massively parallel short-read sequence reconstructions from experiments, but they usually employ ad hoc bioinformatic workflows that exhibit limited standardization and customization. The growing number of transcriptome assemblers leaves little room for standardization, a problem exacerbated by the lack of studies on modularity that compare the effects of assembler synergy. We developed a customizable management workflow for de novo transcriptomics that includes modular units for short-read cleaning, assembly, validation, annotation, and expression analysis by connecting twenty-five individual bioinformatic tools. With our software tool, we were able to compare the assessment scores of 129 distinct single-, bi-, and tri-assembler combinations with diverse k-mer size selections. Our results demonstrate a drastic increase in the quality of transcriptome assemblies with bi- and tri-assembler combinations. We aim for our software to improve de novo transcriptome reconstructions for the ever-growing landscape of RNA-seq data derived from non-model systems. We offer guidance to ensure the most complete transcriptomic reconstructions via the inclusion of modular multi-assembly software controlled from a single master console.

Author(s):  
T.I. Garcia ◽  
Y. Shen ◽  
J. Catchen ◽  
A. Amores ◽  
M. Schartl ◽  
...  

2021 ◽  
Author(s):  
Víctor García-Olivares ◽  
Adrián Muñoz-Barrera ◽  
José Miguel Lorenzo-Salazar ◽  
Carlos Zaragoza-Trello ◽  
Luis A. Rubio-Rodríguez ◽  
...  

Abstract: The mitochondrial genome (mtDNA) is of interest for a range of fields including evolutionary, forensic, and medical genetics. Human mitogenomes can be classified into evolutionarily related haplogroups that provide ancestral information and pedigree relationships. Because of this and the advent of high-throughput sequencing (HTS) technology, there is a diversity of bioinformatic tools for haplogroup classification. We present a benchmarking of the 11 most salient tools for human mtDNA classification using empirical whole-genome (WGS) and whole-exome (WES) short-read sequencing data from 36 unrelated donors. Because of its relevance, we also assess the best-performing tool on third-generation long noisy read WGS data obtained with nanopore technology for a subset of the donors. We found that, for short-read WGS, most of the tools exhibit high accuracy for haplogroup classification irrespective of the input file used for the analysis. However, for short-read WES, Haplocheck and MixEmt were the most accurate tools. Based on the performance shown for WGS and WES, and the accompanying qualitative assessment, Haplocheck stands out as the most complete tool. For third-generation HTS data, we also showed that Haplocheck was able to accurately retrieve mtDNA haplogroups for all samples assessed, although only after following assembly-based approaches (based on either a reference-based assembly or a hybrid de novo assembly). Taken together, our results provide guidance for researchers to select the most suitable tool to conduct mtDNA analyses from HTS data.


2022 ◽  
Author(s):  
Karl Johan Westrin ◽  
Warren W Kretzschmar ◽  
Olof Emanuelsson

Motivation: Transcriptome assembly from RNA sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often have inadequate reconstruction ability of transcript isoforms. This impedes the study of alternative splicing, in particular for lowly expressed isoforms. Result: We present the de novo transcript isoform assembler ClusTrast, which clusters a set of guiding contigs by similarity, aligns short reads to the guiding contigs, and assembles each clustered set of short reads individually. We tested ClusTrast on datasets from six eukaryotic species, and showed that ClusTrast reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at a moderate reduction in precision. An appreciable fraction were reconstructed to at least 95% of their length. We suggest that ClusTrast will be useful for studying alternative splicing in the absence of a reference genome. Availability and implementation: The code and usage instructions are available at https://github.com/karljohanw/clustrast.


2018 ◽  
Author(s):  
Jacob F. Warner ◽  
Vincent Guerlais ◽  
Aldine R. Amiel ◽  
Hereroa Johnston ◽  
Karine Nedoncelle ◽  
...  

Abstract: For more than a century, researchers have been comparing embryogenesis and regeneration, hoping that lessons learned from embryonic development will unlock hidden regenerative potential. This problem has historically been a difficult one to investigate since the best regenerative model systems are poor embryonic models and vice versa. Recently, however, the comparison of embryogenesis and regeneration has seen renewed interest as emerging models, including the sea anemone Nematostella vectensis, have allowed researchers to investigate these processes in the same organism. This interest has been further fueled by the advent of high-throughput transcriptomic analyses that provide virtual mountains of data. Unfortunately, much of this data remains in raw, unanalyzed formats that are difficult to access or browse. Here we present Nematostella vectensis Embryogenesis and Regeneration Transcriptomics (NvERTx), the first platform for comparing gene expression during embryogenesis and regeneration. NvERTx comprises close to 50 RNAseq datasets spanning embryogenesis and regeneration in Nematostella. These data were used to perform a robust de novo transcriptome assembly, which users can search and BLAST, and for which they can plot the expression of multiple genes during these two developmental processes. The site is also home to the results of gene clustering analyses, to further mine the data and identify groups of co-expressed genes. The site can be accessed at http://nvertx.kahikai.org.


2020 ◽  
Author(s):  
Prashant Vaidyanathan ◽  
Evan Appleton ◽  
David Tran ◽  
Alexander Vahid ◽  
George Church ◽  
...  

Abstract: Molecular biologists rely on the use of fluorescent probes to take measurements of their model systems. These fluorophores fall into various classes (e.g. fluorescent dyes, fluorescent proteins, etc.), but they all share some general properties (such as excitation and emission spectra, brightness) and require similar equipment for data acquisition. Selecting an ideal set of fluorophores for a particular measurement technology or vice versa is a multidimensional problem that is difficult to solve with ad hoc methods due to the enormous solution space of possible fluorophore panels. Choosing sub-optimal fluorophore panels can result in unreliable or erroneous measurements of biochemical properties in model systems. Here, we describe a set of algorithms, implemented in an open-source software tool, for solving these problems efficiently to arrive at fluorophore panels optimized for maximal signal and minimal bleed-through.


2011 ◽  
Vol 12 (Suppl 14) ◽  
pp. S2 ◽  
Author(s):  
Qiong-Yi Zhao ◽  
Yi Wang ◽  
Yi-Meng Kong ◽  
Da Luo ◽  
Xuan Li ◽  
...  

PLoS ONE ◽  
2018 ◽  
Vol 13 (12) ◽  
pp. e0208344 ◽  
Author(s):  
Sang-Ho Kang ◽  
Jong-Yeol Lee ◽  
Tae-Ho Lee ◽  
Soo-Yun Park ◽  
Chang-Kug Kim

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Prashant Vaidyanathan ◽  
Evan Appleton ◽  
David Tran ◽  
Alexander Vahid ◽  
George Church ◽  
...  

Abstract: Molecular biologists rely on the use of fluorescent probes to take measurements of their model systems. These fluorophores fall into various classes (e.g. fluorescent dyes, fluorescent proteins, etc.), but they all share some general properties (such as excitation and emission spectra, brightness) and require similar equipment for data acquisition. Selecting an ideal set of fluorophores for a particular measurement technology or vice versa is a multidimensional problem that is difficult to solve with ad hoc methods due to the enormous solution space of possible fluorophore panels. Choosing sub-optimal fluorophore panels can result in unreliable or erroneous measurements of biochemical properties in model systems. Here, we describe a set of algorithms, implemented in an open-source software tool, for solving these problems efficiently to arrive at fluorophore panels optimized for maximal signal and minimal bleed-through.

