TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes

Abstract High-throughput RNA-seq data has become ubiquitous in the study of non-model organisms, but its use in comparative analysis remains a challenge. Without a reference genome for mapping, sequence data has to be assembled de novo, producing large numbers of short, highly redundant contigs. Preparing these assemblies for comparative analyses requires the removal of redundant isoforms, the assignment of orthologs and the conversion of fragmented transcripts into gene alignments. In this article we present Glutton, a novel tool to process transcriptome assemblies for downstream evolutionary analyses. Glutton takes as input a set of fragmented, possibly erroneous transcriptome assemblies. Utilising phylogeny-aware alignment and reference data from a closely related species, it reconstructs one transcript per gene, finds orthologous sequences and produces accurate multiple alignments of coding sequences. We present a comprehensive analysis of Glutton’s performance across a wide range of divergence times between study and reference species. We demonstrate the impact that the choice of assembler has on both the number of alignments and the correctness of ortholog assignment, and show substantial improvements over heuristic methods without sacrificing correctness. Finally, using inference of Darwinian selection as an example of downstream analysis, we show that Glutton-processed RNA-seq data give results comparable to those obtained from full-length gene sequences, even with distantly related reference species. Glutton is available from http://wasabiapp.org/software/glutton/ and is licensed under the GPLv3.

nanotatoR: a tool for enhanced annotation of genomic structural variants

BMC Genomics ◽

10.1186/s12864-020-07182-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Surajit Bhattacharya ◽

Hayk Barseghyan ◽

Emmanuèle C. Délot ◽

Eric Vilain

Keyword(s):

De Novo ◽

Genome Mapping ◽

Gene List ◽

Sufficient Information ◽

Rna Seq ◽

Structural Variants ◽

De Novo Genome Assembly ◽

Pathogenic Variants ◽

Increased Sensitivity

Abstract Background: Whole genome sequencing is effective at identification of small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has allowed for increased sensitivity and specificity in SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information for determination of variant pathogenicity. Results: We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances for nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient’s phenotype and used to filter genes overlapping SVs, providing the analyst with an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset, allowing the user to assess the effects of SVs on the transcriptome). Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release. We evaluated nanotatoR’s annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne Muscular Dystrophy for which we had previously demonstrated the diagnostic ability of OGM. Conclusions: The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting.
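
The overlap percentages and nearest-gene distances mentioned in this abstract can be illustrated with a minimal Python sketch. This is not nanotatoR's code or API (nanotatoR is an R package): the `Interval` type and the functions `overlap_percent` and `distance_to_gene` are hypothetical names, coordinates are assumed to be BED-style (0-based, end-exclusive), and the percentage is assumed to be measured relative to the gene's length.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """A genomic interval in BED-style coordinates (hypothetical helper type)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlap_percent(sv: Interval, gene: Interval) -> float:
    """Percentage of the gene's length that the SV covers (assumed definition)."""
    gene_length = gene.end - gene.start
    if sv.chrom != gene.chrom or gene_length <= 0:
        return 0.0
    shared = min(sv.end, gene.end) - max(sv.start, gene.start)
    if shared <= 0:
        return 0.0
    return 100.0 * shared / gene_length


def distance_to_gene(sv: Interval, gene: Interval) -> Optional[int]:
    """Gap in bases between SV and gene; 0 if they overlap, None across chromosomes."""
    if sv.chrom != gene.chrom:
        return None
    if gene.end <= sv.start:   # gene lies entirely before the SV
        return sv.start - gene.end
    if gene.start >= sv.end:   # gene lies entirely after the SV
        return gene.start - sv.end
    return 0


sv = Interval("chr1", 100, 200)
overlapping_gene = Interval("chr1", 150, 250)
downstream_gene = Interval("chr1", 300, 400)
print(overlap_percent(sv, overlapping_gene))
print(distance_to_gene(sv, downstream_gene))
```

Values such as these could then serve as thresholds for filtering, in the spirit of the filtration step the abstract describes.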

Identification and Expression Analysis of the Genes Involved in the Raffinose Family Oligosaccharides Pathway of Phaseolus vulgaris and Glycine max

Plants ◽

10.3390/plants10071465 ◽

2021 ◽

Vol 10 (7) ◽

pp. 1465

Author(s):

Ramon de Koning ◽

Raphaël Kiekens ◽

Mary Esther Muyoka Toili ◽

Geert Angenon

Keyword(s):

Common Bean ◽

Seed Development ◽

Expression Analysis ◽

De Novo ◽

Expression Patterns ◽

Gene Families ◽

Rna Seq ◽

Raffinose Family Oligosaccharides ◽

Specific Expression ◽

Raffinose Synthase

Raffinose family oligosaccharides (RFO) play an important role in plants but are also considered to be antinutritional factors. A profound understanding of the galactinol and RFO biosynthetic gene families and the expression patterns of the individual genes is a prerequisite for the sustainable reduction of the RFO content in the seeds, without compromising normal plant development and functioning. In this paper, an overview of the annotation and genetic structure of all galactinol- and RFO biosynthesis genes is given for soybean and common bean. In common bean, three galactinol synthase genes, two raffinose synthase genes and one stachyose synthase gene were identified for the first time. To discover the expression patterns of these genes in different tissues, two expression atlases have been created through re-analysis of publicly available RNA-seq data. De novo expression analysis through an RNA-seq study during seed development of three varieties of common bean gave more insight into the expression patterns of these genes during the seed development. The results of the expression analysis suggest that different classes of galactinol- and RFO synthase genes have tissue-specific expression patterns in soybean and common bean. With the obtained knowledge, important galactinol- and RFO synthase genes that specifically play a key role in the accumulation of RFOs in the seeds are identified. These candidate genes may play a pivotal role in reducing the RFO content in the seeds of important legumes which could improve the nutritional quality of these beans and would solve the discomforts associated with their consumption.

RNA-Seq reveals divergent gene expression between larvae with contrasting trophic modes in the poecilogonous polychaete Boccardia wellingtonensis

Scientific Reports ◽

10.1038/s41598-021-94646-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Álvaro Figueroa ◽

Antonio Brante ◽

Leyla Cárdenas

Keyword(s):

De Novo ◽

Juvenile Stage ◽

Type I ◽

Rna Seq ◽

Mrna Synthesis ◽

De Novo Transcriptome ◽

Larval Stages ◽

Divergent Gene ◽

Genetic Mechanisms ◽

Planktotrophic Larvae

Abstract The polychaete Boccardia wellingtonensis is a poecilogonous species that produces different larval types. Females may lay Type I capsules, in which only planktotrophic larvae are present, or Type III capsules that contain planktotrophic and adelphophagic larvae as well as nurse eggs. While planktotrophic larvae do not feed during encapsulation, adelphophagic larvae develop by feeding on nurse eggs and on other larvae inside the capsules and hatch at the juvenile stage. Previous studies have not found morphological differences between the two larval types; thus, the factors explaining the contrasting feeding abilities of larvae in this species are still unknown. In this paper, we use a transcriptomic approach to study the cellular and genetic mechanisms underlying the different larval trophic modes of B. wellingtonensis. Using approximately 624 million high-quality reads, we assemble the de novo transcriptome with 133,314 contigs, coding 32,390 putative proteins. We identify 5221 genes that are up-regulated in larval stages compared to their expression in adult individuals. The gene expression profile differed between larval trophic modes, with genes involved in lipid metabolism and chaetogenesis overexpressed in planktotrophic larvae. In contrast, up-regulated genes in adelphophagic larvae were associated with DNA replication and mRNA synthesis.

De novo RNA-Seq based transcriptome analysis of Papiliotrema laurentii strain RY1 under nitrogen starvation

Gene ◽

10.1016/j.gene.2017.12.014 ◽

2018 ◽

Vol 645 ◽

pp. 146-156 ◽

Cited By ~ 5

Author(s):

Soumyadev Sarkar ◽

Somnath Chakravorty ◽

Avishek Mukherjee ◽

Debanjana Bhattacharya ◽

Semantee Bhattacharya ◽

...

Keyword(s):

Transcriptome Analysis ◽

Nitrogen Starvation ◽

De Novo ◽

Rna Seq

De Novo Transcriptome Analysis of Medicinally Important Plantago ovata Using RNA-Seq

PLoS ONE ◽

10.1371/journal.pone.0150273 ◽

2016 ◽

Vol 11 (3) ◽

pp. e0150273 ◽

Cited By ~ 14

Author(s):

Shivanjali Kotwal ◽

Sanjana Kaul ◽

Pooja Sharma ◽

Mehak Gupta ◽

Rama Shankar ◽

...

Keyword(s):

Transcriptome Analysis ◽

De Novo ◽

Rna Seq ◽

Plantago Ovata ◽

De Novo Transcriptome

A web server for comparative analysis of single-cell RNA-seq data

Nature Communications ◽

10.1038/s41467-018-07165-2 ◽

2018 ◽

Vol 9 (1) ◽

Cited By ~ 19

Author(s):

Amir Alavi ◽

Matthew Ruffalo ◽

Aiyappa Parvangada ◽

Zhilin Huang ◽

Ziv Bar-Joseph

Keyword(s):

Comparative Analysis ◽

Single Cell ◽

Web Server ◽

Rna Seq

Low-Bias RNA Sequencing of the HIV-2 Genome from Blood Plasma

Journal of Virology ◽

10.1128/jvi.00677-18 ◽

2018 ◽

Vol 93 (1) ◽

Cited By ~ 3

Author(s):

Katherine L. James ◽

Thushan I. de Silva ◽

Katherine Brown ◽

Hilton Whittle ◽

Stephen Taylor ◽

...

Keyword(s):

Genetic Diversity ◽

Blood Plasma ◽

De Novo ◽

A Priori ◽

Pcr Amplification ◽

Hiv Vaccine ◽

Whole Genome ◽

Plasma Samples ◽

Target Enrichment ◽

Rna Seq

ABSTRACT Accurate determination of the genetic diversity present in the HIV quasispecies is critical for the development of a preventative vaccine: in particular, little is known about viral genetic diversity for the second type of HIV, HIV-2. A better understanding of HIV-2 biology is relevant to the HIV vaccine field because a substantial proportion of infected people experience long-term viral control, and prior HIV-2 infection has been associated with slower HIV-1 disease progression in coinfected subjects. The majority of traditional and next-generation sequencing methods have relied on target amplification prior to sequencing, introducing biases that may obscure the true signals of diversity in the viral population. Additionally, target enrichment through PCR requires a priori sequence knowledge, which is lacking for HIV-2. Therefore, a target-enrichment-free method of library preparation would be valuable for the field. We applied an RNA shotgun sequencing (RNA-Seq) method without PCR amplification to cultured viral stocks and patient plasma samples from HIV-2-infected individuals. Libraries generated from total plasma RNA were analyzed with a two-step pipeline: (i) de novo genome assembly, followed by (ii) read remapping. By this approach, whole-genome sequences were generated with a 28× to 67× mean depth of coverage. Assembled reads showed a low level of GC bias, and comparison of the genome diversities at the intrahost level showed low diversity in the accessory gene vpx in all patients. Our study demonstrates that RNA-Seq is a feasible full-genome de novo sequencing method for blood plasma samples collected from HIV-2-infected individuals. IMPORTANCE An accurate picture of viral genetic diversity is critical for the development of a globally effective HIV vaccine. However, sequencing strategies are often complicated by target enrichment prior to sequencing, introducing biases that can distort variant frequencies, which are not easily corrected for in downstream analyses. Additionally, detailed a priori sequence knowledge is needed to inform robust primer design when employing PCR amplification, a factor that is often lacking when working with tropical diseases localized in developing countries. Previous work has demonstrated that direct RNA shotgun sequencing (RNA-Seq) can be used to circumvent these issues for hepatitis C virus (HCV) and norovirus. We applied RNA-Seq to total RNA extracted from HIV-2 blood plasma samples, demonstrating the applicability of this technique to HIV-2 and allowing us to generate a dynamic picture of genetic diversity over the whole genome of HIV-2 in the context of low-bias sequencing.
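
For readers unfamiliar with the "mean depth of coverage" figure quoted in this abstract, the quantity is the total number of aligned bases divided by the genome length. The Python sketch below illustrates that arithmetic only. It is not the authors' pipeline: the function name `mean_depth`, the representation of remapped reads as `(start, end)` pairs (0-based, end-exclusive), and the example numbers are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def mean_depth(read_intervals: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Mean depth of coverage: aligned bases divided by genome length.

    Each read is a (start, end) pair on the assembled genome, 0-based and
    end-exclusive. Reads are clipped to the genome boundaries.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    total_bases = 0
    for start, end in read_intervals:
        clipped_start = max(start, 0)
        clipped_end = min(end, genome_length)
        if clipped_end > clipped_start:
            total_bases += clipped_end - clipped_start
    return total_bases / genome_length


# Toy example: three 100-base reads tiled across a 200-base genome.
reads = [(0, 100), (50, 150), (100, 200)]
print(mean_depth(reads, 200))
```

In practice, a figure like this is usually taken from an alignment file rather than from hand-built intervals; the sketch only shows what the reported number means.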
