Faculty Opinions recommendation of Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Author(s):  
Steven Salzberg ◽  
Michael Schatz
2011 ◽  
Vol 29 (7) ◽  
pp. 644-652 ◽  
Author(s):  
Manfred G Grabherr ◽  
Brian J Haas ◽  
Moran Yassour ◽  
Joshua Z Levin ◽  
Dawn A Thompson ◽  
...  

2021 ◽  
Vol 12 ◽  
Author(s):  
Lixing Huang ◽  
Ying Qiao ◽  
Wei Xu ◽  
Linfeng Gong ◽  
Rongchao He ◽  
...  

Fish are considered a supreme model for clarifying the evolution and regulatory mechanisms of vertebrate immunity. However, knowledge of distinct immune cell populations in fish is still limited, and further development of techniques for identifying fish immune cell populations and their functions is required. Single-cell RNA-seq (scRNA-seq) has provided a new approach for effective in-depth identification and characterization of cell subpopulations. Current approaches for scRNA-seq data analysis usually rely on comparison with a reference genome and hence are not suited for samples without any reference genome, which is currently very common in fish research. Here, we present an alternative, i.e. scRNA-seq data analysis with a full-length transcriptome as a reference, and evaluate this approach on samples from Epinephelus coioides, a teleost without any published genome. We show that it reconstructs most of the transcripts present in the scRNA-seq data well, achieving a sensitivity equivalent to approaches relying on genome alignments of related species. Based on cell heterogeneity and known markers, we characterized four cell types: T cells, B cells, monocytes/macrophages (Mo/MΦ) and non-specific cytotoxic cells (NCC). Further analysis indicated the presence of two subsets of Mo/MΦ, the M1 and M2 types, as well as four subsets of B cells, i.e. mature B cells, immature B cells, pre-B cells and early pre-B cells. Our research will provide new clues for understanding the biological characteristics, development and function of immune cell populations of teleosts. Furthermore, our approach provides a reliable alternative for scRNA-seq data analysis in teleosts for which no reference genome is currently available.


2021 ◽  
Author(s):  
Wenbin Guo ◽  
Max Coulter ◽  
Robbie Waugh ◽  
Runxuan Zhang

High-quality transcriptome assembly using short reads from RNA-seq data still relies heavily upon reference-based approaches, of which the primary step is to align RNA-seq reads to a single reference genome of haploid sequence. However, it is increasingly apparent that while different genotypes within a species share core genes, they also contain variable numbers of specific genes that are only present in a subset of individuals. Using a common reference may thus lead to a loss of genotype-specific information in the assembled transcript dataset and the generation of erroneous, incomplete or misleading transcriptomics analysis results. With the recent development of pan-genome information in many species, it is important that we understand the limitations of single-genotype references for transcriptomics analysis. In this study, we quantitatively evaluated the advantages of using genotype-specific reference genomes for transcriptome assembly and analysis using cultivated barley as a model. We mapped barley cultivar Barke RNA-seq reads to the Barke genome and to the cultivar Morex genome (the common barley genome reference) to construct a genotype-specific Reference Transcript Dataset (sRTD) and a common Reference Transcript Dataset (cRTD), respectively. We compared the two RTDs according to their transcript diversity, transcript sequence and structure similarity, and the accuracy they provided for transcript quantification and differential expression analysis. Our evaluation shows that the sRTD has a significantly higher diversity of transcripts and alternative splicing events. Despite using a high-quality reference genome for assembly of the cRTD, we miss ca. 40% of the transcripts present in the sRTD, and the cRTD has only ca. 70% true assemblies. We found that the sRTD is more accurate for transcript quantification as well as for differential expression and differential alternative splicing analysis. However, gene-level quantification and comparative expression analysis are less affected by the source RTD, which indicates that analysing transcriptomic data at the gene level may be a reasonable compromise when a high-quality genotype-specific reference is not available.


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Christophe Klopp ◽  
Cédric Cabau ◽  
Gonzalo Greif ◽  
André Lasalle ◽  
Santiago Di Landro ◽  
...  

Abstract Motivation: Siberian sturgeon is a long-lived and late-maturing fish farmed for caviar production in 50 countries. Functional genomics makes it possible to find genes of interest for fish farming. In the absence of a reference genome, a reference transcriptome is very useful for sequencing-based functional studies. Results: We present here a high-quality transcriptome assembly database built using RNA-seq reads coming from brain, pituitary, gonadal, liver, stomach, kidney, anterior kidney, heart, embryonic and pre-larval tissues. It will facilitate crucial research on topics such as puberty, reproduction, growth, food intake and immunology. This database represents a major contribution to the publicly available sturgeon transcriptome reference datasets. Availability: The database is publicly available at http://siberiansturgeontissuedb.sigenae.org Supplementary information: Supplementary data are available at Database online.


2019 ◽  
Author(s):  
Jing Bing ◽  
Yunhe Ling ◽  
Peipei An ◽  
Enshi Xiao ◽  
Chunlian Li ◽  
...  

Abstract Background Silverleaf sunflower, Helianthus argophyllus, is one of the most important wild species used for the improvement of cultivated sunflower. Although a reference genome is now available for the cultivated species, H. annuus, its usefulness in helping to understand the mechanisms underlying the traits of H. argophyllus is limited by the substantial genomic variance between these two species. Results In this study, we generated a high-quality reference transcriptome of H. argophyllus using the Iso-seq strategy. This assembly contains 50,153 unique genes covering more than 91% of the whole genes. Among them, we find 205 genes that are absent in the cultivated species and 475 fusion genes containing components of coding or non-coding sequences from the genome of H. annuus. It is interesting that, in line with the strong disease resistance observed for H. argophyllus, these H. argophyllus-specific genes are predominantly related to functions of resistance. We have also profiled gene expression in leaf and root under normal or salt-stressed conditions and, as a result, find distinct transcriptomic responses to salt stress in leaf and root. In particular, genes involved in several critical processes, including the synthesis and metabolism of glutamate and carbohydrate transport, are reversely regulated in leaf and root. Conclusions Overall, this study provides insights into the genomic mechanisms underlying the disease resistance and salt tolerance of silverleaf sunflower, and the transcriptome assembly and the genes identified in this study can serve as a complementary data resource for future research and breeding programs of sunflowers.


2018 ◽  
Author(s):  
Jesse Kerkvliet ◽  
Arthur de Fouchier ◽  
Michiel van Wijk ◽  
Astrid T. Groot

Abstract Transcriptome quality control is an important step in RNA-seq experiments. However, the quality of de novo assembled transcriptomes is difficult to assess, due to the lack of a reference genome to compare the assembly to. We developed a method to assess and improve the quality of de novo assembled transcriptomes by focusing on the removal of chimeric sequences. These chimeric sequences can be the result of faultily assembled contigs, merging two transcripts into one. The developed method is incorporated into a pipeline that we named Bellerophon, which is broadly applicable and easy to use. Bellerophon first uses the quality-assessment tool TransRate to indicate the quality, after which it uses a Transcripts Per Million (TPM) filter to remove lowly expressed contigs and CD-HIT-EST to remove highly identical contigs. To validate the quality of this method, we performed three benchmark experiments: 1) a computational creation of chimeras, 2) identification of chimeric contigs in a transcriptome assembly, 3) a simulated RNA-seq experiment using a known reference transcriptome. Overall, the Bellerophon pipeline was able to remove between 40 and 91.9% of the chimeras in transcriptome assemblies and removed more chimeric than non-chimeric contigs. Thus, the Bellerophon sequence of filtration steps is a broadly applicable solution to improve transcriptome assemblies.
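
As a rough illustration of the TPM filtering step mentioned in the abstract above, the following minimal Python sketch computes Transcripts Per Million from read counts and contig lengths and then drops lowly expressed contigs. It is not Bellerophon's own code: the function names, the input dictionaries and the 1 TPM cut-off are assumptions made for this example only.

```python
from typing import Dict


def compute_tpm(counts: Dict[str, float], lengths: Dict[str, int]) -> Dict[str, float]:
    """Convert per-contig read counts to TPM using contig lengths in bases."""
    # Reads per kilobase: normalise each count by contig length.
    rpk = {contig: counts[contig] / (lengths[contig] / 1000.0) for contig in counts}
    total_rpk = sum(rpk.values())
    if total_rpk == 0:
        # No reads at all: every contig has zero expression.
        return {contig: 0.0 for contig in counts}
    # Scale so that the TPM values of all contigs sum to one million.
    return {contig: value / total_rpk * 1e6 for contig, value in rpk.items()}


def filter_by_tpm(tpm: Dict[str, float], min_tpm: float = 1.0) -> Dict[str, float]:
    """Keep only contigs at or above the threshold (the threshold is an assumption)."""
    return {contig: value for contig, value in tpm.items() if value >= min_tpm}


if __name__ == "__main__":
    # Hypothetical toy data, not taken from any of the studies listed here.
    toy_counts = {"contig_a": 100, "contig_b": 100, "contig_c": 0}
    toy_lengths = {"contig_a": 1000, "contig_b": 2000, "contig_c": 500}
    kept = filter_by_tpm(compute_tpm(toy_counts, toy_lengths))
    print(sorted(kept))
```

In practice the counts would come from a read quantification tool, and the threshold would need to be chosen for the dataset at hand.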


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e2988 ◽  
Author(s):  
Cédric Cabau ◽  
Frédéric Escudié ◽  
Anis Djari ◽  
Yann Guiguen ◽  
Julien Bobe ◽  
...  

Background De novo transcriptome assembly of short reads is now a common step in expression analysis of organisms lacking a reference genome sequence. Several software packages are available to perform this task. Even if their results are of good quality, it is still possible to improve them in several ways, including redundancy reduction or error correction. Trinity and Oases are two commonly used de novo transcriptome assemblers. The contig sets they produce are of good quality. Still, their compaction (number of contigs needed to represent the transcriptome) and their quality (chimera and nucleotide error rates) can be improved. Results We built a de novo RNA-Seq Assembly Pipeline (DRAP) which wraps these two assemblers (Trinity and Oases) in order to improve their results regarding the above-mentioned criteria. DRAP reduces the number of resulting contigs of the assemblies by 1.3- to 15-fold, depending on the read set and the assembler used. This article presents seven assembly comparisons showing, in some cases, drastic improvements when using DRAP. DRAP does not significantly impair assembly quality metrics such as read realignment rate or protein reconstruction counts. Conclusion Transcriptome assembly is a challenging computational task. Even if good solutions are already available to end-users, these solutions can still be improved while conserving the overall representation and quality of the assembly. The de novo RNA-Seq Assembly Pipeline (DRAP) is an easy-to-use software package to produce a compact and corrected transcript set. DRAP is free, open-source and available under the GPL V3 license at http://www.sigenae.org/drap.
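
The compaction figure quoted in the abstract above is a fold reduction in contig number. A minimal sketch of how such a figure could be computed from two FASTA files is given below; it is not part of DRAP, and the helper names and toy sequences are hypothetical.

```python
def count_contigs(fasta_text: str) -> int:
    """Count sequences in FASTA-formatted text by counting header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def fold_reduction(n_before: int, n_after: int) -> float:
    """Return how many times fewer contigs remain after compaction."""
    if n_after <= 0:
        raise ValueError("n_after must be a positive number of contigs")
    return n_before / n_after


if __name__ == "__main__":
    # Hypothetical toy assemblies: a raw contig set and a compacted one.
    raw_assembly = ">c1\nACGT\n>c2\nACGA\n>c3\nTTTT\n"
    compacted_assembly = ">c1\nACGT\n>c3\nTTTT\n"
    ratio = fold_reduction(count_contigs(raw_assembly), count_contigs(compacted_assembly))
    print(f"fold reduction: {ratio:.2f}")
```

A fold reduction on its own says nothing about whether information was lost, which is why the abstract also reports read realignment rate and protein reconstruction counts.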


2020 ◽  
Vol 21 (3) ◽  
pp. 1067 ◽  
Author(s):  
Zhaoyang Hu ◽  
Yufei Zhang ◽  
Yue He ◽  
Qingqing Cao ◽  
Ting Zhang ◽  
...  

Cadmium (Cd) is a toxic heavy metal element. It is relatively easily absorbed by plants and enters the food chain, resulting in human exposure to Cd. Italian ryegrass (Lolium multiflorum Lam.), an important forage cultivated widely in temperate regions worldwide, has the potential to be used in phytoremediation. However, genes regulating Cd translocation and accumulation in this species are not fully understood. Here, we optimized PacBio ISO-seq and integrated it with RNA-seq to construct a de novo full-length transcriptomic database for an un-sequenced autotetraploid species. With the database, we identified 2367 differentially expressed genes (DEGs) and profiled the molecular regulatory pathways of Italian ryegrass with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis in response to Cd stress. Overexpression of a DEG LmAUX1 in Arabidopsis thaliana significantly enhanced plant Cd concentration. We also unveiled the complexity of alternative splicing (AS) with a genome-free strategy. We reconstructed full-length UniTransModels using the reference transcriptome, and 29.76% of full-length models had more than one isoform. Taken together, the results enhanced our understanding of the genetic diversity and complexity of Italian ryegrass under Cd stress and provided valuable genetic resources for its gene identification and molecular breeding.


2015 ◽  
Vol 24 (1) ◽  
pp. 1-9 ◽  
Author(s):  
Kisun Pokharel ◽  
Jaana Peippo ◽  
Göran Andersson ◽  
Meng-Hua Li ◽  
Juha Kantanen

Finnsheep is one of the most prolific sheep breeds in the world. We sequenced RNA-Seq libraries from the ovaries of Finnsheep ewes collected during the out-of-season breeding period at about 30X sequence coverage. A total of 86 966 348 and 105 587 994 reads from two samples were mapped against the latest available ovine reference genome (Oarv3.1). The transcriptome assembly revealed 14 870 known ovine genes, including the 15 candidate genes for fertility and out-of-season breeding. In this study, we successfully used our bioinformatics pipeline to assemble the first ovarian transcriptome of Finnsheep.

