Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq

Abstract

Summary: The possibility of generating large RNA-seq datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in transcriptomic studies, their application is limited to model organisms with finished and annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, complicated by varying expression levels across genes, alternative splicing and paralogous genes. In this paper we describe a novel transcriptome assembler called rnaSPAdes, which is developed on top of the SPAdes genome assembler and explores surprising computational parallels between the assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-Seq datasets, and briefly highlight the strong and weak points of different assemblers.

Availability and implementation: rnaSPAdes is implemented in C++ and Python and is freely available at cab.spbu.ru/software/rnaspades/.


In Silico identification and annotation of non-coding RNAs by RNA-seq and De Novo assembly of the transcriptome of Tomato Fruits

PLoS ONE ◽ 10.1371/journal.pone.0171504 ◽ 2017 ◽ Vol 12 (2) ◽ pp. e0171504 ◽ Cited By ~ 14

Author(s): Daria Scarano ◽ Rosa Rao ◽ Giandomenico Corrado

Keyword(s): De Novo Assembly ◽ In Silico ◽ De Novo ◽ RNA-Seq ◽ Tomato Fruits ◽ In Silico Identification ◽ Non-Coding RNAs


RNA-Seq De Novo Assembly of Clonal Immunoglobulin Rearrangements Identifies Interesting Biology and Uncovers Prognostic Features in Multiple Myeloma

Blood ◽ 10.1182/blood.v128.22.195.195 ◽ 2016 ◽ Vol 128 (22) ◽ pp. 195-195

Author(s): David Mosen-Ansorena ◽ Rachael Bashford-Rogers ◽ Niccolo Bolli ◽ Stephane Minvielle ◽ Florence Magrangeas ◽ ...

Keyword(s): Multiple Myeloma ◽ Light Chain ◽ De Novo Assembly ◽ Poor Prognosis ◽ De Novo ◽ Clinical Course ◽ RNA-Seq ◽ Prognostic Features ◽ VH Gene ◽ Cluster A

Abstract

Introduction: Although monoclonal immunoglobulin (Ig) production by myeloma cells is one of the central features of the disease, genotypic identification of the clonal Ig sequence remains understudied in multiple myeloma (MM). Here, using extensive RNA-seq data, we study molecular features of clonal Ig rearrangements, as well as their association with other MM markers and patient outcome.

Methods: We performed deep RNA-seq on purified CD138+ MM cells from 429 newly-diagnosed uniformly-treated patients with long clinical follow-up. For each sample, we performed de novo assembly using sequences that appeared in the library with a frequency of at least one in a million. Germline V and J genes were then BLASTed against the assembled contigs to determine the clonal germline genes and pinpoint mutations. Using the sequences reconstructed from the Ig contigs and the BLAST output, we ran IgBLAST to fully characterize the predominant Ig V(D)J sequence.

Results: We tested the accuracy of our approach by looking at 24 technical duplicates and one triplicate. In all cases, the predicted gene and gene allele were consistent across replicates. Next, we evaluated our large patient cohort, identifying IGHV3 as the most common clonal VH gene subgroup (53.3%), followed by IGHV4 (17.8%) and IGHV1 (15.6%). Importantly, we observed a significant association between poorer prognosis and IGHV3, both for progression-free survival (PFS) (p=0.0019) and overall survival (OS) (p=0.012). IGHV3-30 (11%, the most commonly rearranged VH gene) and IGHV3-9 (4.8%) were the drivers behind this poor prognosis (IGHV3-30: PFS p=0.021; OS p=0.013) (IGHV3-9: PFS p=0.002). IGHV3-30 was even more preferentially rearranged than in normal B-cell VH repertoires from previous studies (8.5%, 6.3%) and ours (2%). Remarkably, these results sharply contrast with what has been observed in CLL. In this malignancy, IGHV3-30 use has been seen to be underrepresented and usually characterizes an indolent clinical course, while IGHV3-21 and possibly IGHV3-23 carry poor prognosis.

We predicted light chain usage through the presence of clonal VL sequences. The most frequent VL genes were from the κ locus (69.4% total): IGKV1-33 (12.4%), IGKV1-5 (11.3%), IGKV3-20 (9.9%) and IGKV1-39 (8.0%). Del(22q) was observed more frequently in patients with IGλ (OR=10.0, p=6e-15) and, within this group, del(22q) was more frequent if Vλ belonged to the more centromeric V-clusters C or B, in contrast to cluster A (OR=8.4, p=5e-4). Remarkably, patients with Vλ gene from cluster A presented worse OS (vs. Vκ: p=0.0079; vs. Vλ B,C: p=0.067).

The proportion of mutated bases was higher in the heavy chain than in the light chain (mean 7.0% vs. 4.8%, max 14.6% vs. 14.3%), and it was associated with OS (heavy p=0.0020, light p=0.036, both=0.0056), but not PFS. Interestingly, mutated Ig in CLL results in a more benign clinical course. We further found that 24.9% and 22.7% of the mutations lay within WRCY or RGYW AID motifs in the light and heavy chains respectively (enrichment p<1e-16), while AID mutations in a TW or WA context accounted for 22.9% and 25.7% (p=0.14, p=0.64). Higher ratios of mutations in WRCY vs. RGYW motifs within the light chain were highly predictive of poor prognosis (PFS p=0.0019, OS p=6.3e-4). Strikingly, IGλ usage was linked to higher ratios (p=3e-6), an association not explained by germline sequence variability (p=0.24). The usage of IGHV3 genes and the AID WRCY/RGYW motif ratio were independent markers of each other (p=1) and of other markers of poor prognosis in MM, such as presence of either t(4;14) or del(17p) (IGHV3 p=0.10; motif ratio p=0.49).

Conclusion: De novo Ig heavy and light chain assembly using RNA-seq identifies interesting biology, may provide MM markers and highlights a novel application of high-throughput genomics.
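The WRCY/RGYW motif counts reported above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`hotspot_positions`, `motif_counts`) are hypothetical, the germline and observed sequences are assumed to be already aligned and gap-free, and a mutation is counted only when it hits the canonical hotspot base (the C of WRCY or the G of RGYW), which is one possible reading of "lay within" a motif. IUPAC codes are expanded as W = A/T, R = A/G, Y = C/T.

```python
import re

# IUPAC codes: W = A/T, R = A/G, Y = C/T.
# Zero-width lookaheads so that overlapping motif occurrences are all found.
WRCY = re.compile(r"(?=[AT][AG]C[CT])")
RGYW = re.compile(r"(?=[AG]G[CT][AT])")


def hotspot_positions(germline):
    """Return (wrcy, rgyw): sets of 0-based indices of the hotspot C and G."""
    seq = germline.upper()
    wrcy = {m.start() + 2 for m in WRCY.finditer(seq)}  # C is the 3rd base of WRCY
    rgyw = {m.start() + 1 for m in RGYW.finditer(seq)}  # G is the 2nd base of RGYW
    return wrcy, rgyw


def motif_counts(germline, observed):
    """Count substitutions overall and at WRCY / RGYW hotspot bases.

    Assumption: both sequences are pre-aligned, gap-free and of equal length.
    """
    if len(germline) != len(observed):
        raise ValueError("sequences must be aligned to equal length")
    wrcy, rgyw = hotspot_positions(germline)
    pairs = zip(germline.upper(), observed.upper())
    mutated = [i for i, (g, o) in enumerate(pairs) if g != o]
    return {
        "total": len(mutated),
        "wrcy": sum(i in wrcy for i in mutated),
        "rgyw": sum(i in rgyw for i in mutated),
    }


# Toy example (made-up sequences, not data from the study).
counts = motif_counts("AACTGGGAGCTCC", "AATTGCGAACTCC")
print(counts)
```

A per-sample WRCY vs. RGYW ratio could then be formed from the `wrcy` and `rgyw` counts. How zero counts are handled, and whether mutations elsewhere in the 4-mer window should also count, are choices the abstract does not specify.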
Disclosures: Anderson: OncoPep Inc.: Equity Ownership, Membership on an entity's Board of Directors or advisory committees. Avet-Loiseau: Sanofi: Consultancy; Celgene: Consultancy; Amgen: Consultancy; Janssen: Consultancy.
