JAFFA: High sensitivity transcriptome-focused fusion gene detection.

2015 ◽  
Author(s):  
Nadia M Davidson ◽  
Ian J Majewski ◽  
Alicia Oshlack

Genomic instability is a hallmark of cancer and, as such, structural alterations and fusion genes are common events in the cancer landscape. RNA sequencing (RNA-Seq) is a powerful method for profiling cancers, but current methods for identifying fusion genes are optimized for short reads. JAFFA (https://code.google.com/p/jaffa-project/) is a sensitive fusion detection method that clearly out-performs other methods with reads of 100bp or greater. JAFFA compares a cancer transcriptome to the reference transcriptome, rather than the genome, where the cancer transcriptome is inferred using long reads directly or by de novo assembling short reads.

2019 ◽  
Vol 17 (03) ◽  
pp. 1940008 ◽  
Author(s):  
Yoshiaki Sota ◽  
Shigeto Seno ◽  
Hironori Shigeta ◽  
Naoki Osato ◽  
Masafumi Shimoda ◽  
...  

Fusion genes are involved in cancer, but their detection using RNA-Seq is hampered by the relatively short read length. We therefore propose a shifted short-read clustering (SSC) method, which focuses on overlapping reads from the same loci and extends them into a representative sequence. To verify its usefulness, we applied the SSC method to RNA-Seq data from four cell lines (BT-474, MCF-7, SKBR-3, and T-47D). As the slide width of the SSC method increased to one, two, five, or ten bases, the read length was extended from 201 bases to 217 (108%), 234 (116%), 282 (140%), or 317 (158%) bases, respectively. Furthermore, fusion genes were investigated using STAR-Fusion, a fusion gene detection tool, with and without the SSC method. When one base was shifted by the SSC method, the reads mapped to multiple loci decreased from 9.7% to 4.6%, and the sensitivity of fusion gene detection improved from 47% to 54% on average (BT-474: from 48% to 57%, MCF-7: 49% to 53%, SKBR-3: 50% to 57%, and T-47D: 43% to 50%) compared with the original data. When the reads were shifted more, the positive predictive value also improved. The SSC method could be an effective method for fusion gene detection.
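The idea of extending overlapping, shifted reads into one longer representative sequence can be illustrated with a toy example. The sketch below is not the SSC implementation; it assumes error-free reads and an exact fixed shift, and the function name `extend_shifted_reads` is hypothetical.

```python
def extend_shifted_reads(reads, shift):
    """Greedily chain reads in which each next read starts `shift` bases
    after the previous one, returning the merged representative sequence.

    Toy illustration only: assumes exact matches and a fixed shift.
    """
    if not reads:
        return ""
    contig = reads[0]
    current = reads[0]
    remaining = list(reads[1:])
    extended = True
    while extended:
        extended = False
        for i, candidate in enumerate(remaining):
            overlap = len(current) - shift
            # The candidate must begin with the current read minus its first `shift` bases.
            if (overlap > 0 and len(candidate) >= overlap
                    and current[shift:] == candidate[:overlap]):
                contig += candidate[overlap:]  # append only the new bases
                current = candidate
                remaining.pop(i)
                extended = True
                break
    return contig


# Three 6-base reads, each shifted by one base relative to the previous one.
merged = extend_shifted_reads(["ACGTAC", "CGTACG", "GTACGT"], shift=1)
print(merged, len(merged))
```

Each chained read contributes only the bases that extend past the overlap, which is why a small shift lengthens the representative sequence gradually.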


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7071
Author(s):  
Jakub Hynst ◽  
Karla Plevova ◽  
Lenka Radova ◽  
Vojtech Bystry ◽  
Karol Pal ◽  
...  

Background Extensive genome rearrangements, known as chromothripsis, have been recently identified in several cancer types. Chromothripsis leads to complex structural variants (cSVs) causing aberrant gene expression and the formation of de novo fusion genes, which can trigger cancer development, or worsen its clinical course. The functional impact of cSVs can be studied at the RNA level using whole transcriptome sequencing (total RNA-Seq). It represents a powerful tool for discovering, profiling, and quantifying changes of gene expression in the overall genomic context. However, bioinformatic analysis of transcriptomic data, especially in cases with cSVs, is a complex and challenging task, and the development of proper bioinformatic tools for transcriptome studies is necessary. Methods We designed a bioinformatic workflow for the analysis of total RNA-Seq data consisting of two separate parts (pipelines): The first pipeline incorporates a statistical solution for differential gene expression analysis in a biologically heterogeneous sample set. We utilized results from transcriptomic arrays which were carried out in parallel to increase the precision of the analysis. The second pipeline is used for the identification of de novo fusion genes. Special attention was given to the filtering of false positives (FPs), which was achieved through consensus fusion calling with several fusion gene callers. We applied the workflow to the data obtained from ten patients with chronic lymphocytic leukemia (CLL) to describe the consequences of their cSVs in detail. The fusion genes identified by our pipeline were correlated with genomic break-points detected by genomic arrays. Results We set up a novel solution for differential gene expression analysis of individual samples and de novo fusion gene detection from total RNA-Seq data. 
The results of the differential gene expression analysis were concordant with results obtained by transcriptomic arrays, which demonstrates the analytical capabilities of our method. We also showed that the consensus fusion gene detection approach was able to identify true positives (TPs) efficiently. Detected coordinates of fusion gene junctions were in concordance with genomic breakpoints assessed using genomic arrays. Discussion By applying our methods to real clinical samples, we proved that our approach for total RNA-Seq data analysis generates results consistent with other genomic analytical techniques. The data obtained by our analyses provided clues for the study of the biological consequences of cSVs with far-reaching implications for clinical outcome and management of cancer patients. The bioinformatic workflow is also widely applicable for addressing other research questions in different contexts, for which transcriptomic data are generated.
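Consensus fusion calling, as described in this abstract, can be sketched as a simple vote across callers. The abstract does not state which callers or which agreement threshold were used, so the caller names, gene pairs, and the `min_callers` parameter below are illustrative assumptions only.

```python
from collections import Counter


def consensus_fusions(calls_by_caller, min_callers=2):
    """Return fusions reported by at least `min_callers` different callers.

    `calls_by_caller` maps a caller name to an iterable of
    (five_prime_gene, three_prime_gene) tuples.
    """
    counts = Counter()
    for fusions in calls_by_caller.values():
        counts.update(set(fusions))  # count each caller at most once per fusion
    return sorted(f for f, n in counts.items() if n >= min_callers)


# Hypothetical caller outputs (names and gene pairs are placeholders).
calls = {
    "caller_1": [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")],
    "caller_2": [("GENE_A", "GENE_B")],
    "caller_3": [("GENE_A", "GENE_B"), ("GENE_E", "GENE_F")],
}
print(consensus_fusions(calls, min_callers=2))
```

Raising `min_callers` trades sensitivity for fewer false positives, which is the filtering effect the abstract attributes to the consensus approach.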


2017 ◽  
Author(s):  
Zhiqin Huang ◽  
David T.W. Jones ◽  
Yonghe Wu ◽  
Peter Lichter ◽  
Marc Zapatka

ABSTRACT Background Fusion genes play an important role in the tumorigenesis of many cancers. Next-generation sequencing (NGS) technologies have been successfully applied in fusion gene detection for the last several years, and a number of NGS-based tools have been developed for identifying fusion genes during this period. Most fusion gene detection tools based on RNA-seq data report a large number of candidates (mostly false positives), making it hard to prioritize candidates for experimental validation and further analysis. Selection of reliable fusion genes for downstream analysis becomes very important in cancer research. We therefore developed confFuse, a scoring algorithm to reliably select high-confidence fusion genes which are likely to be biologically relevant. Results ConfFuse takes multiple parameters into account in order to assign each fusion candidate a confidence score, where a score ≥8 indicates a high-confidence fusion gene prediction. These parameters were manually curated based on our experience and on certain structural motifs of fusion genes. Compared with alternative tools, based on 96 published RNA-seq samples from different tumor entities, our method can significantly reduce the number of fusion candidates (301 high-confidence from 8,083 total predicted fusion genes) and keep high detection accuracy (recovery rate 85.7%). Validation of 18 novel, high-confidence fusions detected in three breast tumor samples resulted in a 100% validation rate. Conclusions ConfFuse is a novel downstream filtering method that allows selection of highly reliable fusion gene candidates for further downstream analysis and experimental validations. confFuse is available at https://github.com/Zhiqin-HUANG/confFuse.
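The score cutoff described above amounts to a threshold filter on already-scored candidates. The sketch below does not reproduce confFuse's scoring (the abstract does not list its parameters); the record layout, gene names, and function name are hypothetical, and only the threshold of 8 and the counts 301 and 8,083 come from the abstract.

```python
def select_high_confidence(candidates, threshold=8):
    """Keep fusion candidates whose confidence score is at least `threshold`."""
    return [c for c in candidates if c["score"] >= threshold]


# Hypothetical, already-scored candidates (placeholder names and scores).
candidates = [
    {"fusion": "GENE_A--GENE_B", "score": 9},
    {"fusion": "GENE_C--GENE_D", "score": 5},
    {"fusion": "GENE_E--GENE_F", "score": 8},
]
high_confidence = select_high_confidence(candidates)
print([c["fusion"] for c in high_confidence])

# Fraction of candidates retained in the reported evaluation:
# 301 high-confidence fusions out of 8,083 predicted.
retained_fraction = 301 / 8083
print(f"{retained_fraction:.1%}")
```

In the reported evaluation the filter therefore keeps under 4% of all predicted candidates.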


Haematologica ◽  
2021 ◽  
Author(s):  
Paul Kerbs ◽  
Sebastian Vosberg ◽  
Stefan Krebs ◽  
Alexander Graf ◽  
Helmut Blum ◽  
...  

Identification of fusion genes in clinical routine is mostly based on cytogenetics and targeted molecular genetics, such as metaphase karyotyping, FISH and RT-PCR. However, sequencing technologies are becoming more important in clinical routine as processing-time and costs per sample decrease. To evaluate the performance of fusion gene detection by RNA sequencing (RNA-seq) compared to standard diagnostic techniques, we analyzed 806 RNA-seq samples from acute myeloid leukemia (AML) patients using two state-of-the-art software tools, namely Arriba and FusionCatcher. RNA-seq detected 90% of fusion events that were reported by routine with high evidence, while samples in which RNA-seq failed to detect fusion genes had overall lower and inhomogeneous sequence coverage. Based on properties of known and unknown fusion events, we developed a workflow with integrated filtering strategies for the identification of robust fusion gene candidates by RNA-seq. Thereby, we detected known recurrent fusion events in 26 cases that were not reported by routine and found discrepancies in evidence for known fusion events between routine and RNA-seq in three cases. Moreover, we identified 157 fusion genes as novel robust candidates and comparison to entries from ChimerDB or Mitelman Database showed novel recurrence of fusion genes in 14 cases. Finally, we detected the novel recurrent fusion gene NRIP1-MIR99AHG resulting from inv(21)(q11.2;q21.1) in nine patients (1.1%) and LTN1-MX1 resulting from inv(21)(q21.3;q22.3) in two patients (0.25%). We demonstrated that NRIP1-MIR99AHG results in overexpression of the 3' region of MIR99AHG and the disruption of the tricistronic miRNA cluster miR-99a/let-7c/miR-125b-2. Interestingly, upregulation of MIR99AHG and deregulation of the miRNA cluster, residing in the MIR99AHG locus, are known mechanisms of leukemogenesis in acute megakaryoblastic leukemia. 
Our findings demonstrate that RNA-seq has a strong potential to improve the systematic detection of fusion genes in clinical applications and provides a valuable tool for fusion discovery.


2019 ◽  
Author(s):  
Christopher A. Hilker ◽  
Aditya V. Bhagwate ◽  
Jin Sung Jang ◽  
Jeffrey G Meyer ◽  
Asha A. Nair ◽  
...  

Abstract Formalin-fixed paraffin-embedded (FFPE) tissues are commonly used biospecimens for clinical diagnosis. However, RNA isolated from FFPE blocks is extensively degraded, which makes whole transcriptome profiling (RNA-seq) challenging. Here, we examined RNA isolation methods, quality metrics, and the performance of RNA-seq using different approaches with RNA isolated from FFPE and fresh frozen (FF) tissues. We evaluated FFPE RNA extraction methods using six different tissues and five different methods. The reproducibility and quality of the prepared libraries from these RNAs were assessed by RNA-seq. We next examined the performance and reproducibility of RNA-seq for gene expression profiling with FFPE and FF samples using targeted (Kinome capture) and whole transcriptome capture based sequencing. Finally, we assessed Agilent SureSelect All-Exon V6+UTR capture and the Illumina TruSeq RNA Access protocols for their ability to detect known gene fusions in FFPE RNA samples. Although the overall yield of RNA varied among extraction methods, gene expression profiles generated by RNA-seq were highly correlated (>90%) when the input RNA was of sufficient quality (DV200 ≥ 30%) and quantity (≥ 100 ng). Using gene capture, we observed a linear relationship between gene expression levels for shared genes that were captured using either All-Exon or Kinome kits. Gene expression correlations between the two capture-based approaches were similar using RNA from FFPE and FF samples. However, the TruSeq RNA Access protocol provided significantly higher exon and junction reads when compared to the SureSelect All-Exon capture kit and was more sensitive for fusion gene detection. Our study established pre- and post-library construction QC parameters that are essential to reproducible RNA-seq profiling using FFPE samples. We show that gene capture based NGS sequencing is an efficient and highly reproducible strategy for gene expression measurements as well as fusion gene detection.


2020 ◽  
Vol 15 (1) ◽  
pp. 2-16
Author(s):  
Yuwen Luo ◽  
Xingyu Liao ◽  
Fang-Xiang Wu ◽  
Jianxin Wang

Transcriptome assembly plays a critical role in studying biological properties and examining the expression levels of genomes in specific cells. It is also the basis of many downstream analyses. With the increase of speed and the decrease in cost, massive sequencing data continues to accumulate. A large number of assembly strategies based on different computational methods and experiments have been developed. How to efficiently perform transcriptome assembly with high sensitivity and accuracy becomes a key issue. In this work, the issues with transcriptome assembly are explored based on different sequencing technologies. Specifically, transcriptome assemblies with next-generation sequencing reads are divided into reference-based assemblies and de novo assemblies. The examples of different species are used to illustrate that long reads produced by the third-generation sequencing technologies can cover full-length transcripts without assemblies. In addition, different transcriptome assemblies using the Hybrid-seq methods and other tools are also summarized. Finally, we discuss the future directions of transcriptome assemblies.


2021 ◽  
Author(s):  
Ridvan Eksi ◽  
Daiyao Yi ◽  
Hongyang Li ◽  
Bradley Godfrey ◽  
Lisa R. Mathew ◽  
...  

Abstract Studying isoform expression at the microscopic level has always been a challenging task. A classical example is the kidney, where glomerular and tubulo-interstitial compartments carry out drastically different physiological functions and thus presumably their isoform expression also differs. We aim at developing an experimental and computational pipeline for identifying isoforms at the microscopic structure level. We microdissected glomerular and tubulo-interstitial compartments from healthy human kidney tissues from two cohorts. The two compartments were separately sequenced with the PacBio RS II platform. These transcripts were then validated using transcripts of the same samples by the traditional Illumina RNA-Seq protocol, distinct Illumina RNA-Seq short reads from European Renal cDNA Bank (ERCB) samples, and the annotated GENCODE transcript list, thus identifying novel transcripts. We identified 14,739 and 14,259 annotated transcripts, and 17,268 and 13,118 potentially novel transcripts in the glomerular and tubulo-interstitial compartments, respectively. Of note, relying solely on either short or long reads would have resulted in many erroneous identifications. We identified distinct pathways involved in glomerular and tubulo-interstitial compartments at the isoform level. We demonstrated the possibility of micro-dissecting a tissue, incorporating both long- and short-read sequencing to identify isoforms for each compartment.


2019 ◽  
Author(s):  
Yifan Yang ◽  
Michael Gribskov

Abstract RNA-Seq de novo assembly is an important method for generating transcriptomes of non-model organisms before any downstream analysis. Although many de novo assembly methods have been developed, there is still no consensus on how to evaluate them. Setting up a benchmark for evaluating the quality of de novo assemblies is therefore critical. Addressing this challenge will deepen our insight into the properties of different de novo assemblers and their evaluation methods, and provide hints on choosing the best assembly sets as transcriptomes of non-model organisms for further functional analysis. In this article, we generate a “real time” transcriptome using PacBio long reads as a benchmark for evaluating five de novo assemblers and two model-based de novo assembly evaluation methods. By comparing the de novo assemblies generated from RNA-Seq short reads with the “real time” transcriptome from the same biological sample, we find that Trinity is best at completeness, generating more assemblies than the alternative assemblers, but is less continuous and has more misassemblies; Oases is best at continuity and specificity, but is less complete; and the performance of SOAPdenovo-Trans, Trans-ABySS and IDBA-Tran falls in between. For evaluation methods, DETONATE leverages multiple aspects of the assembly set and ranks the assembly set with an average performance as the best, while the contig score can serve as a good metric for selecting assemblies with high completeness, specificity and continuity but is not sensitive to misassemblies; the TransRate contig score is useful for removing misassemblies, yet the optimal set often contains too few assemblies to be used as a transcriptome.


2019 ◽  
Author(s):  
Christina Stangl ◽  
Sam de Blank ◽  
Ivo Renkens ◽  
Tamara Verbeek ◽  
Jose Espejo Valle-Inclan ◽  
...  

Abstract Fusion genes are hallmarks of various cancer types and important determinants for diagnosis, prognosis and treatment possibilities. The promiscuity of fusion genes with respect to partner choice and exact breakpoint-positions restricts their detection in the diagnostic setting, even for known and recurrent fusion gene configurations. To accurately identify these gene fusions in an unbiased manner, we developed FUDGE: a FUsion gene Detection assay from Gene Enrichment. FUDGE couples target-selected and strand-specific CRISPR/Cas9 activity for enrichment and detection of fusion gene drivers (e.g. BRAF, EWSR1, KMT2A/MLL) - without prior knowledge of fusion partner or breakpoint-location - to long-read Nanopore sequencing. FUDGE encompasses a dedicated bioinformatics approach (NanoFG) to detect fusion genes from Nanopore sequencing data. Our strategy is flexible with respect to target choice and enables multiplexed enrichment for simultaneous analysis of several genes in multiple samples in a single sequencing run. We observe on average a 508 fold on-target enrichment and identify fusion breakpoints at nucleotide resolution - all within two days. We demonstrate that FUDGE effectively identifies fusion genes in cancer cell lines, tumor samples and on whole genome amplified DNA irrespective of partner gene or breakpoint-position in 100% of cases. Furthermore, we show that FUDGE is superior to routine diagnostic methods for fusion gene detection. In summary, we have developed a rapid and versatile fusion gene detection assay, providing an unparalleled opportunity for pan-cancer detection of fusion genes in routine diagnostics.


2019 ◽  
Author(s):  
Patrick Sorn ◽  
Christoph Holtsträter ◽  
Martin Löwer ◽  
Ugur Sahin ◽  
David Weber

Abstract Motivation Gene fusions are an important class of transcriptional variants that can influence cancer development and can be predicted from RNA sequencing (RNA-seq) data by multiple existing tools. However, the real-world performance of these tools is unclear due to the lack of known positive and negative events, especially with regard to fusion genes in individual samples. Often simulated reads are used, but these cannot account for all technical biases in RNA-seq data generated from real samples. Results Here, we present ArtiFuse, a novel approach that simulates fusion genes by sequence modification to the genomic reference, and therefore, can be applied to any RNA-seq dataset without the need for any simulated reads. We demonstrate our approach on eight RNA-seq datasets for three fusion gene prediction tools: average recall values peak for all three tools between 0.4 and 0.56 for high-quality and high-coverage datasets. As ArtiFuse affords total control over involved genes and breakpoint position, we also assessed performance with regard to gene-related properties, showing a drop in recall for low-expressed genes in high-coverage samples and for genes with co-expressed paralogues. Overall tool performance assessed from ArtiFusions is lower compared to previously reported estimates on simulated reads. Due to the use of real RNA-seq datasets, we believe that ArtiFuse provides a more realistic benchmark that can be used to develop more accurate fusion gene prediction tools for application in clinical settings. Availability and implementation ArtiFuse is implemented in Python. The source code and documentation are available at https://github.com/TRON-Bioinformatics/ArtiFusion. Supplementary information Supplementary data are available at Bioinformatics online.
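The recall values quoted above follow the usual definition: the fraction of known (here, artificially introduced) fusions that a tool reports. A minimal sketch, assuming fusions are represented as gene-pair strings; the function name and the example identifiers are hypothetical and are not part of ArtiFuse.

```python
def recall(known_fusions, predicted_fusions):
    """Fraction of known fusions that appear among the predictions."""
    known = set(known_fusions)
    if not known:
        raise ValueError("recall is undefined without known fusions")
    return len(known & set(predicted_fusions)) / len(known)


# Hypothetical example: five introduced fusions, two of them recovered.
known = ["F1", "F2", "F3", "F4", "F5"]
predicted = ["F2", "F5", "X9"]  # "X9" is a prediction outside the known set
print(recall(known, predicted))
```

Predictions outside the known set do not affect recall; they would count against precision instead.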

