confFuse: high-confidence fusion gene detection across tumor entities

Abstract. Background: Fusion genes play an important role in the tumorigenesis of many cancers. Next-generation sequencing (NGS) technologies have been applied successfully to fusion gene detection for several years, and a number of NGS-based tools for identifying fusion genes have been developed during this period. Most fusion gene detection tools based on RNA-seq data report a large number of candidates, most of them false positives, which makes it hard to prioritize candidates for experimental validation and further analysis. Selecting reliable fusion genes for downstream analysis is therefore very important in cancer research. We developed confFuse, a scoring algorithm that reliably selects high-confidence fusion genes which are likely to be biologically relevant. Results: ConfFuse takes multiple parameters into account to assign each fusion candidate a confidence score, where a score ≥8 indicates a high-confidence fusion gene prediction. These parameters were manually curated based on our experience and on certain structural motifs of fusion genes. Compared with alternative tools, based on 96 published RNA-seq samples from different tumor entities, our method significantly reduces the number of fusion candidates (301 high-confidence out of 8,083 total predicted fusion genes) while keeping detection accuracy high (recovery rate 85.7%). Validation of 18 novel, high-confidence fusions detected in three breast tumor samples resulted in a 100% validation rate. Conclusions: ConfFuse is a novel downstream filtering method that allows selection of highly reliable fusion gene candidates for further downstream analysis and experimental validation. confFuse is available at https://github.com/Zhiqin-HUANG/confFuse.
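The thresholding step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not confFuse's implementation: the `FusionCandidate` record, its field names, and the example scores are assumptions made here for illustration. Only the cutoff (score ≥8) and the counts (301 of 8,083) come from the abstract.

```python
from dataclasses import dataclass

HIGH_CONFIDENCE_CUTOFF = 8  # cutoff stated in the abstract (score >= 8)


@dataclass(frozen=True)
class FusionCandidate:
    # Hypothetical record; confFuse's real output columns may differ.
    gene5: str
    gene3: str
    score: int


def select_high_confidence(candidates, cutoff=HIGH_CONFIDENCE_CUTOFF):
    """Keep only candidates whose confidence score meets the cutoff."""
    return [c for c in candidates if c.score >= cutoff]


# Made-up candidates, for illustration only.
candidates = [
    FusionCandidate("GENE_A", "GENE_B", 9),
    FusionCandidate("GENE_C", "GENE_D", 5),
    FusionCandidate("GENE_E", "GENE_F", 8),
]
kept = select_high_confidence(candidates)

# Share of candidates retained in the abstract's benchmark: 301 of 8,083.
retained_fraction = 301 / 8083
print(f"kept {len(kept)} of {len(candidates)} toy candidates")
print(f"benchmark retention: {retained_fraction:.1%}")
```

The benchmark figures imply that under 4% of the predicted fusions pass the filter, which is what makes manual follow-up of the remaining candidates practical.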


confFuse: High-Confidence Fusion Gene Detection across Tumor Entities

Frontiers in Genetics ◽

10.3389/fgene.2017.00137 ◽

2017 ◽

Vol 8 ◽

Cited By ~ 7

Author(s):

Zhiqin Huang ◽

David T. W. Jones ◽

Yonghe Wu ◽

Peter Lichter ◽

Marc Zapatka

Keyword(s):

Fusion Gene ◽

Gene Detection ◽

High Confidence ◽

Tumor Entities


Improvement of detection performance of fusion genes from RNA-seq data by clustering short reads

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720019400080 ◽

2019 ◽

Vol 17 (03) ◽

pp. 1940008 ◽

Cited By ~ 1

Author(s):

Yoshiaki Sota ◽

Shigeto Seno ◽

Hironori Shigeta ◽

Naoki Osato ◽

Masafumi Shimoda ◽

...

Keyword(s):

Fusion Gene ◽

Original Data ◽

Read Length ◽

Fusion Genes ◽

Rna Seq ◽

Gene Detection ◽

Representative Sequence ◽

Multiple Loci ◽

Detection Tool ◽

Mcf 7

Fusion genes are involved in cancer, but their detection using RNA-Seq is limited by the relatively short read length. We therefore propose a shifted short-read clustering (SSC) method, which focuses on overlapping reads from the same loci and extends them into a representative sequence. To verify its usefulness, we applied the SSC method to RNA-Seq data from four cell lines (BT-474, MCF-7, SKBR-3, and T-47D). As the slide width of the SSC method increased to one, two, five, or ten bases, the read length was extended from 201 bases to 217 (108%), 234 (116%), 282 (140%), or 317 (158%) bases, respectively. Furthermore, fusion genes were investigated using STAR-Fusion, a fusion gene detection tool, with and without the SSC method. When reads were shifted by one base, the proportion of reads mapped to multiple loci decreased from 9.7% to 4.6%, and the sensitivity of fusion gene detection improved from 47% to 54% on average (BT-474: 48% to 57%, MCF-7: 49% to 53%, SKBR-3: 50% to 57%, and T-47D: 43% to 50%) compared with the original data. With larger shifts, the positive predictive value also improved. The SSC method could be an effective method for fusion gene detection.
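The idea of extending overlapping reads into a longer representative sequence can be illustrated with a toy sketch. This is not the authors' SSC implementation: the function names, the exact-match overlap rule, and the toy sequence are assumptions chosen to keep the example small. The read lengths at the end are the ones reported in the abstract.

```python
def extend_with_shifted_read(seed, read, shift):
    """Extend `seed` with `read` if `read` overlaps `seed` exactly,
    starting `shift` bases into `seed`. Returns None when it does not."""
    if not 0 < shift < len(seed):
        return None
    overlap = seed[shift:]
    if not read.startswith(overlap):
        return None
    return seed + read[len(overlap):]


def build_representative(reads, shift):
    """Chain equal-length reads, each starting `shift` bases after the
    previous one, into a single representative sequence."""
    representative = reads[0]
    for i, read in enumerate(reads[1:], start=1):
        extended = extend_with_shifted_read(representative, read, i * shift)
        if extended is None:
            break  # stop at the first read that does not fit
        representative = extended
    return representative


# Toy locus and three 6-base reads, each shifted by one base.
locus = "ACGTACGGTCA"
reads = [locus[i:i + 6] for i in range(3)]
representative = build_representative(reads, shift=1)

# Extension ratios implied by the read lengths quoted in the abstract.
original_length = 201
extended_lengths = [217, 234, 282, 317]
ratios = [round(100 * n / original_length) for n in extended_lengths]
```

In this toy case three 6-base reads yield an 8-base representative. A real implementation would also have to tolerate sequencing errors and decide which reads belong to the same locus, which this sketch does not attempt.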


JAFFA: High sensitivity transcriptome-focused fusion gene detection.

10.1101/013698 ◽

2015 ◽

Author(s):

Nadia M Davidson ◽

Ian J Majewski ◽

Alicia Oshlack

Keyword(s):

De Novo ◽

Fusion Gene ◽

High Sensitivity ◽

Fusion Genes ◽

Rna Seq ◽

Gene Detection ◽

Short Reads ◽

Long Reads ◽

Cancer Transcriptome ◽

De Novo Assembling

Genomic instability is a hallmark of cancer and, as such, structural alterations and fusion genes are common events in the cancer landscape. RNA sequencing (RNA-Seq) is a powerful method for profiling cancers, but current methods for identifying fusion genes are optimized for short reads. JAFFA (https://code.google.com/p/jaffa-project/) is a sensitive fusion detection method that clearly outperforms other methods with reads of 100 bp or longer. JAFFA compares a cancer transcriptome to the reference transcriptome, rather than the genome, where the cancer transcriptome is inferred either directly from long reads or by de novo assembly of short reads.


Fusion gene detection by RNA sequencing complements diagnostics of acute myeloid leukemia and identifies recurring NRIP1-MIR99AHG rearrangements

Haematologica ◽

10.3324/haematol.2021.278436 ◽

2021 ◽

Author(s):

Paul Kerbs ◽

Sebastian Vosberg ◽

Stefan Krebs ◽

Alexander Graf ◽

Helmut Blum ◽

...

Keyword(s):

Acute Myeloid Leukemia ◽

Rna Sequencing ◽

Myeloid Leukemia ◽

Fusion Gene ◽

Mirna Cluster ◽

Fusion Genes ◽

Rna Seq ◽

Gene Detection ◽

Clinical Routine ◽

Acute Myeloid

Identification of fusion genes in clinical routine is mostly based on cytogenetics and targeted molecular genetics, such as metaphase karyotyping, FISH and RT-PCR. However, sequencing technologies are becoming more important in clinical routine as processing time and costs per sample decrease. To evaluate the performance of fusion gene detection by RNA sequencing (RNA-seq) compared to standard diagnostic techniques, we analyzed 806 RNA-seq samples from acute myeloid leukemia (AML) patients using two state-of-the-art software tools, namely Arriba and FusionCatcher. RNA-seq detected 90% of fusion events that were reported by routine diagnostics with high evidence, while samples in which RNA-seq failed to detect fusion genes had overall lower and inhomogeneous sequence coverage. Based on properties of known and unknown fusion events, we developed a workflow with integrated filtering strategies for the identification of robust fusion gene candidates by RNA-seq. With this workflow, we detected known recurrent fusion events in 26 cases that were not reported by routine diagnostics and found discrepancies in evidence for known fusion events between routine diagnostics and RNA-seq in three cases. Moreover, we identified 157 fusion genes as novel robust candidates, and comparison to entries from ChimerDB or the Mitelman Database showed novel recurrence of fusion genes in 14 cases. Finally, we detected the novel recurrent fusion gene NRIP1-MIR99AHG resulting from inv(21)(q11.2;q21.1) in nine patients (1.1%) and LTN1-MX1 resulting from inv(21)(q21.3;q22.3) in two patients (0.25%). We demonstrated that NRIP1-MIR99AHG results in overexpression of the 3' region of MIR99AHG and in disruption of the tricistronic miRNA cluster miR-99a/let-7c/miR-125b-2. Interestingly, upregulation of MIR99AHG and deregulation of the miRNA cluster residing in the MIR99AHG locus are known mechanisms of leukemogenesis in acute megakaryoblastic leukemia. Our findings demonstrate that RNA-seq has strong potential to improve the systematic detection of fusion genes in clinical applications and provides a valuable tool for fusion discovery.


Impact of RNA Extraction and Target Capture Methods on RNA Sequencing Using Formalin-Fixed, Paraffin Embedded Tissues

10.1101/656736 ◽

2019 ◽

Author(s):

Christopher A. Hilker ◽

Aditya V. Bhagwate ◽

Jin Sung Jang ◽

Jeffrey G Meyer ◽

Asha A. Nair ◽

...

Keyword(s):

Gene Expression ◽

Fusion Gene ◽

Rna Extraction ◽

Extraction Methods ◽

Rna Seq ◽

Gene Detection ◽

Formalin Fixed Paraffin ◽

Formalin Fixed Paraffin Embedded ◽

Whole Transcriptome ◽

Formalin Fixed

Abstract: Formalin-fixed paraffin-embedded (FFPE) tissues are a commonly used biospecimen for clinical diagnosis. However, RNA isolated from FFPE blocks is extensively degraded, which makes whole transcriptome profiling (RNA-seq) challenging. Here, we examined RNA isolation methods, quality metrics, and the performance of RNA-seq using different approaches with RNA isolated from FFPE and fresh frozen (FF) tissues. We evaluated FFPE RNA extraction methods using six different tissues and five different methods. The reproducibility and quality of the libraries prepared from these RNAs were assessed by RNA-seq. We next examined the performance and reproducibility of RNA-seq for gene expression profiling with FFPE and FF samples using targeted (Kinome capture) and whole transcriptome capture based sequencing. Finally, we assessed the Agilent SureSelect All-Exon V6+UTR capture and the Illumina TruSeq RNA Access protocols for their ability to detect known gene fusions in FFPE RNA samples. Although the overall yield of RNA varied among extraction methods, gene expression profiles generated by RNA-seq were highly correlated (>90%) when the input RNA was of sufficient quality (DV200 ≥ 30%) and quantity (≥ 100 ng). Using gene capture, we observed a linear relationship between gene expression levels for shared genes that were captured using either the All-Exon or the Kinome kit. Gene expression correlations between the two capture-based approaches were similar using RNA from FFPE and FF samples. However, the TruSeq RNA Access protocol provided significantly more exon and junction reads than the SureSelect All-Exon capture kit and was more sensitive for fusion gene detection. Our study established pre- and post-library-construction QC parameters that are essential to reproducible RNA-seq profiling using FFPE samples. We show that gene capture based NGS is an efficient and highly reproducible strategy for gene expression measurements as well as fusion gene detection.
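The input-quality criteria quoted in this abstract (DV200 of at least 30% and at least 100 ng of RNA) can be expressed as a simple pre-library gate. The sketch below is illustrative only: the function name, parameter names, and example samples are assumptions, and the study's full QC procedure is not described in the abstract.

```python
MIN_DV200_PERCENT = 30.0  # quality threshold quoted in the abstract
MIN_INPUT_NG = 100.0      # quantity threshold quoted in the abstract


def passes_input_qc(dv200_percent, input_ng):
    """Return True when an RNA sample meets both quoted input thresholds."""
    return dv200_percent >= MIN_DV200_PERCENT and input_ng >= MIN_INPUT_NG


# Hypothetical samples: (name, DV200 in percent, input amount in ng).
samples = [
    ("sample_1", 45.0, 250.0),
    ("sample_2", 22.0, 400.0),  # fails on quality
    ("sample_3", 38.0, 60.0),   # fails on quantity
]
passing = [name for name, dv200, ng in samples if passes_input_qc(dv200, ng)]
```

Both conditions must hold together, so a sample with plenty of heavily degraded RNA is rejected just as a high-quality sample with too little material is.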


Partner-independent fusion gene detection by multiplexed CRISPR/Cas9 enrichment and long-read Nanopore sequencing

10.1101/807545 ◽

2019 ◽

Cited By ~ 1

Author(s):

Christina Stangl ◽

Sam de Blank ◽

Ivo Renkens ◽

Tamara Verbeek ◽

Jose Espejo Valle-Inclan ◽

...

Keyword(s):

Fusion Gene ◽

Diagnostic Methods ◽

Partner Choice ◽

Fusion Partner ◽

Nanopore Sequencing ◽

Fusion Genes ◽

Sequencing Data ◽

Gene Detection ◽

Detection Assay ◽

Long Read

Abstract: Fusion genes are hallmarks of various cancer types and important determinants for diagnosis, prognosis and treatment possibilities. The promiscuity of fusion genes with respect to partner choice and exact breakpoint positions restricts their detection in the diagnostic setting, even for known and recurrent fusion gene configurations. To accurately identify these gene fusions in an unbiased manner, we developed FUDGE: a FUsion gene Detection assay from Gene Enrichment. FUDGE couples target-selected and strand-specific CRISPR/Cas9 activity for enrichment and detection of fusion gene drivers (e.g. BRAF, EWSR1, KMT2A/MLL), without prior knowledge of fusion partner or breakpoint location, to long-read Nanopore sequencing. FUDGE encompasses a dedicated bioinformatics approach (NanoFG) to detect fusion genes from Nanopore sequencing data. Our strategy is flexible with respect to target choice and enables multiplexed enrichment for simultaneous analysis of several genes in multiple samples in a single sequencing run. We observe on average a 508-fold on-target enrichment and identify fusion breakpoints at nucleotide resolution, all within two days. We demonstrate that FUDGE effectively identifies fusion genes in cancer cell lines, tumor samples and on whole genome amplified DNA irrespective of partner gene or breakpoint position in 100% of cases. Furthermore, we show that FUDGE is superior to routine diagnostic methods for fusion gene detection. In summary, we have developed a rapid and versatile fusion gene detection assay, providing an unparalleled opportunity for pan-cancer detection of fusion genes in routine diagnostics.


ArtiFuse—computational validation of fusion gene detection tools without relying on simulated reads

Bioinformatics ◽

10.1093/bioinformatics/btz613 ◽

2019 ◽

Author(s):

Patrick Sorn ◽

Christoph Holtsträter ◽

Martin Löwer ◽

Ugur Sahin ◽

David Weber

Keyword(s):

Fusion Gene ◽

Gene Prediction ◽

Supplementary Information ◽

Fusion Genes ◽

Rna Seq ◽

High Coverage ◽

Prediction Tools ◽

Novel Approach ◽

Tool Performance ◽

Transcriptional Variants

Abstract. Motivation: Gene fusions are an important class of transcriptional variants that can influence cancer development and can be predicted from RNA sequencing (RNA-seq) data by multiple existing tools. However, the real-world performance of these tools is unclear due to the lack of known positive and negative events, especially with regard to fusion genes in individual samples. Often simulated reads are used, but these cannot account for all technical biases in RNA-seq data generated from real samples. Results: Here, we present ArtiFuse, a novel approach that simulates fusion genes by sequence modification of the genomic reference and can therefore be applied to any RNA-seq dataset without the need for simulated reads. We demonstrate our approach on eight RNA-seq datasets for three fusion gene prediction tools: average recall values peak for all three tools between 0.4 and 0.56 for high-quality and high-coverage datasets. As ArtiFuse affords total control over the involved genes and breakpoint positions, we also assessed performance with regard to gene-related properties, showing a drop in recall for low-expressed genes in high-coverage samples and for genes with co-expressed paralogues. Overall tool performance assessed from ArtiFusions is lower than previously reported estimates based on simulated reads. Due to the use of real RNA-seq datasets, we believe that ArtiFuse provides a more realistic benchmark that can be used to develop more accurate fusion gene prediction tools for application in clinical settings. Availability and implementation: ArtiFuse is implemented in Python. The source code and documentation are available at https://github.com/TRON-Bioinformatics/ArtiFusion. Supplementary information: Supplementary data are available at Bioinformatics online.
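Recall, the metric reported in this abstract, is the fraction of known (here, artificially introduced) fusions that a tool recovers. The sketch below shows the calculation on made-up data. It is not part of ArtiFuse: the function name, the representation of a fusion as an ordered gene pair, and the example gene names are assumptions.

```python
def fusion_recall(planted, predicted):
    """Fraction of planted fusions that appear among the predictions.

    Fusions are represented as (5' gene, 3' gene) tuples; this pairing
    convention is an assumption made for the example.
    """
    planted_set = set(planted)
    if not planted_set:
        raise ValueError("at least one planted fusion is required")
    recovered = planted_set & set(predicted)
    return len(recovered) / len(planted_set)


# Five hypothetical artificial fusions introduced into the reference.
planted = [("G1", "G2"), ("G3", "G4"), ("G5", "G6"), ("G7", "G8"), ("G9", "G10")]

# A hypothetical tool output: two true recoveries and one unrelated call.
predicted = [("G1", "G2"), ("G7", "G8"), ("G11", "G12")]

recall = fusion_recall(planted, predicted)
```

Calls that match no planted fusion, such as the third prediction here, do not affect recall. They would count against precision, which needs known negative events to estimate.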


Deep Learning Approach to Genomic Breakage Study from Primary Sequence

10.1101/2021.06.03.446904 ◽

2021 ◽

Author(s):

Pora Kim ◽

Hua Tan ◽

Jiajia Liu ◽

Megnyuan Yang ◽

Xiaobo Zhou

Keyword(s):

Deep Learning ◽

Gene Fusion ◽

Molecular Mechanisms ◽

Learning Approach ◽

Fusion Genes ◽

Rna Seq ◽

Structural Variants ◽

Important Goal ◽

Genomic Context ◽

Selection Of

Identifying the molecular mechanisms related to genomic breakage is an important goal of cancer mechanism studies. Among the diverse locations of structural variant breakpoints, fusion genes, which have their breakpoints in gene bodies and are typically identified from RNA-seq data, provide a focused structural variant resource for studying genomic breakage together with its expression and potential pathogenic impact. In this study, we developed FusionAI, which uses deep learning to predict gene fusion breakpoints from primary sequences and lets us identify the fusion breakage code and genomic context. FusionAI leverages known fusion breakpoints to provide a deep learning prediction model of fusion genes from primary genomic sequences, thereby helping researchers select fusion genes more accurately and better understand genomic breakage.


Clinker: visualising fusion genes detected in RNA-seq data

10.1101/218586 ◽

2017 ◽

Author(s):

Breon M Schmidt ◽

Nadia M Davidson ◽

Anthony DK Hawkins ◽

Ray Bartolo ◽

Ian J Majewski ◽

...

Keyword(s):

Acute Lymphoblastic Leukaemia ◽

B Cell ◽

Lymphoblastic Leukaemia ◽

Fusion Gene ◽

Therapeutic Targets ◽

Genomic Profiling ◽

Fusion Genes ◽

Rna Seq ◽

Bioinformatics Tool ◽

Rich Diversity

Abstract: Genomic profiling efforts have revealed a rich diversity of oncogenic fusion genes, and many are emerging as important therapeutic targets. While there are many ways to identify fusion genes from RNA-seq data, visualising these transcripts and their supporting reads remains challenging. Clinker is a bioinformatics tool written in Python, R and Bpipe that leverages the superTranscript method to visualise fusion genes. We demonstrate the use of Clinker to obtain interpretable visualisations of the RNA-seq data that lead to fusion calls. In addition, we use Clinker to explore multiple fusion transcripts with novel breakpoints within the P2RY8-CRLF2 fusion gene in B-cell Acute Lymphoblastic Leukaemia (B-ALL). Availability and Implementation: Clinker is freely available from GitHub at https://github.com/Oshlack/Clinker under an MIT license.


STAR-Fusion: Fast and Accurate Fusion Transcript Detection from RNA-Seq

10.1101/120295 ◽

2017 ◽

Cited By ~ 69

Author(s):

Brian J. Haas ◽

Alex Dobin ◽

Nicolas Stransky ◽

Bo Li ◽

Xiao Yang ◽

...

Keyword(s):

Fusion Transcript ◽

Superior Performance ◽

Detection Methods ◽

Molecular Evidence ◽

Detection Accuracy ◽

Fusion Genes ◽

Rna Seq ◽

Accurate Identification ◽

Large Numbers ◽

Fusion Detection

Abstract. Motivation: Fusion genes created by genomic rearrangements can be potent drivers of tumorigenesis. However, accurate identification of functional fusion genes from genomic sequencing requires whole genome sequencing, since exonic sequencing alone is often insufficient. Transcriptome sequencing provides a direct, highly effective alternative for capturing molecular evidence of expressed fusions in the precision medicine pipeline, but current methods tend to be inefficient or insufficiently accurate, lacking in sensitivity or predicting large numbers of false positives. Here, we describe STAR-Fusion, a method that is both fast and accurate in identifying fusion transcripts from RNA-Seq data. Results: We benchmarked STAR-Fusion’s fusion detection accuracy using both simulated and genuine Illumina paired-end RNA-Seq data, and show that it has superior performance compared to popular alternative fusion detection methods. Availability and implementation: STAR-Fusion is implemented in Perl and freely available as open source software at http://star-fusion.github.io.
