confFuse: high-confidence fusion gene detection across tumor entities

2017 ◽  
Author(s):  
Zhiqin Huang ◽  
David T.W. Jones ◽  
Yonghe Wu ◽  
Peter Lichter ◽  
Marc Zapatka

Background: Fusion genes play an important role in the tumorigenesis of many cancers. Next-generation sequencing (NGS) technologies have been applied successfully to fusion gene detection for several years, and a number of NGS-based tools for identifying fusion genes have been developed during this period. Most fusion gene detection tools based on RNA-seq data report a large number of candidates (mostly false positives), which makes it hard to prioritize candidates for experimental validation and further analysis. Selecting reliable fusion genes for downstream analysis has therefore become very important in cancer research. We developed confFuse, a scoring algorithm that reliably selects high-confidence fusion genes that are likely to be biologically relevant. Results: confFuse takes multiple parameters into account to assign each fusion candidate a confidence score; a score ≥ 8 indicates a high-confidence fusion gene prediction. These parameters were manually curated based on our experience and on certain structural motifs of fusion genes. On 96 published RNA-seq samples from different tumor entities, and compared with alternative tools, our method significantly reduces the number of fusion candidates (301 high-confidence out of 8,083 predicted fusion genes in total) while maintaining high detection accuracy (recovery rate 85.7%). Validation of 18 novel high-confidence fusions detected in three breast tumor samples resulted in a 100% validation rate. Conclusions: confFuse is a novel downstream filtering method that selects highly reliable fusion gene candidates for further analysis and experimental validation. confFuse is available at https://github.com/Zhiqin-HUANG/confFuse.
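
As a rough illustration of score-threshold filtering, the sketch below keeps candidates whose confidence score is at least 8 (the high-confidence cutoff stated above) and computes the retained fraction from the reported counts. The scores are taken as given: the `FusionCandidate` type, the example fusion names and their scores are hypothetical, and nothing here reproduces confFuse's curated scoring parameters.

```python
from dataclasses import dataclass

HIGH_CONFIDENCE_CUTOFF = 8  # score >= 8 is treated as high confidence, per the abstract


@dataclass
class FusionCandidate:
    # Hypothetical record: a fusion name such as "GENE1--GENE2" and a precomputed score.
    name: str
    score: int


def select_high_confidence(candidates, cutoff=HIGH_CONFIDENCE_CUTOFF):
    """Return the candidates whose confidence score meets the cutoff."""
    return [c for c in candidates if c.score >= cutoff]


def retained_fraction(n_kept, n_total):
    """Fraction of predicted fusions kept after filtering."""
    return n_kept / n_total


if __name__ == "__main__":
    # Hypothetical candidates and scores, for illustration only.
    cands = [FusionCandidate("GENE1--GENE2", 9),
             FusionCandidate("GENE3--GENE4", 5),
             FusionCandidate("GENE5--GENE6", 8)]
    print([c.name for c in select_high_confidence(cands)])  # ['GENE1--GENE2', 'GENE5--GENE6']
    # Counts reported in the abstract: 301 of 8,083 candidates retained.
    print(f"{retained_fraction(301, 8083):.1%}")  # 3.7%
```

In other words, the reported filtering keeps roughly 3.7% of all predicted candidates.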

2017 ◽  
Vol 8 ◽  
Author(s):  
Zhiqin Huang ◽  
David T. W. Jones ◽  
Yonghe Wu ◽  
Peter Lichter ◽  
Marc Zapatka

2019 ◽  
Vol 17 (03) ◽  
pp. 1940008 ◽  
Author(s):  
Yoshiaki Sota ◽  
Shigeto Seno ◽  
Hironori Shigeta ◽  
Naoki Osato ◽  
Masafumi Shimoda ◽  
...  

Fusion genes are involved in cancer, but their detection using RNA-Seq is limited by the relatively short read length. We therefore proposed a shifted short-read clustering (SSC) method, which focuses on overlapping reads from the same loci and extends them into a representative sequence. To verify its usefulness, we applied the SSC method to RNA-Seq data from four cell lines (BT-474, MCF-7, SKBR-3, and T-47D). As the slide width of the SSC method increased to one, two, five, or ten bases, the read length was extended from 201 bases to 217 (108%), 234 (116%), 282 (140%), or 317 (158%) bases, respectively. Furthermore, fusion genes were detected using STAR-Fusion, a fusion gene detection tool, with and without the SSC method. With a one-base shift, the fraction of reads mapped to multiple loci decreased from 9.7% to 4.6%, and the sensitivity of fusion gene detection improved from 47% to 54% on average compared with the original data (BT-474: 48% to 57%; MCF-7: 49% to 53%; SKBR-3: 50% to 57%; T-47D: 43% to 50%). With larger shifts, the positive predictive value also improved. The SSC method could therefore be an effective approach for fusion gene detection.
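
To make the idea of extending a read through overlapping, shifted reads concrete, here is a toy sketch: starting from a seed read, it repeatedly looks for an unused read that starts 1 to `max_shift` bases further along (the slide width) and agrees exactly on the overlap, then appends the new bases. It assumes equal-length, error-free reads from one strand; the function name and example reads are hypothetical, and this is not the authors' SSC implementation, which also clusters reads by locus.

```python
def extend_by_shifted_overlaps(seed_index, reads, max_shift):
    """Greedily extend reads[seed_index] with reads that start 1..max_shift bases
    further along and match the overlapping bases exactly.

    Toy model: equal-length, error-free reads from a single strand.
    """
    read_len = len(reads[seed_index])
    rep = reads[seed_index]
    used = {seed_index}  # each read is used once, so repeats cannot loop forever
    max_shift = min(max_shift, read_len - 1)  # keep a non-empty overlap
    while True:
        tail = rep[-read_len:]  # last read-length window of the representative
        match = None
        for shift in range(1, max_shift + 1):
            overlap = tail[shift:]  # a read shifted by `shift` must start with this
            for i, read in enumerate(reads):
                if i not in used and len(read) == read_len and read.startswith(overlap):
                    match = (i, shift)
                    break
            if match is not None:
                break
        if match is None:
            return rep
        i, shift = match
        used.add(i)
        rep += reads[i][read_len - shift:]  # append the `shift` new bases


# Hypothetical reads tiled every 2 bases along ACGTTGCAAGT.
reads = ["ACGTT", "GTTGC", "TGCAA", "CAAGT"]
print(extend_by_shifted_overlaps(0, reads, max_shift=1))  # ACGTT (no 1-base neighbour)
print(extend_by_shifted_overlaps(0, reads, max_shift=2))  # ACGTTGCAAGT
```

The example mirrors the trend in the abstract: allowing a wider shift lets more neighbouring reads qualify, so the representative sequence grows longer.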


2015 ◽  
Author(s):  
Nadia M Davidson ◽  
Ian J Majewski ◽  
Alicia Oshlack

Genomic instability is a hallmark of cancer, and, as such, structural alterations and fusion genes are common events in the cancer landscape. RNA sequencing (RNA-Seq) is a powerful method for profiling cancers, but current methods for identifying fusion genes are optimized for short reads. JAFFA (https://code.google.com/p/jaffa-project/) is a sensitive fusion detection method that clearly outperforms other methods with reads of 100 bp or longer. JAFFA compares a cancer transcriptome to the reference transcriptome rather than to the genome; the cancer transcriptome is inferred either directly from long reads or by de novo assembly of short reads.
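
A toy illustration of comparing an assembled (or long-read) transcript against reference transcripts rather than the genome: if the left part of a contig occurs in one reference transcript and the right part in a different one, the split point is a candidate fusion junction. This brute-force exact-substring search is an assumption made for clarity; real tools use a sequence aligner and tolerate errors, and the function name and sequences are hypothetical.

```python
def find_fusion_junctions(contig, transcripts, min_seg=10):
    """Report (5' transcript, 3' transcript, split position) for every split of
    `contig` whose left part occurs in one reference transcript and whose right
    part occurs in a different one. Exact substring matching only."""
    hits = []
    for i in range(min_seg, len(contig) - min_seg + 1):
        left, right = contig[:i], contig[i:]
        five_prime = [name for name, seq in transcripts.items() if left in seq]
        three_prime = [name for name, seq in transcripts.items() if right in seq]
        for a in five_prime:
            for b in three_prime:
                if a != b:
                    hits.append((a, b, i))
    return hits


# Hypothetical reference transcripts and an assembled contig joining A[:8] to B[6:].
transcripts = {"A": "ATGCGTACGTTAGC", "B": "GGCATTCAGGACTT"}
print(find_fusion_junctions("ATGCGTACCAGGACTT", transcripts, min_seg=4))  # [('A', 'B', 8)]
```

An unfused contig that matches a single transcript end to end produces no hits, because both halves map to the same reference.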


Haematologica ◽  
2021 ◽  
Author(s):  
Paul Kerbs ◽  
Sebastian Vosberg ◽  
Stefan Krebs ◽  
Alexander Graf ◽  
Helmut Blum ◽  
...  

In clinical routine, fusion genes are mostly identified by cytogenetics and targeted molecular genetics, such as metaphase karyotyping, FISH and RT-PCR. However, sequencing technologies are becoming more important in clinical routine as processing time and cost per sample decrease. To evaluate the performance of fusion gene detection by RNA sequencing (RNA-seq) compared with standard diagnostic techniques, we analyzed 806 RNA-seq samples from patients with acute myeloid leukemia (AML) using two state-of-the-art software tools, Arriba and FusionCatcher. RNA-seq detected 90% of the fusion events reported with high evidence by routine diagnostics, while samples in which RNA-seq failed to detect fusion genes had overall lower and less homogeneous sequence coverage. Based on properties of known and unknown fusion events, we developed a workflow with integrated filtering strategies for identifying robust fusion gene candidates by RNA-seq. With this workflow, we detected known recurrent fusion events in 26 cases that were not reported by routine diagnostics and found discrepancies in the evidence for known fusion events between routine diagnostics and RNA-seq in three cases. Moreover, we identified 157 fusion genes as novel robust candidates, and comparison with entries in ChimerDB or the Mitelman Database showed novel recurrence of fusion genes in 14 cases. Finally, we detected the novel recurrent fusion gene NRIP1-MIR99AHG, resulting from inv(21)(q11.2;q21.1), in nine patients (1.1%) and LTN1-MX1, resulting from inv(21)(q21.3;q22.3), in two patients (0.25%). We demonstrated that NRIP1-MIR99AHG results in overexpression of the 3' region of MIR99AHG and disruption of the tricistronic miRNA cluster miR-99a/let-7c/miR-125b-2. Interestingly, upregulation of MIR99AHG and deregulation of the miRNA cluster residing in the MIR99AHG locus are known mechanisms of leukemogenesis in acute megakaryoblastic leukemia.
Our findings demonstrate that RNA-seq has a strong potential to improve the systematic detection of fusion genes in clinical applications and provides a valuable tool for fusion discovery.
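
The cohort percentages quoted above follow directly from the counts (9 and 2 of 806 patients). The sketch reproduces them and shows one way a detection rate against routine diagnostics could be computed over (patient, fusion) pairs; the function names and example pairs are hypothetical.

```python
def prevalence_percent(cases, cohort_size):
    """Percentage of patients in the cohort carrying a given fusion."""
    return 100.0 * cases / cohort_size


def detection_rate(routine_calls, rnaseq_calls):
    """Share of (patient, fusion) pairs reported by routine diagnostics
    that RNA-seq also detected."""
    if not routine_calls:
        raise ValueError("no routine calls to compare against")
    return len(routine_calls & rnaseq_calls) / len(routine_calls)


print(round(prevalence_percent(9, 806), 1))   # 1.1  (NRIP1-MIR99AHG)
print(round(prevalence_percent(2, 806), 2))   # 0.25 (LTN1-MX1)

# Hypothetical calls: RNA-seq recovers one of two routine findings.
routine = {("patient1", "GENE1--GENE2"), ("patient2", "GENE3--GENE4")}
rnaseq = {("patient1", "GENE1--GENE2"), ("patient3", "GENE5--GENE6")}
print(detection_rate(routine, rnaseq))  # 0.5
```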


2019 ◽  
Author(s):  
Christopher A. Hilker ◽  
Aditya V. Bhagwate ◽  
Jin Sung Jang ◽  
Jeffrey G Meyer ◽  
Asha A. Nair ◽  
...  

Formalin-fixed paraffin-embedded (FFPE) tissues are commonly used biospecimens for clinical diagnosis. However, RNA isolated from FFPE blocks is extensively degraded, which makes whole-transcriptome profiling (RNA-seq) challenging. Here, we examined RNA isolation methods, quality metrics, and the performance of RNA-seq using different approaches with RNA isolated from FFPE and fresh frozen (FF) tissues. We evaluated five FFPE RNA extraction methods using six different tissues. The reproducibility and quality of the libraries prepared from these RNAs were assessed by RNA-seq. We next examined the performance and reproducibility of RNA-seq for gene expression profiling with FFPE and FF samples using targeted (Kinome capture) and whole-transcriptome capture-based sequencing. Finally, we assessed the Agilent SureSelect All-Exon V6+UTR capture and Illumina TruSeq RNA Access protocols for their ability to detect known gene fusions in FFPE RNA samples. Although the overall RNA yield varied among extraction methods, gene expression profiles generated by RNA-seq were highly correlated (>90%) when the input RNA was of sufficient quality (DV200 ≥ 30%) and quantity (≥ 100 ng). Using gene capture, we observed a linear relationship between expression levels of shared genes captured with either the All-Exon or the Kinome kit. Gene expression correlations between the two capture-based approaches were similar for RNA from FFPE and FF samples. However, the TruSeq RNA Access protocol yielded significantly more exon and junction reads than the SureSelect All-Exon capture kit and was more sensitive for fusion gene detection. Our study established pre- and post-library-construction QC parameters that are essential for reproducible RNA-seq profiling of FFPE samples. We show that gene-capture-based NGS is an efficient and highly reproducible strategy for gene expression measurement as well as fusion gene detection.
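
DV200 is the percentage of RNA fragments longer than 200 nucleotides. The simplified sketch below computes it as the share of electropherogram signal above 200 nt and then applies the quality (DV200 ≥ 30%) and quantity (≥ 100 ng) thresholds from the abstract. Treating the trace as a list of (length, signal) points is an assumption; instrument software integrates a size-calibrated trace, and the function names and values are hypothetical.

```python
def dv200(fragment_lengths_nt, signal):
    """Percentage of total electropherogram signal from fragments longer than 200 nt."""
    if len(fragment_lengths_nt) != len(signal):
        raise ValueError("lengths and signal must be the same size")
    total = sum(signal)
    if total == 0:
        raise ValueError("total signal is zero")
    above = sum(s for length, s in zip(fragment_lengths_nt, signal) if length > 200)
    return 100.0 * above / total


def passes_input_qc(dv200_percent, input_ng, min_dv200=30.0, min_input_ng=100.0):
    """Apply the quality (DV200 >= 30%) and quantity (>= 100 ng) thresholds."""
    return dv200_percent >= min_dv200 and input_ng >= min_input_ng


# Hypothetical trace: 40% of the signal lies above 200 nt.
score = dv200([100, 150, 250, 400], [30, 30, 20, 20])
print(score)                         # 40.0
print(passes_input_qc(score, 120))   # True
print(passes_input_qc(score, 50))    # False (too little input RNA)
```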


2019 ◽  
Author(s):  
Christina Stangl ◽  
Sam de Blank ◽  
Ivo Renkens ◽  
Tamara Verbeek ◽  
Jose Espejo Valle-Inclan ◽  
...  

Fusion genes are hallmarks of various cancer types and important determinants of diagnosis, prognosis and treatment options. The promiscuity of fusion genes with respect to partner choice and exact breakpoint position restricts their detection in the diagnostic setting, even for known and recurrent fusion gene configurations. To identify these gene fusions accurately and in an unbiased manner, we developed FUDGE: a FUsion gene Detection assay from Gene Enrichment. FUDGE couples target-selected and strand-specific CRISPR/Cas9 activity for enrichment and detection of fusion gene drivers (e.g. BRAF, EWSR1, KMT2A/MLL), without prior knowledge of the fusion partner or breakpoint location, to long-read Nanopore sequencing. FUDGE includes a dedicated bioinformatics approach (NanoFG) to detect fusion genes from Nanopore sequencing data. Our strategy is flexible with respect to target choice and enables multiplexed enrichment for simultaneous analysis of several genes in multiple samples in a single sequencing run. We observe on average a 508-fold on-target enrichment and identify fusion breakpoints at nucleotide resolution, all within two days. We demonstrate that FUDGE effectively identifies fusion genes in cancer cell lines, tumor samples and whole-genome-amplified DNA, irrespective of partner gene or breakpoint position, in 100% of cases. Furthermore, we show that FUDGE is superior to routine diagnostic methods for fusion gene detection. In summary, we have developed a rapid and versatile fusion gene detection assay that provides an unparalleled opportunity for pan-cancer detection of fusion genes in routine diagnostics.
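
A common way to express on-target enrichment is the fraction of reads falling in the targeted regions divided by the fraction of the genome those regions occupy. The abstract does not state the exact formula behind the 508-fold figure, so this definition, the function name and the example numbers are assumptions.

```python
def fold_enrichment(on_target_reads, total_reads, target_bp, genome_bp):
    """Fraction of reads on target divided by the fraction of the genome targeted."""
    read_fraction = on_target_reads / total_reads
    genome_fraction = target_bp / genome_bp
    return read_fraction / genome_fraction


# Hypothetical run: 1% of reads land in 30 kb of a 3 Gb genome.
print(round(fold_enrichment(1_000, 100_000, 30_000, 3_000_000_000)))  # 1000
```

Without enrichment, reads would fall on the target roughly in proportion to its size, giving a fold enrichment near 1.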


2019 ◽  
Author(s):  
Patrick Sorn ◽  
Christoph Holtsträter ◽  
Martin Löwer ◽  
Ugur Sahin ◽  
David Weber

Motivation: Gene fusions are an important class of transcriptional variants that can influence cancer development and can be predicted from RNA sequencing (RNA-seq) data by multiple existing tools. However, the real-world performance of these tools is unclear owing to the lack of known positive and negative events, especially for fusion genes in individual samples. Simulated reads are often used, but these cannot account for all technical biases in RNA-seq data generated from real samples. Results: Here, we present ArtiFuse, a novel approach that simulates fusion genes by modifying the sequence of the genomic reference and can therefore be applied to any RNA-seq dataset without the need for simulated reads. We demonstrate our approach on eight RNA-seq datasets for three fusion gene prediction tools: average recall values for all three tools peak between 0.4 and 0.56 for high-quality, high-coverage datasets. As ArtiFuse affords total control over the involved genes and breakpoint position, we also assessed performance with regard to gene-related properties, showing a drop in recall for lowly expressed genes in high-coverage samples and for genes with co-expressed paralogues. Overall tool performance assessed from ArtiFusions is lower than previously reported estimates based on simulated reads. Because it uses real RNA-seq datasets, we believe that ArtiFuse provides a more realistic benchmark that can be used to develop more accurate fusion gene prediction tools for clinical application. Availability and implementation: ArtiFuse is implemented in Python. The source code and documentation are available at https://github.com/TRON-Bioinformatics/ArtiFusion. Supplementary information: Supplementary data are available at Bioinformatics online.
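
Because the artificial fusions are known in advance, recall is simply the share of them a tool reports: TP / (TP + FN). The sketch parses fusion names into ordered (5' gene, 3' gene) pairs and computes recall; the "GENE5--GENE3" naming convention, the function names and the example sets are hypothetical, not ArtiFuse's output format.

```python
def parse_fusion(name):
    """Turn 'GENE5--GENE3' into an ordered (5' gene, 3' gene) pair."""
    five_prime, three_prime = name.upper().split("--")
    return (five_prime, three_prime)


def recall(known_fusions, predicted_fusions):
    """Recall = TP / (TP + FN): share of known fusions that were predicted."""
    if not known_fusions:
        raise ValueError("need at least one known fusion")
    true_positives = len(known_fusions & predicted_fusions)
    return true_positives / len(known_fusions)


# Hypothetical benchmark: 5 planted fusions, the tool recovers 2 (plus 1 false positive).
known = {parse_fusion(n) for n in ["A--B", "C--D", "E--F", "G--H", "I--J"]}
predicted = {parse_fusion(n) for n in ["a--b", "C--D", "X--Y"]}
print(recall(known, predicted))  # 0.4
```

False positives do not affect recall; they would lower precision, which needs a separate count.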


2021 ◽  
Author(s):  
Pora Kim ◽  
Hua Tan ◽  
Jiajia Liu ◽  
Mengyuan Yang ◽
Xiaobo Zhou

Identifying the molecular mechanisms of genomic breakage is an important goal of cancer research. Among structural variants, whose breakpoints lie in diverse locations, fusion genes have their breakpoints within gene bodies and are typically identified from RNA-seq data; they therefore provide a valuable resource for studying genomic breakage that has expression evidence and potential pathogenic impact. In this study, we developed FusionAI, which uses deep learning to predict gene fusion breakpoints from primary sequences and allows us to identify the fusion breakage code and its genomic context. FusionAI leverages known fusion breakpoints to build a deep-learning model that predicts fusion genes from primary genomic sequences, helping researchers select fusion genes more accurately and better understand genomic breakage.
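
Sequence-based deep-learning models commonly take a fixed-length window around a position of interest, one-hot encoded, as input. The sketch shows only that generic preprocessing step. The window size, the encoding of ambiguous bases as all-zero vectors and the function names are assumptions; the abstract does not describe FusionAI's actual input format or network.

```python
BASES = "ACGT"


def breakpoint_window(sequence, breakpoint, flank):
    """Return `flank` bases on each side of a breakpoint (0-based, between bases)."""
    if breakpoint - flank < 0 or breakpoint + flank > len(sequence):
        raise ValueError("window extends past the sequence ends")
    return sequence[breakpoint - flank: breakpoint + flank]


def one_hot(sequence):
    """Encode A/C/G/T as 4-element indicator vectors; any other base (e.g. N) is all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in sequence.upper()]


window = breakpoint_window("AAAACCCCGGGG", breakpoint=4, flank=2)
print(window)                # AACC
print(one_hot("ACGN"))       # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
```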


2017 ◽  
Author(s):  
Breon M Schmidt ◽  
Nadia M Davidson ◽  
Anthony DK Hawkins ◽  
Ray Bartolo ◽  
Ian J Majewski ◽  
...  

Genomic profiling efforts have revealed a rich diversity of oncogenic fusion genes, and many are emerging as important therapeutic targets. While there are many ways to identify fusion genes from RNA-seq data, visualising these transcripts and their supporting reads remains challenging. Clinker is a bioinformatics tool written in Python, R and Bpipe that leverages the superTranscript method to visualise fusion genes. We demonstrate the use of Clinker to obtain interpretable visualisations of the RNA-seq data that lead to fusion calls. In addition, we use Clinker to explore multiple fusion transcripts with novel breakpoints within the P2RY8-CRLF2 fusion gene in B-cell Acute Lymphoblastic Leukaemia (B-ALL). Availability and Implementation: Clinker is freely available from GitHub at https://github.com/Oshlack/Clinker under an MIT licence.
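
In simplified terms, a superTranscript for an annotated gene can be thought of as all of its exonic sequence, collapsed across isoforms and laid out in genomic order; joining the superTranscripts of the 5' and 3' partners gives a single linear reference on which reads from both genes can be displayed. The sketch assumes plus-strand genes and half-open coordinates, and the function names are hypothetical; it is not Clinker's code.

```python
def merge_exons(exons):
    """Merge overlapping or adjacent half-open exon intervals collected from all isoforms."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def super_transcript(genome, exons):
    """Concatenate each merged exonic block in genomic order (plus strand only)."""
    return "".join(genome[s:e] for s, e in merge_exons(exons))


def fusion_reference(genome_a, exons_a, genome_b, exons_b):
    """Join the 5' partner's superTranscript to the 3' partner's as one reference."""
    return super_transcript(genome_a, exons_a) + super_transcript(genome_b, exons_b)


# Hypothetical gene with two isoforms sharing part of an exon.
print(merge_exons([(10, 12), (0, 5), (3, 8)]))                      # [(0, 8), (10, 12)]
print(super_transcript("ACGTACGTAAGGCC", [(0, 5), (3, 8), (10, 12)]))  # ACGTACGTGG
```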


2017 ◽  
Author(s):  
Brian J. Haas ◽  
Alex Dobin ◽  
Nicolas Stransky ◽  
Bo Li ◽  
Xiao Yang ◽  
...  

Motivation: Fusion genes created by genomic rearrangements can be potent drivers of tumorigenesis. However, accurate identification of functional fusion genes from genomic sequencing requires whole-genome sequencing, since exonic sequencing alone is often insufficient. Transcriptome sequencing provides a direct, highly effective alternative for capturing molecular evidence of expressed fusions in the precision medicine pipeline, but current methods tend to be inefficient or insufficiently accurate, either lacking sensitivity or predicting large numbers of false positives. Here, we describe STAR-Fusion, a method that is both fast and accurate in identifying fusion transcripts from RNA-Seq data. Results: We benchmarked STAR-Fusion's fusion detection accuracy using both simulated and genuine Illumina paired-end RNA-Seq data and show that it has superior performance compared with popular alternative fusion detection methods. Availability and implementation: STAR-Fusion is implemented in Perl and is freely available as open-source software at http://star-fusion.github.io.
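
RNA-seq fusion callers generally weigh two kinds of chimeric evidence: junction reads, which cross the fusion breakpoint, and spanning fragments, read pairs with one mate in each partner gene. The minimal sketch below aggregates such evidence per (5' gene, 3' gene) pair and applies simple thresholds. The tuple format and threshold values are hypothetical and are not STAR-Fusion's code or default filters.

```python
from collections import Counter


def summarize_support(chimeric_evidence):
    """Count junction reads and spanning fragments per (5' gene, 3' gene) pair.

    Each item is (gene5, gene3, kind) with kind "junction" or "spanning".
    """
    junction, spanning = Counter(), Counter()
    for gene5, gene3, kind in chimeric_evidence:
        if kind == "junction":
            junction[(gene5, gene3)] += 1
        elif kind == "spanning":
            spanning[(gene5, gene3)] += 1
        else:
            raise ValueError(f"unknown evidence kind: {kind}")
    pairs = set(junction) | set(spanning)
    return {p: (junction[p], spanning[p]) for p in pairs}


def filter_calls(support, min_junction=1, min_total=2):
    """Keep pairs with enough junction reads and enough total support (hypothetical thresholds)."""
    return sorted(p for p, (j, s) in support.items()
                  if j >= min_junction and j + s >= min_total)


evidence = [("A", "B", "junction"), ("A", "B", "spanning"),
            ("C", "D", "spanning"), ("C", "D", "spanning"),
            ("E", "F", "junction")]
print(filter_calls(summarize_support(evidence)))  # [('A', 'B')]
```

Requiring at least one junction read keeps only candidates whose breakpoint is directly observed, while the total-support threshold discards singletons.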

