FusionQ: a novel approach for gene fusion detection and quantification from paired-end RNA-Seq

2013 ◽  
Vol 14 (1) ◽  
pp. 193 ◽  
Author(s):  
Chenglin Liu ◽  
Jinwen Ma ◽  
ChungChe Chang ◽  
Xiaobo Zhou
2017 ◽  
Author(s):  
Páll Melsted ◽  
Shannon Hateley ◽  
Isaac Charles Joseph ◽  
Harold Pimentel ◽  
Nicolas Bray ◽  
...  

RNA sequencing in cancer cells is a powerful technique to detect chromosomal rearrangements, allowing for de novo discovery of actively expressed fusion genes. Here we focus on the problem of detecting gene fusions from raw sequencing data, assembling the reads to define fusion transcripts and their associated breakpoints, and quantifying their abundances. Building on the pseudoalignment idea that simplifies and accelerates transcript quantification, we introduce a novel approach to fusion detection based on inspecting paired reads that cannot be pseudoaligned due to conflicting matches. The method and software, called pizzly, filters false positives, assembles new transcripts from the fusion reads, and reports candidate fusions. With pizzly, fusion detection from raw RNA-Seq reads can be performed in a matter of minutes, making the program suitable for the analysis of large cancer gene expression databases and for clinical use. pizzly is available at https://github.com/pmelsted/pizzly
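The core idea described for pizzly, flagging read pairs that cannot be pseudoaligned because their mates point to incompatible transcripts, can be illustrated with a toy sketch. This is not pizzly's code. The k-mer index layout, the `tx2gene` map and all function names are hypothetical. Mates are assumed to already be in transcript orientation (reverse-complementing is omitted), and only the pair-level conflict is checked, not reads whose own k-mers conflict.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Toy k-mer index: k-mer -> transcripts containing it (hypothetical data layout).
KmerIndex = Dict[str, FrozenSet[str]]


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def pseudoalign(seq: str, index: KmerIndex, k: int) -> Optional[Set[str]]:
    """Intersect the transcript sets of every indexed k-mer in the read.

    Returns None if no k-mer is indexed; an empty set signals a conflict,
    i.e. the k-mers point to transcripts with no member in common.
    """
    compatible: Optional[Set[str]] = None
    for km in kmers(seq, k):
        hits = index.get(km)
        if hits is None:
            continue  # unindexed k-mer (e.g. a sequencing error) is skipped
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible


def fusion_candidate(mate1: str, mate2: str, index: KmerIndex, tx2gene: Dict[str, str],
                     k: int) -> Optional[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Flag a pair whose mates pseudoalign on their own but not together."""
    c1 = pseudoalign(mate1, index, k)
    c2 = pseudoalign(mate2, index, k)
    if not c1 or not c2:
        return None  # a mate is unmapped or conflicts internally
    if c1 & c2:
        return None  # the pair is jointly compatible with some transcript
    genes1 = frozenset(tx2gene[t] for t in c1)
    genes2 = frozenset(tx2gene[t] for t in c2)
    if genes1 & genes2:
        return None  # same gene, different isoforms: not a fusion signal
    return genes1, genes2
```

With a two-transcript toy index, a pair whose mates hit different genes returns the two gene sets, while a pair that is jointly compatible with one transcript returns None.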


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 4655-4655
Author(s):  
Paul Kerbs ◽  
Aarif Mohamed Nazeer Batcha ◽  
Sebastian Vosberg ◽  
Dirk Metzler ◽  
Tobias Herold ◽  
...  

Accurate and complete genetic classification of AML is crucial for predicting clinical outcome and stratifying treatment. Deciphering the spectrum of genetic abnormalities by polymerase chain reaction (PCR), karyotyping and fluorescence in situ hybridization (FISH) in routine diagnostics is the current gold standard; however, these assays may miss fusion genes. Recently, several methods have been developed to detect gene fusion transcripts from RNA sequencing data, providing robust results. To test the detection power and assess the applicability of RNA-Seq-based methods in clinical diagnostics, we applied two different algorithms, namely FusionCatcher (Nicorici D et al., bioRxiv, 2014) and Arriba (Uhrig S et al., DKFZ, https://github.com/suhrig/arriba), to the transcriptomes of 895 well-characterized AML samples from three independently sequenced cohorts, AMLCG (Herold T et al., Haematologica, 2018, n=261), DKTK (Greif PA et al., Clin Cancer Res, 2018 and unpublished data, n=166) and BeatAML (Tyner JW et al., Nature 2018, n=468), as well as to publicly available healthy control samples (SRA studies: SRP018028, SRP047126, SRP050146, SRP105369, SRP115911, SRP133442, n=38). According to karyotyping, 31% (277/895) of samples harbored chromosomal aberrations putatively causing gene fusions (i.e. translocations, interstitial deletions, duplications, inversions, insertions). Analyses by FISH and/or PCR confirmed these rearrangements in 51.3% (142/277) of samples, whereas fusion detection by means of RNA-Seq showed evidence for fusion genes corresponding to these rearrangements in 60.3% (167/277) of samples. Chromosomal aberrations identified by karyotyping that are known to result in clinically relevant fusions (e.g. RUNX1-RUNX1T1, KMT2A fusions) were confirmed in most cases by FISH/PCR (AMLCG: n=27/27, DKTK: n=21/21, BeatAML: n=54/57) and by RNA-Seq-based methods (AMLCG: n=17/27, DKTK: n=21/21, BeatAML: n=56/57). 
Of note, the AMLCG cohort was sequenced using the SENSE mRNA Library Prep Kit from Lexogen, which appears to be suboptimal for fusion detection. Furthermore, 19 samples (AMLCG: n=12, DKTK: n=4, BeatAML: n=3) were found to harbor known pathogenic fusions, described in previous studies, that had not been reported by routine diagnostics: NUP98-NSD1 (n=11); CBFB-MYH11, RUNX1-RUNX1T1 and DEK-NUP214 (n=2 each); RUNX1-CBFA2T2 and RUNX1-CBFA2T3 (n=1 each). Reanalysis of six of these samples by PCR confirmed three fusions that had initially been missed by routine diagnostics. In general, the number of fusion events reported by RNA-Seq is high (on average 69 and 39 per sample as detected by FusionCatcher and Arriba, respectively), even after applying the built-in filters, indicating a high false-positive rate. To robustly identify putative novel fusions, we developed a filtering pipeline that incorporates two new filtering steps. The promiscuity score (PS) of a fusion measures the number of additional distinct fusion partners detected in the respective cohort for the 5' and the 3' gene. The fusion transcript score (FTS) measures the abundance of a fusion transcript relative to its 5' and 3' partner genes. PS and FTS of known, clinically relevant fusions confirmed by FISH/PCR were used to define cut-offs. To further maximize specificity while maintaining sensitivity, we excluded fusion events that we detected in publicly available healthy samples and subsequently kept only calls shared by FusionCatcher and Arriba (Fig. 1A). Additionally, we obtained further evidence for a fusion event from elevated transcription of the 3' fusion partner: in a fusion, transcription of the 3' partner gene likely comes under the control of the promoter of the 5' partner gene, which elevates transcription of genes that are otherwise transcribed at low levels (Fig. 1B-C). 
Thus, we identified five putatively novel recurrent fusion genes that were detected independently in two cohorts: NRIP1-MIR99AHG, LATS2-ZMYM2, ATP11A-ING1, MBP-SLC66A2 and PRDM16-SKI (Fig. 1D-F). Although these events were called with high evidence, we aim to validate them independently by complementary methods. In our study, we have demonstrated not only that RNA-Seq-based detection of fusion genes is a valuable complement to routine diagnostics but also that it has the potential to discover novel, putatively pathogenic fusions. Disclosures No relevant conflicts of interest to declare.
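The two scores and the filtering order described in the AML abstract can be sketched as follows. The abstract does not give formulas, so both definitions here are assumptions: PS is taken as the number of other distinct partners of the 5' gene plus those of the 3' gene across the cohort, and FTS as fusion-supporting reads divided by the mean expression of the two partner genes. The function names and the `max_ps`/`min_fts` cut-offs are hypothetical; in the study, cut-offs were derived from known FISH/PCR-confirmed fusions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Fusion = Tuple[str, str]  # (5' partner gene, 3' partner gene)


def promiscuity_scores(cohort_calls: Iterable[Fusion]) -> Dict[Fusion, int]:
    """Assumed PS: distinct other partners of the 5' gene plus those of the 3' gene."""
    calls = set(cohort_calls)
    partners: Dict[str, Set[str]] = defaultdict(set)
    for five, three in calls:
        partners[five].add(three)
        partners[three].add(five)
    return {
        (five, three): len(partners[five] - {three}) + len(partners[three] - {five})
        for five, three in calls
    }


def fusion_transcript_score(fusion_reads: float, five_expr: float, three_expr: float) -> float:
    """Assumed FTS: fusion support relative to the mean expression of both partners."""
    partner_mean = (five_expr + three_expr) / 2
    return fusion_reads / partner_mean if partner_mean > 0 else float("inf")


def filter_sample(fusioncatcher: Set[Fusion], arriba: Set[Fusion], healthy: Set[Fusion],
                  ps: Dict[Fusion, int], fts: Dict[Fusion, float],
                  max_ps: int, min_fts: float) -> Set[Fusion]:
    """Keep calls shared by both callers, absent from healthy controls, within cut-offs."""
    shared = (fusioncatcher & arriba) - healthy
    return {f for f in shared if ps.get(f, 0) <= max_ps and fts.get(f, 0.0) >= min_fts}
```

Because the healthy-control exclusion, the caller intersection and the score cut-offs are all set filters here, the order in which they are applied does not change the result.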


BMC Genomics ◽  
2020 ◽  
Vol 21 (S11) ◽  
Author(s):  
Qian Liu ◽  
Yu Hu ◽  
Andres Stucky ◽  
Li Fang ◽  
Jiang F. Zhong ◽  
...  

Abstract Background Long-read RNA-Seq techniques can generate reads that encompass a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-Seq techniques, which typically generate reads shorter than 150 bp. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data that take into account the high basecalling error rates and the presence of alignment errors. Results In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF outperformed existing computational tools in detecting known gene fusions. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset from acute myeloid leukemia and pinpointed, at base-pair resolution, the exact location of a translocation previously known only at cytogenetic resolution; this location was further validated by Sanger sequencing. Conclusions In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-Seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.
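A long read that covers a fusion junction typically aligns as two or more segments assigned to different genes. The toy sketch below counts, per gene pair, how many reads show such a split. It is not LongGF's algorithm, which is written in C++ and handles alignment errors in its own way. The `Segment` record, the assumption that segments are already annotated with a gene, the assumption that reads are in 5'-to-3' transcript orientation, and the `min_support`/`max_read_overlap` defaults are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Segment(NamedTuple):
    read_id: str
    gene: str        # gene overlapped by this aligned segment (assumed pre-annotated)
    read_start: int  # segment start on the read, 0-based
    read_end: int    # segment end on the read, exclusive


def candidate_fusions(segments: List[Segment], min_support: int = 2,
                      max_read_overlap: int = 20) -> Dict[Tuple[str, str], int]:
    """Count reads whose consecutive segments (in read order) hit different genes."""
    by_read: Dict[str, List[Segment]] = defaultdict(list)
    for seg in segments:
        by_read[seg.read_id].append(seg)
    support = Counter()
    for segs in by_read.values():
        segs.sort(key=lambda s: s.read_start)  # 5' to 3' along the read
        pairs = set()
        for left, right in zip(segs, segs[1:]):
            if left.gene == right.gene:
                continue
            # tolerate a small overlap between neighbouring segments, which
            # error-prone long-read alignments often show near a junction
            if left.read_end - right.read_start > max_read_overlap:
                continue
            pairs.add((left.gene, right.gene))
        support.update(pairs)  # each read counts once per gene pair
    return {pair: n for pair, n in support.items() if n >= min_support}
```

Requiring several independent supporting reads is one simple way to suppress chimeras caused by isolated alignment errors.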


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 1898-1898
Author(s):  
Steven M. Foltz ◽  
Qingsong Gao ◽  
Christopher J. Yoon ◽  
Amila Weerasinghe ◽  
Hua Sun ◽  
...  

Abstract Introduction: Gene fusions result from genomic rearrangements that create hybrid protein products or bring the regulatory elements of one gene into close proximity to another. Fusions often dysregulate gene function or expression through oncogene overexpression or tumor suppressor underexpression (Gao, Liang, Foltz, et al. Cell Rep 2018). Some fusions, such as EML4–ALK in lung adenocarcinoma, are known druggable targets. Fusion detection algorithms rely on discordantly mapped RNA-seq reads. Careful consideration of detection and filtering procedures is vital for large-scale fusion detection because current methods are prone to reporting false positives and show poor concordance. Multiple myeloma (MM) is a blood cancer in which rapidly expanding clones of plasma cells spread in the bone marrow. Translocations that juxtapose the highly expressed IGH enhancer with potential oncogenes are associated with overexpression of the partner genes, although they may not produce a gene fusion detectable in RNA-seq data. Previous studies have explored the fusion landscape of multiple myeloma cohorts (Cleynen, et al. Nat Comm 2017; Nasser, et al. Blood 2017). In this study, we developed a novel gene fusion detection pipeline and post-processing strategy to analyze 742 patient samples at the primary time point and 64 samples at follow-up time points (806 total samples) from the Multiple Myeloma Research Foundation (MMRF) CoMMpass Study using RNA-seq, WGS, and clinical data. Methods and Results: We overlapped the calls of five fusion detection algorithms (EricScript, FusionCatcher, INTEGRATE, PRADA, and STAR-Fusion) to report fusion events. Our filtered call set consisted of 2,817 fusions, with a median of 3 fusions per sample (mean 3.8), similar to glioblastoma, breast, ovarian, and prostate cancers in TCGA. 
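Overlapping several fusion callers can be expressed as a simple vote: each caller contributes at most one vote per fusion, and a fusion is reported if enough callers agree. The abstract does not state how many callers had to agree, so the `min_callers` threshold and the function name below are hypothetical; this is a sketch of the general idea, not the study's pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Fusion = Tuple[str, str]  # (5' gene, 3' gene)


def consensus_calls(calls_by_caller: Dict[str, Iterable[Fusion]],
                    min_callers: int = 2) -> Dict[Fusion, int]:
    """Return fusions reported by at least `min_callers` callers, with vote counts."""
    votes = Counter()
    for calls in calls_by_caller.values():
        votes.update(set(calls))  # one vote per caller, however often it repeats a call
    return {fusion: n for fusion, n in votes.items() if n >= min_callers}
```

Raising `min_callers` trades sensitivity for specificity; real pipelines usually also normalize gene names and breakpoints across callers before voting.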
Major recurrent fusions involving immunoglobulin genes included IGH–WHSC1 (88 primary samples), IGL–BMI1 (29), and PVT1, the upstream neighbor of MYC, paired with IGH (6), IGK (3), and IGL (11). For each event, we used WGS data, when available, to determine whether there was genomic support for the gene fusion (based on discordant WGS reads, SV event detection, and MMRF CoMMpass Seq-FISH WGS results) (Miller, et al. Blood 2016). WGS validation rates varied with the level of RNA-seq evidence supporting each fusion, with an overall rate of 24.1%, which is comparable to previously observed pan-cancer validation rates using low-pass WGS. We tested the association between fusion status and gene expression and identified genes such as BCL2L11, CCND1/2, LTBR, and TXNDC5 that showed significant overexpression (t-test). We explored the clinical connections of fusion events through survival analysis and clinical data correlations, and by mining potentially druggable targets from our Database of Evidence for Precision Oncology (dinglab.wustl.edu/depo) (Sun, Mashl, Sengupta, et al. Bioinformatics 2018). Major examples of upregulated fusion kinases that could potentially be targeted with off-label drug use include FGFR3 and NTRK1. We examined the evolution of fusion events over multiple time points. In one MMRF patient with a t(8;14) translocation joining the IGH locus and the transcription factor MAFA, we observed IGH fusions with TOP1MT (a neighbor of MAFA) at all four time points, with correspondingly high expression of TOP1MT and MAFA. Using non-MMRF single-cell RNA data from different patients, we were able to track cell-type composition over time and to detect subpopulations of cells harboring fusions at different time points, with potential treatment implications. Discussion: Gene fusions offer potential targets for alternative MM therapies. 
Careful implementation of gene fusion detection algorithms and post-processing are essential in large cohort studies to reduce false positives and enrich results for clinically relevant information. Clinical fusion detection from untargeted RNA-seq remains a challenge due to poor sensitivity, specificity, and usability. By combining MMRF CoMMpass data from multiple platforms, we have produced a comprehensive fusion profile of 742 MM patients. We have shown novel associations of gene fusions with gene expression and clinical data, and we identified candidates for druggability studies. Disclosures Vij: Bristol-Myers Squibb: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Celgene: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Jazz Pharmaceuticals: Honoraria, Membership on an entity's Board of Directors or advisory committees; Janssen: Honoraria, Membership on an entity's Board of Directors or advisory committees; Amgen: Honoraria, Membership on an entity's Board of Directors or advisory committees; Karyopharm: Honoraria, Membership on an entity's Board of Directors or advisory committees; Takeda: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding.
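The multiple myeloma abstract reports fusion/expression associations using a t-test without specifying the variant. The sketch below assumes Welch's unequal-variance t-test, which compares expression in fusion carriers against non-carriers; the function name is hypothetical. It returns only the statistic and the Welch-Satterthwaite degrees of freedom. A p-value needs the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. Expression values are assumed to be already normalized (typically log-scale), and each group needs at least two samples.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(carriers: Sequence[float], non_carriers: Sequence[float]) -> Tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(carriers), len(non_carriers)
    v1 = variance(carriers) / n1       # squared standard error, group 1 (sample variance)
    v2 = variance(non_carriers) / n2   # squared standard error, group 2
    t = (mean(carriers) - mean(non_carriers)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

A positive t means higher mean expression in fusion carriers, matching the overexpression direction the abstract reports.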


2020 ◽  
Author(s):  
Michael Apostolides ◽  
Yue Jiang ◽  
Mia Husić ◽  
Robert Siddaway ◽  
Cynthia Hawkins ◽  
...  

Abstract Motivation: Gene fusions are often associated with cancer, yet current fusion detection tools vary in their calling approaches, making it challenging to select the right tool. Ensemble fusion calling techniques appear promising; however, current options have limited accessibility and functionality. Results: MetaFusion is a flexible meta-calling tool that amalgamates the outputs of any number of fusion callers. Results from individual callers are converted into Common Fusion Format, a new file type that standardizes caller outputs. Calls are then annotated, merged using graph clustering, filtered and ranked to provide a final output of high-confidence candidates. MetaFusion consistently outperformed individual callers with respect to recall and precision on real and simulated datasets, achieving up to 100% precision. Thus, an ensemble calling approach is imperative for high-confidence results. MetaFusion also labels fusions found in databases using the FusionAnnotator package, and is provided with a benchmarking toolkit to calibrate new callers. Availability: MetaFusion is freely available at https://github.com/ccmbioinfo/
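Merging calls from several callers "using graph clustering" can be illustrated by treating each call as a node, linking calls whose breakpoints nearly coincide, and taking connected components with a union-find structure. MetaFusion's actual merge criteria and the fields of Common Fusion Format are not described here, so the `Call` record, the breakpoint tolerance `tol` and the ranking by number of distinct callers are all assumptions made for this sketch.

```python
from typing import Dict, List, NamedTuple


class Call(NamedTuple):
    caller: str
    chrom5: str  # chromosome of the 5' breakpoint
    pos5: int    # 5' breakpoint position
    chrom3: str  # chromosome of the 3' breakpoint
    pos3: int    # 3' breakpoint position


def cluster_calls(calls: List[Call], tol: int = 100) -> List[List[Call]]:
    """Group calls into connected components of a breakpoint-proximity graph."""
    parent = list(range(len(calls)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def near(a: Call, b: Call) -> bool:
        return (a.chrom5 == b.chrom5 and a.chrom3 == b.chrom3
                and abs(a.pos5 - b.pos5) <= tol and abs(a.pos3 - b.pos3) <= tol)

    # O(n^2) edge scan: fine for a per-sample toy example
    for i in range(len(calls)):
        for j in range(i + 1, len(calls)):
            if near(calls[i], calls[j]):
                parent[find(i)] = find(j)  # union the two components

    clusters: Dict[int, List[Call]] = {}
    for i, call in enumerate(calls):
        clusters.setdefault(find(i), []).append(call)
    # rank merged fusions by how many distinct callers support them
    return sorted(clusters.values(), key=lambda c: len({x.caller for x in c}), reverse=True)
```

Connected components make the merge transitive: two calls farther apart than `tol` still end up together if a third call lies close to both.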


Author(s):  
Stephen Gross ◽  
Lisa Watson ◽  
Smita Pathak ◽  
Irina Khrebtukova ◽  
Gary Schroth
Keyword(s):  
Rna Seq ◽  

2011 ◽  
Vol 27 (8) ◽  
pp. 1068-1075 ◽  
Author(s):  
Marcus Kinsella ◽  
Olivier Harismendy ◽  
Masakazu Nakano ◽  
Kelly A. Frazer ◽  
Vineet Bafna
Keyword(s):  
Rna Seq ◽  

Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 107-107
Author(s):  
Marco J. Koudijs ◽  
Lennart A. Kester ◽  
Jayne Y. Hehir-Kwa ◽  
Eugene T.P. Verwiel ◽  
Erik Strengman ◽  
...  

Abstract Background Diagnosis and treatment of hematological malignancies rely increasingly on the detection of underlying genetic abnormalities. Various laboratory techniques, including karyotyping, SNP-array, FISH, MLPA and RT-PCR, are typically required to detect the full spectrum of clinically relevant genetic aberrations. The sensitivity of these techniques is also limited by their targeted design or lack of resolution. Ideally, an unbiased genome-wide approach such as RNA sequencing (RNA-seq), used as a one-test-fits-all assay, could save costs and effort and streamline diagnostic procedures. In the Netherlands, the care of all children with oncological disorders has been concentrated in a single national center. Within the Laboratory of Childhood Cancer Pathology, we aim for a comprehensive diagnostic pipeline by implementing RNA-seq to aid diagnosis, prognosis and treatment of all children with cancer in the Netherlands. Methods We have established an RNA-seq-based diagnostic pipeline, primarily aimed at detecting gene fusion events. Library preparation is performed on 50-300 ng of total RNA isolated from fresh (frozen) samples, followed by ribo-depletion and paired-end sequencing (2x150 nt) on the Illumina NovaSeq platform. Data are analyzed using the STAR-Fusion algorithm for gene fusion detection. We are prospectively comparing the results with routine diagnostic procedures. In addition, we are validating the detection of single nucleotide variants (SNVs) from RNA-seq data and developing a diagnostic classifier using a nearest-neighbor network approach. Results Based on RNA-seq profiling in diagnostics of all patients entering the Princess Máxima Center, several use cases highlight the value of RNA-seq. 1) In a prospective cohort of 244 patients (pan-cancer, including 97 hematological malignancies), we have shown that the diagnostic yield for detecting gene fusion events increased by approximately 40% compared to classical methods. 
An example is the TNIP1–PDGFRB gene fusion in a patient with pre-B-ALL, which was not detected by other methods and made this patient eligible for imatinib treatment. 2) Variant calling on RNA-seq data shows that activating mutations, e.g. in KRAS, are detected with high sensitivity, stratifying patients for therapeutic MEK inhibition. 3) By expression outlier analysis, we were able to detect various promoter exchanges, e.g. IGH–MYC or IGH–DUX4, which are typically hard to detect by molecular techniques because the genomic breakpoint is highly variable and no chimeric transcript is formed. 4) Preliminary results from our diagnostic classifier show its potential to predict subclasses of hematological malignancies, e.g. high-hyperdiploid or biphenotypic ALL. 5) Fusion gene breakpoints detected by RNA-seq serve as targets for MRD analysis, allowing us to monitor disease progression and therapy response in individual patients. Currently, RNA-seq data are available for more than 1500 pediatric tumor samples. At the upcoming conference we will present an update of our results and some typical cases highlighting the added value of RNA-seq in routine diagnostics. Conclusion We show that RNA-seq on pediatric cancer samples is feasible and of great value for routine diagnostics. It is more sensitive than targeted assays for detecting gene fusion events. RNA-seq-based gene fusion detection, in combination with mutation and expression analysis, is also promising for improving classification of malignancies, prognosis and stratification of patients for targeted therapies. Disclosures No relevant conflicts of interest to declare.
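Expression outlier analysis can reveal promoter exchanges such as IGH–MYC, which produce no chimeric transcript but strongly raise expression of the partner gene. The abstract does not describe the method used. The sketch below assumes a common robust approach: a modified z-score based on the median and the median absolute deviation (MAD) of one gene across a cohort, with the conventional 3.5 threshold. The function name, input layout and threshold are assumptions, and expression is assumed to be normalized and log-scaled.

```python
from statistics import median
from typing import Dict, List


def outlier_samples(expr: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return samples with unusually high expression of one gene across a cohort.

    Uses the modified z-score 0.6745 * (x - median) / MAD, which, unlike a
    mean/SD z-score, is not inflated by the outliers it is trying to find.
    """
    values = list(expr.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread: the score is undefined
    return sorted(s for s, v in expr.items() if 0.6745 * (v - med) / mad > threshold)
```

Only high outliers are reported here because a promoter exchange is expected to drive overexpression of the partner gene.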


2021 ◽  
Vol 22 (14) ◽  
pp. 7514
Author(s):  
David S. Moura ◽  
Juan Díaz-Martín ◽  
Silvia Bagué ◽  
Ruth Orellana-Fernandez ◽  
Ana Sebio ◽  
...  

Solitary fibrous tumor is a rare subtype of soft-tissue sarcoma with a wide spectrum of histopathological features and clinical behaviors, ranging from mildly to highly aggressive tumors. The defining genetic driver alteration is the NAB2–STAT6 gene fusion, which results from a paracentric inversion within chromosome 12q and can involve several different exons in each gene. STAT6 (signal transducer and activator of transcription 6) nuclear immunostaining and/or identification of the NAB2–STAT6 gene fusion is required for diagnostic confirmation of solitary fibrous tumor. In the present study, a new gene fusion between Nuclear Factor I X (NFIX), mapping to 19p13.2, and STAT6, mapping to 12q13.3, was identified by targeted RNA-Seq in a 74-year-old female patient diagnosed with a deep-seated solitary fibrous tumor in the pelvis. Histopathologically, the neoplasm did not display nuclear pleomorphism or tumor necrosis and had a low proliferative index. Bioinformatic analysis detected a total of 378 unique reads spanning the NFIX exon 8–STAT6 exon 2 breakpoint, with 55 different start sites; these represented 59.5% of the reads intersecting the genomic location on either side of the breakpoint. The targeted RNA-Seq results were validated by RT-PCR/Sanger sequencing. The identification of a new gene fusion partner for STAT6 in solitary fibrous tumor opens intriguing new hypotheses to refine the role of STAT6 in the sarcomatogenesis of this entity.
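The breakpoint evidence in the NFIX–STAT6 abstract combines three quantities: the number of unique junction-spanning reads, the number of distinct alignment start sites among them, and the fraction they make up of all reads overlapping the breakpoint positions. The sketch below computes these from hypothetical inputs. The function name and input layout are assumptions, and the example values are toy numbers, not the study's data. Distinct start sites are often used as a rough indicator that support comes from independent molecules rather than duplicates.

```python
from typing import Iterable, NamedTuple


class JunctionSupport(NamedTuple):
    spanning_reads: int
    distinct_starts: int
    spanning_fraction: float


def junction_support(spanning_starts: Iterable[int], other_overlapping: int) -> JunctionSupport:
    """Summarize evidence for one fusion breakpoint.

    spanning_starts: alignment start of each unique read crossing the fusion junction.
    other_overlapping: unique reads overlapping either partner's breakpoint position
        without crossing the junction (e.g. reads from the non-fused transcripts).
    """
    starts = list(spanning_starts)
    total = len(starts) + other_overlapping
    fraction = len(starts) / total if total else 0.0
    return JunctionSupport(len(starts), len(set(starts)), fraction)
```

For instance, four spanning reads starting at positions 100, 100, 105 and 110, alongside four non-spanning reads, give three distinct start sites and a spanning fraction of 0.5.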

