Fusion Transcript Detection from RNA-Seq using Jaccard Distance

Author(s):  
Hamidreza Mohebbi ◽  
Nurit Haspel ◽  
Dan Simovici ◽  
Joyce Quach
2021 ◽  
Author(s):  
Hamid Reza Mohebbi ◽  
Nurit Haspel

Gene fusion events, in which two genes fuse to form a hybrid gene, were first described in cancer cells in the early 1980s. These events are relatively common in many cancers, including prostate, lymphoid, soft tissue, and breast cancers. Recent advances in next-generation sequencing (NGS) provide a high volume of genomic data, including cancer genomes. Detecting possible gene fusions in these data requires fast and accurate methods. However, current methods suffer from inefficiency, insufficient accuracy, and a high false-positive rate. We present FDJD, an RNA-Seq fusion detection method that uses dimensionality reduction and parallel computing to speed up the computation. We convert the RNA categorical space into compact binary arrays called binary fingerprints, which reduces memory usage and increases efficiency. Fusion candidates are searched for and detected using the Jaccard distance, and candidate detection is followed by a refinement step. We benchmarked fusion prediction accuracy using both simulated and genuine RNA-Seq datasets. Genuine paired-end Illumina RNA-Seq data were obtained from 60 publicly available cancer cell line data sets. The results are compared against state-of-the-art methods such as STAR-Fusion, InFusion, and TopHat-Fusion. Our results show that FDJD exhibits superior accuracy compared to popular alternative fusion detection methods. We achieved 90% accuracy on simulated fusion transcript inputs, the highest among the compared methods, while maintaining comparable run time.
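The abstract does not say how the binary fingerprints are built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a sequence is broken into k-mers, each k-mer is hashed to one position in a fixed-length bit array, and two fingerprints are compared with the Jaccard distance, 1 - |A ∩ B| / |A ∪ B|, computed over the set bits. The function names, the k-mer length `k`, the fingerprint size `n_bits`, and the choice of hash are all illustrative assumptions.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fingerprint(seq, k=5, n_bits=1024):
    """Hash every k-mer of seq to one bit of an n_bits-wide integer bitmask.

    Assumption: this hashing scheme is for illustration only; the source
    abstract does not describe how its fingerprints are constructed.
    """
    fp = 0
    for kmer in kmers(seq, k):
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        fp |= 1 << (int.from_bytes(digest, "big") % n_bits)
    return fp


def jaccard_distance(a, b):
    """Jaccard distance between two bitmasks: 1 - |a AND b| / |a OR b|."""
    union = bin(a | b).count("1")
    if union == 0:
        # Two empty fingerprints are treated as identical.
        return 0.0
    return 1.0 - bin(a & b).count("1") / union
```

Under these assumptions, a read that spans a fusion junction would be expected to sit at a moderate distance from the fingerprints of both parent genes rather than close to just one of them. Packing each fingerprint into an integer keeps memory use small and reduces the comparison to bitwise operations.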


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Brian J. Haas ◽  
Alexander Dobin ◽  
Bo Li ◽  
Nicolas Stransky ◽  
Nathalie Pochet ◽  
...  

Abstract. Background: Accurate fusion transcript detection is essential for comprehensive characterization of cancer transcriptomes. Over the last decade, multiple bioinformatic tools have been developed to predict fusions from RNA-seq, based on either read mapping or de novo fusion transcript assembly. Results: We benchmark 23 different methods including applications we develop, STAR-Fusion and TrinityFusion, leveraging both simulated and real RNA-seq. Overall, STAR-Fusion, Arriba, and STAR-SEQR are the most accurate and fastest for fusion detection on cancer transcriptomes. Conclusion: The lower accuracy of de novo assembly-based methods notwithstanding, they are useful for reconstructing fusion isoforms and tumor viruses, both of which are important in cancer research.


Author(s):  
Martin Philpott ◽  
Jonathan Watson ◽  
Anjan Thakurta ◽  
Tom Brown ◽  
Tom Brown ◽  
...  

Abstract. Here we describe single-cell corrected long-read sequencing (scCOLOR-seq), which enables error correction of barcode and unique molecular identifier oligonucleotide sequences and permits standalone cDNA nanopore sequencing of single cells. Barcodes and unique molecular identifiers are synthesized using dimeric nucleotide building blocks that allow error detection. We illustrate the use of the method for evaluating barcode assignment accuracy, differential isoform usage in myeloma cell lines, and fusion transcript detection in a sarcoma cell line.


2014 ◽  
Author(s):  
Angie Cheng ◽  
Varun Bagai ◽  
Joey Cienfuegos ◽  
Natalie Hernandez ◽  
Mu Li ◽  
...  

2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Stefanie Friedrich ◽  
Erik L. L. Sonnhammer

Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 1278-1278
Author(s):  
Fabiana Ostronoff ◽  
Matthew Fitzgibbon ◽  
Martin McIntosh ◽  
Rhonda E. Ries ◽  
Alan S. Gamis ◽  
...  

Abstract 1278. Introduction: Acute myeloid leukemia (AML) represents a heterogeneous group of malignancies with great variability in response to therapy. In recent years, an increasing list of molecular markers with prognostic significance in AML has been identified; nonetheless, new prognostic markers and therapeutic targets are still needed. The aim of this study was to use RNA-Sequencing (RNA-Seq) to identify and verify fusion transcripts that would otherwise be undetectable by conventional karyotyping.
Methods: Transcriptome sequence data were generated by high-throughput short-read RNA-Seq performed for each AML sample on the Illumina HiSeq. Poly(A) RNA was captured with poly(T) magnetic beads, fragmented, and copied to cDNA libraries with reverse transcriptase and random primers. Each library was subjected to 50-cycle paired-end sequencing on the Illumina HiSeq at Hudson Alpha. Filtered Fastq files were processed with TopHat-Fusion [Kim2011,Trapnell2009] alignment software to discover cryptic fusions in RNA-Seq data without relying on known, annotated models. This process yielded an average of 20 million alignable reads per sample. Cord blood blast cell transcripts were also processed and served as normal controls. A series of filtering steps eliminated junctions commonly found to be in error. Filtered junctions found in at least 3 AML samples and in no normal controls were retained as AML-associated candidate junctions. Visual curation of candidates was performed using Integrative Genomics Viewer. Candidate fusions were verified by RT-PCR amplification of the AML-associated fusions in the index cases. The fusion transcript product, as well as the breakpoint junction, was verified by Sanger sequencing.
Results: Diagnostic specimens from 70 patients with de novo AML were sequenced, including patients with normal karyotype (NK, N=31), core-binding factor (CBF) AML (N=33), and other (N=6). Age at diagnosis varied from 10 months to 69 years (median, 12 years). White blood cell count (WBC) and blast percentage were 49×10⁹/L (range, 2.4 to 496×10⁹/L) and 78% (range, 40% to 100%), respectively. Bioinformatic evaluation of the RNA-Seq data revealed 67 high-value novel fusions that were not detected by conventional karyotyping: 54 (80.6%) were intra-chromosomal and 13 (19.4%) were inter-chromosomal junctions. The number of novel fusions varied among cytogenetic groups, with 22 detected in NK patients (16 intra- and 6 inter-chromosomal junctions), 37 in CBF patients (32 intra- and 5 inter-chromosomal junctions), and 8 in "other" (6 intra- and 2 inter-chromosomal junctions). Thirteen novel fusions (19.4%) were found in 2 or more screened patients: two (15.4%) inter-chromosomal and 11 (84.6%) intra-chromosomal junctions. The median number of fusions identified per patient was 2 (range, 1 to 6). Novel fusions involving the PDGFR-β gene were identified in two patients, each with a different translocation partner (G3BP1 and ETV6, which were intra- and inter-chromosomal fusions, respectively). Sequencing of the fusion transcripts verified the fusion junctions and demonstrated in-frame fusions of G3BP1 and ETV6 to the kinase domain coding region of PDGFR-β, with a junction identical to that seen in cases of imatinib-sensitive idiopathic hypereosinophilic syndrome (IHES). Frequency validation in 100 adult and 100 pediatric cases identified one additional patient with G3BP1-PDGFR-β. Cryptic NUP98/NSD1 was identified and verified in two patients with normal karyotype, as was a NUP98/HOXD13 translocation in one patient. Frequency determination of NUP98/NSD1 demonstrated a prevalence of 7.8% in patients with NK and 13% in patients with FLT3/ITD. Patients who harbored both the NUP98/NSD1 fusion and FLT3/ITD had a dismal remission induction rate (CR rate in FLT3/ITD with and without NUP98/NSD1 was 28% vs. 73%; p=0.002).
Conclusion: Our data show the applicability of RNA-Seq as a tool to discover cryptic fusion transcripts in AML. These novel fusions may define new independent prognostic markers and potential therapeutic targets for patients with this highly treatment-resistant disease.
Disclosures: No relevant conflicts of interest to declare.
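The recurrence filter described in the Methods of this abstract (keep junctions found in at least 3 AML samples and in no normal controls) can be sketched as follows. This is an illustrative sketch only, not the study's pipeline: the function name, the dictionary-of-sets input layout, and the junction labels in the usage note are hypothetical.

```python
from collections import defaultdict


def recurrent_junctions(tumor_calls, normal_calls, min_samples=3):
    """Keep junctions seen in >= min_samples tumor samples and in no normal sample.

    tumor_calls / normal_calls: dict mapping sample ID -> set of junction labels
    (a hypothetical layout chosen for this sketch).
    Returns a dict mapping each retained junction to its sorted supporting samples.
    """
    # Every junction observed in any normal control is excluded outright.
    seen_in_normal = set().union(*normal_calls.values())

    # Collect the set of tumor samples supporting each junction.
    support = defaultdict(set)
    for sample, junctions in tumor_calls.items():
        for junction in junctions:
            support[junction].add(sample)

    return {
        junction: sorted(samples)
        for junction, samples in support.items()
        if len(samples) >= min_samples and junction not in seen_in_normal
    }
```

For example, with placeholder labels, a junction "A-B" called in samples s1, s2, and s3 and absent from all controls would be retained, while a junction "C-D" called in the same three samples but also in one control would be dropped.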


1996 ◽  
Vol 16 (3) ◽  
pp. 379-392 ◽  
Author(s):  
C. S. Lee ◽  
Melissa C. Southey ◽  
Keith Waters ◽  
George Kannourakis ◽  
Toula Georgiou ◽  
...  

2020 ◽  
Author(s):  
Stefanie Friedrich ◽  
Erik LL Sonnhammer

Abstract. Background: Fusion transcripts are involved in tumourigenesis and play a crucial role in tumour heterogeneity, tumour evolution and cancer treatment resistance. However, fusion transcripts have not been studied at high spatial resolution in tissue sections due to the lack of full-length transcripts with spatial information. New high-throughput technologies like spatial transcriptomics measure the transcriptome of tissue sections at almost single-cell level. While this technique does not allow for direct detection of fusion transcripts, we show that they can be inferred using the relative poly(A) tail abundance of the involved parental genes. Method: We present a new method, STfusion, which uses spatial transcriptomics to infer the presence and absence of poly(A) tails. A fusion transcript lacks a poly(A) tail for the 5′ gene and has an elevated number of poly(A) tails for the 3′ gene. Its expression level is defined by the upstream promoter of the 5′ gene. STfusion measures the difference between the observed and expected number of poly(A) tails with a novel C-score. Results: We verified STfusion's ability to predict fusion transcripts on HeLa cells with known fusions. STfusion and the C-score applied to clinical prostate cancer data revealed the spatial distribution of the cis-SAGe SLC45A3-ELK4 in 12 tissue sections with almost single-cell resolution. The cis-SAGe occurred in disease areas, e.g. inflamed, prostatic intraepithelial neoplastic, or cancerous areas, and occasionally in normal glands. Conclusions: STfusion detects fusion transcripts in cancer cell line and clinical tissue data, and distinguishes chimeric transcripts from chimeras caused by trans-splicing events. With STfusion and the use of C-scores, fusion transcripts can be spatially localised in clinical tissue sections at almost single-cell level. Keywords: Fusion transcript detection, Spatial Transcriptomics, gene fusion, cis-SAGE, oncogene

