Deep Learning Approach to Genomic Breakage Study from Primary Sequence

2021 ◽  
Author(s):  
Pora Kim ◽  
Hua Tan ◽  
Jiajia Liu ◽  
Megnyuan Yang ◽  
Xiaobo Zhou

Identifying the molecular mechanisms underlying genomic breakage is an important goal of cancer mechanism studies. Structural variant breakpoints occur in diverse locations. Among them, fusion genes have breakpoints within gene bodies and are typically identified from RNA-seq data, so they provide a particularly informative structural variant resource for studying genomic breakages that have expression and potential pathogenic impacts. In this study, we developed FusionAI, which uses deep learning to predict gene fusion breakpoints from primary sequences and thereby lets us identify the fusion breakage code and its genomic context. FusionAI leverages known fusion breakpoints to build a deep learning model that predicts fusion genes from primary genomic sequences. This helps researchers select fusion genes more accurately and better understand genomic breakage.
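The abstract says only that FusionAI learns from "primary sequences" around known breakpoints. A common way to feed DNA to a deep learning model is to one-hot encode a fixed window centred on the position of interest. The minimal sketch below assumes a window of `flank` bases on each side of a breakpoint. The function name `one_hot_window` and the window size are hypothetical and are not taken from FusionAI.

```python
BASES = "ACGT"


def one_hot_window(sequence, breakpoint, flank):
    """One-hot encode sequence[breakpoint - flank : breakpoint + flank].

    Hypothetical helper: returns 2 * flank rows, each laid out as [A, C, G, T].
    Positions that fall outside the sequence, and ambiguous bases such as N,
    become all-zero rows so every window has the same shape.
    """
    rows = []
    for pos in range(breakpoint - flank, breakpoint + flank):
        row = [0, 0, 0, 0]
        if 0 <= pos < len(sequence):
            idx = BASES.find(sequence[pos].upper())  # -1 for N or other symbols
            if idx >= 0:
                row[idx] = 1
        rows.append(row)
    return rows
```

Consider `one_hot_window("ACGTNACGTA", 5, 3)`. It encodes the six bases G, T, N, A, C, G, and the N becomes an all-zero row. Stacking such fixed-shape windows for breakpoint and non-breakpoint positions gives a matrix that a convolutional classifier can consume.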

2019 ◽  
Vol 1 (4) ◽  
pp. 191-198 ◽  
Author(s):  
Tian Tian ◽  
Ji Wan ◽  
Qi Song ◽  
Zhi Wei

2019 ◽  
Author(s):  
Nicholas Bernstein ◽  
Nicole Fong ◽  
Irene Lam ◽  
Margaret Roy ◽  
David G. Hendrickson ◽  
...  

Single-cell RNA-seq (scRNA-seq) measurements of gene expression enable an unprecedented high-resolution view into cellular state. However, current methods often produce two or more cells that share the same cell-identifying barcode. These "doublets" violate the fundamental premise of single-cell technology and can lead to incorrect inferences. Here, we describe Solo, a semi-supervised deep learning approach that identifies doublets with greater accuracy than existing methods. Solo can be combined with experimental doublet detection methods to purify scRNA-seq data to true single cells more thoroughly than any previous approach.
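The abstract does not describe how Solo obtains doublet examples. One common semi-supervised strategy for doublet detection works in two steps. First, synthetic doublets are simulated by summing the count vectors of two randomly chosen observed cells. Second, a classifier is trained to separate observed cells from these simulated doublets. The sketch below shows only the simulation and labelling step. The function names are hypothetical, and the classifier itself is omitted.

```python
import random


def simulate_doublets(counts, n_doublets, seed=0):
    """Create synthetic doublets by summing the gene counts of two distinct random cells.

    counts: list of per-cell count vectors (lists of ints, all the same length).
    Returns n_doublets summed vectors.
    """
    rng = random.Random(seed)
    doublets = []
    for _ in range(n_doublets):
        i, j = rng.sample(range(len(counts)), 2)  # two different cells
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
    return doublets


def build_training_set(counts, n_doublets, seed=0):
    """Label observed cells 0 and simulated doublets 1 for a downstream classifier."""
    doublets = simulate_doublets(counts, n_doublets, seed)
    X = list(counts) + doublets
    y = [0] * len(counts) + [1] * len(doublets)
    return X, y
```

The observed cells are labelled 0 even though some of them are real doublets. This is what makes the setup semi-supervised: the classifier's scores on observed cells are then used to flag likely doublets.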


2016 ◽  
Author(s):  
Chengpei Zhu ◽  
Yanling Lv ◽  
Liangcai Wu ◽  
Jinxia Guan ◽  
Xue Bai ◽  
...  

Most hepatocellular carcinoma (HCC) patients are diagnosed at advanced stages and have limited treatment options. Challenges in early-stage diagnosis may be due to the genetic complexity of HCC. Gene fusions play a critical role in tumorigenesis and cancer progression in multiple cancers, yet fusion genes have not been investigated as potential diagnostic markers in HCC.
Paired-end RNA sequencing was performed on noncancerous and cancerous lesions from two representative HBV-related HCC (HBV-HCC) patients. Potential fusion genes were identified with STAR-Fusion, which uses the STAR aligner, and were validated against four publicly available RNA-seq datasets. Fourteen pairs of frozen HBV-related HCC samples and adjacent non-tumor liver tissues were examined by RT-PCR for fusion gene expression.
We identified 2,354 distinct gene fusions in the two HBV-HCC patients. Validation against the four RNA-seq datasets showed that only 1.8% (43/2,354) were recurrent fusions supported by the public datasets. Comparison with four fusion databases showed that three of the 43 recurrent fusions (HLA-DPB2-HLA-DRB1, CDH23-HLA-DPB1, and C15orf57-CBX3) were annotated as disease-related fusion events. Nineteen were novel recurrent fusions not previously annotated to diseases, including DCUN1D3-GSG1L and SERPINA5-SERPINA9. RT-PCR and Sanger sequencing of the 14 pairs of HBV-related HCC samples confirmed expression of six of the new fusions, including RP11-476K15.1-CTD-2015H3.2.
Our study provides new insights into gene fusions in HCC and could contribute to the development of anti-HCC therapy. RP11-476K15.1-CTD-2015H3.2 may serve as a new therapeutic biomarker in HCC.
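The abstract does not state how many public datasets had to support a fusion for it to count as recurrent. The minimal sketch below therefore treats that number as a parameter. Both `min_support` and the helper names are hypothetical. Each fusion is represented as an ordered (5' gene, 3' gene) pair, which is also an assumption: A-B and B-A are treated as different fusions.

```python
def recurrent_fusions(detected, public_datasets, min_support=1):
    """Return detected fusions that appear in at least `min_support` public datasets.

    detected: set of (five_prime_gene, three_prime_gene) tuples.
    public_datasets: list of sets of such tuples, one set per dataset.
    """
    recurrent = set()
    for fusion in detected:
        support = sum(fusion in ds for ds in public_datasets)  # number of datasets containing it
        if support >= min_support:
            recurrent.add(fusion)
    return recurrent


def percent(part, whole, digits=1):
    """Percentage of `whole` represented by `part`, rounded for reporting."""
    return round(100 * part / whole, digits)
```

With the counts reported above, `percent(43, 2354)` gives 1.8, which matches the abstract.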


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 22874-22883 ◽  
Author(s):  
Nour Eldeen M. Khalifa ◽  
Mohamed Hamed N. Taha ◽  
Dalia Ezzat Ali ◽  
Adam Slowik ◽  
Aboul Ella Hassanien

Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 4655-4655
Author(s):  
Paul Kerbs ◽  
Aarif Mohamed Nazeer Batcha ◽  
Sebastian Vosberg ◽  
Dirk Metzler ◽  
Tobias Herold ◽  
...  

Accurate and complete genetic classification of AML is crucial for predicting clinical outcome and stratifying treatment. In routine diagnostics, the current gold standard for deciphering the spectrum of genetic abnormalities is polymerase chain reaction (PCR), karyotyping and fluorescence in situ hybridization (FISH). However, these assays may miss fusion genes. Recently, several methods have been developed to improve the detection of gene fusion transcripts from RNA sequencing data, and these methods provide robust results. To test their detection power and assess the applicability of RNA-Seq-based methods in clinical diagnostics, we applied two different algorithms: FusionCatcher (Nicorici D et al., bioRxiv, 2014) and Arriba (Uhrig S et al., DKFZ, https://github.com/suhrig/arriba). We ran both on the transcriptomes of 895 well-characterized AML samples from three independently sequenced cohorts: AMLCG (Herold T et al., Haematologica, 2018, n=261), DKTK (Greif PA et al., Clin Cancer Res, 2018 and unpublished data, n=166) and BeatAML (Tyner JW et al., Nature 2018, n=468). We also ran them on publicly available healthy control samples (SRA studies: SRP018028, SRP047126, SRP050146, SRP105369, SRP115911, SRP133442, n=38). According to karyotyping, 31% (277/895) of samples harbored chromosomal aberrations putatively causing gene fusions (i.e. translocations, interstitial deletions, duplications, inversions, insertions). FISH and/or PCR analyses confirmed these rearrangements in 51.3% (142/277) of samples. RNA-Seq-based fusion detection showed evidence for corresponding fusion genes in 60.3% (167/277) of samples. Some chromosomal aberrations identified by karyotyping are known to result in clinically relevant fusions (e.g. RUNX1-RUNX1T1, KMT2A fusions). In most cases, these were confirmed both by FISH/PCR (AMLCG: n=27/27, DKTK: n=21/21, BeatAML: n=54/57) and by RNA-Seq-based methods (AMLCG: n=17/27, DKTK: n=21/21, BeatAML: n=56/57).
Of note, the AMLCG cohort was sequenced using Lexogen's SENSE mRNA Library Prep Kit, which appears to be suboptimal for fusion detection. Furthermore, 19 samples (AMLCG: n=12, DKTK: n=4, BeatAML: n=3) harbored known pathogenic fusions, described in previous studies, that had not been reported by routine diagnostics: NUP98-NSD1 (n=11); CBFB-MYH11, RUNX1-RUNX1T1 and DEK-NUP214 (n=2 each); RUNX1-CBFA2T2 and RUNX1-CBFA2T3 (n=1 each). PCR reanalysis of six of these samples confirmed three fusions that routine diagnostics had initially missed. In general, RNA-Seq reports a high number of fusion events (on average 69 per sample with FusionCatcher and 39 with Arriba), even after the tools' built-in filters are applied. This indicates a high false-positive rate. To robustly identify putative novel fusions, we developed a filtering pipeline that incorporates two new filtering steps. The promiscuity score (PS) of a fusion measures the number of additional distinct fusion partners detected in the respective cohort for its 5' and 3' genes. The fusion transcript score (FTS) measures the abundance of a fusion transcript relative to its 5' and 3' partner genes. Cut-offs were defined from the PS and FTS of known, clinically relevant fusions confirmed by FISH/PCR. To further maximize specificity while maintaining sensitivity, we excluded fusion events detected in publicly available healthy samples. We then retained only calls made by both FusionCatcher and Arriba (Fig. 1A). We obtained additional evidence for a fusion event from elevated transcription of the 3' fusion partner. In a fusion, transcription of the 3' partner gene likely comes under the control of the 5' partner gene's promoter. As a result, genes that are otherwise transcribed at low levels show elevated transcription (Fig. 1B-C).
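The filtering steps described above can be sketched as set operations plus two scores. This is a minimal illustration under stated assumptions and is not the authors' pipeline. The abstract gives no exact formulas, so the sketch makes three assumptions. First, PS sums the distinct other partners of the 5' gene and of the 3' gene, counted in either orientation. Second, FTS divides fusion-supporting reads by the mean expression of the two partner genes. Third, the cut-off values are free parameters. All function and parameter names are hypothetical.

```python
from collections import defaultdict


def promiscuity_scores(cohort_calls):
    """PS per fusion: distinct other partners of its 5' gene plus those of its 3' gene.

    cohort_calls: iterable of (five_prime, three_prime) tuples detected in one cohort.
    Assumption: partners are counted regardless of orientation, and the two counts are summed.
    """
    partners = defaultdict(set)
    fusions = set(cohort_calls)
    for g5, g3 in fusions:
        partners[g5].add(g3)
        partners[g3].add(g5)
    return {
        (g5, g3): len(partners[g5] - {g3}) + len(partners[g3] - {g5})
        for g5, g3 in fusions
    }


def fusion_transcript_score(fusion_reads, expr_5p, expr_3p):
    """Hypothetical FTS: fusion reads relative to the mean expression of both partner genes."""
    denom = (expr_5p + expr_3p) / 2
    if denom == 0:
        return 0.0  # no partner expression measured; treated as no evidence
    return fusion_reads / denom


def filter_calls(fc_calls, arriba_calls, healthy_calls, ps, fts, max_ps, min_fts):
    """Keep fusions called by both tools, absent from healthy controls, and within the PS/FTS cut-offs."""
    shared = (set(fc_calls) & set(arriba_calls)) - set(healthy_calls)
    return {f for f in shared if ps.get(f, 0) <= max_ps and fts.get(f, 0.0) >= min_fts}
```

Removing healthy-control calls and intersecting the two callers' outputs are both set operations, so their order does not change the result. The check for elevated 3'-partner expression is not shown.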
Thus, we identified five putatively novel recurrent fusion genes that were detected independently in two cohorts: NRIP1-MIR99AHG, LATS2-ZMYM2, ATP11A-ING1, MBP-SLC66A2 and PRDM16-SKI (Fig. 1D-F). Although these events were called with high evidence, we aim to validate them independently with complementary methods. Our study shows that RNA-Seq-based fusion detection is a valuable complement to routine diagnostics. It also shows that this approach has the potential to discover novel, putatively pathogenic fusions. Disclosures: No relevant conflicts of interest to declare.


2018 ◽  
Author(s):  
Hailin Hu ◽  
An Xiao ◽  
Sai Zhang ◽  
Yangyang Li ◽  
Xuanling Shi ◽  
...  

Motivation: Human immunodeficiency virus type 1 (HIV-1) genome integration is closely related to clinical latency and viral rebound. Some human DNA sequences interact directly with the integration machinery. Beyond these, the selection of HIV integration sites has also been shown to depend on the heterogeneous genomic context across a large surrounding region, which greatly hinders both the prediction and the mechanistic study of HIV integration.
Results: We have developed an attention-based deep learning framework, named DeepHINT, that simultaneously provides accurate prediction of HIV integration sites and mechanistic explanations of the detected sites. Extensive tests on a high-density HIV integration site dataset showed that DeepHINT can outperform conventional modeling strategies. It does so by automatically learning the genomic context of HIV integration solely from primary DNA sequence information. Systematic analyses of diverse known factors of HIV integration further validated the biological relevance of the predictions. More importantly, in-depth analyses of the attention values output by DeepHINT revealed intriguing mechanistic implications for the selection of HIV integration sites. These include potential roles of several basic helix-loop-helix (bHLH) transcription factors and zinc-finger proteins. These results establish DeepHINT as an effective and explainable deep learning framework for the prediction and mechanistic study of HIV integration.
Availability: DeepHINT is available as open-source software and can be downloaded from https://github.com/nonnerdling/
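The abstract does not specify DeepHINT's architecture beyond its use of attention. A generic softmax attention-pooling layer illustrates how per-position attention values arise and why they can be inspected afterwards. In this layer, each sequence position gets a score, the scores are normalised into weights, and the weights average the per-position features. This is a minimal sketch of the general technique, not DeepHINT's implementation. The scoring vector `w` and the function name are hypothetical.

```python
import math


def attention_pool(features, w):
    """Softmax attention pooling over sequence positions.

    features: list of per-position feature vectors h_i (e.g. convolution outputs).
    w: scoring vector; the raw score of position i is the dot product w . h_i.
    Returns (pooled_vector, attention_weights); the weights sum to 1.
    """
    scores = [sum(wk * hk for wk, hk in zip(w, h)) for h in features]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(a * h[k] for a, h in zip(alphas, features)) for k in range(dim)]
    return pooled, alphas
```

The returned `alphas` are the kind of per-position attention values that can be examined to see which parts of a sequence drove a prediction.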


BMC Genomics ◽  
2018 ◽  
Vol 19 (1) ◽  
Author(s):  
Yi Zhang ◽  
Xinan Liu ◽  
James MacLeod ◽  
Jinze Liu

2017 ◽  
Author(s):  
Zhiqin Huang ◽  
David T.W. Jones ◽  
Yonghe Wu ◽  
Peter Lichter ◽  
Marc Zapatka

Background: Fusion genes play an important role in the tumorigenesis of many cancers. Next-generation sequencing (NGS) technologies have been successfully applied to fusion gene detection over the last several years, and a number of NGS-based tools for identifying fusion genes have been developed during this period. Most fusion gene detection tools based on RNA-seq data report a large number of candidates, mostly false positives. This makes it hard to prioritize candidates for experimental validation and further analysis. Selecting reliable fusion genes for downstream analysis has therefore become very important in cancer research. We developed confFuse, a scoring algorithm that reliably selects high-confidence fusion genes that are likely to be biologically relevant.
Results: confFuse takes multiple parameters into account to assign each fusion candidate a confidence score, and a score ≥ 8 indicates a high-confidence fusion gene prediction. These parameters were manually curated based on our experience and on certain structural motifs of fusion genes. We compared confFuse with alternative tools on 96 published RNA-seq samples from different tumor entities. Our method substantially reduces the number of fusion candidates (301 high-confidence out of 8,083 predicted fusion genes) while maintaining high detection accuracy (recovery rate 85.7%). Validation of 18 novel high-confidence fusions detected in three breast tumor samples yielded a 100% validation rate.
Conclusions: confFuse is a novel downstream filtering method that allows selection of highly reliable fusion gene candidates for further downstream analysis and experimental validation. confFuse is available at https://github.com/Zhiqin-HUANG/confFuse.
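The abstract states only two things about the score. confFuse combines several manually curated parameters into a confidence score, and a score of at least 8 marks a high-confidence prediction. The sketch below shows that general weighted-feature-plus-threshold pattern. Only the threshold of 8 comes from the text. The feature names and weights are hypothetical placeholders, not confFuse's actual parameters.

```python
HIGH_CONFIDENCE_THRESHOLD = 8  # from the abstract: a score >= 8 means high confidence

# Hypothetical features and weights, for illustration only (not confFuse's parameters).
WEIGHTS = {
    "split_reads_ge_3": 3,
    "spanning_pairs_ge_3": 2,
    "in_frame": 2,
    "breakpoint_at_exon_boundary": 2,
    "not_in_normal_samples": 1,
}


def confidence_score(features):
    """Sum the weights of the boolean features that are True for one candidate."""
    return sum(weight for name, weight in WEIGHTS.items() if features.get(name, False))


def high_confidence(candidates):
    """Return the names of candidates whose score reaches the threshold, in input order."""
    return [
        name
        for name, feats in candidates.items()
        if confidence_score(feats) >= HIGH_CONFIDENCE_THRESHOLD
    ]
```

For scale, the reported reduction from 8,083 candidates to 301 high-confidence fusions keeps about 3.7% of the original calls.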

