Deep learning for cancer type classification and driver gene identification

2021 ◽  
Vol 22 (S4) ◽  
Author(s):  
Zexian Zeng ◽  
Chengsheng Mao ◽  
Andy Vo ◽  
Xiaoyu Li ◽  
Janna Ore Nugent ◽  
...  

Abstract Background: Genetic information is becoming more readily available and is increasingly being used to predict patient cancer types as well as their subtypes. Most classification methods thus far utilize somatic mutations as independent features for classification and are limited by study power. We aim to develop a novel method to effectively explore the landscape of genetic variants, including germline variants and small insertions and deletions, for cancer type prediction. Results: We proposed DeepCues, a deep learning model that utilizes convolutional neural networks to derive features in an unbiased manner from raw cancer DNA sequencing data for disease classification and relevant gene discovery. Using raw whole-exome sequencing as features, germline variants and somatic mutations, including insertions and deletions, were interactively amalgamated for feature generation and cancer prediction. We applied DeepCues to a dataset from TCGA to classify seven different types of major cancers and obtained an overall accuracy of 77.6%. We compared DeepCues to conventional methods and demonstrated a significant overall improvement (p < 0.001). Strikingly, the top 20 breast cancer relevant genes identified using DeepCues had a 40% overlap with the top 20 known breast cancer driver genes. Conclusion: Our results support DeepCues as a novel method to improve the representational resolution of DNA sequencing data and its power in deriving features from raw sequences for cancer type prediction, as well as in discovering new cancer relevant genes.
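The 40% figure reported above is a top-k overlap: with k = 20, it corresponds to 8 genes shared between the two ranked lists. The following is a minimal sketch of that calculation only. The function name `top_k_overlap` and the gene identifiers are hypothetical placeholders, not the genes or code reported by the authors.

```python
def top_k_overlap(predicted, reference, k=20):
    """Fraction of the top-k predicted genes that also appear in the top-k reference genes."""
    top_pred = set(predicted[:k])
    top_ref = set(reference[:k])
    return len(top_pred & top_ref) / k


# Hypothetical placeholder identifiers (assumption), chosen so that 8 genes are shared.
predicted = [f"GENE_{i}" for i in range(20)]       # GENE_0 .. GENE_19
reference = [f"GENE_{i}" for i in range(12, 32)]   # GENE_12 .. GENE_31

overlap = top_k_overlap(predicted, reference)      # 8 shared genes out of 20
```

Because both lists are truncated to the same k, the measure is symmetric; it ignores the rank order within the top k.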

2019 ◽  
Author(s):  
Zexian Zeng ◽  
Chengsheng Mao ◽  
Andy Vo ◽  
Janna Ore Nugent ◽  
Seema A Khan ◽  
...  

Abstract Genetic information is becoming more readily available and is increasingly being used to predict patient cancer types as well as their subtypes. Most classification methods thus far utilize somatic mutations as independent features for classification and are limited by study power. To address these limitations, we propose DeepCues, a deep learning model that utilizes convolutional neural networks to derive features from DNA sequencing data for disease classification and relevant gene discovery. Using whole-exome sequencing, germline variants and somatic mutations, including insertions and deletions, are interactively amalgamated as features. In this study, we applied DeepCues to a dataset from TCGA to classify seven different types of major cancers and obtained an overall accuracy of 77.6%. We compared DeepCues to conventional methods and demonstrated a significant overall improvement (p=8.8E-25). Using DeepCues, we found that the top 20 genes associated with breast cancer have a 40% overlap with the top 20 breast cancer genes in the COSMIC database. These data support DeepCues as a novel method to improve the representational resolution of both germline variants and somatic mutations interactively and their power in predicting cancer types, as well as the genes involved in each cancer.


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 10582-10582
Author(s):  
Timothy A. Yap ◽  
Arya Ashok ◽  
Jessica Stoll ◽  
Anna Ewa Schwarzbach ◽  
Kimberly L. Blackwell ◽  
...  

10582 Background: Up to 10% of all cancers are associated with hereditary cancer syndromes; however, guidelines for germline testing are currently limited to patients and families with specific cancer types (ovarian, breast, prostate, pancreatic, etc.). Although germline alterations have been shown in genes associated with cancers such as bile-duct, head & neck, brain, bladder, esophageal, and lung cancers, genetic testing is not routinely offered (PMID: 28873162). In such cancers, a guidelines-based approach may fail to detect cancer risk variants found by tumor-normal (T/N) matched sequencing. Here, we report the prevalence of incidental germline findings in patients with the aforementioned 6 cancer types and highlight frequently mutated genes by cancer type. Methods: We retrospectively analyzed next-generation sequencing data from de-identified records of 19,630 patients tested using Tempus|xT T/N matched assay. Incidental germline findings (i.e., single nucleotide variants and small insertions/deletions) detected in 50 hereditary cancer genes were determined for: bile duct (n = 466), head & neck (n = 673), esophageal (n = 395), brain (n = 1,391), bladder (n = 810), and lung (n = 5,544), where n = total patients. For comparison, we also included 4 cancer types that frequently undergo germline testing: ovarian (n = 2,042), breast (n = 3,542), prostate (n = 2,146), and pancreatic (n = 2,621). Results: We detected incidental pathogenic/likely pathogenic germline variants (P/LPV) in 6.5% (601/9,279) of patients diagnosed with the 6 selected cancer types lacking hereditary cancer testing guidelines. The highest prevalence of P/LPV was identified in patients with bladder (8%), brain (6.9%), and lung (6.5%) cancers. Frequently mutated genes (Table) include ATM (n = 62), BRCA2 (n = 60), BRCA1 (n = 33), APC (n = 27), and CHEK2 (n = 21). Of note, the Ashkenazi Jewish variant (p.I1307K) was the most frequent mutation in APC. 
For cancer types where patients frequently undergo germline testing, the rates of incidental germline findings in descending order were ovarian (15%), breast (12%), prostate (9.4%), and pancreatic (8.5%) cancers. Conclusions: In addition to enhanced variant calling, T/N matched sequencing may identify germline variants missed by a guidelines-based approach to testing. The identification of such germline findings may have clinical implications for the patient, as well as at-risk family members, thereby resulting in the opportunity for genetic counseling and risk-stratified intervention.[Table: see text]
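The cohort sizes and the pooled prevalence quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the abstract; the variable names are illustrative.

```python
# Patients per cancer type lacking hereditary testing guidelines (from the abstract).
selected = {
    "bile duct": 466,
    "head & neck": 673,
    "esophageal": 395,
    "brain": 1391,
    "bladder": 810,
    "lung": 5544,
}
# Comparison cancer types that frequently undergo germline testing (from the abstract).
comparison = {
    "ovarian": 2042,
    "breast": 3542,
    "prostate": 2146,
    "pancreatic": 2621,
}

carriers = 601  # patients with incidental P/LPV among the six selected types

total_selected = sum(selected.values())
prevalence_pct = round(100 * carriers / total_selected, 1)
total_all = total_selected + sum(comparison.values())
```

The six selected cohorts sum to the 9,279 denominator, 601/9,279 rounds to the reported 6.5%, and all ten cohorts together sum to the 19,630 patients analyzed.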


2019 ◽  
Vol 3 (1) ◽  
Author(s):  
Jean-Sébastien Milanese ◽  
Chabane Tibiche ◽  
Jinfeng Zou ◽  
Zhigang Meng ◽  
Andre Nantel ◽  
...  

Abstract Germline variants such as BRCA1/2 play an important role in tumorigenesis and clinical outcomes of cancer patients. However, only a small fraction (i.e., 5–10%) of inherited variants has been associated with clinical outcomes (e.g., BRCA1/2, APC, TP53, PTEN and so on). The challenge remains in using these inherited germline variants to predict clinical outcomes in the cancer patient population. In an attempt to solve this issue, we applied our recently developed algorithm, eTumorMetastasis, which constructs predictive models, to exome sequencing data from ER+ breast cancer patients (n = 755). Gene signatures derived from the genes containing functionally germline variants significantly distinguished recurred and non-recurred patients in two independent ER+ breast cancer cohorts (n = 200 and 295, P = 1.4 × 10⁻³). Furthermore, we compared our results with the widely known Oncotype DX test (i.e., Oncotype DX breast cancer recurrence score), and our signatures outperformed it in prediction for both high- and low-risk groups. Finally, we found that recurred patients possessed a higher rate of germline variants. In addition, the inherited germline variants from these gene signatures were predominantly enriched in T cell function, antigen presentation, and cytokine interactions, likely impairing the adaptive and innate immune response and thus favoring a pro-tumorigenic environment. Hence, germline genomic information could be used for developing non-invasive genomic tests for predicting patients’ outcomes in breast cancer.


2018 ◽  
Author(s):  
Jean-Sébastien Milanese ◽  
Chabane Tibiche ◽  
Jinfeng Zou ◽  
Zhi Gang Meng ◽  
Andre Nantel ◽  
...  

Abstract Germline genetic variants such as BRCA1/2 play an important role in tumorigenesis and clinical outcomes of cancer patients. However, only a small fraction (i.e., 5-10%) of inherited variants has been associated with clinical outcomes (e.g., BRCA1/2, APC, TP53, PTEN and so on). The challenge remains in using these inherited germline variants to predict clinical outcomes in the cancer patient population. In an attempt to solve this issue, we applied our recently developed algorithm, eTumorMetastasis, which constructs predictive models, to exome sequencing data from ER+ breast cancer patients (n=755). Gene signatures derived from the genes containing functionally germline genetic variants significantly distinguished recurred and non-recurred patients in two independent ER+ breast cancer cohorts (n=200 and 295, P=1.4×10⁻³). Furthermore, we found that recurred patients possessed a higher rate of germline genetic variants. In addition, the inherited germline variants from these gene signatures were predominantly enriched in T cell function, antigen presentation and cytokine interactions, likely impairing the adaptive and innate immune response and thus favoring a pro-tumorigenic environment. Hence, germline genomic information could be used for developing non-invasive genomic tests for predicting patients’ outcomes (or drug response) in breast cancer, other cancer types and even other complex diseases.


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 1802-1802
Author(s):  
Deepak Singhal ◽  
Christopher N. Hahn ◽  
Cassandra M. Hirsch ◽  
Amilia Wee ◽  
Monika M Kutyna ◽  
...  

Abstract Therapy-related myeloid neoplasm (t-MN) is considered to be a direct stochastic complication of chemotherapy and/or radiotherapy for primary cancer or autoimmune diseases. However, genetic predisposition is reported in 8-12% of sporadic adult cancer patients [Lu et al Nature Communications 2015 and Huang et al Cell 2018]. Similarly, genetic predispositions to t-MN have also been reported in limited single-institute studies of small numbers of patients [Churpek et al Cancer 2016]. In this study, we performed comprehensive germline and somatic mutation profiling in t-MN using next generation sequencing. Matched germline material was available for 62/194 (32%) patients. Mutation profiling was correlated with clinical features including family history in 194 patients enrolled in the South Australian MDS (SA-MDS) registry and Cleveland Clinic (CC). A well-established in-house filtering pipeline was used for identification of somatic mutations. Only variants with Genome Aggregation Database (gnomAD) minor allele frequency (MAF) of ≤0.01% and variant allele frequency (VAF) of ≥35% were selected for further analysis of germline variants. Variants reported in the Catalogue of Somatic Mutations in Cancer database and MDS/AML were excluded from further analysis. Variants reported pathogenic in Breast Cancer Information Core (BIC) database and Leiden Open Variation Database (LOVD) were retained. Other variants were included if truncating (nonsense, indels, splice alterations), CADD>20, or predicted deleterious by >4/6 scoring algorithms (GERP>4, PhyloP>2, SIFT, PolyPhen2, MutationTaster and FATHMM). Forty-one (21%) t-MN patients harbored 45 rare (MAF<0.001) and deleterious germline mutations in the Fanconi anaemia (FA) pathway and driver myeloid genes including frameshift indels and splice site alterations in BRCA1, BRCA2, FANCA, PALB2, RAD51, DDX41 and TP53 (Figure 1A-B). 
The highest number of FA germline variants were seen in BRCA1 and FANCA (n=5 each) followed by BRCA2 (n=4), ERCC4, PALB2 and FANCC (n=2 each). We also identified 14 rare, deleterious myeloid germline variants in 13/194 (6.7%) of t-MN patients. These germline myeloid variants were identified in TP53, DDX41, GATA2 and MET, genes that are well-known drivers of myeloid malignancies. Of the five acute lymphoblastic leukaemia patients with t-MN, 2/5 (40%) had rare myeloid germline variants in TP53, GATA2 and KMT2A. The frequency of these germline mutations in our t-MN cohort is higher than in the general population (gnomAD; Table 1) and in patients with primary malignancies such as breast cancer and lymphoma [Lu et al Nature Communications 2015 and Churpek et al Cancer 2016]. Intriguingly, the frequency of germline FA gene mutations (FAMT) in our therapy-related myelodysplastic syndrome (t-MDS) patients is also higher than that reported in primary MDS patients (18% vs 9%, p=0.02) [Przychodzen et al 2018]. Additionally, of those with available family history, 62% of t-MN patients have first- and/or second-degree relatives with non-skin cancers. Significantly more patients with FA mutations (FAMT) had first- and second-degree relatives with cancers compared to patients without FA mutations (FAWT) (82% vs 58%; p=0.03). Additionally, chromosome 3 and 7 abnormalities, as well as monosomal karyotype, were more frequent in FAMT cases compared to FAWT. Similarly, somatic mutations in GATA2 (10% vs 2%; p=0.02), BCOR (13% vs 4%; p=0.03) and IDH2 (10% vs 2%; p=0.02) were more frequent in FAMT compared to FAWT cases (Figure 1C). In summary, we show that at least one in five t-MN patients harbor deleterious germline mutations, and 82% of FAMT patients have a first- or second-degree relative with cancer. These findings have implications not only for the management of t-MN patients but also for genetic testing of their family members. 
Disclosures Branford: Qiagen: Honoraria, Membership on an entity's Board of Directors or advisory committees; Novartis: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding, Speakers Bureau; BMS: Honoraria, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Cepheid: Honoraria. Maciejewski: Alexion Pharmaceuticals, Inc.: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Ra Pharmaceuticals, Inc: Consultancy; Apellis Pharmaceuticals: Consultancy. Hiwase: Celgene: Research Funding; Novartis: Research Funding.
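The germline filtering criteria described in this abstract can be read as a short decision rule. The sketch below is one possible reading under stated assumptions, not the authors' in-house pipeline: the `Variant` fields and the function name are hypothetical, the order in which the exclusion and retention rules apply is assumed, and the MAF cutoff uses the "≤0.01%" threshold stated in the methods sentence.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # All field names are hypothetical; they only mirror the criteria in the abstract.
    gnomad_maf: float                 # population minor allele frequency, as a fraction
    vaf: float                        # variant allele frequency, as a fraction
    in_cosmic_or_myeloid: bool        # reported in COSMIC or in MDS/AML
    pathogenic_in_bic_or_lovd: bool   # reported pathogenic in BIC or LOVD
    truncating: bool                  # nonsense, indel, or splice alteration
    cadd: float                       # CADD score
    deleterious_votes: int            # how many of the six scoring algorithms call it deleterious


def is_candidate_germline(v: Variant) -> bool:
    """Apply the abstract's criteria in an assumed order."""
    # Rare in the population (MAF <= 0.01%) and VAF >= 35%.
    if v.gnomad_maf > 0.0001 or v.vaf < 0.35:
        return False
    # Assumption: known somatic/myeloid variants are excluded before anything else.
    if v.in_cosmic_or_myeloid:
        return False
    # Variants reported pathogenic in BIC or LOVD are retained.
    if v.pathogenic_in_bic_or_lovd:
        return True
    # Otherwise keep truncating variants, CADD > 20, or > 4 of 6 deleterious calls.
    return v.truncating or v.cadd > 20 or v.deleterious_votes > 4


# Illustrative variants (made-up values).
rare_truncating = Variant(0.00005, 0.48, False, False, True, 12.0, 2)
too_common = Variant(0.002, 0.50, False, False, True, 30.0, 6)
low_evidence = Variant(0.00005, 0.48, False, False, False, 15.0, 4)
```

Here `rare_truncating` passes, `too_common` fails the population-frequency cutoff, and `low_evidence` fails because exactly four of six deleterious calls does not exceed the ">4/6" threshold.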


2014 ◽  
Vol 112 (1) ◽  
pp. 118-123 ◽  
Author(s):  
Cristian Tomasetti ◽  
Luigi Marchionni ◽  
Martin A. Nowak ◽  
Giovanni Parmigiani ◽  
Bert Vogelstein

Cancer arises through the sequential accumulation of mutations in oncogenes and tumor suppressor genes. However, how many such mutations are required for a normal human cell to progress to an advanced cancer? The best estimates for this number have been provided by mathematical models based on the relation between age and incidence. For example, the classic studies of Nordling [Nordling CO (1953) Br J Cancer 7(1):68–72] and Armitage and Doll [Armitage P, Doll R (1954) Br J Cancer 8(1):1–12] suggest that six or seven sequential mutations are required. Here, we describe a different approach to derive this estimate that combines conventional epidemiologic studies with genome-wide sequencing data: incidence data for different groups of patients with the same cancer type were compared with respect to their somatic mutation rates. In two well-documented cancer types (lung and colon adenocarcinomas), we find that only three sequential mutations are required to develop cancer. This conclusion deepens our understanding of the process of carcinogenesis and has important implications for the design of future cancer genome-sequencing efforts.
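The classic age-incidence argument cited above can be illustrated briefly. In the Armitage and Doll multistage model, incidence rises roughly as a power of age, I(t) ∝ t^(k−1), so the slope of log incidence against log age estimates k − 1, where k is the number of sequential rate-limiting steps. The sketch below fits that slope on synthetic data generated under an assumed k = 6; it illustrates the classic reasoning only, not the mutation-rate comparison the authors introduce, and all values are made up.

```python
import math


def fit_loglog_slope(ages, incidence):
    """Ordinary least-squares slope of log(incidence) against log(age)."""
    xs = [math.log(a) for a in ages]
    ys = [math.log(i) for i in incidence]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic incidence curve (assumption): I(t) = c * t**(k - 1) with k = 6.
assumed_stages = 6
ages = [40, 50, 60, 70, 80]
incidence = [1e-12 * age ** (assumed_stages - 1) for age in ages]

slope = fit_loglog_slope(ages, incidence)
estimated_stages = slope + 1  # recovers the assumed k on noise-free data
```

Real incidence data are noisy and flatten at older ages, so a fit like this gives only a rough estimate.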


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Rui Luo ◽  
Weelic Chong ◽  
Qiang Wei ◽  
Zhenchao Zhang ◽  
Chun Wang ◽  
...  

Abstract Inflammatory breast cancer (IBC) is the most aggressive form of breast cancer. Although it is a rare subtype, IBC is responsible for roughly 10% of breast cancer deaths. In order to obtain a better understanding of the genomic landscape and intratumor heterogeneity (ITH) in IBC, we conducted whole-exome sequencing of 16 tissue samples (12 tumor and four normal samples) from six hormone-receptor-positive IBC patients, analyzed somatic mutations and copy number aberrations, and inferred subclonal structures to demonstrate ITH. Our results showed that KMT2C was the most frequently mutated gene (42%, 5/12 samples), followed by HECTD1, LAMA3, FLG2, UGT2B4, STK33, BRCA2, ACP4, PIK3CA, and DNAH8 (all nine genes tied at 33% frequency, 4/12 samples). Our data indicated that PTEN and FBXW7 mutations may be considered driver gene mutations for IBC. We identified various subclonal structures and different levels of ITH between IBC patients, and mutations in the genes EIF4G3, IL12RB2, and PDE4B may potentially generate ITH in IBC.
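The per-gene frequencies quoted above (5/12 ≈ 42%, 4/12 ≈ 33%) are counts of tumor samples carrying at least one mutation in a gene, divided by the number of tumor samples. A minimal sketch of that tally follows. The sample identifiers and the assignment of genes to samples are hypothetical; only the counts 5 and 4 out of 12 come from the abstract.

```python
from collections import Counter


def mutation_frequency(sample_to_genes):
    """Map each gene to (number of samples mutated, percent of samples, rounded)."""
    counts = Counter()
    for genes in sample_to_genes.values():
        counts.update(set(genes))  # count a gene at most once per sample
    n_samples = len(sample_to_genes)
    return {gene: (c, round(100 * c / n_samples)) for gene, c in counts.items()}


# Hypothetical assignment of mutated genes to 12 tumor samples (T1..T12).
samples = {f"T{i}": [] for i in range(1, 13)}
for sample_id in ["T1", "T2", "T3", "T4", "T5"]:
    samples[sample_id].append("KMT2C")
for sample_id in ["T2", "T6", "T7", "T8"]:
    samples[sample_id].append("PIK3CA")

freq = mutation_frequency(samples)
```

Counting per sample rather than per mutation keeps a gene with several mutations in one tumor from inflating its frequency.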


2020 ◽  
Author(s):  
Vu VH Pham ◽  
Lin Liu ◽  
Cameron P Bracken ◽  
Thin Nguyen ◽  
Gregory J Goodall ◽  
...  

Abstract Motivation: Unravelling cancer driver genes is important in cancer research. Although computational methods have been developed to identify cancer drivers, most of them detect cancer drivers at population level. However, two patients who have the same cancer type and receive the same treatment may have different outcomes because each patient has a different genome and their disease might be driven by different driver genes. Therefore new methods are being developed for discovering cancer drivers at individual level, but existing personalised methods only focus on coding drivers while microRNAs (miRNAs) have been shown to drive cancer progression as well. Thus, novel methods are required to discover both coding and miRNA cancer drivers at individual level. Results: We propose the novel method, pDriver, to discover personalised cancer drivers. pDriver includes two stages: (1) Constructing gene networks for each cancer patient and (2) Discovering cancer drivers for each patient based on the constructed gene networks. To demonstrate the effectiveness of pDriver, we have applied it to five TCGA cancer datasets and compared it with the state-of-the-art methods. The result indicates that pDriver is more effective than other methods. Furthermore, pDriver can also detect miRNA cancer drivers and most of them have been confirmed to be associated with cancer by literature. We further analyse the predicted personalised drivers for breast cancer patients and the result shows that they are significantly enriched in many GO processes and KEGG pathways involved in breast cancer. Availability and implementation: pDriver is available at https://github.com/pvvhoang/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Sayed Mohammad Ebrahim Sahraeian ◽  
Li Tai Fang ◽  
Marghoob Mohiyuddin ◽  
Huixiao Hong ◽  
Wenming Xiao

Abstract Accurate detection of somatic mutations is challenging but critical to the understanding of cancer formation, progression, and treatment. We recently proposed NeuSomatic, the first deep convolutional neural network based somatic mutation detection approach, and demonstrated performance advantages on in silico data. In this study, we used the first comprehensive and well-characterized somatic reference samples from the SEQC-II consortium to investigate best practices for utilizing a deep learning framework in cancer mutation detection. Using the high-confidence somatic mutations established for these reference samples by the consortium, we identified strategies for building robust models on multiple datasets derived from samples representing real scenarios. The proposed strategies achieved high robustness across multiple sequencing technologies such as WGS, WES, and AmpliSeq target sequencing for fresh and FFPE DNA input, varying tumor/normal purities, and different coverages (ranging from 10× to 2000×). NeuSomatic significantly outperformed conventional detection approaches in general, as well as in challenging situations such as low coverage, low mutation frequency, DNA damage, and difficult genomic regions.


2012 ◽  
Vol 2012 ◽  
pp. 1-7 ◽  
Author(s):  
Parvin F. Peddi ◽  
Matthew J. Ellis ◽  
Cynthia Ma

Triple negative breast cancer is an aggressive form of breast cancer with limited treatment options and is without proven targeted therapy. Understanding the molecular basis of triple negative breast cancer is crucial for effective new drug development. Recent genomewide gene expression and DNA sequencing studies indicate that this cancer type is composed of a molecularly heterogeneous group of diseases that carry multiple somatic mutations and genomic structural changes. These findings have implications for therapeutic target identification and the design of future clinical trials for this aggressive group of breast cancer.

