PersonaDrive: A Method for the Identification and Prioritization of Personalized Cancer Drivers

2021 ◽  
Author(s):  
Cesim Erten ◽  
Aissa Houdjedj ◽  
Hilal Kazan ◽  
Ahmed Amine Taleb Bahmed

Abstract Motivation A major challenge in cancer genomics is to distinguish the driver mutations that are causally linked to cancer from passenger mutations that do not contribute to cancer development. The majority of existing methods provide a single driver gene list for the entire cohort of patients. However, since mutation profiles of patients from the same cancer type show a high degree of heterogeneity, a better approach is to identify patient-specific drivers. Results We propose a novel method that integrates genomic data, biological pathways, and protein connectivity information for personalized identification of driver genes. The method is formulated on a personalized bipartite graph for each patient. Our approach provides a personalized ranking of the mutated genes of a patient based on the sum of weighted ‘pairwise pathway coverage’ scores across all the patients, where appropriate pairwise patient similarity scores are used as weights to normalize these coverage scores. We compare our method against three state-of-the-art patient-specific cancer gene prioritization methods. The comparisons are with respect to a novel evaluation method that takes into account the personalized nature of the problem. We show that our approach outperforms the existing alternatives for both the TCGA and the cell-line data. Additionally, we show that the KEGG/Reactome pathways enriched in our ranked genes and those that are enriched in cell lines’ reference sets overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods. Our findings can provide valuable information towards the development of personalized treatments and therapies. Availability All the code and necessary datasets are available at https://github.com/abu-compbio/[email protected] or [email protected]
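The ranking rule described in the abstract (a sum of pairwise pathway coverage scores, weighted by patient similarity) can be illustrated with a minimal sketch. The abstract does not define how coverage or similarity are computed, so both are passed in as precomputed, hypothetical lookup tables; the function name and data layout are assumptions for illustration, not the PersonaDrive implementation.

```python
from typing import Dict, List, Tuple


def rank_mutated_genes(
    patient: str,
    mutated: Dict[str, List[str]],
    similarity: Dict[Tuple[str, str], float],
    coverage: Dict[Tuple[str, str], float],
) -> List[Tuple[str, float]]:
    """Rank one patient's mutated genes by similarity-weighted coverage.

    mutated[p]              -> genes mutated in patient p
    similarity[(p, q)]      -> hypothetical similarity weight between patients
    coverage[(gene, q)]     -> hypothetical pairwise pathway coverage score of
                               `gene` with respect to patient q
    Missing table entries are treated as 0.
    """
    scores: Dict[str, float] = {}
    for gene in mutated[patient]:
        total = 0.0
        # Sum over all patients in the cohort, weighting each coverage
        # score by how similar that patient is to the one being ranked.
        for other in mutated:
            weight = similarity.get((patient, other), 0.0)
            total += weight * coverage.get((gene, other), 0.0)
        scores[gene] = total
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because the tables are supplied by the caller, the same skeleton works with any concrete definition of coverage and similarity.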

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Cesim Erten ◽  
Aissa Houdjedj ◽  
Hilal Kazan

Abstract Background Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes. Results We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet utilizes a measure based on betweenness centrality on patient-specific networks to identify the so-called outlier genes that correspond to dysregulated genes for each patient. Setting up the relationship between the mutated genes and the outliers through a bipartite graph, it employs a random-walk process on the graph, which provides the final prioritization of the mutated genes. We compare BetweenNet against state-of-the-art cancer gene prioritization methods on lung, breast, and pan-cancer datasets. Conclusions Our evaluations show that BetweenNet is better at recovering known cancer genes based on multiple reference databases. Additionally, we show that the GO terms and the reference pathways enriched in BetweenNet-ranked genes and those that are enriched in known cancer genes overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods.
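To make the final step concrete, the following sketch runs a random walk with restart on a small bipartite graph linking mutated genes to outlier genes. The abstract only says "a random-walk process", so the restart formulation, the `alpha` value, and the uniform edge weights are assumptions chosen for illustration; the gene and outlier names in any call are hypothetical.

```python
from typing import Dict, List, Set


def random_walk_with_restart(
    edges: Dict[str, List[str]],
    restart: Dict[str, float],
    alpha: float = 0.5,
    iterations: int = 100,
) -> Dict[str, float]:
    """Approximate the steady state of a random walk with restart.

    edges[mutated_gene] -> outlier nodes it is linked to (treated as undirected)
    restart             -> unnormalized restart weights over graph nodes
    alpha               -> probability of jumping back to the restart set
    """
    # Build an undirected adjacency structure from the bipartite edge list.
    adj: Dict[str, Set[str]] = {}
    for u, neighbours in edges.items():
        for v in neighbours:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)

    # Normalize the restart vector over nodes that are actually in the graph.
    total = sum(restart.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("restart weights must cover at least one graph node")
    r = {n: restart.get(n, 0.0) / total for n in nodes}

    p = dict(r)
    for _ in range(iterations):
        nxt = {n: alpha * r[n] for n in nodes}
        for u in nodes:
            # Spread the non-restarting mass evenly over the neighbours of u.
            share = (1.0 - alpha) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p
```

Mutated genes would then be prioritized by sorting them on their steady-state probability; a gene connected to more outliers tends to accumulate more probability mass.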


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Xiaobao Dong ◽  
Dandan Huang ◽  
Xianfu Yi ◽  
Shijie Zhang ◽  
Zhao Wang ◽  
...  

Abstract Mutation-specific effects of cancer driver genes influence drug responses and the success of clinical trials. We reasoned that these effects could unbalance the distribution of each mutation across different cancer types; as a result, the cancer preference can be used to distinguish the effects of the causal mutation. Here, we developed a network-based framework to systematically measure cancer diversity for each driver mutation. We found that half of the driver genes harbor cancer type-specific and pancancer mutations simultaneously, suggesting pervasive functional heterogeneity among mutations of even the same driver gene. We further demonstrated that the specificity of the mutations could influence patient drug responses. Moreover, we observed that diversity was generally increased in advanced tumors. Finally, we scanned potentially novel cancer driver genes based on the diversity spectrum. Diversity spectrum analysis provides a new approach to define driver mutations and optimize off-label clinical trials.


2020 ◽  
Author(s):  
Cesim Erten ◽  
Aissa Houdjedj ◽  
Hilal Kazan

Abstract Background Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes. Results: We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet utilizes a measure based on betweenness centrality on patient-specific networks to identify the so-called outlier genes that correspond to dysregulated genes for each patient. Setting up the relationship between the mutated genes and the outliers through a bipartite graph, it employs a random-walk process on the graph, which provides the final prioritization of the mutated genes. We compare BetweenNet against state-of-the-art cancer gene prioritization methods on lung, breast, and pan-cancer datasets. Conclusions: Our evaluations show that BetweenNet is better at recovering known cancer genes based on multiple reference databases. Additionally, we show that the GO terms and the reference pathways enriched in BetweenNet-ranked genes and those that are enriched in known cancer genes overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods.


2022 ◽  
Author(s):  
Malvika Sudhakar ◽  
Raghunathan Rengaswamy ◽  
Karthik Raman

The progression of tumorigenesis starts with a few mutational and structural driver events in the cell. Various cohort-based computational tools exist to identify driver genes but require a large number of samples to produce reliable results. Many studies use different methods to distinguish driver mutations/genes from mutations that have no impact on tumour progression; however, a small fraction of patients show no mutational events in any known driver genes. Current unsupervised methods map somatic and expression data onto a network to identify the perturbation in the network. Our method is the first machine learning model to classify genes as tumour suppressor gene (TSG), oncogene (OG) or neutral, thus assigning the functional impact of the gene in the patient. In this study, we develop a multi-omic approach, PIVOT (Personalised Identification of driVer OGs and TSGs), to train on experimentally or computationally validated mutational and structural driver events. Given the lack of any gold standards for the identification of personalised driver genes, we label the data using four strategies and, based on classification metrics, show gene-based labelling strategies perform best. We build different models using SNV, RNA, and multi-omic features to be used based on the data available. Our models trained on multi-omic data improved predictions compared to mutation and expression data, achieving an accuracy >0.99 for BRCA, LUAD and COAD datasets. We show network and expression-based features contribute the most to PIVOT. Our predictions on BRCA, COAD and LUAD cancer types reveal commonly altered genes such as TP53 and PIK3CA, which are predicted drivers for multiple cancer types. Along with known driver genes, our models also identify new driver genes such as PRKCA, SOX9 and PSMD4. Our multi-omic model labels both CNV and mutations with a more considerable contribution by CNV alterations.
While predicting labels for genes mutated in multiple samples, we also label rare driver events occurring in as few as one sample. We also identify genes with dual roles within the same cancer type. Overall, PIVOT labels personalised driver genes as TSGs and OGs and also identifies rare driver genes. PIVOT is available at https://github.com/RamanLab/PIVOT.


Author(s):  
Birgit Assmus ◽  
Sebastian Cremer ◽  
Klara Kirschbaum ◽  
David Culmann ◽  
Katharina Kiefer ◽  
...  

Abstract Aims Somatic mutations of the epigenetic regulators DNMT3A and TET2 causing clonal expansion of haematopoietic cells (clonal haematopoiesis; CH) were shown to be associated with poor prognosis in chronic ischaemic heart failure (CHF). The aim of our analysis was to define a threshold of variant allele frequency (VAF) for the prognostic significance of CH in CHF. Methods and results We analysed bone marrow and peripheral blood-derived cells from 419 patients with CHF by error-corrected amplicon sequencing. Cut-off VAFs were optimized by maximizing sensitivity plus specificity from a time-dependent receiver operating characteristic (ROC) curve analysis from censored data. 56.2% of patients were carriers of a DNMT3A- (N = 173) or a TET2- (N = 113) mutation with a VAF >0.5%, with 59 patients harbouring mutations in both genes. Survival ROC analyses revealed an optimized cut-off value of 0.73% for TET2- and 1.15% for DNMT3A-CH-driver mutations. Five-year mortality was 18% in patients without any detected DNMT3A- or TET2 mutation (VAF < 0.5%), 29% with only one DNMT3A- or TET2-CH-driver mutation above the respective cut-off level and 42% in patients harbouring both DNMT3A- and TET2-CH-driver mutations above the respective cut-off levels. In carriers of a DNMT3A mutation with VAF ≥ 1.15%, 5-year mortality was 31%, compared with 18% mortality in those with VAF < 1.15% (P = 0.048). Likewise, in patients with TET2 mutations, 5-year mortality was 32% with VAF ≥ 0.73%, compared with 19% mortality with VAF < 0.73% (P = 0.029). Conclusion The present study defines novel threshold levels for clone size caused by acquired somatic mutations in the CH-driver genes DNMT3A and TET2 that are associated with worse outcome in patients with CHF.
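The cut-off selection described above (maximizing sensitivity plus specificity) corresponds to maximizing Youden's J statistic. The sketch below shows the idea for a plain binary outcome; the study itself used a time-dependent ROC analysis on censored survival data, which this simplified version does not reproduce. The function name and any input values are hypothetical.

```python
from typing import List, Tuple


def youden_cutoff(values: List[float], events: List[bool]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when its value is >= cutoff.
    values -> marker measurements, e.g. variant allele frequencies
    events -> True if the outcome (e.g. death within follow-up) occurred
    """
    positives = sum(events)
    negatives = len(events) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one event and one non-event")

    best_cutoff, best_j = values[0], float("-inf")
    # Every observed value is a candidate threshold.
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, e in zip(values, events) if e and v >= cutoff)
        true_neg = sum(1 for v, e in zip(values, events) if not e and v < cutoff)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # strict: keeps the lowest cutoff among ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

With censored follow-up, sensitivity and specificity would have to be estimated at a fixed time horizon rather than counted directly as done here.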


Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 1952-1952 ◽  
Author(s):  
Dan A. Landau ◽  
Chip Stewart ◽  
Johannes G. Reiter ◽  
Michael Lawrence ◽  
Carrie Sougnez ◽  
...  

Abstract Unbiased high-throughput massively parallel sequencing methods have transformed the process of discovery of novel putative driver gene mutations in cancer. In chronic lymphocytic leukemia (CLL), these methods have yielded several unexpected findings, including the driver genes SF3B1, NOTCH1 and POT1. Recent analysis, utilizing down-sampling of existing datasets, has shown that the discovery process of putative drivers is far from complete across cancer. In CLL, while driver gene mutations affecting >10% of patients were efficiently discovered with previously published CLL cohorts of up to 160 samples subjected to whole exome sequencing (WES), this sample size has only 0.78 power to detect drivers affecting 5% of patients, and only 0.12 power for drivers affecting 2% of patients. These calculations emphasize the need to apply unbiased WES to larger patient cohorts. To this end, we performed a combined analysis of CLL WES data joining together our previously published cohort of 159 CLLs with data from 103 CLLs collected by the International Cancer Genome Consortium (ICGC). The raw sequencing reads from these 262 primary tumor samples (102 CLL with unmutated IGHV, 147 with mutated IGHV, 13 with unknown IGHV status) were processed together and aligned to the hg19 reference genome. Somatic single nucleotide variations (sSNVs) and indels were detected using MuTect. Subsequently, inference of recurrently mutated genes was performed using the MutSig algorithm. This method combined several characteristics such as the overall mutation rate per sample, the gene specific background mutation rate, non-synonymous/synonymous ratio and mutation clustering to detect genes that are affected by mutations more than expected by chance. This analysis identified 40 recurrently mutated genes in this cohort. This included 22 of 25 previously identified recurrently mutated genes in CLL. In addition, 18 novel candidate CLL drivers were identified, mostly affecting 1-2% of patients. 
The novel candidates included two histone proteins HIST1H1D and HIST1H1C, in addition to the previously identified HIST1H1E. Another was IKZF3, affected by a recurrent sSNV resulting in a p.L162R change in its DNA binding domain, in close proximity to a region recently identified as critical for lenalidomide resistance in multiple myeloma (MM). An additional recurrently mutated gene was nuclear RNA export factor 1 (NXF1), which along with previously known recurrently mutated genes (SF3B1, XPO1, DDX3X), highlights the importance of RNA processing to CLL biology. Finally, this search for putative CLL driver genes also identified ASXL1 and TRAF3, already characterized as drivers in acute myeloid leukemia and MM, respectively. Of the 59 of 262 samples for which RNA-seq data were available, 76% of the identified driver mutations were detected and thereby validated. Validation using RNAseq detection of driver mutations and targeted sequencing within the entire cohort are ongoing. The larger size of our cohort enabled the separate application of the somatic mutation discovery process to samples with mutated or unmutated IGHV. Among the 147 samples with mutated IGHV, only 5 driver genes (TP53, SF3B1, MYD88, CHD2, RANBP2) retained significance. In contrast, analysis of the 102 IGHV unmutated samples revealed a distinct and more diverse pattern of recurrently mutated genes (lacking MYD88 and CHD2, and including NOTCH1, RPS15, POT1, NRAS, EGR2, BRAF, MED12, XPO1, BCOR, IKZF3, MAP2K1, FBXW7 and KRAS). This extended cohort also allowed for better resolution of the clinical impact of those genetic variants with greater than 4% prevalence in the cohort. For example, samples with POT1 mutations were found to be associated with shorter time from sample to therapy compared with those with wild-type POT1 (P = 0.02).
Our study demonstrates that with larger cohort size, we can effectively detect putative driver genes with lower prevalence, but which may nonetheless have important biological and clinical impact. Moreover, our interrogation shows that subset analysis can reveal distinct driver patterns in different disease subsets. In particular, the marked clinical difference between CLLs with mutated and unmutated IGHV may reflect the higher likelihood of the latter group to harbor a broader spectrum of driver mutations with a more complex pattern of co-occurrence. Disclosures Brown: Sanofi, Onyx, Vertex, Novartis, Boehringer, GSK, Roche/Genentech, Emergent, Morphosys, Celgene, Janssen, Pharmacyclics, Gilead: Consultancy.
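The dependence of driver discovery on cohort size can be illustrated with a deliberately simplified binomial calculation: the probability that at least `k` of `n` patients carry a mutation in a gene of a given prevalence. This is not the power analysis quoted in the abstract, which also accounts for background mutation rates and significance testing, so it will not reproduce the 0.78 and 0.12 figures; the threshold `k` is a hypothetical parameter.

```python
from math import comb


def prob_at_least(n: int, prevalence: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, prevalence).

    n          -> number of patients in the cohort
    prevalence -> fraction of patients carrying a mutation in the gene
    k          -> hypothetical minimum number of carriers needed to notice it
    """
    # Complement of the probability of seeing fewer than k carriers.
    below = sum(
        comb(n, i) * prevalence**i * (1.0 - prevalence) ** (n - i)
        for i in range(k)
    )
    return 1.0 - below
```

Comparing `prob_at_least(160, 0.02, 3)` with `prob_at_least(262, 0.02, 3)` shows the qualitative effect the abstract relies on: the larger cohort is more likely to contain enough carriers of a rare driver.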


2018 ◽  
Author(s):  
Lin Jiang ◽  
Jingjing Zheng ◽  
Johnny Sheung Him Kwan ◽  
Sheng Dai ◽  
Cong Li ◽  
...  

Abstract Genomic identification of driver mutations and genes in cancer cells is critical for precision medicine. Due to difficulty in modeling distribution of background mutations, existing statistical methods are often underpowered to discriminate driver genes from passenger genes. Here we propose a novel statistical approach, weighted iterative zero-truncated negative-binomial regression (WITER), to detect cancer-driver genes showing an excess of somatic mutations. By solving the problem of inaccurately modeling background mutations, this approach works even in small or moderate samples. Compared to alternative methods, it detected more significant and cancer-consensus genes in all tested cancers. Applying this approach, we estimated 178 driver genes in 26 different cancer types. In silico validation confirmed 90.5% of predicted genes as likely known drivers and 7 genes unique for individual cancers as likely new drivers. The technical advances of WITER enable the detection of driver genes in TCGA datasets as small as 30 subjects, rescuing more genes missed by alternative tools.
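For readers unfamiliar with the distribution named in the method, the sketch below evaluates a zero-truncated negative-binomial probability mass function, i.e. a negative binomial conditioned on the count being at least one. It covers only the distribution, not the weighting, iteration, or regression on covariates that WITER performs; the mean/size parameterization and function names are assumptions for illustration.

```python
from math import exp, lgamma, log


def nb_pmf(y: int, mu: float, size: float) -> float:
    """Negative-binomial pmf with mean `mu` and dispersion parameter `size`."""
    log_p = (
        lgamma(y + size) - lgamma(size) - lgamma(y + 1)
        + size * log(size / (size + mu))
        + y * log(mu / (size + mu))
    )
    return exp(log_p)


def zero_truncated_nb_pmf(y: int, mu: float, size: float) -> float:
    """P(Y = y | Y > 0): the pmf renormalized after removing the zero count."""
    if y < 1:
        return 0.0
    return nb_pmf(y, mu, size) / (1.0 - nb_pmf(0, mu, size))
```

Truncation at zero matters when genes with no observed mutations are excluded from the fit, since the untruncated distribution would otherwise assign probability to counts that can never appear in the data.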


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0242780
Author(s):  
Houriiyah Tegally ◽  
Kevin H. Kensler ◽  
Zahra Mungloo-Dilmohamud ◽  
Anisah W. Ghoorah ◽  
Timothy R. Rebbeck ◽  
...  

As the genomic profile across cancers varies from person to person, patient prognosis and treatment may differ based on the mutational signature of each tumour. Thus, it is critical to understand genomic drivers of cancer and identify potential mutational commonalities across tumors originating at diverse anatomical sites. Large-scale cancer genomics initiatives, such as TCGA, ICGC and GENIE have enabled the analysis of thousands of tumour genomes. Our goal was to identify new cancer-causing mutations that may be common across tumour sites using mutational and gene expression profiles. Genomic and transcriptomic data from breast, ovarian, and prostate cancers were aggregated and analysed using differential gene expression methods to identify the effect of specific mutations on the expression of multiple genes. Mutated genes associated with the most differentially expressed genes were considered to be novel candidates for driver mutations, and were validated through literature mining, pathway analysis and clinical data investigation. Our driver selection method successfully identified 116 probable novel cancer-causing genes, with 4 discovered in patients having no alterations in any known driver genes: MXRA5, OBSCN, RYR1, and TG. The candidate genes previously not officially classified as cancer-causing showed enrichment in cancer pathways and in cancer diseases. They also matched expectations pertaining to properties of cancer genes, for instance, showing larger gene and protein lengths, and having mutation patterns suggesting oncogenic or tumor suppressor properties. Our approach allows for the identification of novel putative driver genes that are common across cancer sites using an unbiased approach without any a priori knowledge on pathways or gene interactions and is therefore an agnostic approach to the identification of putative common driver genes acting at multiple cancer sites.


2019 ◽  
Author(s):  
Gal Dinstag ◽  
Ron Shamir

Abstract Motivation Evolution of cancer is driven by a few somatic mutations that disrupt cellular processes, causing abnormal proliferation and tumor development, while most somatic mutations have no impact on progression. Distinguishing those mutated genes that drive tumorigenesis in a patient is a primary goal in cancer therapy: knowledge of these genes and the pathways on which they operate can illuminate disease mechanisms and indicate potential therapies and drug targets. Current research focuses mainly on cohort-level driver gene identification, but patient-specific driver gene identification remains a challenge. Methods We developed a new algorithm for patient-specific ranking of driver genes. The algorithm, called PRODIGY, analyzes the expression and mutation profiles of the patient along with data on known pathways and protein-protein interactions. Prodigy quantifies the impact of each mutated gene on every deregulated pathway using the prize collecting Steiner tree model. Mutated genes are ranked by their aggregated impact on all deregulated pathways. Results In testing on five TCGA cancer cohorts spanning >2500 patients and comparison to validated driver genes, Prodigy outperformed extant methods and ranking based on network centrality measures. Our results pinpoint the pleiotropic effect of driver genes and show that Prodigy is capable of identifying even very rare drivers. Hence, Prodigy takes a step further towards personalized medicine and treatment. Availability The Prodigy R package is available at: https://github.com/Shamir-Lab/PRODIGY. Supplementary information Supplementary data are available at Bioinformatics online.


2016 ◽  
Vol 14 (06) ◽  
pp. 1650031 ◽  
Author(s):  
Ana B. Pavel ◽  
Cristian I. Vasile

Cancer is a complex and heterogeneous genetic disease. Different mutations and dysregulated molecular mechanisms alter the pathways that lead to cell proliferation. In this paper, we explore a method which classifies genes into oncogenes (ONGs) and tumor suppressors. We optimize this method to identify specific ONGs and tumor suppressors for breast cancer, lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC) and colon adenocarcinoma (COAD), using data from The Cancer Genome Atlas (TCGA). A set of genes was previously classified as ONGs and tumor suppressors across multiple cancer types (Science 2013). Each gene was assigned an ONG score and a tumor suppressor score based on the frequency of its driver mutations across all variants from the Catalogue of Somatic Mutations in Cancer (COSMIC). We evaluate and optimize this approach within different cancer types from TCGA. We are able to determine known driver genes for each of the four cancer types. After establishing the baseline parameters for each cancer type, we identify new driver genes for each cancer type, and the molecular pathways that are highly affected by them. Our methodology is general and can be applied to different cancer subtypes to identify specific driver genes and improve personalized therapy.
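A frequency-based scoring scheme of this kind can be sketched as follows. The abstract does not give the exact score definitions, so this example assumes the ONG score is the fraction of a gene's recorded mutations that are recurrent missense changes and the tumor suppressor score is the fraction that are inactivating, with a hypothetical 0.2 threshold per cancer type; the function name, the tie-breaking rule, and all inputs are illustrative assumptions.

```python
from typing import Tuple


def classify_gene(
    recurrent_missense: int,
    inactivating: int,
    total: int,
    threshold: float = 0.2,
) -> Tuple[float, float, str]:
    """Return (ong_score, tsg_score, label) for one gene.

    recurrent_missense -> count of missense mutations at recurrent positions
    inactivating       -> count of truncating or otherwise inactivating mutations
    total              -> all recorded mutations for the gene
    threshold          -> assumed minimum score for a call (tunable per cancer type)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    ong_score = recurrent_missense / total
    tsg_score = inactivating / total

    # Assumed rule: take the larger score if it clears the threshold.
    if max(ong_score, tsg_score) <= threshold:
        label = "unclassified"
    elif ong_score >= tsg_score:
        label = "oncogene"
    else:
        label = "tumor suppressor"
    return ong_score, tsg_score, label
```

Exposing `threshold` as a parameter mirrors the abstract's point that baseline parameters are established separately for each cancer type.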

