Pan-cancer detection of driver genes at the single-patient resolution

Abstract Background: Identifying the complete repertoire of genes that drive cancer in individual patients is crucial for precision oncology. Most established methods identify driver genes that are recurrently altered across patient cohorts. However, mapping these genes back to patients leaves a sizeable fraction with few or no drivers, hindering our understanding of cancer mechanisms and limiting the choice of therapeutic interventions. Results: We present sysSVM2, a machine learning software that integrates cancer genetic alterations with gene systems-level properties to predict drivers in individual patients. Using simulated pan-cancer data, we optimise sysSVM2 for application to any cancer type. We benchmark its performance on real cancer data and validate its applicability to a rare cancer type with few known driver genes. We show that drivers predicted by sysSVM2 have a low false-positive rate, are stable and disrupt well-known cancer-related pathways. Conclusions: sysSVM2 can be used to identify driver alterations in patients lacking sufficient canonical drivers or belonging to rare cancer types for which assembling a large enough cohort is challenging, furthering the goals of precision oncology. As resources for the community, we provide the code to implement sysSVM2 and the pre-trained models in all TCGA cancer types (https://github.com/ciccalab/sysSVM2).

Genome Medicine ◽ 10.1186/s13073-021-00830-0 ◽ 2021 ◽ Vol 13 (1)

Author(s): Joel Nulsen ◽ Hrvoje Misetic ◽ Christopher Yau ◽ Francesca D. Ciccarelli

Keyword(s): False Positive Rate ◽ Genetic Alterations ◽ Therapeutic Interventions ◽ Precision Oncology ◽ Cancer Type ◽ Rare Cancer ◽ Driver Genes ◽ Cancer Data ◽ Cancer Types ◽ Pan Cancer

Identification of pan-cancer Ras pathway activation with deep learning

Briefings in Bioinformatics ◽ 10.1093/bib/bbaa258 ◽ 2020

Author(s): Xiangtao Li ◽ Shaochuan Li ◽ Yunhe Wang ◽ Shixiong Zhang ◽ Ka-Chun Wong

Keyword(s): Deep Learning ◽ Superior Performance ◽ Recent Attempt ◽ Precision Oncology ◽ Pathway Activity ◽ Ras Pathway ◽ Cancer Data ◽ Pathway Activation ◽ Cancer Types ◽ Pan Cancer

Abstract The identification of hidden responders is often an essential challenge in precision oncology. A recent machine-learning approach was proposed for classifying aberrant pathway activity from multiomic cancer data. However, it has several critical limitations, including high dimensionality, data sparsity and limited model performance. Given the central importance and broad impact of precision oncology, we propose nature-inspired deep Ras activation pan-cancer (NatDRAP), a deep neural network (DNN) model, to address those limitations in identifying hidden responders. The model integrates bulk RNA sequencing, copy number and mutation data from PanCanAtlas to detect pan-cancer Ras pathway activation. In NatDRAP, the nature-inspired artificial bee colony algorithm is combined with different gradient-based optimizers in one framework to optimize DNNs collaboratively. Multiple experiments were conducted on 33 cancer types across PanCanAtlas. The experimental results demonstrate that NatDRAP outperforms other benchmark methods and is robust in detecting aberrant Ras pathway activity across different cancer types. In addition, gene ontology enrichment and pathological analysis are conducted to reveal novel insights into the identification and characterization of aberrant Ras pathway activity. NatDRAP is written in Python and available at https://github.com/lixt314/NatDRAP1.

From Genetic Alterations to Tumor Microenvironment: The Ariadne’s String in Pancreatic Cancer

Cells ◽ 10.3390/cells9020309 ◽ 2020 ◽ Vol 9 (2) ◽ pp. 309 ◽ Cited By ~ 3

Author(s): Chiara Bazzichetto ◽ Fabiana Conciatori ◽ Claudio Luchini ◽ Francesca Simionato ◽ Raffaela Santoro ◽ ...

Keyword(s): Pancreatic Cancer ◽ Tumor Microenvironment ◽ Current Knowledge ◽ Myeloid Cells ◽ Genetic Alterations ◽ Molecular Characteristics ◽ Precision Oncology ◽ Cancer Type ◽ Tumor Type ◽ Driver Genes

The threatening notoriety of pancreatic cancer mainly arises from its negligible rate of early diagnosis, highly aggressive progression, the failure of conventional therapeutic options and the consequent very poor prognosis. The most important driver genes of pancreatic cancer are the oncogene KRAS and the tumor suppressors TP53, CDKN2A, and SMAD4. Despite the small number of drivers, several signaling pathways are involved in the oncogenesis of this cancer type, some of them offering promising targets for precision oncology. Pancreatic cancer is recognized as a cancer with an immunosuppressive phenotype: it is characterized by a fibrotic-desmoplastic stroma, in which there is intensive cross-talk between several cellular (e.g., fibroblasts, myeloid cells, lymphocytes, and endothelial cells) and acellular (collagen, fibronectin, and soluble factors) components. In this review, we aim to describe the current knowledge of the genetic/biological landscape of pancreatic cancer and the composition of its tumor microenvironment, in order to better navigate the intrinsic labyrinth of this complex tumor type. Indeed, disentangling the genetic and molecular characteristics of cancer cells and the environment in which they evolve may represent the crucial step towards more effective therapeutic strategies.

The Cancer Genomic Atlas – “TO CONQUER CANCER”

International Journal of Molecular and Immuno Oncology ◽ 10.25259/ijmio_28_2020 ◽ 2020 ◽ Vol 0 ◽ pp. 1-6

Author(s): Sai Sri Kavya Kadali ◽ Rachna Gowlikar ◽ Syeda Nooreen Fatima

Keyword(s): Genetic Basis ◽ Data Repository ◽ Rare Cancer ◽ Cancer Data ◽ Data Collection Process ◽ Genomics And Proteomics ◽ Prevention Studies ◽ Cancer Types ◽ Pan Cancer

The Cancer Genome Atlas (TCGA) is a publicly accessible cancer data repository and tool that allows us to understand the molecular basis of cancer through the application of genomics and proteomics. So far, researchers have been able to characterize 33 cancer types, including 10 rare cancer types. A key feature of TCGA is that its data collection process is publicly accessible, supporting a better understanding of the molecular and genetic basis of cancer, its mechanism of action and its prevention. Studies on different cancer types, along with comprehensive pan-cancer analysis, have expanded the understanding and purpose of TCGA. Ever since its conceptualization, its high-throughput approach has provided a platform for the identification of genes and pathways involved in cancers and for the accurate classification of cancers.

MEXCOWalk: Mutual Exclusion and Coverage Based Random Walk to Identify Cancer Modules

10.1101/547653 ◽ 2019

Author(s): Rafsan Ahmed ◽ Ilyes Baali ◽ Cesim Erten ◽ Evis Hoxha ◽ Hilal Kazan

Keyword(s): Random Walk ◽ Mutual Exclusion ◽ Risk Scores ◽ Cancer Genes ◽ Multiple Cancer ◽ Driver Genes ◽ Cancer Driver ◽ Cancer Data ◽ Cancer Types ◽ Pan Cancer

Abstract Motivation: Genomic analyses from large cancer cohorts have revealed the mutational heterogeneity problem, which hinders the identification of driver genes based only on mutation profiles. One way to tackle this problem is to incorporate the fact that genes act together in functional modules. The connectivity knowledge present in existing protein-protein interaction networks, together with the mutation frequencies of genes and the mutual exclusivity of cancer mutations, can be utilized to increase the accuracy of identifying cancer driver modules. Results: We present a novel edge-weighted random walk-based approach that incorporates connectivity information in the form of protein-protein interactions, mutual exclusion, and coverage to identify cancer driver modules. MEXCOWalk outperforms several state-of-the-art computational methods on TCGA pan-cancer data in terms of recovering known cancer genes and providing modules that are capable of classifying normal and tumor samples and that are enriched for mutations in specific cancer types. Furthermore, the risk scores determined with output modules can stratify patients into low-risk and high-risk groups in multiple cancer types. MEXCOWalk identifies modules containing both well-known cancer genes and putative cancer genes that are rarely mutated in the pan-cancer data. The data, the source code, and useful scripts are available at: https://github.com/abu-compbio/[email protected]
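
To make the idea of an edge-weighted random walk concrete, here is a minimal sketch of a generic random walk with restart on a small weighted graph. It is not the MEXCOWalk implementation: the node names, the edge weights (hypothetical stand-ins for a combined mutual-exclusion and coverage score), the seed scores (hypothetical stand-ins for mutation frequencies) and the restart probability are all assumptions chosen for illustration.

```python
def random_walk_with_restart(edges, seed_scores, restart=0.4, tol=1e-9, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 on a weighted undirected graph.

    edges: list of (node_u, node_v, weight) tuples with positive weights.
    seed_scores: dict mapping node -> non-negative starting score.
    Returns a dict mapping node -> stationary visiting probability.
    """
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    weight = {n: {} for n in nodes}
    for u, v, w in edges:
        weight[u][v] = w
        weight[v][u] = w
    # Total edge weight leaving each node, used to normalise transitions.
    out_total = {n: sum(weight[n].values()) for n in nodes}

    total_seed = sum(seed_scores.get(n, 0.0) for n in nodes)
    p0 = {n: seed_scores.get(n, 0.0) / total_seed for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            for v, w in weight[u].items():
                # Move from u to v in proportion to the edge weight.
                nxt[v] += (1.0 - restart) * p[u] * w / out_total[u]
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Toy path graph A - B - C - D with hypothetical weights; the walk starts at A.
toy_edges = [("A", "B", 1.0), ("B", "C", 0.5), ("C", "D", 1.0)]
scores = random_walk_with_restart(toy_edges, {"A": 1.0})
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, round(score, 4))
```

In this toy setting the scores decay with distance from the seed node, and a lighter edge (B to C) passes on less probability mass than a heavier one. A module-finding method would then group high-scoring, well-connected genes, a step this sketch leaves out.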

The landscape of genomic alterations across childhood cancers

Nature ◽ 10.1038/nature25480 ◽ 2018 ◽ Vol 555 (7696) ◽ pp. 321-327 ◽ Cited By ~ 363

Author(s): Susanne N. Gröbner ◽ Barbara C. Worst ◽ Joachim Weischenfeldt ◽ Ivo Buchhalter ◽ ...

Keyword(s): Cancer Biology ◽ Copy Number Variant ◽ Genetic Alterations ◽ Childhood Cancers ◽ Mutational Signatures ◽ Driver Genes ◽ Mutation Status ◽ Cancer Driver ◽ Cancer Types ◽ Pan Cancer

Abstract Pan-cancer analyses that examine commonalities and differences among various cancer types have emerged as a powerful way to obtain novel insights into cancer biology. Here we present a comprehensive analysis of genetic alterations in a pan-cancer cohort including 961 tumours from children, adolescents, and young adults, comprising 24 distinct molecular types of cancer. Using a standardized workflow, we identified marked differences in terms of mutation frequency and significantly mutated genes in comparison to previously analysed adult cancers. Genetic alterations in 149 putative cancer driver genes separate the tumours into two classes: small mutation and structural/copy-number variant (correlating with germline variants). Structural variants, hyperdiploidy, and chromothripsis are linked to TP53 mutation status and mutational signatures. Our data suggest that 7–8% of the children in this cohort carry an unambiguous predisposing germline variant and that nearly 50% of paediatric neoplasms harbour a potentially druggable event, which is highly relevant for the design of future clinical trials.

A novel unsupervised learning model for detecting driver genes from pan-cancer data through matrix tri-factorization framework with pairwise similarities constraints

Neurocomputing ◽ 10.1016/j.neucom.2018.03.026 ◽ 2018 ◽ Vol 296 ◽ pp. 64-73 ◽ Cited By ~ 7

Author(s): Jianing Xi ◽ Ao Li ◽ Minghui Wang

Keyword(s): Unsupervised Learning ◽ Learning Model ◽ Driver Genes ◽ Cancer Data ◽ Pan Cancer

A computational method for prioritizing targeted therapies in precision oncology: performance analysis in the SHIVA01 trial

npj Precision Oncology ◽ 10.1038/s41698-021-00191-2 ◽ 2021 ◽ Vol 5 (1)

Author(s): Istvan Petak ◽ Maud Kamal ◽ Anna Dirner ◽ Ivan Bieche ◽ Robert Doczi ◽ ...

Keyword(s): Clinical Benefit ◽ Progressive Disease ◽ Genetic Alterations ◽ Outcome Data ◽ Computational Method ◽ Molecular Profile ◽ Precision Oncology ◽ Driver Genes ◽ Oncology Clinical Trial ◽ Molecularly Targeted Agents

Abstract Precision oncology is currently based on pairing molecularly targeted agents (MTA) to predefined single driver genes or biomarkers. Each tumor harbors a combination of a large number of potential genetic alterations of multiple driver genes in a complex system that limits the potential of this approach. We have developed an artificial intelligence (AI)-assisted computational method, the digital drug-assignment (DDA) system, to prioritize potential MTAs for each cancer patient based on the complex individual molecular profile of their tumor. We analyzed the clinical benefit of the DDA system on the molecular and clinical outcome data of patients treated in the SHIVA01 precision oncology clinical trial with MTAs matched to individual genetic alterations or biomarkers of their tumor. We found that the DDA score assigned to MTAs was significantly higher in patients experiencing disease control than in patients with progressive disease (1523 versus 580, P = 0.037). The median PFS was also significantly longer in patients receiving MTAs with high (>1000) than with low (<0) DDA scores (3.95 versus 1.95 months, P = 0.044). Our results indicate that AI-based systems, like DDA, are promising new tools for oncologists to improve the clinical benefit of precision oncology.

Exploration of the MSI landscape in Chinese pan-cancer patient by Next-Generation Sequencing.

Journal of Clinical Oncology ◽ 10.1200/jco.2021.39.15_suppl.e14576 ◽ 2021 ◽ Vol 39 (15_suppl) ◽ pp. e14576-e14576

Author(s): Xinlu Liu ◽ Jiasheng Xu ◽ Jian Sun ◽ Deng Wei ◽ Xinsheng Zhang ◽ ...

Keyword(s): Colorectal Cancer ◽ Correlation Analysis ◽ Cancer Patients ◽ Cancer Type ◽ Multiple Tumor ◽ Treatment Plans ◽ Cancer Types ◽ Chinese Cancer Patients ◽ Colorectal Cancer Patients ◽ Pan Cancer

Background: Clinically, MSI has been used as an important molecular marker for the prognosis of colorectal cancer and other solid tumors and for the formulation of adjuvant treatment plans, and it has also been used to assist in screening for Lynch syndrome. However, there are currently few reports on the incidence of MSI-H in Chinese pan-cancer patients. This study describes the occurrence of MSI in a large multi-center pan-cancer cohort in China and explores the correlation between MSI and patients' TMB, age, PD-L1 expression and other indicators. Methods: The study included 8361 patients with 8 cancer types from multiple tumor centers. Immunohistochemistry was used to detect the expression of MMR proteins (MLH1, MSH2, MSH6 and PMS2) in patients with various cancer types to determine MSI status, and to detect PD-L1 expression. Using NGS, 831 genes were sequenced in the 8361 Chinese cancer patients and each patient's tumor mutation burden (TMB) was calculated. MSI status across the 8 cancer types was analyzed, together with its correlation with patient age, TMB and PD-L1 expression. Results: MSI patients accounted for 1.66% of the pan-cancer cohort. MSI-H patients accounted for the highest proportion in intestinal cancer, reaching 7.2%. Correlation analysis between MSI and TMB was performed for each cancer type. In each cancer type, MSI-H patients had a TMB greater than 10, and among colorectal cancer patients, 26.83% of MSI-H patients had a TMB greater than 100. There was no significant correlation between patient age and the risk of MSI (P > 0.05). Except in PAAD and LUAD, PD-L1 expression was higher in MSI-H patients than in MSS patients (P < 0.05). Correlation analysis between PD-L1 expression and TMB found that in colorectal cancer, higher PD-L1 expression was associated with higher TMB (P < 0.05). Conclusions: In this study, we explored the incidence of MSI-H in pan-cancer patients in China and found that TMB was greater than 10 in patients with MSI-H. Compared with MSS patients, MSI-H patients have higher PD-L1 expression, and in colorectal cancer, higher PD-L1 expression is associated with a higher TMB value.
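
The abstract reports TMB values without stating how they were derived. As a rough illustration, TMB from panel sequencing is commonly expressed as somatic mutations per megabase of sequenced territory. The sketch below assumes that convention; the mutation count and the panel size are hypothetical values, not figures from the study, and the cutoff of 10 is borrowed from the abstract on the assumption that it is in mutations per megabase.

```python
def tumor_mutation_burden(somatic_mutation_count, panel_size_mb):
    """Return TMB as somatic mutations per megabase of sequenced territory."""
    if panel_size_mb <= 0:
        raise ValueError("panel_size_mb must be positive")
    return somatic_mutation_count / panel_size_mb


def is_tmb_high(tmb, threshold=10.0):
    """Flag a sample whose TMB exceeds the chosen cutoff (assumed mutations/Mb)."""
    return tmb > threshold


# Hypothetical sample: 18 somatic mutations called over a 1.5 Mb panel.
tmb = tumor_mutation_burden(18, 1.5)
print(tmb, is_tmb_high(tmb))
```

Because the denominator is the panel footprint rather than the whole exome, values from different panels are only comparable after the panel size has been accounted for.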

Multi-omic data helps improve prediction of personalised tumor suppressors and oncogenes

10.1101/2022.01.13.476163 ◽ 2022

Author(s): Malvika Sudhakar ◽ Raghunathan Rengaswamy ◽ Karthik Raman

Keyword(s): Tumour Progression ◽ Suppressor Gene ◽ Tumour Suppressor Gene ◽ Driver Mutations ◽ Cancer Type ◽ Expression Data ◽ Multiple Cancer ◽ Driver Genes ◽ Cancer Types ◽ Omic Data

The progression of tumorigenesis starts with a few mutational and structural driver events in the cell. Various cohort-based computational tools exist to identify driver genes but require a large number of samples to produce reliable results. Many studies use different methods to distinguish driver mutations/genes from mutations that have no impact on tumour progression; however, a small fraction of patients show no mutational events in any known driver genes. Current unsupervised methods map somatic and expression data onto a network to identify the perturbation in the network. Our method is the first machine learning model to classify genes as tumour suppressor gene (TSG), oncogene (OG) or neutral, thus assigning the functional impact of the gene in the patient. In this study, we develop a multi-omic approach, PIVOT (Personalised Identification of driVer OGs and TSGs), to train on experimentally or computationally validated mutational and structural driver events. Given the lack of any gold standards for the identification of personalised driver genes, we label the data using four strategies and, based on classification metrics, show that gene-based labelling strategies perform best. We build different models using SNV, RNA, and multi-omic features, to be used depending on the data available. Our models trained on multi-omic data improved predictions compared to mutation and expression data, achieving an accuracy >0.99 for BRCA, LUAD and COAD datasets. We show that network and expression-based features contribute the most to PIVOT. Our predictions on BRCA, COAD and LUAD cancer types reveal commonly altered genes such as TP53 and PIK3CA, which are predicted drivers for multiple cancer types. Along with known driver genes, our models also identify new driver genes such as PRKCA, SOX9 and PSMD4. Our multi-omic model labels both CNV and mutations, with a more considerable contribution by CNV alterations. While predicting labels for genes mutated in multiple samples, we also label rare driver events occurring in as few as one sample. We also identify genes with dual roles within the same cancer type. Overall, PIVOT labels personalised driver genes as TSGs and OGs and also identifies rare driver genes. PIVOT is available at https://github.com/RamanLab/PIVOT.
