Integrative pathway enrichment analysis of multivariate omics data

Mapping Intimacies ◽

10.1101/399113 ◽

2018 ◽

Cited By ~ 1

Author(s):

Marta Paczkowska ◽

Jonathan Barenboim ◽

Nardnisa Sintupisut ◽

Natalie C. Fox ◽

Helen Zhu ◽

...

Keyword(s):

Enrichment Analysis ◽

Data Interpretation ◽

Pathway Enrichment Analysis ◽

Driver Genes ◽

Long Tail ◽

Cancer Subtypes ◽

Cancer Driver ◽

Cancer Genomes ◽

Health And Disease ◽

Statistical Data Fusion

Abstract. Multi-omics datasets quantify complementary aspects of molecular biology and thus pose challenges to data interpretation and hypothesis generation. ActivePathways is an integrative method that discovers significantly enriched pathways across multiple omics datasets using a statistical data fusion approach, rationalizes contributing evidence and highlights associated genes. We demonstrate its utility by analyzing coding and non-coding mutations from 2,583 whole cancer genomes, revealing frequently mutated hallmark pathways and a long tail of known and putative cancer driver genes. We also studied prognostic molecular pathways in breast cancer subtypes by integrating genomic and transcriptomic features of tumors and tumor-adjacent cells and found significant associations with immune response processes and anti-apoptotic signaling pathways. ActivePathways is a versatile method that improves systems-level understanding of cellular organization in health and disease through integration of multiple molecular datasets and pathway annotations.
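
The idea of fusing per-dataset evidence into one gene-level significance value can be illustrated with a small sketch. The abstract does not name the fusion procedure, so Fisher's combined probability test is used here purely as an illustrative stand-in; it assumes the p-values are independent. The function name and the example p-values are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values for one gene with Fisher's method."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(ln p); we keep X / 2.
    half_x = -sum(math.log(p) for p in pvalues)
    # Under the null, X follows a chi-square distribution with 2k degrees
    # of freedom. For even degrees of freedom the upper tail has the
    # closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical per-dataset p-values for a single gene (illustration only).
evidence = {"coding_mutations": 0.01, "noncoding_mutations": 0.01}
print(fisher_combined_pvalue(list(evidence.values())))
```

A single p-value is returned unchanged, which is a quick sanity check on the closed-form tail. Correlated datasets would need a method that accounts for dependence between the p-values.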

driveR: a novel method for prioritizing cancer driver genes using somatic genomics data

BMC Bioinformatics ◽

10.1186/s12859-021-04203-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Ege Ülgen ◽

O. Uğur Sezerman

Keyword(s):

Biological Knowledge ◽

Driver Gene ◽

Driver Genes ◽

Cancer Driver ◽

Prior Biological Knowledge ◽

Wilcoxon Rank Sum Test ◽

Cancer Genomes ◽

Novel Method ◽

Cancer Driver Genes ◽

Batch Analysis

Abstract. Background: Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomics data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomics data, called driveR. Results: Combining genomics information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, achieving a median AUC of 0.684 (range 0.651–0.861) on the 28 batch analysis test datasets, and a median AUC of 0.773 (range 0–1) on the 5157 personalized analysis test samples. Moreover, it outperforms existing approaches, achieving a significantly higher median AUC than all of MutSigCV (Wilcoxon rank-sum test p < 0.001), DriverNet (p < 0.001), OncodriveFML (p < 0.001) and MutPanning (p < 0.001) on batch analysis test datasets, and a significantly higher median AUC than DawnRank (p < 0.001) and PRODIGY (p < 0.001) on personalized analysis datasets. Conclusions: This study demonstrates that the proposed method is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes in personalized or batch analyses. driveR is available on CRAN: https://cran.r-project.org/package=driveR.
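
The AUC values reported above can be read as the probability that a randomly chosen known driver gene is scored higher than a randomly chosen non-driver. A minimal sketch of that computation follows. It is not driveR's code, and the function name, scores and labels are hypothetical. The pair count it normalizes is the Mann-Whitney U statistic, which is equivalent to the Wilcoxon rank-sum statistic.

```python
def roc_auc(scores, is_driver):
    """AUC as the fraction of (driver, non-driver) pairs ranked correctly."""
    pos = [s for s, d in zip(scores, is_driver) if d]
    neg = [s for s, d in zip(scores, is_driver) if not d]
    if not pos or not neg:
        raise ValueError("need at least one driver and one non-driver")
    # Count pairs where the driver outscores the non-driver; ties count 0.5.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical driver scores for four genes in one sample.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [True, False, True, False]
print(roc_auc(scores, labels))
```

With very few known drivers in a single sample, this quantity can easily hit 0 or 1, which is consistent with the 0–1 range reported for the personalized analyses.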

DriverSubNet: A Novel Algorithm for Identifying Cancer Driver Genes by Subnetwork Enrichment Analysis

Frontiers in Genetics ◽

10.3389/fgene.2020.607798 ◽

2021 ◽

Vol 11 ◽

Author(s):

Di Zhang ◽

Yannan Bin

Keyword(s):

Gene Expression ◽

Cancer Patients ◽

Therapeutic Targets ◽

Enrichment Analysis ◽

Prognostic Biomarkers ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Driver Genes ◽

Critical Challenge ◽

Novel Algorithm

Identification of driver genes from mass non-functional passenger genes in cancers is still a critical challenge. Here, an effective and no parameter algorithm, named DriverSubNet, is presented for detecting driver genes by effectively mining the mutation and gene expression information based on subnetwork enrichment analysis. Compared with the existing classic methods, DriverSubNet can rank driver genes and filter out passenger genes more efficiently in terms of precision, recall, and F1 score, as indicated by the analysis of four cancer datasets. The method recovered about 50% more known cancer driver genes in the top 100 detected genes than those found in other algorithms. Intriguingly, DriverSubNet was able to find these unknown cancer driver genes which could act as potential therapeutic targets and useful prognostic biomarkers for cancer patients. Therefore, DriverSubNet may act as a useful tool for the identification of driver genes by subnetwork enrichment analysis.
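
Precision, recall and F1 score for a ranked gene list, as used in the comparison above, can be sketched as follows. This is an illustration of the metrics only, not DriverSubNet itself; the function name, gene ranking and reference set are hypothetical.

```python
def top_k_metrics(ranked_genes, known_drivers, k):
    """Precision, recall and F1 of the top-k genes against a reference set."""
    top = ranked_genes[:k]
    hits = sum(1 for gene in top if gene in known_drivers)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(known_drivers) if known_drivers else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical ranking and reference list of known driver genes.
ranking = ["TP53", "GENE_A", "KRAS", "GENE_B"]
reference = {"TP53", "KRAS", "PIK3CA"}
print(top_k_metrics(ranking, reference, k=2))
```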

driveR: A Novel Method for Prioritizing Cancer Driver Genes Using Somatic Genomics Data

10.1101/2020.11.10.376707 ◽

2020 ◽

Author(s):

Ege Ülgen ◽

O. Uğur Sezerman

Keyword(s):

Genomic Data ◽

Biological Knowledge ◽

Driver Gene ◽

Driver Genes ◽

Cancer Driver ◽

Prior Biological Knowledge ◽

Cancer Genomes ◽

Novel Method ◽

Cancer Drivers ◽

Cancer Driver Genes

Abstract. Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomic data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomic data, called driveR. Combining genomic information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, outperforms existing approaches, and is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes. driveR is available on CRAN: https://cran.r-project.org/package=driveR.

The Integrative Method Based on Module-Network for Identifying Driver Genes in Cancer Subtypes

10.20944/preprints201712.0084.v1 ◽

2017 ◽

Author(s):

Xinguo Lu ◽

Xing Li ◽

Xin Qian ◽

Qiumai Miao ◽

Shaoliang Peng

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Breast Cancer Subtype ◽

Enrichment Analysis ◽

Potential Candidate ◽

Statistical Machine Learning ◽

Driver Genes ◽

Module Network ◽

Cancer Subtypes ◽

Integrative Method

With advances in next-generation sequencing (NGS) technologies, large amounts of high-throughput genomics data of multiple types are available. A great challenge in exploring cancer mechanisms is to identify the driver genes among the mutated genes by analyzing and integrating multiple types of genomics data. Breast cancer is known to be a heterogeneous disease. The identification of subtype-specific driver genes is critical to guide the diagnosis, prognostic assessment and treatment of breast cancer. We developed an integrated framework based on gene expression profiles and copy number variation (CNV) data to identify breast cancer subtype-specific driver genes. In this framework, we employed a statistical machine-learning method to select gene subsets and used a module-network analysis method to identify potential candidate driver genes. The final subtype-specific driver genes were obtained by pairwise comparison between subtypes. To validate the specificity of the driver genes, their gene expression data were used to classify the patient samples with 10-fold cross-validation, and enrichment analysis was also conducted on the identified driver genes. The experimental results show that the proposed integrative method can identify potential driver genes and that a classifier built on these genes performed better than classifiers built on genes identified by other methods.

ModulOmics: Integrating Multi-Omics Data to Identify Cancer Driver Modules

10.1101/288399 ◽

2018 ◽

Cited By ~ 1

Author(s):

Dana Silverbush ◽

Simona Cristea ◽

Gali Yanovich ◽

Tamar Geiger ◽

Niko Beerenwinkel ◽

...

Keyword(s):

Cancer Progression ◽

Protein Interactions ◽

Molecular Mechanisms ◽

De Novo ◽

Optimization Procedure ◽

Biological Information ◽

Data Types ◽

Driver Genes ◽

Cancer Subtypes ◽

Cancer Driver

Abstract. The identification of molecular pathways driving cancer progression is a fundamental unsolved problem in tumorigenesis, which can substantially further our understanding of cancer mechanisms and inform the development of targeted therapies. Most current approaches to address this problem use primarily somatic mutations, not fully exploiting additional layers of biological information. Here, we describe ModulOmics, a method to de novo identify cancer driver pathways, or modules, by integrating multiple data types (protein-protein interactions, mutual exclusivity of mutations or copy number alterations, transcriptional co-regulation, and RNA co-expression) into a single probabilistic model. To efficiently search the exponential space of candidate modules, ModulOmics employs a two-step optimization procedure that combines integer linear programming with stochastic search. Across several cancer types, ModulOmics identifies highly functionally connected modules enriched with cancer driver genes, outperforming state-of-the-art methods. For breast cancer subtypes, the inferred modules recapitulate known molecular mechanisms and suggest novel subtype-specific functionalities. These findings are supported by an independent patient cohort, as well as independent proteomic and phosphoproteomic datasets.
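
One of the data types listed above, mutual exclusivity of mutations, can be illustrated with a very simple score: among samples altered in any gene of a candidate module, the fraction altered in exactly one of them. This is a sketch under that assumption only. It is not the probabilistic model or the optimization procedure of ModulOmics, and the function name and the sample sets are hypothetical.

```python
from collections import Counter


def exclusivity_ratio(mutated_samples, module):
    """Fraction of covered samples that are altered in exactly one module gene."""
    hits = Counter()
    for gene in module:
        for sample in mutated_samples.get(gene, ()):
            hits[sample] += 1
    if not hits:
        return 0.0  # no sample carries an alteration in this module
    exclusive = sum(1 for count in hits.values() if count == 1)
    return exclusive / len(hits)


# Hypothetical gene -> set of samples carrying an alteration in that gene.
alterations = {
    "G1": {"s1", "s2"},
    "G2": {"s3"},
    "G3": {"s3", "s4"},
}
print(exclusivity_ratio(alterations, ["G1", "G2", "G3"]))
```

A ratio near 1 means the genes are rarely altered together in the same sample, which is the pattern the mutual exclusivity signal is meant to capture.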

EARN: an ensemble machine learning algorithm to predict driver genes in metastatic breast cancer

10.21203/rs.3.rs-113748/v1 ◽

2020 ◽

Author(s):

Leila Mirsadeghi ◽

Reza Haji Hosseini ◽

Ali Mohammad Banaei-Moghaddam ◽

Kaveh Kavousi

Keyword(s):

Breast Cancer ◽

Metastatic Breast Cancer ◽

Learning Algorithm ◽

Metastatic Breast ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Support Vector ◽

Pathway Enrichment Analysis ◽

Driver Genes ◽

Gene Set

Abstract. Background: Today, there are many markers for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggression is limited. Methods: In this work, we study somatic mutation data consisting of 450 metastatic breast tumor samples from the cBio Cancer Genomics Portal. We use four software tools to extract features from this data. Then, an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) is proposed to evaluate plausible driver genes for metastatic breast cancer (MBCA). Results: This study is an attempt to focus on the findings in several aspects of MBCA prognosis and diagnosis. First, drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, biological inferences of predictions based on gene set enrichment analysis are discussed. Third, statistical validation and comparison of all learning methods based on evaluation metrics are done. Finally, the pathway enrichment analysis (PEA) using the ReactomeFIVIz tool (FDR < 0.03) for the top 100 genes predicted by EARN leads us to propose a new gene set panel for MBCA, including HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare results for MBCA to other outputs regarding 983 primary tumor samples of breast invasive carcinoma (BRCA) obtained from the Cancer Genome Atlas (TCGA). The comparison between outputs shows that ROC-AUC reached 99.24% using EARN for MBCA and 99.79% for BRCA. This statistical result is better than that of the three individual classifiers in each case. Conclusions: This research, using an integrative approach, assists precision oncologists in designing compact targeted panels that eliminate the need for whole-genome/exome sequencing.

Genomic biomarkers in relation to PD-1 checkpoint blockade response.

Journal of Clinical Oncology ◽

10.1200/jco.2018.36.5_suppl.25 ◽

2018 ◽

Vol 36 (5_suppl) ◽

pp. 25-25 ◽

Cited By ~ 4

Author(s):

Tanguy Y. Seiwert ◽

Razvan Cristescu ◽

Robin Mogg ◽

Mark Ayers ◽

Andrew Albright ◽

...

Keyword(s):

T Cell ◽

Clinical Response ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Youden Index ◽

Specific Gene ◽

Driver Genes ◽

Cancer Driver ◽

Multiple Tumor ◽

Cancer Data

25 Background: Somatic tumor mutational burden (TMB) and a T-cell inflamed gene expression profile (GEP) predict response to anti-PD-1/PD-L1 immunotherapies in multiple tumor types. We assessed the potential for GEP and TMB to jointly predict clinical response to pembrolizumab and to identify distinct, targetable patterns of biology that may modulate response/resistance. Methods: To assess the individual and joint clinical utility of TMB and GEP in a pan-tumor context, pembrolizumab-treated patients with advanced solid tumors and melanoma were stratified as 4 biomarker-defined clinical response groups (GEP low/TMB low, GEP low/TMB high, GEP high/TMB low, GEP high/TMB high; N > 300) based on cutoffs for TMB (ROC Youden Index associated) and GEP (selected via analysis of pan cancer data). TMB and GEP were used to guide transcriptome and exome analysis of tumors in 2 large databases (Moffitt, n = 2944; TCGA, n = 6978). Results: TMB and GEP had a low, but significant, correlation in these clinical datasets. ORR was highest in GEP high/TMB high (37-57%), modest in GEP high/TMB low (12-35%) and GEP low/TMB high (11-42%), and lowest in GEP low/TMB low (0-9%) groups. Within the Moffitt and TCGA databases, GEP and TMB again had a low correlation, demonstrating their potential joint utility for stratifying additional transcriptomic and genomic features of these datasets. Specific gene modules showed strong positive or negative and highly statistically significant associations with TMB, GEP or both in each dataset, and patterns were consistent between datasets. In particular, gene set enrichment analysis identified proliferative, stromal and vascular biology corresponding to specific TMB-defined subgroups within GEP high tumors. In TMB-high tumors, indication-dependent somatic DNA alterations in key cancer driver genes showed a strong negative association (P < 1e-5) with GEP.
Conclusions: This analysis shows that TMB and T-cell inflamed GEP score can stratify human cancers into groups with different response rates to pembrolizumab monotherapy, and identify patterns of underlying, targetable biology related to these groups. This approach may provide a precision medicine framework for evaluating anti-PD-1/L1-based combination therapy regimens. Clinical trial information: NCT01848834; NCT02054806; NCT01295827; NCT01866319.
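
The TMB cutoff mentioned in the Methods is tied to the Youden index, J = sensitivity + specificity - 1, evaluated along the ROC curve. A minimal sketch of choosing the cutoff that maximizes J is shown below. The function name and the biomarker values are hypothetical and are not taken from the study.

```python
def youden_cutoff(values, responded):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1."""
    positives = sum(1 for r in responded if r)
    negatives = len(responded) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both responders and non-responders")
    best_cutoff, best_j = None, -1.0
    # A patient is called biomarker-high when value >= cutoff.
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, r in zip(values, responded) if r and v >= cutoff)
        tn = sum(1 for v, r in zip(values, responded) if not r and v < cutoff)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical biomarker values and response labels for six patients.
tmb = [1, 2, 3, 10, 12, 15]
response = [False, False, False, True, True, True]
print(youden_cutoff(tmb, response))
```

Ties in J are resolved in favor of the lowest cutoff here, which is a design choice of this sketch rather than something stated in the abstract.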

EARN: an ensemble machine learning algorithm to predict driver genes in metastatic breast cancer

BMC Medical Genomics ◽

10.1186/s12920-021-00974-3 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Leila Mirsadeghi ◽

Reza Haji Hosseini ◽

Ali Mohammad Banaei-Moghaddam ◽

Kaveh Kavousi

Keyword(s):

Breast Cancer ◽

Metastatic Breast Cancer ◽

Learning Algorithm ◽

Metastatic Breast ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Support Vector ◽

Pathway Enrichment Analysis ◽

Driver Genes ◽

Gene Set

Abstract. Background: Today, there are many markers for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggression is limited. Methods: In this work, we study somatic mutation data consisting of 450 metastatic breast tumor samples from the cBio Cancer Genomics Portal. We use four software tools to extract features from this data. Then, an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) is proposed to evaluate plausible driver genes for metastatic breast cancer (MBCA). The decision-making strategy of the proposed ensemble is based on aggregating the predicted scores obtained from the individual classifiers in order to prioritize Homo sapiens genes annotated as protein-coding in NCBI. Results: This study is an attempt to focus on the findings in several aspects of MBCA prognosis and diagnosis. First, drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, biological inferences of predictions are discussed based on gene set enrichment analysis. Third, statistical validation and comparison of all learning methods are performed using several evaluation metrics. Finally, the pathway enrichment analysis (PEA) using the ReactomeFIVIz tool (FDR < 0.03) for the top 100 genes predicted by EARN leads us to propose a new gene set panel for MBCA. It includes HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare results for MBCA to other outputs regarding 983 primary tumor samples of breast invasive carcinoma (BRCA) obtained from the Cancer Genome Atlas (TCGA). The comparison between outputs shows that ROC-AUC reaches 99.24% using EARN for MBCA and 99.79% for BRCA. This statistical result is better than that of the three individual classifiers in each case.
Conclusions: This research, using an integrative approach, assists precision oncologists in designing compact targeted panels that eliminate the need for whole-genome/exome sequencing. The schematic representation of the proposed model is presented as the Graphic abstract.
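
The score aggregation step described in the Methods can be sketched as follows. The abstract does not state the aggregation rule, so an unweighted mean over the three classifiers is assumed here purely for illustration. The function name, classifier keys, gene names and scores are all hypothetical.

```python
def aggregate_scores(per_classifier_scores):
    """Rank genes by the mean of their scores across all classifiers."""
    if not per_classifier_scores:
        raise ValueError("need at least one classifier")
    score_maps = list(per_classifier_scores.values())
    # Keep only genes that every classifier has scored.
    genes = set(score_maps[0])
    for scores in score_maps[1:]:
        genes &= set(scores)
    mean_score = {
        gene: sum(scores[gene] for scores in score_maps) / len(score_maps)
        for gene in genes
    }
    # Highest mean score first; gene name breaks ties deterministically.
    return sorted(mean_score.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical driver probabilities from three individual classifiers.
predictions = {
    "svm": {"GENE_1": 0.9, "GENE_2": 0.2},
    "ann": {"GENE_1": 0.7, "GENE_2": 0.4},
    "rf": {"GENE_1": 0.8, "GENE_2": 0.3},
}
for gene, score in aggregate_scores(predictions):
    print(gene, round(score, 3))
```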

Bioinformatics Analysis of the Muscle-invasive Bladder Cancer Subtypes

Proceedings of Anticancer Research ◽

10.26689/par.v2i2.363 ◽

1970 ◽

Vol 2 (2) ◽

Author(s):

Wenbin Xu ◽

Weiying Zheng ◽

Hong Xia ◽

Lin Hua

Keyword(s):

Extracellular Matrix ◽

Bladder Cancer ◽

Inflammatory Response ◽

Differentially Expressed Genes ◽

Kegg Pathway ◽

Enrichment Analysis ◽

Differentially Expressed ◽

Pathway Enrichment Analysis ◽

Cancer Subtypes ◽

Pathway Enrichment

Objective: In order to improve the accuracy of distinguishing bladder cancer subtypes and to explore potential therapeutic targets, we identify differences in molecular mechanisms and molecular characteristics between two bladder cancer subtypes (basal-like and luminal) using bioinformatics analysis. Methods: In this study, RMA (robust multichip averaging) was applied to normalize the mRNA profile, which included 22 samples from the basal-like subtype and 132 from the luminal subtype, and differential expression analysis was performed on the 1000 genes with the highest standard deviation. Then, Gene Ontology and KEGG pathway enrichment analysis of the differentially expressed genes was performed. In addition, protein-protein interaction network analysis was performed for the top 100 most significant differentially expressed genes. Results: A total of 742 differentially expressed genes distinguishing the basal-like and luminal subtypes were found, of which 405 were up-regulated and 337 were down-regulated in the basal-like subtype. GO enrichment analysis showed that the differentially expressed genes were significantly enriched in the extracellular matrix, chemotaxis and inflammatory response. KEGG pathway enrichment analysis showed that the differentially expressed genes were significantly enriched in the extracellular matrix receptor interaction pathway. The hub proteins we found in the protein-protein interaction networks were LNX1, MSN and PPARG. Conclusion: In this study, the main differences in molecular mechanism between the basal-like and luminal subtypes are alterations in the extracellular matrix region, cell chemotaxis and inflammatory response. Genes such as LNX1, MSN and PPARG are predicted to play important roles in the classification of bladder carcinoma subtypes.
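
The pre-filtering step in the Methods, keeping the genes with the highest standard deviation across samples before differential expression analysis, can be sketched as below. This is an illustration only; the function name, gene names and expression values are hypothetical, and the study's actual pipeline is not described beyond the abstract.

```python
import statistics


def top_variable_genes(expression, n):
    """Return the n genes with the highest sample standard deviation."""
    # expression maps gene -> list of normalized values, one per sample
    # (each gene needs at least two samples for a sample standard deviation).
    sd = {gene: statistics.stdev(values) for gene, values in expression.items()}
    # Sort by decreasing standard deviation; gene name breaks ties.
    return sorted(sd, key=lambda gene: (-sd[gene], gene))[:n]


# Hypothetical normalized expression values across three samples.
profile = {
    "GENE_A": [1.0, 1.0, 1.0],
    "GENE_B": [1.0, 5.0, 9.0],
    "GENE_C": [2.0, 3.0, 4.0],
}
print(top_variable_genes(profile, n=2))
```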

Mutational likeliness and entropy help to identify driver mutations and their functional role in cancer

10.1101/354324 ◽

2018 ◽

Author(s):

Giorgio Mattiuz ◽

Salvatore Di Giorgio ◽

Lorenzo Tofani ◽

Antonio Frandi ◽

Francesco Donati ◽

...

Keyword(s):

Cancer Progression ◽

Somatic Mutations ◽

Driver Mutations ◽

Cancer Evolution ◽

Loss Of Function ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Genomes ◽

Passenger Mutations ◽

Mutational Processes

Abstract. Alterations in cancer genomes originate from mutational processes taking place throughout oncogenesis and cancer progression. We show that likeliness and entropy are two properties of somatic mutations crucial in cancer evolution, as cancer-driver mutations stand out, with respect to both of these properties, as being distinct from the bulk of passenger mutations. Our analysis can identify novel cancer driver genes and differentiate between gain and loss of function mutations.