Integrative pathway enrichment analysis of multivariate omics data

2018
Author(s): Marta Paczkowska, Jonathan Barenboim, Nardnisa Sintupisut, Natalie C. Fox, Helen Zhu, ...

Abstract. Multi-omics datasets quantify complementary aspects of molecular biology and thus pose challenges to data interpretation and hypothesis generation. ActivePathways is an integrative method that discovers significantly enriched pathways across multiple omics datasets using a statistical data fusion approach, rationalizes contributing evidence and highlights associated genes. We demonstrate its utility by analyzing coding and non-coding mutations from 2,583 whole cancer genomes, revealing frequently mutated hallmark pathways and a long tail of known and putative cancer driver genes. We also studied prognostic molecular pathways in breast cancer subtypes by integrating genomic and transcriptomic features of tumors and tumor-adjacent cells and found significant associations with immune response processes and anti-apoptotic signaling pathways. ActivePathways is a versatile method that improves systems-level understanding of cellular organization in health and disease through integration of multiple molecular datasets and pathway annotations.
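
The abstract above does not say which statistic performs the data fusion. As a minimal sketch of the general idea, the following merges per-gene p-values from several datasets with Fisher's combined probability test, which is used here only as a stand-in and assumes the p-values are independent. The gene names and p-values are hypothetical.

```python
import math


def fisher_combine(pvalues):
    """Merge independent p-values with Fisher's combined probability test.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(pvalues).
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    k = len(pvalues)
    # Work with X / 2; clamp tiny p-values so log() never sees zero.
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    # Closed-form chi-square survival function for an even number (2k) of
    # degrees of freedom: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical per-gene p-values from two omics datasets (illustrative only).
gene_pvalues = {
    "GENE_A": [0.01, 0.01],
    "GENE_B": [0.50, 0.80],
}
merged = {gene: fisher_combine(ps) for gene, ps in gene_pvalues.items()}
# Genes ordered from most to least significant after merging.
ranked_genes = sorted(merged, key=merged.get)
```

A ranked gene list like `ranked_genes` is the kind of input a pathway enrichment test would then consume. When the datasets are correlated, a correction for dependence between p-values would be needed, which this sketch does not attempt.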

2021, Vol 22 (1)
Author(s): Ege Ülgen, O. Uğur Sezerman

Abstract. Background: Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomics data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomics data, called driveR. Results: Combining genomics information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, achieving a median AUC of 0.684 (range 0.651–0.861) on the 28 batch analysis test datasets, and a median AUC of 0.773 (range 0–1) on the 5157 personalized analysis test samples. Moreover, it outperforms existing approaches, achieving a significantly higher median AUC than all of MutSigCV (Wilcoxon rank-sum test p < 0.001), DriverNet (p < 0.001), OncodriveFML (p < 0.001) and MutPanning (p < 0.001) on batch analysis test datasets, and a significantly higher median AUC than DawnRank (p < 0.001) and PRODIGY (p < 0.001) on personalized analysis datasets. Conclusions: This study demonstrates that the proposed method is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes in personalized or batch analyses. driveR is available on CRAN: https://cran.r-project.org/package=driveR.
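
To make the reported metric concrete, here is a small sketch of how a per-dataset AUC and a median across datasets can be computed. It is not the driveR evaluation code; the scores, labels, and the `roc_auc` helper are hypothetical. It also shows why a single sample with few known drivers can land anywhere in the 0 to 1 range.

```python
from statistics import median


def roc_auc(scores, labels):
    """AUC as the probability that a driver outranks a non-driver.

    labels: 1 for a known driver gene, 0 otherwise. Ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one driver and one non-driver")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical prioritization scores for three small test sets.
datasets = [
    ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),  # one driver ranked below a passenger
    ([0.9, 0.2], [1, 0]),                  # driver ranked first
    ([0.2, 0.9], [1, 0]),                  # driver ranked last
]
aucs = [roc_auc(scores, labels) for scores, labels in datasets]
median_auc = median(aucs)
```

The pairwise formulation is quadratic in the number of genes, which is fine for illustration; a rank-based implementation would be preferable for genome-scale lists.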


2021, Vol 11
Author(s): Di Zhang, Yannan Bin

Identifying driver genes among the mass of non-functional passenger genes in cancers is still a critical challenge. Here, an effective, parameter-free algorithm named DriverSubNet is presented for detecting driver genes by mining mutation and gene expression information based on subnetwork enrichment analysis. Compared with existing classic methods, DriverSubNet ranks driver genes and filters out passenger genes more effectively in terms of precision, recall, and F1 score, as indicated by the analysis of four cancer datasets. The method recovered about 50% more known cancer driver genes in the top 100 detected genes than other algorithms. Intriguingly, DriverSubNet was also able to find previously unknown cancer driver genes, which could act as potential therapeutic targets and useful prognostic biomarkers for cancer patients. Therefore, DriverSubNet may serve as a useful tool for the identification of driver genes by subnetwork enrichment analysis.
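
The evaluation described above scores the top of a ranked gene list against a set of known drivers. A minimal sketch of that calculation follows; the function name, the toy ranking, and the reference set are hypothetical, and a real benchmark would use a curated driver catalogue.

```python
def precision_recall_f1(ranked_genes, known_drivers, k):
    """Precision, recall and F1 of the top-k genes against known drivers."""
    top = set(ranked_genes[:k])
    true_pos = len(top & known_drivers)
    precision = true_pos / len(top) if top else 0.0
    recall = true_pos / len(known_drivers) if known_drivers else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy ranking and reference set (illustrative only).
ranking = ["TP53", "GENE_X", "KRAS", "GENE_Y"]
reference = {"TP53", "KRAS", "PIK3CA"}
precision, recall, f1 = precision_recall_f1(ranking, reference, k=2)
```

With `k=2` only TP53 is recovered, so precision is 1/2 and recall is 1/3; widening `k` trades precision for recall.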


2020
Author(s): Ege Ülgen, O. Uğur Sezerman

Abstract. Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomic data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomic data, called driveR. Combining genomic information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, outperforms existing approaches, and is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes. driveR is available on CRAN: https://cran.r-project.org/package=driveR.


Author(s): Xinguo Lu, Xing Li, Xin Qian, Qiumai Miao, Shaoliang Peng

With advances in next-generation sequencing (NGS) technologies, large amounts of high-throughput genomics data of multiple types are available. A great challenge in exploring cancer mechanisms is to identify the driver genes among the mutated genes by analyzing and integrating multiple types of genomics data. Breast cancer is known to be a heterogeneous disease. The identification of subtype-specific driver genes is critical to guide the diagnosis, prognostic assessment, and treatment of breast cancer. We developed an integrated framework based on gene expression profiles and copy number variation (CNV) data to identify breast cancer subtype-specific driver genes. In this framework, we employed a statistical machine-learning method to select gene subsets and used a module-network analysis method to identify potential candidate driver genes. The final subtype-specific driver genes were obtained by pairwise comparison between subtypes. To validate the specificity of the driver genes, their gene expression data were used to classify the patient samples with 10-fold cross-validation, and enrichment analysis was also conducted on the identified driver genes. The experimental results show that the proposed integrative method can identify potential driver genes, and that a classifier built on these genes performed better than classifiers built on genes identified by other methods.


2018
Author(s): Dana Silverbush, Simona Cristea, Gali Yanovich, Tamar Geiger, Niko Beerenwinkel, ...

Abstract. The identification of molecular pathways driving cancer progression is a fundamental unsolved problem in tumorigenesis, which can substantially further our understanding of cancer mechanisms and inform the development of targeted therapies. Most current approaches to address this problem use primarily somatic mutations, not fully exploiting additional layers of biological information. Here, we describe ModulOmics, a method to de novo identify cancer driver pathways, or modules, by integrating multiple data types (protein-protein interactions, mutual exclusivity of mutations or copy number alterations, transcriptional co-regulation, and RNA co-expression) into a single probabilistic model. To efficiently search the exponential space of candidate modules, ModulOmics employs a two-step optimization procedure that combines integer linear programming with stochastic search. Across several cancer types, ModulOmics identifies highly functionally connected modules enriched with cancer driver genes, outperforming state-of-the-art methods. For breast cancer subtypes, the inferred modules recapitulate known molecular mechanisms and suggest novel subtype-specific functionalities. These findings are supported by an independent patient cohort, as well as independent proteomic and phosphoproteomic datasets.


2020
Author(s): Leila Mirsadeghi, Reza Haji Hosseini, Ali Mohammad Banaei-Moghaddam, Kaveh Kavousi

Abstract. Background: Today, there are many markers for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggression is limited. Methods: In this work, we study somatic mutation data consisting of 450 metastatic breast tumor samples from cBio Cancer Genomics Portal. We use four software tools to extract features from this data. Then, an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) is proposed to evaluate plausible driver genes for metastatic breast cancer (MBCA). Results: This study is an attempt to focus on the findings in several aspects of MBCA prognosis and diagnosis. First, drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, biological inferences of predictions based on gene set enrichment analysis are discussed. Third, statistical validation and comparison of all learning methods are carried out based on evaluation metrics. Finally, the pathway enrichment analysis (PEA) using ReactomeFIVIz tool (FDR < 0.03) for the top 100 genes predicted by EARN leads us to propose a new gene set panel for MBCA, including HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare results for MBCA to other outputs regarding 983 primary tumor samples of breast invasive carcinoma (BRCA) obtained from the Cancer Genome Atlas (TCGA). The comparison between outputs shows that ROC-AUC reached 99.24% using EARN for MBCA and 99.79% for BRCA. This statistical result is better than that of the three individual classifiers in each case. Conclusions: This research, using an integrative approach, assists precision oncologists in designing compact targeted panels that eliminate the need for whole-genome/exome sequencing.


2018, Vol 36 (5_suppl), pp. 25-25
Author(s): Tanguy Y. Seiwert, Razvan Cristescu, Robin Mogg, Mark Ayers, Andrew Albright, ...

Background: Somatic tumor mutational burden (TMB) and a T-cell inflamed gene expression profile (GEP) predict response to anti-PD-1/PD-L1 immunotherapies in multiple tumor types. We assessed the potential for GEP and TMB to jointly predict clinical response to pembrolizumab and to identify distinct, targetable patterns of biology that may modulate response/resistance. Methods: To assess the individual and joint clinical utility of TMB and GEP in a pan-tumor context, pembrolizumab-treated patients with advanced solid tumors and melanoma were stratified as 4 biomarker-defined clinical response groups (GEP low/TMB low, GEP low/TMB high, GEP high/TMB low, GEP high/TMB high; N > 300) based on cutoffs for TMB (ROC Youden Index associated) and GEP (selected via analysis of pan cancer data). TMB and GEP were used to guide transcriptome and exome analysis of tumors in 2 large databases (Moffitt, n = 2944; TCGA, n = 6978). Results: TMB and GEP had a low, but significant, correlation in these clinical datasets. ORR was highest in GEP high/TMB high (37-57%), modest in GEP high/TMB low (12-35%) and GEP low/TMB high (11-42%), and lowest in GEP low/TMB low (0-9%) groups. Within the Moffitt and TCGA databases, GEP and TMB again had a low correlation, demonstrating their potential joint utility for stratifying additional transcriptomic and genomic features of these datasets. Specific gene modules showed strong positive or negative and highly statistically significant associations with TMB, GEP or both in each dataset, and patterns were consistent between datasets. In particular, gene set enrichment analysis identified proliferative, stromal and vascular biology corresponding to specific TMB-defined subgroups within GEP high tumors. In TMB-high tumors, indication-dependent somatic DNA alterations in key cancer driver genes showed a strong negative association (P < 1e-5) with GEP.
Conclusions: This analysis shows that TMB and T-cell inflamed GEP score can stratify human cancers into groups with different response rates to pembrolizumab monotherapy, and identify patterns of underlying, targetable biology related to these groups. This approach may provide a precision medicine framework for evaluating anti-PD-1/L1-based combination therapy regimens. Clinical trial information: NCT01848834; NCT02054806; NCT01295827; NCT01866319.
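
The Methods above mention a TMB cutoff tied to the ROC Youden Index and a four-way biomarker stratification. The sketch below illustrates both steps on toy numbers. It is not the study's code: the values, the response labels, the GEP cutoff of 0.0, and both helper names are hypothetical.

```python
def youden_cutoff(values, responded):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A sample is called "high" when its value is >= cutoff.
    """
    positives = sum(1 for r in responded if r)
    negatives = len(responded) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both responders and non-responders")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, r in zip(values, responded) if r and v >= cutoff)
        tn = sum(1 for v, r in zip(values, responded) if not r and v < cutoff)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def biomarker_group(gep, tmb, gep_cutoff, tmb_cutoff):
    """Label a tumor with one of the four GEP/TMB groups."""
    gep_label = "high" if gep >= gep_cutoff else "low"
    tmb_label = "high" if tmb >= tmb_cutoff else "low"
    return f"GEP {gep_label}/TMB {tmb_label}"


# Toy TMB values and response labels (illustrative only).
tmb_values = [1, 2, 3, 10, 12, 15]
responses = [0, 0, 0, 1, 1, 1]
tmb_cutoff, j_stat = youden_cutoff(tmb_values, responses)
group = biomarker_group(gep=0.5, tmb=12, gep_cutoff=0.0, tmb_cutoff=tmb_cutoff)
```

On real data the classes overlap, so the maximum J is below 1 and the chosen cutoff depends on the cohort used to derive it.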


2021, Vol 14 (1)
Author(s): Leila Mirsadeghi, Reza Haji Hosseini, Ali Mohammad Banaei-Moghaddam, Kaveh Kavousi

Abstract. Background: Today, there are many markers for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggression is limited. Methods: In this work, we study somatic mutation data consisting of 450 metastatic breast tumor samples from cBio Cancer Genomics Portal. We use four software tools to extract features from this data. Then, an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) is proposed to evaluate plausible driver genes for metastatic breast cancer (MBCA). The decision-making strategy of the proposed ensemble is based on aggregating the predicted scores obtained from the individual classifiers in order to prioritize Homo sapiens genes annotated as protein-coding in NCBI. Results: This study is an attempt to focus on the findings in several aspects of MBCA prognosis and diagnosis. First, drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, biological inferences of predictions are discussed based on gene set enrichment analysis. Third, statistical validation and comparison of all learning methods are performed using several evaluation metrics. Finally, the pathway enrichment analysis (PEA) using ReactomeFIVIz tool (FDR < 0.03) for the top 100 genes predicted by EARN leads us to propose a new gene set panel for MBCA. It includes HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare results for MBCA to other outputs regarding 983 primary tumor samples of breast invasive carcinoma (BRCA) obtained from the Cancer Genome Atlas (TCGA). The comparison between outputs shows that ROC-AUC reaches 99.24% using EARN for MBCA and 99.79% for BRCA. This statistical result is better than that of the three individual classifiers in each case.
Conclusions: This research, using an integrative approach, assists precision oncologists in designing compact targeted panels that eliminate the need for whole-genome/exome sequencing. The schematic representation of the proposed model is presented as the Graphic abstract.
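
The abstract says the ensemble aggregates the scores predicted by the individual classifiers but does not give the aggregation rule. As a sketch only, the following assumes a plain unweighted mean over three classifiers and ranks genes by it; the classifier keys, gene names, and scores are hypothetical.

```python
from statistics import mean


def aggregate_scores(per_classifier_scores):
    """Rank genes by the mean of their per-classifier driver scores.

    per_classifier_scores maps classifier name -> {gene: score}. Only genes
    scored by every classifier are kept. The unweighted mean is an
    assumption of this sketch, not the published rule.
    """
    score_maps = list(per_classifier_scores.values())
    genes = set.intersection(*(set(scores) for scores in score_maps))
    combined = {g: mean(scores[g] for scores in score_maps) for g in genes}
    # Highest combined score first; gene name breaks ties deterministically.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical driver scores from three classifiers (illustrative only).
scores = {
    "ann": {"GENE_A": 0.9, "GENE_B": 0.2},
    "rf": {"GENE_A": 0.7, "GENE_B": 0.4},
    "svm": {"GENE_A": 0.8, "GENE_B": 0.3},
}
ranking = aggregate_scores(scores)
```

Other aggregation rules, such as weighted means or rank averaging, would slot into the same structure by changing the line that builds `combined`.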


1970, Vol 2 (2)
Author(s): Wenbin Xu, Weiying Zheng, Hong Xia, Lin Hua

Objective: To improve the accuracy of distinguishing bladder cancer subtypes and to explore potential therapeutic targets, we identify differences in molecular mechanisms and molecular characteristics between two bladder cancer subtypes (basal-like and luminal) using bioinformatics analysis. Methods: RMA (robust multichip averaging) was applied to normalize the mRNA profile, which included 22 samples from the basal-like subtype and 132 from the luminal subtype, and differential expression analysis was performed on the 1000 genes with the highest standard deviation. Gene Ontology (GO) and KEGG pathway enrichment analyses of the differentially expressed genes were then performed. In addition, protein-protein interaction network analysis was performed for the 100 most significant differentially expressed genes. Results: A total of 742 differentially expressed genes distinguishing the basal-like and luminal subtypes were found, of which 405 were up-regulated and 337 were down-regulated in the basal-like subtype. GO enrichment analysis showed that the differentially expressed genes were significantly enriched in the extracellular matrix, chemotaxis and inflammatory response. KEGG pathway enrichment analysis showed that the differentially expressed genes were significantly enriched in the extracellular matrix receptor interaction pathway. The hub proteins found in the protein-protein interaction networks were LNX1, MSN and PPARG. Conclusion: The main differences in molecular mechanism between the basal-like and luminal subtypes are alterations in the extracellular matrix region, cell chemotaxis and inflammatory response. Genes such as LNX1, MSN and PPARG are predicted to play important roles in the classification of bladder carcinoma subtypes.
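
The Methods above filter to the 1000 most variable genes before differential expression analysis. A minimal sketch of that filtering step follows, using sample standard deviation on a toy expression table; the helper name, gene names, and values are hypothetical, and normalization is assumed to have been done already.

```python
from statistics import stdev


def top_variable_genes(expression, n):
    """Return the n genes with the highest sample standard deviation.

    expression maps gene -> list of normalized expression values, one per
    sample (at least two samples per gene).
    """
    spread = {gene: stdev(values) for gene, values in expression.items()}
    # Most variable first; gene name breaks ties deterministically.
    return sorted(spread, key=lambda g: (-spread[g], g))[:n]


# Toy normalized expression values across four samples (illustrative only).
expression = {
    "GENE_A": [1.0, 1.0, 1.0, 1.0],  # flat, carries no subtype signal
    "GENE_B": [1.0, 5.0, 1.0, 5.0],  # strongly variable
    "GENE_C": [2.0, 3.0, 2.0, 3.0],  # mildly variable
}
selected = top_variable_genes(expression, n=2)
```

Filtering on variability ignores subtype labels, so it reduces the number of tests without peeking at the comparison of interest.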


2018
Author(s): Giorgio Mattiuz, Salvatore Di Giorgio, Lorenzo Tofani, Antonio Frandi, Francesco Donati, ...

Abstract. Alterations in cancer genomes originate from mutational processes taking place throughout oncogenesis and cancer progression. We show that likeliness and entropy are two properties of somatic mutations crucial in cancer evolution, as cancer-driver mutations stand out, with respect to both of these properties, as being distinct from the bulk of passenger mutations. Our analysis can identify novel cancer driver genes and differentiate between gain and loss of function mutations.

