EARN: an ensemble machine learning algorithm to predict driver genes in metastatic breast cancer

Author(s):  
Leila Mirsadeghi ◽  
Reza Haji Hosseini ◽  
Ali Mohammad Banaei-Moghaddam ◽  
Kaveh Kavousi

Abstract Background: Many markers are now available for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggressiveness remains limited. Methods: In this work, we study somatic mutation data consisting of 450 metastatic breast tumor samples from the cBio Cancer Genomics Portal. We use four software tools to extract features from these data. Then, an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) is proposed to evaluate plausible driver genes for metastatic breast cancer (MBCA). Results: The findings address several aspects of MBCA prognosis and diagnosis. First, the drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, biological inferences from the predictions are discussed on the basis of gene set enrichment analysis. Third, all learning methods are statistically validated and compared using evaluation metrics. Finally, pathway enrichment analysis (PEA) with the ReactomeFIViz tool (FDR < 0.03) on the top 100 genes predicted by EARN leads us to propose a new gene set panel for MBCA. The panel includes HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare the MBCA results with those obtained for 983 primary tumor samples of breast invasive carcinoma (BRCA) from The Cancer Genome Atlas (TCGA). EARN reaches a ROC-AUC of 99.24% for MBCA and 99.79% for BRCA. In each case, this result is better than that of each of the three individual classifiers. Conclusions: Using an integrative approach, this research helps precision oncologists design compact targeted panels that eliminate the need for whole-genome/exome sequencing.
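The abstract reports classifier performance as ROC-AUC. The sketch below is a hedged illustration of what that number measures. It computes ROC-AUC in plain Python as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting as one half. The labels are assumed to be 1 = driver and 0 = passenger. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    labels: 1 (assumed: driver gene) or 0 (assumed: passenger gene).
    scores: classifier scores, higher meaning "more driver-like".
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not results from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(round(roc_auc(labels, scores), 4))
```

The pairwise loop is quadratic in the number of genes. For genome-wide gene lists, an equivalent rank-based (Mann-Whitney U) formulation would scale better.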

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Leila Mirsadeghi ◽  
Reza Haji Hosseini ◽  
Ali Mohammad Banaei-Moghaddam ◽  
Kaveh Kavousi

Abstract Background: Many markers are now available for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggressiveness remains limited. Methods: In this work, we study somatic mutation data consisting of 450 metastatic breast tumor samples from the cBio Cancer Genomics Portal. We use four software tools to extract features from these data. Then, an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) is proposed to evaluate plausible driver genes for metastatic breast cancer (MBCA). The decision-making strategy of the proposed ensemble aggregates the scores predicted by the individual classifiers to prioritize Homo sapiens genes annotated as protein-coding in NCBI. Results: The findings address several aspects of MBCA prognosis and diagnosis. First, the drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, biological inferences from the predictions are discussed on the basis of gene set enrichment analysis. Third, all learning methods are statistically validated and compared using several evaluation metrics. Finally, pathway enrichment analysis (PEA) with the ReactomeFIViz tool (FDR < 0.03) on the top 100 genes predicted by EARN leads us to propose a new gene set panel for MBCA. It includes HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare the MBCA results with those obtained for 983 primary tumor samples of breast invasive carcinoma (BRCA) from The Cancer Genome Atlas (TCGA). EARN reaches a ROC-AUC of 99.24% for MBCA and 99.79% for BRCA. In each case, this result is better than that of each of the three individual classifiers.
Conclusions: Using an integrative approach, this research helps precision oncologists design compact targeted panels that eliminate the need for whole-genome/exome sequencing. A schematic representation of the proposed model is presented in the Graphic abstract.
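The abstract states only that EARN aggregates the scores predicted by its individual classifiers to prioritize protein-coding genes. The sketch below shows one such rule, assuming an unweighted mean of per-model driver scores followed by ranking. The paper's actual weighting or aggregation rule may differ. The gene identifiers and scores are hypothetical.

```python
from statistics import fmean


def aggregate_scores(per_model_scores):
    """Average each gene's driver score across classifiers (assumed: unweighted mean).

    per_model_scores maps a model name (e.g. "ANN", "RF", "SVM") to a dict of
    gene -> predicted driver score. Only genes scored by every model are kept.
    """
    models = list(per_model_scores.values())
    shared_genes = set(models[0]).intersection(*models[1:])
    return {g: fmean(m[g] for m in models) for g in shared_genes}


def top_k(aggregated, k):
    """Rank genes by aggregated score (descending), breaking ties by gene name."""
    return sorted(aggregated, key=lambda g: (-aggregated[g], g))[:k]


# Hypothetical scores for illustration only.
scores = {
    "ANN": {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.7},
    "RF": {"GENE_A": 0.8, "GENE_B": 0.6, "GENE_C": 0.5},
    "SVM": {"GENE_A": 0.7, "GENE_B": 0.2, "GENE_C": 0.9},
}
aggregated = aggregate_scores(scores)
ranked = top_k(aggregated, 2)
print(ranked)
```

A mean of scores assumes the three models output comparable, probability-like values. If they do not, averaging ranks instead of raw scores is a common alternative.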


2020 ◽  
Author(s):  
Leila Mirsadeghi ◽  
Reza Haji Hosseini ◽  
Ali Mohammad Banaei-Moghaddam ◽  
Kaveh Kavousi

Abstract Background: Many markers are now available for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggressiveness remains limited. Methods: In this work, we study somatic mutation data consisting of 450 metastatic breast tumor samples from the cBio Cancer Genomics Portal. We use four software tools to extract features from these data. Then, an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) is proposed to evaluate plausible driver genes for metastatic breast cancer (MBCA). The decision-making strategy of the proposed ensemble aggregates the scores predicted by the individual classifiers to prioritize Homo sapiens genes annotated as protein-coding in NCBI. Results: The findings address several aspects of MBCA prognosis and diagnosis. First, the drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, biological inferences from the predictions are discussed on the basis of gene set enrichment analysis. Third, all learning methods are statistically validated and compared using several evaluation metrics. Finally, pathway enrichment analysis (PEA) with the ReactomeFIViz tool (FDR < 0.03) on the top 100 genes predicted by EARN leads us to propose a new gene set panel for MBCA. It includes HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare the MBCA results with those obtained for 983 primary tumor samples of breast invasive carcinoma (BRCA) from The Cancer Genome Atlas (TCGA). EARN reaches a ROC-AUC of 99.24% for MBCA and 99.79% for BRCA. In each case, this result is better than that of each of the three individual classifiers.
Conclusions: Using an integrative approach, this research helps precision oncologists design compact targeted panels that eliminate the need for whole-genome/exome sequencing. A schematic representation of the proposed model is presented as the Graphical Abstract (Figure 1).
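The gene panel is derived from pathways passing an FDR threshold of 0.03, but the abstract does not state how the FDR is computed. As an assumption, the sketch below uses the widely used Benjamini-Hochberg step-up procedure to turn raw pathway p-values into adjusted values and then applies the 0.03 cutoff. The p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


# Hypothetical pathway p-values, not taken from the study.
pvals = [0.001, 0.008, 0.039, 0.041, 0.2]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.03]
print(significant)
```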


2014 ◽  
Vol 13s5 ◽  
pp. CIN.S14069 ◽  
Author(s):  
Fan Zhang ◽  
Youping Deng ◽  
Mu Wang ◽  
Li Cui ◽  
Renee Drabier

Genes do not function alone but through complex biological pathways. Because breast cancer is not a single homogeneous disease, pathway-based biomarkers may be a reliable diagnostic tool for its early detection. We applied Integrated Pathway Analysis Database (IPAD) and Gene Set Enrichment Analysis (GSEA) approaches to the problem of pathway-based biomarker discovery in breast cancer proteomics. Our strategy for identifying and analyzing pathway-based biomarkers is threefold. First, we performed pathway analysis with IPAD to build the gene set database. Second, we ran GSEA to identify 16 pathway-based biomarkers. Last, we built a Support Vector Machine model with a three-way data split and fivefold cross-validation to validate the biomarkers. This approach, which unravels the intricate pathways, networks, and functional contexts in which genes or proteins function, is essential to understanding the molecular mechanisms of pathway-based biomarkers in breast cancer.
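The validation step combines a three-way data split with fivefold cross-validation. The Python sketch below shows one way to generate such index splits, assuming a 20% held-out test set and round-robin fold assignment after a seeded shuffle. These proportions and the function name are illustrative assumptions, and the SVM training itself is left out.

```python
import random


def three_way_split_with_cv(n_samples, test_fraction=0.2, n_folds=5, seed=0):
    """Hold out a test set, then build k-fold train/validation splits on the rest.

    Returns (test_indices, folds), where folds is a list of
    (train_indices, validation_indices) tuples over the non-test samples.
    """
    indices = list(range(n_samples))
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    test, dev = indices[:n_test], indices[n_test:]
    folds = []
    for k in range(n_folds):
        # Every n_folds-th shuffled development sample, starting at offset k.
        validation = dev[k::n_folds]
        held_out = set(validation)
        train = [i for i in dev if i not in held_out]
        folds.append((train, validation))
    return test, folds


test_idx, folds = three_way_split_with_cv(50)
for train, validation in folds:
    pass  # hypothetical: fit the SVM on `train`, tune on `validation`
# After model selection, the model would be evaluated once on `test_idx`.
```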


2021 ◽  
Vol 27 ◽  
Author(s):  
Aoshuang Qi ◽  
Mingyi Ju ◽  
Yinfeng Liu ◽  
Jia Bi ◽  
Qian Wei ◽  
...  

Background: Complex antigen processing and presentation processes are involved in the development and progression of breast cancer (BC). A single biomarker is unlikely to adequately reflect the complex interplay between immune cells and cancer; however, there have been few attempts to find a robust antigen processing and presentation-related signature to predict the survival outcome of BC patients with respect to tumor immunology. Therefore, we aimed to develop an accurate gene signature based on immune-related genes for prognosis prediction in BC. Methods: Information on BC patients was obtained from The Cancer Genome Atlas. Gene set enrichment analysis was used to confirm the antigen processing and presentation gene set that contributed to BC. Cox proportional hazards regression, multivariate Cox regression, and stratified analysis were used to evaluate the prognostic power of the gene signature. Differentially expressed mRNAs between the high- and low-risk groups were analyzed using KEGG pathway analysis. Results: A three-gene signature comprising HSPA5 (heat shock protein family A member 5), PSME2 (proteasome activator subunit 2), and HLA-F (major histocompatibility complex, class I, F) was significantly associated with overall survival (OS). HSPA5 and PSME2 were protective (hazard ratio (HR) < 1), and HLA-F was a risk factor (HR > 1). Risk score, estrogen receptor (ER), progesterone receptor (PR), and PD-L1 were independent prognostic indicators. KIT and ACACB may have important roles in the mechanism by which the gene signature regulates the prognosis of BC. Conclusion: The proposed three-gene signature is a promising biomarker for estimating survival outcomes in BC patients.
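A common way to turn such a signature into a risk score is the Cox linear predictor, risk = Σ β_g · x_g, with patients split at the median into high- and low-risk groups. This construction is an assumption; the abstract gives neither the formula nor the fitted coefficients. In the sketch, the coefficients are hypothetical and were chosen only so that their signs match the reported hazard ratios. Because HR = exp(β), HSPA5 and PSME2 (HR < 1) get β < 0, and HLA-F (HR > 1) gets β > 0. Fitting the Cox model itself is not shown, and the expression values are made up.

```python
import math
from statistics import median

# Hypothetical Cox coefficients (beta = ln HR); only their signs follow the abstract.
COEFFICIENTS = {"HSPA5": -0.30, "PSME2": -0.25, "HLA-F": 0.40}


def risk_score(expression):
    """Cox linear predictor: sum over signature genes of beta_g * expression_g."""
    return sum(beta * expression[gene] for gene, beta in COEFFICIENTS.items())


def stratify(patients):
    """Label patients 'high' or 'low' risk relative to the median risk score."""
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


hazard_ratios = {gene: math.exp(beta) for gene, beta in COEFFICIENTS.items()}

# Hypothetical expression values for four patients.
patients = {
    "P1": {"HSPA5": 2.0, "PSME2": 1.0, "HLA-F": 0.5},
    "P2": {"HSPA5": 0.5, "PSME2": 0.5, "HLA-F": 2.0},
    "P3": {"HSPA5": 1.0, "PSME2": 2.0, "HLA-F": 1.0},
    "P4": {"HSPA5": 0.2, "PSME2": 0.3, "HLA-F": 1.5},
}
groups = stratify(patients)
print(groups)
```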


2020 ◽  
Author(s):  
Yang Liu ◽  
Qian Du ◽  
Dan Sun ◽  
Ruiying Han ◽  
Mengmeng Teng ◽  
...  

Abstract Background: SQSTM1 (Sequestosome 1, p62) is degraded by activated autophagy and is involved in the progression of various types of cancer. However, the prognostic role and underlying regulatory mechanism of SQSTM1 in the development and progression of breast cancer remain unclear. Methods: In this study, 1336 samples with available mRNA data from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) database and 27 formalin-fixed, paraffin-embedded (FFPE) tissue samples from the First Affiliated Hospital of Xi’an Jiaotong University were collected to evaluate SQSTM1 expression at the mRNA and protein levels. Kaplan–Meier and Cox regression analyses were used to assess its prognostic value in three independent breast cancer datasets. The Tumor Immune Estimation Resource (TIMER) database and Gene Set Variation Analysis (GSVA) were used to explore the relationship between SQSTM1 mRNA expression and the level of immune infiltration in breast cancer. Mechanisms of SQSTM1 dysregulation, including copy number variation (CNV), somatic mutation, epigenetic alterations, and other transcriptional and post-transcriptional regulation, were also explored using multiple datasets. Finally, Gene Set Enrichment Analysis (GSEA) was performed to elucidate the functional role of SQSTM1 in breast cancer. Results: SQSTM1 mRNA and protein levels were significantly elevated in breast cancer, and receiver operating characteristic (ROC) curve analysis showed that p62 may act as a diagnostic biomarker. Lower expression of SQSTM1 predicted a better outcome across multiple datasets. SQSTM1 was also correlated with immune infiltrates in breast cancer. Moreover, CNV and methylation of SQSTM1 DNA were correlated with SQSTM1 dysregulation and acted as prognostic factors for breast cancer patients. However, the somatic mutation status of SQSTM1 showed no prognostic relevance. We also identified diverse transcription factors that bind directly to SQSTM1 DNA and miRNAs that may regulate SQSTM1 mRNA. Finally, functional enrichment analysis revealed that SQSTM1 is related to cell signal transduction, oxidative stress, and autophagy in breast cancer. Conclusion: Our findings revealed that SQSTM1 plays a key role in the progression of breast cancer and might be a promising biomarker for the diagnosis and personalized treatment of breast cancer patients.
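The prognostic analysis relies on Kaplan–Meier estimates. Below is a minimal product-limit estimator in Python, written as a hedged sketch rather than the authors' code. The follow-up times and event flags are hypothetical. Comparing SQSTM1-high and SQSTM1-low groups would mean running it on each group separately; the log-rank test and Cox regression are not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times: follow-up times; events: 1 if the event (e.g. death) occurred, 0 if censored.
    Returns (time, survival probability) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every observation that shares this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored cases both leave the risk set
    return curve


# Hypothetical follow-up data (months) for one patient group.
times = [2, 3, 3, 5, 8, 10]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
```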


Cells ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 622 ◽  
Author(s):  
Marianna Talia ◽  
Ernestina De Francesco ◽  
Damiano Rigiracciolo ◽  
Maria Muoio ◽  
Lucia Muglia ◽  
...  

The G protein-coupled estrogen receptor (GPER, formerly known as GPR30) is a seven-transmembrane receptor that mediates estrogen signals in both normal and malignant cells. In particular, GPER has been implicated in the activation of diverse signaling pathways that lead to transcriptional and biological responses characterizing the progression of breast cancer (BC). In this context, a correlation between GPER expression and worse clinicopathological features of BC has been suggested, although controversial data have also been reported. To better assess the biological significance of GPER in aggressive estrogen receptor (ER)-negative BC, we performed a bioinformatics analysis using data from The Invasive Breast Cancer Cohort of The Cancer Genome Atlas (TCGA) project and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) datasets. Gene expression correlation and statistical analyses were carried out with base R functions in RStudio and the tidyverse package. Pathway enrichment analysis was performed with the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database on the Database for Annotation, Visualization and Integrated Discovery (DAVID) website, whereas gene set enrichment analysis (GSEA) was performed with the R package phenoTest. Survival analysis was performed with the R package survivALL. Analyzing the expression data of more than 2500 primary BC samples, we ascertained that GPER levels are associated with pro-migratory and metastatic genes belonging to the cell adhesion molecule (CAM), extracellular matrix (ECM)-receptor interaction, and focal adhesion (FA) signaling pathways. Thereafter, evaluating the disease-free interval (DFI) in ER-negative BC patients, we found that subjects expressing high GPER levels exhibited a shorter DFI than those expressing low GPER levels.
Overall, our results may pave the way for further dissecting the network triggered by GPER in breast malignancies lacking ER, toward a better assessment of its prognostic significance and of its role in mediating the aggressive features of this BC subtype.
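The GPER study reports gene expression correlations computed in R. The abstract does not name the coefficient, so the Python sketch below assumes Pearson's r. It shows the computation and then ranks genes by their correlation with a reference (GPER) expression profile. All expression values and the GENE_* names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_by_correlation(reference, others):
    """Sort genes by Pearson correlation with the reference profile, highest first."""
    r = {gene: pearson(reference, profile) for gene, profile in others.items()}
    return sorted(r.items(), key=lambda kv: kv[1], reverse=True)


gper = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical GPER expression in five tumors
others = {
    "GENE_UP": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_DOWN": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_WEAK": [3.0, 1.0, 4.0, 1.0, 5.0],
}
ranking = rank_by_correlation(gper, others)
print(ranking)
```

Because expression data are often skewed, a rank-based (Spearman) coefficient is a reasonable alternative when the study's exact method is unknown.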


2020 ◽  
Author(s):  
Yang Liu ◽  
Qian Du ◽  
Dan Sun ◽  
Ruiying Han ◽  
Mengmeng Teng ◽  
...  

Abstract Background: SQSTM1 (Sequestosome 1, p62) is degraded by activated autophagy and is involved in the progression of various types of cancer. However, the prognostic role and underlying regulatory mechanism of SQSTM1 in the development and progression of breast cancer remain unclear. Methods: In this study, 1336 samples with available mRNA data from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) database and 27 formalin-fixed, paraffin-embedded (FFPE) tissue samples from the First Affiliated Hospital of Xi’an Jiaotong University were collected to evaluate SQSTM1 expression at the mRNA and protein levels. Kaplan–Meier and Cox regression analyses were used to assess its prognostic value in three independent breast cancer datasets. The Tumor Immune Estimation Resource (TIMER) database and Gene Set Variation Analysis (GSVA) were used to explore the relationship between SQSTM1 mRNA expression and the level of immune infiltration in breast cancer. Mechanisms of SQSTM1 dysregulation, including copy number variation (CNV), somatic mutation, epigenetic alterations, and other transcriptional and post-transcriptional regulation, were also explored using multiple datasets. Finally, Gene Set Enrichment Analysis (GSEA) was performed to elucidate the functional role of SQSTM1 in breast cancer. Results: SQSTM1 mRNA and protein levels were significantly elevated in breast cancer, and receiver operating characteristic (ROC) curve analysis showed that p62 may act as a diagnostic biomarker. Lower expression of SQSTM1 predicted a better outcome across multiple datasets. SQSTM1 was also correlated with immune infiltrates in breast cancer. Moreover, CNV and methylation of SQSTM1 DNA were correlated with SQSTM1 dysregulation and acted as prognostic factors for breast cancer patients. However, the somatic mutation status of SQSTM1 showed no prognostic relevance. We also identified diverse transcription factors that bind directly to SQSTM1 DNA and miRNAs that may regulate SQSTM1 mRNA. Finally, functional enrichment analysis revealed that SQSTM1 is related to cell signal transduction, oxidative stress, and autophagy in breast cancer. Conclusion: Our findings revealed that overexpression of SQSTM1 is significantly associated with poor survival and immune infiltration in breast cancer. In addition, SQSTM1 plays a key role in the progression of breast cancer and might be a promising biomarker for the diagnosis and personalized treatment of breast cancer patients.


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Fei Liu ◽  
Xiaopeng Yu ◽  
Guijin He

Background. We analyzed the N6-methyladenosine (m6A) modification patterns of immune cells infiltrating the tumor microenvironment of breast cancer (BC) to provide a new perspective for the early diagnosis and treatment of BC. Methods. Based on 23 m6A regulatory factors, we identified m6A-related gene characteristics and m6A modification patterns in BC through unsupervised cluster analysis. To examine differences in biological processes among the m6A modification patterns, we performed genomic variation analysis. We then quantified the relative infiltration levels of different immune cell subpopulations in the BC tumor microenvironment using the CIBERSORT algorithm and single-sample gene set enrichment analysis. Univariate Cox analysis was used to screen for prognosis-related m6A characteristic genes. Finally, we constructed an m6Ascore based on principal component analysis to evaluate the m6A modification pattern of individual BC patients. Results. We identified three distinct m6A modification patterns in 2128 BC samples. CIBERSORT and single-sample gene set enrichment analysis indicated a higher abundance of immune infiltration in m6Acluster C. The m6Ascore was then calculated from the m6A characteristic genes obtained through screening. Patients were divided into low- and high-m6Ascore groups, and those with low m6Ascores showed a significant survival benefit. Additionally, the high-m6Ascore group had a higher mutation frequency and was associated with low PD-L1 expression, and the m6Ascore was positively correlated with tumor mutation burden. Moreover, treatment effects were better in patients in the high-m6Ascore group. Conclusions. For individual patients with BC, the m6Ascore could be used to evaluate the immune cell infiltration characteristics of the tumor microenvironment and the m6A methylation modification pattern.
Our results provide a foundation for improving personalized immunotherapy of BC.
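The m6Ascore is described only as being based on principal component analysis. For illustration, assume the score is each sample's projection onto the first principal component of the m6A characteristic-gene expression matrix. The sketch below estimates that component by power iteration in plain Python and then splits samples into high and low groups at the median. Several caveats apply. The study's exact formula is not given in the abstract; for example, it may combine further components. The sign of a principal component is arbitrary. The expression values are hypothetical.

```python
import math
from statistics import median


def pc1_scores(matrix, n_iter=500):
    """Project each sample onto the first principal component (PC1).

    matrix: list of samples, each a list of expression values for the same genes.
    PC1 is estimated by power iteration on the gene-by-gene covariance matrix.
    """
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # A non-uniform start vector makes accidental orthogonality to PC1 unlikely.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance at all: every score is zero
            return [0.0] * n
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


# Hypothetical expression of two m6A characteristic genes in four samples.
expression = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
scores = pc1_scores(expression)
cutoff = median(scores)
groups = ["high" if s > cutoff else "low" for s in scores]
print(groups)
```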


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Jingbo Sun ◽  
Jingzhan Huang ◽  
Jin Lan ◽  
Kun Zhou ◽  
Yuan Gao ◽  
...  

Abstract Background: Centromere Protein F (CENPF) associates with the centromere–kinetochore complex and influences cell proliferation and metastasis in several cancers. The role of CENPF in breast cancer (BC) bone metastasis remains unclear. Methods: Using the ONCOMINE database, we compared the expression of CENPF in breast cancer and normal tissues. Findings were confirmed in 60 BC patients through immunohistochemical (IHC) staining. Microarray data from GEO and Kaplan–Meier plots were used to analyze overall survival (OS) and relapse-free survival (RFS). Using the GEO database, we compared the expression of CENPF in primary lesions, lung metastasis lesions, and bone metastasis lesions, and validated our findings in BALB/c mouse 4T1 BC models. Based on gene set enrichment analysis (GSEA) and western blotting, we predicted the mechanisms by which CENPF regulates BC bone metastasis. Results: ONCOMINE data and IHC staining showed higher CENPF expression in BC tissue than in normal tissue. Kaplan–Meier plots also revealed that high CENPF mRNA expression correlated with poor survival and shorter relapse-free survival (RFS). In the BALB/c mouse 4T1 BC models and the GEO data, CENPF was overexpressed in primary lesions, other target organs, and bone metastases. Based on GSEA and western blotting, we predicted that CENPF regulates the secretion of parathyroid hormone-related peptide (PTHrP) by activating PI3K–AKT–mTORC1 signaling. Conclusion: CENPF promotes BC bone metastasis by activating PI3K–AKT–mTORC1 signaling and represents a novel therapeutic target for BC treatment.
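Like several other abstracts in this listing, this study uses gene set enrichment analysis (GSEA). The Python sketch below shows the running-sum enrichment score at the core of GSEA. Walking down a ranked gene list, the sum rises at gene-set members (weighted by |metric|^p) and falls at non-members, and the score is the largest deviation from zero. With all weights equal, it reduces to an unweighted Kolmogorov-Smirnov-like statistic. The gene names and weights are hypothetical. The permutation-based significance testing and normalization that full GSEA performs are omitted.

```python
def enrichment_score(ranked_genes, gene_set, weights=None, p=1.0):
    """GSEA-style running-sum enrichment score.

    ranked_genes: genes ordered from most up- to most down-regulated.
    weights: ranking metric per gene, in the same order (None = all equal).
    Returns the signed maximum deviation of the running sum from zero.
    """
    if weights is None:
        weights = [1.0] * len(ranked_genes)
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list but not cover all of it")
    hit_norm = sum(abs(w) ** p for w, h in zip(weights, hits) if h)
    if hit_norm == 0:
        raise ValueError("gene-set members all have zero weight")
    running, best = 0.0, 0.0
    for w, h in zip(weights, hits):
        if h:
            running += abs(w) ** p / hit_norm  # step up at a member
        else:
            running -= 1.0 / n_miss  # step down at a non-member
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]  # hypothetical ranking
es_top = enrichment_score(ranked, {"G1", "G2"})  # set at the top of the list
es_bottom = enrichment_score(ranked, {"G5", "G6"})  # set at the bottom
es_weighted = enrichment_score(ranked, {"G1", "G3"}, weights=[3, 1, 0.5, -0.5, -1, -3])
print(es_top, es_bottom, es_weighted)
```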

