A new feature selection approach for optimizing prediction models, applied to breast cancer subtype classification

Author(s):  
Pham Quang Huy ◽  
Alioune Ngom ◽  
Luis Rueda
2020 ◽  
Vol 21 (21) ◽  
pp. 7891
Author(s):  
Chi-Wei Chen ◽  
Lan-Ying Huang ◽  
Chia-Feng Liao ◽  
Kai-Po Chang ◽  
Yen-Wei Chu

Protein phosphorylation is one of the most important post-translational modifications. Many biological processes depend on it, including DNA repair, transcriptional regulation, and signal transduction, so abnormal regulation of phosphorylation often causes disease. Accurate prediction of human phosphorylation sites could therefore help in addressing human diseases. We developed a kinase-specific phosphorylation prediction system, GasPhos, and proposed a new feature selection approach, called Gas, based on the ant colony system and a genetic algorithm. Performance evaluation strategies focused on different kinases were used to choose the best learning model. Gas uses the mean decrease Gini index (MDGI) as a heuristic value for path selection and adopts binary transformation strategies and new state transition rules. GasPhos can predict phosphorylation sites for six kinases and showed better performance than other phosphorylation prediction tools. The disease-related phosphorylated proteins predicted with GasPhos are also discussed. Finally, Gas can be applied to other problems that require feature selection, where it could help to improve prediction performance.
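To illustrate how a Gini-based heuristic can steer path selection in an ant-colony-style search, here is a minimal sketch. It is not the Gas algorithm itself. The abstract does not give the state transition rules, so the classic ant colony proportional rule is used as a stand-in, and a single-split Gini decrease stands in for the MDGI, which a random forest would average over many splits and trees. The feature names, data, threshold, and the `alpha` and `beta` parameters are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting one feature at a threshold.

    Stand-in for MDGI: a random forest would average this quantity
    over every split that uses the feature, across all trees.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def selection_probabilities(pheromone, heuristic, alpha=1.0, beta=2.0):
    """Classic ant colony proportional rule (an assumption, not the Gas rule)."""
    weights = [(t ** alpha) * (h ** beta) for t, h in zip(pheromone, heuristic)]
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# Hypothetical toy data: six sites, two candidate features.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "informative": [0.9, 0.8, 0.7, 0.2, 0.1, 0.3],
    "noisy": [0.1, 0.9, 0.2, 0.8, 0.3, 0.7],
}

heuristic = [gini_decrease(v, labels, 0.5) for v in features.values()]
probs = selection_probabilities([1.0, 1.0], heuristic)

for name, h, p in zip(features, heuristic, probs):
    print(f"{name}: heuristic={h:.3f}, selection probability={p:.3f}")
```

With equal pheromone on both features, the feature that separates the classes cleanly receives almost all of the selection probability, which is the role a heuristic value plays in this kind of search.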


2016 ◽  
Vol 14 (05) ◽  
pp. 1644002
Author(s):  
Jinwoo Park ◽  
Benjamin Hur ◽  
Sungmin Rhee ◽  
Sangsoo Lim ◽  
Min-Su Kim ◽  
...  

A breast cancer subtype classification scheme, PAM50, based on genetic information is widely accepted for clinical applications. On the other hand, experimental cancer biology studies have been successful in revealing the mechanisms of breast cancer, and the hallmarks of cancer have now been determined to explain the core mechanisms of tumorigenesis. Thus, it is important to understand how the breast cancer subtypes are related to the cancer core mechanisms, but studies have yet to address the hallmarks of breast cancer subtypes. Therefore, a new approach that can explain the differences among breast cancer subtypes in terms of cancer hallmarks is needed. We developed an information-theoretic sub-network mining algorithm, differentially expressed sub-network and pathway analysis (DeSPA), that retrieves tumor-related genes by mining a gene regulatory network (GRN) of transcription factors (TFs) and miRNAs. With extensive experiments on The Cancer Genome Atlas (TCGA) breast cancer sequencing data, we showed that our approach was able to select genes that belong to cancer core pathways such as the DNA replication, cell cycle, and p53 pathways while keeping the accuracy of breast cancer subtype classification comparable to that of PAM50. In addition, our method produces a regulatory network of TFs, miRNAs, and their target genes that distinguishes breast cancer subtypes, which is confirmed by experimental studies in the literature.


Author(s):  
Ahmed Abdullah Farid ◽  
Gamal Selim ◽  
Hatem Khater

Breast cancer is a significant health issue across the world and the most widely diagnosed cancer in women; early-stage diagnosis and therapy improve patient safety. This paper proposes a hybrid feature selection model based on an optimized genetic algorithm (CHFS-BOGA) to predict breast cancer. This hybrid feature selection approach combines the advantages of three filter feature selection approaches with an optimized Genetic Algorithm (OGA) to select the best features and so improve the performance and scalability of the classification process. We propose the OGA by improving the initial population generation and the genetic operators, using the results of the filter approaches as prior information and using the C4.5 decision tree classifier as a fitness function instead of probability and random selection. The authors collected the available updated Wisconsin breast cancer data from the UCI Machine Learning Repository, with a total of 569 rows and 32 columns. The dataset was evaluated using the Explorer interface of the open-source Weka data mining software. The results show that the proposed hybrid feature selection approach significantly outperforms the single filter approaches and principal component analysis (PCA) for optimum feature selection. These characteristics are good indicators for the prediction. The highest accuracy achieved with the proposed system before applying CHFS-BOGA, using the support vector machine (SVM) classifier, was 97.3%. The highest accuracy after applying it (CHFS-BOGA-SVM) was 98.25% on a split of 70.0% for training with the remainder for testing, and 100% on the full training set. Moreover, the area under the receiver operating characteristic (ROC) curve was equal to 1.0. The results showed that the proposed CHFS-BOGA-SVM system was able to accurately classify the type of breast tumor, whether malignant or benign.
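The idea of seeding a genetic algorithm with filter results can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' OGA. The three filter masks are hypothetical, the genetic operators are generic (truncation selection, one-point crossover, bit-flip mutation), and `toy_fitness` is a made-up scoring function standing in for the C4.5 classifier accuracy that the abstract uses as the fitness function.

```python
import random

N_FEATURES = 8
RELEVANT = {0, 2, 5}  # hypothetical "truly useful" feature indices


def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward relevant features, penalize size."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in RELEVANT)
    return hits - 0.1 * sum(mask)


def seeded_population(filter_masks, size, rng, flip_prob=0.1):
    """Start from the filter results, then fill up with perturbed copies of them."""
    population = [list(mask) for mask in filter_masks][:size]
    while len(population) < size:
        base = rng.choice(filter_masks)
        population.append([b ^ 1 if rng.random() < flip_prob else b for b in base])
    return population


def evolve(population, fitness, rng, generations=30, mutation_rate=0.05):
    """Generic GA loop: keep the best half, refill with mutated offspring."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: len(ranked) // 2]
        children = []
        while len(elite) + len(children) < len(population):
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, len(a))  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ 1 if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = elite + children
    return max(population, key=fitness)


# Hypothetical outputs of three filter methods (1 = feature selected).
filter_masks = [
    [1, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1, 0, 0],
]

rng = random.Random(0)
population = seeded_population(filter_masks, size=10, rng=rng)
best = evolve(population, toy_fitness, rng)
print("best mask:", best, "fitness:", round(toy_fitness(best), 2))
```

Because the best half of each generation is carried over unchanged, the final mask can never score worse than the best filter mask it was seeded with. In a real pipeline, the fitness call would train and evaluate a classifier on the selected columns.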


2020 ◽  
Vol 47 (9) ◽  
pp. 835-841
Author(s):  
Joungmin Choi ◽  
Jiyoung Lee ◽  
Jieun Kim ◽  
Jihyun Kim ◽  
Heejoon Chae

2019 ◽  
Author(s):  
Sara Ravaioli ◽  
Francesca Pirini ◽  
Andrea Rocca ◽  
Maurizio Puccetti ◽  
Massimiliano Bonafè ◽  
...  

Author(s):  
Andrea Rocca ◽  
Sara Ravaioli ◽  
Eugenio Fonzi ◽  
Iros Barozzi ◽  
Ylenia Perone ◽  
...  

2009 ◽  
Vol 29 (7) ◽  
pp. 1755-1757
Author(s):  
Zhong-yang XIONG ◽  
Jian JIANG ◽  
Yu-fang ZHANG
