Prediction of human disease-associated phosphorylation sites with combined feature selection approach and support vector machine

Protein phosphorylation is one of the most important post-translational modifications. Many biological processes, such as DNA repair, transcriptional regulation and signal transduction, depend on phosphorylation, so abnormal regulation of phosphorylation often causes disease. Accurate prediction of human phosphorylation sites could therefore help in understanding and treating human diseases. We developed a kinase-specific phosphorylation prediction system, GasPhos, and proposed a new feature selection approach, called Gas, based on the ant colony system and a genetic algorithm. We also used performance evaluation strategies tailored to different kinases to choose the best learning model. Gas uses the mean decrease Gini index (MDGI) as the heuristic value for path selection and adopts binary transformation strategies and new state transition rules. GasPhos can predict phosphorylation sites for six kinases and outperformed other phosphorylation prediction tools. The disease-related phosphorylated proteins predicted with GasPhos are also discussed. Finally, Gas can be applied to other problems that require feature selection, where it could help improve prediction performance.
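The abstract does not spell out Gas's exact state transition rule, so the sketch below assumes the classic ant colony system rule: with probability `q0` an ant exploits (takes the feature with the highest pheromone × heuristic^β), otherwise it explores by roulette-wheel selection. MDGI scores serve as the heuristic values. The function name `acs_select_features` and the parameters `q0` and `beta` are hypothetical, and pheromone updates and the genetic-algorithm stage are omitted.

```python
import random


def acs_select_features(mdgi, pheromone, n_select, q0=0.9, beta=2.0, rng=None):
    """Build one ant's feature subset with an ACS-style state transition rule.

    Hypothetical sketch, not the Gas implementation.
    mdgi:      per-feature heuristic values (e.g. mean decrease Gini index scores)
    pheromone: per-feature pheromone levels
    n_select:  number of features the ant picks
    Returns a binary mask (1 = feature selected).
    """
    rng = rng or random.Random()
    n = len(mdgi)
    mask = [0] * n
    candidates = list(range(n))
    for _ in range(min(n_select, n)):
        # Attractiveness = pheromone * heuristic**beta; the tiny constant keeps
        # roulette weights positive even if every MDGI score is zero.
        scores = [pheromone[j] * (mdgi[j] ** beta) + 1e-12 for j in candidates]
        if rng.random() < q0:
            # Exploitation: take the most attractive remaining feature.
            pick = candidates[scores.index(max(scores))]
        else:
            # Biased exploration: roulette-wheel choice proportional to score.
            pick = rng.choices(candidates, weights=scores, k=1)[0]
        mask[pick] = 1
        candidates.remove(pick)
    return mask
```

With `q0=1.0` the ant is purely greedy and simply picks the highest-MDGI features. Lowering `q0` trades that greediness for exploration, which is the usual reason ACS keeps both branches.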

SVR-FFS: A novel forward feature selection approach for high-frequency time series forecasting using support vector regression

Expert Systems with Applications ◽

10.1016/j.eswa.2020.113729 ◽

2020 ◽

Vol 160 ◽

pp. 113729 ◽

Cited By ~ 2

Author(s):

José Manuel Valente ◽

Sebastián Maldonado

Keyword(s):

Time Series ◽

Feature Selection ◽

Support Vector Regression ◽

High Frequency ◽

Time Series Forecasting ◽

Support Vector ◽

Selection Approach ◽

Feature Selection Approach

Sparse Least Squares Support Vector Machines Based on Genetic Algorithms: A Feature Selection Approach

Advances in Computational Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-030-20518-8_42 ◽

2019 ◽

pp. 500-511

Author(s):

Pedro Hericson Machado Araújo ◽

Ajalmar R. Rocha Neto

Keyword(s):

Genetic Algorithms ◽

Feature Selection ◽

Support Vector Machines ◽

Least Squares ◽

Support Vector ◽

Vector Machines ◽

Selection Approach ◽

Feature Selection Approach

A Composite Hybrid Feature Selection Learning-Based Optimization of Genetic Algorithm For Breast Cancer Detection

10.20944/preprints202003.0298.v1 ◽

2020 ◽

Author(s):

Ahmed Abdullah Farid ◽

Gamal Selim ◽

Hatem Khater

Keyword(s):

Breast Cancer ◽

Genetic Algorithm ◽

Feature Selection ◽

Early Stage ◽

Fitness Function ◽

Support Vector ◽

Initial Population ◽

Tree Classifier ◽

Selection Approach ◽

Feature Selection Approach

Breast cancer is a significant health issue worldwide. It is the most commonly diagnosed cancer in women, and early-stage diagnosis and treatment improve patient safety. This paper proposes a composite hybrid feature selection method based on an optimized genetic algorithm (CHFS-BOGA) to predict breast cancer. The hybrid approach combines the advantages of three filter feature selection methods with an optimized genetic algorithm (OGA) to select the best features, improving the performance and scalability of the classification process. The OGA improves initial population generation and the genetic operators by using the results of the filter approaches as prior information, and it uses the C4.5 decision tree classifier as the fitness function instead of probabilistic random selection. The authors used the updated Wisconsin breast cancer dataset from the UCI Machine Learning Repository, which has 569 rows and 32 columns, and analyzed it with the Explorer interface of the open-source Weka data mining software. The results show that the proposed hybrid feature selection approach significantly outperforms the individual filter approaches and principal component analysis (PCA) for optimal feature selection. These characteristics are good indicators for the return prediction. Before applying CHFS-BOGA, the highest accuracy achieved with support vector machine (SVM) classifiers was 97.3%. After applying it (CHFS-BOGA-SVM), the highest accuracy was 98.25% with a 70%/30% train/test split and 100% when evaluated on the full training set. Moreover, the area under the receiver operating characteristic (ROC) curve was 1.0. These results show that the proposed CHFS-BOGA-SVM system accurately classified breast tumors as malignant or benign.
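The abstract describes the optimized GA only at a high level. The sketch below assumes two things: the filter stage yields one nonnegative relevance score per feature, which is used to bias the initial population, and fitness is any callable that scores a 0/1 feature mask. In CHFS-BOGA that role is played by a C4.5 decision tree, which is not reimplemented here. Truncation selection, one-point crossover, and bit-flip mutation are generic choices, not the paper's operators, and the function names are hypothetical.

```python
import random


def seeded_population(filter_scores, pop_size, rng):
    """Build an initial population biased by filter scores.

    Each feature is switched on with probability proportional to its
    (nonnegative) filter score, so strongly ranked features start out common.
    """
    top = max(filter_scores) or 1.0  # avoid division by zero if all scores are 0
    probs = [s / top for s in filter_scores]
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]


def ga_feature_selection(filter_scores, fitness, pop_size=20, generations=30,
                         mutation_rate=0.05, seed=0):
    """Evolve 0/1 feature masks; `fitness` maps a mask to a score to maximize."""
    if pop_size < 4:
        raise ValueError("pop_size must be at least 4")
    rng = random.Random(seed)
    n = len(filter_scores)
    population = seeded_population(filter_scores, pop_size, rng)
    best = max(population, key=fitness)
    for _ in range(generations):
        # Truncation selection: keep the fitter half as parents.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            # Bit-flip mutation toggles each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate  # elitist record of the best mask seen so far
    return best
```

In practice `fitness` would train and score a classifier (for example a decision tree) on the columns selected by the mask. Seeding the population from filter scores mainly shortens the search, because good masks are likely to appear in the first generation.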

Penguin Search Optimization Based Feature Selection for Automated Opinion Mining

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2629.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 648-653

Keyword(s):

Feature Selection ◽

Language Processing ◽

Opinion Mining ◽

Sentiment Classification ◽

Support Vector ◽

Svm Classifier ◽

Np Hard ◽

Search Optimization ◽

Selection Approach ◽

Feature Selection Approach

Twitter sentiment analysis is a vital tool for determining public opinion about products, services, events, or personalities. Analyzing medical tweets on a specific topic can provide immense benefits to the medical industry. However, medical tweets require an efficient feature selection approach to produce accurate results. The penguin search optimization algorithm (PeSOA) is a metaheuristic that can be applied to NP-hard problems. This paper develops an automated opinion mining framework by modeling feature selection as an NP-hard optimization problem and solving it with a PeSOA-based feature selection approach. First, medical tweets matching cancer and drug keywords are extracted and pre-processed to retain the relevant, informative tweets. Features are then extracted using natural language processing (NLP) techniques, and the optimal features are selected with PeSOA; the selected features are fed to three baseline classifiers for sentiment classification. Experimental results from MATLAB simulations on cancer and drug tweets using k-nearest neighbor (KNN), naïve Bayes (NB), and support vector machine (SVM) classifiers indicate that PeSOA-based feature selection significantly improves classification performance. PeSOA feature selection combined with the SVM classifier gives better sentiment classification than with the other classifiers.
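PeSOA's position-update rules are not described in the abstract, so the sketch below does not implement PeSOA. It shows only the wrapper-style objective that a metaheuristic of this kind typically evaluates for each candidate feature mask. The objective is classification accuracy on the selected features, here a leave-one-out 1-nearest-neighbour classifier, chosen because KNN is one of the paper's three classifiers and needs no library. That accuracy is optionally traded off against subset size. The `alpha` weighting is a common convention in metaheuristic feature selection and an assumption here, and both function names are hypothetical.

```python
import math


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of 1-NN using only features where mask[j] == 1."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = math.inf, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query sample out
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)  # squared Euclidean distance
            if d < best_dist:
                best_dist, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def subset_fitness(X, y, mask, alpha=0.99):
    """Hypothetical wrapper objective: mostly accuracy, slightly favouring smaller subsets."""
    n_selected = sum(mask)
    return alpha * loo_1nn_accuracy(X, y, mask) + (1 - alpha) * (1 - n_selected / len(mask))
```

A search algorithm such as PeSOA would propose masks and keep those with higher `subset_fitness`. Swapping the 1-NN scorer for naïve Bayes or an SVM changes only the accuracy function, not the search.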
