Using a Feature Subset Selection method and Support Vector Machine to address curse of dimensionality and redundancy in Hyperion hyperspectral data classification

The blood-brain barrier (BBB) is a highly complex physical barrier that determines which substances are allowed to enter the brain. The support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies. For a successful SVM model, the kernel parameters and the feature subset selection are the most important factors affecting prediction accuracy. In most studies they are treated as two independent problems, but they have been shown to affect each other. We designed and implemented a genetic algorithm (GA) to optimize the kernel parameters and the feature subset selection for SVM regression, and applied it to the prediction of BBB penetration. The results show that our GA/SVM model is more accurate than other currently available logBB models. Therefore, optimizing both the SVM parameters and the feature subset simultaneously with a genetic algorithm is a better approach than methods that treat the two problems separately. Analysis of our logBB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among the properties relevant to BBB penetration, lipophilicity could enhance penetration, while all the others are negatively correlated with it.
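The joint optimization described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes an RBF-style parameterization with two real-valued genes (log2 of `C` and log2 of `gamma`) followed by one bit per feature; the abstract does not name the kernel or the exact parameters, so these are assumptions. The names `decode`, `run_ga`, and `toy_fitness` are hypothetical, and `toy_fitness` is only a stand-in for what would in practice be a cross-validated SVM regression score.

```python
import math
import random

# Assumed search bounds for the two real-valued genes (log2 C, log2 gamma).
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]


def decode(chrom):
    """Split one chromosome into (C, gamma, feature mask)."""
    log2_c, log2_gamma, *bits = chrom
    return 2.0 ** log2_c, 2.0 ** log2_gamma, [b == 1 for b in bits]


def random_chrom(n_features, rng):
    reals = [rng.uniform(lo, hi) for lo, hi in BOUNDS]
    return reals + [rng.randint(0, 1) for _ in range(n_features)]


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(chrom, rng, rate=0.1):
    out = list(chrom)
    for i in range(len(out)):
        if rng.random() < rate:
            if i < len(BOUNDS):
                lo, hi = BOUNDS[i]
                # Gaussian step on a parameter gene, clamped to its bounds.
                out[i] = min(hi, max(lo, out[i] + rng.gauss(0.0, 1.0)))
            else:
                out[i] = 1 - out[i]  # flip one feature-mask bit
    return out


def run_ga(fitness, n_features, pop_size=30, generations=40, seed=0):
    """Maximize fitness(C, gamma, mask) over parameters and mask together."""
    rng = random.Random(seed)

    def score(chrom):
        return fitness(*decode(chrom))

    pop = [random_chrom(n_features, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=score)
        history.append(score(elite))
        nxt = [elite]  # elitism: the best individual survives unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            nxt.append(mutate(crossover(p1, p2, rng), rng))
        pop = nxt
    best = max(pop, key=score)
    history.append(score(best))
    return decode(best), history


def toy_fitness(c, gamma, mask):
    """Hypothetical stand-in for a cross-validated SVR score."""
    target = [True, True, False, False, True, False]
    hits = sum(m == t for m, t in zip(mask, target))
    return hits - 0.1 * (math.log2(c) - 3) ** 2 - 0.1 * (math.log2(gamma) + 5) ** 2


(best_c, best_gamma, best_mask), history = run_ga(toy_fitness, n_features=6)
```

Because both the parameters and the mask live in one chromosome, each fitness evaluation scores a parameter setting on the exact feature subset it will be used with, which is the interaction the abstract says is lost when the two problems are solved separately.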


Efficacy of Interferon Treatment for Chronic Hepatitis C Predicted by Feature Subset Selection and Support Vector Machine

Journal of Medical Systems ◽ 10.1007/s10916-006-9046-8 ◽ 2007 ◽ Vol 31 (2) ◽ pp. 117-123

Cited By ~ 3

Author(s): Jun Yang ◽ Anto Satriyo Nugroho ◽ Kazunobu Yamauchi ◽ Kentaro Yoshioka ◽ Jiang Zheng ◽ ...

Keyword(s): Support Vector Machine ◽ Chronic Hepatitis ◽ Hepatitis C ◽ Chronic Hepatitis C ◽ Subset Selection ◽ Feature Subset Selection ◽ Support Vector ◽ Feature Subset ◽ Interferon Treatment


Mixed-variable ant colony optimisation algorithm for feature subset selection and tuning support vector machine parameter

International Journal of Bio-Inspired Computation ◽ 10.1504/ijbic.2017.081842 ◽ 2017 ◽ Vol 9 (1) ◽ pp. 53

Cited By ~ 3

Author(s): Hiba Basim Alwan ◽ Ku Ruhana Ku Mahamud

Keyword(s): Support Vector Machine ◽ Subset Selection ◽ Feature Subset Selection ◽ Ant Colony ◽ Support Vector ◽ Feature Subset ◽ Ant Colony Optimisation ◽ Machine Parameter ◽ Support Vector Machine Parameter ◽ Mixed Variable


A Wrapper Feature Subset Selection Method Based on Randomized Search and Multilayer Structure

BioMed Research International ◽ 10.1155/2019/9864213 ◽ 2019 ◽ Vol 2019 ◽ pp. 1-9

Cited By ~ 1

Author(s): Yifei Mao ◽ Yuansheng Yang

Keyword(s): Multilayer Structure ◽ Subset Selection ◽ Selection Method ◽ Feature Subset Selection ◽ Biomedical Science ◽ Support Vector ◽ Current Layer ◽ Feature Subset ◽ Feature Weights ◽ Randomized Search

The identification of discriminative features from information-rich data for clinical diagnosis is crucial in the field of biomedical science. In this context, many machine-learning techniques have been widely applied and have achieved remarkable results. However, disease, especially cancer, is often caused by a group of features with complex interactions. Unlike traditional feature selection methods, which focus only on finding single discriminative features, the multilayer feature subset selection method (MLFSSM) proposed herein employs randomized search and a multilayer structure to select a discriminative subset. In each layer of this method, many feature subsets are generated to ensure the diversity of the combinations, and the weights of the features are evaluated from the performance of the subsets. The weight of a feature increases if, compared with other features on the current layer, the feature is selected into more subsets with better performance. In this manner, the feature weights are revised layer by layer, their precision is steadily improved, and better subsets are repeatedly constructed from the features with higher weights. Finally, the topmost feature subset of the last layer is returned. Experimental results on five public gene datasets showed that the subsets selected by MLFSSM were more discriminative than those produced by traditional feature selection methods, including LVW (a feature subset method that uses the Las Vegas method as its randomized search strategy), GAANN (a feature subset selection method based on a genetic algorithm (GA)), and support vector machine recursive feature elimination (SVM-RFE). Furthermore, MLFSSM showed higher classification performance than some state-of-the-art methods that select feature pairs or groups, including top scoring pair (TSP), k-top scoring pairs (K-TSP), and relative simplicity-based direct classifier (RS-DC).
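The layer-by-layer weighting described above might look roughly like the following sketch. It is a simplified reading of the abstract, not the published MLFSSM algorithm: the weighted sampling scheme, the top-fraction cutoff `top_frac`, and the additive weight update are all assumptions, as are the names `weighted_sample`, `multilayer_select`, and `toy_score`. In practice `score` would be a classifier's validation performance on the given subset; `toy_score` is a hypothetical stand-in.

```python
import random


def weighted_sample(weights, k, rng):
    """Draw k distinct feature indices, favouring larger weights.

    Uses random keys u ** (1 / w) and keeps the k largest, a standard
    way to sample without replacement in proportion to weight.
    """
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    return sorted(i for _, i in sorted(keys, reverse=True)[:k])


def multilayer_select(score, n_features, k, n_layers=5, n_subsets=200,
                      top_frac=0.2, seed=0):
    """Return (best subset of the last layer, final feature weights)."""
    rng = random.Random(seed)
    weights = [1.0] * n_features  # every feature starts equally likely
    best = None
    for _ in range(n_layers):
        # Generate many random subsets, biased by the current weights.
        subsets = [weighted_sample(weights, k, rng) for _ in range(n_subsets)]
        ranked = sorted(subsets, key=score, reverse=True)
        top = ranked[: max(1, int(top_frac * n_subsets))]
        # Features that appear in more well-performing subsets gain weight.
        for subset in top:
            for f in subset:
                weights[f] += 1.0 / len(top)
        best = ranked[0]  # topmost subset of the current layer
    return best, weights


INFORMATIVE = {1, 4}  # hypothetical "truly useful" features for the demo


def toy_score(subset):
    """Hypothetical stand-in for a classifier's validation accuracy."""
    return len(INFORMATIVE.intersection(subset))


best_subset, final_weights = multilayer_select(toy_score, n_features=8, k=2)
```

Scoring whole subsets rather than individual features is what lets this kind of search reward features that are only useful in combination, which is the motivation the abstract gives for moving away from single-feature ranking.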
