On the Use of Variable Complementarity for Feature Selection in Cancer Classification

Abstract: Lung cancer is one of the deadliest cancers in the world. Two of the most common subtypes, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), have drastically different biological signatures, yet they are often treated similarly and classified together as non-small cell lung cancer (NSCLC). LUAD and LUSC biomarkers are scarce, and their distinct biological mechanisms have yet to be elucidated. To detect biologically relevant markers, many studies have attempted to improve traditional machine learning algorithms or develop novel algorithms for biomarker discovery. However, few have used overlapping machine learning or feature selection methods for cancer classification, biomarker identification, or gene expression analysis. This study proposes to use overlapping traditional feature selection or feature reduction techniques for cancer classification and biomarker discovery. The genes selected by the overlapping method were then verified using random forest. The classification statistics of the overlapping method were compared to those of the traditional feature selection methods. The identified biomarkers were validated in an external dataset using AUC and ROC analysis. Gene expression analysis was then performed to further investigate biological differences between LUAD and LUSC. Overall, our method achieved classification results comparable to, if not better than, the traditional algorithms. It also identified multiple known biomarkers and five potentially novel biomarkers with high discriminative value between LUAD and LUSC. Many of the biomarkers also exhibit significant prognostic potential, particularly in LUAD. Our study also revealed distinct biological pathways between LUAD and LUSC.
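The overlapping idea in the abstract above can be illustrated with a minimal sketch: rank features with two independent criteria and keep only those that both criteria place in their top k. The two scoring functions (absolute mean difference and a Fisher-style score), the toy matrix, and all function names are assumptions made for illustration; the abstract does not name the specific selection methods that were overlapped, and the random forest verification step is omitted here.

```python
from statistics import mean, pvariance

def _split_by_class(X, y, j):
    """Return the values of feature j for class 0 and for class 1."""
    a = [row[j] for row, label in zip(X, y) if label == 0]
    b = [row[j] for row, label in zip(X, y) if label == 1]
    return a, b

def mean_difference_score(X, y, j):
    """Hypothetical criterion 1: absolute difference of the class means."""
    a, b = _split_by_class(X, y, j)
    return abs(mean(a) - mean(b))

def fisher_score(X, y, j):
    """Hypothetical criterion 2: squared mean difference over within-class variance."""
    a, b = _split_by_class(X, y, j)
    spread = pvariance(a) + pvariance(b)
    return (mean(a) - mean(b)) ** 2 / (spread + 1e-12)  # small constant avoids division by zero

def top_k(X, y, score, k):
    """Indices of the k features with the highest score."""
    n_features = len(X[0])
    ranked = sorted(range(n_features), key=lambda j: score(X, y, j), reverse=True)
    return set(ranked[:k])

def overlapping_features(X, y, k):
    """Keep only the features that both criteria rank in their top k."""
    return sorted(top_k(X, y, mean_difference_score, k) & top_k(X, y, fisher_score, k))

# Toy data (made up): 6 samples, 4 features, binary labels.
X = [
    [1.0, 5.0, 0.2, 10.0],
    [1.2, 5.1, 0.9, 30.0],
    [0.9, 4.9, 0.1, 20.0],
    [3.0, 5.0, 0.8, 11.0],
    [3.1, 5.2, 0.3, 29.0],
    [2.9, 4.8, 0.7, 21.0],
]
y = [0, 0, 0, 1, 1, 1]

selected = overlapping_features(X, y, k=2)
print(selected)
```

On this toy data the two criteria disagree on their second choice, so only the feature they share survives the intersection. Taking the intersection is the strictest way to combine rankings; a union or a voting threshold would be a looser alternative.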

A Reduced Variable Neighborhood Search Approach for Feature Selection in Cancer Classification

Variable Neighborhood Search - Lecture Notes in Computer Science, 2020, pp. 1-16. DOI: 10.1007/978-3-030-44932-2_1.

Author(s): Angelos Pentelas, Angelo Sifaleras, Georgia Koloniari

Keyword(s): Feature Selection, Variable Neighborhood Search, Cancer Classification, Neighborhood Search, Search Approach

Prostate Cancer Classification Based on Best First Search and Taguchi Feature Selection Method

Image and Video Technology - Lecture Notes in Computer Science, 2019, pp. 325-336. DOI: 10.1007/978-3-030-34879-3_25.

Author(s): Md Akizur Rahman, Priyanka Singh, Ravie Chandren Muniyandi, Domingo Mery, Mukesh Prasad

Keyword(s): Prostate Cancer, Feature Selection, Feature Selection Method, Selection Method, Cancer Classification, Best First Search, Prostate Cancer Classification

Liver Cancer Classification Model Using Hybrid Feature Selection Based on Class-Dependent Technique for the Central Region of Thailand

Information, 2019, Vol 10 (6), pp. 187. DOI: 10.3390/info10060187.

Author(s): Rattanawadee Panthong, Anongnart Srivihok

Keyword(s): Feature Selection, Liver Cancer, Predictive Model, Information Gain, Classification Performance, Cancer Classification, Feature Subset Selection, Classification Model, Feature Subset, Cancer Data

Liver cancer data are typically high-dimensional. A dataset with a very large number of features and multiple classes may contain many features that are irrelevant to pattern classification in machine learning. Hence, feature selection can improve the performance of the classification model and help it reach maximum classification accuracy. The aims of the present study were to find the best feature subset and to evaluate the classification performance of the predictive model. This paper proposed a hybrid feature selection approach that combines information gain and sequential forward selection based on the class-dependent technique (IGSFS-CD) for the liver cancer classification model. Two different classifiers (decision tree and naïve Bayes) were used to evaluate feature subsets. The liver cancer datasets were obtained from the Cancer Hospital Thailand database. Three ensemble methods (ensemble classifiers, bagging, and AdaBoost) were applied to improve classification performance. The IGSFS-CD method provided good accuracy of 78.36% (sensitivity 0.7841 and specificity 0.9159) on LC_dataset-1. In addition, LC_dataset II delivered the best performance with an accuracy of 84.82% (sensitivity 0.8481 and specificity 0.9437). The IGSFS-CD method achieved better classification performance than the class-independent method. Furthermore, selecting the best feature subset could help reduce the complexity of the predictive model.
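The combination of information gain and sequential forward selection described above can be sketched as a two-stage procedure: a filter stage ranks features by information gain and keeps the top candidates, then a wrapper stage greedily adds the candidate that most improves an evaluator. This is a simplified illustration under stated assumptions, not the IGSFS-CD method itself: the class-dependent part is omitted, the evaluator is a hypothetical majority-vote lookup table scored on the training data (the study used decision tree and naïve Bayes classifiers), and the toy data and function names are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(X, y, j):
    """Entropy of y minus the expected entropy of y after splitting on feature j."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(row[j], []).append(label)
    remainder = sum(len(g) / len(y) * entropy(g) for g in groups.values())
    return entropy(y) - remainder

def lookup_accuracy(X, y, features):
    """Hypothetical evaluator: training accuracy of a table that predicts the
    majority label for each combination of the selected feature values."""
    table = {}
    for row, label in zip(X, y):
        key = tuple(row[j] for j in features)
        table.setdefault(key, Counter())[label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in table.values())
    return correct / len(y)

def ig_then_forward_selection(X, y, n_candidates):
    """Filter by information gain, then add features greedily while accuracy improves."""
    n_features = len(X[0])
    ranked = sorted(range(n_features), key=lambda j: information_gain(X, y, j), reverse=True)
    candidates = ranked[:n_candidates]
    selected, best = [], lookup_accuracy(X, y, [])
    while candidates:
        scored = [(lookup_accuracy(X, y, selected + [j]), j) for j in candidates]
        top_score, top_j = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no remaining candidate improves the evaluator
        selected.append(top_j)
        candidates.remove(top_j)
        best = top_score
    return selected, best

# Toy data (made up): 8 samples, 3 discrete features, label = feature 0 OR feature 1.
X = [
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 1],
    [1, 0, 0], [1, 0, 1], [1, 0, 0], [1, 1, 1],
]
y = [0, 0, 0, 1, 1, 1, 1, 1]

selected, accuracy = ig_then_forward_selection(X, y, n_candidates=2)
print(selected, accuracy)
```

Scoring on the training data keeps the sketch short but overstates performance; a realistic evaluator would use cross-validation or a held-out set.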

Gene selection in cancer classification using hybrid method based on Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC) feature selection and support vector machine

2019. DOI: 10.1063/1.5132474. Cited by ~1.

Author(s): D. A. Utami, Z. Rustam

Keyword(s): Support Vector Machine, Feature Selection, Particle Swarm Optimization, Hybrid Method, Artificial Bee Colony, Gene Selection, Cancer Classification, Support Vector, Swarm Optimization, Bee Colony

Multi-tier hybrid feature selection by combining filter and wrapper for subset feature selection in cancer classification

Indian Journal of Science and Technology, 2019, Vol 12 (3), pp. 1-11. DOI: 10.17485/ijst/2019/v12i3/141010.

Author(s): Bibhuprasad Sahu

Keyword(s): Feature Selection, Cancer Classification

A Survey on Hybrid Feature Selection Methods in Microarray Gene Expression Data for Cancer Classification

IEEE Access, 2019, Vol 7, pp. 78533-78548. DOI: 10.1109/access.2019.2922987. Cited by ~21.

Author(s): Nada Almugren, Hala Alshamlan

Keyword(s): Gene Expression, Feature Selection, Gene Expression Data, Microarray Gene Expression Data, Cancer Classification, Expression Data, Selection Methods, Microarray Gene Expression, Microarray Gene

An Enhancement in Cancer Classification Accuracy Using a Two-Step Feature Selection Method Based on Artificial Neural Networks with 15 Neurons

Symmetry, 2020, Vol 12 (2), pp. 271. DOI: 10.3390/sym12020271. Cited by ~3.

Author(s): Md Akizur Rahman, Ravie Chandren Muniyandi

Keyword(s): Neural Network, Feature Selection, Classification Accuracy, Feature Selection Method, Selection Method, Cancer Classification, Neuron Network, Artificial Neural, Artificial Neural Network (ANN), Risk of Cancer

An artificial neural network (ANN) is a tool that can be used to recognize cancer effectively. The risk of cancer is increasing dramatically all over the world. Detecting cancer is very difficult due to a lack of data, and proper data are essential for detecting it accurately. Cancer classification has been carried out by many researchers, but there is still a need to improve classification accuracy. To this end, this research proposes a two-step feature selection (FS) technique combined with a 15-neuron neural network (NN) that classifies cancer with high accuracy. The FS method is used to reduce the number of feature attributes, and the 15-neuron network is used to classify the cancer. The benchmark Wisconsin Diagnostic Breast Cancer (WDBC) dataset was used to compare the proposed method with other existing techniques, with the proposed method reaching a classification accuracy of up to 99.4%, a significant improvement. The results produced in this research are more promising and significant than those in existing papers.
