An Integrated Feature Selection Algorithm for Cancer Classification using Gene Expression Data

Aim and Objective: Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Diagnosing this deadly disease at an early stage is an exceptionally new clinical application of microarray data. In DNA microarray technology, gene expression data have high dimensionality and a small sample size. It is therefore indispensable to develop efficient and robust feature selection methods that identify a small set of genes and achieve better classification performance. Materials and Methods: In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) and a Multi-Objective Evolutionary Algorithm (MOEA) to select highly informative genes. The hybrid model, paired with a radial basis function neural network (RBFNN) classifier, was evaluated on 11 benchmark gene expression datasets using 10-fold cross-validation. Results: The experimental results were compared with seven conventional feature selection methods and with other methods in the literature. The comparison shows that our approach has clear advantages in classification accuracy and in the number of genes selected. Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy for six out of eleven datasets with a minimal-sized predictive gene subset.
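To make the CFS stage concrete, the following is a minimal sketch of the standard CFS merit heuristic, which rewards gene subsets that correlate with the class label while penalising correlation among the genes themselves. It is not the authors' implementation: the use of absolute Pearson correlation, the function names `pearson` and `cfs_merit`, and the gene-major data layout are assumptions made for illustration, and the MOEA search stage is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def cfs_merit(features, labels, subset):
    """CFS merit k*r_cf / sqrt(k + k*(k-1)*r_ff) for the gene indices in `subset`.

    `features[j]` holds the expression values of gene j across all samples
    (an assumed layout); `labels` holds numeric class labels.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute gene-class correlation.
    r_cf = sum(abs(pearson(features[j], labels)) for j in subset) / k
    if k == 1:
        return r_cf
    # Mean absolute gene-gene correlation over all distinct pairs.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

A search procedure (evolutionary or otherwise) would call `cfs_merit` on candidate subsets; adding a gene that is uncorrelated with the class lowers the merit, which is what pushes the search toward small subsets.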

Microarray Gene Expression Data Classification using a Hybrid Algorithm: MRMRAGA

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j8873.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 706-713

Keyword(s):

Gene Expression ◽

Genetic Algorithm ◽

Feature Selection ◽

Gene Expression Data ◽

Feature Selection Method ◽

Small Sample ◽

Adaptive Genetic Algorithm ◽

Expression Data ◽

Microarray Gene Expression ◽

Microarray Gene

In microarray gene expression research, the high dimensionality of the features combined with the comparatively small sample size of the data makes it necessary to develop a robust and efficient feature selection method so that classification can be performed more precisely on gene expression data. In this paper, we propose a hybrid feature selection approach (mRMRAGA) that combines minimum redundancy and maximum relevance (mRMR) with the adaptive genetic algorithm (AGA). The mRMR method is frequently used to identify the characteristics of genes and their phenotypes more accurately: it narrows features down by their relevance while limiting redundancy among the features already selected. The Genetic Algorithm (GA) is a heuristic search method inspired by the process of natural selection, and the adaptive genetic algorithm is an improved variant that gives better performance. We conducted experiments on four benchmark datasets using the proposed approach and then classified the data with four well-known classification approaches. The measured accuracy shows that the approach performs better than other conventional feature selection methods.
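The mRMR idea can be illustrated with a short greedy selection over discretised expression values. This is a sketch under stated assumptions rather than the paper's code: it uses the common "relevance minus mean redundancy" (difference) criterion with mutual information, the names `mutual_information` and `mrmr_select` are hypothetical, and the adaptive genetic algorithm stage that would refine the result is not shown.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information in bits between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def mrmr_select(features, labels, k):
    """Greedily pick up to k feature indices with high relevance and low redundancy.

    `features[j]` is the discretised expression profile of gene j (assumed layout).
    """
    relevance = [mutual_information(f, labels) for f in features]
    # Start from the single most relevant feature.
    selected = [max(range(len(features)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(features)):
        best, best_score = None, -math.inf
        for j in range(len(features)):
            if j in selected:
                continue
            # Mean mutual information with the features chosen so far.
            redundancy = sum(
                mutual_information(features[j], features[s]) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

On a toy problem where one gene duplicates another, the duplicate is skipped in favour of a gene that carries complementary information about the class, even though both are equally relevant on their own.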

Feature Selection Using Approximate Conditional Entropy Based on Fuzzy Information Granule for Gene Expression Data Classification

Frontiers in Genetics ◽

10.3389/fgene.2021.631505 ◽

2021 ◽

Vol 12 ◽

Author(s):

Hengyi Zhang

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Large Scale ◽

Small Sample Size ◽

Conditional Entropy ◽

Small Sample ◽

Expression Data ◽

Fuzzy Information ◽

Information Granule

Classification is widely used in gene expression data analysis. Feature selection is usually performed before classification because of the large number of genes and the small sample size in gene expression data. In this article, a novel feature selection algorithm using approximate conditional entropy based on fuzzy information granule is proposed, and the correctness of the method is proved by the monotonicity of entropy. First, the fuzzy relation matrix is established using the Laplacian kernel. Second, the approximately equal relation on fuzzy sets is defined. Third, the approximate conditional entropy based on fuzzy information granule and the importance of internal attributes are defined. Approximate conditional entropy can measure the uncertainty of knowledge from two different perspectives, information theory and algebra theory. Finally, a greedy algorithm based on the approximate conditional entropy is designed for feature selection. Experimental results for six large-scale gene datasets show that our algorithm not only greatly reduces the dimension of the gene datasets but is also superior to five state-of-the-art algorithms in terms of classification accuracy.
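The first step, building a fuzzy relation matrix with a Laplacian kernel, can be sketched as follows. The abstract does not give the exact kernel form, so this assumes the common definition exp(-||x - y||_1 / sigma); the function name `laplacian_fuzzy_relation` and the bandwidth parameter `sigma` are hypothetical. The later steps (approximate conditional entropy and the greedy search) are not reproduced because their definitions are not stated here.

```python
import math

def laplacian_fuzzy_relation(samples, sigma=1.0):
    """Return the n x n matrix of pairwise Laplacian-kernel similarities.

    Entry [i][j] is exp(-L1_distance(samples[i], samples[j]) / sigma), so it
    lies in (0, 1], equals 1 on the diagonal, and is symmetric. These are the
    properties expected of a fuzzy similarity relation.
    """
    n = len(samples)
    relation = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            distance = sum(abs(a - b) for a, b in zip(samples[i], samples[j]))
            relation[i][j] = relation[j][i] = math.exp(-distance / sigma)
    return relation
```

Each row of the matrix can be read as the fuzzy membership of every sample in the neighbourhood of one sample; a smaller `sigma` makes those neighbourhoods tighter.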

A filter feature selection method based LLRFC and redundancy analysis for tumor classification using gene expression data

2016 12th World Congress on Intelligent Control and Automation (WCICA) ◽

10.1109/wcica.2016.7578590 ◽

2016 ◽

Cited By ~ 2

Author(s):

Jiangeng Li ◽

Xiaodan Li ◽

Wei Zhang

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Redundancy Analysis ◽

Feature Selection Method ◽

Selection Method ◽

Tumor Classification ◽

Expression Data

A Novel Feature Selection Method for Gene Expression Data Based on Samples Localization

Proceedings of the 2016 International Conference on Biological Engineering and Pharmacy (BEP 2016) ◽

10.2991/bep-16.2017.14 ◽

2017 ◽

Author(s):

Mingyue SHENG ◽

Wei DU ◽

Yuan TIAN ◽

Yanchun LIANG

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Feature Selection Method ◽

Selection Method ◽

Expression Data

COMBINING GENERALIZED NMF AND DISCRIMINATIVE MIXTURE MODELS FOR CLASSIFICATION OF GENE EXPRESSION DATA

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001408006892 ◽

2008 ◽

Vol 22 (08) ◽

pp. 1587-1598 ◽

Cited By ~ 3

Author(s):

WEIXIANG LIU ◽

KEHONG YUAN ◽

JIAN WU ◽

DATIAN YE ◽

ZHEN JI ◽

...

Keyword(s):

Gene Expression ◽

Mixture Model ◽

Gene Expression Data ◽

Small Sample Size ◽

Data Classification ◽

Small Sample ◽

Training Data ◽

Microarray Data Analysis ◽

Expression Data

Classification of gene expression samples is a core task in microarray data analysis. How to reduce thousands of genes and how to select a suitable classifier are two key issues for gene expression data classification. This paper introduces a framework that combines feature extraction and classification. Considering the non-negativity, high dimensionality, and small sample size of the data, we apply a discriminative mixture model designed for non-negative gene expression data classification, with non-negative matrix factorization (NMF) used for dimension reduction. To enhance the sparseness of the training data for fast learning of the mixture model, a generalized NMF is also adopted. Experimental results on several real gene expression datasets show that the classification accuracy, stability, and decision quality can be significantly improved by using the generalized method, and that the proposed method can outperform some previously reported results on the same datasets.
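For readers unfamiliar with NMF as a dimension-reduction step, here is a minimal sketch of plain NMF with multiplicative updates for the squared reconstruction error. It is the standard algorithm, not the generalized NMF the paper adopts, and the helper names, the random initialisation, and the small constant `eps` guarding against division by zero are assumptions for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(row) for row in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative n x m matrix V into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)] for r in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)] for i in range(n)]
    return w, h
```

If the rows of V are samples and the columns are genes, each row of W is a low-dimensional, non-negative representation of one sample that a downstream classifier can consume.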

A Filter Feature Selection Method Based on MFA Score and Redundancy Excluding and It’s Application to Tumor Gene Expression Data Analysis

Interdisciplinary Sciences Computational Life Sciences ◽

10.1007/s12539-015-0272-y ◽

2015 ◽

Vol 7 (4) ◽

pp. 391-396 ◽

Cited By ~ 2

Author(s):

Jiangeng Li ◽

Lei Su ◽

Zenan Pang

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Data Analysis ◽

Gene Expression Data ◽

Feature Selection Method ◽

Selection Method ◽

Expression Data ◽

Gene Expression Data Analysis ◽

Tumor Gene Expression ◽

Tumor Gene

A Cascade Flexible Neural Forest Model for Cancer Subtypes Classification on Gene Expression Data

Computational Intelligence and Neuroscience ◽

10.1155/2021/6480456 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Lianxin Zhong ◽

Qingfang Meng ◽

Yuehui Chen

Keyword(s):

Gene Expression ◽

Sample Size ◽

Gene Expression Data ◽

Small Sample Size ◽

Small Sample ◽

Expression Data ◽

Cancer Subtypes ◽

Subtype Classification ◽

Cancer Subtype

The correct classification of cancer subtypes is of great significance for the in-depth study of cancer pathogenesis and for the accurate treatment of cancer patients. In recent years, the classification of cancer subtypes using deep neural networks and gene expression data has become a hot topic. However, most classifiers face overfitting and low classification accuracy when dealing with small-sample, high-dimensional biological data. In this paper, the Cascade Flexible Neural Forest (CFNForest) model is proposed for cancer subtype classification. CFNForest extends the traditional flexible neural tree (FNT) structure to an FNT Group Forest by exploiting a bagging ensemble strategy, and it can automatically generate the model’s structure and parameters. To deepen the FNT Group Forest without introducing new hyperparameters, a multilayer cascade framework is used, which transforms features between levels and improves the performance of the model. The proposed CFNForest model also improves operational efficiency and robustness through a sample selection mechanism between layers and by setting different weights for the output of each layer. FNT Group Forests with different feature sets are used to enrich the structural diversity of the model, which makes it more suitable for processing small-sample datasets. Experiments on RNA-seq gene expression data show that CFNForest effectively improves the accuracy of cancer subtype classification and that the classification results are robust.
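The combination of bagging and a cascade that transforms features between levels can be sketched generically. The code below is not CFNForest: it swaps in a nearest-centroid classifier for the flexible neural tree, omits the sample selection and layer weighting, and assumes that each level passes class-vote fractions to the next level alongside the original features. All function names are hypothetical.

```python
import random
from collections import Counter

def fit_centroids(x, y):
    """Stand-in base learner: one mean vector per class present in the data."""
    sums, counts = {}, Counter(y)
    for row, label in zip(x, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(centroids, row):
    """Return the label whose centroid is closest to `row`."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def fit_bagged_level(x, y, n_models=15, seed=0):
    """Train one cascade level: base learners fitted on bootstrap resamples."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(x)) for _ in range(len(x))]
        models.append(fit_centroids([x[i] for i in idx], [y[i] for i in idx]))
    return models

def vote_fractions(models, row, classes):
    """Fraction of base learners voting for each class, in the order of `classes`."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return [votes[c] / len(models) for c in classes]

def augment(models, x, classes):
    """Features for the next level: original values plus this level's vote fractions."""
    return [list(row) + vote_fractions(models, row, classes) for row in x]
```

A deeper cascade would call `fit_bagged_level` again on the output of `augment`, stopping when validation accuracy no longer improves; that stopping rule is likewise an assumption, not something stated in the abstract.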

A laminar augmented cascading flexible neural forest model for classification of cancer subtypes based on gene expression data

BMC Bioinformatics ◽

10.1186/s12859-021-04391-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Lianxin Zhong ◽

Qingfang Meng ◽

Yuehui Chen ◽

Lei Du ◽

Peng Wu

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Small Sample Size ◽

Small Sample ◽

Expression Data ◽

Cancer Subtypes ◽

Cancer Pathogenesis ◽

Depth Study ◽

And Function

Background: Correctly classifying the subtypes of cancer is of great significance for the in-depth study of cancer pathogenesis and the realization of personalized treatment for cancer patients. In recent years, classification of cancer subtypes using deep neural networks and gene expression data has gradually become a research hotspot. However, most classifiers may face overfitting and low classification accuracy when dealing with small-sample, high-dimensional biological data. Results: In this paper, a laminar augmented cascading flexible neural forest (LACFNForest) model is proposed for the classification of cancer subtypes. This model is a cascading flexible neural forest that uses the deep flexible neural forest (DFNForest) as the base classifier. A hierarchical broadening ensemble method is proposed, which ensures the robustness of classification results and avoids wasting model structure and function as much as possible. We also introduce an output judgment mechanism in each layer of the forest to reduce the computational complexity of the model. The deep neural forest is extended to a densely connected deep neural forest to improve the prediction results. Experiments on RNA-seq gene expression data show that LACFNForest performs better in the classification of cancer subtypes than conventional methods. Conclusion: The LACFNForest model effectively improves the accuracy of cancer subtype classification with good robustness. It provides a new approach for the ensemble learning of classifiers in terms of structural design.

Hybrid Feature Selection Algorithm mRMR-ICA for Cancer Classification from Microarray Gene Expression Data

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207321666180601074349 ◽

2018 ◽

Vol 21 (6) ◽

pp. 420-430 ◽

Cited By ~ 4

Author(s):

Shuaiqun Wang ◽

Wei Kong ◽

Aorigele ◽

Jin Deng ◽

Shangce Gao ◽

...

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Classification Accuracy ◽

Microarray Gene Expression Data ◽

Cancer Classification ◽

Expression Data ◽

Microarray Gene Expression ◽

Redundant Genes ◽

Microarray Gene

Aims and Objective: Redundant information in microarray gene expression data makes cancer classification difficult. Hence, it is very important for researchers to find appropriate ways to select informative genes for better identification of cancer. This study presents a hybrid feature selection method, mRMR-ICA, which combines minimum redundancy maximum relevance (mRMR) with the imperialist competition algorithm (ICA) for cancer classification. Materials and Methods: The presented algorithm mRMR-ICA uses mRMR as a preprocessing step to delete redundant genes and provide smaller datasets to ICA for feature selection. It uses a support vector machine (SVM) to evaluate the classification accuracy of the selected genes. The fitness function includes the classification accuracy and the number of selected genes. Results: Ten benchmark microarray gene expression datasets are used to test the performance of mRMR-ICA. In the experiments, mRMR-ICA improves both the accuracy of cancer classification and the number of informative genes compared with the original ICA and other evolutionary algorithms. Conclusion: The comparison results demonstrate that mRMR-ICA can effectively delete redundant genes, so that the algorithm selects fewer informative genes and obtains better classification results. It can also shorten calculation time and improve efficiency.
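A fitness function that trades off classification accuracy against the number of selected genes is often written as a weighted sum. The abstract does not state the exact form used by mRMR-ICA, so the sketch below is an assumption: the weight `alpha`, its default of 0.9, and the function name `subset_fitness` are all hypothetical, and the accuracy is taken as an input rather than computed with an SVM.

```python
def subset_fitness(accuracy, n_selected, n_total, alpha=0.9):
    """Score a gene subset: higher accuracy and fewer genes both raise the score.

    accuracy   -- cross-validated accuracy in [0, 1] (computed elsewhere)
    n_selected -- number of genes in the candidate subset
    n_total    -- number of genes available after preprocessing
    alpha      -- assumed weight on accuracy; (1 - alpha) rewards compactness
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if not 0 <= n_selected <= n_total or n_total == 0:
        raise ValueError("need 0 <= n_selected <= n_total and n_total > 0")
    if n_selected == 0:
        # An empty subset cannot be classified on, so give it the worst score.
        return 0.0
    compactness = 1.0 - n_selected / n_total
    return alpha * accuracy + (1.0 - alpha) * compactness
```

With the weight close to 1, accuracy dominates and the gene count mainly breaks ties between subsets of similar accuracy, which matches the stated goal of selecting fewer genes without sacrificing classification results.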

A Hybrid Feature Selection Method Using Gene Expression Data

2009 Ninth IEEE International Conference on Bioinformatics and BioEngineering ◽

10.1109/bibe.2009.24 ◽

2009 ◽

Cited By ~ 4

Author(s):

Li-Yeh Chuang ◽

Kuo-Chuan Wu ◽

Cheng-Hong Yang

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Feature Selection Method ◽

Selection Method ◽

Expression Data
