GENE SELECTION FOR CANCER CLASSIFICATION USING WRAPPER APPROACHES

Although cancer classification has improved considerably, a general method for classifying the known types of cancer has not yet been developed. In this work, we propose the use of supervised classification techniques, coupled with feature subset selection algorithms, to perform this classification automatically in gene expression datasets. Because gene expression datasets contain a large number of features, the search for a highly accurate combination of features is carried out by means of the new Estimation of Distribution Algorithms paradigm. To assess the accuracy of the proposed approach, the naïve-Bayes classification algorithm is employed in a wrapper form. Promising results are achieved, along with a considerable reduction in the number of genes. Because the optimal selection of genes is stated as a search task, the final choice of genes is made automatically and robustly, in contrast to previous works that address the same type of problem.
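The wrapper idea in the abstract (scoring a candidate gene subset by the accuracy of a naïve-Bayes classifier trained on only those genes) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the binary-discretized expression values, the Laplace smoothing, the leave-one-out estimate, and the names `nb_predict` and `wrapper_accuracy` are all assumptions made for the example.

```python
import math

def nb_predict(train_X, train_y, x):
    """Bernoulli naive Bayes with Laplace smoothing; returns the predicted class."""
    best_class, best_logp = None, float("-inf")
    for c in sorted(set(train_y)):
        rows = [r for r, label in zip(train_X, train_y) if label == c]
        logp = math.log(len(rows) / len(train_X))  # class prior
        for j, value in enumerate(x):
            ones = sum(r[j] for r in rows)
            p_one = (ones + 1) / (len(rows) + 2)   # smoothed P(feature j = 1 | c)
            logp += math.log(p_one if value == 1 else 1 - p_one)
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class

def wrapper_accuracy(X, y, mask):
    """Leave-one-out accuracy of naive Bayes using only genes where mask[j] == 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    reduced = [[row[j] for j in cols] for row in X]
    hits = 0
    for i in range(len(reduced)):
        train_X = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nb_predict(train_X, train_y, reduced[i]) == y[i]:
            hits += 1
    return hits / len(reduced)

# Toy data (made up): gene 0 tracks the class label, gene 1 is noise.
X = [[0, 1], [0, 0], [0, 1], [1, 0], [1, 1], [1, 0]]
y = [0, 0, 0, 1, 1, 1]
print(wrapper_accuracy(X, y, [1, 0]))  # subset containing only gene 0
print(wrapper_accuracy(X, y, [0, 1]))  # subset containing only gene 1
```

A search procedure such as an EDA would call a function like `wrapper_accuracy` as its fitness, once per candidate subset, which is why the number of evaluations matters in wrapper approaches.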

Feature Subset Selection by Estimation of Distribution Algorithms

Estimation of Distribution Algorithms - Genetic Algorithms and Evolutionary Computation, 2002, pp. 269-293

DOI: 10.1007/978-1-4615-1539-5_13

Cited By ~ 5

Author(s): I. Inza, P. Larrañaga, B. Sierra

Keyword(s): Subset Selection, Feature Subset Selection, Feature Subset, Estimation Of Distribution Algorithms, Estimation Of Distribution, Distribution Algorithms

Estimation of Distribution Algorithms for Feature Subset Selection in Large Dimensionality Domains

Data Mining, 2011, pp. 97-116

DOI: 10.4018/978-1-930708-25-9.ch005

Cited By ~ 1

Author(s): Inaki Inza, Pedro Larranaga, Basilio Sierra

Keyword(s): Probabilistic Models, Subset Selection, Population Based, Feature Subset Selection, Feature Subset, Estimation Of Distribution Algorithms, Text Learning, Estimation Of Distribution, Selection Tasks, Distribution Algorithms

Feature Subset Selection (FSS) is a well-known task in the Machine Learning, Data Mining, Pattern Recognition and Text Learning paradigms. Genetic Algorithms (GAs) are possibly the most commonly used algorithms for Feature Subset Selection tasks. Although the FSS literature contains many papers, few of them tackle the task of FSS in domains with more than 50 features. In this chapter we present a novel search heuristic paradigm, called Estimation of Distribution Algorithms (EDAs), as an alternative to GAs, to perform a population-based and randomized search in datasets of large dimensionality. The EDA paradigm avoids the use of genetic crossover and mutation operators to evolve the populations. In the absence of these operators, the evolution is guaranteed by the factorization of the probability distribution of the best solutions found in a generation of the search and the subsequent simulation of this distribution to obtain a new pool of solutions. In this chapter we present four different probabilistic models to perform this factorization. In a comparison with two types of GAs on natural and artificial datasets of large dimensionality, EDA-based approaches obtain encouraging results with regard to accuracy, and they need fewer evaluations than the genetic approaches.
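The estimate-then-sample loop described above can be sketched with the simplest possible factorization, in which every feature is modelled independently (a univariate model in the style of UMDA). The abstract does not name its four probabilistic models, so this choice, the toy fitness function, the parameter defaults, and the names `umda_feature_selection` and `toy_fitness` are assumptions made purely for illustration.

```python
import random

def umda_feature_selection(fitness, n_features, pop_size=50, n_select=25,
                           generations=30, seed=0):
    """Univariate EDA over binary masks: no crossover, no mutation."""
    rng = random.Random(seed)
    probs = [0.5] * n_features  # initial model: each feature kept with p = 0.5
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Simulate the current distribution to obtain a new pool of solutions.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        scored = sorted(population, key=fitness, reverse=True)
        top_fit = fitness(scored[0])
        if top_fit > best_fit:
            best, best_fit = scored[0], top_fit
        # Re-estimate the factorized distribution from the best individuals,
        # with Laplace smoothing so no probability collapses to 0 or 1.
        selected = scored[:n_select]
        probs = [(sum(ind[i] for ind in selected) + 1) / (n_select + 2)
                 for i in range(n_features)]
    return best, best_fit

# Hypothetical fitness: reward three "relevant" features, penalize subset size.
RELEVANT = {0, 3, 7}

def toy_fitness(mask):
    hits = sum(1 for i in RELEVANT if mask[i])
    return hits - 0.1 * sum(mask)

best_mask, best_score = umda_feature_selection(toy_fitness, n_features=20)
print(best_mask, best_score)
```

In a wrapper setting the toy fitness would be replaced by an estimate of classifier accuracy on the selected features, and richer models would capture dependencies between features that this univariate sketch ignores.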

Feature subset selection by genetic algorithms and estimation of distribution algorithms

Artificial Intelligence in Medicine, 2001, Vol 23 (2), pp. 187-205

DOI: 10.1016/s0933-3657(01)00085-9

Cited By ~ 33

Author(s): I. Inza, M. Merino, P. Larrañaga, J. Quiroga, B. Sierra, ...

Keyword(s): Genetic Algorithms, Subset Selection, Feature Subset Selection, Feature Subset, Estimation Of Distribution Algorithms, Estimation Of Distribution, Distribution Algorithms

Gene expression data analyses for supervised prostate cancer classification based on feature subset selection combined with different classifiers

2016 5th International Conference on Multimedia Computing and Systems (ICMCS), 2016

DOI: 10.1109/icmcs.2016.7905660

Author(s): Sara Haddou Bouazza, Abdelouhab Zeroual, Khalid Auhmani

Keyword(s): Gene Expression, Prostate Cancer, Gene Expression Data, Subset Selection, Cancer Classification, Feature Subset Selection, Feature Subset, Expression Data, Data Analyses, Prostate Cancer Classification

Prototype Selection and Feature Subset Selection by Estimation of Distribution Algorithms. A Case Study in the Survival of Cirrhotic Patients Treated with TIPS

Artificial Intelligence in Medicine - Lecture Notes in Computer Science, 2001, pp. 20-29

DOI: 10.1007/3-540-48229-6_3

Cited By ~ 12

Author(s): B. Sierra, E. Lazkano, I. Inza, M. Merino, P. Larrañaga, ...

Keyword(s): Subset Selection, Feature Subset Selection, Feature Subset, Estimation Of Distribution Algorithms, Prototype Selection, Estimation Of Distribution, Cirrhotic Patients, Distribution Algorithms

Optimization for Gene Selection and Cancer Classification

Proceedings, 2021, Vol 74 (1), pp. 21

DOI: 10.3390/proceedings2021074021

Author(s): Hülya Başeğmez, Emrah Sezer, Çiğdem Selçukcan Erol

Keyword(s): Candidate Genes, Cancer Diagnosis, Microarray Data, Gene Selection, Subset Selection, Feature Subset Selection, Feature Subset, Selection Methods, Diagnosis And Classification, Selection Algorithms

Recently, gene selection has played an important role in cancer diagnosis and classification. This study aims to select highly descriptive genes for use in cancer diagnosis, in order to develop a classification analysis for cancer diagnosis using microarray data. For this purpose, a comparative analysis and the intersections of six different methods, obtained by combining two feature selection algorithms with three search algorithms, are presented. The results of the six feature subset selection methods indicate that attention should focus on 24 genes rather than on all 15,155 genes. Cancer diagnosis may therefore be possible using this reduced set of 24 candidate genes, in contrast to similar studies that involve larger feature sets. However, the diagnostic value of these candidate genes should be confirmed in a wet laboratory.

A Hybrid Barnacles Mating Optimizer Algorithm With Support Vector Machines for Gene Selection of Microarray Cancer Classification

IEEE Access, 2021, Vol 9, pp. 64895-64905

DOI: 10.1109/access.2021.3075942

Author(s): Essam H. Houssein, Diaa Salama Abdelminaam, Hager N. Hassan, Mustafa M. Al-Sayed, Emad Nabil

Keyword(s): Support Vector Machines, Gene Selection, Cancer Classification, Support Vector, Vector Machines, Selection Of

Liver Cancer Classification Model Using Hybrid Feature Selection Based on Class-Dependent Technique for the Central Region of Thailand

Information, 2019, Vol 10 (6), pp. 187

DOI: 10.3390/info10060187

Author(s): Rattanawadee Panthong, Anongnart Srivihok

Keyword(s): Feature Selection, Liver Cancer, Predictive Model, Information Gain, Classification Performance, Cancer Classification, Feature Subset Selection, Classification Model, Feature Subset, Cancer Data

Liver cancer data always consist of a large number of multidimensional datasets. A dataset that has a huge number of features and multiple classes may contain features that are irrelevant to pattern classification in machine learning. Hence, feature selection improves the performance of the classification model to achieve maximum classification accuracy. The aims of the present study were to find the best feature subset and to evaluate the classification performance of the predictive model. This paper proposes a hybrid feature selection approach that combines information gain and sequential forward selection based on the class-dependent technique (IGSFS-CD) for the liver cancer classification model. Two different classifiers (decision tree and naïve Bayes) were used to evaluate feature subsets. The liver cancer datasets were obtained from the Cancer Hospital Thailand database. Three ensemble methods (ensemble classifiers, bagging, and AdaBoost) were applied to improve the performance of classification. The IGSFS-CD method provided good accuracy of 78.36% (sensitivity 0.7841 and specificity 0.9159) on LC_dataset-1. In addition, LC_dataset II delivered the best performance with an accuracy of 84.82% (sensitivity 0.8481 and specificity 0.9437). The IGSFS-CD method achieved better classification performance than the class-independent method. Furthermore, selecting the best feature subset could help reduce the complexity of the predictive model.
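The information gain half of such a hybrid method can be illustrated with a short sketch for discrete features. This is only the generic textbook definition (entropy of the class minus the conditional entropy given the feature); the class-dependent variant and the sequential forward selection step used in IGSFS-CD are not reproduced here, and the function names and toy values are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for one discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy values (made up): one feature determines the class, the other does not.
labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]
print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

Ranking features by a score like this is a filter step; a forward selection stage would then add ranked features one at a time while a classifier's performance keeps improving.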

Improving Feature Subset Selection Using a Genetic Algorithm for Microarray Gene Expression Data

2006 IEEE International Conference on Evolutionary Computation, 2006

DOI: 10.1109/cec.2006.1688623

Cited By ~ 7

Author(s): Feng Tan, Xuezheng Fu, Yanqing Zhang, A.G. Bourgeois

Keyword(s): Gene Expression, Genetic Algorithm, Gene Expression Data, Subset Selection, Microarray Gene Expression Data, Feature Subset Selection, Feature Subset, Expression Data, Microarray Gene Expression, Microarray Gene

mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling

BioMed Research International, 2015, Vol 2015, pp. 1-15

DOI: 10.1155/2015/604910

Cited By ~ 70

Author(s): Hala Alshamlan, Ghada Badr, Yousef Alohali

Keyword(s): Gene Expression, Gene Selection, Cancer Classification, Support Vector, Optimization Approach, Selection Algorithm, Classification Problems, Microarray Gene Expression, Abc Algorithm, Microarray Gene

An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying the ABC algorithm to the analysis of a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from the microarray profile. The new approach uses a support vector machine (SVM) algorithm to measure the classification accuracy for the selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.
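The mRMR criterion mentioned above can be sketched as a greedy loop that, at each step, picks the gene whose relevance to the class is highest after subtracting its average redundancy with the genes already chosen. The sketch assumes discrete expression values and the difference form of the criterion; the ABC search and the SVM evaluation are not reproduced, and the function names and toy data are assumptions.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr(features, labels, k):
    """Greedy mRMR: features maps a gene name to its list of discrete values."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)  # ties go to the earliest gene
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (made up): g1_dup duplicates g1, g2 carries complementary information.
features = {
    "g1":     [0, 0, 0, 0, 1, 1, 1, 1],
    "g1_dup": [0, 0, 0, 0, 1, 1, 1, 1],
    "g2":     [0, 0, 1, 1, 0, 0, 1, 1],
}
labels = [0, 0, 1, 1, 2, 2, 3, 3]
print(mrmr(features, labels, k=2))
```

In this toy case the duplicate gene is as relevant as the original but fully redundant with it, so the second pick goes to the complementary gene instead.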
