Evolutionary Computation for Feature Selection in Classification

2021 ◽  
Author(s):  
Hoai Nguyen

Classification aims to identify the class label of an instance according to information from its characteristics or features. Unfortunately, many classification problems have a large feature set containing irrelevant and redundant features, which reduce classification performance. To address this problem, feature selection is used to select a small subset of relevant features. There are three main types of feature selection methods: wrapper, embedded, and filter approaches. Wrappers use a classification algorithm to evaluate candidate feature subsets. In embedded approaches, the selection process is embedded in the training process of a classification algorithm. Unlike the other two approaches, filters do not involve any classification algorithm during the selection process. Feature selection is an important process, but it is not an easy task because of its large search space and complex feature interactions. Because of their potential global search ability, Evolutionary Computation (EC) techniques, especially Particle Swarm Optimization (PSO), have been widely and successfully applied to feature selection. However, there is potential to improve the effectiveness and efficiency of EC-based feature selection.

The overall goal of this thesis is to investigate and improve the capability of EC for feature selection, selecting small feature subsets while maintaining or even improving classification performance compared to using all features. Different aspects of feature selection are considered, such as the number of objectives (single-objective/multi-objective), the fitness function (filter/wrapper), and the search mechanism.

This thesis introduces a new fitness function based on mutual information, calculated by an estimation approach instead of the traditional counting approach. Results show that the estimation approach works well on both continuous and discrete data. More importantly, mutual information calculated by the estimation approach can capture feature interactions better than the traditional counting approach.

This thesis develops a novel binary PSO algorithm, which is the first work to redefine some core concepts of PSO, such as velocity and momentum, to suit the characteristics of binary search spaces. Experimental results show that the proposed binary PSO algorithm evolves better solutions than other binary EC algorithms when the search spaces are large and complex. Specifically, on feature selection, the proposed binary PSO algorithm can select smaller feature subsets with similar or better classification accuracies, especially when there are a large number of features.

This thesis proposes surrogate models for wrapper-based feature selection. The surrogate models use surrogate training sets, which are subsets of informative instances selected from the training set. Experimental results show that the proposed surrogate models assist PSO in reducing the computational cost while maintaining or even improving classification performance compared to using only the original training set.

This thesis develops the first wrapper-based multi-objective feature selection algorithm using MOEA/D. A new decomposition strategy using multiple reference points for MOEA/D is designed, which can deal with characteristics of multi-objective feature selection such as highly discontinuous Pareto fronts and complex relationships between objectives. The experimental results show that the proposed algorithm can evolve more diverse non-dominated sets than other multi-objective algorithms.

This thesis introduces the first PSO-based feature selection algorithm for transfer learning. In the proposed algorithm, the fitness function uses classification performance to reduce the differences between domains while maintaining the discriminative ability on the target domain. The experimental results show that the proposed algorithm can select feature subsets that achieve better classification performance than four state-of-the-art feature-based transfer learning algorithms.
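As a concrete illustration of the binary PSO family the thesis builds on, below is a minimal sketch of the standard sigmoid-based binary PSO wrapper for feature selection. It shows the baseline loop only; the thesis's redefined velocity and momentum differ, and the `fitness` function, parameter values, and update rule here are assumptions.

```python
import numpy as np

def binary_pso(fitness, n_features, n_particles=30, iters=100,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `fitness` over bit vectors (bit j = 1: feature j selected)."""
    rng = np.random.default_rng(seed)
    X = (rng.random((n_particles, n_features)) < 0.5).astype(float)
    V = rng.uniform(-1.0, 1.0, (n_particles, n_features))
    pbest = X.copy()
    pbest_fit = np.array([fitness(x.astype(bool)) for x in X])
    g = np.argmin(pbest_fit)
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(iters):
        r1, r2 = rng.random(V.shape), rng.random(V.shape)
        # Classic velocity update; the thesis replaces this with operators
        # defined directly on binary search spaces.
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        # Sigmoid of velocity is used as the probability of setting each bit.
        X = (rng.random(V.shape) < 1.0 / (1.0 + np.exp(-V))).astype(float)
        fit = np.array([fitness(x.astype(bool)) for x in X])
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = X[better], fit[better]
        g = np.argmin(pbest_fit)
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest.astype(bool), gbest_fit
```

Here `fitness` is any function mapping a boolean feature mask to a value to minimise, such as a cross-validated error rate from a wrapper classifier.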



2021 ◽  
Author(s):  
Bing Xue

Classification problems often have a large number of features, but not all of them are useful for classification. Irrelevant and redundant features may even reduce the classification accuracy. Feature selection is the process of selecting a subset of relevant features, which can decrease the dimensionality, shorten the running time, and/or improve the classification accuracy. There are two types of feature selection approaches: wrapper and filter approaches. Their main difference is that wrappers use a classification algorithm to evaluate the goodness of the features during the feature selection process, while filters are independent of any classification algorithm. Feature selection is a difficult task because of feature interactions and the large search space. Existing feature selection methods suffer from different problems, such as stagnation in local optima and high computational cost. Evolutionary computation (EC) techniques are well-known global search algorithms. Particle swarm optimisation (PSO) is an EC technique that is computationally less expensive and can converge faster than other methods. PSO has been successfully applied to many areas, but its potential for feature selection has not been fully investigated.

The overall goal of this thesis is to investigate and improve the capability of PSO for feature selection, selecting a smaller number of features while achieving similar or better classification performance than using all features. This thesis investigates the use of PSO for both wrapper and filter approaches, for both single-objective and multi-objective feature selection, and also investigates the differences between wrappers and filters.

This thesis proposes a new PSO-based wrapper, single-objective feature selection approach by developing new initialisation and updating mechanisms. The results show that by considering the number of features in the initialisation and updating procedures, the new algorithm can improve the classification performance, reduce the number of features, and decrease the computational time.

This thesis develops the first PSO-based wrapper multi-objective feature selection approach, which aims to maximise the classification accuracy and simultaneously minimise the number of features. The results show that the proposed multi-objective algorithm can obtain more and better feature subsets than single-objective algorithms, and outperforms other well-known EC-based multi-objective feature selection algorithms.

This thesis develops a filter, single-objective feature selection approach based on PSO and information theory. Two measures are proposed to evaluate the relevance of the selected features, based on each pair of features and on a group of features, respectively. The results show that PSO- and information-based algorithms can successfully address feature selection tasks. The group-based method achieves higher classification accuracies, but the pair-based method is faster and selects smaller feature subsets.

This thesis proposes the first PSO-based multi-objective filter feature selection approach using information-based measures. This work is also the first to use two other well-known multi-objective EC algorithms in filter feature selection, which are also used to compare the performance of the PSO-based approach. The results show that the PSO-based multi-objective filter approach can successfully address feature selection problems, outperform single-objective filter algorithms, and achieve better classification performance than the other multi-objective algorithms.

This thesis investigates the difference between wrapper and filter approaches in terms of classification performance and computational time, and also examines the generality of wrappers. The results show that wrappers generally achieve similar or better classification performance than filters, but do not always need longer computational time than filters. The results also show that wrappers built with simple classification algorithms can generalise to other classification algorithms.
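As a concrete illustration of the wrapper setting these chapters evaluate, here is a hedged sketch of a single-objective wrapper fitness that blends cross-validated error with subset size; the KNN classifier and the weight `alpha` are assumptions, not the thesis's exact formulation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def wrapper_fitness(mask, X, y, alpha=0.9):
    """Lower is better: alpha trades classification error against subset size."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 1.0  # penalise the empty feature subset
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=5).mean()
    return alpha * (1.0 - acc) + (1.0 - alpha) * mask.sum() / mask.size
```

In the multi-objective chapters, the two terms are kept as separate objectives rather than combined into a single weighted sum.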




2018 ◽  
Vol 13 (3) ◽  
pp. 323-336 ◽  
Author(s):  
Naeimeh Elkhani ◽  
Ravie Chandren Muniyandi ◽  
Gexiang Zhang

Computational cost is a major challenge for almost all intelligent algorithms run on a CPU. Our proposed kernel P system multi-objective binary particle swarm optimization method for feature selection and classification must therefore run efficiently, which we aimed to achieve by exploiting the parallelism and non-determinism of membrane computing. Moreover, GPUs perform best on latency-tolerant, highly parallel, and independent tasks. In this study, to realise the potential of the membrane-inspired model, particularly its parallelism, and to reduce the time cost, the feature selection method was implemented on a GPU. Comparing the running time of the proposed method on CPU, GPU, and multicore platforms shows a significant improvement for the GPU implementation.
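The gain comes from the fact that the candidates' fitness evaluations are mutually independent, so they map naturally onto parallel hardware. A minimal CPU-side sketch of this idea, using a process pool as a stand-in for a GPU kernel launch, is:

```python
from multiprocessing import Pool

def evaluate_population(fitness, population, workers=8):
    # Each candidate's fitness is independent of the others, so the
    # evaluations can run concurrently; on a GPU the same structure
    # becomes one thread (or block) per candidate.
    with Pool(processes=workers) as pool:
        return pool.map(fitness, population)
```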


2013 ◽  
Vol 712-715 ◽  
pp. 2529-2533
Author(s):  
Yu Ping Qin ◽  
Peng Da Qin ◽  
Shu Xian Lun ◽  
Yi Wang

A new SVM multi-class classification algorithm is proposed. First, an optimal binary tree is constructed according to the scale and the distribution area of each class's samples, and then sub-classifiers are trained for every non-leaf node in the binary tree. To classify a sample, classification proceeds from the root node down to a leaf node, and the class corresponding to that leaf node is the class of the sample. The experimental results show that the algorithm improves both classification precision and classification speed; especially when a class has few samples but a large distribution area, the algorithm can greatly improve classification performance.
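A minimal sketch of the tree-based scheme, assuming a simple class-count heuristic for splitting the classes in place of the paper's scale and distribution-area criterion:

```python
import numpy as np
from sklearn.svm import SVC

class Node:
    def __init__(self, classes):
        self.classes = classes        # class labels reachable from this node
        self.clf = None               # binary SVM at non-leaf nodes
        self.left = self.right = None

def build_tree(X, y, classes):
    node = Node(classes)
    if len(classes) == 1:
        return node                   # leaf: a single class remains
    # Assumed heuristic: alternate classes, ordered by sample count,
    # between the two branches (the paper uses scale/distribution area).
    ordered = sorted(classes, key=lambda c: np.sum(y == c), reverse=True)
    left, right = ordered[::2], ordered[1::2]
    mask = np.isin(y, classes)
    node.clf = SVC().fit(X[mask], np.isin(y[mask], left).astype(int))
    node.left = build_tree(X, y, left)
    node.right = build_tree(X, y, right)
    return node

def classify(node, x):
    while node.clf is not None:       # walk from the root down to a leaf
        go_left = node.clf.predict(x.reshape(1, -1))[0] == 1
        node = node.left if go_left else node.right
    return node.classes[0]
```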


2021 ◽  
pp. 1-19
Author(s):  
Yu Xue ◽  
Haokai Zhu ◽  
Ferrante Neri

In classification tasks, feature selection (FS) can reduce the data dimensionality and may also improve classification accuracy, and these two goals are commonly treated as the two objectives in FS problems. Many meta-heuristic algorithms have been applied to solve FS problems, and they perform satisfactorily when the problem is relatively simple. However, once the dimensionality of the dataset grows, their performance drops dramatically. This paper proposes a self-adaptive multi-objective genetic algorithm (SaMOGA) for FS, designed to maintain high performance even as the dimensionality of the dataset grows. The main concept of SaMOGA lies in the dynamic selection among five different crossover operators at different stages of the evolutionary process through a self-adaptive mechanism (see the sketch below). Meanwhile, a search stagnation detection mechanism is also proposed to prevent premature convergence. In the experiments, we compare SaMOGA with five multi-objective FS algorithms on sixteen datasets. According to the experimental results, SaMOGA yields a set of well-converged and well-distributed solutions on most datasets, indicating that SaMOGA can maintain classification performance while removing many features, and its advantage over the counterparts becomes more obvious as the dimensionality of the dataset grows.
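A hedged sketch of the self-adaptive operator-selection idea: each crossover operator keeps a selection probability that is nudged up when its offspring improve on their parents. The credit-assignment rule and constants below are assumptions, not SaMOGA's exact mechanism.

```python
import random

class OperatorSelector:
    """Pick among crossover operators with probabilities adapted online."""
    def __init__(self, operators, lr=0.1, p_min=0.05):
        self.ops = operators
        self.p = [1.0 / len(operators)] * len(operators)
        self.lr, self.p_min = lr, p_min

    def pick(self):
        i = random.choices(range(len(self.ops)), weights=self.p)[0]
        return i, self.ops[i]

    def update(self, i, success):
        # Shift probability mass toward operators whose offspring improved
        # on their parents; keep a floor so no operator dies out entirely.
        self.p[i] += self.lr * ((1.0 if success else 0.0) - self.p[i])
        self.p = [max(q, self.p_min) for q in self.p]
        total = sum(self.p)
        self.p = [q / total for q in self.p]
```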


2021 ◽  
Author(s):  
B Tran ◽  
Bing Xue ◽  
Mengjie Zhang

In machine learning, discretization and feature selection (FS) are important techniques for preprocessing data to improve the performance of an algorithm on high-dimensional data. Since many FS methods require discrete data, a common practice is to apply discretization before FS. In addition, for the sake of efficiency, features are usually discretized individually (i.e., univariately). This scheme rests on the assumption that each feature influences the task independently, which may not hold when feature interactions exist. Therefore, univariate discretization may degrade the performance of the FS stage, since information about feature interactions may be lost during the discretization process. Initial results of our previously proposed method, evolve particle swarm optimization (EPSO), showed that combining discretization and FS in a single stage using bare-bones particle swarm optimization (BBPSO) can lead to better performance than applying them in two separate stages. In this paper, we propose a new method called potential particle swarm optimization (PPSO), which employs a new representation that can reduce the search space of the problem and a new fitness function to better evaluate candidate solutions and guide the search. The results on ten high-dimensional datasets show that PPSO selects less than 5% of the features on all datasets. Compared with the two-stage approach, which uses BBPSO for FS on the discretized data, PPSO achieves significantly higher accuracy on seven datasets. In addition, PPSO obtains better (or similar) classification performance than EPSO on eight datasets, with a smaller number of selected features on six datasets. Furthermore, PPSO also outperforms the three compared (traditional) methods and performs similarly to one method on most datasets in terms of both generalization ability and learning capacity.
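To make the single-stage idea concrete, here is a hedged sketch of one plausible joint encoding (not PPSO's exact representation): each particle dimension holds a candidate cut-point for one feature, a cut-point outside the feature's observed range deselects that feature, and selected features are binarized at their cut-points.

```python
import numpy as np

def decode(particle, X):
    """Decode a real-valued particle into (selected mask, discretized data)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    selected = (particle >= lo) & (particle <= hi)  # in-range cut-point => keep
    # Binarize each selected feature at its cut-point (two-interval split).
    X_disc = (X[:, selected] > particle[selected]).astype(int)
    return selected, X_disc
```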


Algorithms ◽  
2021 ◽  
Vol 14 (9) ◽  
pp. 260
Author(s):  
Naomi Simumba ◽  
Suguru Okami ◽  
Akira Kodaka ◽  
Naohiko Kohtake

Feature selection is crucial to the credit-scoring process, allowing for the removal of irrelevant variables with low predictive power. Conventional credit-scoring techniques treat this as a separate process in which features are selected to improve a single statistical measure, such as accuracy; however, recent research has focused on meaningful business parameters such as profit. More than one factor may be important to the selection process, making multi-objective optimization methods a necessity. However, the comparative performance of multi-objective methods is known to vary with the test problem and the specific implementation. This research employed a recent hybrid non-dominated sorting binary Grasshopper Optimization Algorithm (NSBGOA) and compared its performance on multi-objective feature selection for credit scoring to that of two popular benchmark algorithms in this space. A further comparison was made to determine the impact of changing the profit-maximizing base classifier on algorithm performance. Experiments demonstrate that, of the base classifiers used, the neural network classifier improved the profit-based measure and minimized the mean number of features in the population the most. Additionally, the NSBGOA algorithm gave relatively smaller hypervolumes and longer computational times across all base classifiers, while giving the highest mean objective values for the solutions. It is clear that the base classifier has a significant impact on the results of multi-objective optimization; therefore, the choice of base classifier deserves careful consideration in such scenarios.
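As an illustration of the two objectives such a comparison optimises, here is a hedged sketch pairing a simplistic profit proxy with the feature count, assuming binary labels (1 = good applicant, 0 = defaulter); the base classifier, the gain/cost figures, and the profit formula are assumptions, not the paper's profit measure.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

def objectives(mask, X, y, gain_tp=1.0, cost_fp=5.0):
    """Return (negated profit proxy, feature count); both are minimised."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return np.inf, 0
    pred = cross_val_predict(MLPClassifier(max_iter=300), X[:, mask], y, cv=5)
    tp = np.sum((pred == 1) & (y == 1))   # accepted good applicants
    fp = np.sum((pred == 1) & (y == 0))   # accepted eventual defaulters
    return -(gain_tp * tp - cost_fp * fp), int(mask.sum())
```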

