fGAAM: A fast and resizable genetic algorithm with aggressive mutation for feature selection

Author(s):  
Izabela Rejer ◽  
Jarosław Jankowski

Abstract The paper introduces a modified version of the genetic algorithm with aggressive mutation (GAAM), called fGAAM (fast GAAM), that significantly decreases the time needed to find feature subsets of satisfactory classification accuracy. To demonstrate the time gains provided by fGAAM, both algorithms were tested on eight datasets containing different numbers of features, classes, and examples. The fGAAM was also compared with four reference methods: the Holland GA with and without a penalty term, the Culling GA, and NSGA II. Results: (i) The fGAAM processing time was about 35% shorter than that of the original GAAM. (ii) The fGAAM was also 20 times quicker than the two Holland GAs and 50 times quicker than NSGA II. (iii) For datasets with different numbers of features, classes, and examples, a different number of individuals stored for further processing provided the highest acceleration. On average, the best results were obtained when individuals from the last 10 populations were stored (time acceleration: 36.39%) or when the number of individuals to be stored was calculated by the algorithm itself (time acceleration: 35.74%). (iv) The fGAAM was able to process all datasets used in the study, even those that, because of their high number of features, could not be processed by the two Holland GAs and NSGA II.
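
To make the mechanism concrete, the following is a minimal Python sketch, not the authors' implementation, of two ingredients the abstract names: the aggressive-mutation step of GAAM and a cache of recently evaluated individuals, which is the kind of reuse that gives fGAAM its speed-up. The fixed-length integer encoding and all names here are assumptions.

```python
import random

def aggressive_mutation(parent, n_features, copies=5):
    """Create several mutated copies of a parent subset; each copy swaps
    one gene (feature index) for a random feature not already present.
    Assumes len(parent) < n_features."""
    offspring = []
    for _ in range(copies):
        child = list(parent)
        pos = random.randrange(len(child))
        child[pos] = random.choice(
            [f for f in range(n_features) if f not in child])
        offspring.append(tuple(sorted(child)))
    return offspring

# fGAAM-style reuse: individuals seen in the last few populations keep
# their (expensive) classifier-based fitness instead of being re-evaluated.
fitness_cache = {}

def cached_fitness(individual, evaluate):
    if individual not in fitness_cache:
        fitness_cache[individual] = evaluate(individual)  # e.g. CV accuracy
    return fitness_cache[individual]
```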

2021 ◽  
pp. 2796-2812
Author(s):  
Nishath Ansari

Feature selection, a method of dimensionality reduction, is the task of selecting appropriate feature subsets from the total set of features. This paper presents a point-by-point review of feature selection, the problems it addresses, and its appraisal techniques. The discussion begins with a straightforward approach to handling features and the associated problems using meta-heuristic strategies, which help in obtaining the best feature subsets. The paper then discusses system models that derive naturally from the environment and the calculations they perform for handling feature selection in complex and massive data. Algorithms such as the genetic algorithm (GA), the Non-Dominated Sorting Genetic Algorithm (NSGA-II), Particle Swarm Optimization (PSO), and other meta-heuristic strategies are discussed. A comparison of these algorithms shows that feature selection benefits machine learning algorithms by improving their performance. The paper also presents various real-world applications of feature selection.
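
Of the meta-heuristics the review names, binary PSO is the easiest to illustrate. The sketch below is a generic sigmoid-transfer variant, not taken from any of the surveyed papers: one velocity/position update where each bit of a particle marks whether the corresponding feature is kept.

```python
import numpy as np

def binary_pso_step(positions, velocities, pbest, gbest,
                    w=0.7, c1=1.5, c2=1.5):
    """One update of binary PSO for feature selection. positions is an
    (n_particles, n_features) 0/1 array; pbest/gbest are the personal
    and global best bit-vectors."""
    r1 = np.random.rand(*positions.shape)
    r2 = np.random.rand(*positions.shape)
    velocities = (w * velocities
                  + c1 * r1 * (pbest - positions)
                  + c2 * r2 * (gbest - positions))
    prob = 1.0 / (1.0 + np.exp(-velocities))          # sigmoid transfer
    positions = (np.random.rand(*positions.shape) < prob).astype(int)
    return positions, velocities
```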


Author(s):  
Alok Kumar Shukla ◽  
Pradeep Singh ◽  
Manu Vardhan

The explosion of high-dimensional datasets in scientific repositories has encouraged interdisciplinary research on data mining, pattern recognition, and bioinformatics. The fundamental problem of an individual Feature Selection (FS) method is extracting informative features for a classification model and screening for malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that, for a given cardinality, there can be several subsets with similar information. This paper introduces a novel hybrid FS algorithm, called Filter-Wrapper Feature Selection (FWFS), for classification problems and addresses the limitations of existing methods. In the proposed model, the front-end filter ranking method, Conditional Mutual Information Maximization (CMIM), selects the high-ranked feature subset, while the succeeding method, a Binary Genetic Algorithm (BGA), accelerates the search for significant feature subsets. One of the merits of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without sacrificing classification accuracy on the reduced dataset when a learning model is applied to the selected subsets of features. The efficacy of the proposed FWFS method is examined using a Naive Bayes (NB) classifier, which serves as the fitness function. The effectiveness of the selected feature subsets is evaluated using numerous classifiers on five biological datasets and five UCI datasets of varied dimensionality and numbers of instances. The experimental results emphasize that the proposed method significantly reduces the number of features and outperforms the existing methods. For the microarray datasets, the lowest classification accuracy is 61.24% on the SRBCT dataset and the highest is 99.32% on Diffuse Large B-Cell Lymphoma (DLBCL). For the UCI datasets, the lowest classification accuracy is 40.04% on Lymphography using k-nearest neighbor (k-NN) and the highest is 99.05% on Ionosphere using a support vector machine (SVM).
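
A hedged sketch of the two-stage pipeline described above: a filter ranking followed by a wrapper fitness for the binary GA. CMIM has no standard scikit-learn implementation, so mutual_info_classif stands in for the filter score here; the NB fitness mirrors the paper's use of Naive Bayes as the fitness function. Parameter values are illustrative.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def filter_stage(X, y, keep=50):
    """Filter stage: rank features and keep the top `keep` indices.
    (mutual_info_classif is a stand-in for CMIM.)"""
    scores = mutual_info_classif(X, y)
    return np.argsort(scores)[::-1][:keep]

def nb_fitness(X, y, mask):
    """Wrapper fitness for the BGA: NB cross-validated accuracy on the
    feature subset encoded by the 0/1 array `mask`."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask.astype(bool)], y, cv=5).mean()
```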


2012 ◽  
Vol 165 ◽  
pp. 232-236 ◽  
Author(s):  
Mohd Haniff Osman ◽  
Z.M. Nopiah ◽  
S. Abdullah

Having relevant features to represent a dataset enables learning algorithms to provide a highly accurate classification system in less time. Unfortunately, one good set of features sometimes does not fit all learning algorithms. To confirm that the choice of learning algorithm does not affect system accuracy, the user has to validate that the given dataset is a feature-oriented dataset. Thus, in this study we propose a simple verification procedure based on a multi-objective approach by means of the elitist Non-dominated Sorting Genetic Algorithm (NSGA-II). The way NSGA-II is used in this work is quite similar to an ordinary feature selection procedure except for the interpretation of the results, i.e., the set of optimal solutions. Two conflicting minimization objectives, classification error and number of used features, are taken as the objective functions. A case study of fatigue segment classification was chosen for the purpose of this study, with simulations repeated using four single classifiers: Naive Bayes, k-nearest neighbours, decision tree, and radial basis function. The proposed procedure demonstrates that only two features are needed for the fatigue segment classification task, without having to worry about the choice of learning algorithm.
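
The core of the multi-objective formulation is Pareto dominance over the two minimization objectives, classification error and number of used features. A minimal illustrative sketch, not the NSGA-II implementation used in the paper:

```python
def dominates(a, b):
    """a dominates b if it is no worse in both objectives and strictly
    better in at least one. Each solution is a tuple
    (classification_error, n_features_used), both minimized."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Return the non-dominated set, i.e. the trade-off curve that the
    NSGA-II run reports instead of a single best subset."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]

# Example: pareto_front([(0.10, 5), (0.12, 2), (0.10, 7), (0.30, 1)])
# -> [(0.10, 5), (0.12, 2), (0.30, 1)]   ((0.10, 7) is dominated)
```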


Energies ◽  
2018 ◽  
Vol 11 (10) ◽  
pp. 2641 ◽  
Author(s):  
Aydin Jadidi ◽  
Raimundo Menezes ◽  
Nilmar de Souza ◽  
Antonio de Castro Lima

The use of photovoltaics is still considered challenging because of certain reliability issues and high dependence on the global horizontal irradiance (GHI). GHI forecasting has wide applications, from grid safety to supply–demand balance and economic load dispatching. Given a dataset, a multi-layer perceptron neural network (MLPNN) is a strong tool for solving forecasting problems. Furthermore, noise detection and feature selection in a dataset with numerous variables, including meteorological parameters and previous values of GHI, are of crucial importance for obtaining the desired results. This paper employs the density-based spatial clustering of applications with noise (DBSCAN) and non-dominated sorting genetic algorithm II (NSGA II) algorithms for noise detection and feature selection, respectively. Tuning the neural network is another important issue, which includes choosing the hidden layer size and the activation functions between the layers of the network. Previous studies have utilized combinations of different parameters based on trial and error, which is inefficient for accurate selection of the desired features and tuning of the neural network. In this research, two different methods, namely the particle swarm optimization (PSO) algorithm and the genetic algorithm (GA), are utilized to tune the MLPNN, and the results of one-hour-ahead forecasting of the GHI are compared. The methodology is validated using hourly data for Elizabeth City, North Carolina, USA, and the results demonstrate a better performance of GA in comparison with PSO. The GA-tuned MLPNN reported a normalized root mean square error (nRMSE) of 0.0458 and a normalized mean absolute error (nMAE) of 0.0238.
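
A short sketch of the preprocessing and fitness pieces, assuming scikit-learn as a stand-in for the authors' tooling: DBSCAN marks noise samples with the label -1, and the cross-validated error of one MLP configuration is what a GA or PSO tuner would minimize. The eps, min_samples, and network settings are illustrative.

```python
from sklearn.cluster import DBSCAN
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

def drop_noise(X, y, eps=0.5, min_samples=5):
    """DBSCAN labels outliers as -1; keep only clustered samples."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    keep = labels != -1
    return X[keep], y[keep]

def mlp_score(X, y, hidden=(32,), activation="relu"):
    """Candidate fitness for the GA/PSO tuner: cross-validated RMSE of
    one MLP configuration (hidden layer sizes + activation)."""
    model = MLPRegressor(hidden_layer_sizes=hidden,
                         activation=activation, max_iter=500)
    return -cross_val_score(model, X, y, cv=3,
                            scoring="neg_root_mean_squared_error").mean()
```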


Author(s):  
Cheng-San Yang ◽  
Li-Yeh Chuang ◽  
Chao-Hsuan Ke ◽  
Cheng-Hong Yang ◽  
...  

Microarray data referencing gene expression profiles provides valuable answers to a variety of problems and contributes to advances in clinical medicine. The application of microarray data to the classification of cancer types has recently assumed increasing importance. The classification of microarray data samples involves feature selection, whose goal is to identify subsets of differentially expressed genes potentially relevant for distinguishing sample classes, and classifier design. We propose an efficient evolutionary approach for selecting gene subsets from gene expression data that achieves higher accuracy for classification problems. Our proposal combines a shuffled frog-leaping algorithm (SFLA) and a genetic algorithm (GA) to choose genes (features) relevant to classification. The K-nearest neighbor (KNN) classifier with leave-one-out cross-validation (LOOCV) is used to evaluate classification accuracy. We apply this novel hybrid SFLA-GA and KNN approach to 11 classification problems from the literature. Experimental results show that the classification accuracy obtained with the selected features was higher than the accuracy on the datasets without feature selection.
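
The fitness evaluation is the easiest piece to pin down: LOOCV accuracy of a KNN classifier on a candidate gene subset. A minimal scikit-learn sketch (the SFLA-GA search itself is omitted; k=1 is an assumption):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def loocv_accuracy(X_subset, y, k=1):
    """Leave-one-out accuracy of KNN on a candidate gene subset --
    the fitness a hybrid like SFLA-GA would maximize."""
    knn = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(knn, X_subset, y, cv=LeaveOneOut()).mean()
```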


2010 ◽  
Vol 07 (02) ◽  
pp. 245-261 ◽  
Author(s):  
Qirong Mao ◽  
Xiaojia Wang ◽  
Yongzhao Zhan

In this paper, in order to improve classification accuracy with as few features as possible, a new hierarchical recognition method based on an improved SVM decision tree, together with a layered feature selection method combining a neural network with a genetic algorithm, is proposed. The improved SVM decision tree is constructed according to the confusion degrees between two emotions or between two emotion groups. The classifier in each node of the improved decision tree is an SVM. On the emotional speech corpus recorded by our workgroup, which includes 7 emotions, and using the features and parameters obtained by the method combining a neural network with a genetic algorithm, the improved SVM decision tree, multi-SVM, SVM-based binary decision tree, the traditional SVM-based decision directed acyclic graph, and HMM are evaluated. The experiments reveal that, compared with the other four methods, the proposed method achieves better classification accuracy with fewer features and less time.
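
A sketch of a single node of such an SVM decision tree, assuming the two emotion groups at each node have already been chosen from the confusion degrees (that grouping step is not reproduced here); the class and parameter names are illustrative, not the authors' code.

```python
from sklearn.svm import SVC

class SVMTreeNode:
    """One node of an SVM decision tree: an SVC separates two emotion
    groups; child nodes split each group further until single labels
    remain at the leaves."""
    def __init__(self, left_labels, right_labels):
        self.left_labels = set(left_labels)
        self.right_labels = set(right_labels)
        self.svm = SVC(kernel="rbf")
        self.left = self.right = None   # child SVMTreeNodes, or None at a leaf

    def fit(self, X, y):
        # Train the node SVM to send each sample to its group's side.
        side = [0 if label in self.left_labels else 1 for label in y]
        self.svm.fit(X, side)
        return self
```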


2021 ◽  
Vol 2131 (3) ◽  
pp. 032025
Author(s):  
Oleg Agibalov ◽  
Nikolay Ventsov

Abstract The problem under consideration consists in choosing a number k of individuals such that the time for processing k individuals by the genetic algorithm (GA) on the CPU architecture is close to the time for processing l individuals by the genetic algorithm on the GPU architecture. The initial information is data arrays containing the processing times of a given number of individuals by the genetic algorithm on the available hardware architectures. From these arrays, two fuzzy numbers are determined that describe the processing time of a given number of individuals on the CPU and GPU architectures, respectively. The peculiarities of the subject area do not allow the well-known comparison methods, based on the equality of membership functions and on the nearest crisp sets, to be considered adequate. Based on the known formula "close to Y (around Y)", a way to compare the two fuzzy numbers was developed in order to determine the degree of closeness of the processing times of k and l individuals on the CPU and GPU hardware architectures, respectively.
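
The paper's exact comparison formula is not reproduced above, but the standard possibility-of-overlap measure between two fuzzy numbers illustrates the idea of a degree of closeness: triangular membership functions for the two processing times, compared by their best overlap. A hedged sketch, not the authors' method:

```python
def triangular(a, b, c):
    """Membership function of a triangular fuzzy number (a, b, c):
    peak at b, support [a, c]. Assumes a < b < c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def closeness(mu_cpu, mu_gpu, grid):
    """Possibility-style degree that the two fuzzy processing times are
    'close': the best overlap of the two membership functions."""
    return max(min(mu_cpu(x), mu_gpu(x)) for x in grid)

# Usage: closeness(triangular(1.0, 1.5, 2.0), triangular(1.4, 1.8, 2.3),
#                  grid=[t / 100 for t in range(0, 300)])
```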


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Jing Bian ◽  
Xin-guang Peng ◽  
Ying Wang ◽  
Hai Zhang

In the era of big data, feature selection is an essential process in machine learning. Although the class imbalance problem has recently attracted a great deal of attention, little effort has been devoted to feature selection techniques for it. In addition, most applications involving feature selection focus on classification accuracy and ignore cost, although costs are important. To cope with imbalance problems, we developed a cost-sensitive feature selection algorithm that adds a cost-based evaluation function to filter feature selection using a chaos genetic algorithm, referred to as CSFSG. The evaluation function considers both feature-acquiring costs (test costs) and misclassification costs in the field of network security, thereby weakening the influence of the many instances from the majority classes in large-scale datasets. The CSFSG algorithm reduces the total cost of feature selection and trades off both factors. The behavior of the CSFSG algorithm is tested on a large-scale network security dataset, using two kinds of classifiers: C4.5 and k-nearest neighbor (KNN). The experimental results show that the approach is efficient and able to effectively improve classification accuracy and decrease classification time. In addition, the results of our method are more promising than those of other cost-sensitive feature selection algorithms.
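
Two pieces of such a method are easy to sketch: seeding the population from a logistic chaos map (a common reading of "chaos genetic algorithm") and a fitness that adds test costs to misclassification cost. Both are illustrative assumptions, not the CSFSG code.

```python
import numpy as np

def chaotic_population(n_individuals, n_features, x0=0.7):
    """Logistic-map chaos sequence, thresholded into binary chromosomes --
    one common way a chaos genetic algorithm seeds its population."""
    seq, x = [], x0
    for _ in range(n_individuals * n_features):
        x = 4.0 * x * (1.0 - x)          # logistic map in chaotic regime
        seq.append(x)
    return (np.array(seq).reshape(n_individuals, n_features) > 0.5).astype(int)

def total_cost(mask, test_costs, misclassification_cost):
    """Cost that CSFSG-style methods minimize: cost of acquiring the
    selected features plus the misclassification cost of the resulting
    classifier. `mask` is a 0/1 feature vector."""
    return float(np.dot(mask, test_costs)) + misclassification_cost
```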


2021 ◽  
Vol 11 (2) ◽  
pp. 817-835
Author(s):  
Dr. M. Praveena ◽  
Dr. V. Jaiganesh

Background: High-dimensional datasets suffer from the curse of dimensionality, which makes data mining a more difficult task. Feature selection in the knowledge discovery process provides a solution to this curse-of-dimensionality issue and helps the classification task by reducing time complexity and improving accuracy. Objectives: This paper aims to identify a bio-inspired algorithm that best suits feature selection and to utilize optimized feature selection techniques. This algorithm is used to design machine learning classifiers suitable for multiple datasets, including high-dimensional ones, and to carry out performance analysis with regard to classification accuracy and processing time. Methods: This study employs an improved form of the grasshopper optimization algorithm to perform the feature selection task. An evolutionary outlay-aware deep belief network is used to perform the classification task. Findings: In this research, 20 UCI benchmark datasets are taken, with up to 60 features and 30,000 instances. The datasets are Mammography, Monks-1, Bupa, Credit, Parkinson's, Monk-2, Sonar, Ecoli, Prognostic, Ionosphere, Monk-3, Yeast, Car, Blood, Pima, Spect, Vert, Prognostic, Contraceptive, and Tic-Tac-Toe endgame. Table 1 describes the dataset details: the datasets, numbers of instances, and numbers of features. The experiments are performed using the MATLAB 6.0 tool running on Microsoft Windows 8, on a Core i3 processor with a 1 TB hard disk and 8 GB RAM. Performance measures such as classification accuracy and processing time for classification are evaluated. Novelty: Interestingly, the Improved Grasshopper Optimization Algorithm uses the error rate and classification accuracy of the Evolutionary Outlay Aware Deep Belief Network classifier as fitness function values. This combined work of classification and feature selection is briefly denoted IGOA-EOA-DBNC. Twenty datasets are selected for testing the performance regarding elapsed time and accuracy, which gives better results.
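
As a rough illustration only: the grasshopper optimizer's characteristic social-attraction function, and a weighted wrapper fitness of the kind such hybrids minimize. The paper states that the DBN classifier's error rate and accuracy serve as fitness values; the alpha-weighted form below is a common stand-in, not the authors' exact function.

```python
import numpy as np

def s_func(r, f=0.5, l=1.5):
    """Social-attraction function of the grasshopper optimization
    algorithm: s(r) = f * exp(-r / l) - exp(-r)."""
    return f * np.exp(-r / l) - np.exp(-r)

def wrapper_fitness(mask, classifier_error, alpha=0.99):
    """Weighted wrapper fitness (both terms minimized): classifier error
    plus the fraction of features kept. alpha = 0.99 is illustrative."""
    return alpha * classifier_error + (1 - alpha) * mask.mean()
```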

