fGAAM: A fast and resizable genetic algorithm with aggressive mutation for feature selection

Author(s):  
Izabela Rejer ◽  
Jarosław Jankowski

Abstract The paper introduces a modified version of the genetic algorithm with aggressive mutation (GAAM), called fGAAM (fast GAAM), that significantly decreases the time needed to find feature subsets of satisfactory classification accuracy. To demonstrate the time gains provided by fGAAM, both algorithms were tested on eight datasets containing different numbers of features, classes, and examples. The fGAAM was also compared with four reference methods: the Holland GA with and without a penalty term, the Culling GA, and NSGA II. Results: (i) The fGAAM processing time was about 35% shorter than that of the original GAAM. (ii) The fGAAM was also 20 times quicker than the two Holland GAs and 50 times quicker than NSGA II. (iii) For datasets with different numbers of features, classes, and examples, a different number of individuals stored for further processing provided the highest acceleration. On average, the best results were obtained when individuals from the last 10 populations were stored (time acceleration: 36.39%) or when the number of individuals to be stored was calculated by the algorithm itself (time acceleration: 35.74%). (iv) The fGAAM was able to process all datasets used in the study, even those that, because of their high number of features, could not be processed by the two Holland GAs and NSGA II.
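
To make the mechanism concrete, the following is a minimal Python sketch, not the authors' implementation, of two ingredients the abstract names: the aggressive-mutation step of GAAM and a cache of recently evaluated individuals, which is the kind of reuse that gives fGAAM its speed-up. The fixed-length integer encoding and all names here are assumptions.

```python
import random

def aggressive_mutation(parent, n_features, copies=5):
    """Create several mutated copies of a parent subset; each copy swaps
    one gene (feature index) for a random feature not already present.
    Assumes len(parent) < n_features."""
    offspring = []
    for _ in range(copies):
        child = list(parent)
        pos = random.randrange(len(child))
        child[pos] = random.choice(
            [f for f in range(n_features) if f not in child])
        offspring.append(tuple(sorted(child)))
    return offspring

# fGAAM-style reuse: individuals seen in the last few populations keep
# their (expensive) classifier-based fitness instead of being re-evaluated.
fitness_cache = {}

def cached_fitness(individual, evaluate):
    if individual not in fitness_cache:
        fitness_cache[individual] = evaluate(individual)  # e.g. CV accuracy
    return fitness_cache[individual]
```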

2021 ◽  
pp. 2796-2812
Author(s):  
Nishath Ansari

Feature selection, a method of dimensionality reduction, is the task of selecting appropriate feature subsets from the total set of features. This paper presents a point-by-point review of feature selection, the problems it addresses, and its appraisal techniques. The discussion begins with a straightforward approach to handling features and the associated problems using meta-heuristic strategies, which help in obtaining the best feature subsets. The paper then discusses system models that derive naturally from the environment and the calculations they perform for handling feature selection in complex and massive data. Algorithms such as the genetic algorithm (GA), the Non-Dominated Sorting Genetic Algorithm (NSGA-II), Particle Swarm Optimization (PSO), and other meta-heuristic strategies are discussed. A comparison of these algorithms shows that feature selection benefits machine learning algorithms by improving their performance. The paper also presents various real-world applications of feature selection.
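
Of the meta-heuristics the review names, binary PSO is the easiest to illustrate. The sketch below is a generic sigmoid-transfer variant, not taken from any of the surveyed papers: one velocity/position update where each bit of a particle marks whether the corresponding feature is kept.

```python
import numpy as np

def binary_pso_step(positions, velocities, pbest, gbest,
                    w=0.7, c1=1.5, c2=1.5):
    """One update of binary PSO for feature selection. positions is an
    (n_particles, n_features) 0/1 array; pbest/gbest are the personal
    and global best bit-vectors."""
    r1 = np.random.rand(*positions.shape)
    r2 = np.random.rand(*positions.shape)
    velocities = (w * velocities
                  + c1 * r1 * (pbest - positions)
                  + c2 * r2 * (gbest - positions))
    prob = 1.0 / (1.0 + np.exp(-velocities))          # sigmoid transfer
    positions = (np.random.rand(*positions.shape) < prob).astype(int)
    return positions, velocities
```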


Author(s):  
Alok Kumar Shukla ◽  
Pradeep Singh ◽  
Manu Vardhan

The explosion of high-dimensional datasets in scientific repositories has encouraged interdisciplinary research on data mining, pattern recognition, and bioinformatics. The fundamental problem of an individual Feature Selection (FS) method is extracting informative features for a classification model and screening for malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that, for a given cardinality, there can be several subsets with similar information. This paper introduces a novel hybrid FS algorithm, called Filter-Wrapper Feature Selection (FWFS), for classification problems and addresses the limitations of existing methods. In the proposed model, the front-end filter ranking method, Conditional Mutual Information Maximization (CMIM), selects the high-ranked feature subset, while the succeeding method, a Binary Genetic Algorithm (BGA), accelerates the search for significant feature subsets. One of the merits of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without sacrificing classification accuracy on the reduced dataset when a learning model is applied to the selected subsets of features. The efficacy of the proposed FWFS method is examined using a Naive Bayes (NB) classifier, which serves as the fitness function. The effectiveness of the selected feature subsets is evaluated using numerous classifiers on five biological datasets and five UCI datasets of varied dimensionality and numbers of instances. The experimental results emphasize that the proposed method significantly reduces the number of features and outperforms the existing methods. For the microarray datasets, the lowest classification accuracy is 61.24% on the SRBCT dataset and the highest is 99.32% on Diffuse Large B-Cell Lymphoma (DLBCL). For the UCI datasets, the lowest classification accuracy is 40.04% on Lymphography using k-nearest neighbor (k-NN) and the highest is 99.05% on Ionosphere using a support vector machine (SVM).
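
A hedged sketch of the two-stage pipeline described above: a filter ranking followed by a wrapper fitness for the binary GA. CMIM has no standard scikit-learn implementation, so mutual_info_classif stands in for the filter score here; the NB fitness mirrors the paper's use of Naive Bayes as the fitness function. Parameter values are illustrative.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def filter_stage(X, y, keep=50):
    """Filter stage: rank features and keep the top `keep` indices.
    (mutual_info_classif is a stand-in for CMIM.)"""
    scores = mutual_info_classif(X, y)
    return np.argsort(scores)[::-1][:keep]

def nb_fitness(X, y, mask):
    """Wrapper fitness for the BGA: NB cross-validated accuracy on the
    feature subset encoded by the 0/1 array `mask`."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask.astype(bool)], y, cv=5).mean()
```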


2012 ◽  
Vol 165 ◽  
pp. 232-236 ◽  
Author(s):  
Mohd Haniff Osman ◽  
Z.M. Nopiah ◽  
S. Abdullah

Having relevant features to represent a dataset enables learning algorithms to provide a highly accurate classification system in less time. Unfortunately, one good set of features sometimes does not fit all learning algorithms. To confirm that the choice of learning algorithm does not affect system accuracy, the user has to validate that the given dataset is a feature-oriented dataset. Thus, in this study we propose a simple verification procedure based on a multi-objective approach by means of the elitist Non-dominated Sorting Genetic Algorithm (NSGA-II). The way NSGA-II is used in this work is quite similar to an ordinary feature selection procedure except for the interpretation of the results, i.e., the set of optimal solutions. Two conflicting minimization objectives, classification error and number of used features, are taken as the objective functions. A case study of fatigue segment classification was chosen for the purpose of this study, with simulations repeated using four single classifiers: Naive Bayes, k-nearest neighbours, decision tree, and radial basis function. The proposed procedure demonstrates that only two features are needed for the fatigue segment classification task, without having to worry about the choice of learning algorithm.
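
The core of the multi-objective formulation is Pareto dominance over the two minimization objectives, classification error and number of used features. A minimal illustrative sketch, not the NSGA-II implementation used in the paper:

```python
def dominates(a, b):
    """a dominates b if it is no worse in both objectives and strictly
    better in at least one. Each solution is a tuple
    (classification_error, n_features_used), both minimized."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Return the non-dominated set, i.e. the trade-off curve that the
    NSGA-II run reports instead of a single best subset."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]

# Example: pareto_front([(0.10, 5), (0.12, 2), (0.10, 7), (0.30, 1)])
# -> [(0.10, 5), (0.12, 2), (0.30, 1)]   ((0.10, 7) is dominated)
```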


Energies ◽  
2018 ◽  
Vol 11 (10) ◽  
pp. 2641 ◽  
Author(s):  
Aydin Jadidi ◽  
Raimundo Menezes ◽  
Nilmar de Souza ◽  
Antonio de Castro Lima

The use of photovoltaics is still considered challenging because of certain reliability issues and high dependence on the global horizontal irradiance (GHI). GHI forecasting has wide applications, from grid safety to supply–demand balance and economic load dispatching. Given a dataset, a multi-layer perceptron neural network (MLPNN) is a strong tool for solving forecasting problems. Furthermore, noise detection and feature selection in a dataset with numerous variables, including meteorological parameters and previous values of GHI, are of crucial importance for obtaining the desired results. This paper employs the density-based spatial clustering of applications with noise (DBSCAN) and non-dominated sorting genetic algorithm II (NSGA II) algorithms for noise detection and feature selection, respectively. Tuning the neural network is another important issue, which includes choosing the hidden layer size and the activation functions between the layers of the network. Previous studies have utilized combinations of different parameters based on trial and error, which is inefficient for accurate selection of the desired features and tuning of the neural network. In this research, two different methods, namely the particle swarm optimization (PSO) algorithm and the genetic algorithm (GA), are utilized to tune the MLPNN, and the results of one-hour-ahead forecasting of the GHI are compared. The methodology is validated using hourly data for Elizabeth City, North Carolina, USA, and the results demonstrate a better performance of GA in comparison with PSO. The GA-tuned MLPNN reported a normalized root mean square error (nRMSE) of 0.0458 and a normalized mean absolute error (nMAE) of 0.0238.
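
A short sketch of the preprocessing and fitness pieces, assuming scikit-learn as a stand-in for the authors' tooling: DBSCAN marks noise samples with the label -1, and the cross-validated error of one MLP configuration is what a GA or PSO tuner would minimize. The eps, min_samples, and network settings are illustrative.

```python
from sklearn.cluster import DBSCAN
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

def drop_noise(X, y, eps=0.5, min_samples=5):
    """DBSCAN labels outliers as -1; keep only clustered samples."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    keep = labels != -1
    return X[keep], y[keep]

def mlp_score(X, y, hidden=(32,), activation="relu"):
    """Candidate fitness for the GA/PSO tuner: cross-validated RMSE of
    one MLP configuration (hidden layer sizes + activation)."""
    model = MLPRegressor(hidden_layer_sizes=hidden,
                         activation=activation, max_iter=500)
    return -cross_val_score(model, X, y, cv=3,
                            scoring="neg_root_mean_squared_error").mean()
```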


Author(s):  
Cheng-San Yang ◽  
Li-Yeh Chuang ◽  
Chao-Hsuan Ke ◽  
Cheng-Hong Yang ◽  
...  

Microarray data referencing gene expression profiles provides valuable answers to a variety of problems and contributes to advances in clinical medicine. The application of microarray data to the classification of cancer types has recently assumed increasing importance. The classification of microarray data samples involves feature selection, whose goal is to identify subsets of differentially expressed genes potentially relevant for distinguishing sample classes, and classifier design. We propose an efficient evolutionary approach for selecting gene subsets from gene expression data that achieves higher accuracy for classification problems. Our proposal combines a shuffled frog-leaping algorithm (SFLA) and a genetic algorithm (GA) to choose genes (features) relevant to classification. The K-nearest neighbor (KNN) classifier with leave-one-out cross-validation (LOOCV) is used to evaluate classification accuracy. We apply this novel hybrid SFLA-GA and KNN approach to 11 classification problems from the literature. Experimental results show that the classification accuracy obtained with the selected features was higher than the accuracy on the datasets without feature selection.
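
The fitness evaluation is the easiest piece to pin down: LOOCV accuracy of a KNN classifier on a candidate gene subset. A minimal scikit-learn sketch (the SFLA-GA search itself is omitted; k=1 is an assumption):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def loocv_accuracy(X_subset, y, k=1):
    """Leave-one-out accuracy of KNN on a candidate gene subset --
    the fitness a hybrid like SFLA-GA would maximize."""
    knn = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(knn, X_subset, y, cv=LeaveOneOut()).mean()
```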


2010 ◽  
Vol 07 (02) ◽  
pp. 245-261 ◽  
Author(s):  
Qirong Mao ◽  
Xiaojia Wang ◽  
Yongzhao Zhan

In this paper, in order to improve classification accuracy with as few features as possible, a new hierarchical recognition method based on an improved SVM decision tree, together with a layered feature selection method combining a neural network with a genetic algorithm, is proposed. The improved SVM decision tree is constructed according to the confusion degrees between two emotions or between two emotion groups. The classifier in each node of the improved decision tree is an SVM. On the emotional speech corpus recorded by our workgroup, which includes 7 emotions, and using the features and parameters obtained by the method combining a neural network with a genetic algorithm, the improved SVM decision tree, multi-SVM, SVM-based binary decision tree, the traditional SVM-based decision directed acyclic graph, and HMM are evaluated. The experiments reveal that, compared with the other four methods, the proposed method achieves better classification accuracy with fewer features and less time.
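
A sketch of a single node of such an SVM decision tree, assuming the two emotion groups at each node have already been chosen from the confusion degrees (that grouping step is not reproduced here); the class and parameter names are illustrative, not the authors' code.

```python
from sklearn.svm import SVC

class SVMTreeNode:
    """One node of an SVM decision tree: an SVC separates two emotion
    groups; child nodes split each group further until single labels
    remain at the leaves."""
    def __init__(self, left_labels, right_labels):
        self.left_labels = set(left_labels)
        self.right_labels = set(right_labels)
        self.svm = SVC(kernel="rbf")
        self.left = self.right = None   # child SVMTreeNodes, or None at a leaf

    def fit(self, X, y):
        # Train the node SVM to send each sample to its group's side.
        side = [0 if label in self.left_labels else 1 for label in y]
        self.svm.fit(X, side)
        return self
```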


2021 ◽  
Vol 2131 (3) ◽  
pp. 032025
Author(s):  
Oleg Agibalov ◽  
Nikolay Ventsov

Abstract The problem under consideration consists in choosing a number k of individuals such that the time for processing k individuals by the genetic algorithm (GA) on the CPU architecture is close to the time for processing l individuals by the genetic algorithm on the GPU architecture. The initial information is data arrays containing the processing times of a given number of individuals by the genetic algorithm on the available hardware architectures. From these arrays, two fuzzy numbers are determined that describe the processing time of a given number of individuals on the CPU and GPU architectures, respectively. The peculiarities of the subject area do not allow the well-known comparison methods, based on the equality of membership functions and on the nearest crisp sets, to be considered adequate. Based on the known formula "close to Y (around Y)", a way to compare the two fuzzy numbers was developed in order to determine the degree of closeness of the processing times of k and l individuals on the CPU and GPU hardware architectures, respectively.
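
The paper's exact comparison formula is not reproduced above, but the standard possibility-of-overlap measure between two fuzzy numbers illustrates the idea of a degree of closeness: triangular membership functions for the two processing times, compared by their best overlap. A hedged sketch, not the authors' method:

```python
def triangular(a, b, c):
    """Membership function of a triangular fuzzy number (a, b, c):
    peak at b, support [a, c]. Assumes a < b < c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def closeness(mu_cpu, mu_gpu, grid):
    """Possibility-style degree that the two fuzzy processing times are
    'close': the best overlap of the two membership functions."""
    return max(min(mu_cpu(x), mu_gpu(x)) for x in grid)

# Usage: closeness(triangular(1.0, 1.5, 2.0), triangular(1.4, 1.8, 2.3),
#                  grid=[t / 100 for t in range(0, 300)])
```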


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Jing Bian ◽  
Xin-guang Peng ◽  
Ying Wang ◽  
Hai Zhang

In the era of big data, feature selection is an essential process in machine learning. Although the class imbalance problem has recently attracted a great deal of attention, little effort has been devoted to feature selection techniques for it. In addition, most applications involving feature selection focus on classification accuracy and ignore cost, although costs are important. To cope with imbalance problems, we developed a cost-sensitive feature selection algorithm that adds a cost-based evaluation function to filter feature selection using a chaos genetic algorithm, referred to as CSFSG. The evaluation function considers both feature-acquiring costs (test costs) and misclassification costs in the field of network security, thereby weakening the influence of the many instances from the majority classes in large-scale datasets. The CSFSG algorithm reduces the total cost of feature selection and trades off both factors. The behavior of the CSFSG algorithm is tested on a large-scale network security dataset, using two kinds of classifiers: C4.5 and k-nearest neighbor (KNN). The experimental results show that the approach is efficient and able to effectively improve classification accuracy and decrease classification time. In addition, the results of our method are more promising than those of other cost-sensitive feature selection algorithms.
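
Two pieces of such a method are easy to sketch: seeding the population from a logistic chaos map (a common reading of "chaos genetic algorithm") and a fitness that adds test costs to misclassification cost. Both are illustrative assumptions, not the CSFSG code.

```python
import numpy as np

def chaotic_population(n_individuals, n_features, x0=0.7):
    """Logistic-map chaos sequence, thresholded into binary chromosomes --
    one common way a chaos genetic algorithm seeds its population."""
    seq, x = [], x0
    for _ in range(n_individuals * n_features):
        x = 4.0 * x * (1.0 - x)          # logistic map in chaotic regime
        seq.append(x)
    return (np.array(seq).reshape(n_individuals, n_features) > 0.5).astype(int)

def total_cost(mask, test_costs, misclassification_cost):
    """Cost that CSFSG-style methods minimize: cost of acquiring the
    selected features plus the misclassification cost of the resulting
    classifier. `mask` is a 0/1 feature vector."""
    return float(np.dot(mask, test_costs)) + misclassification_cost
```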


2021 ◽  
Vol 11 (2) ◽  
pp. 817-835
Author(s):  
Dr. M. Praveena ◽  
Dr. V. Jaiganesh

Background: High-dimensional datasets suffer from the curse of dimensionality, which makes data mining a more difficult task. Feature selection in the knowledge discovery process provides a solution to this curse-of-dimensionality issue and helps the classification task by reducing time complexity and improving accuracy. Objectives: This paper aims to identify a bio-inspired algorithm that best suits feature selection and to utilize optimized feature selection techniques. This algorithm is used to design machine learning classifiers suitable for multiple datasets, including high-dimensional ones, and to carry out performance analysis with regard to classification accuracy and processing time. Methods: This study employs an improved form of the grasshopper optimization algorithm to perform the feature selection task. An evolutionary outlay-aware deep belief network is used to perform the classification task. Findings: In this research, 20 UCI benchmark datasets are taken, with up to 60 features and 30,000 instances. The datasets are Mammography, Monks-1, Bupa, Credit, Parkinson's, Monk-2, Sonar, Ecoli, Prognostic, Ionosphere, Monk-3, Yeast, Car, Blood, Pima, Spect, Vert, Prognostic, Contraceptive, and Tic-Tac-Toe endgame. Table 1 describes the dataset details: the datasets, numbers of instances, and numbers of features. The experiments are performed using the MATLAB 6.0 tool running on Microsoft Windows 8, on a Core i3 processor with a 1 TB hard disk and 8 GB RAM. Performance measures such as classification accuracy and processing time for classification are evaluated. Novelty: Interestingly, the Improved Grasshopper Optimization Algorithm uses the error rate and classification accuracy of the Evolutionary Outlay Aware Deep Belief Network classifier as fitness function values. This combined work of classification and feature selection is briefly denoted IGOA-EOA-DBNC. Twenty datasets are selected for testing the performance regarding elapsed time and accuracy, which gives better results.
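
As a rough illustration only: the grasshopper optimizer's characteristic social-attraction function, and a weighted wrapper fitness of the kind such hybrids minimize. The paper states that the DBN classifier's error rate and accuracy serve as fitness values; the alpha-weighted form below is a common stand-in, not the authors' exact function.

```python
import numpy as np

def s_func(r, f=0.5, l=1.5):
    """Social-attraction function of the grasshopper optimization
    algorithm: s(r) = f * exp(-r / l) - exp(-r)."""
    return f * np.exp(-r / l) - np.exp(-r)

def wrapper_fitness(mask, classifier_error, alpha=0.99):
    """Weighted wrapper fitness (both terms minimized): classifier error
    plus the fraction of features kept. alpha = 0.99 is illustrative."""
    return alpha * classifier_error + (1 - alpha) * mask.mean()
```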

