A Novel Community Detection Based Genetic Algorithm for Feature Selection

2020
Author(s):  
Mehrdad Rostami ◽  
Kamal Berahmand ◽  
Saman Forouzandeh

Abstract The selection of features is an essential data preprocessing stage in data mining. The core principle of feature selection is to pick a subset of candidate features by excluding features with almost no predictive information as well as highly correlated, redundant features. In the past several years, a variety of meta-heuristic methods have been introduced to eliminate redundant and irrelevant features as much as possible from high-dimensional datasets. One of the main disadvantages of existing meta-heuristic-based approaches is that they often neglect the correlation between the selected features. In this article, for the purpose of feature selection, the authors propose a genetic algorithm based on community detection, which functions in three steps. In the first step, the feature similarities are calculated. In the second step, the features are grouped into clusters by a community detection algorithm. In the third step, features are picked by a genetic algorithm with a new community-based repair operation. The performance of the presented approach was analyzed on nine benchmark classification problems. The authors also compared the efficiency of the proposed approach with that of four available feature selection algorithms. The findings indicate that the new approach consistently yields improved classification accuracy.

2021
Vol 8 (1)
Author(s):  
Mehrdad Rostami ◽  
Kamal Berahmand ◽  
Saman Forouzandeh

Abstract The selection of features is an essential data preprocessing stage in data mining. The core principle of feature selection is to pick a subset of candidate features by excluding features with almost no predictive information as well as highly correlated, redundant features. In the past several years, a variety of meta-heuristic methods have been introduced to eliminate redundant and irrelevant features as much as possible from high-dimensional datasets. One of the main disadvantages of existing meta-heuristic-based approaches is that they often neglect the correlation between the selected features. In this article, for the purpose of feature selection, the authors propose a genetic algorithm based on community detection, which functions in three steps. In the first step, the feature similarities are calculated. In the second step, the features are grouped into clusters by a community detection algorithm. In the third step, features are picked by a genetic algorithm with a new community-based repair operation. The performance of the presented approach was analyzed on nine benchmark classification problems, and its efficiency was compared with the findings of four available feature selection algorithms. Comparing the proposed method with three recent feature selection methods based on the PSO, ACO, and ABC algorithms on three classifiers showed that its accuracy is on average 0.52% higher than PSO, 1.20% higher than ACO, and 1.57% higher than ABC.
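The three-step pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's exact operators: the threshold-based grouping stands in for a real community detection algorithm, and the one-feature-per-community cap is a simplifying assumption about the repair operation.

```python
def feature_communities(similarity, threshold=0.7):
    """Greedy stand-in for community detection: group features whose
    pairwise similarity exceeds the threshold (hypothetical simplification)."""
    n = len(similarity)
    community = [-1] * n
    next_id = 0
    for i in range(n):
        if community[i] == -1:
            community[i] = next_id
            for j in range(i + 1, n):
                if community[j] == -1 and similarity[i][j] >= threshold:
                    community[j] = next_id
            next_id += 1
    return community

def repair(chromosome, community, max_per_community=1):
    """Community-based repair: keep at most `max_per_community` selected
    features from each community, dropping the rest to curb redundancy."""
    kept = {}
    repaired = list(chromosome)
    for idx, (gene, com) in enumerate(zip(chromosome, community)):
        if gene:
            kept.setdefault(com, []).append(idx)
    for com, members in kept.items():
        for idx in members[max_per_community:]:
            repaired[idx] = 0
    return repaired

# Toy similarity matrix for 4 features: features 0 and 1 are highly correlated.
sim = [[1.0, 0.9, 0.1, 0.2],
       [0.9, 1.0, 0.2, 0.1],
       [0.1, 0.2, 1.0, 0.3],
       [0.2, 0.1, 0.3, 1.0]]
coms = feature_communities(sim)
print(coms)                        # -> [0, 0, 1, 2]
print(repair([1, 1, 1, 0], coms))  # drops one of the redundant pair: [1, 0, 1, 0]
```

A GA would apply such a repair after crossover and mutation, so every chromosome respects the community structure before fitness evaluation.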


2011
Vol 2011
pp. 1-19
Author(s):  
Armelle Brun ◽  
Sylvain Castagnos ◽  
Anne Boyer

The number of items that users can now access when navigating the Web is so huge that they might feel lost. Recommender systems are a way to cope with this profusion of data by suggesting items that fit the users' needs. One of the most popular techniques for recommender systems is the collaborative filtering approach, which relies on the preferences for items expressed by users, usually in the form of ratings. In the absence of ratings, classical collaborative filtering techniques cannot be applied. Fortunately, the behavior of users, such as their consultations, can be collected. In this paper, we present a new approach to perform collaborative filtering when no ratings are available but user consultations are known. We propose to take inspiration from local community detection algorithms to form communities of users and deduce the set of mentors of a given user. We adapt one state-of-the-art algorithm to fit the characteristics of collaborative filtering. Experiments show that the precision achieved is higher than the baseline that does not perform any mentor selection. In addition, our model almost offsets the absence of ratings by exploiting a reduced set of mentors.
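The idea of recommending from consultation logs via mentors can be sketched as below. The Jaccard similarity and top-k neighbor selection are simplifying assumptions standing in for the paper's adapted local community detection; the data is invented for illustration.

```python
def jaccard(a, b):
    """Similarity between two users' sets of consulted items."""
    return len(a & b) / len(a | b) if a | b else 0.0

def mentors(user, consultations, k=2):
    """Pick the k most similar users as mentors (a simple stand-in for
    local-community-based mentor selection)."""
    scores = [(jaccard(consultations[user], items), u)
              for u, items in consultations.items() if u != user]
    return [u for _, u in sorted(scores, reverse=True)[:k]]

def recommend(user, consultations, k=2):
    """Suggest items the mentors consulted that the user has not,
    ranked by how many mentors consulted them."""
    seen = consultations[user]
    counts = {}
    for m in mentors(user, consultations, k):
        for item in consultations[m] - seen:
            counts[item] = counts.get(item, 0) + 1
    return sorted(counts, key=lambda i: (-counts[i], i))

logs = {"alice": {"a", "b", "c"},
        "bob":   {"a", "b", "d"},
        "carol": {"a", "c", "e"},
        "dave":  {"f"}}
print(recommend("alice", logs))  # -> ['d', 'e']
```

No ratings appear anywhere: only binary consultation events drive both mentor selection and ranking, which is the situation the paper targets.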


2019
Vol 11 (1)
Author(s):  
Qibo Yang ◽  
Jaskaran Singh ◽  
Jay Lee

For high-dimensional datasets, uninformative features and complex interactions between features can cause high computational costs and make outlier detection algorithms inefficient. Most feature selection methods are designed for supervised classification and regression, and few works specifically address unsupervised outlier detection. This paper proposes a novel isolation-based feature selection (IBFS) method for unsupervised outlier detection, based on the training process of isolation forest. When a point of a feature is used to split the data, the imbalance of the resulting split is measured and used to quantify how strongly this feature can detect outliers. We also compare the proposed method with variance, Laplacian score, and kurtosis. These methods are benchmarked on simulated data to show their characteristics. We then evaluate performance using one-class support vector machine, isolation forest, and local outlier factor on several real-world datasets. The evaluation results show that the proposed method can improve the performance of isolation forest, and that its results are similar to, and sometimes better than, those of another useful outlier indicator, kurtosis, which demonstrates the effectiveness of the proposed method. We also notice that variance and Laplacian score sometimes have similar performance on the datasets.
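The split-imbalance idea can be sketched per feature as follows. This is a loose approximation of the isolation-forest-based scoring, not the paper's exact IBFS statistic: random split points are drawn within the feature's range, and a feature that pushes almost all points to one side of most splits (as a feature with outliers does) scores high.

```python
import random

def ibfs_score(values, n_splits=200, seed=0):
    """Isolation-inspired score (sketch): repeatedly pick a random split
    point in the feature's range and average how unbalanced the split is
    (0 = perfectly balanced, 1 = everything on one side)."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    if lo == hi:
        return 0.0  # constant feature carries no isolating power
    total = 0.0
    for _ in range(n_splits):
        s = rng.uniform(lo, hi)
        left = sum(v < s for v in values)
        total += abs(2 * left - len(values)) / len(values)
    return total / n_splits

# A feature with one extreme outlier scores higher than a uniform one.
outlier_feature = [0.0, 0.1, 0.2, 0.3, 100.0]
uniform_feature = [0.0, 1.0, 2.0, 3.0, 4.0]
print(ibfs_score(outlier_feature) > ibfs_score(uniform_feature))  # True
```

Ranking features by such a score and keeping the top ones mirrors how IBFS filters features before running the downstream outlier detector.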


Feature selection in high-dimensional datasets is a combinatorial problem, as it selects an optimal subset from N-dimensional data with 2^N possible subsets. Genetic algorithms are generally a good choice for feature selection in large datasets, though for some high-dimensional problems they may take a widely varying amount of time: a few seconds, a few hours, or even a few days. It is therefore important to use genetic algorithms that can give quality results within a reasonably acceptable time limit, which makes an efficient implementation necessary. In this paper, a master-slave parallel genetic algorithm is implemented as a feature selection procedure to reduce the time complexity of the sequential genetic algorithm. The paper describes the speed gains of the parallel master-slave genetic algorithm and also presents a theoretical analysis of the optimal number of slaves required for an efficient master-slave implementation. The experiments are performed on three high-dimensional gene expression datasets. Since the genetic algorithm is a wrapper technique and is costly when assessing the importance of each feature, an information gain filter is first applied as a pre-processing step to remove irrelevant features.
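The master-slave scheme parallelizes exactly one thing: fitness evaluation, which dominates the runtime of a wrapper GA. A minimal sketch, assuming a placeholder fitness in place of the real classifier-training step:

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(chromosome):
    """Placeholder fitness (hypothetical): reward selected informative bits
    and penalise subset size. In the paper this step would train and score
    a classifier on the selected features, which is the expensive part."""
    return sum(chromosome) * 10 - len(chromosome)

def evaluate_generation(population, n_slaves=4):
    """Master-slave step: the master farms the costly fitness evaluations
    out to a pool of workers and collects the results in order."""
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        return list(pool.map(fitness, population))

population = [[1, 0, 1], [0, 0, 1], [1, 1, 1]]
print(evaluate_generation(population))  # -> [17, 7, 27]
```

Selection, crossover, and mutation stay on the master; only the embarrassingly parallel evaluations are distributed, which is why the optimal number of slaves is a trade-off between evaluation cost and communication overhead.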


Author(s):  
Cheng-San Yang ◽  
Li-Yeh Chuang ◽  
Chao-Hsuan Ke ◽  
Cheng-Hong Yang ◽  
...  

Microarray data referencing gene expression profiles provides valuable answers to a variety of problems and contributes to advances in clinical medicine. The application of microarray data to the classification of cancer types has recently assumed increasing importance. The classification of microarray data samples involves feature selection, whose goal is to identify subsets of differentially expressed genes potentially relevant for distinguishing sample classes, and classifier design. We propose an efficient evolutionary approach for selecting gene subsets from gene expression data that achieves higher accuracy for classification problems. Our proposal combines a shuffled frog-leaping algorithm (SFLA) and a genetic algorithm (GA), and chooses genes (features) relevant to classification. The K-nearest neighbor (KNN) classifier with leave-one-out cross-validation (LOOCV) is used to evaluate classification accuracy. We apply this hybrid SFLA-GA and KNN approach to 11 classification problems from the literature. Experimental results show that the classification accuracy obtained using the selected features was higher than the accuracy on the datasets without feature selection.
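The KNN-with-LOOCV fitness used to score a candidate gene subset can be sketched in a few lines. The toy 2-D data and class labels are invented for illustration; in the paper each candidate subset produced by SFLA-GA would be scored this way.

```python
def knn_loocv_accuracy(X, y, k=1):
    """Leave-one-out KNN accuracy: each sample is classified by its
    nearest neighbours among all the remaining samples."""
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance to every other sample, with its label.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), y[j])
            for j in range(len(X)) if j != i)
        votes = [label for _, label in dists[:k]]
        pred = max(set(votes), key=votes.count)
        correct += pred == y[i]
    return correct / len(X)

# Two well-separated classes on a toy 2-D "gene subset".
X = [(0, 0), (0, 1), (5, 5), (5, 6)]
y = ["healthy", "healthy", "tumour", "tumour"]
print(knn_loocv_accuracy(X, y))  # -> 1.0
```

LOOCV is a natural choice for microarray data because sample counts are tiny relative to the number of genes, so every sample must serve as both training and test data.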


2014
Vol 568-570
pp. 852-857
Author(s):  
Lu Wang ◽  
Yong Quan Liang ◽  
Qi Jia Tian ◽  
Jie Yang ◽  
Chao Song ◽  
...  

Detecting community structure in complex networks has triggered considerable attention in several application domains. This paper proposes a new community detection method based on an improved genetic algorithm (named CDIGA), which tries to find the best community structure by maximizing the network modularity. String encoding is used for the genetic representation. When the initial population is created, some nodes assign their community identifiers to all of their neighbors, which ensures the convergence of the algorithm and eliminates unnecessary iterations. The crossover and mutation operators are improved as well: a one-way crossover strategy is introduced to the crossover process, and the connectivity of the mutated node is preserved in the mutation process. We compared CDIGA with three other algorithms on computer-generated and real-world networks; experimental results show that the improved algorithm is highly effective at discovering community structure.
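The objective CDIGA maximizes is Newman's modularity Q: the fraction of edges inside communities minus the fraction expected under a random degree-preserving rewiring. A minimal sketch on an undirected toy graph (the graph and partition are invented for illustration):

```python
def modularity(edges, community):
    """Newman modularity: Q = sum over communities of
    (intra-community edge fraction) - (degree fraction / 2)^2."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for c in set(community.values()):
        intra = sum(community[u] == c and community[v] == c for u, v in edges)
        dc = sum(d for node, d in degree.items() if community[node] == c)
        q += intra / m - (dc / (2 * m)) ** 2
    return q

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, split), 3))  # -> 0.357
```

A GA chromosome in string encoding is exactly such a node-to-community map, so the function above serves directly as the fitness to maximize.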


Author(s):  
Iwan Syarif

Classification problems, especially for high-dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, classification becomes very complicated when the number of possible combinations of variables is high. In this research, we evaluate the performance of the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms when applied to high-dimensional datasets. Our experiments show that in terms of dimensionality reduction, PSO is much better than GA: PSO reduced the number of attributes across 8 datasets to 13.47% of the original on average, while GA only reached 31.36% on average. In terms of classification performance, GA is slightly better than PSO: GA-reduced datasets outperform their original versions on 5 of 8 datasets, while PSO-reduced datasets do so on only 3 of 8.

Keywords: feature selection, dimensionality reduction, Genetic Algorithm (GA), Particle Swarm Optimization (PSO).
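For feature selection, PSO is typically run in its binary variant, where each particle is a 0/1 feature mask. One update step can be sketched as follows; the inertia and acceleration constants and the toy masks are illustrative assumptions, not values from the paper.

```python
import math, random

def bpso_step(positions, velocities, pbest, gbest,
              w=0.7, c1=1.5, c2=1.5, rng=None):
    """One binary-PSO update: velocities drift toward the personal and
    global best masks, and a sigmoid of each velocity component gives the
    probability that the corresponding feature bit is set to 1."""
    rng = rng or random.Random(0)
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for d in range(len(x)):
            v[d] = (w * v[d]
                    + c1 * rng.random() * (pbest[i][d] - x[d])
                    + c2 * rng.random() * (gbest[d] - x[d]))
            x[d] = 1 if rng.random() < 1 / (1 + math.exp(-v[d])) else 0

positions = [[1, 0, 1, 1], [0, 1, 0, 0]]       # current feature masks
velocities = [[0.0] * 4 for _ in positions]
pbest = [list(p) for p in positions]           # per-particle best so far
gbest = [1, 0, 0, 1]                           # swarm best mask (hypothetical)
bpso_step(positions, velocities, pbest, gbest)
print(positions)                               # masks drift toward gbest
```

The strong pull toward sparse best-so-far masks is one intuition for why PSO tends to shrink feature subsets aggressively, as the experiments above report.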


2012
Vol 57 (3)
pp. 829-835
Author(s):  
Z. Głowacz ◽  
J. Kozik

The paper describes a procedure for the automatic selection of symptoms accompanying a break in the synchronous motor armature winding coils. This procedure, called feature selection, chooses from the full set of features describing the problem a subset that best distinguishes between healthy and damaged states. The amplitudes of the spectral components of the motor current signals were used as features. The full spectra of the current signals are treated as multidimensional feature spaces, and their subspaces are tested. Particular subspaces are chosen with the aid of a genetic algorithm, and their quality is assessed using the Mahalanobis distance measure. The algorithm searches for the subspaces for which this distance is greatest. The algorithm is very efficient and, as confirmed by the research, leads to good results. The proposed technique has been successfully applied in many other fields of science and technology, including medical diagnostics.
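The distance-based fitness can be sketched as below. For brevity this simplifies the Mahalanobis distance to a diagonal (per-feature variance) covariance, and the spectral amplitudes and labels are invented; the paper's actual measure uses the full covariance structure.

```python
def class_separation(X, y, subset):
    """Fitness sketch: squared Mahalanobis-style distance between the
    two class means over the chosen feature subset, simplified to a
    diagonal covariance (one variance per feature)."""
    def col(rows, d):
        return [r[d] for r in rows]
    a = [x for x, label in zip(X, y) if label == 0]   # healthy
    b = [x for x, label in zip(X, y) if label == 1]   # damaged
    dist = 0.0
    for d in subset:
        mean_a = sum(col(a, d)) / len(a)
        mean_b = sum(col(b, d)) / len(b)
        pooled = col(a, d) + col(b, d)
        mu = sum(pooled) / len(pooled)
        var = sum((v - mu) ** 2 for v in pooled) / len(pooled) or 1e-9
        dist += (mean_a - mean_b) ** 2 / var
    return dist

# Spectral amplitudes for healthy (0) vs damaged (1) motors:
# feature 0 separates the classes, feature 1 does not.
X = [(1.0, 5.0), (1.1, 5.2), (4.0, 5.1), (4.2, 4.9)]
y = [0, 0, 1, 1]
print(class_separation(X, y, [0]) > class_separation(X, y, [1]))  # True
```

A GA maximizing this fitness over subsets of spectral components would therefore converge on features like feature 0, whose amplitudes shift between the healthy and damaged states.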

