Obtaining stable feature selection with heuristic algorithms from density-based feature groups

Author(s): Canan Batur ◽ Banu Diri
2015 ◽ Vol 25 (09n10) ◽ pp. 1467-1490
Author(s): Huanjing Wang ◽ Taghi M. Khoshgoftaar ◽ Naeem Seliya

Software quality modeling is the process of using software metrics from previous iterations of development to locate potentially faulty modules in current under-development code. It has become an important part of the software development process, allowing practitioners to focus development efforts where they are most needed. One difficulty encountered in software quality modeling is high dimensionality, where the number of available software metrics is too large for a classifier to work well. In this case, many of the metrics may be redundant or irrelevant to defect prediction, so selecting the subset of software metrics that are the best predictors becomes important. This process is called feature (metric) selection. There are three major forms of feature selection: filter-based feature rankers, which use statistical measures to assign a score to each feature and present the user with a ranked list; filter-based feature subset evaluation, which uses statistical measures on feature subsets to find the best feature subset; and wrapper-based subset selection, which builds classification models using different subsets to find the one that maximizes performance. Software practitioners are interested in which feature selection methods are best at providing the most stable feature subset in the face of changes to the data (here, the addition or removal of instances). In this study we select feature subsets using fifteen feature selection methods and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the stability of those methods. We evaluate stability on pairs of subsamples generated by our fixed-overlap partitions algorithm, considering four different levels of overlap. Thirteen software metric datasets from two real-world software projects are used in this study. Results demonstrate that ReliefF (RF) is the most stable feature selection method, while wrapper-based feature subset selection shows the least stability. In addition, as the overlap of the partitions increases, the stability of the feature selection strategies increases.
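
The abstract above does not spell out the APTI formula, but the Tanimoto index of two feature subsets is conventionally the size of their intersection divided by the size of their union, and averaging it over all pairs of subsets selected across subsamples yields a stability score between 0 and 1. The following Python sketch illustrates that common formulation; the feature names are made up, and the authors' exact definition may differ in detail.

from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty subsets count as identical
    return len(a & b) / len(a | b)

def apti(subsets):
    """Average pairwise Tanimoto index over a list of feature subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Example: subsets selected on two overlapping subsamples (metric names made up).
print(apti([{"loc", "cyclomatic", "fan_in"},
            {"loc", "cyclomatic", "halstead_n"}]))  # 0.5

A score near 1 means the method selects nearly the same metrics regardless of which instances are present, which is the stability the study measures.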


2021 ◽ pp. 1-19
Author(s): Yu Xue ◽ Haokai Zhu ◽ Ferrante Neri

In classification tasks, feature selection (FS) can reduce the data dimensionality and may also improve classification accuracy; these two goals are commonly treated as the two objectives in FS problems. Many meta-heuristic algorithms have been applied to FS problems and perform satisfactorily when the problem is relatively simple, but once the dimensionality of the datasets grows, their performance drops dramatically. This paper proposes a self-adaptive multi-objective genetic algorithm (SaMOGA) for FS, designed to maintain high performance even as the dimensionality of the datasets grows. The main concept of SaMOGA lies in the dynamic selection among five different crossover operators at different stages of the evolutionary process through a self-adaptive mechanism. A search stagnation detection mechanism is also proposed to prevent premature convergence. In the experiments, we compare SaMOGA with five multi-objective FS algorithms on sixteen datasets. According to the experimental results, SaMOGA yields a set of well-converged and well-distributed solutions on most datasets, indicating that SaMOGA can maintain classification performance while removing many features, and its advantage over its counterparts becomes more obvious as the dimensionality of the datasets grows.
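
The abstract leaves the self-adaptive mechanism unspecified. One common way to realize dynamic operator selection is to keep a selection probability per crossover operator and reinforce operators whose offspring improve on their parents; the Python sketch below illustrates that general idea only. The operator names, learning rate, and probability floor are hypothetical and are not taken from SaMOGA.

import random

OPERATORS = ["one_point", "two_point", "uniform", "shuffle", "reduced_surrogate"]

class OperatorSelector:
    def __init__(self, operators, learning_rate=0.1, floor=0.05):
        self.ops = list(operators)
        self.probs = {op: 1.0 / len(self.ops) for op in self.ops}
        self.lr = learning_rate
        self.floor = floor  # keep every operator selectable

    def choose(self):
        """Roulette-wheel selection of a crossover operator."""
        r, acc = random.random(), 0.0
        for op, p in self.probs.items():
            acc += p
            if r <= acc:
                return op
        return self.ops[-1]

    def update(self, op, success):
        """Reward (or penalize) an operator, then renormalize."""
        self.probs[op] = max(self.probs[op] + (self.lr if success else -self.lr),
                             self.floor)
        total = sum(self.probs.values())
        self.probs = {o: p / total for o, p in self.probs.items()}

selector = OperatorSelector(OPERATORS)
op = selector.choose()
# ... apply `op` to two parents; success = offspring dominates a parent ...
selector.update(op, success=True)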


2018 ◽ Vol 7 (1) ◽ pp. 9-24
Author(s): Mohammad Masoud Javidi

Finding a subset of features from a large dataset is a problem that arises in many fields of study. It is important to select an effective subset of features so that the system provides acceptable performance, which leads us to use meta-heuristic algorithms to find the optimal subset of features. The performance of evolutionary algorithms depends on many parameters that significantly affect their behavior, and these algorithms usually set those parameters through a random process. Chaos appears random and unpredictable, yet it is also deterministic; this makes it a suitable alternative to the random process in meta-heuristic algorithms.
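
As a concrete illustration of this idea, a chaotic map can stand in for the uniform random draws a meta-heuristic uses when setting its parameters. The minimal Python sketch below assumes the logistic map x_{n+1} = r * x_n * (1 - x_n) with r = 4, a standard choice for chaotic behavior; the paper's specific chaotic maps and parameter-setting scheme may differ.

class LogisticMap:
    """Deterministic chaotic sequence generator over (0, 1)."""
    def __init__(self, seed=0.7, r=4.0):
        # avoid seeds such as 0, 0.25, 0.5, 0.75, 1, which fall into fixed points
        self.x = seed
        self.r = r

    def next(self):
        """Return the next chaotic value: x <- r * x * (1 - x)."""
        self.x = self.r * self.x * (1 - self.x)
        return self.x

chaos = LogisticMap(seed=0.7)
# e.g. draw a "chaotic" mutation rate instead of calling random.random()
mutation_rate = 0.01 + 0.09 * chaos.next()
print(mutation_rate)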

