Effective Evolutionary Multilabel Feature Selection under a Budget Constraint

Multilabel feature selection involves the selection of relevant features from multilabeled datasets, resulting in improved multilabel learning accuracy. Evolutionary search-based multilabel feature selection methods have proved useful for identifying a compact feature subset by successfully improving the accuracy of multilabel classification. However, conventional methods frequently violate budget constraints or result in inefficient searches due to ineffective exploration of important features. In this paper, we present an effective evolutionary search-based feature selection method for multilabel classification with a budget constraint. The proposed method employs a novel exploration operation to enhance the search capabilities of a traditional genetic search, resulting in improved multilabel classification. Empirical studies using 20 real-world datasets demonstrate that the proposed method outperforms conventional multilabel feature selection methods.
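One common way to keep a genetic search inside a feature budget is to repair each offspring after mutation. The sketch below is illustrative only and is not the exploration operator the paper proposes; the names `repair`, `mutate`, `budget`, and `rate` are assumptions introduced for this example.

```python
import random

def repair(mask, budget, rng):
    """Randomly switch off selected features until at most `budget` remain."""
    selected = [i for i, bit in enumerate(mask) if bit]
    excess = len(selected) - budget
    if excess > 0:
        for i in rng.sample(selected, excess):
            mask[i] = 0
    return mask

def mutate(mask, rate, budget, rng):
    """Bit-flip mutation followed by the budget repair step (hypothetical operator)."""
    child = [bit ^ 1 if rng.random() < rate else bit for bit in mask]
    return repair(child, budget, rng)

rng = random.Random(0)
parent = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = feature selected
child = mutate(parent, rate=0.3, budget=3, rng=rng)
print(child, "selected:", sum(child))
```

Repairing after every variation step means no candidate that violates the budget ever reaches fitness evaluation, at the cost of some randomness in which features are dropped.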


Evolutionary Multilabel Feature Selection Using Promising Feature Subset Generation

Journal of Sensors ◽

10.1155/2018/3419213 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12

Author(s):

Jaesung Lee ◽

Wangduk Seo ◽

Ho Han ◽

Dae-Won Kim

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Search Space ◽

Sensor Data ◽

Limited Resources ◽

Feature Subset ◽

Selection Methods ◽

Multilabel Learning ◽

Information Harvesting ◽

Generation Procedure

Recent progress in the development of sensor devices improves information harvesting and allows complex but intelligent applications based on learning hidden relations between collected sensor data and objectives. In this scenario, multilabel feature selection can play an important role in achieving better learning accuracy when constrained with limited resources. However, existing multilabel feature selection methods are search-ineffective because generated feature subsets frequently include unimportant features. In addition, only a few feature subsets compared to the search space are considered, yielding feature subsets with low multilabel learning accuracy. In this study, we propose an effective multilabel feature selection method based on a novel feature subset generation procedure. Experimental results demonstrate that the proposed method can identify better feature subsets than conventional methods.


Feature distillation and accumulated selection for automated fraudulent publisher classification from user click data of online advertising

Data Technologies and Applications ◽

10.1108/dta-09-2021-0233 ◽

2022 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Deepti Sisodia ◽

Dilip Singh Sisodia

Keyword(s):

Feature Selection ◽

Online Advertising ◽

Feature Selection Method ◽

Majority Voting ◽

Feature Subset ◽

Relevant Feature ◽

Selection Methods ◽

Content Type ◽

Optimal Feature Subset ◽

Optimal Feature

Purpose: The problem of choosing the most useful features from hundreds of features in time-series user click data arises in online advertising when classifying fraudulent publishers. Selecting feature subsets is a key issue in such classification tasks. In practice, filter approaches are common; however, they neglect the correlations among features. Conversely, wrapper approaches cannot be applied because of their complexity. Moreover, existing feature selection methods cannot handle such data, which is one of the major causes of instability in feature selection.

Design/methodology/approach: To overcome these issues, a majority voting-based hybrid feature selection method, namely feature distillation and accumulated selection (FDAS), is proposed to find the optimal subset of relevant features for analyzing the publisher's fraudulent conduct. FDAS works in two phases: (1) feature distillation, where significant features from standard filter and wrapper feature selection methods are obtained using majority voting; (2) accumulated selection, where relevant feature subsets are evaluated cumulatively to search for an optimal feature subset using effective machine learning (ML) models.

Findings: Empirical results show improved classification performance with the proposed features in terms of average precision, recall, F1-score and AUC in publisher identification and classification.

Originality/value: FDAS is evaluated on FDMA2012 user-click data and nine other benchmark datasets to gauge its generalization: first with the original features, second with relevant feature subsets selected by feature selection (FS) methods, and third with the optimal feature subset obtained by the proposed approach. An ANOVA significance test is conducted to demonstrate significant differences between independent features.
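The majority-voting idea behind the feature distillation phase can be sketched as follows. This is a minimal illustration, not the FDAS implementation: the function `majority_vote`, the `min_votes` threshold, and the feature names are all hypothetical.

```python
from collections import Counter

def majority_vote(selections, min_votes=None):
    """Keep features chosen by a strict majority of the selectors."""
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, v in votes.items() if v >= min_votes)

# Hypothetical outputs of three different feature selection methods.
selections = [
    {"clicks_per_hour", "unique_ips", "referrer_entropy"},
    {"clicks_per_hour", "unique_ips", "device_count"},
    {"clicks_per_hour", "referrer_entropy", "country_count"},
]
kept = majority_vote(selections)
print(kept)
```

Features picked by only one selector are dropped, which is one simple way to reduce the instability that individual selectors show on noisy data.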


The Effectiveness of the Fused Weighted Filter Feature Selection Method to Improve Software Fault Prediction

Journal of Communications Technology Electronics and Computer Science ◽

10.22385/jctecs.v8i0.96 ◽

2016 ◽

Vol 8 ◽

pp. 5 ◽

Cited By ~ 1

Author(s):

Fatemeh Alighardashi ◽

Mohammad Ali Zare Chahooki

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Algorithms ◽

Fault Prediction ◽

Filter Method ◽

Selection Methods ◽

Software Projects ◽

Software Fault Prediction ◽

Software Fault

Improving software product quality through periodic testing before release is one of the most expensive activities in software projects. Because the resources available for testing modules are limited, it is important to identify fault-prone modules and concentrate testing resources on them. Software fault predictors based on machine learning algorithms are effective tools for identifying fault-prone modules. Extensive studies in this field seek the connection between the features of software modules and their fault-proneness. Some of the features used by predictive algorithms are ineffective and reduce prediction accuracy, so feature selection methods are widely used to improve the performance of prediction models for fault-prone modules. In this study, we propose a feature selection method that combines several filter feature selection methods into a single fused weighted filter method. The proposed method improves both the convergence rate of feature selection and the prediction accuracy. Results on ten NASA and PROMISE datasets indicate the effectiveness of the proposed method in improving the accuracy and convergence of software fault prediction.
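A weighted fusion of filter scores can be sketched as below. The abstract does not give the fusion rule, so min-max normalization followed by a weighted average is an assumption made here, and `min_max`, `fuse`, and the example scores are hypothetical.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; constant score lists map to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(score_lists, weights):
    """Weighted average of normalized per-feature scores (assumed fusion rule)."""
    normalized = [min_max(s) for s in score_lists]
    total = sum(weights)
    n_features = len(score_lists[0])
    return [
        sum(w * norm[j] for w, norm in zip(weights, normalized)) / total
        for j in range(n_features)
    ]

filter_a = [0.9, 0.1, 0.5]      # hypothetical scores from one filter
filter_b = [10.0, 30.0, 20.0]   # hypothetical scores from another filter
fused = fuse([filter_a, filter_b], weights=[2.0, 1.0])
ranking = sorted(range(len(fused)), key=lambda j: fused[j], reverse=True)
print(fused, ranking)
```

Normalizing first keeps a filter with a large raw score range from dominating the fused ranking regardless of its weight.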


A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-190134 ◽

2021 ◽

Vol 24 (4) ◽

pp. 289-301

Author(s):

B. Venkatesh ◽

J. Anuradha

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Classification Accuracy ◽

Performance Metrics ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Binary Particle Swarm Optimization ◽

Selection Methods

In microarray data, high classification accuracy is difficult to achieve because of high dimensionality and irrelevant, noisy data. Such data also contain many gene expression features but few samples. To increase the classification accuracy and processing speed of a model, an optimal number of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases: a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks produced by the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods, using fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) selects the optimal features, and an RBF kernel-based Support Vector Machine (SVM) classifier serves as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets, using performance metrics such as accuracy, recall, precision, and F1-score. The experimental results show that the proposed method outperforms the other feature selection methods.
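Rank aggregation with a Gaussian membership function can be sketched as follows. The membership function itself is standard, but how the paper applies it to ranks is not stated in the abstract; centering the function at rank 1 and summing memberships across filters are assumptions, as are the names `gaussian_membership`, `aggregate_ranks`, and the example ranks.

```python
import math

def gaussian_membership(x, center, sigma):
    """Standard Gaussian membership: 1 at the center, decaying with distance."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def aggregate_ranks(rank_lists, sigma):
    """Order features by summed membership of their ranks (assumed aggregation rule)."""
    n_features = len(rank_lists[0])
    scores = [
        sum(gaussian_membership(ranks[j], 1, sigma) for ranks in rank_lists)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Hypothetical ranks (1 = best) assigned to four features by three filters.
relief_ranks = [1, 2, 3, 4]
mrmr_ranks = [2, 1, 4, 3]
fc_ranks = [1, 3, 2, 4]
order = aggregate_ranks([relief_ranks, mrmr_ranks, fc_ranks], sigma=2.0)
print(order)
```

A smaller `sigma` rewards only near-top ranks, while a larger one behaves more like a plain average of ranks.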


Constructing an Emotion Estimation Model Based on EEG/HRV Indexes Using Feature Extraction and Feature Selection Algorithms

Sensors ◽

10.3390/s21092910 ◽

2021 ◽

Vol 21 (9) ◽

pp. 2910

Author(s):

Kei Suzuki ◽

Tipporn Laohakangvalvit ◽

Ryota Matsubara ◽

Midori Sugaya

Keyword(s):

Feature Selection ◽

Single Channel ◽

Feature Selection Method ◽

Classification Model ◽

Features Selection ◽

Selection Methods ◽

Emotion Classification ◽

Model Based ◽

Physiological Indexes ◽

Emotion Estimation

In human emotion estimation using an electroencephalogram (EEG) and heart rate variability (HRV), there are, as far as we know, two main issues. The first is that measurement devices for physiological signals are expensive and not easy to wear. The second is that unnecessary physiological indexes have not been removed, which is likely to decrease the accuracy of machine learning models. In this study, we used a single-channel EEG sensor and a photoplethysmography (PPG) sensor, which are inexpensive and easy to wear. We collected data from 25 participants (18 males and 7 females) and used a deep learning algorithm to construct an emotion classification model based on Arousal–Valence space, using several feature combinations obtained from physiological indexes selected according to our criteria, including our proposed feature selection methods. We then verified accuracy by applying stratified 10-fold cross-validation to the constructed models. The results showed that model accuracies reach 90% to 99% when the proposed feature selection methods are applied, which suggests that a small number of physiological indexes, even from inexpensive sensors, can be used to construct an accurate emotion classification model if an appropriate feature selection method is applied. Our results contribute to an emotion classification model with higher accuracy, lower cost, and less time consumption, which has the potential to be applied in various areas.


Predicting the Severity of Bug Reports Based on Feature Selection

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194018500158 ◽

2018 ◽

Vol 28 (04) ◽

pp. 537-558 ◽

Cited By ~ 4

Author(s):

Wenjie Liu ◽

Shanshan Wang ◽

Xin Chen ◽

He Jiang

Keyword(s):

Feature Selection ◽

Software Maintenance ◽

Feature Selection Method ◽

Selection Methods ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Bug Reports ◽

Single Feature ◽

Bug Report ◽

Severity Prediction

In the software maintenance process, predicting the severity of bug reports is an important activity. However, manually identifying the severity of bug reports is a tedious and time-consuming task, so automatic methods for predicting severity are urgently needed. In general, a bug report contains a large amount of descriptive natural language text, resulting in a high-dimensional feature set that poses serious challenges to traditional automatic methods. Therefore, we attempt to use automatic feature selection methods to improve the performance of severity prediction for bug reports. In this paper, we introduce a ranking-based strategy to improve existing feature selection algorithms and propose an ensemble feature selection algorithm that combines existing ones. To verify the performance of our method, we run experiments on the bug reports of Eclipse and Mozilla and compare against eight commonly used feature selection methods. The experimental results show that the ranking-based strategy can improve the performance of severity prediction by up to 54.76% on average in terms of F-measure, and can also significantly reduce the dimension of the feature set. Meanwhile, the ensemble feature selection method achieves better results than a single feature selection algorithm.


Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso

BioMed Research International ◽

10.1155/2015/703768 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 11

Author(s):

Jin-Jia Wang ◽

Fang Xue ◽

Hui Li

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Group Lasso ◽

High Dimensional ◽

Test Accuracy ◽

Gradient Descent Method ◽

Feature Subset ◽

Eeg Signals ◽

Sparse Group Lasso ◽

Selection Of

Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs). Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are first obtained; they include the power spectrum, time-domain statistics, AR model, and wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses a logistic regression model with a Sparse Group Lasso penalty function. The model is fitted on the training data, and parameter estimates are obtained by modified blockwise coordinate descent and coordinate gradient descent methods. The best parameters and feature subset are selected using 10-fold cross-validation. Finally, the test data are classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on data from the international BCI Competition IV reached 84.72%.
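The Sparse Group Lasso penalty combines an L1 term over all coefficients with a group-wise L2 term, which is what allows whole channels (groups) and individual features to be zeroed out together. The sketch below uses a commonly used form of the penalty, with a mixing parameter `alpha` and group weights equal to the square root of the group size; whether the paper uses exactly this weighting is an assumption, and the coefficient values are made up.

```python
import math

def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2)."""
    l1_term = sum(abs(b) for b in beta)
    group_term = 0.0
    for idx in groups:
        group_norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        group_term += math.sqrt(len(idx)) * group_norm
    return lam * (alpha * l1_term + (1 - alpha) * group_term)

# Two hypothetical channels with two features each; the second channel is inactive.
beta = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
penalty = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5)
print(penalty)
```

Setting `alpha=1` reduces this to the ordinary lasso penalty and `alpha=0` to the group lasso, so the parameter trades feature-level sparsity against channel-level sparsity.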


MRFGRO: a hybrid meta-heuristic feature selection method for screening COVID-19 using deep features

Scientific Reports ◽

10.1038/s41598-021-02731-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Arijit Dey ◽

Soham Chattopadhyay ◽

Pawan Kumar Singh ◽

Ali Ahmadian ◽

Massimiliano Ferrara ◽

...

Keyword(s):

Feature Selection ◽

Golden Ratio ◽

Medical Image Analysis ◽

Feature Selection Method ◽

World Health ◽

Significant Feature ◽

Feature Subset ◽

Upper Respiratory Tract ◽

Global Pandemic ◽

Health Organization

COVID-19 is a respiratory disease that causes infection in both the lungs and the upper respiratory tract. The World Health Organization (WHO) has declared it a global pandemic because of its rapid spread across the globe. The most common diagnostic method for COVID-19 is real-time reverse transcription-polymerase chain reaction (RT-PCR), which takes a significant amount of time to return a result. Computer-based medical image analysis is more beneficial for diagnosing such a disease, as it can give better results in less time. Computed Tomography (CT) scans are used to monitor lung diseases including COVID-19. In this work, a hybrid model for COVID-19 detection has been developed, which has two key stages. In the first stage, we fine-tune the parameters of pre-trained convolutional neural networks (CNNs) to extract features from COVID-19 affected lungs. As pre-trained CNNs, we use two standard networks, namely GoogleNet and ResNet18. Then, we propose a hybrid meta-heuristic feature selection (FS) algorithm, named Manta Ray Foraging based Golden Ratio Optimizer (MRFGRO), to select the most significant feature subset. The proposed model is evaluated on three publicly available datasets, namely the COVID-CT dataset, the SARS-COV-2 dataset, and the MOSMED dataset, and attains state-of-the-art classification accuracies of 99.15%, 99.42% and 95.57%, respectively. The results confirm that the proposed approach is efficient when compared to the local texture descriptors used for COVID-19 detection from chest CT-scan images.


Feature selection method based on Menger curvature and LDA theory for a P300 brain-computer interface

Journal of Neural Engineering ◽

10.1088/1741-2552/ac42b4 ◽

2021 ◽

Author(s):

ShuRui Li ◽

Jing Jin ◽

Ian Daly ◽

Chang Liu ◽

Andrzej Cichocki

Keyword(s):

Feature Selection ◽

Brain Computer Interface ◽

Feature Selection Method ◽

Event Related Potentials ◽

Selection Method ◽

Computer Interface ◽

Feature Subset ◽

Linear Discriminant ◽

Related Potentials ◽

Menger Curvature

Brain–computer interface (BCI) systems decode electroencephalogram signals to establish a channel for direct interaction between the human brain and the external world without the need for muscle or nerve control. The P300 speller, one of the most widely used BCI applications, presents a selection of characters to the user and performs character recognition by identifying P300 event-related potentials from the EEG. Such P300-based BCI systems can reach good levels of accuracy but are difficult to use in day-to-day life due to redundancy and noisy signals, so there is room for improvement. We propose a novel hybrid feature selection method for the P300-based BCI system to address the problem of feature redundancy, which combines the Menger curvature and linear discriminant analysis. First, the selected strategies are applied separately to a given dataset to estimate the gain of each feature. Then, each generated value set is ranked in descending order and judged against a predefined criterion for suitability in classification models. The intersection of the two approaches is then evaluated to identify an optimal feature subset. The proposed method is evaluated using three public datasets, i.e., BCI Competition III dataset II, the BNCI Horizon dataset, and the EPFL dataset. Experimental results indicate that, compared with other typical feature selection and classification methods, our proposed method has better or comparable performance. Additionally, our proposed method can achieve the best classification accuracy after all epochs in the three datasets. In summary, our proposed method provides a new way to enhance the performance of the P300-based BCI speller.
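For readers unfamiliar with it, the Menger curvature of three points is the reciprocal of the radius of the circle through them, computed as four times the triangle area divided by the product of the side lengths. The sketch below shows only this definition for 2D points; how the paper maps features to points is not described in the abstract, and the function name `menger_curvature` is an assumption.

```python
import math

def menger_curvature(a, b, c):
    """Curvature of the circle through 2D points a, b, c (0 if collinear or coincident)."""
    # Twice the signed triangle area via the cross product.
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    area = abs(cross) / 2.0
    sides = math.dist(a, b) * math.dist(b, c) * math.dist(c, a)
    if sides == 0.0:
        return 0.0
    return 4.0 * area / sides

# Three points on the unit circle give curvature 1; collinear points give 0.
on_circle = menger_curvature((1.0, 0.0), (0.0, 1.0), (-1.0, 0.0))
collinear = menger_curvature((0.0, 0.0), (1.0, 1.0), (2.0, 2.0))
print(on_circle, collinear)
```

Applied to consecutive points of a sorted score curve, a high curvature marks a sharp bend, which is one plausible way to locate a cut-off between informative and redundant features.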


A New Feature Selection Method for Text Classification Based on Independent Feature Space Search

Mathematical Problems in Engineering ◽

10.1155/2020/6076272 ◽

2020 ◽

Vol 2020 ◽

pp. 1-14 ◽

Cited By ~ 3

Author(s):

Yong Liu ◽

Shenggen Ju ◽

Junfeng Wang ◽

Chong Su

Keyword(s):

Feature Selection ◽

Text Classification ◽

Predictive Accuracy ◽

Feature Selection Method ◽

Feature Space ◽

Selection Method ◽

The Other ◽

Feature Subset ◽

Search Range ◽

Text Documents

A feature selection method is designed to select representative feature subsets from the original feature set by evaluating feature relevance in different ways, with the aim of reducing the dimension of the features while maintaining the predictive accuracy of a classifier. In this study, we propose a feature selection method for text classification based on independent feature space search. First, a relative document-term frequency difference (RDTFD) method is proposed to divide the features in all text documents into two independent feature sets according to the features' ability to discriminate between positive and negative samples. This division has two important functions: one is to improve the class correlation of the features and reduce the correlation between features; the other is to reduce the search range of the feature space while maintaining appropriate feature redundancy. Second, a feature search strategy is used to search for the optimal feature subset in the independent feature space, which can improve the performance of text classification. Finally, we evaluate experiments conducted on six benchmark corpora. The experimental results show that the RDTFD method based on independent feature space search is more robust than the other feature selection methods.
