A Multi-Objective Multi-Label Feature Selection Algorithm Based on Shapley Value

Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 1094
Author(s):  
Hongbin Dong ◽  
Jing Sun ◽  
Xiaohang Sun

Multi-label learning aims to learn functions that assign each sample its true label set. As the amount of available data grows, feature dimensionality increases as well. However, high-dimensional data may contain noise, which makes multi-label learning difficult. Feature selection is a technique that can effectively reduce the data dimension. In feature selection research, multi-objective optimization algorithms have shown excellent global optimization performance, and the Pareto dominance relationship handles conflicting objectives well. Therefore, a Shapley value-fused feature selection algorithm for multi-label learning (SHAPFS-ML) is proposed. The method takes multi-label criteria as the optimization objectives, and the proposed crossover and mutation operators based on the Shapley value help to identify relevant, redundant and irrelevant features. Experimental comparisons on real-world datasets show that SHAPFS-ML is an effective feature selection method for multi-label classification: it reduces the classification algorithm’s computational complexity and improves classification accuracy.
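The two building blocks named in this abstract, the Shapley value of a feature and Pareto dominance between objective vectors, can be illustrated with a minimal sketch. It is not the SHAPFS-ML operators themselves: the value function `toy_value` and the feature names are hypothetical, and the exact enumeration shown here is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature under a set function `value`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Hypothetical value function: "a" and "b" carry the same signal, "c" is irrelevant.
def toy_value(subset):
    return 1.0 if ("a" in subset or "b" in subset) else 0.0


phi = shapley_values(["a", "b", "c"], toy_value)
# The redundant pair splits the credit (0.5 each); the irrelevant feature gets 0.
```

In this toy setting the Shapley values separate the three feature types the abstract mentions: redundant features share credit, and an irrelevant feature contributes nothing to any coalition.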

2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Zilin Zeng ◽  
Hongjun Zhang ◽  
Rui Zhang ◽  
Youliang Zhang

Feature interaction has gained considerable attention recently. However, many feature selection methods that consider interaction are designed only for categorical features. This paper proposes a mixed feature selection algorithm based on neighborhood rough sets that can be used to search for interacting features. Feature relevance, feature redundancy, and feature interaction are defined in the framework of neighborhood rough sets. A neighborhood interaction weight factor, which reflects whether a feature is redundant or interactive, is proposed, and a neighborhood interaction weight based feature selection algorithm (NIWFS) is put forward. To evaluate the performance of the proposed algorithm, we compare NIWFS with three other feature selection algorithms, INTERACT, NRS, and NMI, in terms of classification accuracy and the number of selected features with C4.5 and IB1. The results from ten real-world datasets indicate that NIWFS not only deals with mixed datasets directly, but also reduces the dimensionality of the feature space while achieving the highest average accuracies.
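As background for the neighborhood rough set framework, the sketch below computes the standard neighborhood dependency of the class label on a feature subset: the fraction of samples whose delta-neighborhood contains only samples of the same class. It does not implement the NIWFS weight factor, which the abstract does not specify. The Chebyshev distance, the numeric-only features, and the toy data are assumptions made for illustration.

```python
def neighborhood_dependency(X, y, feature_idx, delta):
    """Fraction of samples whose delta-neighborhood (on feature_idx) is class-pure.

    Assumes numeric features scaled to comparable ranges and a non-empty feature_idx.
    """
    n = len(X)
    consistent = 0
    for i in range(n):
        pure = True
        for j in range(n):
            # Chebyshev distance restricted to the chosen features (an assumption).
            dist = max(abs(X[i][f] - X[j][f]) for f in feature_idx)
            if dist <= delta and y[i] != y[j]:
                pure = False
                break
        consistent += pure
    return consistent / n


# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = [[0.10, 0.90], [0.15, 0.10], [0.80, 0.85], [0.90, 0.20]]
y = [0, 0, 1, 1]

dep_f0 = neighborhood_dependency(X, y, [0], delta=0.2)  # 1.0: every neighborhood is pure
dep_f1 = neighborhood_dependency(X, y, [1], delta=0.2)  # 0.0: every neighborhood is mixed
```

A feature whose addition raises this dependency is relevant in the rough set sense; one that leaves it unchanged is a candidate for being redundant.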


Author(s):  
Wenjie Liu ◽  
Shanshan Wang ◽  
Xin Chen ◽  
He Jiang

In the software maintenance process, predicting the severity of bug reports is an important activity. However, manually identifying the severity of bug reports is a tedious and time-consuming task, so automatic methods for predicting severity are urgently needed. In general, a bug report contains a large amount of descriptive natural language text, which results in a high-dimensional feature set that poses serious challenges to traditional automatic methods. Therefore, we use automatic feature selection methods to improve the performance of severity prediction for bug reports. In this paper, we introduce a ranking-based strategy to improve existing feature selection algorithms and propose an ensemble feature selection algorithm that combines existing ones. To verify the performance of our method, we run experiments on the bug reports of Eclipse and Mozilla and compare against eight commonly used feature selection methods. The experimental results show that the ranking-based strategy can improve the performance of severity prediction by up to 54.76% on average in terms of [Formula: see text]-measure, and that it can also significantly reduce the dimension of the feature set. Meanwhile, the ensemble feature selection method achieves better results than a single feature selection algorithm.
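One common way to combine several feature selectors by rank is mean-rank (Borda-style) aggregation, sketched below. The abstract does not state which combination rule the authors use, so this is an assumed stand-in; the score tables and term names are hypothetical.

```python
def rank_features(scores):
    """Map feature -> rank (0 = best) from a feature -> score dict (higher is better)."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {f: r for r, f in enumerate(ordered)}


def ensemble_select(score_tables, k):
    """Keep the k features with the best mean rank across all score tables."""
    ranks = [rank_features(s) for s in score_tables]
    features = list(ranks[0])
    mean_rank = {f: sum(r[f] for r in ranks) / len(ranks) for f in features}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


# Hypothetical scores for four terms from two different selectors.
chi_square = {"crash": 9.0, "typo": 1.0, "freeze": 7.0, "the": 0.2}
info_gain = {"crash": 0.40, "typo": 0.05, "freeze": 0.50, "the": 0.01}

selected = ensemble_select([chi_square, info_gain], k=2)  # ["crash", "freeze"]
```

Working on ranks rather than raw scores sidesteps the fact that different selectors produce scores on incomparable scales.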


2021 ◽  
Vol 336 ◽  
pp. 08008
Author(s):  
Tao Xie

To improve the detection rate and speed of an intrusion detection system, this paper proposes a feature selection algorithm. The algorithm uses information gain to rank the features in descending order, and then uses a multi-objective genetic algorithm to search the ranked features step by step for the optimal feature combination. We classified the Kddcup98 dataset into five classes, including the attack classes DOS, PROBE, R2L, and U2R, and conducted numerous experiments on each class. Experimental results show that, for each class of attack, the proposed algorithm not only speeds up feature selection but also significantly improves the detection rate.
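The ranking step described above can be sketched as follows for discrete features: compute the information gain of each feature with respect to the class label and sort in descending order. The multi-objective genetic search is not shown, and the toy connection records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on a discrete feature column."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(rows, labels):
    """Return feature indices sorted by descending information gain, plus the gains."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: -gains[j])
    return order, gains


# Hypothetical records: (protocol, load level) with a class label per record.
rows = [("tcp", "low"), ("tcp", "high"), ("udp", "low"), ("udp", "high")]
labels = ["dos", "dos", "normal", "normal"]

order, gains = rank_by_information_gain(rows, labels)
# The protocol column fully determines the label (gain 1 bit); load level gives 0.
```

A search procedure can then consider features in this order, so that highly informative features are examined first.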


Author(s):  
Chunyong Yin ◽  
Luyu Ma ◽  
Lu Feng

Intrusion detection is a security mechanism used to detect attacks and intrusion behaviors. Because existing clonal selection algorithms applied to intrusion detection suffer from low accuracy and a high false positive rate, in this paper we propose a feature selection method for an improved clonal selection algorithm. The improved method detects intrusion behavior by selecting the best individuals overall and cloning them. Experimental results show that the proposed feature selection algorithm outperforms the traditional feature selection algorithm across different classifiers, and that the final detection results are better than those of the traditional clonal algorithm, with 99.6% accuracy and a 0.1% false positive rate.


Author(s):  
J. V. D. Prasad ◽  
A. Raghuvira Pratap ◽  
Babu Sallagundla

With the rapid increase in the volume of clinical data, prediction and data analysis have become very difficult. Machine learning models make it easier to work with such large datasets. A machine learning model faces many challenges, one of which is feature selection. In this research work, we propose a novel feature selection method based on statistical procedures to increase the performance of the machine learning model. Furthermore, we have tested the feature selection algorithm on a liver disease classification dataset, and the results obtained show the efficiency of the proposed method.


2020 ◽  
Author(s):  
Esra Sarac Essiz ◽  
Murat Oturakci

Artificial bee colony (ABC) is a nature-inspired optimization algorithm based on the search behaviour of honey bees. The main aim of this study is to examine the effects of an ABC-based feature selection algorithm on classification performance for cyberbullying, which has become a significant worldwide social issue in recent years. For this purpose, the classification performance of the proposed ABC-based feature selection method is compared with three traditional methods: information gain, ReliefF and chi-square. Experimental results show that the ABC-based feature selection method outperforms the three traditional methods for the detection of cyberbullying. The macro-averaged F-measure on the data set increases from 0.659 to 0.8 with the proposed ABC-based feature selection method.
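The macro-averaged F-measure reported above is the unweighted mean of the per-class F-measures. A minimal sketch, assuming the usual F1 definition (harmonic mean of precision and recall) and using hypothetical labels:

```python
def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f_scores) / len(f_scores)


# Hypothetical predictions for five messages.
y_true = ["bully", "bully", "ok", "ok", "ok"]
y_pred = ["bully", "ok", "ok", "ok", "bully"]

score = macro_f_measure(y_true, y_pred)  # (0.5 + 2/3) / 2 = 7/12
```

Because every class counts equally, the macro average is sensitive to performance on the minority class, which matters when bullying messages are rare.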


Author(s):  
Zinat Ansari

Background: Health economics is among the academic fields that can support better economic decisions, such as determining cash prices. Predicting ending cash is fundamental for internal and external users and is useful in health economics. The most important purpose of financial reporting is to present information for predicting ending cash. The aim of this research is therefore to predict the ending cash value using feature selection and the MLR method for the period 2010-2012. Methods: Feature selection algorithms (Best-First, Greedy-Stepwise and Ranker) were employed to identify the relevant data that affect ending cash. Results: Based on Best-First (CFS-Subset-Evaluation) and Greedy-Stepwise (CFS-Subset-Evaluation), the most relevant features for determining ending cash were interest payments for loans, dividends received from short and long term deposits, total net flow of investment activities, net increase (decrease) in cash, and beginning cash. According to Ranker (Info-Gain-Attribute-Evaluation, Gain-Ratio-Attribute-Evaluation and Symmetrical-Uncert-Attribute-Evaluation), the most important data were net out flow, dividends, dividends paid, interest payments for loans, and dividends received from short and long term deposits. According to Ranker (Principal-Components and ReliefF-Attribute-Evaluation), the best data for determining ending cash include beginning cash, interest payments for loans, dividends, net increase (decrease) in cash, and dividends received from short and long term deposits. The findings also indicated a positive and highly significant correlation between dividends received from short and long term deposits and beginning cash (1.00**) at the 0.01 significance level, whereas the correlation between interest payments for loans and ending cash (0.999**) was reported as negative and significant at the 0.01 level. Conclusions: The present research attempted to reduce the volume of data required for predicting ending cash by employing feature selection, so as to save both money and time.
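Correlation-based feature selection (CFS) scores a subset of k features by the merit k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-to-target correlation and r_ff the mean feature-to-feature correlation. The sketch below pairs that merit with a greedy forward search. It is an illustration, not the study's configuration: Pearson correlation is used as the correlation measure by assumption, and the column names and values are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences (0 if constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va and vb else 0.0


def cfs_merit(columns, target, subset):
    """CFS merit: rewards correlation with the target, penalizes redundancy."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[f], columns[g])) for f, g in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_stepwise(columns, target):
    """Add the feature that most improves the merit; stop when nothing improves it."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        merit, f = max((cfs_merit(columns, target, selected + [f]), f) for f in remaining)
        if merit <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = merit
    return selected, best


# Hypothetical columns: "x" tracks the target, "w" partly, "z" not at all.
target = [1, 2, 3, 4, 5]
columns = {"x": [1, 2, 3, 4, 5], "w": [1, 3, 2, 5, 4], "z": [1, -1, 0, -1, 1]}

selected, merit = greedy_stepwise(columns, target)  # selects only "x"
```

Here adding "w" or "z" to "x" lowers the merit, so the search stops after one feature; this is the mechanism by which CFS-style evaluators discard redundant or uninformative attributes.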


2020 ◽  
Vol 59 (04/05) ◽  
pp. 151-161
Author(s):  
Yuchen Fei ◽  
Fengyu Zhang ◽  
Chen Zu ◽  
Mei Hong ◽  
Xingchen Peng ◽  
...  

Background: An accurate and reproducible method to delineate tumor margins is of great importance in clinical diagnosis and treatment. In nasopharyngeal carcinoma (NPC), limitations such as high variability, low contrast, and discontinuous boundaries in the presentation of soft tissues can make the tumor margin extremely difficult to identify in magnetic resonance imaging (MRI), which increases the challenge of the NPC segmentation task. Objectives: The purpose of this work is to develop a semiautomatic algorithm for NPC image segmentation that requires minimal human intervention while delineating tumor margins with high accuracy and reproducibility. Methods: In this paper, we propose a novel feature selection algorithm for identifying the margin in NPC images, named modified random forest recursive feature selection (MRF-RFS). Specifically, to obtain a more discriminative feature subset for segmentation, a modified recursive feature selection method is applied to the original handcrafted feature set. Moreover, we combine the proposed feature selection method with the classical random forest (RF) in the training stage to take full advantage of its intrinsic property (i.e., its feature importance measure). Results: To evaluate the segmentation performance, we verify our method on the T1-weighted MRI images of 18 NPC patients. The experimental results demonstrate that the proposed MRF-RFS method outperforms the baseline methods and deep learning methods on the task of segmenting NPC images. Conclusion: The proposed method could be effective in NPC diagnosis and useful for guiding radiation therapy.
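The general pattern behind recursive feature selection driven by an importance measure can be sketched as a loop that re-scores the surviving features and drops the weakest one each round. This is the generic scheme only, not the modified MRF-RFS procedure. The `importance_fn` callback is a hypothetical stand-in for retraining a random forest and reading its feature importances, and the feature names and base scores are invented for the example.

```python
def recursive_feature_elimination(features, importance_fn, n_keep):
    """Repeatedly drop the least important feature until n_keep remain."""
    current = list(features)
    while len(current) > n_keep:
        # Importances are recomputed on the surviving subset every round.
        scores = importance_fn(current)
        weakest = min(current, key=lambda f: (scores[f], f))
        current.remove(weakest)
    return current


# Hypothetical stand-in for a model-based importance measure: fixed base
# scores renormalized over whichever features are still in play.
BASE = {"intensity": 0.50, "texture": 0.30, "gradient": 0.15, "position": 0.05}


def toy_importance(subset):
    total = sum(BASE[f] for f in subset)
    return {f: BASE[f] / total for f in subset}


kept = recursive_feature_elimination(list(BASE), toy_importance, n_keep=2)
# Drops "position", then "gradient".
```

Recomputing importances after each removal matters in practice, because a model's importance estimates shift once correlated features are no longer competing with each other.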


2014 ◽  
Vol 1030-1032 ◽  
pp. 1709-1712
Author(s):  
Kai Min Song ◽  
Xun Yi Ren

Based on research into flow identification algorithms that use statistical features, this paper puts forward a statistical feature selection algorithm to reduce the number of features used in identification and to increase the speed of flow identification. The experimental results show that the algorithm can effectively reduce the number of features and improve the efficiency of identification.

