A Mixed Feature Selection Method Considering Interaction

2015
Vol 2015
pp. 1-10
Author(s):  
Zilin Zeng
Hongjun Zhang
Rui Zhang
Youliang Zhang

Feature interaction has gained considerable attention recently. However, many feature selection methods that account for interaction are designed only for categorical features. This paper proposes a mixed feature selection algorithm based on neighborhood rough sets that can search for interacting features. Feature relevance, feature redundancy, and feature interaction are defined within the framework of neighborhood rough sets; a neighborhood interaction weight factor, which reflects whether a feature is redundant or interactive, is proposed; and a neighborhood interaction weight based feature selection algorithm (NIWFS) is put forward. To evaluate the performance of the proposed algorithm, we compare NIWFS with three other feature selection algorithms, namely INTERACT, NRS, and NMI, in terms of classification accuracy and the number of selected features with C4.5 and IB1. The results on ten real-world datasets indicate that NIWFS not only handles mixed datasets directly, but also reduces the dimensionality of the feature space while achieving the highest average accuracies.
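The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea: greedy forward selection in which each candidate's neighborhood dependency gain is scaled by an interaction weight. The function names, the neighborhood radius `delta`, the purity-based dependency measure, and the weighting formula are illustrative assumptions, not the authors' exact definitions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_dependency(X, y, delta=0.2):
    """Fraction of samples whose delta-neighborhood in the given feature
    subspace is pure (all neighbors share the sample's label); a stand-in
    for the neighborhood rough set dependency degree. Assumes numeric
    features scaled to [0, 1] so that a single radius is meaningful."""
    if X.shape[1] == 0:
        return 0.0
    y = np.asarray(y)
    neigh = NearestNeighbors(radius=delta).fit(X).radius_neighbors(
        X, return_distance=False)
    pure = sum(np.all(y[idx] == y[i]) for i, idx in enumerate(neigh))
    return pure / len(y)

def niwfs_like_selection(X, y, delta=0.2, max_features=10):
    """Greedy forward selection: each candidate is scored by the dependency
    gain it adds to the current subset, scaled by an interaction-weight
    factor (here, the gain beyond the candidate's solo relevance). An
    illustrative approximation, not the published NIWFS definition."""
    X = np.asarray(X, dtype=float)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        base = neighborhood_dependency(X[:, selected], y, delta)
        scores = {}
        for f in remaining:
            joint = neighborhood_dependency(X[:, selected + [f]], y, delta)
            solo = neighborhood_dependency(X[:, [f]], y, delta)
            # weight > 1 when f interacts with the selected subset,
            # weight < 1 when f is largely redundant with it
            weight = 1.0 + (joint - base - solo)
            scores[f] = (joint - base) * weight
        best = max(scores, key=scores.get)
        if scores[best] <= 0:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```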

Entropy
2021
Vol 23 (8)
pp. 1094
Author(s):  
Hongbin Dong
Jing Sun
Xiaohang Sun

Multi-label learning aims to learn functions that label each sample with its true label set. As the amount of data grows, feature dimensionality keeps increasing, and high-dimensional data may contain noise, which makes multi-label learning difficult. Feature selection is a technique that can effectively reduce the data dimension. In feature selection research, multi-objective optimization algorithms have shown excellent global optimization performance, and the Pareto relationship handles contradictory objectives in multi-objective problems well. Therefore, a Shapley value-fused feature selection algorithm for multi-label learning (SHAPFS-ML) is proposed. The method takes multi-label criteria as the optimization objectives, and the proposed crossover and mutation operators based on Shapley values help identify relevant, redundant, and irrelevant features. Experimental comparisons on real-world datasets reveal that SHAPFS-ML is an effective feature selection method for multi-label classification, reducing the classification algorithm's computational complexity and improving classification accuracy.
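The operators are described only at a high level in the abstract; the sketch below illustrates one way Shapley-style feature contributions, approximated by permutation sampling, could bias a mutation operator on a binary feature mask. The `fitness` callable (which should map a mask to a single score and handle the empty subset), the sampling budget, and the rank-based flip probabilities are assumptions for illustration, not the published SHAPFS-ML operators.

```python
import numpy as np

def approx_shapley(fitness, n_features, n_perms=30, rng=None):
    """Monte Carlo Shapley approximation: average marginal gain in
    fitness when a feature joins a random coalition (feature subset).
    `fitness` accepts a boolean mask and returns a score, including a
    baseline value for the all-False mask."""
    rng = rng or np.random.default_rng(0)
    phi = np.zeros(n_features)
    for _ in range(n_perms):
        perm = rng.permutation(n_features)
        mask = np.zeros(n_features, dtype=bool)
        prev = fitness(mask)
        for f in perm:
            mask[f] = True
            cur = fitness(mask)
            phi[f] += cur - prev
            prev = cur
    return phi / n_perms

def shapley_guided_mutation(mask, phi, rate=0.1, rng=None):
    """Flip bits with probabilities biased by Shapley values: selected
    features with low contribution are more likely to be dropped, and
    unselected features with high contribution more likely to be added."""
    rng = rng or np.random.default_rng(0)
    rank = phi.argsort().argsort() / (len(phi) - 1)   # 0 = lowest value, 1 = highest
    p_flip = np.where(mask, rate * (1.0 - rank), rate * rank)
    return mask ^ (rng.random(len(mask)) < p_flip)
```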


Author(s):  
Wenjie Liu
Shanshan Wang
Xin Chen
He Jiang

In the software maintenance process, predicting the severity of bug reports is a fairly important activity. However, manually identifying the severity of bug reports is a tedious and time-consuming task, so automatic methods for predicting bug report severity are in urgent demand. In general, a bug report contains a great deal of descriptive natural language text, resulting in a high-dimensional feature set that poses serious challenges to traditional automatic methods. We therefore attempt to use automatic feature selection methods to improve the performance of bug report severity prediction. In this paper, we introduce a ranking-based strategy to improve existing feature selection algorithms and propose an ensemble feature selection algorithm that combines existing ones. To verify the performance of our method, we run experiments on the bug reports of Eclipse and Mozilla and compare against eight commonly used feature selection methods. The experimental results show that the ranking-based strategy can improve the performance of bug report severity prediction by up to 54.76% on average in terms of F-measure, and it can also significantly reduce the dimension of the feature set. Meanwhile, the ensemble feature selection method obtains better results than any single feature selection algorithm.
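The paper's specific ranking-based strategy is not reproduced here; a minimal sketch of the general ensemble idea, aggregating per-feature ranks from several standard scorers (scikit-learn's chi-square, ANOVA F, and mutual information stand in for the eight methods compared), could look like the following. It assumes a non-negative term-feature matrix such as counts or TF-IDF.

```python
import numpy as np
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

def rank_ensemble_select(X, y, top_k=100):
    """Score the features with several filter methods, convert each score
    vector to ranks (0 = best), average the ranks, and keep the top_k
    features with the lowest mean rank. X must be non-negative for chi2."""
    scorers = [
        lambda X, y: chi2(X, y)[0],
        lambda X, y: f_classif(X, y)[0],
        lambda X, y: mutual_info_classif(X, y, random_state=0),
    ]
    ranks = []
    for score in scorers:
        s = np.nan_to_num(score(X, y))
        ranks.append((-s).argsort().argsort())   # rank 0 = best-scoring feature
    mean_rank = np.mean(ranks, axis=0)
    return np.argsort(mean_rank)[:top_k]
```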


Author(s):  
Chunyong Yin
Luyu Ma
Lu Feng

Intrusion detection is a security mechanism used to detect attacks and intrusive behavior. Because existing clonal selection algorithms applied to intrusion detection suffer from low accuracy and high false positive rates, we propose a feature selection method based on an improved clonal selection algorithm. The improved method detects intrusion behavior by selecting the best overall individuals and cloning them. Experimental results show that the proposed feature selection algorithm outperforms the traditional feature selection algorithm across different classifiers, and that the final detection results are better than those of the traditional clonal algorithm, with 99.6% accuracy and a 0.1% false positive rate.
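The abstract does not detail the improvements to the clonal selection algorithm; the sketch below is a plain clonal-selection wrapper over binary feature masks (clone the best antibodies, hypermutate the clones, replace the weakest individuals). The `affinity` callable, clone counts, and mutation rates are illustrative assumptions, not the authors' method.

```python
import numpy as np

def clonal_feature_selection(affinity, n_features, pop_size=20, n_best=5,
                             n_clones=4, generations=30, rng=None):
    """Basic clonal selection over binary feature masks. `affinity(mask)`
    should return a score to maximize (e.g. cross-validated detection
    accuracy on the selected features) and handle the empty mask."""
    rng = rng or np.random.default_rng(0)
    pop = rng.random((pop_size, n_features)) < 0.5
    for _ in range(generations):
        scores = np.array([affinity(m) for m in pop])
        best_idx = scores.argsort()[::-1][:n_best]
        clones = []
        for rank, i in enumerate(best_idx):
            for _ in range(n_clones):
                # lower-ranked (weaker) antibodies are mutated more heavily
                flip_rate = 0.02 * (rank + 1)
                clones.append(pop[i] ^ (rng.random(n_features) < flip_rate))
        clones = np.array(clones)
        clone_scores = np.array([affinity(m) for m in clones])
        best_clones = clones[clone_scores.argsort()[::-1][:n_best]]
        pop[scores.argsort()[:n_best]] = best_clones   # replace weakest antibodies
    scores = np.array([affinity(m) for m in pop])
    return pop[scores.argmax()]
```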


Author(s):  
Luís Cavique
Armando B. Mendes
Matthias Funk
Jorge M. A. Santos

A paremiological (the study of proverbs) case is presented as part of a wider project based on data collected from the Azorean population. Given the considerable distances between the Azores islands, the authors hypothesize that there are significant differences in the proverbs of each island, permitting identification of an interviewee's native island based on his or her knowledge of proverbs. In this chapter, a feature selection algorithm that combines Rough Sets and the Logical Analysis of Data (LAD) is presented. The algorithm, named LAID (Logical Analysis of Inconsistent Data), deals with noisy data, and the authors believe that an important link has been established between two different schools with similar approaches. The algorithm was applied to a real-world dataset built from thousands of interviews of Azoreans, involving an initial set of twenty-two thousand Portuguese proverbs.
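For readers unfamiliar with the term, "inconsistent data" in the rough-set/LAD sense refers to observations that share identical attribute values but carry different class labels. A minimal pandas sketch of spotting such rows (with hypothetical column names, not the project's actual schema) is shown below.

```python
import pandas as pd

def find_inconsistent_rows(df: pd.DataFrame, class_col: str) -> pd.DataFrame:
    """Return rows that share identical attribute values but different
    class labels -- e.g. the same proverb-knowledge profile reported by
    interviewees from different islands. LAID-style methods keep such
    rows rather than discarding them as noise."""
    attrs = [c for c in df.columns if c != class_col]
    labels_per_group = df.groupby(attrs)[class_col].transform("nunique")
    return df[labels_per_group > 1]
```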


Author(s):  
J. V. D. Prasad
A. Raghuvira Pratap
Babu Sallagundla

With the rapid increase in the amount of clinical data, prediction and analysis become very difficult. Various machine learning models make it easier to work with such large datasets. A machine learning model faces many challenges, and one of them is feature selection. In this research work, we propose a novel feature selection method based on statistical procedures to increase the performance of the machine learning model. Furthermore, we test the feature selection algorithm on a liver disease classification dataset, and the results show the efficiency of the proposed method.
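The statistical procedure is not specified in the abstract; the sketch below shows a typical statistical filter, ANOVA F-test selection via scikit-learn's SelectKBest, feeding a classifier evaluated by cross-validation. The choice of test, k, and classifier are illustrative assumptions, with X and y standing for a liver disease feature matrix and its labels.

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate_statistical_selection(X, y, k=5):
    """Standardize, keep the k features with the highest ANOVA F statistic,
    then score a classifier by 5-fold cross-validation."""
    model = make_pipeline(
        StandardScaler(),
        SelectKBest(score_func=f_classif, k=k),
        LogisticRegression(max_iter=1000),
    )
    return cross_val_score(model, X, y, cv=5).mean()
```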


2020
Author(s):  
Esra Sarac Essiz
Murat Oturakci

As a nature-inspired algorithm, the artificial bee colony (ABC) is an optimization algorithm inspired by the search behaviour of honey bees. The main aim of this study is to examine the effects of an ABC-based feature selection algorithm on classification performance for detecting cyberbullying, which has become a significant worldwide social issue in recent years. To this end, the classification performance of the proposed ABC-based feature selection method is compared with three traditional methods: information gain, ReliefF, and chi-square. Experimental results show that the ABC-based feature selection method outperforms the three traditional methods for the detection of cyberbullying. The macro-averaged F-measure on the dataset increases from 0.659 to 0.8 with the proposed ABC-based feature selection method.
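The study's exact ABC variant is not given in the abstract; the sketch below follows the standard employed/onlooker/scout outline of artificial bee colony search applied to binary feature masks. The `fitness` callable (for instance, a classifier's macro-averaged F-measure on the selected terms), the neighbourhood move, and the abandonment limit are assumptions for illustration.

```python
import numpy as np

def abc_feature_selection(fitness, n_features, n_bees=20, limit=10,
                          iterations=50, rng=None):
    """Compact binary artificial bee colony for feature selection.
    `fitness(mask)` should return a score to maximize and handle the
    empty mask (e.g. by returning 0)."""
    rng = rng or np.random.default_rng(0)
    foods = rng.random((n_bees, n_features)) < 0.5      # one food source per employed bee
    scores = np.array([fitness(m) for m in foods])
    trials = np.zeros(n_bees, dtype=int)

    def neighbour(mask):
        child = mask.copy()
        flips = rng.choice(n_features, size=max(1, n_features // 20), replace=False)
        child[flips] ^= True
        return child

    def try_improve(i):
        cand = neighbour(foods[i])
        s = fitness(cand)
        if s > scores[i]:
            foods[i], scores[i], trials[i] = cand, s, 0
        else:
            trials[i] += 1

    for _ in range(iterations):
        for i in range(n_bees):                          # employed bee phase
            try_improve(i)
        probs = scores - scores.min() + 1e-9             # onlooker bee phase:
        probs = probs / probs.sum()                      # richer sources get more visits
        for i in rng.choice(n_bees, size=n_bees, p=probs):
            try_improve(i)
        for i in np.where(trials > limit)[0]:            # scout bee phase
            foods[i] = rng.random(n_features) < 0.5
            scores[i] = fitness(foods[i])
            trials[i] = 0
    return foods[scores.argmax()]
```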


Author(s):  
Zinat Ansari

Background: Health economics is among the academic fields that can help improve conditions for making better economic decisions, such as determining cash prices. The prediction of ending cash is fundamental for internal and external users and is quite useful in health economics. The most important purpose of financial reporting is the presentation of information for predicting ending cash. Therefore, the aim of this research is to predict ending cash values using feature selection and the MLR method over 2010-2012.

Methods: Feature selection algorithms (Best-First, Greedy-Stepwise, and Ranker) were employed in this research to nominate the relevant data that affect ending cash.

Results: Based on the results of the deployed feature selection methods, the following features were indicated as the most relevant for determining ending cash: interest payments for loans, dividends received from short- and long-term deposits, total net flow of investment activities, net increase (decrease) in cash, and beginning cash, according to Best-First (CFS-Subset-Evaluation) and Greedy-Stepwise (CFS-Subset-Evaluation). Net outflow, dividends, dividends paid, interest payments for loans, and dividends received from short- and long-term deposits were the most important data as indicated by the Ranker (Info-Gain-Attribute-Evaluation, Gain-Ratio-Attribute-Evaluation, and Symmetrical-Uncertainty-Attribute-Evaluation). According to the Ranker (Principal-Components and ReliefF-Attribute-Evaluation), the best data for determining ending cash include beginning cash, interest payments for loans, dividends, net increase (decrease) in cash, and dividends received from short- and long-term deposits. The findings also indicate a positive and highly significant correlation between dividends received from short- and long-term deposits and beginning cash (1.00**) at the 0.01 significance level, whereas the observed correlation between interest payments for loans and ending cash (0.999**), at the 0.01 significance level, was negatively significant.

Conclusions: The present research attempted to reduce the volume of data required for predicting ending cash by employing feature selection, saving both time and money.
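The study uses Weka-style attribute evaluators and search methods; as a rough scikit-learn analogue (an assumption, not the original setup), a Ranker-style filter could score candidate cash-flow variables by mutual information with ending cash and feed the retained ones to a linear regression, as sketched below.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def rank_and_fit(X, y, feature_names, top_k=5):
    """Score each candidate cash-flow variable by mutual information with
    ending cash, keep the top_k, and evaluate a linear regression on them
    by cross-validated R^2."""
    mi = mutual_info_regression(X, y, random_state=0)
    keep = np.argsort(mi)[::-1][:top_k]
    r2 = cross_val_score(LinearRegression(), X[:, keep], y,
                         cv=5, scoring="r2").mean()
    return [feature_names[i] for i in keep], r2
```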


2010
Vol 4 (8)
Author(s):  
Vandar Kuzhali Jagannathan
Rajendran Govind
Srinivasan V
Siva Kumar Ganapathi
