Feature relevance term variation for multi-label feature selection

Author(s):  
Ping Zhang ◽  
Wanfu Gao


2015 ◽
Vol 1 (311) ◽  
Author(s):  
Mariusz Kubus

Feature selection methods are usually classified into three groups: filters, wrappers, and embedded methods. A second important criterion for classifying them is whether they take an individual or a multivariate approach to evaluating feature relevance. The chessboard problem is an illustrative example in which two variables that have no individual influence on the dependent variable can be essential for separating the classes. Classifiers that deal well with such a data structure are sensitive to irrelevant variables, and their generalization error increases with the number of noisy variables. We discuss feature selection methods in the context of chessboard-like structure in data with numerous irrelevant variables.
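The chessboard (XOR) structure described above can be made concrete with a small, self-contained sketch (illustrative code, not from the paper): each feature alone carries zero mutual information about the class, while the pair determines it completely.

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(pairs):
    """Empirical mutual information (in bits) between two discrete
    variables, given as a list of (x, y) sample pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

# Chessboard/XOR data: the class is x1 XOR x2, uniform over the four cells.
data = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

mi_x1 = mutual_information([(x1, y) for x1, _, y in data])           # 0 bits
mi_x2 = mutual_information([(x2, y) for _, x2, y in data])           # 0 bits
mi_pair = mutual_information([((x1, x2), y) for x1, x2, y in data])  # 1 bit
```

An individual (univariate) evaluation would discard both features here; only a multivariate view sees their joint relevance.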


2020 ◽  
Vol 10 (15) ◽  
pp. 5170
Author(s):  
José Alberto Hernández-Muriel ◽  
Jhon Bryan Bermeo-Ulloa ◽  
Mauricio Holguin-Londoño ◽  
Andrés Marino Álvarez-Meza ◽  
Álvaro Angel Orozco-Gutiérrez

Nowadays, bearings installed in industrial electric motors are a primary failure mode affecting global energy consumption. Since industrial energy demand keeps growing, interest in the efficient maintenance of electric motors is decisive. Vibration signals from bearings are commonly employed as a non-invasive way to support fault diagnosis and severity evaluation of rotating machinery. However, vibration-based diagnosis poses a challenge concerning the signal properties, which are highly dynamic and non-stationary. Here, we introduce a knowledge-based tool to analyze multiple health conditions in bearings. Our approach includes a Stochastic Feature Selection (SFS) method that highlights and interprets relevant multi-domain attributes (time, frequency, and time–frequency) related to the discriminability of bearing faults. In particular, a relief-F-based ranking and a Hidden Markov Model are trained under a windowing scheme to achieve our SFS. Results obtained on a public database demonstrate that our proposal is competitive with state-of-the-art algorithms concerning both the number of selected features and the classification accuracy.
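As a minimal, hypothetical sketch of the window-based attribute extraction the abstract describes, the following computes a few classic time-domain vibration descriptors per window (the actual SFS pipeline also uses frequency and time–frequency attributes and is not reproduced here):

```python
from math import sqrt

def time_domain_features(window):
    """A few classic time-domain descriptors of one vibration window."""
    n = len(window)
    mean = sum(window) / n
    rms = sqrt(sum(x * x for x in window) / n)          # root mean square energy
    variance = sum((x - mean) ** 2 for x in window) / n
    peak = max(abs(x) for x in window)
    crest_factor = peak / rms if rms else 0.0           # impulsiveness indicator
    return {"mean": mean, "rms": rms, "variance": variance,
            "peak": peak, "crest_factor": crest_factor}

def windowed_features(signal, window_len):
    """Slide a non-overlapping window over the signal, one feature dict each."""
    return [time_domain_features(signal[i:i + window_len])
            for i in range(0, len(signal) - window_len + 1, window_len)]
```

Such per-window feature vectors are the kind of input a relief-F ranking or an HMM could then be trained on.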


Author(s):  
ALEXSEY LIAS-RODRÍGUEZ ◽  
GUILLERMO SANCHEZ-DIAZ

Typical testors are useful tools for feature selection and for determining feature relevance in supervised classification problems. Computing all typical testors of a training matrix remains very expensive; all reported algorithms have exponential complexity in the number of columns of the matrix. In this paper, we introduce a faster Boolean recursive algorithm, called fast-BR, that is based on the elimination of gaps and the reduction of columns. Fast-BR is designed to generate all typical testors from a training matrix while requiring a reduced number of operations. We present experimental results using this fast implementation, along with a comparison against other state-of-the-art algorithms that generate typical testors.
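To make the notion concrete, here is an illustrative brute force (exponential, as the abstract notes; fast-BR's gap-elimination and column-reduction steps are not reproduced). Assuming the basic Boolean comparison matrix is given, a testor is a column subset leaving no all-zero row, and a typical testor is a minimal such subset:

```python
from itertools import combinations

def is_testor(matrix, cols):
    """cols is a testor if every row of the Boolean basic matrix
    has a 1 in at least one of the selected columns."""
    return all(any(row[c] for c in cols) for row in matrix)

def typical_testors(matrix):
    """Brute-force enumeration: scan subsets by increasing size and keep
    only those containing no smaller testor (hence minimal, i.e. typical)."""
    n_cols = len(matrix[0])
    found = []
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            if not any(set(t) <= set(cols) for t in found) and is_testor(matrix, cols):
                found.append(cols)
    return found
```

For the 3-column matrix `[[1, 0, 1], [0, 1, 1]]` this yields `{2}` and `{0, 1}`, and the subset scan is exactly the exponential cost the abstract refers to.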


Author(s):  
Jia Zhang ◽  
Yidong Lin ◽  
Min Jiang ◽  
Shaozi Li ◽  
Yong Tang ◽  
...  

Information-theoretic methods have attracted great attention in recent years and have yielded promising results for multi-label data with high dimensionality. However, most existing methods are either directly transformed from heuristic single-label feature selection methods or inefficient in exploiting labeling information. Thus, they may not be able to obtain an optimal feature selection result shared by multiple labels. In this paper, we propose a general global optimization framework that takes feature relevance, label relevance (i.e., label correlation), and feature redundancy into account, thus facilitating multi-label feature selection. Moreover, the proposed method has an effective mechanism for exploiting the inherent properties of multi-label learning. Specifically, we provide a formulation that extends the proposed method with label-specific features. Empirical studies on twenty multi-label data sets reveal the effectiveness and efficiency of the proposed method. Our implementation is available online at: https://jiazhang-ml.pub/GRRO-master.zip.
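One illustrative way to trade feature relevance against feature redundancy in a greedy multi-label scheme is a mutual-information score of the form Σ_y I(f; y) − Σ_s I(f; s). This is a generic mRMR-style sketch, not the paper's actual global optimization objective:

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    """Empirical mutual information (bits) between two aligned discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in joint.items())

def greedy_select(features, labels, k):
    """Greedily pick k features, scoring relevance to every label minus
    redundancy with already-selected features (illustrative objective)."""
    selected = []
    while len(selected) < k:
        best = max((f for f in features if f not in selected),
                   key=lambda f: sum(mi(features[f], y) for y in labels)
                                 - sum(mi(features[f], features[s]) for s in selected))
        selected.append(best)
    return selected
```

Summing relevance over all labels is what makes the score "shared by multiple labels" rather than optimized per label.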


Symmetry ◽  
2019 ◽  
Vol 11 (7) ◽  
pp. 858 ◽  
Author(s):  
Jun Liang ◽  
Liang Hou ◽  
Zhenhua Luan ◽  
Weiping Huang

Feature interaction is a recently proposed type of feature relevance relationship, and the unintentional removal of interactive features can result in poor classification performance. Traditional feature selection algorithms, however, mainly focus on detecting relevant and redundant features, while interactive features are usually ignored. To deal with this problem, feature relevance, feature redundancy, and feature interaction are redefined based on information theory. A new feature selection algorithm named CMIFSI (Conditional Mutual Information based Feature Selection considering Interaction) is then proposed, which uses conditional mutual information to estimate feature redundancy and interaction, respectively. To verify the effectiveness of our algorithm, empirical experiments are conducted to compare it with several other representative feature selection algorithms. The results on both synthetic and benchmark datasets indicate that our algorithm achieves better results than the other methods in most cases, and they highlight the necessity of dealing with feature interaction.
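The role of conditional mutual information in detecting interaction can be sketched as follows (illustrative code, not the CMIFSI implementation): for XOR-like data, I(X;Y) = 0 while I(X;Y|Z) > 0, so conditioning on a third variable reveals the interaction that a pairwise measure misses.

```python
from collections import Counter
from math import log2

def cmi(xs, ys, zs):
    """Empirical conditional mutual information I(X; Y | Z) in bits."""
    n = len(xs)
    pxyz = Counter(zip(xs, ys, zs))
    pxz = Counter(zip(xs, zs))
    pyz = Counter(zip(ys, zs))
    pz = Counter(zs)
    return sum(c / n * log2((pz[z] / n * c / n)
                            / ((pxz[(x, z)] / n) * (pyz[(y, z)] / n)))
               for (x, y, z), c in pxyz.items())

# XOR-style interaction: x and y look unrelated until z is known.
x = [0, 0, 1, 1]
z = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x, z)]
interaction_gain = cmi(x, y, z)  # I(X;Y) is 0 here, so a positive value signals interaction
```

A negative gain I(X;Y|Z) − I(X;Y) would instead indicate redundancy, which is why a single conditional measure can serve both estimates.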


Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1617
Author(s):  
Lingbo Gao ◽  
Yiqiang Wang ◽  
Yonghao Li ◽  
Ping Zhang ◽  
Liang Hu

With the rapid growth of the Internet, the curse of dimensionality caused by massive multi-label data has attracted extensive attention, and feature selection plays an indispensable role in dimensionality reduction. Many researchers have approached this subject through information theory. Here, to evaluate feature relevance, we design a novel feature relevance term (FR) that employs three incremental information terms to comprehensively consider three key aspects: candidate features, selected features, and label correlations. A thorough examination of these three aspects makes FR better suited to capturing the optimal features. Moreover, we employ a label-related feature redundancy term (LR) to reduce unnecessary redundancy. We therefore propose a multi-label feature selection method that integrates FR with LR, namely Feature Selection combining three types of Conditional Relevance (TCRFS). Numerous experiments indicate that TCRFS outperforms 6 state-of-the-art multi-label approaches on 13 multi-label benchmark data sets from 4 domains.
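A hedged sketch of what a relevance term combining the three aspects above might look like; the exact FR/LR definitions of TCRFS are not reproduced here, and `fr_score` is a hypothetical stand-in that sums candidate–label MI with the same relevance conditioned on each selected feature and on each of the other labels:

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    """Empirical mutual information (bits) between aligned discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in joint.items())

def cmi(xs, ys, zs):
    """Empirical conditional mutual information I(X; Y | Z) in bits."""
    n = len(xs)
    pxyz, pxz, pyz, pz = (Counter(zip(xs, ys, zs)), Counter(zip(xs, zs)),
                          Counter(zip(ys, zs)), Counter(zs))
    return sum(c / n * log2((pz[z] / n * c / n)
                            / ((pxz[(x, z)] / n) * (pyz[(y, z)] / n)))
               for (x, y, z), c in pxyz.items())

def fr_score(candidate, selected, labels):
    """Hypothetical relevance term: candidate-label MI, plus the same relevance
    conditioned on each selected feature and on each of the other labels."""
    score = 0.0
    for i, y in enumerate(labels):
        score += mi(candidate, y)                                    # candidate vs. label
        score += sum(cmi(candidate, y, s) for s in selected)         # given selected features
        score += sum(cmi(candidate, y, other)                        # given label correlations
                     for j, other in enumerate(labels) if j != i)
    return score
```

Conditioning on the selected features and the other labels is the generic mechanism by which such incremental terms account for the three aspects jointly.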


2020 ◽  
Vol 50 (4) ◽  
pp. 1272-1288 ◽  
Author(s):  
Wanfu Gao ◽  
Liang Hu ◽  
Ping Zhang
