Multi-label feature selection based on logistic regression and manifold learning

Author(s):  
Yao Zhang ◽  
Yingcang Ma ◽  
Xiaofei Yang

Like traditional single-label learning, multi-label learning also faces the curse of dimensionality. Feature selection is an effective technique for reducing the dimensionality of high-dimensional data and improving learning efficiency. In this paper, logistic regression, manifold learning, and sparse regularization are combined into a joint framework for multi-label feature selection (LMFS). First, the sparsity of the feature weight matrix is constrained by the $L_{2,1}$-norm. Second, feature-manifold and label-manifold regularization constrain the feature weight matrix so that it better fits the data information and label information. An iterative updating algorithm is designed and its convergence is proved. Finally, the LMFS algorithm is compared with DRMFS, SCLS, and other algorithms on eight classical multi-label data sets. The experimental results show the effectiveness of the LMFS algorithm.
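The $L_{2,1}$-norm term in objectives of this kind is commonly handled by iterative reweighting. Below is a minimal sketch of that generic scheme with a least-squares data term (the manifold regularizers are omitted for brevity); the function name, parameters, and solver are illustrative assumptions, not the authors' LMFS updates.

```python
import numpy as np

def l21_feature_select(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """Iteratively reweighted solver for min ||XW - Y||_F^2 + lam*||W||_{2,1}.

    A generic sketch of L2,1-regularized feature selection: rows of W
    with large L2 norm mark informative features.
    """
    n, d = X.shape
    D = np.eye(d)
    for _ in range(n_iter):
        # Closed-form update given the current diagonal reweighting matrix D.
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        # d_ii = 1 / (2 * ||w_i||_2): the standard L2,1 surrogate.
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
    return np.argsort(-row_norms)  # features ranked by importance
```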

Genes ◽  
2020 ◽  
Vol 11 (7) ◽  
pp. 717
Author(s):  
Garba Abdulrauf Sharifai ◽  
Zurinahni Zainol

Training a machine learning algorithm on an imbalanced data set is an inherently challenging task. It becomes more demanding with limited samples but a massive number of features (high dimensionality). High-dimensional and imbalanced data sets pose severe challenges in many real-world applications, such as biomedical data sets. Numerous researchers have investigated either imbalanced classes or high-dimensional data sets and proposed various methods. Nonetheless, few approaches reported in the literature address the intersection of the high-dimensionality and class-imbalance problems, owing to their complicated interactions. Lately, feature selection has become a well-known technique for overcoming this problem by selecting discriminative features that represent the minority and majority classes. This paper proposes a new method called Robust Correlation-Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA); rCBR-BGOA employs an ensemble of multiple filters coupled with the correlation-based redundancy method to select optimal feature subsets. A binary grasshopper optimisation algorithm (BGOA) casts the feature selection process as an optimisation problem, searching for the best (near-optimal) combination of features from the majority and minority classes. The obtained results, supported by proper statistical analysis, indicate that rCBR-BGOA can improve the classification performance for high-dimensional and imbalanced data sets in terms of the G-mean and Area Under the Curve (AUC) performance metrics.
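The core of any such wrapper method is a binary subset mask scored by an imbalance-aware fitness. The sketch below uses a simple bit-flip hill climb as a stand-in for the grasshopper dynamics, and cross-validated G-mean as the fitness; the classifier choice and search strategy are assumptions for illustration, not the rCBR-BGOA procedure.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import recall_score, make_scorer

def g_mean(y_true, y_pred):
    # Geometric mean of per-class recall: robust to class imbalance.
    recalls = recall_score(y_true, y_pred, average=None)
    return np.prod(recalls) ** (1.0 / len(recalls))

def binary_subset_search(X, y, n_iter=100, seed=0):
    """Hill-climbing stand-in for BGOA: flip one feature bit at a time
    and keep the move if cross-validated G-mean does not decrease."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    mask = rng.random(d) < 0.5
    scorer = make_scorer(g_mean)

    def score(m):
        if not m.any():
            return 0.0
        return cross_val_score(KNeighborsClassifier(), X[:, m], y,
                               scoring=scorer, cv=3).mean()

    best = score(mask)
    for _ in range(n_iter):
        j = rng.integers(d)
        mask[j] = ~mask[j]
        s = score(mask)
        if s >= best:
            best = s
        else:
            mask[j] = ~mask[j]  # revert the flip
    return mask, best
```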


2013 ◽  
Vol 274 ◽  
pp. 161-164 ◽  
Author(s):  
Wei Pan ◽  
Pei Jun Ma ◽  
Xiao Hong Su

Feature selection is a preprocessing step in pattern analysis and machine learning. In this paper, we design an algorithm for feature subset selection. We present an L1-norm regularization technique for sparse feature weights. A margin loss is introduced to evaluate features, and gradient descent is employed to search for the optimal solution that maximizes the margin. The proposed technique is tested on UCI data sets. Compared with four margin-based loss functions for SVM, the proposed technique is effective and efficient.
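A standard way to combine a margin loss with L1 sparsity is proximal subgradient descent: take a hinge-loss gradient step, then soft-threshold the weights. The sketch below shows that generic recipe; the specific loss and step sizes are assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def l1_margin_feature_weights(X, y, lam=0.01, lr=0.1, epochs=100):
    """Margin-based feature weighting with L1 sparsity (generic hinge-loss
    variant). y must be in {-1, +1}; nonzero entries of the returned
    weight vector indicate selected features."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        # Subgradient of the hinge loss max(0, 1 - margin).
        active = margins < 1
        grad = -(y[active, None] * X[active]).sum(axis=0) / n
        w -= lr * grad
        # Soft-thresholding enforces L1 sparsity (proximal update).
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w
```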


Author(s):  
Srinivas Kolli et al.

Clustering is most complex in multi-/high-dimensional data because a subset of features must be selected from the overall features present in categorical data sources. Feature-subset selection is an aggressive approach to decreasing feature dimensionality in data mining and pattern identification; its main aim is to select optimal features and decrease redundancy. To handle redundant and irrelevant features in high-dimensional sample data, this document describes exploration based on feature selection with data granularity. We propose a Novel Granular Feature Multi-variant Clustering based Genetic Algorithm (NGFMCGA) model and evaluate its performance in this implementation. The model consists of two main phases: in the first phase, a graph-theoretic grouping procedure divides the features into different clusters; in the second phase, a strongly representative related feature is selected from each cluster with respect to matching a subset of features. The selected features are independent because they are drawn from different clusters, so the proposed clustering approach has a high probability of producing and preserving independent, useful features. Optimal feature-subset selection improves the accuracy of clustering and feature classification; the proposed approach achieves better accuracy with respect to optimal subset selection when applied to publicly available data sets and compared with traditional supervised evolutionary approaches.
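The two-phase structure (graph grouping, then one representative per cluster) can be sketched compactly: build a feature-similarity graph, take connected components as clusters, and keep the most target-relevant feature from each. The genetic-algorithm refinement is omitted, and the correlation threshold and relevance measure are assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def cluster_and_pick(X, y, corr_threshold=0.7):
    """Illustrative two-phase selection: graph-based feature clustering
    followed by per-cluster representative selection."""
    d = X.shape[1]
    C = np.abs(np.corrcoef(X, rowvar=False))          # feature-feature similarity
    A = csr_matrix((C > corr_threshold).astype(int))  # threshold into a graph
    n_clusters, labels = connected_components(A, directed=False)
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])
    # Keep the most target-relevant feature from each cluster.
    return [max(np.where(labels == k)[0], key=lambda j: relevance[j])
            for k in range(n_clusters)]
```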


Author(s):  
Wei Zheng ◽  
Xiaofeng Zhu ◽  
Yonghua Zhu ◽  
Shichao Zhang

Feature selection is an indispensable preprocessing procedure for high-dimensional data analysis, but previous feature selection methods usually ignore sample diversity (i.e., every sample contributes individually to model construction) and have limited ability to deal with incomplete data sets in which some training samples have unobserved data. To address these issues, in this paper, we first propose a robust feature selection framework to relieve the influence of outliers, and then introduce an indicator matrix that prevents unobserved data from taking part in the numerical computation of feature selection, so that both our proposed framework and existing feature selection frameworks can conduct feature selection on incomplete data sets. We further propose a new optimization algorithm to optimize the resulting objective function and prove that it converges fast. Experimental results on both real and artificial incomplete data sets demonstrate that our proposed method outperforms the feature selection methods under comparison in terms of clustering performance.
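The indicator-matrix idea can be sketched as an element-wise 0/1 mask on the residual, so unobserved entries contribute nothing to the objective or its gradient. The sketch below applies the mask on the target side and uses a plain gradient solver with $L_{2,1}$ sparsity; the exact placement of the indicator matrix and the optimizer in the paper may differ.

```python
import numpy as np

def masked_feature_select(X, Y, G, lam=0.1, lr=0.01, epochs=200, eps=1e-8):
    """Minimize || G * (X @ W - Y) ||_F^2 + lam * ||W||_{2,1} by gradient
    descent, where G is a 0/1 indicator matrix that zeroes out unobserved
    entries of Y so they never enter the numerical computation."""
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    Yf = np.nan_to_num(Y)  # replace NaNs so masked arithmetic stays finite
    for _ in range(epochs):
        R = G * (X @ W - Yf)  # masked residual: unobserved entries stay zero
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        # Gradient of the data term plus the L2,1 subgradient W / ||w_i||.
        grad = 2 * X.T @ R + lam * W / row_norms[:, None]
        W -= lr * grad
    return np.argsort(-row_norms)  # rank features by ||w_i||_2
```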


Author(s):  
S. Vijaya Rani ◽  
G. N. K Suresh Babu

It is a big challenge to safeguard a network and its data against the various threats and attacks in a network system. An intrusion detection system is an effective technique for addressing network security issues: it utilizes various network classifiers to detect malicious attacks. The data sets available for the study of intrusion detection systems include DARPA, KDD 1999 cup, NSL_KDD, DEFCON, and ISCX-UNB; the KDD 1999 cup data set is the best-known and oldest data set for research on intrusion detection. The data is preprocessed, normalized, and trained with the BPN algorithm. The normalized data is further discretized using entropy discretization, and feature selection is carried out with quick reduct methods. After feature selection, the relevant features from the normalized data are processed through the BPN for better accuracy and efficiency of the system.
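Quick reduct is a rough-set method that greedily adds the feature giving the largest gain in dependency degree (the fraction of samples whose equivalence class under the chosen features is pure in the decision attribute). The sketch below is a textbook version on already-discretized data, assumed to match the method named in the abstract.

```python
import numpy as np
from collections import defaultdict

def dependency_degree(X_sub, y):
    """Fraction of samples whose equivalence class (identical values on
    the chosen features) is pure in the decision attribute y."""
    groups = defaultdict(list)
    for i, row in enumerate(map(tuple, X_sub)):
        groups[row].append(i)
    pos = sum(len(idx) for idx in groups.values()
              if len(set(y[i] for i in idx)) == 1)
    return pos / len(y)

def quick_reduct(X, y):
    """Greedy QuickReduct on discretized data: add the feature with the
    best dependency gain until the full dependency is reached."""
    d = X.shape[1]
    reduct, best = [], 0.0
    full = dependency_degree(X, y)
    while best < full:
        gains = [(dependency_degree(X[:, reduct + [j]], y), j)
                 for j in range(d) if j not in reduct]
        score, j = max(gains)
        if score <= best:
            break  # no remaining feature improves the dependency
        reduct.append(j)
        best = score
    return reduct
```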


2015 ◽  
Vol 27 (11) ◽  
pp. 2411-2422 ◽  
Author(s):  
Charles K. Fisher ◽  
Pankaj Mehta

Identifying small subsets of features that are relevant for prediction and classification tasks is a central problem in machine learning and statistics. The feature selection task is especially important, and computationally difficult, for modern data sets where the number of features can be comparable to or even exceed the number of samples. Here, we show that feature selection with Bayesian inference takes a universal form and reduces to calculating the magnetizations of an Ising model under some mild conditions. Our results exploit the observation that the evidence takes a universal form for strongly regularizing priors—priors that have a large effect on the posterior probability even in the infinite data limit. We derive explicit expressions for feature selection for generalized linear models, a large class of statistical techniques that includes linear and logistic regression. We illustrate the power of our approach by analyzing feature selection in a logistic regression-based classifier trained to distinguish between the letters B and D in the notMNIST data set.
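The reduction described here ends in computing the magnetizations of an Ising model, and the simplest numerical route to those is naive mean-field iteration. The sketch below shows only that final step; the mapping of the regression evidence onto the fields h and couplings J follows the paper and is not reproduced, so h, J, and beta are inputs assumed given.

```python
import numpy as np

def mean_field_magnetizations(h, J, beta=1.0, n_iter=200):
    """Mean-field Ising solution via the self-consistency equations
    m_i = tanh(beta * (h_i + sum_j J_ij * m_j)).

    In the feature selection reading, h plays the role of per-feature
    evidence and J encodes feature correlations; features whose
    magnetization is near +1 are selected.
    """
    m = np.zeros_like(h)
    for _ in range(n_iter):
        m = np.tanh(beta * (h + J @ m))  # fixed-point iteration
    return m
```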

