Multi-label feature selection based on logistic regression and manifold learning

Author(s):  
Yao Zhang ◽  
Yingcang Ma ◽  
Xiaofei Yang

Like traditional single-label learning, multi-label learning also faces the curse of dimensionality. Feature selection is an effective technique for reducing the dimensionality of high-dimensional data and improving learning efficiency. In this paper, logistic regression, manifold learning, and sparse regularization are combined into a joint framework for multi-label feature selection (LMFS). First, the sparsity of the feature weight matrix is constrained by the $L_{2,1}$-norm. Second, feature-manifold and label-manifold regularization constrain the feature weight matrix so that it better fits the data information and label information. An iterative updating algorithm is designed and its convergence is proved. Finally, the LMFS algorithm is compared with DRMFS, SCLS, and other algorithms on eight classical multi-label data sets. The experimental results show the effectiveness of the LMFS algorithm.
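The $L_{2,1}$-norm term in objectives of this kind is commonly handled by iterative reweighting. Below is a minimal sketch of that generic scheme with a least-squares data term (the manifold regularizers are omitted for brevity); the function name, parameters, and solver are illustrative assumptions, not the authors' LMFS updates.

```python
import numpy as np

def l21_feature_select(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """Iteratively reweighted solver for min ||XW - Y||_F^2 + lam*||W||_{2,1}.

    A generic sketch of L2,1-regularized feature selection: rows of W
    with large L2 norm mark informative features.
    """
    n, d = X.shape
    D = np.eye(d)
    for _ in range(n_iter):
        # Closed-form update given the current diagonal reweighting matrix D.
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        # d_ii = 1 / (2 * ||w_i||_2): the standard L2,1 surrogate.
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
    return np.argsort(-row_norms)  # features ranked by importance
```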

Genes ◽  
2020 ◽  
Vol 11 (7) ◽  
pp. 717
Author(s):  
Garba Abdulrauf Sharifai ◽  
Zurinahni Zainol

Training a machine learning algorithm on an imbalanced data set is an inherently challenging task. It becomes more demanding with limited samples but a massive number of features (high dimensionality). High-dimensional and imbalanced data sets pose severe challenges in many real-world applications, such as biomedical data sets. Numerous researchers have investigated either imbalanced classes or high-dimensional data sets and proposed various methods. Nonetheless, few approaches reported in the literature address the intersection of the high-dimensionality and class-imbalance problems, owing to their complicated interactions. Lately, feature selection has become a well-known technique for overcoming this problem by selecting discriminative features that represent the minority and majority classes. This paper proposes a new method called Robust Correlation-Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA); rCBR-BGOA employs an ensemble of multiple filters coupled with the correlation-based redundancy method to select optimal feature subsets. A binary grasshopper optimisation algorithm (BGOA) casts the feature selection process as an optimisation problem, searching for the best (near-optimal) combination of features from the majority and minority classes. The obtained results, supported by proper statistical analysis, indicate that rCBR-BGOA can improve the classification performance for high-dimensional and imbalanced data sets in terms of the G-mean and Area Under the Curve (AUC) performance metrics.
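The core of any such wrapper method is a binary subset mask scored by an imbalance-aware fitness. The sketch below uses a simple bit-flip hill climb as a stand-in for the grasshopper dynamics, and cross-validated G-mean as the fitness; the classifier choice and search strategy are assumptions for illustration, not the rCBR-BGOA procedure.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import recall_score, make_scorer

def g_mean(y_true, y_pred):
    # Geometric mean of per-class recall: robust to class imbalance.
    recalls = recall_score(y_true, y_pred, average=None)
    return np.prod(recalls) ** (1.0 / len(recalls))

def binary_subset_search(X, y, n_iter=100, seed=0):
    """Hill-climbing stand-in for BGOA: flip one feature bit at a time
    and keep the move if cross-validated G-mean does not decrease."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    mask = rng.random(d) < 0.5
    scorer = make_scorer(g_mean)

    def score(m):
        if not m.any():
            return 0.0
        return cross_val_score(KNeighborsClassifier(), X[:, m], y,
                               scoring=scorer, cv=3).mean()

    best = score(mask)
    for _ in range(n_iter):
        j = rng.integers(d)
        mask[j] = ~mask[j]
        s = score(mask)
        if s >= best:
            best = s
        else:
            mask[j] = ~mask[j]  # revert the flip
    return mask, best
```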


2013 ◽  
Vol 274 ◽  
pp. 161-164 ◽  
Author(s):  
Wei Pan ◽  
Pei Jun Ma ◽  
Xiao Hong Su

Feature selection is a preprocessing step in pattern analysis and machine learning. In this paper, we design an algorithm for feature subset selection. We present an L1-norm regularization technique for sparse feature weights. A margin loss is introduced to evaluate features, and gradient descent is employed to search for the optimal solution that maximizes the margin. The proposed technique is tested on UCI data sets. Compared with four margin-based loss functions for SVM, the proposed technique is effective and efficient.
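A standard way to combine a margin loss with L1 sparsity is proximal subgradient descent: take a hinge-loss gradient step, then soft-threshold the weights. The sketch below shows that generic recipe; the specific loss and step sizes are assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def l1_margin_feature_weights(X, y, lam=0.01, lr=0.1, epochs=100):
    """Margin-based feature weighting with L1 sparsity (generic hinge-loss
    variant). y must be in {-1, +1}; nonzero entries of the returned
    weight vector indicate selected features."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        # Subgradient of the hinge loss max(0, 1 - margin).
        active = margins < 1
        grad = -(y[active, None] * X[active]).sum(axis=0) / n
        w -= lr * grad
        # Soft-thresholding enforces L1 sparsity (proximal update).
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w
```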


Author(s):  
Srinivas Kolli et al.

Clustering is most complex in multi-/high-dimensional data because a subset of features must be selected from the overall features present in categorical data sources. Feature-subset selection is an aggressive approach to decreasing feature dimensionality in data mining and pattern identification; its main aim is to select optimal features and decrease redundancy. To handle redundant and irrelevant features in high-dimensional sample data, this document describes exploration based on feature selection with data granularity. We propose a Novel Granular Feature Multi-variant Clustering based Genetic Algorithm (NGFMCGA) model and evaluate its performance in this implementation. The model consists of two main phases: in the first phase, a graph-theoretic grouping procedure divides the features into different clusters; in the second phase, a strongly representative related feature is selected from each cluster with respect to matching a subset of features. The selected features are independent because they are drawn from different clusters, so the proposed clustering approach has a high probability of producing and preserving independent, useful features. Optimal feature-subset selection improves the accuracy of clustering and feature classification; the proposed approach achieves better accuracy with respect to optimal subset selection when applied to publicly available data sets and compared with traditional supervised evolutionary approaches.
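The two-phase structure (graph grouping, then one representative per cluster) can be sketched compactly: build a feature-similarity graph, take connected components as clusters, and keep the most target-relevant feature from each. The genetic-algorithm refinement is omitted, and the correlation threshold and relevance measure are assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def cluster_and_pick(X, y, corr_threshold=0.7):
    """Illustrative two-phase selection: graph-based feature clustering
    followed by per-cluster representative selection."""
    d = X.shape[1]
    C = np.abs(np.corrcoef(X, rowvar=False))          # feature-feature similarity
    A = csr_matrix((C > corr_threshold).astype(int))  # threshold into a graph
    n_clusters, labels = connected_components(A, directed=False)
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])
    # Keep the most target-relevant feature from each cluster.
    return [max(np.where(labels == k)[0], key=lambda j: relevance[j])
            for k in range(n_clusters)]
```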


Author(s):  
Wei Zheng ◽  
Xiaofeng Zhu ◽  
Yonghua Zhu ◽  
Shichao Zhang

Feature selection is an indispensable preprocessing procedure for high-dimensional data analysis, but previous feature selection methods usually ignore sample diversity (i.e., every sample contributes individually to model construction) and have limited ability to deal with incomplete data sets in which some training samples have unobserved data. To address these issues, in this paper, we first propose a robust feature selection framework to relieve the influence of outliers, and then introduce an indicator matrix that prevents unobserved data from taking part in the numerical computation of feature selection, so that both our proposed framework and existing feature selection frameworks can conduct feature selection on incomplete data sets. We further propose a new optimization algorithm to optimize the resulting objective function and prove that it converges fast. Experimental results on both real and artificial incomplete data sets demonstrate that our proposed method outperforms the feature selection methods under comparison in terms of clustering performance.
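The indicator-matrix idea can be sketched as an element-wise 0/1 mask on the residual, so unobserved entries contribute nothing to the objective or its gradient. The sketch below applies the mask on the target side and uses a plain gradient solver with $L_{2,1}$ sparsity; the exact placement of the indicator matrix and the optimizer in the paper may differ.

```python
import numpy as np

def masked_feature_select(X, Y, G, lam=0.1, lr=0.01, epochs=200, eps=1e-8):
    """Minimize || G * (X @ W - Y) ||_F^2 + lam * ||W||_{2,1} by gradient
    descent, where G is a 0/1 indicator matrix that zeroes out unobserved
    entries of Y so they never enter the numerical computation."""
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    Yf = np.nan_to_num(Y)  # replace NaNs so masked arithmetic stays finite
    for _ in range(epochs):
        R = G * (X @ W - Yf)  # masked residual: unobserved entries stay zero
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        # Gradient of the data term plus the L2,1 subgradient W / ||w_i||.
        grad = 2 * X.T @ R + lam * W / row_norms[:, None]
        W -= lr * grad
    return np.argsort(-row_norms)  # rank features by ||w_i||_2
```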


Author(s):  
S. Vijaya Rani ◽  
G. N. K Suresh Babu

It is a big challenge to safeguard a network and its data against the various threats and attacks in a network system. An intrusion detection system is an effective technique for addressing network security issues: it utilizes various network classifiers to detect malicious attacks. The data sets available for the study of intrusion detection systems include DARPA, KDD 1999 cup, NSL_KDD, DEFCON, and ISCX-UNB; the KDD 1999 cup data set is the best-known and oldest data set for research on intrusion detection. The data is preprocessed, normalized, and trained with the BPN algorithm. The normalized data is further discretized using entropy discretization, and feature selection is carried out with quick reduct methods. After feature selection, the relevant features from the normalized data are processed through the BPN for better accuracy and efficiency of the system.
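Quick reduct is a rough-set method that greedily adds the feature giving the largest gain in dependency degree (the fraction of samples whose equivalence class under the chosen features is pure in the decision attribute). The sketch below is a textbook version on already-discretized data, assumed to match the method named in the abstract.

```python
import numpy as np
from collections import defaultdict

def dependency_degree(X_sub, y):
    """Fraction of samples whose equivalence class (identical values on
    the chosen features) is pure in the decision attribute y."""
    groups = defaultdict(list)
    for i, row in enumerate(map(tuple, X_sub)):
        groups[row].append(i)
    pos = sum(len(idx) for idx in groups.values()
              if len(set(y[i] for i in idx)) == 1)
    return pos / len(y)

def quick_reduct(X, y):
    """Greedy QuickReduct on discretized data: add the feature with the
    best dependency gain until the full dependency is reached."""
    d = X.shape[1]
    reduct, best = [], 0.0
    full = dependency_degree(X, y)
    while best < full:
        gains = [(dependency_degree(X[:, reduct + [j]], y), j)
                 for j in range(d) if j not in reduct]
        score, j = max(gains)
        if score <= best:
            break  # no remaining feature improves the dependency
        reduct.append(j)
        best = score
    return reduct
```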


2015 ◽  
Vol 27 (11) ◽  
pp. 2411-2422 ◽  
Author(s):  
Charles K. Fisher ◽  
Pankaj Mehta

Identifying small subsets of features that are relevant for prediction and classification tasks is a central problem in machine learning and statistics. The feature selection task is especially important, and computationally difficult, for modern data sets where the number of features can be comparable to or even exceed the number of samples. Here, we show that feature selection with Bayesian inference takes a universal form and reduces to calculating the magnetizations of an Ising model under some mild conditions. Our results exploit the observation that the evidence takes a universal form for strongly regularizing priors—priors that have a large effect on the posterior probability even in the infinite data limit. We derive explicit expressions for feature selection for generalized linear models, a large class of statistical techniques that includes linear and logistic regression. We illustrate the power of our approach by analyzing feature selection in a logistic regression-based classifier trained to distinguish between the letters B and D in the notMNIST data set.
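The reduction described here ends in computing the magnetizations of an Ising model, and the simplest numerical route to those is naive mean-field iteration. The sketch below shows only that final step; the mapping of the regression evidence onto the fields h and couplings J follows the paper and is not reproduced, so h, J, and beta are inputs assumed given.

```python
import numpy as np

def mean_field_magnetizations(h, J, beta=1.0, n_iter=200):
    """Mean-field Ising solution via the self-consistency equations
    m_i = tanh(beta * (h_i + sum_j J_ij * m_j)).

    In the feature selection reading, h plays the role of per-feature
    evidence and J encodes feature correlations; features whose
    magnetization is near +1 are selected.
    """
    m = np.zeros_like(h)
    for _ in range(n_iter):
        m = np.tanh(beta * (h + J @ m))  # fixed-point iteration
    return m
```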

