Assessing feature selection method performance with class imbalance data

It is significant but challenging to explore a subset of robust biomarkers to distinguish cancer from normal samples on high-dimensional imbalanced cancer biological omics data. Although many feature selection methods addressing high dimensionality and class imbalance have been proposed, they rarely pay attention to the fact that most classes will dominate the final decision-making when the dataset is imbalanced, leading to instability when it expands downstream tasks. Because of causality invariance, causal relationship inference is considered an effective way to improve machine learning performance and stability. This paper proposes a Causality-inspired Least Angle Nonlinear Distributed (CLAND) feature selection method, consisting of two branches with a class-wised branch and a sample-wised branch representing two deconfounder strategies, respectively. We compared the performance of CLAND with other advanced feature selection methods in transcriptional data of six cancer types with different imbalance ratios. The genes selected by CLAND have superior accuracy, stability, and generalization in the downstream classification tasks, indicating potential causality for identifying cancer samples. Furthermore, these genes have also been demonstrated to play an essential role in cancer initiation and progression through reviewing the literature.

Download Full-text

Feature Selection Method from Multiclass Text with Class Imbalance Problem

Journal of the Korean Institute of Industrial Engineers ◽

10.7232/jkiie.2019.45.2.093 ◽

2019 ◽

Vol 45 (2) ◽

pp. 93-100

Author(s):

Minji Seo ◽

Gilseung Ahn ◽

Sun Hur

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Class Imbalance ◽

Selection Method ◽

Class Imbalance Problem ◽

Imbalance Problem

Download Full-text

Feature Selection Method Based on Weighted Mutual Information for Imbalanced Data

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194018500341 ◽

2018 ◽

Vol 28 (08) ◽

pp. 1177-1194 ◽

Cited By ~ 2

Author(s):

Kewen Li ◽

Mingxiao Yu ◽

Lu Liu ◽

Timing Li ◽

Jiannan Zhai

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Clustering Algorithm ◽

Feature Selection Method ◽

Class Imbalance ◽

Imbalanced Data ◽

Selection Method ◽

Class Imbalance Problem ◽

Minority Class ◽

Imbalance Problem

The class imbalance problem has negative effects on the performance of feature selection in imbalanced data. Traditional feature selection algorithms always study on the balanced class distribution of the data and improve the overall classification accuracy for the optimization goal, which tends to be overwhelmed by the large classes, ignoring the small ones. This paper proposes a novel feature selection method based on the weighted mutual information (WMI) for the imbalanced data, defined as WMI algorithm. The WMI algorithm assigns different weights to the samples based on the fuzzy c-means (FCM) clustering algorithm and then calculates the mutual information based on the weight of each sample. This paper used the AUC as the evaluation criterion of the selected feature. At last, four unbalanced datasets from NASA software defect datasets are used to validate the proposed approach. Experimental results show that the proposed method achieves higher prediction accuracy of both minority class and majority class.

Download Full-text

A Proposal of New Feature Selection Method Sensitive to Outliers and Correlation

10.1101/2021.03.11.434934 ◽

2021 ◽

Author(s):

Mert Demirarslan ◽

Aslı Suner

Keyword(s):

Feature Selection ◽

Ensemble Learning ◽

Learning Algorithms ◽

Feature Selection Method ◽

Class Imbalance ◽

Disease Diagnosis ◽

Selection Method ◽

T Score ◽

Score Method ◽

New Feature

AbstractIn disease diagnosis classification, ensemble learning algorithms enable strong and successful models by training more than one learning function simultaneously. This study aimed to eliminate the irrelevant variable problem with the proposed new feature selection method and compare the ensemble learning algorithms’ classification performances after eliminating the problems such as missing observation, classroom noise, and class imbalance that may occur in the disease diagnosis data. According to the findings obtained; In the preprocessed data, it was seen that the classification performance of the algorithms was higher than the raw version of the data. When the algorithms’ classification performances for the new proposed advanced t-Score and the old t-Score method were compared, the feature selection made with the proposed method showed statistically higher performance in all data sets and all algorithms compared to the old t-Score method (p = 0.0001).

Download Full-text

Improvement of feature selection method in spam filtering

Journal of Computer Applications ◽

10.3724/sp.j.1087.2009.02812 ◽

2009 ◽

Vol 29 (10) ◽

pp. 2812-2815

Author(s):

Yang-zhu LU ◽

Xin-you ZHANG ◽

Yu QI

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Spam Filtering

Download Full-text

Feature Selection for Histopathological Image Classification using levy Flight Salp Swarm Optimizer

Recent Patents on Computer Science ◽

10.2174/2213275912666181210165129 ◽

2019 ◽

Vol 12 (4) ◽

pp. 329-337 ◽

Cited By ~ 2

Author(s):

Venubabu Rachapudi ◽

Golagani Lavanya Devi

Keyword(s):

Feature Selection ◽

Image Classification ◽

Feature Selection Method ◽

Selection Method ◽

Lévy Flight ◽

Levy Flight ◽

Local Optima ◽

Histopathological Image ◽

Surf Features ◽

Histopathological Image Classification

Background: An efficient feature selection method for Histopathological image classification plays an important role to eliminate irrelevant and redundant features. Therefore, this paper proposes a new levy flight salp swarm optimizer based feature selection method. Methods: The proposed levy flight salp swarm optimizer based feature selection method uses the levy flight steps for each follower salp to deviate them from local optima. The best solution returns the relevant and non-redundant features, which are fed to different classifiers for efficient and robust image classification. Results: The efficiency of the proposed levy flight salp swarm optimizer has been verified on 20 benchmark functions. The anticipated scheme beats the other considered meta-heuristic approaches. Furthermore, the anticipated feature selection method has shown better reduction in SURF features than other considered methods and performed well for histopathological image classification. Conclusion: This paper proposes an efficient levy flight salp Swarm Optimizer by modifying the step size of follower salp. The proposed modification reduces the chances of sticking into local optima. Furthermore, levy flight salp Swarm Optimizer has been utilized in the selection of optimum features from SURF features for the histopathological image classification. The simulation results validate that proposed method provides optimal values and high classification performance in comparison to other methods.

Download Full-text

The Effectiveness of the Fused Weighted Filter Feature Selection Method to Improve Software Fault Prediction

Journal of Communications Technology Electronics and Computer Science ◽

10.22385/jctecs.v8i0.96 ◽

2016 ◽

Vol 8 ◽

pp. 5 ◽

Cited By ~ 1

Author(s):

Fatemeh Alighardashi ◽

Mohammad Ali Zare Chahooki

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Algorithms ◽

Fault Prediction ◽

Filter Method ◽

Selection Methods ◽

Software Projects ◽

Software Fault Prediction ◽

Software Fault

Improving the software product quality before releasing by periodic tests is one of the most expensive activities in software projects. Due to limited resources to modules test in software projects, it is important to identify fault-prone modules and use the test sources for fault prediction in these modules. Software fault predictors based on machine learning algorithms, are effective tools for identifying fault-prone modules. Extensive studies are being done in this field to find the connection between features of software modules, and their fault-prone. Some of features in predictive algorithms are ineffective and reduce the accuracy of prediction process. So, feature selection methods to increase performance of prediction models in fault-prone modules are widely used. In this study, we proposed a feature selection method for effective selection of features, by using combination of filter feature selection methods. In the proposed filter method, the combination of several filter feature selection methods presented as fused weighed filter method. Then, the proposed method caused convergence rate of feature selection as well as the accuracy improvement. The obtained results on NASA and PROMISE with ten datasets, indicates the effectiveness of proposed method in improvement of accuracy and convergence of software fault prediction.

Download Full-text

A lazy feature selection method for multi-label classification

Intelligent Data Analysis ◽

10.3233/ida-194878 ◽

2021 ◽

Vol 25 (1) ◽

pp. 21-34

Author(s):

Rafael B. Pereira ◽

Alexandre Plastino ◽

Bianca Zadrozny ◽

Luiz H.C. Merschmann

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Feature Selection Method ◽

Selection Method ◽

Video Classification ◽

Classification Problems ◽

Class Label ◽

New Feature ◽

Feature Selection Techniques ◽

Biomolecular Analysis

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to allow the identification of relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and specific for the multi-label context. Experimental results show that the proposed technique is competitive when compared to multi-label feature selection techniques currently used in the literature, and is clearly more scalable, in a scenario where there is an increasing amount of data.

Download Full-text

A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-190134 ◽

2021 ◽

Vol 24 (4) ◽

pp. 289-301

Author(s):

B. Venkatesh ◽

J. Anuradha

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Classification Accuracy ◽

Performance Metrics ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Binary Particle Swarm Optimization ◽

Selection Methods

In Microarray Data, it is complicated to achieve more classification accuracy due to the presence of high dimensions, irrelevant and noisy data. And also It had more gene expression data and fewer samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features need to extract, this can be achieved by applying the feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, filter and wrapper phase in filter phase ensemble technique is used for aggregating the feature ranks of the Relief, minimum redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses the Fuzzy Gaussian membership function ordering for aggregating the ranks. In wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used for selecting the optimal features, and the RBF Kernel-based Support Vector Machine (SVM) classifier is used as an evaluator. The performance of the proposed model are compared with state of art feature selection methods using five benchmark datasets. For evaluation various performance metrics such as Accuracy, Recall, Precision, and F1-Score are used. Furthermore, the experimental results show that the performance of the proposed method outperforms the other feature selection methods.

Download Full-text