A graph-based feature selection method for learning to rank using spectral clustering for redundancy minimization and biased PageRank for relevance analysis

This paper addresses the feature selection problem in learning to rank (LTR). We propose a graph-based feature selection method, named FS-SCPR, which comprises four steps: (i) use ranking information to assess the similarity between features and construct an undirected feature similarity graph; (ii) apply spectral clustering to cluster features using eigenvectors of matrices extracted from the graph; (iii) use biased PageRank to assign each feature a relevance score with respect to the ranking problem, incorporating each feature's ranking performance as a preference that biases the PageRank computation; and (iv) apply optimization to select, from each cluster, the feature that has the highest relevance score and carries the most information about the features in that cluster. We also develop a new LTR approach for information retrieval (IR) that first uses FS-SCPR as a preprocessor to identify discriminative and useful features and then employs Ranking SVM to derive a ranking model from the selected features. An evaluation on the LETOR benchmark datasets demonstrated that our approach performs competitively against representative feature selection methods and state-of-the-art LTR methods.
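
Steps (iii) and (iv) can be sketched in plain Python. This is a hypothetical illustration, not the authors' code: it assumes the similarity graph is given as a symmetric weight matrix, uses each feature's stand-alone ranking score (for example, an NDCG value) as the PageRank preference vector, takes cluster labels from an already completed spectral clustering step, and simplifies step (iv) to keeping the feature with the highest biased PageRank score in each cluster (the paper's optimization also weighs how much cluster information a feature carries). The function names and toy numbers are made up.

```python
def biased_pagerank(weights, preference, damping=0.85, tol=1e-10, max_iter=1000):
    """Personalized (biased) PageRank by power iteration on an undirected weighted graph.

    weights[i][j]: similarity between features i and j (symmetric, non-negative).
    preference[i]: non-negative stand-alone ranking score of feature i, normalized
    here into the teleport distribution that biases the walk (an assumption).
    """
    n = len(weights)
    total = sum(preference)
    p = [x / total for x in preference]
    degree = [sum(row) for row in weights]
    r = p[:]
    for _ in range(max_iter):
        new_r = []
        for i in range(n):
            # Mass arriving at i: each neighbour j splits its score by edge weight.
            inflow = sum(weights[j][i] / degree[j] * r[j] for j in range(n) if degree[j] > 0)
            new_r.append((1 - damping) * p[i] + damping * inflow)
        # Isolated features (degree 0) would leak mass; return it via the preference vector.
        leaked = 1.0 - sum(new_r)
        new_r = [x + leaked * p[i] for i, x in enumerate(new_r)]
        if sum(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


def select_per_cluster(scores, clusters):
    """Simplified step (iv): keep the highest-scoring feature index in each cluster."""
    best = {}
    for idx, label in enumerate(clusters):
        if label not in best or scores[idx] > scores[best[label]]:
            best[label] = idx
    return sorted(best.values())


# Hypothetical toy data: features 0/1 are near-duplicates, as are features 2/3.
W = [[0.0, 0.9, 0.1, 0.1],
     [0.9, 0.0, 0.1, 0.1],
     [0.1, 0.1, 0.0, 0.8],
     [0.1, 0.1, 0.8, 0.0]]
ranking_scores = [0.40, 0.45, 0.30, 0.20]  # e.g. per-feature NDCG (made up)
scores = biased_pagerank(W, ranking_scores)
print(select_per_cluster(scores, clusters=[0, 0, 1, 1]))  # [1, 2]
```

Because the preference vector enters as the teleport distribution, a feature that ranks well on its own is favored over an equally connected neighbour, which is why feature 1 wins its cluster over feature 0 here.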

A New Feature Selection Method Based on a Self-Variant Genetic Algorithm Applied to Android Malware Detection

Symmetry ◽

10.3390/sym13071290 ◽

2021 ◽

Vol 13 (7) ◽

pp. 1290

Author(s):

Le Wang ◽

Yuelin Gao ◽

Shanshan Gao ◽

Xin Yong

Keyword(s):

Feature Selection ◽

Population Size ◽

Feature Selection Method ◽

Classification Problem ◽

Selection Method ◽

Classification Problems ◽

Feature Selection Problem ◽

Android Malware ◽

Android Malware Detection ◽

Mutation Operators

In solving classification problems in machine learning and pattern recognition, data preprocessing is particularly important. High-dimensional feature datasets increase the time and space complexity of processing and reduce the accuracy of classification models, so a good feature selection method is essential. This paper presents a new feature selection algorithm that retains the selection and mutation operators of traditional genetic algorithms. On the one hand, the algorithm's global search capability is ensured by changing the population size; on the other hand, the optimal mutation probability for the feature selection problem is determined for different population sizes. During the iterations of the algorithm, the population size remains equal to the initial population size no matter how many transformations are made; this spatial invariance is physically defined as symmetry. The proposed method is compared with other algorithms and validated on different datasets. The experimental results show that the algorithm performs well, and applying it to a practical Android software classification problem also demonstrates its superiority.
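
The selection-plus-mutation loop described above can be illustrated with a generic genetic algorithm over bit-mask chromosomes. This is a hypothetical sketch, not the authors' self-variant algorithm: the fitness function, binary tournament selection, elitism, and the fixed mutation_rate are all assumptions, and the paper's scheme for varying the population size and tuning the mutation probability is not reproduced. The population size does stay constant across generations, matching the invariance the abstract describes.

```python
import random


def ga_feature_selection(fitness, n_features, pop_size=20, generations=50,
                         mutation_rate=0.05, seed=0):
    """Hypothetical GA with selection + bit-flip mutation only (no crossover).

    fitness(mask) -> float, higher is better; mask[i] is True if feature i is kept.
    Returns the best mask, its fitness, and the best fitness per generation.
    """
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = []
    for _ in range(generations):
        # Binary tournament selection; the population size never changes.
        parents = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            parents.append(a if fitness(a) >= fitness(b) else b)
        # Bit-flip mutation: each gene flips independently with mutation_rate.
        pop = [[(not g) if rng.random() < mutation_rate else g for g in ind]
               for ind in parents]
        # Elitism (an assumption): carry the best-so-far mask into the new population.
        pop[0] = best[:]
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, fitness(best), history


# Hypothetical fitness: reward features 0, 2 and 5, lightly penalize subset size.
useful = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(1 for i, keep in enumerate(mask) if keep and i in useful)
    return hits - 0.1 * sum(mask)


mask, score, history = ga_feature_selection(toy_fitness, n_features=8)
print(score, [i for i, keep in enumerate(mask) if keep])
```

In a real wrapper setting, toy_fitness would be replaced by a classifier's validation accuracy on the selected features; elitism is included so the best fitness per generation never decreases.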

Fuzzy Rank Based Parallel Online Feature Selection Method using Multiple Sliding Windows

Open Computer Science ◽

10.1515/comp-2020-0169 ◽

2021 ◽

Vol 11 (1) ◽

pp. 275-287

Author(s):

B. Venkatesh ◽

J. Anuradha

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Streaming Data ◽

Selection Methods ◽

Sliding Windows ◽

Real World Applications ◽

Benchmark Datasets ◽

Online Feature Selection ◽

Online Streaming

Nowadays, in real-world applications, data dimensions are generated dynamically, and traditional batch feature selection methods are not suitable for streaming data. Online streaming feature selection methods have therefore gained more attention, but existing methods have drawbacks such as low classification accuracy, failure to remove redundant and irrelevant features, and a large number of selected features. In this paper, we propose a parallel online feature selection method that uses multiple sliding windows and fuzzy fast-mRMR feature selection analysis to select minimally redundant and maximally relevant features, overcoming these drawbacks of existing online streaming feature selection methods. Parallel processing is used to increase the speed of the proposed method. To evaluate it, k-NN, SVM, and Decision Tree classifiers are used, and the method is compared against state-of-the-art online feature selection methods. Performance is analyzed on benchmark datasets using accuracy, precision, recall, and F1-score. The experimental analysis shows that the proposed method achieves more than 95% accuracy on most of the datasets, outperforms other existing online streaming feature selection methods, and overcomes their drawbacks.
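
The minimum-redundancy, maximum-relevance criterion and the per-window parallelism can be sketched with crisp (non-fuzzy) mutual information. This is hypothetical, not the authors' fuzzy fast-mRMR: it assumes discrete feature values, treats the feature stream as consecutive windows of a fixed number of features, runs greedy mRMR (relevance minus mean redundancy) on each window in a thread pool, and simply merges the per-window picks.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2


def mutual_info(xs, ys):
    """Mutual information (bits) between two discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr(columns, labels, k):
    """Greedy mRMR: maximize I(f; c) minus the mean I(f; s) over already-selected s."""
    relevance = [mutual_info(col, labels) for col in columns]
    selected = [max(range(len(columns)), key=lambda f: relevance[f])]
    while len(selected) < min(k, len(columns)):
        def score(f):
            redundancy = sum(mutual_info(columns[f], columns[s]) for s in selected)
            return relevance[f] - redundancy / len(selected)
        candidates = [f for f in range(len(columns)) if f not in selected]
        selected.append(max(candidates, key=score))
    return selected


def windowed_mrmr(columns, labels, window, k):
    """Hypothetical: run mRMR on each window of features in parallel; return global indices."""
    def run(start):
        chunk = columns[start:start + window]
        return [start + f for f in mrmr(chunk, labels, k)]
    with ThreadPoolExecutor() as pool:
        picks = pool.map(run, range(0, len(columns), window))
        return sorted(f for chosen in picks for f in chosen)


labels = [0, 1, 2, 3, 0, 1, 2, 3]
x = [0, 0, 1, 1, 0, 0, 1, 1]  # high bit of the label
y = [0, 1, 0, 1, 0, 1, 0, 1]  # low bit of the label
stream = [x, x[:], y, x, y, y[:]]  # two windows of three features each
print(windowed_mrmr(stream, labels, window=3, k=2))  # [0, 2, 3, 4]
```

In the first window the duplicate of x scores zero (its redundancy cancels its relevance), so the independent feature y is chosen instead. Note that this merge step does not remove redundancy across windows, which a fuller method would have to handle.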

Graph-based Feature Selection Method for Learning to Rank

2020 the 6th International Conference on Communication and Information Processing ◽

10.1145/3442555.3442567 ◽

2020 ◽

Author(s):

Jen-Yuan Yeh ◽

Cheng-Jung Tsai

Keyword(s):

Feature Selection ◽

Learning To Rank ◽

Feature Selection Method ◽

Selection Method

Best Features Selection for Biomedical Data Classification Using Seven Spot Ladybird Optimization Algorithm

Cognitive Analytics ◽

10.4018/978-1-7998-2460-2.ch021 ◽

2020 ◽

pp. 407-421

Author(s):

Noria Bidi ◽

Zakaria Elberrichi

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Adaptive Algorithm ◽

Classification Accuracy ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Biomedical Data ◽

Features Selection ◽

Benchmark Datasets

This article presents a new adaptive algorithm called FS-SLOA (Feature Selection-Seven Spot Ladybird Optimization Algorithm), a metaheuristic feature selection method based on the foraging behavior of the seven-spot ladybird. The new, efficient technique is applied to find the best feature subset, the one that achieves the highest classification accuracy, using three classifiers: Naive Bayes (NB), k-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The authors' approach was evaluated on four well-known benchmark datasets from the UCI machine learning repository (Wisconsin Breast Cancer, Pima Diabetes, Mammographic Mass, and Dermatology). Experimental results show that FS-SLOA achieves the best classification accuracy across these datasets.
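
In a wrapper method such as FS-SLOA, each candidate subset proposed by the metaheuristic is scored by a classifier. The sketch below shows only that evaluation step, using leave-one-out 1-nearest-neighbour accuracy as a stand-in for the NB, KNN, and SVM classifiers named above; the ladybird search itself is not reproduced, and the function name and toy data are hypothetical.

```python
import math


def loo_1nn_accuracy(X, y, subset):
    """Hypothetical wrapper fitness: leave-one-out 1-NN accuracy on the features in `subset`."""
    if not subset:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        best_j, best_d = None, math.inf
        for j, other in enumerate(X):
            if j == i:
                continue  # leave the query point out
            d = sum((row[f] - other[f]) ** 2 for f in subset)  # squared Euclidean distance
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 0.0], [1.0, 5.0], [1.1, 0.0]]
y = [0, 0, 1, 1]
print(loo_1nn_accuracy(X, y, [0]), loo_1nn_accuracy(X, y, [1]))  # 1.0 0.0
```

A search algorithm would call this function once per candidate subset; many wrapper methods also add a small penalty on subset size so that equally accurate but smaller subsets are preferred.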

Best Features Selection for Biomedical Data Classification Using Seven Spot Ladybird Optimization Algorithm

International Journal of Applied Metaheuristic Computing ◽

10.4018/ijamc.2018070104 ◽

2018 ◽

Vol 9 (3) ◽

pp. 75-87 ◽

Cited By ~ 1

Author(s):

Noria Bidi ◽

Zakaria Elberrichi

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Feature Selection Method ◽

Nearest Neighbors ◽

Selection Method ◽

Support Vector ◽

Biomedical Data ◽

Features Selection ◽

Benchmark Datasets ◽

Selection For

The pertinent single-attribute-based classifier for small datasets classification

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i3.pp3227-3234 ◽

2020 ◽

Vol 10 (3) ◽

pp. 3227-3234

Author(s):

Mona Jamjoom

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Information Gain ◽

Selection Criterion ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Algorithms ◽

Benchmark Datasets ◽

Single Attribute ◽

Ratio Measure

Classifying a small dataset with machine learning algorithms can be a major challenge. The OneR classifier can be used in such cases because of its simplicity and efficiency. In this paper, we demonstrate the power of a single attribute by introducing the pertinent single-attribute-based heterogeneity-ratio classifier (SAB-HR), which uses one pertinent attribute to classify small datasets. SAB-HR's feature selection method uses the Heterogeneity-Ratio (H-Ratio) measure to identify the most homogeneous attribute in the set. Our empirical results on 12 benchmark datasets from the UCI machine learning repository showed that the SAB-HR classifier significantly outperformed the classical OneR classifier on small datasets. In addition, the H-Ratio was a more effective criterion for selecting the single attribute than traditional criteria such as Information Gain (IG) and Gain Ratio (GR).
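
The OneR baseline that SAB-HR is compared against picks the single attribute whose one-level rule (each attribute value mapped to its majority class) makes the fewest training errors. A minimal sketch of OneR follows; the H-Ratio measure is not defined in this abstract, so it is not implemented here, and SAB-HR would replace the error-count criterion at the commented line with its own attribute score. Names and data are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """OneR: return (attribute index, training errors, value -> class rule) for the best attribute."""
    best = None
    for a in range(len(rows[0])):
        counts = defaultdict(Counter)  # attribute value -> class counts
        for row, label in zip(rows, labels):
            counts[row[a]][label] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        # OneR criterion: fewest errors. SAB-HR would rank attributes by H-Ratio here instead.
        if best is None or errors < best[1]:
            best = (a, errors, rule)
    return best


rows = [("sunny", "hot"), ("sunny", "cool"), ("rain", "hot"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
print(one_r(rows, labels))  # (0, 0, {'sunny': 'no', 'rain': 'yes'})
```

Only the attribute-scoring line distinguishes the two classifiers in this framing: both build a one-attribute rule, but they disagree on which attribute is most pertinent.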

A feature selection method based on minimum redundancy maximum relevance for learning to rank

2015 AI & Robotics (IRANOPEN) ◽

10.1109/rios.2015.7270735 ◽

2015 ◽

Cited By ~ 2

Author(s):

Mehrnoush Barani Shirzad ◽

Mohammad Reza Keyvanpour

Keyword(s):

Feature Selection ◽

Learning To Rank ◽

Feature Selection Method ◽

Selection Method ◽

Minimum Redundancy Maximum Relevance

Improvement of feature selection method in spam filtering

Journal of Computer Applications ◽

10.3724/sp.j.1087.2009.02812 ◽

2009 ◽

Vol 29 (10) ◽

pp. 2812-2815

Author(s):

Yang-zhu LU ◽

Xin-you ZHANG ◽

Yu QI

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Spam Filtering

Feature Selection for Histopathological Image Classification using levy Flight Salp Swarm Optimizer

Recent Patents on Computer Science ◽

10.2174/2213275912666181210165129 ◽

2019 ◽

Vol 12 (4) ◽

pp. 329-337 ◽

Cited By ~ 2

Author(s):

Venubabu Rachapudi ◽

Golagani Lavanya Devi

Keyword(s):

Feature Selection ◽

Image Classification ◽

Feature Selection Method ◽

Selection Method ◽

Lévy Flight ◽

Levy Flight ◽

Local Optima ◽

Histopathological Image ◽

Surf Features ◽

Histopathological Image Classification

Background: An efficient feature selection method plays an important role in histopathological image classification by eliminating irrelevant and redundant features. This paper therefore proposes a new feature selection method based on a Lévy flight salp swarm optimizer. Methods: The proposed method applies Lévy flight steps to each follower salp to move it away from local optima. The best solution yields the relevant, non-redundant features, which are fed to different classifiers for efficient and robust image classification. Results: The efficiency of the proposed Lévy flight salp swarm optimizer was verified on 20 benchmark functions, where it outperformed the other meta-heuristic approaches considered. Furthermore, the proposed feature selection method achieved a greater reduction of SURF features than the other methods considered and performed well for histopathological image classification. Conclusion: This paper proposes an efficient Lévy flight salp swarm optimizer that modifies the step size of the follower salps, reducing the chance of getting stuck in local optima. The optimizer is used to select optimal features from SURF features for histopathological image classification. The simulation results show that the proposed method provides optimal values and high classification performance compared with other methods.
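
The Lévy-flight perturbation of follower salps can be sketched as below. Step lengths use Mantegna's algorithm, a common way to draw Lévy-distributed steps; whether the paper uses this generator, as well as the scale alpha, the clipping to [0, 1], and the 0.5 threshold that turns a position into a feature mask, are all assumptions. The leader update of the salp swarm algorithm is omitted.

```python
import math
import random


def mantegna_sigma(beta):
    """Scale of the numerator Gaussian in Mantegna's Lévy-step algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(rng, beta=1.5):
    """One Lévy-distributed step: u / |v|^(1/beta), u ~ N(0, sigma^2), v ~ N(0, 1)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def update_followers(positions, rng, alpha=0.01, beta=1.5):
    """Follower rule x_i = (x_i + x_{i-1}) / 2 plus a hypothetical Lévy-flight kick.

    positions[0] is the leader and is left unchanged in this sketch.
    """
    new = [list(positions[0])]
    for i in range(1, len(positions)):
        row = []
        for cur, prev in zip(positions[i], new[i - 1]):
            x = 0.5 * (cur + prev) + alpha * levy_step(rng, beta)
            row.append(min(1.0, max(0.0, x)))  # assumption: keep positions in [0, 1]
        new.append(row)
    return new


def to_mask(position, threshold=0.5):
    """Binarize a continuous position: feature j is selected if position[j] > threshold."""
    return [x > threshold for x in position]


rng = random.Random(42)
swarm = [[rng.random() for _ in range(6)] for _ in range(5)]  # 5 salps, 6 features
swarm = update_followers(swarm, rng)
print([to_mask(p) for p in swarm])
```

Lévy steps are mostly small with occasional long jumps, which is the property the abstract relies on to let followers escape local optima without destroying the averaging behavior of the chain.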

The Effectiveness of the Fused Weighted Filter Feature Selection Method to Improve Software Fault Prediction

Journal of Communications Technology Electronics and Computer Science ◽

10.22385/jctecs.v8i0.96 ◽

2016 ◽

Vol 8 ◽

pp. 5 ◽

Cited By ~ 1

Author(s):

Fatemeh Alighardashi ◽

Mohammad Ali Zare Chahooki

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Algorithms ◽

Fault Prediction ◽

Filter Method ◽

Selection Methods ◽

Software Projects ◽

Software Fault Prediction ◽

Software Fault

Improving software product quality through periodic testing before release is one of the most expensive activities in software projects. Because resources for testing modules are limited, it is important to identify fault-prone modules and focus testing resources on them. Software fault predictors based on machine learning algorithms are effective tools for identifying fault-prone modules. Extensive studies have investigated the relationship between the features of software modules and their fault-proneness. Some features used in predictive algorithms are ineffective and reduce prediction accuracy, so feature selection methods are widely used to improve the performance of fault prediction models. In this study, we propose a feature selection method that combines several filter feature selection methods into a fused weighted filter method. The proposed method improved both the convergence rate of feature selection and prediction accuracy. Results on ten datasets from NASA and PROMISE indicate that the proposed method improves the accuracy and convergence of software fault prediction.
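
A fused weighted filter can be sketched as a weighted sum of min-max-normalized scores from several filter methods. This is a hypothetical sketch: the abstract does not specify which filters are fused, how their weights are set, or how scores are normalized, so the inputs below (precomputed per-feature scores, fixed weights, min-max scaling) are assumptions.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fused_weighted_filter(score_lists, weights, k):
    """Hypothetical fusion: weighted sum of normalized filter scores, return top-k feature indices."""
    normalized = [min_max(s) for s in score_lists]
    n = len(score_lists[0])
    fused = [sum(w * norm[f] for w, norm in zip(weights, normalized)) for f in range(n)]
    return sorted(range(n), key=lambda f: fused[f], reverse=True)[:k]


# Hypothetical scores for 3 features from two filters (e.g. information gain and chi-squared).
info_gain = [0.1, 0.5, 0.9]
chi2 = [3.0, 1.0, 2.0]
print(fused_weighted_filter([info_gain, chi2], weights=[0.5, 0.5], k=2))  # [2, 0]
```

Normalizing first matters because filter scores live on different scales (a chi-squared statistic can dwarf an information-gain value), so without it the weights would not reflect each filter's intended influence.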
