sequential feature selection
Recently Published Documents


TOTAL DOCUMENTS

44
(FIVE YEARS 19)

H-INDEX

8
(FIVE YEARS 2)

Author(s):  
Tsehay Admassu Assegie ◽  
Ravulapalli Lakshmi Tulasi ◽  
Vadivel Elanangai ◽  
Napa Komal Kumar

Breast cancer is the most common type of cancer, occurring mostly in females. In recent years, many researchers have worked to automate breast cancer diagnosis by developing machine learning models. However, the quality and quantity of features in a breast cancer diagnostic dataset significantly affect the accuracy and efficiency of a predictive model. Feature selection is an effective method for reducing dimensionality and improving the accuracy of a predictive model. Its purpose is to determine the features required for training a model and to remove irrelevant and duplicate features, where a duplicate feature is one that is highly correlated with another feature. The objective of this study is to compare three feature selection methods experimentally for breast cancer prediction. Sequential, embedded, and chi-square feature selection are implemented on a breast cancer diagnostic dataset, and their performance is compared on the test set. The experimental results show that sequential feature selection outperforms chi-square (χ<sup>2</sup>) statistics and embedded feature selection, achieving an accuracy of 98.3%.
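As a rough illustration of the chi-square filter mentioned above, the sketch below computes the Pearson χ<sup>2</sup> statistic between one categorical feature and the class label. The function name `chi_square` and the toy data are assumptions made for this example; the abstract does not say how the authors computed the statistic, and library implementations may differ in detail.

```python
from collections import Counter
from typing import Hashable, Sequence


def chi_square(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic for a feature/label contingency table."""
    n = len(feature)
    if n == 0 or n != len(labels):
        raise ValueError("feature and labels must be non-empty and equally long")
    observed = Counter(zip(feature, labels))  # joint counts per table cell
    row_totals = Counter(feature)             # counts per feature value
    col_totals = Counter(labels)              # counts per class
    stat = 0.0
    for f_val, f_count in row_totals.items():
        for c_val, c_count in col_totals.items():
            expected = f_count * c_count / n  # count expected under independence
            stat += (observed[(f_val, c_val)] - expected) ** 2 / expected
    return stat


# Toy data (illustrative only): one feature tracks the label, one does not.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
informative = [1, 1, 1, 1, 0, 0, 0, 0]
uninformative = [1, 0, 1, 0, 1, 0, 1, 0]

chi_informative = chi_square(informative, toy_labels)
chi_uninformative = chi_square(uninformative, toy_labels)
```

A filter method of this kind would rank features by the statistic and keep the highest-scoring ones, without training a classifier for each candidate subset.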


2021 ◽  
Vol 10 (6) ◽  
pp. 3501-3506
Author(s):  
S. J. Sushma ◽  
Tsehay Admassu Assegie ◽  
D. C. Vinutha ◽  
S. Padmashree

Irrelevant features in a heart disease dataset degrade the performance of a binary classification model. Consequently, eliminating irrelevant and redundant features from the training set with a feature selection algorithm significantly improves the performance of a classification model for heart disease detection. Sequential feature selection (SFS) is a successful algorithm for improving classification performance on heart disease detection while reducing computation time. In this study, the SFS algorithm is implemented to improve classifier performance on heart disease detection by removing irrelevant features and training a model on the optimal features. Furthermore, exhaustive and permutation-based feature selection algorithms are implemented and compared with the SFS algorithm. The implemented and existing feature selection algorithms are evaluated on the real-world Pima Indian heart disease dataset, and the results suggest that the SFS algorithm outperforms the exhaustive and permutation-based algorithms. Overall, the results look promising, and a more effective heart disease detection model is developed with an accuracy of 99.3%.
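The greedy idea behind forward SFS can be sketched in a few lines. This is a generic illustration, not the implementation used in the study: `sequential_forward_selection`, `heart_toy_score`, and the feature names are hypothetical, and in practice the scoring function would typically be the cross-validated accuracy of a classifier trained on the candidate subset.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_selection(
    candidates: Sequence[str],
    score_fn: Callable[[List[str]], float],
    k: int,
) -> Tuple[List[str], float]:
    """Greedily add the feature that yields the best score until k are chosen."""
    selected: List[str] = []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining and len(selected) < k:
        # Score every candidate when added to the current subset.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        best_score, best_feature = max(scored, key=lambda pair: pair[0])
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected, best_score


def heart_toy_score(subset: List[str]) -> float:
    """Hypothetical stand-in for model accuracy on a feature subset."""
    gains = {"age": 0.30, "chol": 0.25, "bp": 0.20, "noise": 0.0}
    score = sum(gains[f] for f in subset)
    if "chol" in subset and "bp" in subset:
        score -= 0.15  # pretend these two features are partly redundant
    return score


sfs_selected, sfs_score = sequential_forward_selection(
    ["age", "chol", "bp", "noise"], heart_toy_score, k=2
)
```

Because each step only considers adding one feature to the current subset, the search evaluates far fewer subsets than an exhaustive search, at the cost of possibly missing the globally best combination.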


2021 ◽  
Vol 16 (2) ◽  
pp. 35-49
Author(s):  
Adamaria Perrotta ◽  
Georgios Bliatsios

Peer-to-Peer (P2P) lending is an online lending process that allows individuals to obtain or grant loans without the involvement of traditional financial intermediaries. It has grown quickly in recent years, with some platforms reaching billions of dollars of loans in principal in a short amount of time. Since each loan carries a probability of loss due to a borrower's failure, this paper addresses the borrower's default prediction problem in the P2P financial ecosystem. The main assumption, which distinguishes this study from the available literature, is that borrowers sharing the same homeownership status display similar risk profiles; thus, a model per segment should be developed. We estimate the Probability of Default (PD) of a borrower by using Logistic Regression (LR) coupled with Weight of Evidence encoding. The feature set is identified via Sequential Feature Selection (SFS). We compare forward and backward SFS in terms of the Area Under the Curve (AUC) and choose the one that maximizes this statistic. Finally, we compare the results of the chosen LR approach against two other popular Machine Learning (ML) techniques: k-Nearest Neighbors (k-NN) and Random Forest (RF).
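Weight of Evidence encoding replaces each category with a log-ratio of class shares. The sketch below uses one common convention, ln(share of non-defaults / share of defaults), with additive smoothing to avoid division by zero. The sign convention, the `smoothing` parameter, the function name, and the toy data are all assumptions for illustration; the abstract does not specify the authors' exact formulation.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def weight_of_evidence(
    categories: Sequence[Hashable],
    defaults: Sequence[int],
    smoothing: float = 0.5,
) -> Dict[Hashable, float]:
    """Map each category to ln(share of non-defaults / share of defaults)."""
    good: Counter = Counter()  # non-defaulting borrowers per category
    bad: Counter = Counter()   # defaulting borrowers per category
    for category, defaulted in zip(categories, defaults):
        if defaulted:
            bad[category] += 1
        else:
            good[category] += 1
    levels = set(good) | set(bad)
    # Smoothed totals so that the smoothed shares still sum to one.
    total_good = sum(good.values()) + smoothing * len(levels)
    total_bad = sum(bad.values()) + smoothing * len(levels)
    return {
        level: math.log(
            ((good[level] + smoothing) / total_good)
            / ((bad[level] + smoothing) / total_bad)
        )
        for level in levels
    }


# Toy data (illustrative only): category "A" defaults more often than "B".
toy_categories = ["A"] * 4 + ["B"] * 4
toy_defaults = [1, 1, 1, 0, 0, 0, 0, 1]
woe = weight_of_evidence(toy_categories, toy_defaults)
```

Under this convention a negative value marks a category with a higher-than-average share of defaults, and the encoded values enter a logistic regression as ordinary numeric features.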


2021 ◽  
Vol 15 (1) ◽  
pp. 1-20
Author(s):  
Knitchepon Chotchantarakun ◽  
Ohm Sornil

In the past few decades, the large amount of available data has become a major challenge in data mining and machine learning. Feature selection is a significant preprocessing step for selecting the most informative features by removing irrelevant and redundant features, especially for large datasets. These selected features play an important role in information searching and enhancing the performance of machine learning models. In this research, we propose a new technique called One-level Forward Multi-level Backward Selection (OFMB). The proposed algorithm consists of two phases. The first phase aims to create preliminarily selected subsets. The second phase provides an improvement on the previous result by an adaptive multi-level backward searching technique. Hence, the idea is to apply an improvement step during the feature addition and an adaptive search method on the backtracking step. We have tested our algorithm on twelve standard UCI datasets based on k-nearest neighbor and naive Bayes classifiers. Their accuracy was then compared with some popular methods. OFMB showed better results than the other sequential forward searching techniques for most of the tested datasets.
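The abstract does not give enough detail to reproduce OFMB, so the following is only a simplified forward-then-backward sketch of the general idea of revisiting earlier choices. It is not the authors' algorithm. The function `forward_then_backward`, the score table, and the feature names are hypothetical.

```python
from typing import Callable, List, Sequence


def forward_then_backward(
    candidates: Sequence[str],
    score_fn: Callable[[List[str]], float],
    k: int,
) -> List[str]:
    """Greedy forward selection to k features, then greedy backward pruning."""
    selected: List[str] = []
    remaining = list(candidates)
    # Phase 1: plain forward selection builds a preliminary subset.
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    # Phase 2: drop a feature whenever doing so strictly improves the score.
    while len(selected) > 1:
        current = score_fn(selected)
        trials = [[g for g in selected if g != f] for f in selected]
        best_trial = max(trials, key=score_fn)
        if score_fn(best_trial) <= current:
            break
        selected = best_trial
    return selected


# Hypothetical scores: "a" is best alone, but "b" and "c" work best together.
SUBSET_SCORES = {
    frozenset("a"): 0.60, frozenset("b"): 0.50, frozenset("c"): 0.50,
    frozenset("ab"): 0.70, frozenset("ac"): 0.70, frozenset("bc"): 0.90,
    frozenset("abc"): 0.85,
}


def table_score(subset: List[str]) -> float:
    return SUBSET_SCORES[frozenset(subset)]


refined = forward_then_backward(["a", "b", "c"], table_score, k=3)
```

In this toy case, pure forward selection is locked into keeping "a" because it was chosen first, while the backward phase can discard it once "b" and "c" are both present.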


Author(s):  
Kai Sheng Ooi ◽  
ZhiYuan Chen ◽  
Phaik Eong Poh ◽  
Jian Cui

Biological oxygen demand (BOD5) is an indicator used to monitor water quality. However, the standard process of measuring BOD5 is time-consuming and could delay crucial mitigation work in the event of pollution. To solve this problem, this study employed multiple machine learning (ML) methods, such as random forest (RF), support vector regression (SVR), and multilayer perceptron (MLP), to train the best model for accurately predicting BOD5 values in water samples from other physical and chemical properties of the water. The training parameters were optimized using a genetic algorithm (GA), and feature selection was done using the sequential feature selection (SFS) method. The proposed machine learning framework was first tested on a public dataset (Waterbase). The MLP method produced the best model, with an R<sup>2</sup> score of 0.7672791942775417 and a relative MSE and relative MAE of approximately 15%. Feature importance calculations indicated that CODCr, ammonium, and nitrate are the features that correlate most highly with BOD5. In the field study, with a small private dataset consisting of water samples collected from two different lakes in Jiangsu Province, China, the trained model was found to have a similar range of prediction error (around 15%) and a similar relative MAE (around 14%), and it achieved about 6% better relative MSE.
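For readers unfamiliar with the R<sup>2</sup> score reported above, the sketch below shows the standard coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the toy values are assumptions for illustration, and the relative MSE and relative MAE are not sketched because the abstract does not define them.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equally long")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # variance around mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Toy values (illustrative only, not measurements from the study).
measured = [1.0, 2.0, 3.0, 4.0]
r2_perfect = r2_score(measured, [1.0, 2.0, 3.0, 4.0])
r2_mean_only = r2_score(measured, [2.5, 2.5, 2.5, 2.5])
```

A score of 1 means perfect prediction, while 0 means the model does no better than always predicting the mean of the measured values.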


Author(s):  
Gunjan Ansari ◽  
Shilpi Gupta ◽  
Niraj Singhal

Analysis of the online data posted on various e-commerce sites is required to improve consumer experience and thus enhance global business. The increase in the volume of social media content in recent years has led to the problem of overfitting in review classification. Thus, there is a need to select relevant features to reduce computational cost and improve classifier performance. This chapter investigates various statistical feature selection methods that are time-efficient but result in the selection of a few redundant features. To overcome this issue, wrapper methods such as sequential feature selection (SFS) and recursive feature elimination (RFE) are employed to select an optimal feature set. The empirical analysis was conducted on a movie review dataset using three different classifiers, and the results show that SVM achieved an F-measure of 96% with only 8% of the features selected using the RFE method.
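Recursive feature elimination works in the opposite direction from forward SFS: it starts from all features and repeatedly discards the least important one after refitting. The sketch below is a generic illustration, not the chapter's code. `recursive_feature_elimination`, `toy_importance`, and the token names are hypothetical; in practice the importances would come from a fitted model, for example the weight magnitudes of a linear SVM.

```python
from typing import Callable, Dict, List, Sequence


def recursive_feature_elimination(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    n_keep: int,
) -> List[str]:
    """Repeatedly drop the least important feature until n_keep remain."""
    kept = list(features)
    while len(kept) > n_keep:
        # Recompute importances on the surviving features, as a refit would.
        importances = importance_fn(kept)
        weakest = min(kept, key=lambda f: importances[f])
        kept.remove(weakest)
    return kept


def toy_importance(subset: List[str]) -> Dict[str, float]:
    """Hypothetical importances; near-synonyms share signal when both present."""
    base = {"great": 0.50, "excellent": 0.45, "boring": 0.60, "the": 0.01}
    scores = {f: base[f] for f in subset}
    if "great" in subset and "excellent" in subset:
        scores["great"] /= 2
        scores["excellent"] /= 2
    return scores


rfe_kept = recursive_feature_elimination(
    ["great", "excellent", "boring", "the"], toy_importance, n_keep=2
)
```

Recomputing the importances after every removal is what distinguishes this procedure from a one-shot ranking: once "excellent" is gone, "great" regains its full weight in the toy importance function.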


2020 ◽  
Vol 24 (6) ◽  
pp. 1345-1364
Author(s):  
Bassel Ali ◽  
Koichi Moriyama ◽  
Wasin Kalintha ◽  
Masayuki Numao ◽  
Ken-Ichi Fukui

Data collection plays an important role in business agility; data can prove valuable and provide insights into important features. However, conventional data collection methods can be costly and time-consuming. This paper proposes a hybrid system, R-EDML, that combines sequential feature selection performed by Reinforcement Learning (RL) with the evolutionary feature prioritization of Evolutionary Distance Metric Learning (EDML) in a clustering process. The goal is to reduce the number of features while maintaining or increasing accuracy, which lowers time complexity and reduces the time and cost of future data collection. In this method, features represented by the diagonal elements of EDML matrices are prioritized using a differential evolution algorithm. Further, a selection control strategy is learned with RL by sequentially inserting and evaluating the prioritized elements. The outcome is the R-EDML matrix with the best accuracy and the fewest elements. Diagonal R-EDML, which focuses on the diagonal elements, is compared with EDML and conventional feature selection. Full Matrix R-EDML, which focuses on both the diagonal and non-diagonal elements, is tested and compared with Information-Theoretic Metric Learning. Moreover, the R-EDML policy is tested for each EDML generation and across all generations. Results show a significant decrease in the number of features while maintaining or increasing accuracy.

