Sequential Feature Selection: Latest Research Papers

Exploring the performance of feature selection method using breast cancer dataset

Breast cancer is the most common type of cancer, occurring mostly in females. In recent years, many researchers have worked to automate the diagnosis of breast cancer by developing various machine learning models. However, the quality and quantity of features in a breast cancer diagnostic dataset significantly affect the accuracy and efficiency of the predictive model. Feature selection is an effective method for reducing dimensionality and improving the accuracy of a predictive model: it determines the features required to train the model and removes irrelevant and duplicate features, where a duplicate feature is one that is highly correlated with another feature. The objective of this study is to conduct an experimental comparison of three feature selection methods for breast cancer prediction. Sequential, embedded, and chi-square feature selection are implemented on a breast cancer diagnostic dataset, and their performance is compared on the test set. The experimental results show that sequential feature selection outperforms both chi-square (χ²) statistics and embedded feature selection, achieving the best accuracy of 98.3%.
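The chi-square filter mentioned above scores each feature independently of any classifier. Below is a minimal sketch. It assumes categorical (for example, binarized) features, so continuous diagnostic measurements would first need to be discretized. The study's exact chi-square variant is not stated here; scikit-learn's `chi2`, for instance, uses a frequency-based formulation for non-negative features. The function names and toy columns are hypothetical.

```python
from collections import Counter


def chi2_statistic(feature, labels):
    """Pearson chi-square statistic for independence between a categorical
    feature and the class label, built from their contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # cell counts O[f, y]
    f_totals = Counter(feature)               # row totals
    y_totals = Counter(labels)                # column totals
    stat = 0.0
    for f_val, f_count in f_totals.items():
        for y_val, y_count in y_totals.items():
            expected = f_count * y_count / n  # E[f, y] under independence
            stat += (observed[(f_val, y_val)] - expected) ** 2 / expected
    return stat


def select_k_best_chi2(columns, labels, k):
    """Filter method: score each column on its own and keep the k highest."""
    scores = {name: chi2_statistic(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: binary features and a binary diagnosis label.
labels = [1, 1, 0, 0]
columns = {
    "informative": [1, 1, 0, 0],  # matches the label exactly
    "noise": [1, 0, 1, 0],        # independent of the label
    "partial": [1, 1, 1, 0],      # partly related to the label
}
print(select_k_best_chi2(columns, labels, 2))  # ['informative', 'partial']
```

Because each column is scored on its own, a filter like this cannot tell that two highly correlated features are duplicates. Both would receive similar scores, which is one reason a wrapper method such as SFS can end up with a leaner subset.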

An improved feature selection approach for chronic heart disease detection

Bulletin of Electrical Engineering and Informatics, Vol 10 (6), pp. 3501-3506, 2021. DOI: 10.11591/eei.v10i6.3001

Author(s): S. J. Sushma, Tsehay Admassu Assegie, D. C. Vinutha, S. Padmashree

Keyword(s): Feature Selection, Heart Disease, Binary Classification, Classification Model, Computational Time, Disease Detection, Selection Algorithm, Feature Selection Algorithm, Detection Model, Sequential Feature Selection

Irrelevant features in a heart disease dataset degrade the performance of a binary classification model. Consequently, eliminating irrelevant and redundant features from the training set with a feature selection algorithm significantly improves classification performance on heart disease detection. Sequential feature selection (SFS) is an effective algorithm for improving classification performance on heart disease detection while reducing computational time complexity. In this study, the SFS algorithm is implemented to improve classifier performance on heart disease detection by removing irrelevant features and training a model on the optimal features. Furthermore, exhaustive and permutation-based feature selection algorithms are implemented and compared with the SFS algorithm. The implemented and existing feature selection algorithms are evaluated on the real-world Pima Indian heart disease dataset, and the results suggest that the SFS algorithm outperforms the exhaustive and permutation-based feature selection algorithms. Overall, the results look promising: a more effective heart disease detection model is developed, with an accuracy of 99.3%.
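The efficiency argument for SFS over exhaustive search becomes concrete if you count how many candidate subsets each strategy evaluates. In this sketch, `toy_score` is a hypothetical additive stand-in for cross-validated accuracy. A real setup would train and validate a classifier on each subset. Permutation-based selection is not shown.

```python
from itertools import combinations


def sequential_forward_selection(features, score, k):
    """Greedy SFS: at each step add the feature whose inclusion gives the
    highest score. Returns (selected, number of subsets evaluated)."""
    selected, evaluations = [], 0
    while len(selected) < k:
        best_feature, best_score = None, float("-inf")
        for f in features:
            if f in selected:
                continue
            evaluations += 1
            s = score(selected + [f])
            if s > best_score:
                best_feature, best_score = f, s
        selected.append(best_feature)
    return selected, evaluations


def exhaustive_selection(features, score, k):
    """Brute force: evaluate every size-k subset and keep the best one."""
    best_subset, best_score, evaluations = None, float("-inf"), 0
    for subset in combinations(features, k):
        evaluations += 1
        s = score(list(subset))
        if s > best_score:
            best_subset, best_score = list(subset), s
    return best_subset, evaluations


# Hypothetical scorer: feature fi contributes i points (stand-in for accuracy).
features = [f"f{i}" for i in range(10)]
weights = {f: i for i, f in enumerate(features)}


def toy_score(subset):
    return sum(weights[f] for f in subset)


print(sequential_forward_selection(features, toy_score, 4))  # (['f9', 'f8', 'f7', 'f6'], 34)
print(exhaustive_selection(features, toy_score, 4))          # (['f6', 'f7', 'f8', 'f9'], 210)
```

For n features and k selected, forward SFS evaluates n + (n-1) + ... + (n-k+1) subsets, whereas exhaustive search over size-k subsets evaluates C(n, k). With n = 10 and k = 4, that is 34 versus 210. The greedy search gives up the optimality guarantee, though: when features are only useful in combination, SFS can miss the best subset.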

Why segmentation matters: a Machine Learning approach for predicting loan defaults in the Peer-to-Peer (P2P) Financial Ecosystem

Risk Management Magazine, Vol 16 (2), pp. 35-49, 2021. DOI: 10.47473/2020rmm0089

Author(s): Adamaria Perrotta, Georgios Bliatsios

Keyword(s): Machine Learning, Area Under The Curve, Peer To Peer, Weight Of Evidence, K Nearest Neighbors, Default Prediction, Sequential Feature Selection, Loan Defaults, The One, Online Lending

Peer-to-Peer (P2P) lending is an online lending process that allows individuals to obtain or grant loans without the involvement of traditional financial intermediaries. It has grown quickly in recent years, with some platforms reaching billions of dollars of loan principal in a short time. Since each loan carries a probability of loss due to borrower default, this paper addresses the problem of predicting borrower default in the P2P financial ecosystem. The main assumption, which distinguishes this study from the existing literature, is that borrowers sharing the same homeownership status display similar risk profiles, so a separate model should be developed for each segment. We estimate a borrower's Probability of Default (PD) using Logistic Regression (LR) coupled with Weight of Evidence encoding. The feature set is identified via Sequential Feature Selection (SFS). We compare forward and backward SFS in terms of the Area Under the Curve (AUC) and choose the one that maximizes this statistic. Finally, we compare the results of the chosen LR approach against two other popular Machine Learning (ML) techniques: k Nearest Neighbors (k-NN) and Random Forest (RF).
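Weight of Evidence encoding replaces each category of a feature with the log ratio of its share of non-defaulting borrowers to its share of defaulting borrowers. This gives the logistic regression a single numeric input per category. Below is a minimal sketch. The sign convention (some texts use ln(%bad / %good)), the 0.5 pseudo-count for empty cells, and the toy loan-grade data are assumptions, not details taken from the paper.

```python
import math
from collections import defaultdict


def weight_of_evidence(categories, defaults, smoothing=0.5):
    """Return {category: ln(share of good loans / share of defaulted loans)}.
    defaults[i] is 1 if loan i defaulted, else 0. Empty cells get a small
    pseudo-count (`smoothing`, an assumption) so the log stays finite."""
    good, bad = defaultdict(int), defaultdict(int)
    for cat, y in zip(categories, defaults):
        if y == 1:
            bad[cat] += 1
        else:
            good[cat] += 1
    total_good, total_bad = sum(good.values()), sum(bad.values())
    woe = {}
    for cat in set(good) | set(bad):
        g = good[cat] or smoothing
        b = bad[cat] or smoothing
        woe[cat] = math.log((g / total_good) / (b / total_bad))
    return woe


# Hypothetical toy data: one categorical feature (loan grade) and default flags.
grades = ["A", "A", "A", "A", "B", "B", "B", "B"]
defaulted = [0, 0, 0, 1, 1, 1, 1, 0]
woe = weight_of_evidence(grades, defaulted)
encoded = [woe[g] for g in grades]  # numeric inputs for the logistic regression
```

Under the paper's segmentation assumption, these encodings would be computed and the model fitted separately within each homeownership segment.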

Adaptive Multi-level Backward Tracking for Sequential Feature Selection

Journal of ICT Research and Applications, Vol 15 (1), pp. 1-20, 2021. DOI: 10.5614/itbj.ict.res.appl.2021.15.1.1

Author(s): Knitchepon Chotchantarakun, Ohm Sornil

Keyword(s): Machine Learning, Feature Selection, Nearest Neighbor, Second Phase, K Nearest Neighbor, Backward Tracking, Sequential Feature Selection, Multi Level, A New Technique, Two Phases

In the past few decades, the large amount of available data has become a major challenge in data mining and machine learning. Feature selection is a significant preprocessing step that selects the most informative features by removing irrelevant and redundant ones, especially for large datasets. The selected features play an important role in information searching and in enhancing the performance of machine learning models. In this research, we propose a new technique called One-level Forward Multi-level Backward Selection (OFMB). The proposed algorithm consists of two phases. The first phase creates preliminary selected subsets. The second phase improves on this result with an adaptive multi-level backward searching technique. The idea, therefore, is to apply an improvement step during feature addition and an adaptive search method in the backtracking step. We tested our algorithm on twelve standard UCI datasets using k-nearest neighbor and naive Bayes classifiers and compared the resulting accuracy with that of several popular methods. OFMB showed better results than the other sequential forward searching techniques on most of the tested datasets.
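OFMB itself is not reproduced here. As an illustration of the underlying idea (add features forward, then backtrack when removing one helps), the sketch below implements the classic sequential floating forward selection (SFFS). SFFS uses a single-level backward step, which makes it a simpler relative of OFMB's adaptive multi-level backtracking. The scoring function is a hypothetical toy in which features 1, 2 and 3 are only valuable together. `max_size`, which sets how far past k the search may grow before backtracking, is an assumed parameter.

```python
def sffs(n_features, score, k, max_size=None):
    """Sequential floating forward selection over features 0..n_features-1.
    After each forward step, conditionally remove features while doing so
    beats the best subset already recorded at the smaller size.
    Returns (score, sorted subset) of the best subset of size k."""
    if max_size is None:
        max_size = min(n_features, k + 1)  # assumption: search one size past k
    best = {}  # subset size -> (score, sorted subset)

    def record(subset):
        s = score(subset)
        size = len(subset)
        if size not in best or s > best[size][0]:
            best[size] = (s, sorted(subset))

    current = []
    while len(current) < max_size:
        # Inclusion: add the single feature that maximizes the score.
        candidates = [f for f in range(n_features) if f not in current]
        added = max(candidates, key=lambda f: score(current + [f]))
        current = current + [added]
        record(current)
        # Conditional exclusion (backtracking); never drop the feature just added.
        protected = added
        while len(current) > 2:
            removable = [f for f in current if f != protected]
            drop = max(removable, key=lambda f: score([g for g in current if g != f]))
            reduced = [g for g in current if g != drop]
            if score(reduced) > best[len(reduced)][0]:
                current = reduced
                record(current)
                protected = None  # later removals may drop any feature
            else:
                break
    return best[k]


# Hypothetical score: features 1, 2 and 3 earn a bonus only when all are present.
WEIGHTS = [5, 3, 3, 3]


def toy_score(subset):
    s = set(subset)
    bonus = 10 if {1, 2, 3} <= s else 0
    return sum(WEIGHTS[i] for i in s) + bonus


print(sffs(4, toy_score, 3))              # (19, [1, 2, 3])
print(sffs(4, toy_score, 3, max_size=3))  # (11, [0, 1, 2]): no room to backtrack
```

Limited to three features, the search commits to feature 0 early and ends with a score of 11. Allowed to grow to four features, the floating search then finds that removing feature 0 yields a better three-feature subset.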

BOD5 Prediction Using machine learning methods

Water Science & Technology Water Supply, 2021. DOI: 10.2166/ws.2021.202

Author(s): Kai Sheng Ooi, ZhiYuan Chen, Phaik Eong Poh, Jian Cui

Keyword(s): Machine Learning, Feature Selection, Water Samples, Chemical Properties, Oxygen Demand, Support Vector, Learning Framework, Sequential Feature Selection, Similar Range, Physical And Chemical

Biological oxygen demand (BOD5) is an indicator used to monitor water quality. However, the standard process for measuring BOD5 is time-consuming and could delay crucial mitigation work in the event of pollution. To solve this problem, this study employed multiple machine learning (ML) methods, such as random forest (RF), support vector regression (SVR) and multilayer perceptron (MLP), to train the best model for accurately predicting BOD5 values in water samples from other physical and chemical properties of the water. The training parameters were optimized using a genetic algorithm (GA), and feature selection was done using the sequential feature selection (SFS) method. The proposed machine learning framework was first tested on the public Waterbase dataset. The MLP method produced the best model, with an R² score of approximately 0.767 and relative MSE and relative MAE of approximately 15%. Feature importance calculations indicated that CODCr, Ammonium and Nitrate are features that correlate highly with BOD5. In a field study with a small private dataset of water samples collected from two different lakes in Jiangsu Province, China, the trained model showed a similar range of prediction error (around 15%) and a similar relative MAE (around 14%), and achieved about 6% better relative MSE.
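The abstract reports an R² score of about 0.767. To show what that number measures, the sketch below implements the standard definitions of R², MSE and MAE. The "relative" MSE and MAE used in the paper involve a normalization that is not specified here, so they are deliberately not reproduced.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)                # total sum of squares
    return 1.0 - ss_res / ss_tot


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


print(r2_score([1, 2, 3], [1, 2, 4]))  # 0.5
print(r2_score([1, 2, 3], [2, 2, 2]))  # 0.0 (always predicting the mean)
```

An R² of 0.767 means the model's squared error is about 23% of the squared error of a baseline that always predicts the mean BOD5 value.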

Clustering-based Sequential Feature Selection Approach for High Dimensional Data Classification

Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2021. DOI: 10.5220/0010259501220132

Author(s): M. Alimoussa, A. Porebski, N. Vandenbroucke, R. Thami, S. El Fkihi

Keyword(s): Feature Selection, High Dimensional Data, Data Classification, High Dimensional, Selection Approach, Sequential Feature Selection, Feature Selection Approach

Natural Language Processing in Online Reviews

Advances in Business Information Systems and Analytics - Natural Language Processing for Global and Local Business, pp. 40-64, 2021. DOI: 10.4018/978-1-7998-4240-8.ch003

Author(s): Gunjan Ansari, Shilpi Gupta, Niraj Singhal

Keyword(s): Feature Selection, Language Processing, Computational Cost, Online Reviews, Recursive Feature Elimination, Consumer Experience, Global Business, Online Data, Sequential Feature Selection, Selection Of

Analysis of the online data posted on various e-commerce sites is required to improve the consumer experience and thus enhance global business. The increase in the volume of social media content in recent years has led to the problem of overfitting in review classification. There is therefore a need to select relevant features to reduce computational cost and improve classifier performance. This chapter investigates various statistical feature selection methods that are time-efficient but can result in the selection of some redundant features. To overcome this issue, wrapper methods such as sequential feature selection (SFS) and recursive feature elimination (RFE) are employed to select an optimal feature set. The empirical analysis was conducted on a movie review dataset using three different classifiers, and the results show that SVM achieved an F-measure of 96% using only 8% of the features, selected with the RFE method.
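Recursive feature elimination, one of the two wrapper methods named above, repeatedly refits a model and discards the weakest remaining feature. Below is a minimal sketch with a pluggable `fit_importances` callback. In practice, this callback would return the absolute coefficients of a fitted linear model such as a linear SVM; scikit-learn packages the idea as `sklearn.feature_selection.RFE`. The word features and weights are hypothetical.

```python
def recursive_feature_elimination(features, fit_importances, n_keep):
    """Refit on the remaining features and drop the one with the smallest
    absolute importance until n_keep remain.
    fit_importances(remaining) -> {feature: importance}, e.g. linear-model weights."""
    remaining = list(features)
    eliminated = []
    while len(remaining) > n_keep:
        importances = fit_importances(remaining)  # refit every round
        weakest = min(remaining, key=lambda f: abs(importances[f]))
        remaining.remove(weakest)
        eliminated.append(weakest)
    return remaining, eliminated


# Hypothetical "model": fixed word weights, except that two synonyms share
# their weight while both are present (a crude imitation of refitting).
BASE = {"great": 2.0, "excellent": 2.0, "boring": -1.5, "the": 0.1, "plot": 0.3}


def toy_fit(remaining):
    weights = {f: BASE[f] for f in remaining}
    if "great" in weights and "excellent" in weights:
        weights["great"] = weights["excellent"] = 1.0
    return weights


print(recursive_feature_elimination(list(BASE), toy_fit, 2))
# (['excellent', 'boring'], ['the', 'plot', 'great'])
```

Refitting matters here. While the synonyms "great" and "excellent" are both present, they split their weight, so one of them is eliminated. Once it is gone, the survivor regains the full weight on the next fit.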

Cardiac arrhythmia classification using sequential feature selection and decision tree classifier method

International Journal of Innovative Computing and Applications, Vol 12 (4), pp. 175, 2021. DOI: 10.1504/ijica.2021.116653

Author(s): S. Durga, Esther Daniel, S. Deepa Kanmani, Jinsa Mary Philip

Keyword(s): Feature Selection, Decision Tree, Cardiac Arrhythmia, Decision Tree Classifier, Tree Classifier, Sequential Feature Selection

Cardiac arrhythmia classification using sequential feature selection and decision tree classifier method

International Journal of Innovative Computing and Applications, Vol 12 (4), pp. 175, 2021. DOI: 10.1504/ijica.2021.10038617

Author(s): S. Durga, Esther Daniel, Jinsa Mary Philip, S. Deepa Kanmani

Keyword(s): Feature Selection, Decision Tree, Cardiac Arrhythmia, Decision Tree Classifier, Tree Classifier, Sequential Feature Selection

Reinforcement learning based metric filtering for evolutionary distance metric learning

Intelligent Data Analysis, Vol 24 (6), pp. 1345-1364, 2020. DOI: 10.3233/ida-194887

Author(s): Bassel Ali, Koichi Moriyama, Wasin Kalintha, Masayuki Numao, Ken-Ichi Fukui

Keyword(s): Feature Selection, Reinforcement Learning, Data Collection, Metric Learning, Differential Evolution Algorithm, Evolutionary Distance, Distance Metric Learning, Distance Metric, Business Agility, Sequential Feature Selection

Data collection plays an important role in business agility; data can prove valuable and provide insights into important features. However, conventional data collection methods can be costly and time-consuming. This paper proposes a hybrid system, R-EDML, that combines sequential feature selection performed by Reinforcement Learning (RL) with the evolutionary feature prioritization of Evolutionary Distance Metric Learning (EDML) in a clustering process. The goal is to reduce the number of features while maintaining or increasing accuracy, which lowers time complexity and reduces future data collection time and cost. In this method, features represented by the diagonal elements of EDML matrices are prioritized using a differential evolution algorithm. A selection control strategy is then learned with RL by sequentially inserting and evaluating the prioritized elements. The outcome is the R-EDML matrix with the best accuracy and the fewest elements. Diagonal R-EDML, which focuses on the diagonal elements, is compared with EDML and conventional feature selection. Full Matrix R-EDML, which focuses on both diagonal and non-diagonal elements, is tested and compared with Information-Theoretic Metric Learning. Moreover, the R-EDML policy is tested for each EDML generation and across all generations. Results show a significant decrease in the number of features while accuracy is maintained or increased.
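R-EDML treats the diagonal entries of a learned Mahalanobis-style metric matrix as feature weights. The first function below shows why this works: with a diagonal matrix, each entry scales one feature's contribution to the distance, and a zero entry removes that feature entirely. The second function sketches sequential insertion of prioritized elements. However, it replaces the paper's learned RL control policy with a plain greedy rule that keeps an element only if the evaluation improves. `toy_evaluate` is a hypothetical stand-in for clustering accuracy.

```python
import math


def diagonal_metric_distance(x, y, diag):
    """sqrt((x - y)^T M (x - y)) for a diagonal matrix M = diag(diag).
    A zero on the diagonal removes that feature from the distance."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, diag)))


def sequential_insertion(diag, evaluate):
    """Insert diagonal elements in descending-weight (priority) order and keep
    each one only if evaluate(mask) improves. This is a greedy stand-in for a
    learned RL policy; `evaluate` is a hypothetical scoring callback."""
    order = sorted(range(len(diag)), key=lambda i: diag[i], reverse=True)
    mask = [0.0] * len(diag)
    best = evaluate(mask)
    for i in order:
        trial = list(mask)
        trial[i] = diag[i]
        score = evaluate(trial)
        if score > best:
            mask, best = trial, score
    return mask, best


# Hypothetical evaluator: features 1 and 2 help, feature 0 hurts.
def toy_evaluate(mask):
    useful = sum(1 for i in (1, 2) if mask[i] > 0)
    harmful = 1 if mask[0] > 0 else 0
    return useful - harmful


print(diagonal_metric_distance([0, 0], [3, 4], [1, 1]))     # 5.0
print(diagonal_metric_distance([0, 0], [3, 4], [1, 0]))     # 3.0
print(sequential_insertion([0.9, 0.5, 0.3], toy_evaluate))  # ([0.0, 0.5, 0.3], 2)
```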
