Filter Variable Selection Algorithm Using Risk Ratios for Dimensionality Reduction of Healthcare Data for Classification

This research developed and tested a filter algorithm that reduces the feature space of healthcare datasets. The algorithm binarizes the dataset, evaluates the risk ratio of each predictor with respect to the response separately, and outputs ratios that represent the association between each predictor and the class attribute. The strength of this association determines the importance rank of the corresponding predictor in predicting the outcome. Using Random Forest and logistic regression classifiers, the performance of the developed algorithm was compared against the regsubsets and varImp functions, which are unsupervised methods of variable selection. The proposed algorithm was likewise compared with the supervised Fisher score and Pearson’s correlation feature selection methods. Several datasets were used in the experiments, and in most cases the predictors selected by the new algorithm outperformed those selected by the existing algorithms. The proposed filter algorithm is therefore a reliable alternative for variable ranking in data mining classification tasks with a dichotomous response.
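The risk-ratio filter described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how continuous predictors are binarized, so the hypothetical `binarize` helper uses a median split (columns that are already 0/1 pass through unchanged). A continuity correction of 0.5 is added to every cell of the 2x2 table to avoid division by zero, and predictors are ranked by |ln RR| so that protective associations (RR < 1) rank as highly as harmful ones of the same magnitude; the original algorithm may instead rank by the raw ratio. All function names and the toy data are illustrative.

```python
import math
from statistics import median


def binarize(column):
    """Hypothetical binarization rule: keep 0/1 columns as-is, else split at the median."""
    if set(column) <= {0, 1}:
        return list(column)
    cut = median(column)
    return [1 if value > cut else 0 for value in column]


def risk_ratio(x, y, correction=0.5):
    """Risk ratio P(y=1 | x=1) / P(y=1 | x=0) from the 2x2 table of x and y.

    `correction` is added to every cell (an assumption of this sketch) so that
    empty cells do not cause division by zero; pass 0 for the uncorrected ratio.
    """
    pairs = list(zip(x, y))
    a = sum(1 for xi, yi in pairs if xi == 1 and yi == 1)  # exposed, outcome present
    b = sum(1 for xi, yi in pairs if xi == 1 and yi == 0)  # exposed, outcome absent
    c = sum(1 for xi, yi in pairs if xi == 0 and yi == 1)  # unexposed, outcome present
    d = sum(1 for xi, yi in pairs if xi == 0 and yi == 0)  # unexposed, outcome absent
    a, b, c, d = (n + correction for n in (a, b, c, d))
    return (a / (a + b)) / (c / (c + d))


def rank_by_risk_ratio(features, y):
    """Binarize each predictor, score it by its risk ratio with y, and sort
    by association strength |ln RR| (strongest first)."""
    scores = {name: risk_ratio(binarize(col), y) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: abs(math.log(item[1])), reverse=True)


if __name__ == "__main__":
    # Toy data for illustration only: 8 patients with a dichotomous outcome y.
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    features = {
        "weak": [1, 5, 2, 6, 1, 5, 2, 6],     # unrelated to y
        "strong": [10, 9, 8, 7, 1, 2, 3, 4],  # high values go with y = 1
    }
    # prints "strong: RR = 9.00" then "weak: RR = 1.00"
    for name, rr in rank_by_risk_ratio(features, y):
        print(f"{name}: RR = {rr:.2f}")
```

Ranking by |ln RR| treats RR = 3 and RR = 1/3 as equally informative, which suits a filter whose only job is to order predictors; a predictor whose RR stays near 1 says little about the dichotomous response.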

A tutorial on variable selection for clinical prediction models: feature selection methods in data mining could improve the results

Journal of Clinical Epidemiology, 2016, Vol 71, pp. 76-85. DOI: 10.1016/j.jclinepi.2015.10.002. Cited by ~53

Author(s): Farideh Bagherzadeh-Khiabani, Azra Ramezankhani, Fereidoun Azizi, Farzad Hadaegh, Ewout W. Steyerberg, ...

Keyword(s): Data Mining, Feature Selection, Variable Selection, Prediction Models, Selection Methods, Clinical Prediction, Clinical Prediction Models, Selection For

Variable Reduction and Variable Selection Methods Using Small, Medium and Large Datasets: A Forecast Comparison for the PEEIs

SSRN Electronic Journal, 2014. DOI: 10.2139/ssrn.2444421

Author(s): George Kapetanios, Massimiliano Giuseppe Marcellino, Fotis Papailias

Keyword(s): Variable Selection, Large Datasets, Selection Methods, Variable Reduction, Forecast Comparison

Variable selection methods for identifying predictor interactions in data with repeatedly measured binary outcomes

Journal of Clinical and Translational Science, 2020, pp. 1-31. DOI: 10.1017/cts.2020.556

Author(s): Bethany J. Wolf, Yunyun Jiang, Silvia H. Wilson, Jim C. Oates

Keyword(s): Variable Selection, Binary Outcomes, Selection Methods

Mahalanobis Distance Based Similarity Regression Learning of NIRS for Quality Assurance of Tobacco Product with Different Variable Selection Methods

Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 2020, pp. 119364. DOI: 10.1016/j.saa.2020.119364

Author(s): Juan Huo, Yuping Ma, Huaiqi Li, Changtong Lu, Chenggang Li, ...

Keyword(s): Quality Assurance, Variable Selection, Mahalanobis Distance, Tobacco Product, Selection Methods, Regression Learning

Combined performance of screening and variable selection methods in ultra-high dimensional data in predicting time-to-event outcomes

Diagnostic and Prognostic Research, 2018, Vol 2 (1). DOI: 10.1186/s41512-018-0043-4. Cited by ~6

Author(s): Lira Pi, Susan Halabi

Keyword(s): Variable Selection, High Dimensional Data, High Dimensional, Selection Methods, Time To Event

Predicting the Severity of Bug Reports Based on Feature Selection

International Journal of Software Engineering and Knowledge Engineering, 2018, Vol 28 (04), pp. 537-558. DOI: 10.1142/s0218194018500158. Cited by ~4

Author(s): Wenjie Liu, Shanshan Wang, Xin Chen, He Jiang

Keyword(s): Feature Selection, Software Maintenance, Feature Selection Method, Selection Methods, Selection Algorithm, Feature Selection Algorithm, Bug Reports, Single Feature, Bug Report, Severity Prediction

Predicting the severity of bug reports is an important activity in the software maintenance process. However, manually identifying the severity of bug reports is tedious and time-consuming, so developing automatic methods for predicting it has become an urgent need. A bug report typically contains a large amount of descriptive natural-language text, which results in a high-dimensional feature set that poses serious challenges to traditional automatic methods. We therefore use automatic feature selection methods to improve the performance of bug report severity prediction. In this paper, we introduce a ranking-based strategy to improve existing feature selection algorithms and propose an ensemble feature selection algorithm that combines existing ones. To verify the performance of our method, we run experiments on the bug reports of Eclipse and Mozilla and compare it with eight commonly used feature selection methods. The experimental results show that the ranking-based strategy can effectively improve the performance of bug report severity prediction by up to 54.76% on average in terms of F-measure, and that it can also significantly reduce the dimension of the feature set. Meanwhile, the ensemble feature selection method achieves better results than a single feature selection algorithm.
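The abstract does not spell out the ranking-based strategy or how the existing selectors are combined. As a hedged illustration only, one common way to build an ensemble from several feature rankers is rank averaging: convert each selector's scores into ranks, average the ranks per feature, and keep the k features with the best mean rank. The function names, the two selectors (`selector_a`, `selector_b`), and their scores for bug-report terms are hypothetical and are not taken from the Eclipse or Mozilla experiments.

```python
def to_ranks(scores):
    """Convert {feature: score} (higher is better) into {feature: rank}, 1 = best."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {feature: position + 1 for position, feature in enumerate(ordered)}


def ensemble_select(score_dicts, k):
    """Average each feature's rank across selectors and return the k best features.

    Assumes every selector scored the same set of features. Ties in mean rank
    keep the order of the first selector's ranking (Python's sort is stable).
    """
    rank_dicts = [to_ranks(scores) for scores in score_dicts]
    features = list(rank_dicts[0])
    mean_rank = {f: sum(ranks[f] for ranks in rank_dicts) / len(rank_dicts) for f in features}
    return sorted(features, key=mean_rank.get)[:k]


if __name__ == "__main__":
    # Hypothetical scores that two filter selectors assigned to bug-report terms.
    selector_a = {"crash": 9.1, "typo": 0.4, "freeze": 5.0, "tooltip": 1.2}
    selector_b = {"crash": 0.30, "typo": 0.01, "freeze": 0.35, "tooltip": 0.05}
    print(ensemble_select([selector_a, selector_b], k=2))  # prints ['crash', 'freeze']
```

Working with ranks rather than raw scores lets selectors whose scores live on different scales (for example a chi-square statistic and an information-gain value) be combined without any normalization step.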

An Expert System Based on Fisher Score and LS-SVM for Cardiac Arrhythmia Diagnosis

Computational and Mathematical Methods in Medicine, 2013, Vol 2013, pp. 1-6. DOI: 10.1155/2013/849674. Cited by ~19

Author(s): Ersen Yılmaz

Keyword(s): Expert System, Cardiac Arrhythmia, Feature Space, Support Vector, Feature Subset, Fisher Score, Data Set, Second Stage, Vector Machines, Two Stages

A two-stage expert system is proposed for cardiac arrhythmia diagnosis. In the first stage, the Fisher score is used for feature selection to reduce the dimension of the data set's feature space. The second stage is a classification stage, in which a least squares support vector machine (LS-SVM) classifier uses the feature subset selected in the first stage to diagnose cardiac arrhythmia. The performance of the proposed expert system is evaluated on an arrhythmia data set taken from the UCI Machine Learning Repository.
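The first stage can be illustrated with one common formulation of the Fisher score, which compares between-class scatter with within-class scatter for each feature. The abstract does not state which variant or cut-off the expert system used, so this is a sketch under that assumption; the LS-SVM stage is not reproduced, and the function names and toy data are hypothetical.

```python
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature:
    sum_k n_k * (mean_k - mean)^2  /  sum_k n_k * var_k,
    where k runs over classes and var_k is the population variance in class k.
    """
    overall_mean = fmean(values)
    between = within = 0.0
    for cls in set(labels):
        group = [v for v, label in zip(values, labels) if label == cls]
        between += len(group) * (fmean(group) - overall_mean) ** 2
        within += len(group) * pvariance(group)
    if within == 0.0:
        # No within-class spread: perfectly separating if class means differ, else constant.
        return float("inf") if between > 0.0 else 0.0
    return between / within


def select_top_k(features, labels, k):
    """Rank features by Fisher score (highest first) and keep the top k."""
    scores = {name: fisher_score(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy two-class data for illustration; the UCI arrhythmia data set is multi-class.
    labels = [0, 0, 0, 1, 1, 1]
    features = {
        "separating": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],
        "uninformative": [1.0, 3.0, 2.0, 1.0, 3.0, 2.0],
    }
    print(select_top_k(features, labels, k=1))  # prints ['separating']
```

Because the score is computed one feature at a time, it is cheap to evaluate on wide data sets, but it ignores redundancy between features; that is the usual trade-off of filter methods of this kind.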
