Feature selection for multiple binary classification problems

1999 ◽  
Vol 20 (8) ◽  
pp. 823-832 ◽  
Author(s):  
Yair Shapira ◽  
Isak Gath


Author(s):  
M. Vidyasagar

The objectives of this Perspective paper are to review some recent advances in sparse feature selection for regression and classification, as well as compressed sensing, and to discuss how these might be used to develop tools that advance personalized cancer therapy. As an illustration of the possibilities, a new algorithm for sparse regression is presented and applied to predict the time to tumour recurrence in ovarian cancer. A new algorithm for sparse feature selection in classification problems is also presented, and its validation in endometrial cancer is briefly discussed. Some open problems are outlined as well.
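
To make the idea of sparse regression concrete, here is a minimal sketch of L1-penalised (lasso) regression for feature selection on synthetic data. This is a generic illustration of the sparsity principle, not the specific algorithm proposed in the paper; the scikit-learn Lasso estimator, the penalty weight, and the synthetic data are all illustrative assumptions.

```python
# Illustrative sketch of sparse (L1-penalised) regression for feature
# selection -- not the specific algorithm proposed in the paper.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_features = 100, 500               # many more features than samples
X = rng.standard_normal((n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = [2.0, -1.5, 1.0, 0.8, -0.5]    # only 5 features actually matter
y = X @ true_coef + 0.1 * rng.standard_normal(n_samples)

# The L1 penalty drives most coefficients exactly to zero, so the
# surviving nonzero coefficients constitute the selected feature subset.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(f"selected {selected.size} of {n_features} features:", selected)
```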


2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Jiamei Liu ◽  
Cheng Xu ◽  
Weifeng Yang ◽  
Yayun Shu ◽  
Weiwei Zheng ◽  
...  

Binary classification is widely employed to support decisions on biomedical big-data questions, such as clinical drug trials comparing treated participants with controls, and genome-wide association studies (GWASs) comparing participants with and without a phenotype. A machine learning model is trained for this purpose by optimizing its power to discriminate samples from the two groups. However, most classification algorithms generate a single, locally optimal solution determined by the input dataset and the mathematical presumptions made about it. Here we demonstrate, from the perspectives of both disease classification and feature selection, that multiple different solutions may achieve similar classification performance; by catching only one good fish, the existing machine learning algorithms may have ignored a whole school. Since most existing machine learning algorithms generate a solution by optimizing a single mathematical objective, understanding the biological mechanisms behind the investigated classification question may require considering the ignored solutions alongside the generated one.
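
A minimal sketch of the phenomenon the authors describe: on a dataset with redundant informative features, several randomly chosen feature subsets can reach near-identical cross-validated accuracy, so a single "optimal" subset is only one of many. The dataset generator, classifier, and subset size below are illustrative assumptions, not the authors' experimental setup.

```python
# Illustrative sketch: several distinct feature subsets can reach
# near-identical classification accuracy, so reporting only the single
# "best" subset hides equally plausible alternatives.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Redundant features make many different subsets equally informative.
X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           n_redundant=15, random_state=0)

rng = np.random.default_rng(0)
for trial in range(5):
    subset = rng.choice(X.shape[1], size=10, replace=False)
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, subset], y, cv=5).mean()
    print(f"subset {trial}: features {sorted(subset)} -> accuracy {acc:.3f}")
```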


Symmetry ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 47 ◽  
Author(s):  
Amalia Luque ◽  
Alejandro Carrasco ◽  
Alejandro Martín ◽  
Juan Ramón Lama

Selecting the proper performance metric constitutes a key issue for most classification problems in the field of machine learning. Although the specialized literature has addressed several topics regarding these metrics, their symmetries have yet to be systematically studied. This research focuses on ten metrics based on a binary confusion matrix, and their symmetric behaviour is formally defined under all types of transformations. Through simulated experiments, which cover the full range of datasets and classification results, the symmetric behaviour of these metrics is explored by exposing them to hundreds of simple or combined symmetric transformations. Cross-symmetries among the metrics and statistical symmetries are also explored. The results show that, in all cases, three and only three types of symmetry arise: labelling inversion (between positive and negative classes); scoring inversion (between good and bad classifiers); and the combination of these two inversions. Additionally, certain metrics are shown to be independent of the imbalance in the dataset, and two cross-symmetries are identified. These results give a deeper insight into the behaviour of the various performance metrics, and offer guidance both for properly interpreting their values and for selecting metrics for specific applications.
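
The labelling-inversion symmetry reported above can be illustrated with a small worked example: swapping the positive and negative classes exchanges TP with TN and FP with FN, leaving accuracy unchanged while sensitivity and specificity trade places. The confusion-matrix counts below are arbitrary example values, and only three of the ten studied metrics are shown.

```python
# Illustrative sketch of labelling-inversion symmetry on binary
# confusion-matrix metrics: swapping the positive and negative labels
# exchanges TP with TN and FP with FN.  Accuracy is invariant, while
# sensitivity and specificity trade places.
def metrics(tp, fn, fp, tn):
    total = tp + fn + fp + tn
    return {
        "accuracy":    (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
    }

original = metrics(tp=80, fn=20, fp=10, tn=90)
inverted = metrics(tp=90, fn=10, fp=20, tn=80)   # labels swapped

print(original)   # accuracy 0.85, sensitivity 0.80, specificity 0.90
print(inverted)   # accuracy 0.85, sensitivity 0.90, specificity 0.80
```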


Author(s):  
Donald Douglas Atsa'am

A filter feature selection algorithm is developed and its performance tested. In the initial step, the algorithm dichotomizes the dataset and then computes, separately for each predictor, its association with the class variable using relative odds (odds ratios). The odds ratio of each predictor becomes its importance score for determining the output. Logistic regression classification is used to test the performance of the new algorithm against three existing feature selection algorithms: the Fisher index, Pearson's correlation, and the varImp function. Across a number of experimental datasets, the subsets selected by the new algorithm mostly produced models with higher classification accuracy than the subsets suggested by the existing algorithms. The proposed algorithm is therefore a reliable alternative for filter feature selection in binary classification problems.
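
A minimal sketch of the described filter, under stated assumptions: each predictor is dichotomised (here at its median, since the abstract does not specify the cut point), a 2x2 contingency table against the binary class is formed, and the resulting odds ratio serves as that feature's importance score. The Haldane correction of adding 0.5 to each cell, used to avoid division by zero, is also an assumption.

```python
# Illustrative sketch of odds-ratio-based filter feature selection.
import numpy as np

def odds_ratio_ranking(X, y):
    """Rank features by the odds ratio between each dichotomised
    predictor and the binary class label y (values 0/1)."""
    scores = []
    for j in range(X.shape[1]):
        high = X[:, j] > np.median(X[:, j])   # dichotomise the predictor
        # 2x2 contingency counts; +0.5 (Haldane correction) avoids zero cells
        a = np.sum(high & (y == 1)) + 0.5     # high predictor, class 1
        b = np.sum(high & (y == 0)) + 0.5     # high predictor, class 0
        c = np.sum(~high & (y == 1)) + 0.5    # low predictor, class 1
        d = np.sum(~high & (y == 0)) + 0.5    # low predictor, class 0
        scores.append((a * d) / (b * c))      # odds ratio
    # Rank by distance of the odds ratio from 1 (on the log scale):
    # values far from 1, in either direction, indicate strong association.
    return np.argsort(np.abs(np.log(scores)))[::-1]

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.standard_normal((200, 6))
X[:, 0] += 1.5 * y                            # only feature 0 carries signal
print("features ranked by association:", odds_ratio_ranking(X, y))
```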

