Binary Sine Cosine Algorithms for Feature Selection from Medical Data

Because medical data are high-dimensional and their features are strongly correlated, classification accuracy is often lower than expected. Feature selection is a common way to address this problem: it reduces the dimensionality of the data by keeping only the effective features. However, traditional feature selection algorithms set their thresholds blindly, and their search procedures are liable to fall into local optima. To address this, this paper proposes a hybrid feature selection algorithm combining ReliefF and Particle Swarm Optimization. The algorithm has three main parts. First, ReliefF is used to calculate feature weights, and the features are ranked by weight. Then the ranked features are grouped by density equalization, so that the density of features in each group is the same. Finally, Particle Swarm Optimization is used to search the ranked feature groups, and features are selected according to a new fitness function. Experimental results show that random forest achieves the highest classification accuracy on the selected features and, more importantly, does so with the fewest features. In addition, experimental results on 2 medical datasets show that the average accuracy of random forest reaches 90.20%, which indicates that the hybrid algorithm has practical application value.
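
The first step of the abstract above (weighting and ranking features) can be illustrated with a minimal sketch. It implements the basic Relief rule with one nearest hit and one nearest miss per sample, not the full ReliefF with k neighbours and multi-class handling, and it does not attempt the density-equalization grouping, the PSO search, or the paper's fitness function, none of which are specified here. The names `relief_weights` and `rank_features` are hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief weights (assumption: numeric features, at least two classes).

    X is a list of rows, y the matching class labels. This is a simplified
    stand-in for ReliefF: one nearest hit and one nearest miss per sample.
    """
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbour available
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward features that differ on the miss and agree on the hit.
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


def rank_features(weights):
    """Return feature indices ordered from highest to lowest weight."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
```

A feature that separates the classes receives a positive weight, while a feature that varies within a class as much as between classes drifts toward zero or below, so the ranking puts the discriminative features first.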

An Optimal Categorization of Feature Selection Methods for Knowledge Discovery

Data Mining ◽

10.4018/978-1-4666-2455-9.ch005 ◽

2013 ◽

pp. 92-106

Author(s):

Harleen Kaur ◽

Ritu Chauhan ◽

M. Alam

Keyword(s):

Data Mining ◽

Feature Selection ◽

Discriminant Analysis ◽

Medical Data ◽

Stepwise Discriminant Analysis ◽

Selection Methods ◽

Medical Databases ◽

Active Research ◽

Potential Improvement ◽

Large Effort

The continuous availability of massive experimental medical data has given impetus to a large effort to develop mathematical, statistical, and computational intelligence techniques for inferring models from medical databases. Feature selection has been an active research area in the pattern recognition, statistics, and data mining communities. However, there have been relatively few studies on preprocessing the data used as input for data mining systems in the medical domain. In this chapter, the authors focus on several feature selection methods and their effectiveness in preprocessing input medical data. They evaluate feature selection algorithms such as Mutual Information Feature Selection (MIFS), Fast Correlation-Based Filter (FCBF), and Stepwise Discriminant Analysis (STEPDISC) in combination with the naive Bayes and linear discriminant analysis classifiers. The experimental analysis of feature selection techniques on medical databases has enabled the authors to find a small number of informative features, leading to potential improvement in medical diagnosis by reducing the size of the data set, eliminating irrelevant features, and decreasing the processing time.
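
Filters based on mutual information, such as MIFS, start from a relevance score between each feature and the class. The sketch below shows only that score for discrete features; it does not implement the MIFS redundancy penalty, FCBF, or STEPDISC, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x, y) * log2(p(x, y) / (p(x) * p(y)))
    return sum(
        c / n * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_by_mutual_information(columns, labels):
    """Rank feature columns by their mutual information with the labels.

    `columns` holds one list of discrete values per feature (an assumption
    of this sketch; continuous features would need discretizing first).
    """
    scores = [mutual_information(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order, scores
```

A feature that determines a balanced binary class exactly scores one bit, and a feature that is statistically independent of the class scores zero.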

Undiagnosed samples aided rough set feature selection for medical data

2012 2nd IEEE International Conference on Parallel, Distributed and Grid Computing ◽

10.1109/pdgc.2012.6449895 ◽

2012 ◽

Author(s):

Donghai Guan ◽

Weiwei Yuan ◽

Zilong Jin ◽

Sungyoung Lee

Keyword(s):

Feature Selection ◽

Rough Set ◽

Medical Data ◽

Selection For

A Genetic Programming Approach Applied to Feature Selection from Medical Data

Practical Applications of Computational Biology and Bioinformatics, 12th International Conference - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-319-98702-6_24 ◽

2018 ◽

pp. 200-207 ◽

Cited By ~ 1

Author(s):

José A. Castellanos-Garzón ◽

Juan Ramos ◽

Yeray Mezquita Martín ◽

Juan F. de Paz ◽

Ernesto Costa

Keyword(s):

Feature Selection ◽

Genetic Programming ◽

Medical Data ◽

Programming Approach

Improved Measures of Redundancy and Relevance for mRMR Feature Selection

Computers ◽

10.3390/computers8020042 ◽

2019 ◽

Vol 8 (2) ◽

pp. 42 ◽

Cited By ~ 1

Author(s):

Insik Jo ◽

Sangbum Lee ◽

Sejong Oh

Keyword(s):

Feature Selection ◽

Classification Accuracy ◽

Computing Time ◽

Performance Comparison ◽

Medical Data ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Learning Tasks ◽

R Value ◽

Measure Of Performance

Many biological and medical datasets have numerous features. Feature selection is a data preprocessing step that can remove noise from the data and save computing time when a dataset has several hundred thousand or more features. Another goal of feature selection is to improve classification accuracy in machine learning tasks. Minimum Redundancy Maximum Relevance (mRMR) is a well-known feature selection algorithm that selects features by calculating the redundancy between features and the relevance between each feature and the class vector. mRMR uses mutual information to measure both redundancy and relevance. In this research, we propose a method to improve the performance of mRMR feature selection. We apply Pearson’s correlation coefficient as the measure of redundancy and R-value as the measure of relevance. To compare the original mRMR with the proposed method, features were selected from various datasets using both methods, and a classification test was then performed. Classification accuracy was used as the measure for the performance comparison. In many cases, the proposed method showed higher accuracy than the original mRMR.
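
The greedy selection loop of mRMR with Pearson correlation as the redundancy measure can be sketched as follows. The R-value relevance measure is not defined in this abstract, so the sketch substitutes the absolute Pearson correlation between a feature and numeric class labels as a stand-in; that substitution, the difference-form score (relevance minus mean redundancy), and the name `mrmr_select` are assumptions, not the authors' method.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def mrmr_select(columns, labels, k):
    """Greedily pick k feature indices: high relevance, low redundancy.

    Relevance here is |pearson(feature, labels)|, a stand-in for R-value.
    Redundancy is the mean |pearson| with the already selected features.
    """
    indices = range(len(columns))
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected = [max(indices, key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        best, best_score = None, -math.inf
        for j in indices:
            if j in selected:
                continue
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

With this scoring, a feature that merely duplicates an already selected one is penalized by a redundancy close to 1, so a weaker but less correlated feature can be chosen ahead of it.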

Comparison of feature selection algorithms for medical data

2012 International Symposium on Innovations in Intelligent Systems and Applications ◽

10.1109/inista.2012.6247011 ◽

2012 ◽

Cited By ~ 7

Author(s):

H. Dag ◽

K. E. Sayin ◽

I. Yenidogan ◽

S. Albayrak ◽

C. Acar

Keyword(s):

Feature Selection ◽

Medical Data ◽

Selection Algorithms

Survival analysis for high-dimensional, heterogeneous medical data: Exploring feature extraction as an alternative to feature selection

Artificial Intelligence in Medicine ◽

10.1016/j.artmed.2016.07.004 ◽

2016 ◽

Vol 72 ◽

pp. 1-11 ◽

Cited By ~ 15

Author(s):

Sebastian Pölsterl ◽

Sailesh Conjeti ◽

Nassir Navab ◽

Amin Katouzian

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Survival Analysis ◽

Medical Data ◽

High Dimensional

Cost-sensitive feature selection in medical data analysis with trace ratio criterion

2014 12th International Conference on Signal Processing (ICSP) ◽

10.1109/icosp.2014.7015169 ◽

2014 ◽

Cited By ~ 1

Author(s):

Chao Li ◽

Cen Shi ◽

Huan Zhang ◽

Chun Hui ◽

Kin-Man Lam ◽

...

Keyword(s):

Feature Selection ◽

Data Analysis ◽

Medical Data ◽

Ratio Criterion ◽

Trace Ratio

A Hybrid Approach for Cases Classification of Medical Data

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.496-500.1965 ◽

2014 ◽

Vol 496-500 ◽

pp. 1965-1970

Author(s):

Xiao Yu Chen ◽

Bo Liu ◽

Xin Xia

Keyword(s):

Feature Selection ◽

Hybrid Approach ◽

Disease Diagnosis ◽

Medical Data ◽

Emission Computed Tomography ◽

Proton Emission ◽

Clinical Theory ◽

Hybrid Classification ◽

C5.0 Decision Tree

Classification of cases has been widely applied in medicine, and it greatly aids disease diagnosis. At present, medical cases are classified subjectively by physicians on the basis of clinical theory and knowledge, which may hinder diagnosis and treatment to some extent. In this paper, a hybrid classification approach (HCA) is proposed for medical data. It consists of two parts: feature selection and classification. In feature selection, critical features are selected from the original features through linear correlation. Based on the selected features, cases are classified by a C5.0 decision tree. The proposed approach is evaluated on four medical datasets: diabetes, cardiac Single Photon Emission Computed Tomography (SPECT) images, lung cancer, and hepatitis survival. On the four datasets, HCA achieves clearly higher classification accuracy, and it also outperforms some typical integrated classification methods.
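
The two-stage shape of such an approach (filter by linear correlation, then classify with a tree) can be sketched in a few lines. This is only an illustration under stated assumptions: the correlation threshold is a hypothetical parameter, and a single entropy-based split (a decision stump) stands in for the C5.0 tree, which this sketch does not reproduce.

```python
import math
from collections import Counter


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def select_by_correlation(columns, labels, threshold):
    """Stage 1: keep features whose |correlation| with the labels is high."""
    return [j for j, col in enumerate(columns)
            if abs(pearson(col, labels)) >= threshold]


def best_stump(columns, labels, candidates):
    """Stage 2 stand-in: best (feature, cut point, information gain)."""
    base, n = entropy(labels), len(labels)
    best = (None, None, 0.0)
    for j in candidates:
        values = sorted(set(columns[j]))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [y for x, y in zip(columns[j], labels) if x <= cut]
            right = [y for x, y in zip(columns[j], labels) if x > cut]
            gain = (base - len(left) / n * entropy(left)
                    - len(right) / n * entropy(right))
            if gain > best[2]:
                best = (j, cut, gain)
    return best
```

Restricting the split search to the filtered features is what couples the two stages: the tree never sees features that the correlation filter discarded.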
