Feature Selection via Coalitional Game Theory

2007 ◽  
Vol 19 (7) ◽  
pp. 1939-1961 ◽  
Author(s):  
Shay Cohen ◽  
Gideon Dror ◽  
Eytan Ruppin

We present and study the contribution-selection algorithm (CSA), a novel algorithm for feature selection. The algorithm is based on the Multiperturbation Shapley Analysis (MSA), a framework that relies on game theory to estimate the usefulness of features. The algorithm iteratively estimates the usefulness of features and selects them accordingly, using either forward selection or backward elimination. It can optimize various performance measures over unseen data, such as accuracy, balanced error rate, and area under the receiver operating characteristic (ROC) curve. Empirical comparison with several other existing feature selection methods shows that the backward elimination variant of CSA leads to the most accurate classification results on an array of data sets.
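The Shapley-value machinery behind CSA can be illustrated with a small Monte Carlo sketch (not the authors' implementation): each feature's contribution is its average marginal gain over random feature orderings, for a user-supplied value function. The toy value function below is a hypothetical stand-in for a real performance measure such as cross-validated accuracy.

```python
import numpy as np

def shapley_contributions(n_features, value_fn, n_perms=200, rng=None):
    """Monte Carlo estimate of each feature's Shapley value.

    value_fn(subset) -> performance of a model restricted to `subset`
    (a frozenset of feature indices).
    """
    rng = np.random.default_rng(rng)
    phi = np.zeros(n_features)
    for _ in range(n_perms):
        perm = rng.permutation(n_features)
        prev = frozenset()
        prev_value = value_fn(prev)
        for f in perm:
            cur = prev | {f}
            cur_value = value_fn(cur)
            phi[f] += cur_value - prev_value  # marginal contribution of f
            prev, prev_value = cur, cur_value
    return phi / n_perms

# Toy value function: features 0 and 1 are informative, the rest are noise.
informative = {0, 1}
value = lambda s: len(s & informative) / 2.0

phi = shapley_contributions(5, value, n_perms=50, rng=0)
# The informative features receive the highest estimated contributions.
```

Given these contribution estimates, CSA's forward-selection variant would keep the top-ranked features and iterate; the backward variant would drop the lowest-ranked ones.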

PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255307
Author(s):  
Fujun Wang ◽  
Xing Wang

Feature selection is an important task in big data analysis and information retrieval. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed, called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed, which measures the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm whose position update formula has been improved to achieve better results. Third, the filter and wrapper models are dynamically adjusted by damping oscillation theory to find an optimal feature subset. MKMDIGWO thus achieves both the efficiency of the filter model and the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets demonstrate that MKMDIGWO attains higher classification accuracy than four other state-of-the-art algorithms; its maximum accuracy (ACC) is at least 0.5% higher than that of the other algorithms on 10 data sets.
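The filter stage can be sketched in a few lines. The helpers below are illustrative stand-ins (Kendall tau-a without tie correction, and mean pairwise distance), not the paper's exact formulas:

```python
import numpy as np

def kendall_tau(a, b):
    """Kendall tau-a (no tie correction): normalized count of concordant
    minus discordant pairs; a simple stand-in for the Kendall coefficient."""
    n = len(a)
    s = sum(np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
            for i in range(n) for j in range(i + 1, n))
    return s / (n * (n - 1) / 2)

def kendall_relevance(X, y):
    """Relevance of each feature: |tau| between the feature and the label."""
    return np.array([abs(kendall_tau(X[:, j], y)) for j in range(X.shape[1])])

def euclidean_redundancy(X, subset):
    """Mean pairwise Euclidean distance within a candidate subset
    (a larger distance indicates less redundancy)."""
    d = [np.linalg.norm(X[:, a] - X[:, b])
         for i, a in enumerate(subset) for b in subset[i + 1:]]
    return float(np.mean(d)) if d else 0.0
```

A filter score combining high relevance with high (distance-based) diversity would then rank candidate subsets before handing them to the grey-wolf wrapper.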


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Da Xu ◽  
Jialin Zhang ◽  
Hanxiao Xu ◽  
Yusen Zhang ◽  
Wei Chen ◽  
...  

Background: The small number of samples and the curse of dimensionality hamper the application of deep learning techniques to disease classification. Additionally, the performance of clustering-based feature selection algorithms is still far from satisfactory owing to their reliance on unsupervised learning. Meanwhile, complex genomic data bring great challenges for the identification of biomarkers and therapeutic targets, and current feature selection methods in this field suffer from low sensitivity and specificity. To enhance interpretability and overcome these problems, we developed a novel feature selection algorithm.
Results: In this article, we designed a multi-scale clustering-based feature selection algorithm named MCBFS, which simultaneously performs feature selection and model learning for genomic data analysis. The experimental results demonstrate that MCBFS is robust and effective in comparison with seven benchmark and six state-of-the-art supervised methods on eight data sets. Visualization results and statistical tests show that MCBFS can capture informative genes and improve the interpretability and visualization of tumor gene expression and single-cell sequencing data. Additionally, we developed a general framework named McbfsNW that uses gene expression data and protein interaction data to identify robust biomarkers and therapeutic targets for the diagnosis and therapy of diseases. The framework incorporates the MCBFS algorithm, a network recognition ensemble algorithm, and a feature selection wrapper. McbfsNW has been applied to lung adenocarcinoma (LUAD) data sets. Preliminary results show that the identified biomarkers attain high prediction performance on the independent LUAD data set, and we also constructed a drug-target network that may be beneficial for LUAD therapy.
Conclusions: The proposed feature selection method is robust and effective for gene selection, classification, and visualization. The McbfsNW framework is practical and helpful for the identification of biomarkers and targets in genomic data. The same methods and principles should be extensible and applicable to other kinds of data sets.
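The general idea of clustering-based feature selection (not MCBFS itself, whose multi-scale procedure is more involved) can be sketched as: group correlated features and keep one representative per group.

```python
import numpy as np

def cluster_select(X, corr_thresh=0.9):
    """Greedy sketch of clustering-based feature selection: group features
    whose absolute Pearson correlation exceeds corr_thresh, then keep the
    highest-variance representative of each group."""
    n = X.shape[1]
    C = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = list(range(n))
    selected = []
    while unassigned:
        f = unassigned.pop(0)
        group = [f] + [g for g in unassigned if C[f, g] > corr_thresh]
        unassigned = [g for g in unassigned if g not in group]
        selected.append(max(group, key=lambda g: X[:, g].var()))
    return sorted(selected)
```

For gene expression matrices this collapses co-expressed gene clusters to single representatives, which is one way such methods reduce dimensionality while keeping interpretable features.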


2019 ◽  
Vol 47 (3) ◽  
pp. 154-170
Author(s):  
Janani Balakumar ◽  
S. Vijayarani Mohan

Purpose: Owing to the huge volume of documents available on the internet, text classification becomes a necessary task for handling them. To achieve optimal classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify personal computer documents based on their content. Design/methodology/approach: This paper proposes a new feature selection algorithm based on the artificial bee colony (ABCFS) to enhance text classification accuracy. ABCFS is evaluated on real and benchmark data sets and compared against existing feature selection approaches such as information gain and the χ2 statistic. To assess the efficiency of the proposed algorithm, support vector machine (SVM) and improved SVM classifiers are used. Findings: The experiments were conducted on real and benchmark data sets; the real data set consists of documents stored on a personal computer, and the benchmark data sets were drawn from the Reuters and 20 Newsgroups corpora. The results demonstrate that the proposed feature selection algorithm improves text document classification accuracy. Originality/value: This paper proposes the new ABCFS algorithm, evaluates its efficiency, and improves the support vector machine. Here, ABCFS selects features from unstructured text documents; in existing work, ABC-based algorithms select features only from structured data, and no ABC-based text feature selection algorithm had previously been proposed. The proposed algorithm classifies documents automatically based on their content.
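For reference, the information-gain filter that the paper compares against can be computed directly; below is a minimal sketch for discrete term-presence features.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(x, y):
    """Information gain of a discrete feature x (e.g. term presence 0/1)
    with respect to class labels y: H(y) - H(y | x)."""
    base = entropy(y)
    cond = 0.0
    for v in np.unique(x):
        mask = x == v
        cond += mask.mean() * entropy(y[mask])
    return base - cond
```

Ranking terms by this score (or by χ2) gives the filter baselines; ABCFS instead searches the space of feature subsets with a swarm of artificial bees.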


2020 ◽  
Vol 8 (5) ◽  
pp. 1591-1596

Among the districts/cities of Central Java province over 2015-2017, only one district/city had no nutritional problems (good category) in 2015; the rest had acute, chronic, or acute-chronic nutrition problems. Searching for the attributes that most influence toddler nutrition problems using data mining is expected to help health workers focus on solving problems based on the classification in each area, so that improvement of the community's nutritional status can be accelerated. The best parameters for the feature selection models and data mining algorithms were found with the Optimize Parameters (Grid) operator in RapidMiner. The feature selection models used are backward elimination, forward selection, and optimize selection; the data mining algorithms are Naive Bayes, decision tree, k-NN, and neural network. Combining the feature selection models with the data mining algorithms yields the 12 models used in this study. The best model, evaluated on test data, achieved the highest accuracy of 74.19% and was obtained from backward elimination with a neural network. According to this model, the attribute with little influence is the mother's death status.
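The exhaustive pairing of selectors and classifiers mirrors RapidMiner's Optimize Parameters (Grid) operator. A minimal sketch follows; the accuracy values are hypothetical placeholders, except 74.19%, which is the study's reported best.

```python
from itertools import product

def grid_search(selectors, models, evaluate):
    """Exhaustive grid over selector/model pairs; returns the best pair
    and its score, as the Optimize Parameters (Grid) operator would."""
    best = max(product(selectors, models), key=lambda pair: evaluate(*pair))
    return best, evaluate(*best)

# Toy accuracies standing in for cross-validated results.
acc = {("backward", "neural_net"): 0.7419, ("backward", "knn"): 0.68,
       ("forward", "neural_net"): 0.70, ("forward", "knn"): 0.65}
best_pair, best_acc = grid_search(["backward", "forward"],
                                  ["neural_net", "knn"],
                                  lambda s, m: acc[(s, m)])
```

With the full 3 selectors x 4 classifiers of the study, the same loop would score all 12 combinations.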


2021 ◽  
Vol 9 ◽  
Author(s):  
Liaoyi Lin ◽  
Jinjin Liu ◽  
Qingshan Deng ◽  
Na Li ◽  
Jingye Pan ◽  
...  

Objectives: To develop and validate a radiomics model for distinguishing coronavirus disease 2019 (COVID-19) pneumonia from influenza virus pneumonia. Materials and Methods: A radiomics model was developed on the basis of 56 patients with COVID-19 pneumonia and 90 patients with influenza virus pneumonia in this retrospective study. Radiomics features were extracted from CT images and reduced by the Max-Relevance and Min-Redundancy (mRMR) algorithm and the least absolute shrinkage and selection operator (LASSO) method. The radiomics model was built using multivariate backward stepwise logistic regression. A nomogram of the radiomics model was established, and the decision curve showed the clinical usefulness of the radiomics nomogram. Results: The radiomics signature, consisting of nine selected features, was significantly different between COVID-19 pneumonia and influenza virus pneumonia in both the training and validation data sets. The receiver operating characteristic curve of the radiomics model showed good discrimination in the training sample [area under the curve (AUC), 0.909; 95% confidence interval (CI), 0.859-0.958] and in the validation sample (AUC, 0.911; 95% CI, 0.753-1.000). The nomogram was established and had good calibration. Decision curve analysis showed that the radiomics nomogram was clinically useful. Conclusions: The radiomics model performs well for distinguishing COVID-19 pneumonia from influenza virus pneumonia and may aid in the diagnosis of COVID-19 pneumonia.
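The mRMR reduction step can be sketched greedily. The version below substitutes absolute Pearson correlation for mutual information, so it illustrates the idea rather than the exact algorithm used on the radiomics features:

```python
import numpy as np

def mrmr(X, y, k):
    """Greedy Max-Relevance Min-Redundancy selection (sketch): at each
    step add the feature maximizing relevance-to-target minus mean
    redundancy with already-selected features."""
    n = X.shape[1]
    rel = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n)])
    selected = [int(np.argmax(rel))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected])
            score = rel[j] - red
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

In the study's pipeline, the surviving features would then pass through LASSO and backward stepwise logistic regression to yield the nine-feature signature.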


Author(s):  
RONG LIU ◽  
ROBERT RALLO ◽  
YORAM COHEN

An unsupervised feature selection method is proposed for the analysis of high-dimensional datasets. The least square error (LSE) of approximating the complete dataset from a reduced feature subset is proposed as the quality measure for feature selection. Guided by the minimization of the LSE, a kernel least squares forward selection algorithm (KLS-FS) is developed that is capable of both linear and non-linear feature selection. An incremental LSE computation is designed to accelerate the selection process and therefore enhance the scalability of KLS-FS to high-dimensional datasets. The superiority of the proposed algorithm, in terms of preserving principal data structures, learning performance in classification and clustering applications, and robustness, is demonstrated on various real-life datasets of different sizes and dimensions.
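The linear special case of the LSE criterion can be sketched directly (KLS-FS kernelizes this and computes the LSE incrementally; the brute-force version below is for illustration only):

```python
import numpy as np

def lse(X, subset):
    """Least-square error of reconstructing all columns of X from the
    columns in `subset` via a linear map."""
    S = X[:, subset]
    coef, *_ = np.linalg.lstsq(S, X, rcond=None)
    return float(np.sum((X - S @ coef) ** 2))

def forward_select(X, k):
    """Greedy forward selection: repeatedly add the feature whose
    inclusion most reduces the reconstruction LSE."""
    selected = []
    for _ in range(k):
        rest = [j for j in range(X.shape[1]) if j not in selected]
        best = min(rest, key=lambda j: lse(X, selected + [j]))
        selected.append(best)
    return selected
```

Recomputing the full least-squares fit at every step is what the paper's incremental LSE update avoids; the greedy structure is the same.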


2013 ◽  
Vol 319 ◽  
pp. 337-342
Author(s):  
Li Tu ◽  
Li Zhi Yang

In this paper, a feature selection algorithm based on ant colony optimization (ACO) is presented to construct classification rules for image classification. Most existing ACO-based algorithms use a graph with O(n^2) edges. In contrast, the artificial ants in the proposed algorithm, FSC-ACO, traverse a feature graph with only O(n) edges. During feature selection, the ants construct the classification rules for each class according to improved pheromone and heuristic functions. FSC-ACO improves the quality of the rules based on classification accuracy and rule length. Experimental results on both standard and real image data sets show that the proposed algorithm can outperform related methods, with fewer features, in terms of speed, recall, and classification accuracy.
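A minimal ACO-style selection loop (a sketch of the general pheromone mechanism, not FSC-ACO's rule construction) looks like this, with a user-supplied subset quality function standing in for rule accuracy and length:

```python
import numpy as np

def aco_feature_select(n_features, quality, n_ants=10, n_iter=30,
                       rho=0.1, rng=0):
    """Each ant includes feature j with probability proportional to its
    pheromone; sampled subsets deposit pheromone scaled by their quality,
    and pheromone evaporates at rate rho."""
    rng = np.random.default_rng(rng)
    phero = np.ones(n_features)
    best, best_q = None, -np.inf
    for _ in range(n_iter):
        p = phero / phero.sum()
        deposits = np.zeros(n_features)
        for _ in range(n_ants):
            mask = rng.random(n_features) < np.clip(p * n_features / 2, 0, 1)
            subset = frozenset(np.flatnonzero(mask))
            q = quality(subset)
            if q > best_q:
                best, best_q = subset, q
            for j in subset:
                deposits[j] += q
        phero = (1 - rho) * phero + deposits  # evaporation + reinforcement
    return best, phero
```

Over the iterations, pheromone concentrates on features that keep appearing in high-quality subsets, steering later ants toward them.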


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Sabina Tangaro ◽  
Nicola Amoroso ◽  
Massimo Brescia ◽  
Stefano Cavuoti ◽  
Andrea Chincarini ◽  
...  

Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic resonance imaging (MRI) scans can show these variations and can therefore be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus is a known biomarker for Alzheimer disease and other neurological and psychiatric diseases; exploiting it, however, requires accurate, robust, and reproducible delineation of hippocampal structures. Fully automatic methods usually take a voxel-based approach, in which a number of local features are calculated for each voxel. In this paper, we compared four techniques for selecting from a set of 315 features extracted for each voxel: (i) a filter method based on the Kolmogorov-Smirnov test; two wrapper methods, namely (ii) sequential forward selection and (iii) sequential backward elimination; and (iv) an embedded method based on the random forest classifier. The methods were trained on a set of 10 T1-weighted brain MRIs and tested on an independent set of 25 subjects, and the resulting segmentations were compared with manual reference labelling. Using only 23 features per voxel (sequential backward elimination), we obtained performance comparable to the state of the art as represented by the standard tool FreeSurfer.
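The Kolmogorov-Smirnov filter (technique (i)) can be sketched with the two-sample statistic computed from empirical CDFs; this is an illustrative stand-in, not the authors' pipeline:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: maximum distance between
    the empirical CDFs of samples a and b."""
    all_v = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), all_v, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), all_v, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

def ks_filter(X, y, k):
    """Rank features by how differently they are distributed between the
    two classes (e.g. hippocampus vs. background voxels); keep the top k."""
    scores = [ks_statistic(X[y == 0, j], X[y == 1, j])
              for j in range(X.shape[1])]
    return list(np.argsort(scores)[::-1][:k])
```

Being a filter, this scores each feature independently of any classifier, which is what makes it cheap compared with the wrapper and embedded alternatives the paper evaluates.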

