An Optimal Categorization of Feature Selection Methods for Knowledge Discovery

2011 ◽

pp. 94-108 ◽

Cited By ~ 4

Author(s):

Harleen Kaur ◽

Ritu Chauhan ◽

M. Alam

Keyword(s):

Data Mining ◽

Feature Selection ◽

Discriminant Analysis ◽

Medical Data ◽

Stepwise Discriminant Analysis ◽

Selection Methods ◽

Medical Databases ◽

Active Research ◽

Potential Improvement ◽

Large Effort

The continuous availability of massive experimental medical data has given impetus to a large effort in developing mathematical, statistical, and computational intelligence techniques to infer models from medical databases. Feature selection has been an active research area in the pattern recognition, statistics, and data mining communities. However, there have been relatively few studies on preprocessing the data used as input for data mining systems in the medical domain. In this chapter, the authors examine several feature selection methods and their effectiveness in preprocessing input medical data. They evaluate feature selection algorithms such as Mutual Information Feature Selection (MIFS), Fast Correlation-Based Filter (FCBF), and Stepwise Discriminant Analysis (STEPDISC) together with the naive Bayesian machine learning algorithm and linear discriminant analysis techniques. The experimental analysis of feature selection techniques on medical databases has enabled the authors to find a small number of informative features, leading to potential improvement in medical diagnosis by reducing the size of the data set, eliminating irrelevant features, and decreasing the processing time.
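
To make the MIFS idea mentioned in this abstract concrete, here is a minimal sketch of greedy mutual-information feature selection on discrete data. It is not the chapter's implementation: the function names, the toy columns, and the `beta` values are assumptions chosen for illustration. Each step picks the feature with the highest relevance to the class minus `beta` times its redundancy with the features already selected.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def mifs(columns, labels, k, beta=0.5):
    """Greedy MIFS-style selection of k feature names (hypothetical helper).

    Score of a candidate f: I(f; labels) - beta * sum of I(f; s) over selected s.
    """
    selected = []
    remaining = list(columns)
    while remaining and len(selected) < k:
        def score(name):
            redundancy = sum(
                mutual_information(columns[name], columns[s]) for s in selected
            )
            return mutual_information(columns[name], labels) - beta * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up toy data: "marker_copy" duplicates "marker"; "unrelated" carries
# no information about the labels.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "marker": [0, 0, 0, 1, 1, 1, 1, 1],
    "marker_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "unrelated": [0, 0, 1, 1, 0, 0, 1, 1],
}

without_penalty = mifs(columns, labels, k=2, beta=0.0)
with_penalty = mifs(columns, labels, k=2, beta=1.0)
```

With `beta=0.0` the duplicate column is selected second because only relevance counts. With `beta=1.0` the redundancy penalty pushes the duplicate below the other candidate, which is the behaviour that distinguishes MIFS from a plain relevance ranking.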

BETTER ALTERNATIVES FOR STEPWISE DISCRIMINANT ANALYSIS

Acta Universitatis Lodziensis Folia oeconomica ◽

10.18778/0208-6018.311.02 ◽

2015 ◽

Vol 1 (311) ◽

Author(s):

Katarzyna Stąpor

Keyword(s):

Feature Selection ◽

Discriminant Analysis ◽

Tabu Search ◽

Stepwise Discriminant Analysis ◽

Selection Methods ◽

Discrimination Power ◽

Statistical Software ◽

Software Packages ◽

Benchmark Datasets

Discriminant analysis can best be defined as a technique that allows an individual to be classified into one of several distinct populations on the basis of a set of measurements. Stepwise discriminant analysis (SDA) is concerned with selecting the most important variables whilst retaining the highest discrimination power possible. Selecting a smaller number of variables is often necessary for a variety of reasons. In the existing statistical software packages, SDA is based on the classic feature selection methods. Many problems with such stepwise procedures have been identified. In this work, a new method based on the tabu search metaheuristic is presented, together with the results of experiments conducted on selected benchmark datasets. The results are promising.
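
As an illustration of how tabu search can drive variable selection, the following is a generic sketch, not the method proposed in the paper. The move set (flip one variable in or out), the tenure, and the `toy_score` function are all assumptions; in practice the score would be a discrimination criterion computed from data.

```python
def tabu_search(n_features, score, iterations=20, tenure=3):
    """Generic tabu search over feature subsets using single-flip moves."""
    current = frozenset()
    best, best_score = current, score(current)
    tabu_until = {}  # feature index -> last iteration at which flipping it is tabu

    for it in range(iterations):
        candidates = []
        for f in range(n_features):
            neighbor = current ^ {f}  # add f if absent, remove it if present
            s = score(neighbor)
            is_tabu = tabu_until.get(f, -1) >= it
            # Aspiration: a tabu move is still allowed if it beats the best so far.
            if is_tabu and s <= best_score:
                continue
            candidates.append((s, f, neighbor))
        if not candidates:
            break
        # Take the best admissible move even if it worsens the current score;
        # ties are broken in favour of the lower feature index.
        s, f, current = max(candidates, key=lambda c: (c[0], -c[1]))
        tabu_until[f] = it + tenure
        if s > best_score:
            best, best_score = current, s
    return best, best_score


def toy_score(subset):
    """Hypothetical stand-in for a discrimination criterion.

    Features 0, 2, 5 are individually useful; features 3 and 4 help only
    as a pair; every other selected feature costs one point.
    """
    useful = {0, 2, 5}
    pair = {3, 4}
    bonus = 4 if pair <= subset else 0
    return 2 * len(subset & useful) - len(subset - useful) + bonus


best_subset, best_value = tabu_search(6, toy_score)
```

On this toy objective, a search that accepts only improving single flips would stall at the subset {0, 2, 5}, because adding feature 3 or 4 alone lowers the score. The tabu list lets the search pass through those worse subsets and reach the subset that also contains the pair.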

Dimension Reduction for Objects Composed of Vector Sets

International Journal of Applied Mathematics and Computer Science ◽

10.1515/amcs-2017-0012 ◽

2017 ◽

Vol 27 (1) ◽

pp. 169-180 ◽

Cited By ~ 1

Author(s):

Marton Szemenyei ◽

Ferenc Vajda

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Discriminant Analysis ◽

Probability Distribution ◽

Dimension Reduction ◽

Pose Estimation ◽

Real World ◽

Single Object ◽

Real World Datasets

Dimension reduction and feature selection are fundamental tools for machine learning and data mining. Most existing methods, however, assume that objects are represented by a single vectorial descriptor. In reality, some description methods assign unordered sets or graphs of vectors to a single object, where each vector is assumed to have the same number of dimensions, but is drawn from a different probability distribution. Moreover, some applications (such as pose estimation) may require the recognition of individual vectors (nodes) of an object. In such cases it is essential that the nodes within a single object remain distinguishable after dimension reduction. In this paper we propose new discriminant analysis methods that are able to satisfy two criteria at the same time: separating between classes and between the nodes of an object instance. We analyze and evaluate our methods on several different synthetic and real-world datasets.

Incorporating Pathway Information into Feature Selection towards Better Performed Gene Signatures

BioMed Research International ◽

10.1155/2019/2497509 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Suyan Tian ◽

Chi Wang ◽

Bing Wang

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Gene Selection ◽

Selection Process ◽

Biological Knowledge ◽

Expression Data ◽

Selection Methods ◽

Its Gene ◽

Active Research

To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart, in which no biological knowledge is considered. In this article, pathway-based feature selection methods are first divided into three major categories, namely, pathway-level selection, bilevel selection, and pathway-guided gene selection. With bilevel selection methods regarded as a special case of the pathway-guided gene selection process, we discuss pathway-guided gene selection methods in detail, along with the importance of penalization in such methods. Last, we point out the potential uses of pathway-guided gene selection in one active research avenue, namely, the analysis of longitudinal gene expression data. We believe this article provides valuable insights for computational biologists and biostatisticians so that they can make biology more computable.

Benchmarking relief-based feature selection methods for bioinformatics data mining

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2018.07.015 ◽

2018 ◽

Vol 85 ◽

pp. 168-188 ◽

Cited By ~ 45

Author(s):

Ryan J. Urbanowicz ◽

Randal S. Olson ◽

Peter Schmitt ◽

Melissa Meeker ◽

Jason H. Moore

Keyword(s):

Data Mining ◽

Feature Selection ◽

Selection Methods

Feature Selection Methods on Biological Knowledge Discovery and Data Mining: A Survey

2014 25th International Workshop on Database and Expert Systems Applications ◽

10.1109/dexa.2014.26 ◽

2014 ◽

Cited By ~ 4

Author(s):

Hanen Mhamdi ◽

Faouzi Mhamdi

Keyword(s):

Data Mining ◽

Feature Selection ◽

Knowledge Discovery ◽

Biological Knowledge ◽

Selection Methods

Application of Feature Selection Methods in Educational Data Mining

International Journal of Computer Applications ◽

10.5120/18048-8951 ◽

2014 ◽

Vol 103 (2) ◽

pp. 34-38 ◽

Cited By ~ 5

Author(s):

Anal Acharya ◽

Devadatta Sinha

Keyword(s):

Data Mining ◽

Feature Selection ◽

Educational Data Mining ◽

Selection Methods

Diabetes Classification Using a Blending Ensemble Learning Model (Klasifikasi Diabetes Menggunakan Model Pembelajaran Ensemble Blending)

Jurnal ULTIMATICS ◽

10.31937/ti.v10i1.709 ◽

2018 ◽

Vol 10 (1) ◽

pp. 11-15

Author(s):

Vinnia Kemala Putri ◽

Felix Indra Kurniadi

Keyword(s):

Data Mining ◽

Ensemble Classifier ◽

Medical Data ◽

Support Vector ◽

Prediction Problem ◽

Medical Databases ◽

Data Mining Approach ◽

Blending Method ◽

Diabetes Prediction ◽

Index Terms

Diabetes mellitus is one of the deadliest diseases, and its occurrence is increasing throughout the world. This can be prevented by early diagnosis and treatment. However, in developing countries, less than half of people with diabetes are diagnosed correctly, which leads to loss of human lives. In this Big Data era, medical databases hold enormous quantities of data about their patients. But this medical data may contain noise and a lot of useless information, which may mislead the expert making a decision for medical diagnosis. Data mining is a technique that is very effective in medical applications for identifying patterns and extracting useful information from databases. This paper proposes a data mining approach using an ensemble blending method to tackle a diabetes prediction problem on the Pima Indian Diabetes Dataset. We propose a blending ensemble classifier that combines Decision Tree and Logistic Regression as base classifiers with a Support Vector Machine as the top blender classifier. Our approach reached an accuracy of 81% and an F1-score of 0.81, which is higher than that of the basic classifiers used without combination. Index Terms: diabetes, ensemble, data mining

Improving the Intrusion Detection using Discriminative Machine Learning Approach and Improve the Time Complexity by Data Mining Feature Selection Methods

International Journal of Computer Applications ◽

10.5120/13209-0587 ◽

2013 ◽

Vol 76 (1) ◽

pp. 5-11 ◽

Cited By ~ 14

Author(s):

Karan Bajaj ◽

Amit Arora

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Intrusion Detection ◽

Time Complexity ◽

Learning Approach ◽

Selection Methods ◽

Machine Learning Approach

An improved Random Forest based on Feature Selection and Feature weighting for case retrieval in CBR system Application to medical data

International Journal of Software Innovation ◽

10.4018/ijsi.293265 ◽

2022 ◽

Vol 10 (1) ◽

pp. 0-0

Keyword(s):

Feature Selection ◽

Random Forest ◽

Medical Data ◽

Feature Weighting ◽

Diagnostic Process ◽

Case Based Reasoning ◽

Medical Databases ◽

Medical Diagnostic ◽

Past Experiences ◽

Retrieval Phase

The medical diagnostic process works very similarly to the Case Based Reasoning (CBR) cycle. CBR is a problem-solving approach based on the reuse of past experiences, called cases. To improve the performance of the retrieval phase, a Random Forest (RF) model is proposed. We used this algorithm in three different ways (three different algorithms): the Classic Random Forest (CRF) algorithm; the Random Forest with Feature Selection (RF_FS) algorithm, where we selected the most important attributes and deleted the less important ones; and the Weighted Random Forest (WRF) algorithm, where we gave the most important attributes more weight by multiplying the entropy by the weight corresponding to each attribute. We tested our three algorithms, CRF, RF_FS, and WRF, with CBR on data from 11 medical databases and compared the results they produced. We found that WRF and RF_FS give better results than CRF. The experimental results show the performance and robustness of the proposed approach.
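
The abstract says only that an entropy term is multiplied by a per-attribute weight, so the sketch below is one possible reading rather than the authors' WRF algorithm. It assumes that the entropy-based information gain of each categorical attribute is scaled by its weight when a tree node chooses its split. The attribute names, weights, and data are made up.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels, weights):
    """Pick the attribute with the largest weight * information gain.

    The weighting scheme is an assumption made for this illustration.
    """
    return max(weights, key=lambda a: weights[a] * information_gain(rows, labels, a))


# Made-up cases: "cough" agrees with the label on 7 of 8 rows, "fever" on 6 of 8.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
cough = [1, 1, 1, 1, 0, 0, 0, 1]
fever = [1, 1, 1, 0, 0, 0, 0, 1]
rows = [{"cough": c, "fever": f} for c, f in zip(cough, fever)]

unweighted_choice = best_split(rows, labels, {"cough": 1.0, "fever": 1.0})
weighted_choice = best_split(rows, labels, {"cough": 1.0, "fever": 4.0})
```

With equal weights the split falls on the attribute with the higher raw gain. Raising the weight of the other attribute is enough to change the choice, which is the effect a feature-weighted forest relies on.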
