A NOVEL FEATURE SELECTION ALGORITHM WITH SUPERVISED MUTUAL INFORMATION FOR CLASSIFICATION

JAGANATHAN PALANICHAMY; KUPPUCHAMY RAMASAMY

doi:10.1142/s0218213013500279

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013500279 ◽

2013 ◽

Vol 22 (04) ◽

pp. 1350027

Author(s):

JAGANATHAN PALANICHAMY ◽

KUPPUCHAMY RAMASAMY

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Class A ◽

Selection Algorithms ◽

The Relationship ◽

Class Variable

Feature selection is essential in data mining and pattern recognition, especially for database classification. In recent years, several feature selection algorithms have been proposed to measure the relevance of various features to each class. A suitable feature selection algorithm normally maximizes the relevancy and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features on the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on the maximum relevance and minimum redundancy criterion. Mutual information is used to measure the relevancy of each feature to the class variable, and redundancy is calculated from the relationship between candidate features, selected features and class variables. The effectiveness of the algorithm is tested on ten benchmark datasets available in the UCI Machine Learning Repository. The experimental results show better performance than some existing algorithms.
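
As a rough illustration of the maximum relevance and minimum redundancy criterion mentioned in this abstract, the sketch below uses the generic greedy form (relevance to the class minus mean mutual information with already selected features). It is not the authors' supervised redundancy measure, whose details the abstract does not give. The function names `mutual_information` and `mrmr_select`, the feature names, and the toy data are all assumptions made for this example, and features are assumed to be discrete.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_select(features, labels, k):
    """Greedily pick k features by relevance minus mean redundancy (generic mRMR)."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected, candidates = [], list(features)
    while candidates and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: f1_copy duplicates f1, f3 is weaker but not redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 1, 1, 1, 1, 1],
    "f1_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "f3": [0, 1, 0, 0, 0, 1, 1, 1],
}
print(mrmr_select(features, labels, 2))
```

In this toy case the duplicate feature is penalised by its redundancy with the first pick, so the weaker but complementary feature is chosen second.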

A Feature Selection Algorithm Integrating Maximum Classification Information and Minimum Interaction Feature Dependency Information

Computational Intelligence and Neuroscience ◽

10.1155/2021/3569632 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Li Zhang

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Information Gain ◽

Small Sample ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Class Labels ◽

Minimum Interaction ◽

Classification Information ◽

Selection Algorithms

Feature selection is the key step in the analysis of high-dimensional small sample data. The core of feature selection is to analyse and quantify the correlation between features and class labels and the redundancy between features. However, most existing feature selection algorithms only consider the classification contribution of individual features and ignore the influence of interfeature redundancy and correlation. Therefore, this paper proposes a feature selection algorithm for nonlinear dynamic conditional relevance (NDCRFS), developed through the study and analysis of the ideas and methods of existing feature selection algorithms. Firstly, redundancy and relevance, both among features and between features and class labels, are distinguished using mutual information, conditional mutual information, and interactive mutual information. Secondly, the selected features and candidate features are dynamically weighted using information gain factors. Finally, to evaluate its performance, NDCRFS was compared with 6 other feature selection algorithms on three classifiers, using 12 different data sets, in terms of variability and classification metrics. The experimental results show that the NDCRFS method can improve the quality of the feature subsets and obtain better classification results.
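
For readers unfamiliar with conditional mutual information, one of the quantities this abstract relies on, the following minimal sketch computes the empirical I(X;Y|Z) for discrete sequences. It shows only the textbook definition, not the NDCRFS weighting scheme; the function name `conditional_mutual_information` and the XOR toy data are assumptions made for this example.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(x, y, z):
    """Empirical I(X;Y|Z) in bits for three discrete sequences of equal length."""
    n = len(x)
    pz = Counter(z)
    pxz = Counter(zip(x, z))
    pyz = Counter(zip(y, z))
    pxyz = Counter(zip(x, y, z))
    # p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ), written with raw counts
    return sum(
        (c / n) * log2(c * pz[k] / (pxz[(i, k)] * pyz[(j, k)]))
        for (i, j, k), c in pxyz.items()
    )


# Hypothetical XOR data: the class depends on both features jointly.
candidate = [0, 0, 1, 1]
chosen = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(candidate, chosen)]

# The candidate tells nothing about the label alone, but one full bit given `chosen`.
print(conditional_mutual_information(candidate, label, chosen))
# A candidate identical to the already chosen feature adds nothing once it is known.
print(conditional_mutual_information(chosen, label, chosen))
```

The XOR case is the standard example of feature interaction: a purely pairwise relevance score would discard the candidate, while the conditional measure reveals its contribution.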

Machine Learning Based Supervised Feature Selection Algorithm for Data Mining

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9483.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 3396-3401 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Learning Algorithm ◽

Modern World ◽

Feature Subset ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Minimum Number ◽

Preprocessing Technique

Data scientists focus on high-dimensional data to predict and reveal interesting patterns and the most useful information for the modern world. Feature selection is a preprocessing technique that improves the accuracy and efficiency of mining algorithms. Numerous feature selection algorithms exist, but most fail to give better mining results as the scale increases. This paper considers feature selection for supervised algorithms in data mining and gives an overview of existing machine learning algorithms for supervised feature selection. It then introduces an enhanced supervised feature selection algorithm that selects the best feature subset by eliminating irrelevant features using distance correlation and redundant features using symmetric uncertainty. The experimental results show that the proposed algorithm provides better classification accuracy and selects a minimum number of features.
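
Symmetric uncertainty, which this abstract uses to detect redundant features, is commonly defined as SU(X, Y) = 2 I(X;Y) / (H(X) + H(Y)), a value between 0 and 1. The sketch below shows that definition for discrete sequences only; it does not cover the distance correlation step or the paper's actual elimination procedure. The function names and toy data are assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy in bits of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X;Y) / (H(X) + H(Y)); returns 0.0 if both are constant."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return 2 * mutual_info / (hx + hy)


# Hypothetical toy features
a = [0, 0, 1, 1]
duplicate_of_a = [0, 0, 1, 1]
independent_of_a = [0, 1, 0, 1]

print(symmetric_uncertainty(a, duplicate_of_a))
print(symmetric_uncertainty(a, independent_of_a))
```

A value near 1 between two features suggests that one of them is redundant, while a value near 0 suggests they carry unrelated information.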

A Feature Selection Algorithm Based on Approximate Markov Blanket and Dynamic Mutual Information

Intelligent Science and Intelligent Data Engineering - Lecture Notes in Computer Science ◽

10.1007/978-3-642-31919-8_29 ◽

2012 ◽

pp. 226-233

Author(s):

Xiaodan Wang ◽

Xu Yao ◽

Yuxi Zhang ◽

Lei Lei

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Markov Blanket

Research on Spam Filtering Technology Based on New Mutual Information Feature Selection Algorithm

Journal of Physics Conference Series ◽

10.1088/1742-6596/1673/1/012028 ◽

2020 ◽

Vol 1673 ◽

pp. 012028

Author(s):

Kunfu Wang ◽

Wanfeng Mao ◽

Wei Feng ◽

Hui Wang

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Spam Filtering ◽

Selection Algorithm ◽

Feature Selection Algorithm

Classification Performance Improvement Using Random Subset Feature Selection Algorithm for Data Mining

Big Data Research ◽

10.1016/j.bdr.2018.02.007 ◽

2018 ◽

Vol 12 ◽

pp. 1-12 ◽

Cited By ~ 7

Author(s):

Lakshmipadmaja D ◽

B. Vishnuvardhan

Keyword(s):

Data Mining ◽

Feature Selection ◽

Performance Improvement ◽

Classification Performance ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Random Subset

Towards a Lightweight Detection System for Cyber Attacks in the IoT Environment Using Corresponding Features

Electronics ◽

10.3390/electronics9010144 ◽

2020 ◽

Vol 9 (1) ◽

pp. 144 ◽

Cited By ~ 6

Author(s):

Yan Naung Soe ◽

Yaokai Feng ◽

Paulus Insap Santosa ◽

Rudy Hartanto ◽

Kouichi Sakurai

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Detection System ◽

Detection Performance ◽

Cyber Attacks ◽

Raspberry Pi ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

New Feature ◽

Iot Devices

The application of a large number of Internet of Things (IoT) devices makes our lives more convenient and industries more efficient. However, it also makes it much easier for cyber-attacks to occur, because so many IoT devices are deployed and most of them do not have enough resources (i.e., computation and storage capacity) to run ordinary intrusion detection systems (IDSs). In this study, a lightweight machine learning-based IDS using a new feature selection algorithm is designed and implemented on Raspberry Pi, and its performance is verified using a public dataset collected from an IoT environment. To make the system lightweight, we propose a new algorithm for feature selection, called the correlated-set thresholding on gain-ratio (CST-GR) algorithm, to select only the truly necessary features. Because the feature selection is conducted on three specific kinds of cyber-attacks, the number of selected features can be significantly reduced, which makes the classifiers very small and fast. Thus, our detection system is lightweight enough to be implemented and run on a Raspberry Pi system. More importantly, as the truly necessary features corresponding to each kind of attack are exploited, good detection performance can be expected. The performance of our proposal is examined in detail with different machine learning algorithms, in order to learn which of them is the best option for our system. The experimental results indicate that the new feature selection algorithm can select only very few features for each kind of attack. Thus, the detection system is lightweight enough to be implemented in the Raspberry Pi environment with almost no sacrifice in detection performance.
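
The abstract does not spell out the CST-GR procedure, so the sketch below illustrates only its underlying ingredient: the gain ratio of a discrete feature (information gain divided by the feature's own entropy), followed by a plain threshold filter. The correlated-set step is not reproduced. The function names, feature names, threshold value, and toy data are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy in bits of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's entropy."""
    split_info = entropy(feature)
    if split_info == 0:  # a constant feature carries no information
        return 0.0
    # IG = H(C) - H(C|F) = H(C) + H(F) - H(F, C)
    info_gain = entropy(labels) + split_info - entropy(list(zip(feature, labels)))
    return info_gain / split_info


def select_by_gain_ratio(features, labels, threshold):
    """Keep the names of features whose gain ratio reaches the (assumed) threshold."""
    return [name for name, col in features.items() if gain_ratio(col, labels) >= threshold]


# Hypothetical traffic features for a binary attack label
attack = [0, 0, 1, 1]
traffic_features = {
    "proto_flag": [0, 0, 1, 1],   # perfectly aligned with the label
    "noise_bit": [0, 1, 0, 1],    # independent of the label
    "constant": [1, 1, 1, 1],     # never varies
}
print(select_by_gain_ratio(traffic_features, attack, threshold=0.5))
```

Keeping only features above a threshold is what shrinks the classifier input, which matters on a resource-limited device such as a Raspberry Pi.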

An Improved Mutual Information-Based Feature Selection Algorithm for Text Classification

2013 5th International Conference on Intelligent Human-Machine Systems and Cybernetics ◽

10.1109/ihmsc.2013.37 ◽

2013 ◽

Author(s):

Xiao-Yu Jiang ◽

Jin Shui

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Text Classification ◽

Selection Algorithm ◽

Feature Selection Algorithm

WJMI: A New Feature Selection Algorithm Based on Weighted Joint Mutual Information

Proceedings of the 3rd International Conference on Mechatronics and Industrial Informatics ◽

10.2991/icmii-15.2015.108 ◽

2015 ◽

Cited By ~ 1

Author(s):

Xiuli Qi ◽

Chengxiang Yin ◽

Kai Cheng ◽

Xianglin Liao ◽

Xingdang Kang

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

New Feature

Machine Learning Based Clinical Diagnosis of Liver Patients with Instance Replacement

Journal of Mobile Multimedia ◽

10.13052/jmm1550-4646.1827 ◽

2021 ◽

Author(s):

J. V. D. Prasad ◽

A. Raghuvira Pratap ◽

Babu Sallagundla

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Research Work ◽

Feature Selection Method ◽

Learning Model ◽

Disease Classification ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Huge Data ◽

Machine Learning Model

With the rapid increase in the amount of clinical data, predicting from and analysing the data has become very difficult. Machine learning models make it easier to work with such huge data. A machine learning model faces many challenges, one of which is feature selection. In this research work, we propose a novel feature selection method based on statistical procedures to increase the performance of the machine learning model. Furthermore, we have tested the feature selection algorithm on a liver disease classification dataset, and the results obtained show the efficiency of the proposed method.

Classification of Diabetes using Random Forest with Feature Selection Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3595.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 1295-1300 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Electronic Health Records ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Health Records

Diabetes has become a serious problem nowadays, so there is a need to take serious precautions to eradicate it. To do so, we should know its level of occurrence. In this project, we predict the level of occurrence of diabetes using Random Forest, a machine learning algorithm. Using the patient’s Electronic Health Records (EHR), we can build accurate models that predict the presence of diabetes.
