Correlation Based Feature Selection Algorithm for Machine Learning

The widespread deployment of Internet of Things (IoT) devices makes our lives more convenient and industries more efficient. However, it also makes cyber-attacks much easier to carry out, because so many IoT devices are deployed and most of them lack the resources (i.e., computation and storage capacity) to run ordinary intrusion detection systems (IDSs). In this study, a lightweight machine learning-based IDS using a new feature selection algorithm is designed and implemented on a Raspberry Pi, and its performance is verified using a public dataset collected from an IoT environment. To make the system lightweight, we propose a new feature selection algorithm, called the correlated-set thresholding on gain-ratio (CST-GR) algorithm, that selects only the features that are truly necessary. Because feature selection is conducted for three specific kinds of cyber-attacks, the number of selected features can be significantly reduced, which makes the classifiers very small and fast. Thus, our detection system is lightweight enough to be implemented and run on a Raspberry Pi. More importantly, because the truly necessary features for each kind of attack are exploited, good detection performance can be expected. The performance of our proposal is examined in detail with different machine learning algorithms to determine which of them is the best option for our system. The experimental results indicate that the new feature selection algorithm selects only a few features for each kind of attack. As a result, the detection system is lightweight enough to be implemented in the Raspberry Pi environment with almost no sacrifice in detection performance.
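The abstract does not spell out the CST-GR procedure, so the following is only a minimal sketch of the quantity it builds on: the gain ratio of a discrete feature, followed by a plain threshold filter. The function names (`entropy`, `gain_ratio`, `select_by_gain_ratio`) and the threshold value are illustrative assumptions, not the authors' algorithm, and the correlated-set step is not modeled.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's own entropy."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    # Expected label entropy after splitting on the feature's values.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split information; treat it as uninformative.
    return info_gain / split_info if split_info > 0 else 0.0

def select_by_gain_ratio(features, labels, threshold):
    """Return the names of features whose gain ratio reaches `threshold` (hypothetical filter)."""
    return [name for name, col in features.items()
            if gain_ratio(col, labels) >= threshold]
```

With one such filter per attack type, each classifier would see only the columns that pass its own threshold, which is the sense in which the feature set per attack can stay small.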

A NOVEL FEATURE SELECTION ALGORITHM WITH SUPERVISED MUTUAL INFORMATION FOR CLASSIFICATION

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013500279 ◽

2013 ◽

Vol 22 (04) ◽

pp. 1350027

Author(s):

JAGANATHAN PALANICHAMY ◽

KUPPUCHAMY RAMASAMY

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Class A ◽

Selection Algorithms ◽

The Relationship ◽

Class Variable

Feature selection is essential in data mining and pattern recognition, especially for database classification. In recent years, several feature selection algorithms have been proposed to measure the relevance of various features to each class. A suitable feature selection algorithm normally maximizes the relevance and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features on the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on the maximum relevance and minimum redundancy criterion. Mutual information is used to measure the relevance of each feature to the class variable, and the redundancy is calculated from the relationship between candidate features, selected features, and class variables. The effectiveness of the algorithm is tested on ten benchmark datasets available in the UCI Machine Learning Repository. The experimental results show better performance than some existing algorithms.
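The maximum relevance and minimum redundancy criterion can be illustrated with a small greedy selector. This sketch uses the classic difference form (relevance minus mean redundancy with the already selected features) on discrete data. It is not the paper's modified redundancy term, which the abstract does not define, and the names `mutual_information` and `mrmr` are illustrative.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete, hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def mrmr(features, labels, k):
    """Greedily pick up to k feature names by relevance minus mean redundancy."""
    selected = []
    candidates = list(features)
    while candidates and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance - redundancy
        best = max(candidates, key=score)  # ties go to the earliest candidate
        selected.append(best)
        candidates.remove(best)
    return selected

# "b" duplicates "a", so after "a" is chosen the independent feature "c" wins.
features = {"a": [0, 0, 1, 1], "b": [0, 0, 1, 1], "c": [0, 1, 0, 1]}
labels = [0, 1, 2, 3]
print(mrmr(features, labels, 2))  # ['a', 'c']
```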

Machine Learning Based Clinical Diagnosis of Liver Patients with Instance Replacement

Journal of Mobile Multimedia ◽

10.13052/jmm1550-4646.1827 ◽

2021 ◽

Author(s):

J. V. D. Prasad ◽

A. Raghuvira Pratap ◽

Babu Sallagundla

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Research Work ◽

Feature Selection Method ◽

Learning Model ◽

Disease Classification ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Huge Data ◽

Machine Learning Model

With the rapid increase in the volume of clinical data, predicting from and analysing these data has become very difficult. Machine learning models make it easier to work with such huge data. A machine learning model faces many challenges, one of which is feature selection. In this research work, we propose a novel feature selection method based on statistical procedures to increase the performance of the machine learning model. Furthermore, we have tested the feature selection algorithm on a liver disease classification dataset, and the results obtained show the efficiency of the proposed method.

Classification of Diabetes using Random Forest with Feature Selection Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3595.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 1295-1300 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Electronic Health Records ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Health Records

Diabetes has become a serious problem nowadays, so serious precautions are needed to eradicate it. To do so, we first need to know its level of occurrence. In this project, we predict the level of occurrence of diabetes using Random Forest, a machine learning algorithm. Using patients' Electronic Health Records (EHR), we can build accurate models that predict the presence of diabetes.

Dominant Feature Selection and Machine Learning-Based Hybrid Approach to Analyze Android Ransomware

Security and Communication Networks ◽

10.1155/2021/7035233 ◽

2021 ◽

Vol 2021 ◽

pp. 1-22

Author(s):

Tanya Gera ◽

Jaiteg Singh ◽

Abolfazl Mehbodniya ◽

Julian L. Webber ◽

Mohammad Shabaz ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Algorithms ◽

Hybrid Approach ◽

Machine Learning Algorithms ◽

Dynamic Monitoring ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Detection Techniques ◽

Dominant Feature

Ransomware is a special kind of malware designed to extort money in return for unlocking the device and personal data files. Smartphone users store their personal as well as official data on these devices, which ransomware attackers find attractive for financial gain. The financial losses due to ransomware attacks are increasing rapidly. Recent studies report that out of 87% reported cyber-attacks, 41% are due to ransomware attacks. The inability of application-signature-based solutions to detect unknown malware has inspired many researchers to build automated classification models using machine learning algorithms. Advanced malware is capable of delaying malicious actions when it senses an emulated environment, which also poses a challenge to dynamic monitoring of applications. Existing hybrid approaches utilize a variety of feature combinations for detection and analysis. The rapidly changing nature and distribution strategies of ransomware are possible reasons behind the deteriorated performance of primitive ransomware detection techniques. The limitations of existing studies include ambiguity in selecting the feature set. Enlarging the feature set may give adept attackers more freedom against learning algorithms. In this work, we propose a hybrid approach to identify and mitigate Android ransomware. This study employs a novel dominant feature selection algorithm to extract the dominant feature set. The experimental results show that our proposed model can differentiate between clean applications and ransomware with improved precision. Our proposed hybrid solution achieves an accuracy of 99.85% with zero false positives while considering 60 prominent features, which also justifies the feature selection algorithm used. A comparison of the proposed method with existing frameworks indicates its better performance.

A Correlation-Based Feature Selection Algorithm for Operating Data of Nuclear Power Plants

Science and Technology of Nuclear Installations ◽

10.1155/2021/9994340 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15

Author(s):

Yuxuan He ◽

Hongxing Yu ◽

Ren Yu ◽

Jian Song ◽

Haibo Lian ◽

...

Keyword(s):

Feature Selection ◽

Power Plant ◽

Nuclear Power Plant ◽

Dimensionality Reduction ◽

Nuclear Power ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Operating Data ◽

Comparison Results ◽

Correlation Based Feature Selection

Nuclear power plant operating data are characterized by a large variety, strong coupling, and low data value density. When using machine learning techniques for fault diagnosis and other related research, feature selection enables dimensionality reduction while maintaining the physical meaning of the original features, thus improving the computational efficiency and generalization ability of the learning model. In this paper, a correlation-based feature selection algorithm is developed to implement feature selection of nuclear power plant operating data. The proposed algorithm is verified by experiments and compared with traditional correlation-based feature selection algorithms. The experiments and comparison results show that the proposed algorithm is effective in realizing the dimensionality reduction of nuclear power plant operating data.
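For context, the traditional correlation-based feature selection (CFS) baseline mentioned above scores a subset of k features with the merit k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-target correlation and r_ff is the mean feature-feature correlation. The sketch below computes that merit with absolute Pearson correlations. It illustrates the traditional heuristic only, not the algorithm proposed in the paper, and the names `pearson` and `cfs_merit` are illustrative.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def cfs_merit(subset, features, target):
    """Traditional CFS merit of a list of feature names."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each feature and the target.
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    # Mean absolute correlation between every pair of features in the subset.
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

A search procedure such as forward selection would then add the feature that most increases this merit and stop when no addition helps. Adding a feature that is uncorrelated with the target lowers the merit, which is how the heuristic keeps subsets small.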

Eta Correlation Coefficient Based Feature Selection Algorithm for Machine Learning: E-Score Feature Selection Algorithm

Journal of Intelligent Systems: Theory and Applications ◽

10.38016/jista.498799 ◽

2019 ◽

Vol 2 (1) ◽

pp. 7-12

Author(s):

Muhammed Kürşad UÇAR

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Correlation Coefficient ◽

Selection Algorithm ◽

Feature Selection Algorithm

Artificial bee colony feature selection algorithm combined with machine learning algorithms to predict vertical and lateral distribution of soil organic matter in South Dakota, USA

Carbon Management ◽

10.1080/17583004.2017.1330593 ◽

2017 ◽

Vol 8 (3) ◽

pp. 277-291 ◽

Cited By ~ 14

Author(s):

Ruhollah Taghizadeh-Mehrjardi ◽

Ram Neupane ◽

Kunal Sood ◽

Sandeep Kumar

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Organic Matter ◽

Soil Organic Matter ◽

South Dakota ◽

Artificial Bee Colony ◽

Machine Learning Algorithms ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Bee Colony

Machine Learning Based Supervised Feature Selection Algorithm for Data Mining

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9483.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 3396-3401 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Learning Algorithm ◽

Modern World ◽

Feature Subset ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Minimum Number ◽

Preprocessing Technique

Data scientists focus on high-dimensional data to predict and reveal interesting patterns as well as the information most useful to the modern world. Feature selection is a preprocessing technique that improves the accuracy and efficiency of mining algorithms. Numerous feature selection algorithms exist, but most of them fail to give good mining results as the scale increases. This paper considers feature selection for supervised algorithms in data mining and gives an overview of existing machine learning algorithms for supervised feature selection. It introduces an enhanced supervised feature selection algorithm that selects the best feature subset by eliminating irrelevant features using distance correlation and redundant features using symmetric uncertainty. The experimental results show that the proposed algorithm provides better classification accuracy and selects a minimum number of features.
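Symmetric uncertainty, the redundancy measure named above, is mutual information normalized by the two entropies: SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), which ranges from 0 (independent) to 1 (each variable determines the other). The following is a minimal sketch for discrete sequences. It covers only this one measure, not the distance-correlation step or the paper's full procedure, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete, hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); returns 0.0 when both are constant."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2 * mutual_info / (hx + hy)
```

Two features with a symmetric uncertainty close to 1 carry nearly the same information, so one of them can be dropped as redundant.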
