Machine Learning Based Supervised Feature Selection Algorithm for Data Mining

Data Scientists focus on high dimensional data to predict and reveal some interesting patterns as well as most useful information to the modern world. Feature Selection is a preprocessing technique which improves the accuracy and efficiency of mining algorithms. There exist a numerous feature selection algorithms. Most of the algorithms failed to give better mining results as the scale increases. In this paper, feature selection for supervised algorithms in data mining are considered and given an overview of existing machine learning algorithm for supervised feature selection. This paper introduces an enhanced supervised feature selection algorithm which selects the best feature subset by eliminating irrelevant features using distance correlation and redundant features using symmetric uncertainty. The experimental results show that the proposed algorithm provides better classification accuracy and selects minimum number of features.

Download Full-text

A NOVEL FEATURE SELECTION ALGORITHM WITH SUPERVISED MUTUAL INFORMATION FOR CLASSIFICATION

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013500279 ◽

2013 ◽

Vol 22 (04) ◽

pp. 1350027

Author(s):

JAGANATHAN PALANICHAMY ◽

KUPPUCHAMY RAMASAMY

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Class A ◽

Selection Algorithms ◽

The Relationship ◽

Class Variable

Feature selection is essential in data mining and pattern recognition, especially for database classification. During past years, several feature selection algorithms have been proposed to measure the relevance of various features to each class. A suitable feature selection algorithm normally maximizes the relevancy and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features on the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on maximum relevance and minimum redundancy criterion. The mutual information is used to measure the relevancy of each feature with class variable and calculate the redundancy by utilizing the relationship between candidate features, selected features and class variables. The effectiveness is tested with ten benchmarked datasets available in UCI Machine Learning Repository. The experimental results show better performance when compared with some existing algorithms.

Download Full-text

Classification of Diabetes using Random Forest with Feature Selection Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3595.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 1295-1300 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Electronic Health Records ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Health Records

Diabetes has become a serious problem now a day. So there is a need to take serious precautions to eradicate this. To eradicate, we should know the level of occurrence. In this project we predict the level of occurrence of diabetes. We predict the level of occurrence of diabetes using Random Forest, a Machine Learning Algorithm. Using the patient’s Electronic Health Records (EHR) we can build accurate models that predict the presence of diabetes.

Download Full-text

SVM and KNN Based SGO Feature Selection Algorithm for Breast Cancer Diagnosis

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d4428.038620 ◽

2020 ◽

Vol 8 (2S7) ◽

pp. 2237-2240

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Feature Selection ◽

Learning Algorithms ◽

Subset Selection ◽

Machine Learning Algorithms ◽

Feature Subset Selection ◽

Feature Subset ◽

Selection Algorithm ◽

Feature Selection Algorithm

In diagnosis and prediction systems, algorithms working on datasets with a high number of dimensions tend to take more time than those with fewer dimensions. Feature subset selection algorithms enhance the efficiency of Machine Learning algorithms in prediction problems by selecting a subset of the total features and thus pruning redundancy and noise. In this article, such a feature subset selection method is proposed and implemented to diagnose breast cancer using Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) algorithms. This feature selection algorithm is based on Social Group Optimization (SGO) an evolutionary algorithm. Higher accuracy in diagnosing breast cancer is achieved using our proposed model when compared to other feature selection-based Machine Learning algorithms

Download Full-text

Classification Performance Improvement Using Random Subset Feature Selection Algorithm for Data Mining

Big Data Research ◽

10.1016/j.bdr.2018.02.007 ◽

2018 ◽

Vol 12 ◽

pp. 1-12 ◽

Cited By ~ 7

Author(s):

Lakshmipadmaja D ◽

B. Vishnuvardhan

Keyword(s):

Data Mining ◽

Feature Selection ◽

Performance Improvement ◽

Classification Performance ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Random Subset

Download Full-text

Modulation Recognition of Digital Multimedia Signal Based on Data Feature Selection

International Journal of Mobile Computing and Multimedia Communications ◽

10.4018/ijmcmc.2017070107 ◽

2017 ◽

Vol 8 (3) ◽

pp. 90-111 ◽

Cited By ~ 2

Author(s):

Hui Wang ◽

Li Li Guo ◽

Yun Lin

Keyword(s):

Feature Selection ◽

Information Entropy ◽

Feature Subset ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Modulation Recognition ◽

Signal Modulation ◽

Digital Multimedia ◽

Optimal Feature Subset ◽

Optimal Feature

Automatic modulation recognition is very important for the receiver design in the broadband multimedia communication system, and the reasonable signal feature extraction and selection algorithm is the key technology of Digital multimedia signal recognition. In this paper, the information entropy is used to extract the single feature, which are power spectrum entropy, wavelet energy spectrum entropy, singular spectrum entropy and Renyi entropy. And then, the feature selection algorithm of distance measurement and Sequential Feature Selection(SFS) are presented to select the optimal feature subset. Finally, the BP neural network is used to classify the signal modulation. The simulation result shows that the four-different information entropy can be used to classify different signal modulation, and the feature selection algorithm is successfully used to choose the optimal feature subset and get the best performance.

Download Full-text

Towards a Lightweight Detection System for Cyber Attacks in the IoT Environment Using Corresponding Features

Electronics ◽

10.3390/electronics9010144 ◽

2020 ◽

Vol 9 (1) ◽

pp. 144 ◽

Cited By ~ 6

Author(s):

Yan Naung Soe ◽

Yaokai Feng ◽

Paulus Insap Santosa ◽

Rudy Hartanto ◽

Kouichi Sakurai

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Detection System ◽

Detection Performance ◽

Cyber Attacks ◽

Raspberry Pi ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

New Feature ◽

Iot Devices

The application of a large number of Internet of Things (IoT) devices makes our life more convenient and industries more efficient. However, it also makes cyber-attacks much easier to occur because so many IoT devices are deployed and most of them do not have enough resources (i.e., computation and storage capacity) to carry out ordinary intrusion detection systems (IDSs). In this study, a lightweight machine learning-based IDS using a new feature selection algorithm is designed and implemented on Raspberry Pi, and its performance is verified using a public dataset collected from an IoT environment. To make the system lightweight, we propose a new algorithm for feature selection, called the correlated-set thresholding on gain-ratio (CST-GR) algorithm, to select really necessary features. Because the feature selection is conducted on three specific kinds of cyber-attacks, the number of selected features can be significantly reduced, which makes the classifiers very small and fast. Thus, our detection system is lightweight enough to be implemented and carried out in a Raspberry Pi system. More importantly, as the really necessary features corresponding to each kind of attack are exploited, good detection performance can be expected. The performance of our proposal is examined in detail with different machine learning algorithms, in order to learn which of them is the best option for our system. The experiment results indicate that the new feature selection algorithm can select only very few features for each kind of attack. Thus, the detection system is lightweight enough to be implemented in the Raspberry Pi environment with almost no sacrifice on detection performance.

Download Full-text

Machine Learning Based Clinical Diagnosis of Liver Patients with Instance Replacement

Journal of Mobile Multimedia ◽

10.13052/jmm1550-4646.1827 ◽

2021 ◽

Author(s):

J. V. D. Prasad ◽

A. Raghuvira Pratap ◽

Babu Sallagundla

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Research Work ◽

Feature Selection Method ◽

Learning Model ◽

Disease Classification ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Huge Data ◽

Machine Learning Model

With the rapid increase in number of clinical data and hence the prediction and analysing data becomes very difficult. With the help of various machine learning models, it becomes easy to work on these huge data. A machine learning model faces lots of challenges; one among the challenge is feature selection. In this research work, we propose a novel feature selection method based on statistical procedures to increase the performance of the machine learning model. Furthermore, we have tested the feature selection algorithm in liver disease classification dataset and the results obtained shows the efficiency of the proposed method.

Download Full-text

Dominant Feature Selection and Machine Learning-Based Hybrid Approach to Analyze Android Ransomware

Security and Communication Networks ◽

10.1155/2021/7035233 ◽

2021 ◽

Vol 2021 ◽

pp. 1-22

Author(s):

Tanya Gera ◽

Jaiteg Singh ◽

Abolfazl Mehbodniya ◽

Julian L. Webber ◽

Mohammad Shabaz ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Algorithms ◽

Hybrid Approach ◽

Machine Learning Algorithms ◽

Dynamic Monitoring ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Detection Techniques ◽

Dominant Feature

Ransomware is a special malware designed to extort money in return for unlocking the device and personal data files. Smartphone users store their personal as well as official data on these devices. Ransomware attackers found it bewitching for their financial benefits. The financial losses due to ransomware attacks are increasing rapidly. Recent studies witness that out of 87% reported cyber-attacks, 41% are due to ransomware attacks. The inability of application-signature-based solutions to detect unknown malware has inspired many researchers to build automated classification models using machine learning algorithms. Advanced malware is capable of delaying malicious actions on sensing the emulated environment and hence posing a challenge to dynamic monitoring of applications also. Existing hybrid approaches utilize a variety of features combination for detection and analysis. The rapidly changing nature and distribution strategies are possible reasons behind the deteriorated performance of primitive ransomware detection techniques. The limitations of existing studies include ambiguity in selecting the features set. Increasing the feature set may lead to freedom of adept attackers against learning algorithms. In this work, we intend to propose a hybrid approach to identify and mitigate Android ransomware. This study employs a novel dominant feature selection algorithm to extract the dominant feature set. The experimental results show that our proposed model can differentiate between clean and ransomware with improved precision. Our proposed hybrid solution confirms an accuracy of 99.85% with zero false positives while considering 60 prominent features. Further, it also justifies the feature selection algorithm used. The comparison of the proposed method with the existing frameworks indicates its better performance.

Download Full-text

Hospital readmission prediction based on improved feature selection using grey relational analysis and LASSO

Grey Systems Theory and Application ◽

10.1108/gs-12-2020-0168 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Nor Hamizah Miswan ◽

Chee Seng Chan ◽

Chong Guan Ng

Keyword(s):

Feature Selection ◽

Grey Relational Analysis ◽

Hospital Readmission ◽

Grey System Theory ◽

Feature Subset ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Content Type ◽

Relational Analysis ◽

Grey Relational

PurposeThis paper develops a robust hospital readmission prediction framework by combining the feature selection algorithm and machine learning (ML) classifiers. The improved feature selection is proposed by considering the uncertainty in patient's attributes that leads to the output variable.Design/methodology/approachFirst, data preprocessing is conducted which includes how raw data is managed. Second, the impactful features are selected through feature selection process. It started with calculating the relational grade of each patient towards readmission using grey relational analysis (GRA) and the grade is used as the target values for feature selection. Then, the influenced features are selected using the Least Absolute Shrinkage and Selection Operator (LASSO) method. This proposed method is termed as Grey-LASSO feature selection. The final task is the readmission prediction using ML classifiers.FindingsThe proposed method offered good performances with a minimum feature subset up to 54–65% discarded features. Multi-Layer Perceptron with Grey-LASSO gave the best performance.Research limitations/implicationsThe performance of Grey-LASSO is justified in two readmission datasets. Further research is required to examine the generalisability to other datasets.Originality/valueIn designing the feature selection algorithm, the selection on influenced input variables was based on the integration of GRA and LASSO. Specifically, GRA is a part of the grey system theory, which was employed to analyse the relation between systems under uncertain conditions. The LASSO approach was adopted due to its ability for sparse data representation.

Download Full-text

MRF-RFS: A Modified Random Forest Recursive Feature Selection Algorithm for Nasopharyngeal Carcinoma Segmentation

Methods of Information in Medicine ◽

10.1055/s-0040-1721791 ◽

2020 ◽

Vol 59 (04/05) ◽

pp. 151-161

Author(s):

Yuchen Fei ◽

Fengyu Zhang ◽

Chen Zu ◽

Mei Hong ◽

Xingchen Peng ◽

...

Keyword(s):

Feature Selection ◽

Random Forest ◽

Nasopharyngeal Carcinoma ◽

Soft Tissues ◽

Feature Selection Method ◽

Selection Method ◽

Feature Subset ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Tumor Margins

Abstract Background An accurate and reproducible method to delineate tumor margins is of great importance in clinical diagnosis and treatment. In nasopharyngeal carcinoma (NPC), due to limitations such as high variability, low contrast, and discontinuous boundaries in presenting soft tissues, tumor margin can be extremely difficult to identify in magnetic resonance imaging (MRI), increasing the challenge of NPC segmentation task. Objectives The purpose of this work is to develop a semiautomatic algorithm for NPC image segmentation with minimal human intervention, while it is also capable of delineating tumor margins with high accuracy and reproducibility. Methods In this paper, we propose a novel feature selection algorithm for the identification of the margin of NPC image, named as modified random forest recursive feature selection (MRF-RFS). Specifically, to obtain a more discriminative feature subset for segmentation, a modified recursive feature selection method is applied to the original handcrafted feature set. Moreover, we combine the proposed feature selection method with the classical random forest (RF) in the training stage to take full advantage of its intrinsic property (i.e., feature importance measure). Results To evaluate the segmentation performance, we verify our method on the T1-weighted MRI images of 18 NPC patients. The experimental results demonstrate that the proposed MRF-RFS method outperforms the baseline methods and deep learning methods on the task of segmenting NPC images. Conclusion The proposed method could be effective in NPC diagnosis and useful for guiding radiation therapy.

Download Full-text