Eta Correlation Coefficient Based Feature Selection Algorithm for Machine Learning: E-Score Feature Selection Algorithm

2019 ◽  
Vol 2 (1) ◽  
pp. 7-12
Author(s):  
Muhammed Kürşad UÇAR
2019 ◽  
Vol 2019 ◽  
pp. 1-19 ◽  
Author(s):  
Muhammad Hammad Memon ◽  
Jian Ping Li ◽  
Amin Ul Haq ◽  
Muhammad Hunain Memon ◽  
Wang Zhou

Accurate and efficient diagnosis of breast cancer at an early stage is essential for treatment and recovery in an IoT healthcare environment. Over the last few years, the Internet of Things has changed everyday life, and together with artificial intelligence and data mining techniques it provides a way to analyze both real-time and historical data. Current state-of-the-art methods do not diagnose breast cancer effectively in its early stages, and many women suffer from this dangerous disease. Early detection of breast cancer therefore remains a great challenge for medical experts and researchers. To address early-stage detection, we propose a machine learning-based diagnostic system that classifies malignant and benign cases in an IoT environment. The system uses a support vector machine (SVM) classifier to separate malignant from benign cases. To improve classification performance, we use a recursive feature selection algorithm to select the most suitable features from the breast cancer dataset. A training/testing split is used to train and test the classifier and obtain the best predictive model. Classifier performance is assessed with evaluation metrics such as classification accuracy, specificity, sensitivity, Matthews’s correlation coefficient, F1-score, and execution time. The method is tested on the “Wisconsin Diagnostic Breast Cancer” dataset. The experimental results demonstrate that the recursive feature selection algorithm selects the best subset of features, and that the SVM classifier achieves optimal classification performance on this subset. The SVM with a linear kernel achieved high classification accuracy (99%), specificity (99%), and sensitivity (98%), with a Matthews’s correlation coefficient of 99%. From these results, we conclude that the proposed system performs well because of the more appropriate features selected by the recursive feature selection algorithm. We therefore suggest the proposed system for effective and efficient early-stage diagnosis of breast cancer, which should make treatment and recovery more effective. Lastly, the implementation of the proposed system is very reliable in all aspects of IoT healthcare for breast cancer.
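The evaluation metrics named in this abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' code; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def _safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary metrics from confusion-matrix counts.

    tp/fn count malignant cases predicted correctly/incorrectly,
    tn/fp count benign cases predicted correctly/incorrectly.
    """
    sensitivity = _safe_div(tp, tp + fn)   # true positive rate (recall)
    specificity = _safe_div(tn, tn + fp)   # true negative rate
    precision = _safe_div(tp, tp + fp)
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient, which lies in [-1, 1].
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, not taken from the paper.
for name, value in binary_metrics(tp=40, fp=5, tn=45, fn=10).items():
    print(f"{name}: {value:.3f}")
```

Note that the Matthews correlation coefficient is a correlation in the range -1 to 1, so a value reported as a percentage corresponds to that coefficient multiplied by 100.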


Electronics ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 144 ◽  
Author(s):  
Yan Naung Soe ◽  
Yaokai Feng ◽  
Paulus Insap Santosa ◽  
Rudy Hartanto ◽  
Kouichi Sakurai

The deployment of a large number of Internet of Things (IoT) devices makes our lives more convenient and industries more efficient. However, it also makes cyber-attacks much more likely, because so many IoT devices are deployed and most of them do not have enough resources (i.e., computation and storage capacity) to run ordinary intrusion detection systems (IDSs). In this study, a lightweight machine learning-based IDS using a new feature selection algorithm is designed and implemented on Raspberry Pi, and its performance is verified using a public dataset collected from an IoT environment. To make the system lightweight, we propose a new algorithm for feature selection, called the correlated-set thresholding on gain-ratio (CST-GR) algorithm, which selects only the features that are really necessary. Because feature selection is conducted separately for three specific kinds of cyber-attacks, the number of selected features can be significantly reduced, which makes the classifiers very small and fast. Thus, our detection system is lightweight enough to be implemented and run on a Raspberry Pi system. More importantly, because the really necessary features for each kind of attack are exploited, good detection performance can be expected. The performance of our proposal is examined in detail with different machine learning algorithms in order to learn which of them is the best option for our system. The experimental results indicate that the new feature selection algorithm selects only very few features for each kind of attack. Thus, the detection system is lightweight enough to be implemented in the Raspberry Pi environment with almost no sacrifice in detection performance.
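The abstract does not spell out the CST-GR procedure, so the sketch below only illustrates the gain-ratio measure that the algorithm's name refers to, followed by a plain threshold filter. It assumes discrete (categorical) features; the function names, the toy data, and the threshold value are hypothetical, and the correlated-set step of CST-GR is not reproduced.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy H(labels | feature).
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


def select_by_gain_ratio(features, labels, threshold):
    """Keep the names of features whose gain ratio reaches the threshold."""
    return [name for name, column in features.items()
            if gain_ratio(column, labels) >= threshold]


# Toy data: hypothetical features for a single attack class.
labels = ["attack", "attack", "normal", "normal"]
features = {
    "flag": [1, 1, 0, 0],       # perfectly separates the classes
    "port_parity": [0, 1, 0, 1],  # carries no class information
    "constant": [7, 7, 7, 7],     # single value, split information is zero
}
print(select_by_gain_ratio(features, labels, threshold=0.5))
```

Running the filter once per attack type, as the abstract describes, would yield a separate and typically much smaller feature set for each classifier.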


2013 ◽  
Vol 22 (04) ◽  
pp. 1350027
Author(s):  
JAGANATHAN PALANICHAMY ◽  
KUPPUCHAMY RAMASAMY

Feature selection is essential in data mining and pattern recognition, especially for database classification. In recent years, several feature selection algorithms have been proposed to measure the relevance of various features to each class. A suitable feature selection algorithm normally maximizes the relevancy and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features over the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on the maximum relevance and minimum redundancy criterion. Mutual information is used to measure the relevancy of each feature to the class variable, and redundancy is calculated from the relationship between candidate features, selected features, and class variables. The effectiveness of the algorithm is tested on ten benchmark datasets from the UCI Machine Learning Repository. The experimental results show better performance than some existing algorithms.
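The maximum relevance and minimum redundancy criterion mentioned above can be sketched as a greedy search. The code below implements the classic difference form (relevance minus mean redundancy with the already selected features) for discrete features; it is not the novel redundancy measure proposed in the paper, and the names `mutual_information` and `mrmr` as well as the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
               for (a, b), c in count_xy.items())


def mrmr(features, labels, k):
    """Greedily pick k feature names by relevance minus mean redundancy."""
    relevance = {name: mutual_information(column, labels)
                 for name, column in features.items()}
    selected = []
    candidates = list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: four classes encoded by a high bit and a low bit.
labels = [0, 1, 2, 3, 0, 1, 2, 3]
features = {
    "high_bit": [0, 0, 1, 1, 0, 0, 1, 1],
    "high_bit_copy": [0, 0, 1, 1, 0, 0, 1, 1],  # fully redundant with high_bit
    "low_bit": [0, 1, 0, 1, 0, 1, 0, 1],        # complementary information
}
print(mrmr(features, labels, k=2))
```

In this toy example all three features are equally relevant on their own, so the redundancy term is what steers the second pick away from the duplicate and toward the complementary feature.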


Author(s):  
J. V. D. Prasad ◽  
A. Raghuvira Pratap ◽  
Babu Sallagundla

With the rapid increase in the volume of clinical data, prediction and analysis have become very difficult. Machine learning models make it easier to work with such large datasets. A machine learning model faces many challenges, one of which is feature selection. In this research work, we propose a novel feature selection method based on statistical procedures to increase the performance of the machine learning model. We tested the feature selection algorithm on a liver disease classification dataset, and the results obtained show the efficiency of the proposed method.


Diabetes has become a serious problem nowadays, so serious precautions are needed to eradicate it. To do so, we should know its level of occurrence. In this project, we predict the level of occurrence of diabetes using Random Forest, a machine learning algorithm. Using patients’ Electronic Health Records (EHR), we can build accurate models that predict the presence of diabetes.


2021 ◽  
Vol 2021 ◽  
pp. 1-22
Author(s):  
Tanya Gera ◽  
Jaiteg Singh ◽  
Abolfazl Mehbodniya ◽  
Julian L. Webber ◽  
Mohammad Shabaz ◽  
...  

Ransomware is a special kind of malware designed to extort money in return for unlocking a device and its personal data files. Smartphone users store both personal and official data on these devices, which ransomware attackers find attractive for financial gain. The financial losses due to ransomware attacks are increasing rapidly. Recent studies report that, out of 87% reported cyber-attacks, 41% are due to ransomware attacks. The inability of application-signature-based solutions to detect unknown malware has inspired many researchers to build automated classification models using machine learning algorithms. Advanced malware can delay its malicious actions when it senses an emulated environment, which also poses a challenge to dynamic monitoring of applications. Existing hybrid approaches use a variety of feature combinations for detection and analysis. The rapidly changing nature of ransomware and its distribution strategies are possible reasons for the deteriorated performance of primitive ransomware detection techniques. The limitations of existing studies include ambiguity in selecting the feature set. Increasing the feature set may give adept attackers more freedom against learning algorithms. In this work, we propose a hybrid approach to identify and mitigate Android ransomware. This study employs a novel dominant feature selection algorithm to extract the dominant feature set. The experimental results show that our proposed model can differentiate between clean applications and ransomware with improved precision. Our proposed hybrid solution achieves an accuracy of 99.85% with zero false positives while considering 60 prominent features, which also justifies the feature selection algorithm used. A comparison of the proposed method with existing frameworks indicates its better performance.


Data scientists work with high-dimensional data to predict and reveal interesting patterns and the most useful information. Feature selection is a preprocessing technique that improves the accuracy and efficiency of mining algorithms. Numerous feature selection algorithms exist, but most of them fail to give good mining results as the scale increases. This paper considers feature selection for supervised algorithms in data mining and gives an overview of existing machine learning algorithms for supervised feature selection. It then introduces an enhanced supervised feature selection algorithm that selects the best feature subset by eliminating irrelevant features using distance correlation and redundant features using symmetric uncertainty. The experimental results show that the proposed algorithm provides better classification accuracy and selects a minimal number of features.
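The two measures named in this abstract have standard definitions, sketched below in plain Python. Symmetric uncertainty is computed for discrete sequences and sample distance correlation for one-dimensional numeric sequences. How the paper combines them (thresholds, ordering, tie-breaking) is not stated in the abstract, so only the measures themselves are shown; all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def _centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_mean = [sum(row) / n for row in dist]  # equals column means by symmetry
    grand_mean = sum(row_mean) / n
    return [[dist[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric sequences."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    denominator = math.sqrt(dvar_x * dvar_y)
    if denominator == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denominator)


# A feature with low distance correlation to the class would be treated as
# irrelevant; a feature with high symmetric uncertainty to an already kept
# feature would be treated as redundant.
print(distance_correlation([1, 2, 3, 4], [2, 4, 6, 8]))
print(symmetric_uncertainty([0, 0, 1, 1], [0, 1, 0, 1]))
```

Unlike Pearson correlation, distance correlation also picks up nonlinear dependence, which is one reason it is used as a relevance filter.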

