Prediction of Heart Disease Using Random Forest and Rough Set Based Feature Selection

Author(s):  
Indu Yekkala ◽  
Sunanda Dixit

The medical industry generates large volumes of data. Because it comes from multiple sources, such as electronic records and handwritten scripts, this data is often very complex. Its complexity and sheer volume necessitate techniques that can extract insight quickly and efficiently. These insights can help not only to diagnose diseases but also to predict and prevent them. One application of these techniques is cardiovascular disease. Heart disease, or coronary artery disease (CAD), is one of the major causes of death worldwide. Extensive research using single data mining techniques has not resulted in acceptable accuracy. Further research is being carried out on the effectiveness of hybridizing more than one technique to increase accuracy in the diagnosis of heart disease. In this article, the authors work with the Heart Statlog dataset from the UCI repository, using the Random Forest algorithm together with rough set based feature selection to predict the occurrence of heart disease.


2021 ◽  
Vol 5 (1) ◽  
pp. 61-69
Author(s):  
Ievgen Nastenko ◽  
Vitaliy Maksymenko ◽  
Sergiy Potashev ◽  
Volodymyr Pavlov ◽  
Vitalii Babenko ◽  
...  

Background. Recent studies show that cardiovascular diseases, including coronary heart disease, are the leading causes of death and one of the main factors of disability worldwide. Over the past 30 years, the number of detected cases of this type of disease has increased from 271 million to 523 million, and the number of deaths from 12.1 million to 18.6 million. Cardiovascular diseases are the main cause of death among the population of Ukraine, and by this indicator the country remains one of the world leaders. Coronary heart disease is the leading factor in the loss of health in Ukraine, and modern diagnostic methods, including machine learning algorithms, are increasingly being used for timely detection. Objective. Using speckle-tracking echocardiography data and the random forest method, to construct classification algorithms for diagnosing violations of the kinematics of left ventricular contractions in patients with coronary heart disease, both at rest and during an echostress test with dobutamine. Methods. Speckle-tracking echocardiography was used to examine 40 patients with coronary heart disease and 16 in whom no cardiac pathology was found. Echocardiography was recorded in B mode in three positions: along the long axis, in the 4-chamber position, and in the 2-chamber position. In total, 6245 frames of the video stream were used: 1871 without cardiac abnormalities and 4374 with pathology present during the examination. 56 patients (2509 frames of video data) were examined without the dobutamine test, and 38 patients (3736 frames of video data) were examined using an echostress test with dobutamine if no disturbances were found at rest. Dobutamine doses of 10, 20, and 40 mcg were administered under the supervision of an anesthesiologist. Texture analysis data from the images were used as informative features. The random forest algorithm was applied to build an algorithm for detecting coronary heart disease. Results. At the first stage of the study, diagnostic algorithms distinguishing norm from pathology were constructed for the state of rest and for dobutamine doses of 10, 20, and 40 mcg. Before the algorithm was applied, the samples were randomly divided into training (70%) and test (30%) sets. The classifiers were evaluated for accuracy, sensitivity, and specificity. On the test samples, the accuracy of diagnostic conclusions varied from 97 to 99%. At the second stage of the study, to increase the versatility of the models, the classifier was built for all images, without dividing them by dobutamine dose. The accuracy on the test samples ranged from 96.6 to 97.8%. As in the first stage, texture analysis data from the images were used to construct the diagnostic algorithms by the random forest method. Conclusions. High-precision classification models were obtained using the random forest algorithm. The developed models can be applied to the analysis of echocardiograms obtained in B mode on equipment that is not equipped with speckle-tracking technology.
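
As a rough illustration of the evaluation described in this abstract (a random 70%/30% split, then scoring by accuracy, sensitivity, and specificity), the following Python sketch uses only the standard library. The function names, the 0/1 label encoding (1 for pathology, 0 for norm), and the seed are assumptions made for illustration; they are not taken from the study, and no study data is reproduced here.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_indices(n: int, test_fraction: float = 0.3,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split indices 0..n-1 into (train, test) index lists."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def diagnostic_metrics(y_true: Sequence[int],
                       y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels.

    Assumed encoding: 1 = pathology (positive class), 0 = norm.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        # Sensitivity: share of true pathology cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Specificity: share of true norm cases that were recognised.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


if __name__ == "__main__":
    # Made-up labels, only to show the call.
    print(diagnostic_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

Reporting sensitivity and specificity alongside accuracy is useful here because the classes are unbalanced (1871 frames without abnormalities against 4374 with pathology), so accuracy alone can hide weak performance on the smaller class.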


Nowadays, heart disease causes more deaths than any other disease. Because resources in the medical field are limited, predicting heart disease has become a major problem. Classification algorithms such as Decision Tree and Random Forest are used for early diagnosis and treatment. The main aim of this paper is to predict heart disease from dataset values and to compare the accuracy of these two algorithms. First, a dataset of 13 attributes is collected, and the Decision Tree and Random Forest classifiers are applied to it. Then the accuracy of both algorithms is recorded. We observed that Random Forest produces better results than Decision Tree in the prediction of heart disease.


2011 ◽  
pp. 70-107 ◽  
Author(s):  
Richard Jensen

Feature selection aims to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. Rough set theory (RST) has been used as such a tool with much success. RST enables the discovery of data dependencies and the reduction of the number of attributes contained in a dataset using the data alone, requiring no additional information. This chapter describes the fundamental ideas behind RST-based approaches and reviews related feature selection methods that build on these ideas. Extensions to the traditional rough set approach are discussed, including recent selection methods based on tolerance rough sets, variable precision rough sets and fuzzy-rough sets. Alternative search mechanisms are also highly important in rough set feature selection. The chapter includes the latest developments in this area, including RST strategies based on hill-climbing, genetic algorithms and ant colony optimization.
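
To make the idea of rough set dependency and hill-climbing search more concrete, here is a minimal Python sketch of a greedy, QuickReduct-style search on a small decision table. It is an illustration under stated assumptions, not the chapter's own implementation: the function names, the toy attribute names (`chest_pain`, `high_bp`, `smoker`, `disease`), and the six made-up rows are all hypothetical. The dependency degree is the fraction of rows whose indiscernibility class (rows with identical values on the chosen attributes) agrees on the decision.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Group row indices that are indiscernible on the given attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows: Sequence[Row], attrs: Sequence[str],
               decision: str) -> float:
    """Dependency degree: size of the positive region divided by |U|."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        # A block is in the positive region if all its rows share a decision.
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quickreduct(rows: Sequence[Row], cond_attrs: Sequence[str],
                decision: str) -> List[str]:
    """Greedy hill-climbing: add the attribute with the highest dependency
    until the subset matches the dependency of the full attribute set."""
    target = dependency(rows, cond_attrs, decision)
    reduct: List[str] = []
    best = dependency(rows, reduct, decision)
    while best < target:
        candidates = [a for a in cond_attrs if a not in reduct]
        scores = {a: dependency(rows, reduct + [a], decision)
                  for a in candidates}
        choice = max(candidates, key=lambda a: scores[a])
        reduct.append(choice)
        best = scores[choice]
    return reduct


# Hypothetical toy decision table (not real patient data).
TOY_ROWS: List[Row] = [
    {"chest_pain": 1, "high_bp": 1, "smoker": 0, "disease": 1},
    {"chest_pain": 1, "high_bp": 0, "smoker": 1, "disease": 0},
    {"chest_pain": 0, "high_bp": 1, "smoker": 1, "disease": 0},
    {"chest_pain": 0, "high_bp": 0, "smoker": 0, "disease": 0},
    {"chest_pain": 1, "high_bp": 1, "smoker": 1, "disease": 1},
    {"chest_pain": 0, "high_bp": 1, "smoker": 0, "disease": 0},
]

if __name__ == "__main__":
    print(quickreduct(TOY_ROWS, ["chest_pain", "high_bp", "smoker"], "disease"))
```

A greedy search of this kind is fast but is not guaranteed to find a minimal subset, which is one reason alternative search strategies such as genetic algorithms and ant colony optimization are also studied.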


Author(s):  
Mohammad Almseidin ◽  
AlMaha Abu Zuraiq ◽  
Mouhammd Al-kasassbeh ◽  
Nidal Alnidami

With ongoing technological development, the Internet has become ubiquitous and accessible to everyone. There are a considerable number of web pages serving different purposes, but not all of these sites are legitimate. So-called phishing sites deceive users in order to serve their operators' interests. This paper addresses the problem using machine learning algorithms and a novel phishing detection dataset containing 5000 legitimate web pages and 5000 phishing ones. To obtain the best results, various machine learning algorithms were tested, and J48, Random Forest, and Multilayer Perceptron were then chosen. Different feature selection tools were applied to the dataset to improve the efficiency of the models. The best result was achieved by using 20 of the 48 features with the Random Forest algorithm, giving an accuracy of 98.11%.

