A Comprehensive Performance Analysis of Various Classifier Models for Coronary Artery Disease Prediction

Cardiovascular disease (CVD) is a leading cause of death worldwide. Earlier diagnosis can reduce the mortality rate. Machine learning (ML) algorithms are giving promising results in disease diagnosis and are now widely accepted by medical experts as clinical decision support systems. In this work, the most popular ML models are investigated and compared with one another for heart disease prediction on various metrics. The base classifiers Support Vector Machine (SVM), Logistic Regression, Naïve Bayes, Decision Tree, and K-Nearest Neighbour are used to predict heart disease. Bagging and boosting techniques are then applied over these individual classifiers to improve the performance of the system. On the Cleveland and Statlog datasets, Naïve Bayes as an individual classifier gives the maximum accuracy of 85.13% and 84.81%, respectively. Bagging improves the accuracy of the decision tree, which is identified as a weak classifier, by 7%, a significant improvement in identifying CVD.

Author(s):  
Baranidharan Balakrishnan ◽  
Vinoth Kumar C. N. S.
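
The bagging step described in the abstract can be illustrated with a minimal sketch. It assumes one-level decision trees (stumps) as the weak learner, a tiny made-up two-feature dataset, and majority voting; none of these details come from the paper, and the function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels):
    """Return the (feature, threshold, left_label, right_label) stump with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def fit_bagging(rows, labels, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models

def predict_bagging(models, row):
    """Aggregate the individual stump predictions by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Made-up toy data (not from Cleveland or Statlog): label 1 when the first feature is large.
rows = [(1, 5), (2, 4), (3, 6), (4, 5), (6, 1), (7, 2), (8, 1), (9, 3)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_bagging(rows, labels)
print(predict_bagging(models, (1, 6)), predict_bagging(models, (9, 1)))
```

Each stump sees a slightly different resample of the data, so averaging their votes reduces the variance of a single unstable tree, which is the usual motivation for bagging weak tree learners.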


Data mining currently plays a significant role in healthcare. It helps extract hidden patterns from clinical datasets for further analysis, and it can be used to build tools for medical management systems. Among life-threatening diseases, diabetes mellitus is regarded as a serious disease worldwide. Because of its mortality rate, early prediction and diagnosis are very important, and several research efforts aim to reduce both the complications caused by diabetes and the mortality rate. Medical science needs to analyze an enormous quantity of clinical data for diagnosis using machine learning techniques. In recent approaches, disease datasets may contain insignificant and irrelevant features, which lead to less accurate results. The aim of this paper is to analyze the existing prediction systems and develop a hybrid disease prediction model that uses the Genetic Algorithm with Naïve Bayes, Decision Tree, and Support Vector Machine classifiers for better accuracy. The proposed diabetes prediction model achieves accuracies of 0.8182, 0.8052, and 0.8312 with the Naïve Bayes, Decision Tree, and Support Vector Machine classifiers, respectively. The experimental results show that the Support Vector Machine provides higher accuracy than the other classifiers in all cases. The Pima Indian diabetes dataset is used to construct the proposed model.
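
To make the idea of genetic-algorithm feature selection concrete, here is a minimal sketch under stated assumptions. The abstract does not describe the encoding, operators, or fitness function, so the bit-mask encoding, truncation selection, single-point crossover, and the toy `fitness` below are all illustrative choices; in a real pipeline the fitness would typically be the validation accuracy of a classifier trained on the selected features.

```python
import random

N_FEATURES = 8           # the Pima Indian diabetes dataset has 8 input attributes
INFORMATIVE = {1, 5, 7}  # hypothetical: stand-in for the features a real classifier would benefit from

def fitness(mask):
    """Toy stand-in for validation accuracy: reward informative features, penalise extra ones."""
    hits = sum(1 for i in INFORMATIVE if mask[i])
    extras = sum(mask) - hits
    return hits - 0.25 * extras

def evolve(pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve feature masks (1 = keep the feature, 0 = drop it) and return the fittest one."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection: keep the fitter half
        children = [parents[0][:]]              # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, N_FEATURES)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```

Because of elitism, the best fitness in the population never decreases from one generation to the next.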


2021 ◽  
Vol 10 (1) ◽  
pp. 46
Author(s):  
Maria Yousef ◽  
Prof. Khaled Batiha

Heart disease has become one of the major health problems affecting people's lives worldwide, and deaths due to heart disease are increasing day by day. Heart disease prediction systems therefore play an important role in prevention, because they assist doctors in making the right decision when diagnosing heart disease. Existing prediction systems suffer from the high dimensionality of the selected features: many redundant or irrelevant features increase prediction time and decrease prediction accuracy. This paper therefore aims to address the dimensionality problem by proposing a new mixed model for heart disease prediction based on the Naïve Bayes method and machine learning classifiers. In this study, we propose a new heart disease prediction model (NB-SKDR) based on the Naïve Bayes (NB) algorithm and several machine learning techniques, including Support Vector Machine, K-Nearest Neighbors, Decision Tree, and Random Forest. The model consists of three main phases: preprocessing, feature selection, and classification. Its main objective is to improve the performance of the prediction system and to find the best subset of features. The proposed approach uses the Naïve Bayes technique, based on Bayes' theorem, to select the best subset of features for the subsequent classification phase, and to handle the high dimensionality problem by discarding unnecessary features and keeping only the important ones, in an attempt to improve the efficiency and accuracy of the classifiers. By determining the dependency between attributes, the method reduces the number of features from 13 to 6: age, gender, blood pressure, fasting blood sugar, cholesterol, and exercise-induced angina.
Dependent attributes are attributes where one attribute depends on another in deciding the value of the class attribute. The dependency between attributes is measured by the conditional probability, which can be easily computed with Bayes' theorem. In the classification phase, the proposed system uses different classification algorithms, namely Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), as classifiers for predicting whether or not a patient has heart disease. The model is trained and evaluated on the Cleveland Heart Disease database, which contains 13 features and 303 samples. Different algorithms use different rules to produce different representations of knowledge, so the algorithms for the model are selected based on their performance. In this work, we applied and compared DT, SVM, RF, and KNN to identify the algorithm best suited to achieving high accuracy in heart disease prediction. After the Naïve Bayes method is combined with each of these classifiers, the performance of the combined algorithms is evaluated with different performance metrics: specificity, sensitivity, and accuracy. The experimental results show that, of these four classification models, the combination of the Naïve Bayes feature selection approach and the SVM classifier with an RBF kernel predicts heart disease with the highest accuracy, 98%. Finally, the proposed approach is compared with two other systems that use different approaches in the feature selection step: the first is based on the Genetic Algorithm (GA) technique, and the second uses Principal Component Analysis (PCA). The comparison showed that the Naïve Bayes selection approach of the proposed system is better than the GA and PCA approaches in terms of prediction accuracy.
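
The three evaluation metrics named above follow directly from the confusion matrix. The sketch below uses made-up labels (1 = heart disease, 0 = no heart disease) rather than any result from the paper, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, and false negatives."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Fraction of actual positives that were detected (true positive rate)."""
    return tp / (tp + fn) if tp + fn else 0.0

def specificity(tn, fp):
    """Fraction of actual negatives that were correctly rejected (true negative rate)."""
    return tn / (tn + fp) if tn + fp else 0.0

def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that were correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

# Made-up example: 4 diseased and 6 healthy patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, fp, tn, fn))
```

Reporting sensitivity and specificity alongside accuracy matters in diagnosis, because a missed patient (false negative) and a false alarm (false positive) carry different costs.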


2019 ◽  
Vol 15 (2) ◽  
pp. 275-280
Author(s):  
Agus Setiyono ◽  
Hilman F Pardede

It is now common for a cellphone to receive spam messages. The large number of received messages makes it difficult for a human to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate three data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms: it achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.
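
As an illustration of one of the three techniques, the following is a minimal multinomial Naïve Bayes sketch for short messages. It assumes whitespace tokenisation, Laplace (add-one) smoothing, and four invented training messages; the paper's preprocessing and dataset are not described in the abstract, so none of this should be read as the authors' setup.

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha smoothing."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, y in zip(docs, labels):
        word_counts[y].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for y in class_docs:
        denom = sum(word_counts[y].values()) + alpha * len(vocab)
        model[y] = {
            "log_prior": math.log(class_docs[y] / len(labels)),
            "log_like": {w: math.log((word_counts[y][w] + alpha) / denom) for w in vocab},
        }
    return model

def predict_mnb(model, text):
    """Return the class with the highest log posterior score."""
    scores = {}
    for y, params in model.items():
        score = params["log_prior"]
        for w in text.lower().split():
            if w in params["log_like"]:  # words outside the training vocabulary are ignored
                score += params["log_like"][w]
        scores[y] = score
    return max(scores, key=scores.get)

# Invented toy messages, for illustration only.
docs = [
    "win a free prize now",
    "free cash claim your prize",
    "are we meeting for lunch today",
    "see you at home tonight",
]
labels = ["spam", "spam", "ham", "ham"]
model = train_mnb(docs, labels)
print(predict_mnb(model, "claim your free prize"), predict_mnb(model, "lunch at home"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.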


2020 ◽  
Vol 16 (2) ◽  
pp. 75
Author(s):  
Didit Widiyanto

The accuracy of an image classification is determined by the classifier. Although the RoI (Region of Interest) does not directly determine accuracy, it determines the scope of the image classification. Three algorithms can be used as RoI algorithms: Balanced Histogram Thresholding (BHT), the Otsu algorithm, and the K-Means clustering algorithm. This paper reviews the Otsu algorithm and the K-Means clustering algorithm as used by five researchers. Of the five researchers, three applied the Otsu algorithm and two applied the K-Means algorithm as the RoI algorithm. After the RoI operation, all five researchers applied the GLCM (Gray Level Co-occurrence Matrix) algorithm as a texture feature extractor. The extracted features were classified using various classifiers, including SVM (Support Vector Machine), Naive Bayes, and Decision Tree. Finally, comparing the results of the five researchers, the highest accuracy obtained was 100%, with the SVM classifier using the Otsu algorithm as the RoI algorithm, and the lowest accuracy was 52%, which used the Otsu algorithm on the S channel of the HSV (Hue, Saturation, Value) image.
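
Otsu's method, one of the RoI algorithms named above, picks the grey level that maximises the between-class variance of the histogram. The sketch below is a plain-Python illustration on a made-up list of 8-bit pixel values; the reviewed papers' images and implementations are not reproduced here.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises between-class variance; pixels <= t form the first class."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of grey levels of those pixels
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

# Made-up bimodal "image": dark background around 20-30, bright object around 200-220.
pixels = [20] * 50 + [30] * 50 + [200] * 40 + [220] * 60
threshold = otsu_threshold(pixels)
roi_mask = [p > threshold for p in pixels]  # True marks pixels inside the region of interest
print(threshold, sum(roi_mask))
```

Every threshold between the two modes yields the same split, so the function returns the lowest such grey level.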


Information ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 383
Author(s):  
Francis Effirim Botchey ◽  
Zhen Qin ◽  
Kwesi Hughes-Lartey

The onset of COVID-19 has re-emphasized the importance of FinTech, especially in developing countries, as the major powers of the world already enjoy the advantages that come with its adoption. Handling physical cash has been established as a means of transmitting the novel coronavirus. Research has also established that being unbanked raises the risk of sinking into abject poverty. Over the years, developing countries have piloted various forms of FinTech, but the one that has come to stay is Mobile Money Transactions (MMT). As mobile money transactions attempt to gain a foothold, they face several problems, the most important of which is mobile money fraud. This paper seeks to provide a solution to this problem by examining machine learning algorithms based on support vector machines (kernel-based), gradient boosted decision trees (tree-based), and Naïve Bayes (probabilistic), taking into consideration the imbalanced nature of the dataset. Our experiments showed that the gradient boosted decision tree holds great potential in combating mobile money fraud, as it produced near-perfect results.


Diabetes is one of the most common diseases today. Machine learning techniques are proposed for predicting this disease: they identify its risk factors so that its progression can be prevented. Early prediction can help control the disease and save lives. For early prediction, we collect a dataset of 200 diabetic patients with 8 attributes. The patient's sugar level is assessed from features such as glucose content and age. The main machine learning algorithms are Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbor (KNN), and Decision Tree (DT). In the existing system, Naïve Bayes reaches an accuracy of 66% and the Decision Tree 70 to 71%, which is not in an acceptable range. With XGBoost classifiers, however, the accuracy is 74% for Naïve Bayes and 89 to 90% for the Decision Tree. The proposed system reports accuracy in an acceptable range and is therefore the one mostly used. A dataset of 729 patients is stored in MongoDB; the reports of 129 of these patients are used for prediction and the remaining ones for training. The model trained on the training data is then used for prediction.
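
The split described above (729 records, 129 held out for prediction, the rest for training) can be sketched as a simple hold-out split. The shuffling, the seed, and the placeholder patient IDs are assumptions; the abstract does not say how the 129 records were chosen.

```python
import random

def holdout_split(records, n_test, seed=0):
    """Shuffle a copy of the records and hold out n_test of them for prediction."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

patient_ids = list(range(729))  # placeholder IDs standing in for the stored patient records
train, test = holdout_split(patient_ids, n_test=129)
print(len(train), len(test))
```

Keeping the held-out records out of training is what makes the reported accuracy an estimate of performance on unseen patients.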


2017 ◽  
Vol 3 (1) ◽  
pp. 1-6
Author(s):  
Ahmad Ilham

The class imbalance problem has an adverse effect on the accuracy of data prediction. To address this problem, many earlier studies have used classification algorithms to handle imbalanced class data. This study presents under-sampling and over-sampling techniques for handling imbalanced class data. These techniques are applied at the preprocessing stage to balance the class distribution in the data. The experimental results show that the neural network (NN) outperforms decision tree (DT), linear regression (LR), naïve Bayes (NB), and support vector machine (SVM).
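
The two preprocessing techniques can be sketched in their simplest random form. The abstract does not say which over-sampling or under-sampling variant was used, so random duplication and random removal are assumptions, as are the function names and the made-up 90/10 dataset.

```python
import random
from collections import Counter

def _group_by_class(rows, labels):
    groups = {}
    for r, y in zip(rows, labels):
        groups.setdefault(y, []).append(r)
    return groups

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for y, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([y] * target)
    return out_rows, out_labels

def random_undersample(rows, labels, seed=0):
    """Randomly drop rows of larger classes until every class matches the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for y, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([y] * target)
    return out_rows, out_labels

# Made-up imbalanced data: 90 majority rows and 10 minority rows.
rows = [[i] for i in range(100)]
labels = [0] * 90 + [1] * 10
_, over_labels = random_oversample(rows, labels)
_, under_labels = random_undersample(rows, labels)
print(Counter(over_labels), Counter(under_labels))
```

Resampling should be applied to the training split only, so that evaluation still reflects the original class distribution.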


Machine learning is one of the fastest-growing fields today. Machine learning (ML) and Artificial Neural Networks (ANN) are helpful in the detection and diagnosis of various heart diseases. Naïve Bayes classification is a vital classification approach in machine learning. Heart disease comprises a range of disorders affecting the heart. It includes blood vessel problems, irregular heartbeat, weak heart muscle, congenital heart defects, cardiovascular disease, and coronary artery disease. Coronary heart disease is a common type of heart disease. It reduces blood flow to the heart, which can lead to a heart attack. In this paper, a UCI Machine Learning Repository dataset of patients suffering from heart disease is analyzed using Naïve Bayes classification and support vector machines. The patients are classified with both methods and the classification accuracy is assessed. The implementation is done in R.

