Feature Scaled Element Balancing with Random Boosting for Heart Disease Prediction using Machine Learning

Researchers in health care are increasingly focused on predicting disease and its type. Beyond prediction, there is a need to identify the parameters that directly influence the prediction, and analyzing those parameters remains a challenging problem. With this in view, we predict heart disease by boosting the parameters of the dataset. The heart disease dataset from the UCI Machine Learning Repository is used for implementation. The Python code is implemented with Anaconda Navigator and the Spyder IDE. Our contribution is organized in five steps. First, the data is preprocessed and the relationships between attributes are identified from the correlation values. Second, the dataset is fitted to a random boost regressor and the important features are identified. Third, the dataset is feature scaled and reduced, then fitted to a random forest classifier, decision tree classifier, Naïve Bayes classifier, logistic regression classifier, kernel support vector machine and KNN classifier. Fourth, the dataset is reduced to five components with principal component analysis and then fitted to the same classifiers. Fifth, the performance of the classifiers is analyzed using accuracy, recall, F-score and precision. Experimental results show that the Naïve Bayes classifier is the most effective, with a precision, recall and F-score of 0.89 without random boost, 0.88 with random boosting and 0.90 with principal component analysis, and an accuracy of 89% without random boost, 90% with random boosting and 91% with principal component analysis.
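The feature scaling step can be illustrated with a minimal sketch. The abstract does not say which scaling method was used, so standardization (zero mean, unit variance) is an assumption here, and the `standardize` helper and the sample readings are hypothetical.

```python
from math import sqrt

def standardize(column):
    """Rescale one numeric feature to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    # Population standard deviation of the feature.
    std = sqrt(sum((x - mean) ** 2 for x in column) / n)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]

# Hypothetical resting blood pressure readings (not taken from the paper).
resting_bp = [120, 130, 140, 150, 160]
scaled = standardize(resting_bp)
print(scaled)
```

Scaling of this kind matters mostly for distance- and margin-based classifiers such as KNN and kernel SVM, where a feature with a large numeric range would otherwise dominate.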

Diagnosa Kerusakan Bearing Menggunakan Principal Component Analysis (PCA) dan Naïve Bayes Classifier

JURNAL SISTEM INFORMASI BISNIS ◽

10.21456/vol6iss2pp114-123 ◽

2016 ◽

Vol 6 (2) ◽

pp. 114

Author(s):

Dwi Pudyastuti ◽

Toni Prahasto ◽

Achmad Widodo

Keyword(s):

Principal Component Analysis ◽

Fault Diagnosis ◽

Naive Bayes ◽

Principal Component ◽

Component Analysis ◽

Naïve Bayes ◽

Feature Reduction ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

This research discusses the use of data mining for bearing fault diagnosis. Bearings are among the essential parts of industrial machinery; they reduce friction between moving machine components that press against each other. Diagnosing bearing faults can prevent loss and damage to other machine components. The research begins with data preprocessing using the discrete wavelet transform, followed by feature extraction, feature reduction using Principal Component Analysis (PCA), and classification using the Naïve Bayes classifier. The Naïve Bayes classifier is a classification method based on probability and Bayes' theorem. The results show that Naïve Bayes classification performs well, achieving good accuracy on each test data set.
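As a rough illustration of the wavelet preprocessing step, the sketch below applies one level of a discrete wavelet transform and uses sub-band energies as features. The abstract does not name the wavelet or the features, so the Haar wavelet, the energy features and the sample signal are all assumptions.

```python
from math import sqrt

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficients for an even-length signal.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx, detail = [], []
    for i in range(0, len(signal), 2):
        a, b = signal[i], signal[i + 1]
        approx.append((a + b) / sqrt(2))  # low-pass: local average
        detail.append((a - b) / sqrt(2))  # high-pass: local difference
    return approx, detail

def energy(coeffs):
    """Sum of squared coefficients, a common sub-band feature."""
    return sum(c * c for c in coeffs)

# Made-up vibration samples, not data from the paper.
vibration = [0.0, 1.0, 0.0, -1.0, 0.5, 0.5, -0.5, 0.5]
approx, detail = haar_dwt(vibration)
features = [energy(approx), energy(detail)]
print(features)
```

Because the Haar transform is orthonormal, the two sub-band energies add up to the energy of the original signal, which makes them convenient inputs for a later reduction step such as PCA.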

Naive Bayes Classifier Based on Principal Component Analysis Applied to Clinical Diagnosis Decision of ECG Data

Software Engineering and Applications ◽

10.12677/sea.2021.105067 ◽

2021 ◽

Vol 10 (05) ◽

pp. 622-633

Author(s):

杰青闵

Keyword(s):

Principal Component Analysis ◽

Clinical Diagnosis ◽

Naive Bayes ◽

Principal Component ◽

Component Analysis ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Ecg Data

Implementasi Text Mining Untuk Analisis Opini Publik Terhadap Calon Presiden

Jurnal Simantec ◽

10.21107/simantec.v7i1.6528 ◽

2020 ◽

Vol 7 (1) ◽

pp. 39-47

Author(s):

Trya Sovi Kartikasari ◽

Hendry Setiawan ◽

Paulus Lucky Tirma Irawan

Keyword(s):

Principal Component Analysis ◽

Text Mining ◽

Naive Bayes ◽

Principal Component ◽

Component Analysis ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

The presidential system is one form of democracy in Indonesia. It centers on the election of the president and vice president, who are chosen directly by the people. A candidate's electability can be gauged from circulating public opinion, including opinion on social media, which is also part of campaigning. This study analyzes opinions related to the electability of presidential candidates on the social media platform Twitter using the Naïve Bayes Classifier (NBC) method, and determines the factors formed from those opinions using Principal Component Analysis (PCA). Opinion data from Twitter was collected using the keywords “Jokowi” and “Prabowo”. A portion of the opinions was selected as training data to obtain the negative and positive sentiment classes. After training, the test data and validation data were processed. On the test data for the Jokowi topic, accuracy was 88.63% for positive-sentiment tweets and 91.06% for negative-sentiment tweets. For Prabowo, accuracy was 88.58% for positive sentiment and 80.37% for negative sentiment. The average accuracy across all topics was 86.89%. To obtain the factors for each sentiment, PCA values were computed. Experts then performed a factor analysis for each sentiment, yielding 20 factors that the experts were able to interpret.
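A Naïve Bayes sentiment classifier of the kind described can be sketched in a few lines. The abstract does not give the model variant, so a multinomial model with Laplace smoothing is assumed here; the function names and the tiny English training set are hypothetical and stand in for the labeled tweets.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns the fitted counts."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += log((word_counts[label][tok] + 1)
                         / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training examples, not tweets from the study.
train = [
    ("good leader strong".split(), "positive"),
    ("great program good".split(), "positive"),
    ("bad policy weak".split(), "negative"),
    ("poor weak promise".split(), "negative"),
]
model = train_nb(train)
print(predict_nb(model, "good strong program".split()))
```

Working in log space keeps the product of many small word probabilities from underflowing, which becomes relevant once documents are longer than a few tokens.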

Analisis Komparasi Metode Klasifikasi Data Mining dan Reduksi Atribut Pada Data Set Penyakit Jantung

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v4i2.2080 ◽

2020 ◽

Vol 4 (2) ◽

pp. 437 ◽

Cited By ~ 1

Author(s):

Dito Putro Utomo ◽

Mesran Mesran

Keyword(s):

Data Mining ◽

Heart Disease ◽

Naive Bayes ◽

Naïve Bayes ◽

Diagnostic Process ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Data Set ◽

Pca Method

Heart disease has a high mortality rate, with 12 million deaths each year worldwide. This creates a need for early diagnosis. However, diagnosis is challenging because of the complex relationships between the attributes of heart disease, so it is important to know the main attributes used in the decision-making or classification process. The dataset used in this study has 57 attributes, so reduction is needed to shorten the diagnostic process; it can be carried out using the Principal Component Analysis (PCA) method. PCA can be combined with data mining classification techniques to measure accuracy on the dataset. This study compares the accuracy of the C5.0 algorithm and the Naïve Bayes Classifier (NBC) algorithm. Both before and after the reduction, the NBC algorithm performs better than the C5.0 algorithm.
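The idea behind PCA-based attribute reduction can be sketched by finding the direction of greatest variance in the data. The version below uses power iteration on the covariance matrix to recover only the first principal component; this is an illustrative simplification, not the procedure from the study, and the function names and toy data are hypothetical.

```python
from math import sqrt

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def first_principal_component(rows, iterations=200):
    """Approximate the top eigenvector of the covariance matrix."""
    cov = covariance_matrix(rows)
    d = len(cov)
    # Uniform start vector; assumed not orthogonal to the top eigenvector.
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # all features are constant
        v = [x / norm for x in w]
    return v

# Toy data lying exactly on the line y = 2x.
rows = [[1, 2], [2, 4], [3, 6], [4, 8]]
pc1 = first_principal_component(rows)
print(pc1)
```

Projecting each record onto a handful of such components replaces many correlated attributes with a few uncorrelated ones, which is the reduction the abstract describes.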

Attribute Balanced Leveling with Ada Boost Regressor for Predicting Heart Disease using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e5816.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 2488-2493

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Principal Component ◽

Naïve Bayes ◽

Experimental Results ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Data Set ◽

Customer Churn

Technological advancement can help every application field predict damage and forecast future targets. The wealth of the world is in the health of its people, so technology must support technologists in predicting disease in advance. Machine learning is an emerging field used to forecast the existence of heart disease from the values of clinical parameters. With this in view, we focus on predicting customer churn for a banking application. This paper uses the customer churn bank modeling dataset extracted from the UCI Machine Learning Repository. The Python code is implemented with Anaconda Navigator and the Spyder IDE. Our contribution is organized in five steps. First, the data is processed to find the relationships between the elements of the dataset. Second, the dataset is fitted to Ada Boost regressors and the important elements are identified. Third, the dataset is feature scaled and then fitted to a kernel support vector machine, logistic regression classifier, Naïve Bayes classifier, random forest classifier, decision tree classifier and KNN classifier. Fourth, the dataset is reduced to five components with principal component analysis and then applied to the same classifiers. Fifth, the performance of the classifiers is analyzed using precision, accuracy, recall and F-score. Experimental results show that the Naïve Bayes classifier is the most effective, with a precision of 0.90, a recall of 0.91 and an F-score of 0.92 for the dataset with random boost, feature scaling and PCA. Its accuracy is 91% without random boost, 93% with random boosting and 92% with principal component analysis.

Naïve Bayes Classifier with Parallel Abduction Reasoning Ensemble Principal Component Analysis for Prediction Modeling

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2016.5617 ◽

2016 ◽

Vol 13 (10) ◽

pp. 6707-6710

Author(s):

J Suganthi ◽

V Malathi

Keyword(s):

Latent Variable ◽

Naive Bayes ◽

Principal Component ◽

Naïve Bayes ◽

Abductive Reasoning ◽

Experimental Result ◽

Cancer Information ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

The classification can be treated as a latent variable that is probabilistically related to the observed variables. In Bayesian algorithms, inference works in a probabilistic mode. PCM-based parallel abductive reasoning with Naïve Bayes (NB) on cancer information can be a powerful technique for effective prediction in classification. While classifying the cancer information, the method reads the parallel changes and predicts the severity level for supplementary treatments. Because the Bayesian classifier underpins several supervised learning algorithms, the proposed Parallel Abductive Naïve Bayes Classifier algorithm, based on factor analysis with PCA, enhances the granularity of prediction. The principal components are chosen on the multi-perspective domain of the curator analysis dataset. Experimental results show that it is possible to obtain parallel abductive classifiers that have a comparatively high impact on prediction.

PERFORMANCE ANALYSIS OF CLASSIFICATION ALGORITHM ON DIABETES HEALTHCARE DATASET

International Journal of Research -GRANTHAALAYAH ◽

10.29121/granthaalayah.v5.i8.2017.2229 ◽

2017 ◽

Vol 5 (8) ◽

pp. 260-266

Author(s):

Subhankar Manna ◽

Malathi G.

Keyword(s):

Decision Tree ◽

Metabolic Diseases ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Algorithm ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Data Set ◽

Id3 Algorithm

The healthcare industry collects a huge amount of unclassified data every day. For effective diagnosis and decision making, we need to discover hidden data patterns. One such dataset is associated with a group of metabolic diseases that vary greatly in their range of attributes. The objective of this paper is to classify the diabetes dataset using classification techniques such as Naive Bayes, ID3 and k-means. The secondary objective is to study the performance of the classification algorithms used in this work. We propose to implement the classification algorithms in R. This work uses a dataset imported from the UCI Machine Learning Repository, the Diabetes 130-US hospitals for years 1999-2008 Data Set. Motivation/Background: Naïve Bayes is a probabilistic classifier based on Bayes' theorem. It provides a useful perspective for understanding many algorithms, and it assumes that variables are independent of each other. In this paper, the Bayesian algorithm shows high accuracy when applied to the diabetes dataset. We also construct a decision tree from the diabetes dataset. The tree is a graph-like model in which an attribute is selected at each node, each branch represents an outcome of the test, and each leaf node holds a class label. This technique separates observations into branches to construct the tree, splitting it recursively in a process called recursive partitioning. Decision trees are widely used in various areas because they suit many dataset distributions. For example, using the ID3 (decision tree) algorithm we obtain a result indicating whether a patient has diabetes or not. Method: We use Naïve Bayes for probabilistic classification and ID3 for the decision tree. Results: The diabetes dataset has 18 columns, such as Races, Gender, Take_metformin, Take_repaglinide, Insulin, Body_mass_index and Self_reported_health, and 623 rows. The Naive Bayes Classifier algorithm is used to obtain the probability of having diabetes or not. Diabetes is the class attribute, with two values, “Yes” and “No”, and the remaining columns hold personal information about the patient. For each attribute value, the algorithm gives one probability for “Yes” and one for “No”. For example, for Gender, Female has 0.4964 for “No” and 0.5581 for “Yes”, and Male has 0.5035 for “No” and 0.4418 for “Yes”. Conclusions: In this paper two algorithms were implemented, the Naive Bayes Classifier algorithm and the ID3 algorithm. With the Naive Bayes Classifier algorithm the probability of having diabetes was predicted, and with the ID3 algorithm a decision tree was generated.
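The Gender figures can be turned into a posterior with Bayes' theorem. The sketch below reads them as conditional probabilities P(Gender | class), which is consistent with the values for each class summing to about 1. The class priors are not given in the abstract, so the equal priors used here are purely an assumption, and `posterior` is a hypothetical helper.

```python
def posterior(likelihoods, priors):
    """Bayes' theorem: P(class | evidence) for each class."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # P(evidence), the normalizer
    return {c: joint[c] / evidence for c in joint}

# P(Gender = Female | class), as reported in the abstract.
female_given_class = {"Yes": 0.5581, "No": 0.4964}

# Assumed equal priors; the abstract does not state the class balance.
priors = {"Yes": 0.5, "No": 0.5}

result = posterior(female_given_class, priors)
print(result)
```

With several attributes, Naïve Bayes multiplies one such likelihood per attribute under the independence assumption before normalizing, so a single attribute like Gender shifts the posterior only slightly.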

A comparison of three discrete methods for classification of heart disease data

Bangladesh Journal of Scientific and Industrial Research ◽

10.3329/bjsir.v50i4.25839 ◽

2015 ◽

Vol 50 (4) ◽

pp. 293-296 ◽

Cited By ~ 4

Author(s):

D Chaki ◽

A Das ◽

MI Zaber

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Naive Bayes ◽

Naïve Bayes ◽

Supervised Machine Learning ◽

Support Vector ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

The classification of heart disease patients is of great importance in cardiovascular disease diagnosis. Researchers have used numerous data mining techniques to aid health care professionals in the diagnosis of heart disease, and many algorithms have been proposed for this task in the past few years. In this paper, we have studied different supervised machine learning techniques for classification of heart disease data and have performed a procedural comparison of these. We have used the C4.5 decision tree classifier, a naïve Bayes classifier, and a Support Vector Machine (SVM) classifier over a large set of heart disease data. The data used in this study is the Cleveland Clinic Foundation Heart Disease Data Set available at the UCI Machine Learning Repository. We have found that SVM outperformed both the naïve Bayes and C4.5 classifiers, giving the best accuracy by correctly classifying the highest number of instances. We have also found that the naïve Bayes classifier achieved competitive performance even though the assumption of normality of the data is strongly violated. Bangladesh J. Sci. Ind. Res. 50(4), 293-296, 2015
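The normality assumption mentioned above is easiest to see in a Gaussian naïve Bayes model, which fits one normal distribution per feature and class. The sketch below is a minimal version under that assumption; the function names and the two-feature sample records are hypothetical and are not drawn from the Cleveland data.

```python
from math import log, pi

def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / n
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / n + 1e-9
            stats.append((mean, var))
        model[label] = (n / len(y), stats)
    return model

def predict_gaussian_nb(model, x):
    """Pick the class with the highest log posterior for record x."""
    def log_posterior(label):
        prior, stats = model[label]
        score = log(prior)
        for value, (mean, var) in zip(x, stats):
            # Log density of a normal distribution N(mean, var).
            score += -0.5 * log(2 * pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Made-up records: [age, cholesterol]; label 1 means disease present.
X = [[45, 180], [50, 190], [48, 185], [62, 260], [65, 270], [60, 255]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [47, 182]))
```

When a feature is far from normal, the fitted density is a poor description of it, yet the predicted class can still be correct as long as the ranking of the class scores is preserved.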

METODE ALGORITMA NAIVE BAYES CLASSIFIER DALAM MEMPREDIKSI JADWAL BERLAYAR ANGKUTAN LAUT (FERY) BULUKUMBA KEPULAUAN SELAYAR

ILTEK : Jurnal Teknologi ◽

10.47398/iltek.v13i01.114 ◽

2019 ◽

Vol 13 (01) ◽

pp. 1886-1891

Author(s):

Rizal Syarifuddin ◽

Rosmiati Rosmiati

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Data Set

Weather is one of the causes of maritime accidents in which cargo and passenger vessels sink. Access to weather forecast information is therefore important before a ship's captain decides to sail. This study aims to use the naïve Bayes algorithm to help the captain decide whether or not to sail. The study was conducted on the ro-ro ferry crossing from Bira Port in Bulukumba Regency to Benteng Port in the Selayar Islands. The criteria, or attributes, used for classification were obtained from meteorology and geophysics agency data on weather parameters such as wind over land and sea wave foam. The test results show that the data set can be used with the naïve Bayes algorithm to support the decision to sail.

Deteksi Konten Gereflekter pada Cerita Anak Menggunakan Naïve Bayes Classifier

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v4i2.2015 ◽

2020 ◽

Vol 4 (2) ◽

pp. 318

Author(s):

Mayya Tania Wewengkang ◽

Dana Sulistiyo Kusumo ◽

Widi Astuti

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

School Level ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Test Scenario ◽

Data Set ◽

Positive Class ◽

Negative Class

Textbooks and storybooks are used as sources of knowledge. When children read a book, they try to interpret each word and sentence in it. This becomes a problem if the book contains vulgar words and indecent sentences, which are not appropriate for children at the elementary school level. In this research, we call such content gereflekter content. To address the problem, we built a system that detects gereflekter content in the text of children's stories, which served as the data set. The system is built using the Naïve Bayes Classifier (NBC) and evaluated in two scenarios using accuracy, precision and recall, because the data set is imbalanced: the negative class contains more data than the positive class. In the evaluation, the test scenarios produced a high average precision of 99.01%, whereas average recall was above 50%. From these two values, it can be concluded that the model does not yet detect the positive class reliably, but is highly trustworthy when it does.
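The gap between precision and recall on an imbalanced data set can be reproduced with a small sketch. The labels below are made up to mimic the pattern described (few positives, a cautious classifier) and are not the study's data; the helper names are hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up imbalanced labels: 4 positive sentences among 20.
y_true = [1, 1, 1, 1] + [0] * 16
# A cautious classifier that flags only 2 of the 4 positives.
y_pred = [1, 1, 0, 0] + [0] * 16

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall, accuracy(y_true, y_pred))
```

In this toy case every flagged sentence is truly positive, yet half of the positives are missed while accuracy still looks high, which is why accuracy alone is a weak guide when the classes are imbalanced.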
