Feature Scaled Element Balancing with Random Boosting for Heart Disease Prediction using Machine Learning

2020 ◽  
Vol 8 (5) ◽  
pp. 4105-4110

Researchers are currently focusing on health care projects that predict a disease and its type. Beyond prediction, there is a need to find the influencing parameters that are directly related to the disease. Analyzing the parameters needed for disease prediction remains a challenging issue. With this in view, we focus on predicting heart disease by boosting the parameters of the dataset. The heart disease dataset from the UCI Machine Learning Repository is used for the implementation. The Python code is implemented in the Anaconda Navigator IDE with Spyder. Our contribution is fivefold. First, the data is preprocessed and the attribute relationships are identified from the correlation values. Second, the dataset is fitted to a random boost regressor and the important features are identified. Third, the dataset is feature scaled and reduced, and then fitted to a random forest classifier, decision tree classifier, Naïve Bayes classifier, logistic regression classifier, kernel support vector machine and KNN classifier. Fourth, the dataset is reduced with principal component analysis to five components and then fitted to the above-mentioned classifiers. Fifth, the performance of the classifiers is analyzed with metrics such as accuracy, recall, F-score and precision. Experimental results show that the Naïve Bayes classifier is the most effective, with precision, recall and F-score of 0.89 without random boost, 0.88 with random boosting and 0.90 with principal component analysis. Its accuracy is 89% without random boost, 90% with random boosting and 91% with principal component analysis.
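The feature scaling step can be illustrated with a minimal sketch. The abstract does not say which scaling was applied, so z-score standardization is an assumption here, and the feature values are hypothetical rather than taken from the UCI dataset.

```python
from statistics import mean, pstdev

def standardize(columns):
    """Scale each feature column to zero mean and unit variance (z-score)."""
    scaled = []
    for col in columns:
        mu = mean(col)
        sigma = pstdev(col)
        # A constant column has zero spread; map it to zeros instead of dividing by 0.
        scaled.append([0.0 if sigma == 0 else (x - mu) / sigma for x in col])
    return scaled

# Hypothetical values for two clinical features (not taken from the paper).
age = [29, 45, 54, 61, 70]
cholesterol = [180, 210, 245, 260, 305]
scaled_age, scaled_chol = standardize([age, cholesterol])
```

After scaling, both columns are on the same scale, which matters for distance-based classifiers such as KNN and kernel SVM.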

2016 ◽  
Vol 6 (2) ◽  
pp. 114
Author(s):  
Dwi Pudyastuti ◽  
Toni Prahasto ◽  
Achmad Widodo

This research discusses the use of data mining for bearing fault diagnosis. Bearings are essential parts of industrial machinery. They reduce friction in machines or serve as moving components that press against each other. Fault diagnosis can prevent loss and damage to other machine components. The research begins with data preprocessing using the discrete wavelet transform, followed by feature extraction, feature reduction using Principal Component Analysis (PCA), and classification using the Naïve Bayes classifier. The Naïve Bayes classifier is a classification method based on probability and Bayes' theorem. The results show that Naïve Bayes classification performs well, as indicated by good accuracy on each test data set.
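As a rough illustration of the preprocessing and feature extraction steps, the sketch below applies one level of a discrete wavelet transform and uses sub-band energies as features. The abstract does not name the wavelet or the features, so the Haar wavelet, the energy features, and the sample values are all assumptions.

```python
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficients; the signal length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def energy(coeffs):
    """Sum of squared coefficients, a common sub-band feature."""
    return sum(c * c for c in coeffs)

# Hypothetical vibration samples (not from the paper).
vibration = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_dwt(vibration)
features = [energy(approx), energy(detail)]
```

Because the Haar transform is orthonormal, the two sub-band energies add up to the energy of the original signal, which gives a quick sanity check on the implementation.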


2020 ◽  
Vol 7 (1) ◽  
pp. 39-47
Author(s):  
Trya Sovi Kartikasari ◽  
Hendry Setiawan ◽  
Paulus Lucky Tirma Irawan

The presidential system is one form of democracy in Indonesia. The system centers on holding elections for the president and vice president in which the people vote directly. A candidate's electability can be gauged from circulating public opinion, including opinion on social media, which is also part of the campaign. This study analyzes opinions related to the electability of presidential candidates on the social media platform Twitter using the Naïve Bayes Classifier (NBC) method, and determines the factors formed from those opinions using Principal Component Analysis (PCA). Opinion data from Twitter was collected using the keywords “Jokowi” and “Prabowo”. Part of the opinions was selected as training data to obtain the negative and positive sentiment classes. After training, the test data and validation data were processed. For the Jokowi topic, accuracy on the test data was 88.63% for positive-sentiment tweets and 91.06% for negative-sentiment tweets. For Prabowo, accuracy was 88.58% for positive sentiment and 80.37% for negative sentiment. The average accuracy across all topics is 86.89%. To obtain the factors for each sentiment, PCA values were computed. Experts then performed factor analysis on each sentiment, yielding 20 factors that the experts successfully interpreted.
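A Naïve Bayes sentiment classifier of the kind described can be sketched as follows. This is a minimal multinomial model with add-one smoothing; the smoothing choice, the whitespace tokenizer, and the English training sentences are assumptions made for illustration, not details from the study.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (text, label). Returns word counts, document counts, vocabulary."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def classify(text, word_counts, doc_counts, vocab):
    """Return the label with the highest log posterior for the given text."""
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled opinions (not from the study's Twitter data).
training = [
    ("great leader strong economy", "positive"),
    ("good program for the people", "positive"),
    ("bad policy weak economy", "negative"),
    ("poor program bad results", "negative"),
]
model = train(training)
label = classify("strong good leader", *model)
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.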


2020 ◽  
Vol 4 (2) ◽  
pp. 437 ◽  
Author(s):  
Dito Putro Utomo ◽  
Mesran Mesran

Heart disease has a high mortality rate, with 12 million deaths each year worldwide. This creates a need for early diagnosis of heart disease. However, diagnosis is challenging because of the complex relationships between the attributes of heart disease. It is therefore important to know the main attributes used in decision making or classification for heart disease. The dataset used in this study has 57 attributes, so reduction is needed to shorten the diagnostic process. The reduction can be carried out using the Principal Component Analysis (PCA) method. PCA can be combined with data mining classification techniques to measure accuracy on the dataset. This study compares the accuracy of the C5.0 algorithm and the Naïve Bayes Classifier (NBC) algorithm. The results show that, both before and after reduction, the Naïve Bayes Classifier (NBC) algorithm performs better than the C5.0 algorithm.
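The idea behind PCA-based reduction can be sketched in a few lines. The example below centers the data, builds the covariance matrix, and finds only the first principal component by power iteration. Power iteration is a simplification chosen to keep the sketch dependency-free, and the three-attribute rows are hypothetical, so this is not the study's procedure.

```python
def center(rows):
    """Subtract the column mean from every value."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n = len(rows)
    d = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Dominant eigenvector of the covariance matrix via power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Hypothetical records with three attributes; the first two are strongly correlated.
rows = [[2.0, 4.1, 1.0], [3.0, 6.0, 0.9], [4.0, 7.9, 1.1], [5.0, 10.1, 1.0]]
centered = center(rows)
pc1 = first_component(covariance(centered))
# Projecting onto pc1 replaces three attributes with one score per record.
scores = [sum(x * p for x, p in zip(row, pc1)) for row in centered]
```

In practice a library routine that returns several components at once would be used; the point here is only that correlated attributes collapse onto a single direction.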


2020 ◽  
Vol 8 (5) ◽  
pp. 2488-2493

Technological advancement can help every application field to predict damage and forecast the future target of an object. The wealth of the world is in the health of its people, so technology must support technologists in predicting disease in advance. Machine learning is an emerging field used to forecast the existence of heart disease from the values of clinical parameters. With this in view, we focus on predicting customer churn for the banking application. This paper uses the customer churn bank modeling dataset from the UCI Machine Learning Repository. The Python code is implemented in the Anaconda Navigator IDE with Spyder. Our contribution is fivefold. First, the data is processed to find the relationships between the elements of the dataset. Second, the dataset is applied to AdaBoost regressors and the important elements are identified. Third, feature scaling is applied to the dataset, which is then fitted to a kernel support vector machine, logistic regression classifier, Naïve Bayes classifier, random forest classifier, decision tree classifier and KNN classifier. Fourth, the dimensionality of the dataset is reduced with principal component analysis to five components, and the result is applied to the previously mentioned classifiers. Fifth, the performance of the classifiers is analyzed with metrics such as precision, accuracy, recall and F-score. Experimental results show that the Naïve Bayes classifier is the most effective, with a precision of 0.90, a recall of 0.91 and an F-score of 0.92 for the dataset with random boost, feature scaling and PCA. Its accuracy is 91% without random boost, 93% with random boosting and 92% with principal component analysis.


2016 ◽  
Vol 13 (10) ◽  
pp. 6707-6710
Author(s):  
J Suganthi ◽  
V Malathi

The classification can be treated as a latent variable that is probabilistically related to the observed variables. In Bayesian algorithmic methods, inference works in a probabilistic mode. PCM-based parallel abductive reasoning with Naïve Bayes (NB) on cancer data can be a powerful technique for effective prediction in classification. While classifying the cancer data, the method reads the parallel changes and predicts the severity level for supplementary treatments. Because the Bayesian classifier provides the premises for several supervised learning algorithms, the proposed Parallel Abductive Naïve Bayes Classifier algorithm, based on factor analysis with PCA, enhances the granularity of prediction. The principal components are chosen on the multi-perspective domain of the curator analysis dataset. Experimental results show that it is possible to obtain parallel abductive classifiers that have a comparatively high impact on prediction.


2017 ◽  
Vol 5 (8) ◽  
pp. 260-266
Author(s):  
Subhankar Manna ◽  
Malathi G.

The healthcare industry collects a huge amount of unclassified data every day. For effective diagnosis and decision making, we need to discover hidden data patterns. One such dataset is associated with a group of metabolic diseases that vary greatly in their range of attributes. The objective of this paper is to classify the diabetes dataset using classification techniques such as Naïve Bayes, ID3 and k-means classification. The secondary objective is to study the performance of the classification algorithms used in this work. We propose to implement the classification algorithms using an R package. This work uses a dataset imported from the UCI Machine Learning Repository, the Diabetes 130-US hospitals for years 1999-2008 Data Set. Motivation/Background: Naïve Bayes is a probabilistic classifier based on Bayes' theorem. It provides a useful perspective for understanding many algorithms. In this paper, the Bayesian algorithm shows high accuracy when applied to the diabetes dataset. It assumes that variables are independent of each other. We also construct a decision tree from the diabetes dataset, which selects an attribute at each node of the tree-like graph; each branch represents an outcome of the test, and each node holds a class attribute. This technique separates observations into branches to construct the tree. The tree is split recursively, a process called recursive partitioning. Decision trees are widely used in various areas because they work well enough for dataset distribution. For example, the ID3 (decision tree) algorithm yields a result indicating whether a patient has diabetes or not. Method: We use Naïve Bayes for probabilistic classification and ID3 for the decision tree. Results: The dataset is a diabetes dataset. It has 18 columns, such as Races, Gender, Take_metformin, Take_repaglinide, Insulin, Body_mass_index and Self_reported_health, and 623 rows.
The Naïve Bayes Classifier algorithm is used to obtain the probability of having diabetes or not. Here, Diabetes is the class for the diabetes dataset. It has two values, “Yes” and “No”, and the dataset holds some personal information about the patient, such as Races, Gender, Take_metformin, Take_repaglinide, Insulin, Body_mass_index and Self_reported_health. We obtain the probability for “Yes” and the probability for “No”, as given below. For example, for Gender, Female has 0.4964 for “No” and 0.5581 for “Yes”, and Male has 0.5035 for “No” and 0.4418 for “Yes”. Conclusions: In this paper two algorithms were implemented, the Naïve Bayes Classifier algorithm and the ID3 algorithm. With the Naïve Bayes Classifier algorithm, the probability of having diabetes was predicted, and with the ID3 algorithm a decision tree was generated.
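To show how the reported Gender values feed into a prediction, the sketch below treats them as conditional probabilities P(Gender | class) and applies Bayes' theorem for a single attribute. The class priors are not given in the abstract, so equal priors are assumed here purely for illustration.

```python
# Conditional probabilities P(Gender | class) as reported in the abstract.
likelihood = {
    "Yes": {"Female": 0.5581, "Male": 0.4418},
    "No": {"Female": 0.4964, "Male": 0.5035},
}
# Assumed class priors; the abstract does not report them.
prior = {"Yes": 0.5, "No": 0.5}

def posterior(gender):
    """P(class | gender) from Bayes' theorem, normalized over both classes."""
    joint = {c: prior[c] * likelihood[c][gender] for c in prior}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}

female = posterior("Female")
male = posterior("Male")
```

With more attributes, Naïve Bayes multiplies one such likelihood per attribute into each class score before normalizing, which is where the independence assumption enters.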


2015 ◽  
Vol 50 (4) ◽  
pp. 293-296 ◽  
Author(s):  
D Chaki ◽  
A Das ◽  
MI Zaber

The classification of heart disease patients is of great importance in cardiovascular disease diagnosis. Researchers have used numerous data mining techniques to aid health care professionals in the diagnosis of heart disease, and many algorithms have been proposed for this task in the past few years. In this paper, we study different supervised machine learning techniques for classifying heart disease data and perform a procedural comparison of them. We use the C4.5 decision tree classifier, a naïve Bayes classifier, and a Support Vector Machine (SVM) classifier over a large set of heart disease data. The data used in this study is the Cleveland Clinic Foundation Heart Disease Data Set, available at the UCI Machine Learning Repository. We found that SVM outperformed both the naïve Bayes and C4.5 classifiers, giving the best accuracy by correctly classifying the highest number of instances. We also found that the naïve Bayes classifier achieved competitive performance even though the assumption of normality of the data is strongly violated. Bangladesh J. Sci. Ind. Res. 50(4), 293-296, 2015
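The normality assumption mentioned above refers to modeling each numeric feature with a per-class Gaussian. A minimal Gaussian naïve Bayes sketch follows; the two features, their values, and the small variance floor are assumptions for illustration and do not come from the Cleveland dataset.

```python
import math
from statistics import mean, pvariance

def fit(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "mean": [mean(c) for c in cols],
            # Small floor avoids division by zero for constant features.
            "var": [pvariance(c) + 1e-9 for c in cols],
        }
    return model

def log_gaussian(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(model, x):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    def score(label):
        p = model[label]
        return math.log(p["prior"]) + sum(
            log_gaussian(v, m, s) for v, m, s in zip(x, p["mean"], p["var"])
        )
    return max(model, key=score)

# Hypothetical (blood pressure, cholesterol) readings with a binary label.
X = [[120, 190], [130, 200], [125, 210], [160, 280], [170, 300], [165, 290]]
y = [0, 0, 0, 1, 1, 1]
model = fit(X, y)
prediction = predict(model, [128, 205])
```

When a feature is far from normally distributed, these per-class Gaussians fit it poorly, which is the violation the abstract refers to.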


2019 ◽  
Vol 13 (01) ◽  
pp. 1886-1891
Author(s):  
Rizal Syarifuddin ◽  
Rosmiati Rosmiati

Marine accidents in which ships carrying goods and passengers sink are caused, among other things, by weather. Access to weather forecast information is therefore important before a ship's captain decides to sail. This study aims to perform calculations using the naïve Bayes algorithm to help the captain decide whether or not to sail. The study was carried out on a ro-ro ferry crossing from the port of Bira in Bulukumba Regency to the port of Benteng in the Selayar Islands. The criteria, or attributes, used for classification were obtained from meteorology and geophysics agency data on weather parameters, such as wind over land and sea wave foam. The test results show that the dataset can be used in naïve Bayes calculations to support the decision to sail.


2020 ◽  
Vol 4 (2) ◽  
pp. 318
Author(s):  
Mayya Tania Wewengkang ◽  
Dana Sulistiyo Kusumo ◽  
Widi Astuti

Textbooks and storybooks are used as sources of knowledge. When children read a book, they try to interpret each word and sentence in it. This becomes a problem if the book contains vulgar words and indecent sentences, which are not allowed for children at the elementary school level. In this research, we call such content gereflekter content. To address this problem, we built a system to detect gereflekter content in the text of children's stories, which were used as the dataset. The system is built using the Naïve Bayes Classifier (NBC) and evaluated in two scenarios using accuracy, precision, and recall metrics, because the dataset is imbalanced, with more data in the negative class than in the positive class. In the evaluation, the test scenario produced a high average precision of 99.01%, whereas recall averaged above 50%. From these two values, it can be concluded that the model does not yet detect the class reliably, but is highly trustworthy when it does.
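The gap between precision and recall on an imbalanced dataset can be made concrete with a small worked example. The label counts below are hypothetical and chosen only to reproduce the pattern described (very high precision, recall around 50%); they are not the study's data.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical imbalanced set: 10 positive sentences, 90 negative ones.
y_true = [1] * 10 + [0] * 90
# A cautious model: flags 5 of the 10 positives and raises no false alarms.
y_pred = [1] * 5 + [0] * 5 + [0] * 90

precision, recall = precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here accuracy is 0.95 even though half of the positive sentences are missed, which is why accuracy alone is a weak indicator when the negative class dominates.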

